To start using pandas, you need to import it into your Python script or Jupyter Notebook. The standard way to import pandas is as follows:
import pandas as pd
The two primary data structures provided by pandas are:
- Series: A one-dimensional labeled array that can hold any data type. It is similar to a single column in a spreadsheet or a SQL table. Each element in a Series is associated with a label, and together these labels form the Series' index.
- DataFrame: A two-dimensional labeled data structure whose columns can hold different types. It is similar to a spreadsheet or a SQL table.
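To make the difference between the two structures concrete, here is a minimal sketch that builds one of each. The variable names `prices` and `df` and the grocery data are purely illustrative. Note that each column of a DataFrame is itself a Series.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "bread", "milk"])

# A DataFrame: two-dimensional, and each column may hold a different type
df = pd.DataFrame(
    {
        "item": ["apple", "bread", "milk"],   # text
        "price": [1.5, 2.0, 0.75],            # floating-point numbers
        "in_stock": [True, False, True],      # booleans
    }
)

print(prices.ndim, df.ndim)           # 1 2
print(type(df["price"]).__name__)     # Series
print(df.dtypes)                      # one dtype per column
```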
2.2. Series
Here is an example in which we create a list of numbers and a list of corresponding index labels. We then pass both lists to the pd.Series() function to create a Series.
data = [10, 20, 30, 40, 50]
index = ['A', 'B', 'C', 'D', 'E']
s = pd.Series(data, index=index)
print("\nOriginal series:")
print(s)
Original series:
A 10
B 20
C 30
D 40
E 50
dtype: int64
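The same Series can also be built from a Python dictionary, in which case the keys become the index labels. The sketch below is an alternative to the two-list approach above, not a replacement for it; it uses `Series.equals` to confirm that both constructions give the same result.

```python
import pandas as pd

# Construction from separate data and index lists (as above)
s = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

# Construction from a dict: keys become the index, values become the data
s_from_dict = pd.Series({"A": 10, "B": 20, "C": 30, "D": 40, "E": 50})

print(s.equals(s_from_dict))  # True
print(list(s_from_dict.index))  # ['A', 'B', 'C', 'D', 'E']
```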
We can also change the labels without affecting the data by assigning a new list to s.index:
s.index = ['X', 'Y', 'Zebra', 'W', 'V']
print("\nUpdated series:")
print(s)
Updated series:
X 10
Y 20
Zebra 30
W 40
V 50
dtype: int64
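The following sketch checks that relabeling really leaves the values untouched, and shows one constraint worth knowing: the new list of labels must have the same length as the Series, otherwise pandas raises a `ValueError` and keeps the old index. The three-label list is just an illustrative wrong-length input.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])
values_before = s.to_numpy().copy()  # snapshot of the data

# Relabel: only the index changes
s.index = ["X", "Y", "Zebra", "W", "V"]
print((s.to_numpy() == values_before).all())  # True

# A label list of the wrong length is rejected
try:
    s.index = ["only", "three", "labels"]
except ValueError as err:
    print("ValueError:", err)
```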
Two helpful tools when working with pandas are the iloc[] and loc[] indexers. Strictly speaking they are attributes accessed with square brackets rather than functions called with parentheses. The table below summarizes them:
| Indexer | Description | Example |
|---|---|---|
| iloc[] | Integer-based (positional) indexing and selection | s.iloc[0] accesses the first element (for a DataFrame, the first row); s.iloc[2:4] accesses the elements at positions 2 and 3 |
| loc[] | Label-based indexing and selection | s.loc['X'] accesses the element labeled 'X'; s.loc[['X', 'W']] accesses the elements labeled 'X' and 'W' |
s.iloc[2:4]
Zebra 30
W 40
dtype: int64
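The loc[] examples from the table can be run on the relabeled Series in the same way. One difference the table does not spell out: a positional slice with iloc[] excludes the stop position, while a label slice with loc[] includes the stop label. The sketch below shows that s.loc['Zebra':'W'] therefore selects the same two elements as s.iloc[2:4].

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["X", "Y", "Zebra", "W", "V"])

# loc: select by label
print(s.loc["X"])          # 10
print(s.loc[["X", "W"]])   # the elements labeled X and W

# iloc: select by position
print(s.iloc[0])           # 10

# Slicing: iloc stops before position 4, loc stops at (and includes) label "W"
by_position = s.iloc[2:4]
by_label = s.loc["Zebra":"W"]
print(by_position.equals(by_label))  # True
```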