import pandas as pd
import numpy as np
2.1. Series
We start with the pandas Series, since a DataFrame is made out of Series; retrieving a single row or column from a DataFrame results in a Series. A Series object is a one-dimensional labeled array, built on top of a numpy ndarray, that holds data much like a list. We create a Series object using its constructor pd.Series(), which can be called with a list that you want to convert into a pandas Series. Unlike a numpy array created with default settings, a Series built from a mixed list keeps the original type of each element.
First we create a list containing elements of various types. Then we construct a Series and a numpy.array from our list. Finally, we compare the type of each element in my_series and my_nparray.
my_list = ['begin', 2, 3/4, "end"]
my_series = pd.Series(data=my_list)
my_nparray = np.array(my_list)
# Compare the element types stored by the Series and by the numpy array
for i in range(len(my_list)):
    print('----type of each element----')
    print(f'my_series element #{i} => {type(my_series[i])}')
    print(f'my_nparray element #{i} => {type(my_nparray[i])}\n')
----type of each element----
my_series element #0 => <class 'str'>
my_nparray element #0 => <class 'numpy.str_'>
----type of each element----
my_series element #1 => <class 'int'>
my_nparray element #1 => <class 'numpy.str_'>
----type of each element----
my_series element #2 => <class 'float'>
my_nparray element #2 => <class 'numpy.str_'>
----type of each element----
my_series element #3 => <class 'str'>
my_nparray element #3 => <class 'numpy.str_'>
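As expected, the numpy array converted all elements to a single type; in this case, strings. As mentioned in Section 5.1 of Notebook 5, a numpy array stores all of its elements under one dtype, so by default it cannot hold data of different types. Note that a pandas Series is, by default, printed more elaborately: each value appears next to its index, followed by the dtype.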
print(my_series)
print('-----------------')
print(my_nparray)
0 begin
1 2
2 0.75
3 end
dtype: object
-----------------
['begin' '2' '0.75' 'end']
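The difference can also be checked directly through the dtype attribute of each object. The following is a minimal sketch that rebuilds the objects from above; the name my_object_array is introduced here only for illustration and shows that numpy keeps the original Python objects when dtype=object is requested explicitly.

```python
import numpy as np
import pandas as pd

my_list = ['begin', 2, 3/4, "end"]
my_series = pd.Series(data=my_list)
my_nparray = np.array(my_list)

# The Series keeps the original Python objects, so it reports the generic object dtype.
print(my_series.dtype)

# numpy chose one Unicode string dtype for every element; kind 'U' means Unicode string.
print(my_nparray.dtype.kind)

# Hypothetical extra array: with dtype=object, numpy also keeps the original objects.
my_object_array = np.array(my_list, dtype=object)
print([type(x).__name__ for x in my_object_array])
```

An object array gives up most of the speed advantages of numpy, which is one reason the default behavior is to convert everything to a common type.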
The values of a Series can be accessed and sliced by position using the iloc indexer, which takes square brackets rather than parentheses:
my_series.iloc[1:]
1 2
2 0.75
3 end
dtype: object
my_series.iloc[[2,len(my_series)-1]]
2 0.75
3 end
dtype: object
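As a rough summary of the positional forms that iloc accepts, the sketch below rebuilds the same Series and tries a single position, a negative position, a slice, and a list of positions. The variable names first, last, middle, and picked are illustrative only.

```python
import pandas as pd

my_series = pd.Series(['begin', 2, 3/4, 'end'])

first = my_series.iloc[0]        # single position: returns the stored value itself
last = my_series.iloc[-1]        # negative positions count from the end
middle = my_series.iloc[1:3]     # slice: returns a new Series, stop position excluded
picked = my_series.iloc[[0, 3]]  # list of positions: returns a new Series

print(first, last)
print(middle.tolist())
print(picked.tolist())
```

A single position returns the element, while a slice or a list always returns a Series, even when it selects only one element.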
2.1.1. Labeling Series
So far we have referred to values within a list or array by their integer position, but that can be confusing. With a pandas Series, you can refer to your values by labeling their indices. Labels allow you to access the values in a more informative way, similar to dictionaries, which are described in Section 2.3 of Notebook 2.
We create a list of index labels with the same length as the data list and pass it to the index option of the Series constructor. Note that our entries can then be accessed both ways, by position or by label:
my_index_labels = ["My first entry", "1","2","END"]
my_labeled_Series = pd.Series(data=my_list, index=my_index_labels)
# Positional access goes through iloc; plain integer keys on a labeled Series are deprecated
print(my_labeled_Series.iloc[0] == my_labeled_Series["My first entry"])
True
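To keep positional and label-based access clearly apart, pandas also provides the loc indexer for labels alongside iloc for positions. The sketch below rebuilds the labeled Series from above; by_label and by_position are illustrative names.

```python
import pandas as pd

my_list = ['begin', 2, 3/4, "end"]
my_index_labels = ["My first entry", "1", "2", "END"]
my_labeled_Series = pd.Series(data=my_list, index=my_index_labels)

by_label = my_labeled_Series.loc["My first entry"]  # explicit label lookup
by_position = my_labeled_Series.iloc[0]             # explicit positional lookup
print(by_label == by_position)

# The label "1" is a string, not a position; it names the second entry.
print(my_labeled_Series.loc["1"])

# Label slices include both endpoints, unlike positional slices.
print(my_labeled_Series.loc["1":"END"].tolist())
```

Using loc and iloc explicitly avoids any ambiguity when labels look like numbers, as the labels "1" and "2" do here.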
pandas can automatically create index labels if we construct a Series from a dictionary with labeled entries.
my_dictionary = {"a list": [420, 10], "a float": 380/3,
                 "a list of strings": ["first word", "Second Word", "3rd w0rd"]}
my_Series = pd.Series(my_dictionary)
print(my_Series)
a list [420, 10]
a float 126.666667
a list of strings [first word, Second Word, 3rd w0rd]
dtype: object
We can access an element within the list labeled "a list of strings" by using its label followed by the desired index:
my_Series["a list of strings"][1]
'Second Word'
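The dictionary keys end up in the index attribute of the Series, and the chained lookup above works because the first step returns an ordinary Python list. A small sketch, assuming the same dictionary as above (the name words is illustrative):

```python
import pandas as pd

my_dictionary = {"a list": [420, 10], "a float": 380/3,
                 "a list of strings": ["first word", "Second Word", "3rd w0rd"]}
my_Series = pd.Series(my_dictionary)

# The dictionary keys became the index labels, in insertion order.
print(list(my_Series.index))

# Membership tests on a Series check the labels, not the values.
print("a float" in my_Series)

# The label lookup returns the stored list; the second index is plain list indexing.
words = my_Series["a list of strings"]
print(type(words).__name__, words[1])
```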
Warning
When using pandas, it’s a good idea to avoid for loops or other iterative solutions; pandas usually has a faster built-in solution than iterating through its elements.
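As an illustration of what avoiding loops can look like, the sketch below doubles every value of a numeric Series twice: once with an explicit for loop and once with a single vectorized expression. The Series measurements and its values are made up for this example, and no timing is claimed here; the point is only that both approaches give the same result.

```python
import pandas as pd

measurements = pd.Series([10.0, 12.5, 8.0, 20.0])  # hypothetical data

# Iterative version: visit each element in a Python loop.
loop_result = []
for value in measurements:
    loop_result.append(value * 2)
loop_result = pd.Series(loop_result)

# Vectorized version: one expression applied to the whole Series.
vectorized_result = measurements * 2
print(vectorized_result.equals(loop_result))

# Filtering and aggregating also work without a loop.
total_above_nine = measurements[measurements > 9].sum()
print(total_above_nine)
```

The vectorized forms are shorter and hand the element-wise work to pandas instead of the Python interpreter.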