2.4. Statistics with pandas
Recall functions such as np.mean() and np.max(); pandas provides equivalent methods that calculate the statistics of a row or column. Say you want to know the average hardness of the different minerals:
df4['hardness'].mean()
4.666666666666667
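The calculation above can be reproduced with the minimal sketch below. It rebuilds the numeric columns of df4 from the values printed later in this section; how df4 was originally created is not shown here, so the constructor is an assumption. The sketch also shows a few related methods (max, idxmax, describe) that work in the same way as mean.

```python
import pandas as pd

# Values copied from the df4 printout in this section; the way df4 was
# originally built is not shown, so this constructor is an assumption.
df4 = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0, 3.0, 6.0, 7.0, 1.5, 6.0, 2.25, 5.5, 7.0, 6.5],
        "sp. gr.": [2.8, 3.0, 2.72, 2.85, 2.645, 3.9, 2.3, 4.01, 2.93, 3.325, 2.65, 3.23],
    },
    index=pd.Index(
        ["Amphibole", "Biotite", "Calcite", "Dolomite", "Feldspars", "Garnet",
         "Graphite", "Kyanite", "Muscovite", "Pyroxene", "Quartz", "Sillimanite"],
        name="name",
    ),
)

mean_hardness = df4["hardness"].mean()  # average of a single column
max_hardness = df4["hardness"].max()    # largest value in that column
hardest = df4["hardness"].idxmax()      # row label of the first maximum
column_means = df4.mean()               # one mean per numeric column
summary = df4.describe()                # count, mean, std, min, quartiles, max

print(mean_hardness)
print(hardest)
print(summary)
```

Calling a statistic on a single column returns one number, while calling it on the whole DataFrame returns one value per column.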
Often we don’t know much about the data, and printing all the values is inconvenient. In that case, it’s wise to take a look at some of its attributes first.
Start with the labels of the columns and rows:
print(df4.columns)
print('----------------------')
print(df4.index)
Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')
----------------------
Index(['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz',
'Sillimanite'],
dtype='object', name='name')
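Besides the labels, a few other attributes give a quick overview without printing every value. The sketch below uses df_small, a hypothetical three-row subset of the minerals shown in this section, so that it runs on its own.

```python
import pandas as pd

# Hypothetical three-row subset of df4, with values taken from the
# printout in this section.
df_small = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0],
        "sp. gr.": [2.8, 3.0, 2.72],
        "cleavage": ["Two", "One", "Three"],
    },
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

print(df_small.shape)    # (number of rows, number of columns)
print(df_small.dtypes)   # data type of each column
print(df_small.head(2))  # only the first two rows
```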
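In a notebook, evaluating df4.info displays the same text as print(df4.info). Note the missing parentheses: the method is not called, so the output is the representation of the bound method, which happens to include the whole DataFrame. Calling df4.info() instead prints a concise summary of the index, columns, data types and memory usage.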
df4.info
<bound method DataFrame.info of              hardness  sp. gr.  cleavage
name
Amphibole        5.50    2.800       Two
Biotite          2.75    3.000       One
Calcite          3.00    2.720     Three
Dolomite         3.00    2.850     Three
Feldspars        6.00    2.645       Two
Garnet           7.00    3.900  Fracture
Graphite         1.50    2.300       One
Kyanite          6.00    4.010       One
Muscovite        2.25    2.930       One
Pyroxene         5.50    3.325       Two
Quartz           7.00    2.650  Fracture
Sillimanite      6.50    3.230       One>
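The difference between referring to the info method and calling it can be seen in the sketch below. It again uses a hypothetical three-row subset named df_small, and captures the summary in a text buffer (the buf parameter of DataFrame.info) so that it can be inspected as a string.

```python
import io

import pandas as pd

# Hypothetical three-row subset of df4, with values taken from the
# printout in this section.
df_small = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0],
        "sp. gr.": [2.8, 3.0, 2.72],
        "cleavage": ["Two", "One", "Three"],
    },
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

# Without parentheses, df_small.info is only a reference to the method.
is_method = callable(df_small.info)
print(is_method)

# With parentheses the method runs; buf redirects the summary into a
# string buffer instead of writing it straight to the screen.
buffer = io.StringIO()
df_small.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```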
2.4.1. Deep copying a DataFrame
As you saw in Notebook 4, shallow copies can cause trouble if you are not aware of them. The same is true in pandas.
To make a deep copy, use the DataFrame.copy(deep=True) method.
df_deep = df4.copy(deep=True)
Now, altering df_deep will not alter df4, and vice versa.
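A minimal sketch of this independence, using a hypothetical three-row subset named df_small rather than the full df4. For contrast it also shows plain assignment, which creates only a second name for the same object and therefore does not protect the original.

```python
import pandas as pd

# Hypothetical three-row subset of df4, with values taken from the
# printout in this section.
df_small = pd.DataFrame(
    {"hardness": [5.5, 2.75, 3.0], "sp. gr.": [2.8, 3.0, 2.72]},
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

df_alias = df_small                 # plain assignment: same object, new name
df_deep = df_small.copy(deep=True)  # independent copy of the data and index

df_deep.loc["Calcite", "hardness"] = 0.0   # changes the deep copy only
df_alias.loc["Biotite", "hardness"] = 9.9  # changes df_small as well

print(df_small)
print(df_deep)
```

After running this, df_small still holds the original hardness for Calcite, but its Biotite value has changed because df_alias and df_small are the same object.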
2.5. Additional study material:
After this Notebook you should be able to:
- understand Series and DataFrames
- concatenate DataFrames
- work with different labels of a DataFrame
- drop unwanted rows and columns
- access and modify values within your DataFrame
- import data into a pandas DataFrame
- manipulate a DataFrame in several important ways