2.4. Statistics with pandas

Recall functions such as np.mean() and np.max(); pandas offers the same operations as methods, which can be used to calculate the statistics of a row or a column. Say you want to know the average hardness of the different minerals:

df4['hardness'].mean()
4.666666666666667
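Other statistics work the same way as mean(). The sketch below is illustrative only: it rebuilds a stand-in for df4 from the table printed later in this section (the real df4 was created earlier in the notebook), and the variable names hardest, softest, and hardest_name are just assumptions for the example.

```python
import pandas as pd

# Stand-in for df4, rebuilt from the values printed in this section.
names = ['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
         'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz', 'Sillimanite']
df4 = pd.DataFrame(
    {
        'hardness': [5.50, 2.75, 3.00, 3.00, 6.00, 7.00,
                     1.50, 6.00, 2.25, 5.50, 7.00, 6.50],
        'sp. gr.': [2.800, 3.000, 2.720, 2.850, 2.645, 3.900,
                    2.300, 4.010, 2.930, 3.325, 2.650, 3.230],
        'cleavage': ['Two', 'One', 'Three', 'Three', 'Two', 'Fracture',
                     'One', 'One', 'One', 'Two', 'Fracture', 'One'],
    },
    index=pd.Index(names, name='name'),
)

hardest = df4['hardness'].max()          # largest value in the column
softest = df4['hardness'].min()          # smallest value in the column
hardest_name = df4['hardness'].idxmax()  # label of the first row holding the maximum

# Column-wise means of the numeric columns only
numeric_means = df4[['hardness', 'sp. gr.']].mean()

print(hardest, softest, hardest_name)
print(numeric_means)
```

Selecting the numeric columns explicitly avoids asking pandas to average the text column cleavage.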

Often we don’t know much about the data, and printing all the values is inconvenient. In that case, it’s wise to take a look at some of its attributes first.

First, look at the labels of the columns and rows:

print(df4.columns)
print('----------------------')
print(df4.index)
Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')
----------------------
Index(['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
       'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz',
       'Sillimanite'],
      dtype='object', name='name')
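A few more attributes are often worth checking before printing everything. The following is a minimal sketch on a three-row subset of the mineral table; the name df_small is hypothetical and only used here.

```python
import pandas as pd

# Small subset of the mineral table, used only for illustration.
df_small = pd.DataFrame(
    {
        'hardness': [5.50, 2.75, 3.00],
        'sp. gr.': [2.800, 3.000, 2.720],
        'cleavage': ['Two', 'One', 'Three'],
    },
    index=pd.Index(['Amphibole', 'Biotite', 'Calcite'], name='name'),
)

print(df_small.shape)    # (number of rows, number of columns)
print(df_small.dtypes)   # data type of each column
print(df_small.head(2))  # first two rows only
```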

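Evaluating df4.info without parentheses gives the same result as print(df4.info): because the method is referenced rather than called, pandas shows the bound method's representation, which includes the whole DataFrame. To get the concise summary instead, call df4.info().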

df4.info
<bound method DataFrame.info of              hardness  sp. gr.  cleavage
name                                    
Amphibole        5.50    2.800       Two
Biotite          2.75    3.000       One
Calcite          3.00    2.720     Three
Dolomite         3.00    2.850     Three
Feldspars        6.00    2.645       Two
Garnet           7.00    3.900  Fracture
Graphite         1.50    2.300       One
Kyanite          6.00    4.010       One
Muscovite        2.25    2.930       One
Pyroxene         5.50    3.325       Two
Quartz           7.00    2.650  Fracture
Sillimanite      6.50    3.230       One>
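For comparison, here is a minimal sketch of calling the methods with parentheses, again on a hypothetical three-row subset named df_small. info() prints a summary of the index, columns, non-null counts, and dtypes, while describe() returns basic statistics for the numeric columns.

```python
import pandas as pd

# Small subset of the mineral table, used only for illustration.
df_small = pd.DataFrame(
    {
        'hardness': [5.50, 2.75, 3.00],
        'sp. gr.': [2.800, 3.000, 2.720],
        'cleavage': ['Two', 'One', 'Three'],
    },
    index=pd.Index(['Amphibole', 'Biotite', 'Calcite'], name='name'),
)

# Called with parentheses: prints the summary and returns None.
df_small.info()

# count, mean, std, min, quartiles, and max of each numeric column
summary = df_small.describe()
print(summary)
```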

2.4.1. Deep copying a DataFrame

As you have seen in Notebook 4, shallow copies can be troublesome if you're not aware of them. In pandas, it's the same story.

To make a deep copy, use the DataFrame.copy(deep=True) method (deep=True is also the default).

df_deep = df4.copy(deep=True)

Now, altering df_deep will not alter df4, and vice versa.
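The difference can be checked directly. This sketch uses a two-row stand-in for df4 and contrasts the deep copy with plain assignment, which only creates a second name for the same object; the name df_alias is an assumption made for the example.

```python
import pandas as pd

# Two-row stand-in for df4, used only for illustration.
df4 = pd.DataFrame(
    {'hardness': [7.00, 1.50]},
    index=pd.Index(['Quartz', 'Graphite'], name='name'),
)

df_alias = df4                 # no copy: a second name for the same DataFrame
df_deep = df4.copy(deep=True)  # independent copy of data and index

# Changing the deep copy leaves the original untouched.
df_deep.loc['Quartz', 'hardness'] = 0.0
print(df4.loc['Quartz', 'hardness'])    # 7.0

# Changing the alias changes the original, because both names refer to one object.
df_alias.loc['Graphite', 'hardness'] = 9.9
print(df4.loc['Graphite', 'hardness'])  # 9.9
```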

2.5. Additional study material:

After this Notebook you should be able to:

  • understand Series and DataFrames

  • concatenate DataFrames

  • work with different labels of a DataFrame

  • drop unwanted rows and columns

  • access and modify values within your DataFrame

  • import data into a pandas DataFrame

  • manipulate a DataFrame in several important ways