## Statistics with <code>pandas</code>

Recall NumPy functions such as <code>np.mean()</code> and <code>np.max()</code>; in <code>pandas</code> the same statistics are available as methods that you can call on a row or a column. Say you want to know the average <i>hardness</i> of the different minerals:

In [None]:
df4['hardness'].mean()

4.666666666666667
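
Other statistics follow the same pattern. As a minimal sketch, the cell below rebuilds only the two numeric columns of the mineral table under the made-up name `minerals` (so `df4` itself is left untouched) and computes a few column-wise summaries.

```python
import pandas as pd

# Hypothetical stand-in for the numeric columns of df4;
# the values are copied from the mineral table in this notebook.
minerals = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0, 3.0, 6.0, 7.0, 1.5, 6.0, 2.25, 5.5, 7.0, 6.5],
        "sp. gr.": [2.8, 3.0, 2.72, 2.85, 2.645, 3.9, 2.3, 4.01, 2.93, 3.325, 2.65, 3.23],
    },
    index=pd.Index(
        ["Amphibole", "Biotite", "Calcite", "Dolomite", "Feldspars", "Garnet",
         "Graphite", "Kyanite", "Muscovite", "Pyroxene", "Quartz", "Sillimanite"],
        name="name",
    ),
)

col_means = minerals.mean()                # mean of every column, returned as a Series
max_hardness = minerals["hardness"].max()  # largest value in one column
hardest = minerals["hardness"].idxmax()    # row label of the first maximum
summary = minerals.describe()              # count, mean, std, min, quartiles, max

print(col_means)
print(max_hardness, hardest)
print(summary)
```

Note that `idxmax()` returns only the first label when several rows share the maximum, as Garnet and Quartz do here.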

Often we don't know much about the data, and printing all the values is inconvenient. In that case, it's wise to take a look at some of its attributes first.

For example, inspect the labels of the columns and rows:

In [None]:
print(df4.columns)
print('----------------------')
print(df4.index)

Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')
----------------------
Index(['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
       'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz',
       'Sillimanite'],
      dtype='object', name='name')
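
A few more attributes are handy for a first look. The sketch below uses a small, made-up three-row frame called `sample` (not the notebook's `df4`) to show `shape`, `dtypes` and `head()`.

```python
import pandas as pd

# Hypothetical three-row excerpt, built only for this illustration
sample = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0],
        "sp. gr.": [2.8, 3.0, 2.72],
        "cleavage": ["Two", "One", "Three"],
    },
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

n_rows, n_cols = sample.shape  # (number of rows, number of columns)
print(n_rows, n_cols)
print(sample.dtypes)           # data type of each column
print(sample.head(2))          # first two rows only
```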


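Evaluating `df4.info` without parentheses does not call the method: the notebook shows the bound method's representation, which includes a printout of the whole table, just as `print(df4.info)` would. To get the concise summary (column data types, non-null counts, memory usage), call `df4.info()` with parentheses.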

In [None]:
df4.info

<bound method DataFrame.info of              hardness  sp. gr.  cleavage
name                                    
Amphibole        5.50    2.800       Two
Biotite          2.75    3.000       One
Calcite          3.00    2.720     Three
Dolomite         3.00    2.850     Three
Feldspars        6.00    2.645       Two
Garnet           7.00    3.900  Fracture
Graphite         1.50    2.300       One
Kyanite          6.00    4.010       One
Muscovite        2.25    2.930       One
Pyroxene         5.50    3.325       Two
Quartz           7.00    2.650  Fracture
Sillimanite      6.50    3.230       One>
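
For comparison, calling the method with parentheses prints a compact summary rather than the table itself. The sketch below uses a made-up three-row frame named `sample` in place of `df4`; the `buf` argument is only there to show how the text can be captured instead of printed.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt, built only for this illustration
sample = pd.DataFrame(
    {
        "hardness": [5.5, 2.75, 3.0],
        "sp. gr.": [2.8, 3.0, 2.72],
        "cleavage": ["Two", "One", "Three"],
    },
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

# Prints the summary to standard output and returns None
sample.info()

# Pass a text buffer to capture the summary as a string instead
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
```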

### Deep copying a <code>DataFrame</code>

As you have seen in Notebook 4, shallow copies can be troublesome if you're not aware of them. In <b><code>pandas</code></b>, it's the same story.<br><br>To make a deep copy, use the <b><code>DataFrame.copy(deep=True)</code></b> method. Since <code>deep=True</code> is the default, <code>df4.copy()</code> does the same thing.

In [None]:
df_deep = df4.copy(deep=True)

Now, altering <b><code>df_deep</code></b> will not alter <b><code>df4</code></b>, and vice versa.
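
A quick way to convince yourself is to change one value on each side and check the other. The sketch below does this with a small, made-up frame called `original` instead of `df4`. How `copy(deep=False)` behaves depends on the pandas version and its Copy-on-Write setting, so only the deep copy is demonstrated here.

```python
import pandas as pd

# Hypothetical two-row frame, built only for this illustration
original = pd.DataFrame(
    {"hardness": [5.5, 2.75]},
    index=pd.Index(["Amphibole", "Biotite"], name="name"),
)

deep = original.copy(deep=True)

# Change the copy: the original keeps its value
deep.loc["Amphibole", "hardness"] = 0.0
print(original.loc["Amphibole", "hardness"])  # 5.5

# Change the original: the copy keeps its value
original.loc["Biotite", "hardness"] = 10.0
print(deep.loc["Biotite", "hardness"])  # 2.75
```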

## Additional study material:

* [Official pandas Documentation (Series)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)
* [Official pandas Documentation (DataFrame)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)
* [Note on processing speed](https://ggbaker.ca/732/content/pandas-speed.html)
* [Real Python](https://realpython.com/pandas-python-explore-dataset/)


<h4>After this Notebook you should be able to:</h4>


- understand <b><code>Series</code></b> and <b><code>DataFrames</code></b>
- concatenate <b><code>DataFrames</code></b>
- work with different labels of a <b><code>DataFrame</code></b>
- drop unwanted rows and columns
- access and modify values within your <b><code>DataFrame</code></b>
- import data into a <b><code>pandas DataFrame</code></b>
- manipulate a <b><code>DataFrame</code></b> in several important ways