## Importing data into <code>DataFrames</code> and exploring their attributes

<b><code>pandas</code></b> provides many functions to import data into <b><code>DataFrames</code></b>, such as <b><code>read_csv()</code></b> to read delimited text files, or <b><code>read_excel()</code></b> for Excel or OpenDocument spreadsheets. <b><code>read_csv()</code></b> provides options that control how the file is parsed, such as the separator/delimiter, the line that holds the column headers, which rows to skip, etc. Let's analyze the file <b><code>mineral_properties.txt</code></b>. Below is a screenshot of it:<br><br>
    
```{image} 01.png
:alt: rectangle
:width: 400px
:align: center
```

Below we import the `.txt` file:
* we indicate that the separator is the comma: `sep=','`
* we indicate that the header (the column names) is in the second line of the file: `header=[1]` (line numbers are zero-based)
* we indicate that no rows should be skipped: `skiprows=None` (this is the default)
* we indicate that the first column should be used as the row index: `index_col=0`


In [66]:
file_location = "mineral_properties.txt"
df4 = pd.read_csv(file_location, sep=',', header=[1],
                  skiprows=None, index_col=0)
df4

name,hardness,sp. gr.,cleavage
Amphibole,5.5,2.8,Two
Biotite,2.75,3.0,One
Calcite,3.0,2.72,Three
Dolomite,3.0,2.85,Three
Feldspars,6.0,2.645,Two
Garnet,7.0,3.9,Fracture
Graphite,1.5,2.3,One
Kyanite,6.0,4.01,One
Muscovite,2.25,2.93,One
Pyroxene,5.5,3.325,Two
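
The effect of `header` and `index_col` can also be checked without the file on disk. The following is a minimal sketch that assumes a small in-memory sample laid out like the file described above (a title line, then the header line, then the data); the sample text and the names `sample_text` and `df_sample` are illustrative assumptions and are not taken from `mineral_properties.txt`.

```python
import io

import pandas as pd

# Hypothetical sample: line 0 is a title, line 1 holds the column names.
# This text is an assumption for illustration, not the real file contents.
sample_text = (
    "Mineral properties\n"
    "name, hardness, sp. gr., cleavage\n"
    "Amphibole, 5.5, 2.8, Two\n"
    "Biotite, 2.75, 3.0, One\n"
)

# header=1 takes the column names from the second line (zero-based index 1);
# index_col=0 turns the first column into the row index.
df_sample = pd.read_csv(io.StringIO(sample_text), sep=',', header=1, index_col=0)

print(df_sample.index.name)      # name of the column used as the index
print(df_sample.index.tolist())  # row labels
print(df_sample.shape)           # (number of rows, number of columns)
```

Because the first column became the index, it no longer counts as a data column, so the shape reports three columns rather than four.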


Note that if we try to access any of the columns of `df4` by name, we will get an error.

In [67]:
df4['hardness']

KeyError: 'hardness'

Do you know why?

<b>Answer</b> ...

In case you were not able to answer the above question, let's look at <b><code>df4.columns</code></b>:

In [None]:
df4.columns

Index([' hardness', ' sp. gr.', ' cleavage'], dtype='object')

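As you can see, there is a space at the beginning of each column name. This happens because the file was typed with each comma followed by a space, and <b><code>read_csv()</code></b> keeps that space as part of the field. That is why <b><code>df4['hardness']</code></b> fails: the column is actually named <b><code>' hardness'</code></b>. We can pass <b><code>skipinitialspace=True</code></b> to <b><code>pd.read_csv()</code></b> to avoid this. Let's try it out: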

In [None]:
df4 = pd.read_csv(file_location, sep=',', header=[1],
                  skiprows=None, index_col=0, skipinitialspace=True)
print(df4.columns)

Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')


Ok, much better!
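
If a table has already been loaded with the stray spaces in place, the column labels can also be cleaned afterwards. The sketch below shows one alternative that is not part of the workflow above: it applies the pandas string accessor `str.strip()` to the column index of a small in-memory sample. The sample text and the name `df_demo` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample with a space after every comma (not the real file).
sample_text = (
    "name, hardness, sp. gr., cleavage\n"
    "Calcite, 3.0, 2.72, Three\n"
    "Garnet, 7.0, 3.9, Fracture\n"
)

df_demo = pd.read_csv(io.StringIO(sample_text), sep=',', index_col=0)
print(df_demo.columns.tolist())  # labels still carry the leading space

# Strip leading and trailing whitespace from every column label.
df_demo.columns = df_demo.columns.str.strip()
print(df_demo.columns.tolist())

# The column can now be selected by its clean name.
print(df_demo['hardness'])
```

Note that this only cleans the column labels. Text values, such as the entries of `cleavage`, keep their leading space, whereas `skipinitialspace=True` removes it at read time.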