5. Numpy

5.1. Introduction

The numpy package is one of the main packages for working with arrays and matrices in Python, making it indispensable for processing and visualizing scientific data. It offers a large assortment of routines for mathematical and logical operations, linear algebra, Fourier transforms, and much more. In this section, you will learn some of the most used numpy functions to work with multidimensional array objects.

As always, let’s import the package we will use:

import numpy as np

The following functions will be discussed in this Notebook:

  • np.array()

  • np.zeros()

  • np.asarray()

  • np.shape()

  • np.min()

  • np.max()

  • np.mean()

  • np.sort()

  • np.linspace()

  • np.arange()

  • np.argmax()

  • np.argmin()

  • np.where()

  • ndarray.astype()

  • np.dot()

  • np.transpose()

  • np.loadtxt()

  • np.sum()

  • np.cos()

  • np.sin()

  • np.sqrt()

In Section 2.3 of Notebook 2, you have already encountered lists, which are created with square brackets []. Arrays are the numpy equivalent of lists, with a few characteristic traits:

- numpy arrays can only store one type of element,
- numpy arrays typically take up much less memory than lists holding the same numbers,
- element-wise operations on numpy arrays typically run much faster than equivalent loops over lists,
- it is easier to work with multi-dimensional numpy arrays than with multi-dimensional lists
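The first two traits can be checked with a minimal sketch. The variable names and the array size are illustrative and not part of the original notebook; the exact size of the list depends on the Python build, so only the comparison is printed.

```python
import sys
import numpy as np

# Mixed input is converted to one common type (here: float).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Rough memory comparison for one million integers (illustrative only).
n = 1_000_000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int64)

# Size of the list object plus the integer objects it refers to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
array_bytes = as_array.nbytes   # 8 bytes per int64 element

print(list_bytes > array_bytes)   # True
```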

5.2 One-Dimensional arrays

np.array(), np.asarray()

So, how do you create a numpy 1-Dimensional (1D) array? There are a few ways to do it…

  • Option 1 - from scratch with np.array() similar to a list.

arr1 = np.array([1,2,3])
print('arr1 = {}, its type is {}'.format(arr1,type(arr1)))
arr1 = [1 2 3], its type is <class 'numpy.ndarray'>
  • Option 2 - from an existing list with np.array().

Create the list first and check its type. Then create the array A_1 from the list L_1 and check its type.

L_1 = [1,2,3,5,7,11,13]
print('L_1 = {} and its type is {}\n'.format(L_1,type(L_1)))

A_1 = np.array(L_1)
print('A_1 = {} and its type is {}'.format(A_1, type(A_1)))
L_1 = [1, 2, 3, 5, 7, 11, 13] and its type is <class 'list'>

A_1 = [ 1  2  3  5  7 11 13] and its type is <class 'numpy.ndarray'>
  • Option 3 - from an existing list with np.asarray()

L_1 = [1,2,3,5,7,11,13]
print('L_1 = {} and its type is {}\n'.format(L_1,type(L_1)))

A_1 = np.asarray(L_1)
print('A_1 = {} and its type is {}'.format(A_1, type(A_1)))
L_1 = [1, 2, 3, 5, 7, 11, 13] and its type is <class 'list'>

A_1 = [ 1  2  3  5  7 11 13] and its type is <class 'numpy.ndarray'>

From the above examples, you can’t really determine the difference between using np.array() or np.asarray(). Nonetheless, there is a very important one, similar to the = and copy conundrum discussed in Notebook 4. When generating an array from a list, both functions do pretty much the same. However, when generating an array from another array, their differences stand out.

First, let’s check the ID of arr1

print('arr1 ID is {}'.format(id(arr1)))
arr1 ID is 3116429230896

Now, let’s make two new arrays from arr1, using both functions

arr_array = np.array(arr1)
arr_asarray = np.asarray(arr1)

print('arr_array = {} and its ID is {}\n'.format(arr_array,id(arr_array)))
print('arr_asarray = {} and its ID is {}'.format(arr_asarray, id(arr_asarray)))
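arr_array = [1 2 3] and its ID is 2342009211568

arr_asarray = [1 2 3] and its ID is 2342276386128

When you run these cells in a single session, the ID printed for arr_asarray is identical to the ID of arr1, while arr_array gets a new ID. (The ID values shown here appear to come from different runs, so the numbers themselves do not match.) This means arr_asarray and arr1 are the same object! Altering one will alter the other as well. Let’s try it out.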

arr1[0] = 'hello'
print(arr_asarray)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 arr1[0] = 'hello'
      2 print(arr_asarray)

ValueError: invalid literal for int() with base 10: 'hello'

Oops… it didn’t work. Why do you think it didn’t work?

Answer: …

Change the first element of arr1. Then print arr_array and arr_asarray to see if the first element changed.

arr1[0] = 100

print('arr1 = {}\n'.format(arr1))
print('arr_array = {}\n'.format(arr_array))
print('arr_asarray = {}'.format(arr_asarray))
arr1 = [100   2   3]

arr_array = [1 2 3]

arr_asarray = [100   2   3]

Yep, our theory was right: arr1 and arr_asarray are indeed the same (but arr_array is not!). Therefore, altering arr1[0] will alter arr_asarray[0] in the same way.

Final check that they are indeed the same

print(arr1 is arr_asarray)
True
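The difference can also be checked without comparing ID numbers by eye. The following is a small sketch with illustrative variable names; it assumes the default arguments of both functions.

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)    # makes a new array by default
same = np.asarray(original)    # returns the input itself when no conversion is needed

print(copied is original)                  # False
print(same is original)                    # True
print(np.shares_memory(same, original))    # True

# Requesting a different dtype forces np.asarray() to convert, hence to copy.
converted = np.asarray(original, dtype=float)
print(converted is original)               # False
```

So np.asarray() only avoids the copy when the input is already an array of the requested type.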

np.zeros()

In case you already know the size of the array you will need, it is common to initialize it with zeros first using the np.zeros() function, as shown below.

Set a limit when printing the huge arrays we will generate.

np.set_printoptions(threshold=1) 

I know I will need an array with 100000 elements, so I create it full of zeros first. Then, I assign the values I need to each element. In this example, a for loop assigns a random integer between 0 and 9 to each element. Note the use of range(len(my_arr)); we often use this to make the for loop run over every index of an array.

my_arr = np.zeros(100000)
print('my_arr with a bunch of zeros \n{}\n'.format(my_arr))
print('#######################')

import random

for i in range(len(my_arr)): 
    my_arr[i] = random.randint(0,9)
    
print('\nmy_arr with random numbers \n{}'.format(my_arr))
my_arr with a bunch of zeros 
[0. 0. 0. ... 0. 0. 0.]

#######################

my_arr with random numbers 
[1. 0. 9. ... 7. 2. 4.]

Note that these arrays still have \(100000\) elements; the np.set_printoptions(threshold=1) call above only shortens what print displays, otherwise you would have to scroll a lot. :P
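The loop above is useful for practising indexing, but numpy can usually fill an array in one call. A minimal sketch, assuming numpy's random Generator interface; the name my_arr_fast is illustrative. Unlike the loop version, the result holds integers rather than floats.

```python
import numpy as np

n = 100000
rng = np.random.default_rng()

# integers() excludes the upper bound by default, so 10 gives values 0..9.
my_arr_fast = rng.integers(0, 10, size=n)

print(my_arr_fast.shape)                                 # (100000,)
print(my_arr_fast.min() >= 0, my_arr_fast.max() <= 9)    # True True
```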

np.min(), np.max() and np.mean()

Numpy also provides various functions to help you process your data. You can, for instance, find the minimum value of an array, or its mean. Your task is to find the minimum, maximum, and mean values of an array.

Find the minimum, maximum and mean values of A_1 and print the results.

A_1_min = np.min(A_1)
A_1_max = np.max(A_1)
A_1_mean = np.mean(A_1)

print(f'The minimum value of A_1 is {A_1_min} \n')
print(f'The maximum value of A_1 is {A_1_max} \n')
print(f'The mean value of A_1 is {A_1_mean} \n')
The minimum value of A_1 is 1 

The maximum value of A_1 is 13 

The mean value of A_1 is 6.0 

np.arange()

Another useful function of the numpy module is np.arange(). First, let’s see in the documentation what it does.

It reads:

arange([start,] stop[, step,], dtype=None, *, like=None)

Return evenly spaced values within a given interval.


To make a number range you need to choose:
1) the starting point,
2) the endpoint,
3) the interval between each point

The reason why it reads [start,] stop[, step,] with square brackets, is that the start and the step can be omitted. If not specified, start = 0 and step = 1, by default.

Warning

Your endpoint is not included in the array. If you want to include the endpoint in the array, you have to specify the stop to be endpoint + step. This will be clearer in the following examples.

omitted start and step

arr = np.arange(5) 
print('arr =', arr)
arr = [0 1 2 3 4]

As mentioned, the endpoint (5) is omitted. If you would like to include it:

arr = np.arange(5 + 1)
print('arr =', arr)
arr = [0 1 2 3 4 5]

Now, without omitting start or step. Without endpoint:

arr = np.arange(1, 2, 0.01)
print('arr =', arr)
arr = [1.   1.01 1.02 ... 1.97 1.98 1.99]

Including endpoint

arr = np.arange(1, 2 + 0.01, 0.01) 
print('arr =', arr)
arr = [1.   1.01 1.02 ... 1.98 1.99 2.  ]

You can also generate a descending array, by using negative steps

arr = np.arange(10,0,-1)
print('arr =', arr)
arr = [10  9  8 ...  3  2  1]
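With a non-integer step, rounding in floating-point arithmetic can make it hard to predict whether np.arange() includes the last value. When the endpoint must be included, np.linspace() is often the safer choice: you give the number of points instead of the step. A short sketch (the name arr_lin is illustrative):

```python
import numpy as np

# 101 points from 1 to 2 inclusive, i.e. a spacing of 0.01.
arr_lin = np.linspace(1, 2, 101)

print(len(arr_lin))               # 101
print(arr_lin[0], arr_lin[-1])    # 1.0 2.0
```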

np.sort()

You can also sort an array in ascending order using np.sort()

sorted_arr = np.sort(arr)
print('sorted_arr =', sorted_arr)
sorted_arr = [ 1  2  3 ...  8  9 10]
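Two details are worth knowing, shown here as a small sketch with illustrative values: np.sort() returns a sorted copy and leaves the input untouched, while the array's own .sort() method sorts in place. A descending order can be obtained by reversing the sorted result.

```python
import numpy as np

values = np.array([3, 1, 2])

ascending = np.sort(values)           # returns a sorted copy
descending = np.sort(values)[::-1]    # reverse the sorted copy

print(values)       # [3 1 2]  (unchanged)
print(ascending)    # [1 2 3]
print(descending)   # [3 2 1]

values.sort()       # the method sorts in place and returns None
print(values)       # [1 2 3]
```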

np.sum()

As the name clearly states, np.sum() returns the sum of an array. Let’s try it out:

arr = np.array([1,2,3])
my_sum = np.sum(arr)
print(f'The sum of the array is {my_sum}')
The sum of the array is 6

5.3 Two-Dimensional arrays and matrices

As mentioned earlier, numpy is not solely built for 1D arrays; it’s built to work with multidimensional arrays! So, let’s hop into 2D arrays and matrices, which you have already encountered in Linear Algebra. You can construct your own matrix using the np.array() function, as shown below.

Note

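A matrix is strictly 2D, while a numpy array can have any number of dimensions. A matrix can always be stored as a 2D array, but a 2D array is not necessarily a matrix in the Linear Algebra sense: it is simply a table of values, and operators such as * act on it element by element.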

Let’s create a matrix

my_mat1 = np.array([[2,2],[2,2]])
print('my_mat1 is \n{}\nand its type is {}'.format(my_mat1,type(my_mat1)))
my_mat1 is 
[[2 2]
 [2 2]]
and its type is <class 'numpy.ndarray'>

Let’s create a second matrix so we can do some operations with my_mat1

my_mat2 = np.array([[1,0],[2,1]])

Now let’s multiply my_mat2 x my_mat1.

Let’s also multiply my_mat1 x my_mat2 (changed order)

print('my_mat2 * my_mat1 is: \n{}'.format(my_mat2 * my_mat1))
print('\n###########\n')

print('my_mat1 * my_mat2 is: \n{}'.format(my_mat1 * my_mat2))
my_mat2 * my_mat1 is: 
[[2 0]
 [4 2]]

###########

my_mat1 * my_mat2 is: 
[[2 0]
 [4 2]]

np.dot()

Ok, this is not as expected. As you have seen in Linear Algebra, changing the order of the factors generally changes the result of a matrix product, yet both outputs above are identical. You may also check this by calculating the matrix products by hand. What actually happened is that the * operator performed an element-wise multiplication. But what if you would like to apply matrix multiplication? We will show three options for that.

  • Option 1 - np.dot(arr1 ,  arr2)

print('np.dot(my_mat2 , my_mat1) is: \n{}'.format(np.dot(my_mat2 , my_mat1)))

print('\n###########\n')

print('np.dot(my_mat1 , my_mat2) is:\n{}'.format(np.dot(my_mat1 , my_mat2)))
np.dot(my_mat2 , my_mat1) is: 
[[2 2]
 [6 6]]

###########

np.dot(my_mat1 , my_mat2) is:
[[6 2]
 [6 2]]

Check the answers with the results you got when calculating by hand! This is better, isn’t it?

  • Option 2 - arr1.dot(arr2)

print('my_mat2.dot(my_mat1) is:\n{}'.format(my_mat2.dot(my_mat1)))

print('\n###########\n')

print('my_mat1.dot(my_mat2) is: \n{}'.format(my_mat1.dot(my_mat2)))
my_mat2.dot(my_mat1) is:
[[2 2]
 [6 6]]

###########

my_mat1.dot(my_mat2) is: 
[[6 2]
 [6 2]]
  • Option 3 - arr1 @ arr2

print('my_mat2 @ my_mat1 is:\n{}'.format(my_mat2 @ my_mat1))

print('\n###########\n')

print('my_mat1 @ my_mat2 is: \n{}'.format(my_mat1 @ my_mat2))
my_mat2 @ my_mat1 is:
[[2 2]
 [6 6]]

###########

my_mat1 @ my_mat2 is: 
[[6 2]
 [6 2]]

As you see from the previous examples, there are at least three different ways to multiply 2D arrays in the same way that you would multiply matrices.
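The observations of this subsection can be condensed into a few checks. This sketch rebuilds the two matrices under the illustrative names m1 and m2 and compares results with np.array_equal().

```python
import numpy as np

m1 = np.array([[2, 2], [2, 2]])
m2 = np.array([[1, 0], [2, 1]])

elementwise = m1 * m2    # multiplies entries at matching positions
matrix_prod = m1 @ m2    # row-by-column matrix product

print(np.array_equal(m1 * m2, m2 * m1))              # True: element-wise is commutative
print(np.array_equal(m1 @ m2, m2 @ m1))              # False for these matrices
print(np.array_equal(np.dot(m1, m2), m1.dot(m2)))    # True
print(np.array_equal(np.dot(m1, m2), m1 @ m2))       # True
```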

Given the matrices \(A\), \(B\), and \(C\) below, calculate \(D = AB\ C^{-1}\)

\[\begin{split}A = \begin{bmatrix}2 & 3 \\ 4 & 5 \end{bmatrix}\end{split}\]
\[\begin{split}B = \begin{bmatrix}2 & 1 \\ 1 & 2\end{bmatrix}\end{split}\]
\[\begin{split}C = \begin{bmatrix}6 & 1 \\ 1 & 1\end{bmatrix}\end{split}\]

Let’s create the matrices A,B, and C

Mat_A = np.array([[2,3],[4,5]])
Mat_B = np.array([[2,1],[1,2]])
Mat_C = np.array([[6,1],[1,1]])

Invert matrix \(C\) using np.linalg.inv(Mat_C) and compute \(D = AB\ C^{-1}\).

Mat_C_inv = np.linalg.inv(Mat_C)
Mat_D = (Mat_A @ Mat_B) @ Mat_C_inv
print(Mat_D)
[[-0.2  8.2]
 [-0.2 14.2]]
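As an aside, the same result can be obtained without forming the inverse explicitly, which is often preferred numerically for larger matrices. The sketch below is one possible approach, not the method of this notebook: it rewrites \(D = AB\ C^{-1}\) as \(C^{T} D^{T} = (AB)^{T}\) and solves that system with np.linalg.solve(). The names Mat_AB, Mat_D_solve and Mat_D_inv are illustrative.

```python
import numpy as np

Mat_A = np.array([[2, 3], [4, 5]])
Mat_B = np.array([[2, 1], [1, 2]])
Mat_C = np.array([[6, 1], [1, 1]])

# D = A B C^{-1}  is equivalent to  D C = A B,  i.e.  C^T D^T = (A B)^T.
Mat_AB = Mat_A @ Mat_B
Mat_D_solve = np.linalg.solve(Mat_C.T, Mat_AB.T).T

# Compare with the explicit-inverse result.
Mat_D_inv = Mat_AB @ np.linalg.inv(Mat_C)
print(np.allclose(Mat_D_solve, Mat_D_inv))   # True
```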

np.shape()

A very useful tool of the numpy package is the np.shape() function. When dealing with very large arrays, you often need to find out how many elements they contain, or how many rows and columns they have. Your task for this exercise is to find out how many elements there are in Mat_D and how many rows and columns it has. Note that len(Mat_D) only returns the number of rows; the total number of elements is given by Mat_D.size.

How many elements are in Mat_D?

How many rows and columns are there in Mat_D?

print(f'There are {Mat_D.size} elements in Mat_D\n')

print(f'The shape of Mat_D is {np.shape(Mat_D)}. Hence, there are {np.shape(Mat_D)[0]} rows and {np.shape(Mat_D)[1]} columns in Mat_D.\n')
There are 4 elements in Mat_D

The shape of Mat_D is (2, 2). Hence, there are 2 rows and 2 columns in Mat_D.
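The related attributes are easy to mix up, so here is a short sketch on a non-square array (the 3 x 4 array mat is illustrative). For a 2D array, len() returns the length of the first axis, which is the number of rows.

```python
import numpy as np

mat = np.zeros((3, 4))   # illustrative 3 x 4 array

print(mat.shape)   # (3, 4)  rows, columns
print(mat.size)    # 12      total number of elements
print(mat.ndim)    # 2       number of dimensions
print(len(mat))    # 3       length of the first axis (rows)
```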

Ok, now let’s create a 2D array and play with it a bit, to show more functionalities of the numpy module…

Let’s generate a 10x11 array (10 rows, 11 columns).

arr = np.zeros([10,11])
np.set_printoptions(threshold=1000)
print(arr)

Now let’s give it the values 0,1,2,3,…,109. Since a 2D numpy array is ALWAYS regular, all rows have the same number of elements, which, in turn, is the number of columns.

cnt = 0
for i in range(len(arr)):
    for j in range (len(arr[0])): 
        arr[i,j] = cnt            
        cnt+=1
        
print(arr)
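The nested loop makes the indexing explicit, but the same array can be built in one line. A minimal sketch, using the illustrative name arr_fast: np.arange() creates the values 0 to 109 and .reshape() arranges them row by row into 10 rows and 11 columns.

```python
import numpy as np

arr_fast = np.arange(110, dtype=float).reshape(10, 11)

print(arr_fast.shape)     # (10, 11)
print(arr_fast[0, :3])    # [0. 1. 2.]
print(arr_fast[-1, -1])   # 109.0
```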

Conditions within arrays

Similar to what you’ve seen in Notebook 2, you can also use conditions to check for values within an array. Let’s say you want to select the elements within arr that are bigger than \(100\):

arr = np.array([[60,70],[200,300],[1,2],[40,45]])
arr[arr>100]
array([200, 300])

If you want to know how many elements those are, you can simply check the length:

len(arr[arr>100])
2

Now, instead of counting how many elements are bigger than \(100\), let’s say you want to set all values bigger than \(100\) to be \(0\). You could do the following:

arr[arr>100] = 0
print(arr)
[[60 70]
 [ 0  0]
 [ 1  2]
 [40 45]]

You can also include multiple conditions. Say you want to change all values in the open interval \((50,80)\) to \(-1\); then:

Note that the & means AND, so both conditions must apply.

arr[(arr>50) & (arr<80)] = -1 
print(arr)
[[-1 -1]
 [ 0  0]
 [ 1  2]
 [40 45]]

Say you want to change all values in \([0,10]\) or in \([40,50)\) to \(1\). Then:

arr[((arr>=0) & (arr<=10)) | ((arr>=40) & (arr<50))] = 1
print(arr)
[[-1 -1]
 [ 1  1]
 [ 1  1]
 [ 1  1]]

Note

  • The notation \([40,50)\) means every value between \(40\) and \(50\), including \(40\) but excluding \(50\); similar to \((40 \leq value < 50)\).

  • The notation & means ’and’, the notation | means ’or’.
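A condition such as arr > 100 is itself an array of True and False values (a boolean mask), which can be stored and reused. The sketch below uses an illustrative copy of the example data named data; it counts the matching elements directly from the mask and writes the combined condition with explicit parentheses around each pair.

```python
import numpy as np

data = np.array([[60, 70], [200, 300], [1, 2], [40, 45]])

mask = data > 100                 # boolean array, same shape as data
print(mask.sum())                 # 2
print(np.count_nonzero(mask))     # 2

# & binds more tightly than |, but explicit parentheses make the intent clear.
in_either_range = ((data >= 0) & (data <= 10)) | ((data >= 40) & (data < 50))
print(data[in_either_range])      # [ 1  2 40 45]
```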

np.where() and .astype()

Another useful way to modify values within your array is using the np.where() function. Let’s see how it works.

Let’s reset arr and test np.where.

arr = np.zeros([10,11])

cnt = 0
for i in range(len(arr)):
    for j in range (len(arr[0])): 
        arr[i,j] = cnt            
        cnt+=1
        
arr2 = np.where(arr>50)
print(type(arr2))
<class 'tuple'>

Ok, so using np.where() returns a tuple. Let’s transform it into an array, check its shape, and its elements.

arr2 = np.array(arr2)
print(arr2.shape)
print('\n\n------------')
print(arr2)
(2, 59)


------------
[[ 4  4  4 ...  9  9  9]
 [ 7  8  9 ...  8  9 10]]

Ok, so it looks like np.where() returned a tuple with the index values where arr > 50: the first row of arr2 holds the row indices and the second row holds the column indices. For instance, the first element of each row of arr2 gives the first position where arr > 50, so arr[4][7] should return \(51\). Let’s try it out.

arr[4][7]
51.0

Indeed that’s how np.where() works. It also has similar functionality to the one we described above with arr[arr>100] = 0. Like this:

new_arr = np.where(arr>100,0,arr)
print(new_arr)
[[  0.   1.   2. ...   8.   9.  10.]
 [ 11.  12.  13. ...  19.  20.  21.]
 [ 22.  23.  24. ...  30.  31.  32.]
 ...
 [ 77.  78.  79. ...  85.  86.  87.]
 [ 88.  89.  90. ...  96.  97.  98.]
 [ 99. 100.   0. ...   0.   0.   0.]]

So, np.where() can be used in two ways. Either to return a tuple with the index of values that fall under the specified condition; or, to return a modified array following the specified condition.

In the above case, new_arr = np.where(arr>100,0,arr) reads: where arr>100, return 0, otherwise return arr
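The tuple returned by the one-argument form can be used directly, without converting it to an array first. A small sketch with an illustrative integer array named grid that holds the same values 0 to 109:

```python
import numpy as np

grid = np.arange(110).reshape(10, 11)

rows, cols = np.where(grid > 50)   # one index array per axis
print(rows[0], cols[0])            # 4 7
print(grid[rows[0], cols[0]])      # 51

# Indexing with both arrays returns the matching values themselves.
print(np.array_equal(grid[rows, cols], grid[grid > 50]))   # True
```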

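Lastly, we can change the type of the elements in an array with the .astype() method. Note that it is a method of the array itself, called as arr.astype(...), not a function in the np namespace. Below is a quick example: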

arr_float = np.array([1.12,2.5,3.8], dtype='float')
print(arr_float)
[1.12 2.5  3.8 ]
arr_int = arr_float.astype(int)
print(arr_int)
[1 2 3]

Well, it worked, but note that the numbers were not rounded to the nearest integer: .astype(int) simply drops the decimal part, so 3.8 becomes 3.
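If rounding to the nearest integer is what you want, round first and convert afterwards. A short sketch with illustrative values, including a negative number to show that the conversion truncates toward zero:

```python
import numpy as np

vals = np.array([1.12, 2.5, 3.8, -1.7])

truncated = vals.astype(int)           # drops the fractional part
rounded = np.rint(vals).astype(int)    # rounds to nearest first (ties go to the even integer)

print(truncated)   # [ 1  2  3 -1]
print(rounded)     # [ 1  2  4 -2]
```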

np.argmax() and np.argmin()

Other functions that assist you in localizing some values within an array are the np.argmax() and np.argmin() functions. As the name states, they return the first index with the max (or min) value. Let’s check some quick examples:

print(np.argmin(arr),np.argmax(arr))
0 109

No surprises there: arr has its minimum value at index \(0\) and its maximum at index \(109\). Note that these are positions in the flattened array, counted row by row, not (row, column) pairs. Below is an example of how these functions might be useful:

Say you have an array with the maximum temperature of each month in the Netherlands, and you would like to know which month had the temperature closest to \(15\) degrees.

temp_NL = np.array([3.1,3.3,6.2,9.2,13.1,15.6,17.9,17.5,14.5,10.7,6.7,3.7]) 

You can use the abs() and np.argmin() functions to find out which month had the temperature closest to \(15\) degrees.

dif_arr is an array with the monthly (absolute) deviations from 15 degrees, now you only need to find where this deviation was the smallest using np.argmin().

dif_arr = abs(temp_NL - 15) 
    
min_temp_indx = np.argmin(dif_arr)
print(min_temp_indx)
8

The month with the temperature closest to \(15\) was the month with index \(8\), which is September. (Recall that indexing is zero-based, so index 8 corresponds to the \(9^{th}\) month, September.)
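For a 2D array, np.argmax() and np.argmin() return a single position in the flattened array by default. If you need the row and column instead, np.unravel_index() converts one into the other. A minimal sketch on an illustrative 10 x 11 array named grid:

```python
import numpy as np

grid = np.arange(110).reshape(10, 11)

flat_index = np.argmax(grid)                           # position in the flattened array
row, col = np.unravel_index(flat_index, grid.shape)    # convert to (row, column)

print(flat_index)   # 109
print(row, col)     # 9 10
```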

np.transpose()

The function np.transpose() is another helpful function. It returns the transpose of an array (or matrix). Below are some examples.

Let’s generate a (1,10) array with random integers between 0 and 99 (the upper bound given to np.random.randint is excluded).

x = np.random.randint(100,size = (1,10))
print(x)
[[24 41 12 ... 17 92 24]]

Now, if we transpose the array:

x_transpose = np.transpose(x)
print(x_transpose)
[[24]
 [41]
 [12]
 ...
 [17]
 [92]
 [24]]

We can also see that the new shape is (10,1)

print(x_transpose.shape)
(10, 1)

Recall the equation \(A x = b\) from Linear Algebra, with \(A\) being an \(m\) x \(n\) Matrix. Here, \(A x\) only exists if \(x\) has an \(n\) x \(1\) shape. In this scenario, the np.transpose() could come in handy. Let’s look at the below example.

Let’s create a 10 x 10 matrix.

A = np.random.randint(10, size=(10,10))

print(A)
[[0 8 9 ... 0 8 6]
 [2 5 7 ... 0 6 0]
 [2 0 7 ... 3 5 9]
 ...
 [9 0 5 ... 5 4 3]
 [0 9 6 ... 3 7 0]
 [7 1 4 ... 2 1 7]]

Now if we try to perform \(Ax\)

b = A@x
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[93], line 2
      1 # now if we try to perform Ax
----> 2 b = A@x

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 10)

It gives an error since x.shape is \(1\) x \(10\). Now if we try with x_transpose:

b = A@x_transpose
print(b)
[[3345]
 [2206]
 [2454]
 ...
 [2848]
 [2723]
 [1595]]

It works! Finally, a quicker way to transpose an array is to use .T, as shown below

x_transpose2 = x.T
print(x_transpose2)
[[24]
 [41]
 [12]
 ...
 [17]
 [92]
 [24]]
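One caveat: transposing only has a visible effect on arrays with at least two dimensions, which is why x was created with shape (1,10) above. For a truly 1D array, .T returns the same shape, and a column or row vector has to be made explicitly. A short sketch with an illustrative array v:

```python
import numpy as np

v = np.arange(3)                   # shape (3,)

print(v.T.shape)                   # (3,)    transposing a 1D array changes nothing
print(v.reshape(-1, 1).shape)      # (3, 1)  column vector
print(v[np.newaxis, :].shape)      # (1, 3)  row vector
```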