6 Numpy Functions Explained

NumPy, its importance, and some useful Numpy functions:

11 min read · Jun 15, 2021

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. One reason is that it is designed for efficiency on large arrays of data: it performs complex computations on entire arrays without the need for Python for loops, and its arrays are typically much faster and use significantly less memory than built-in Python sequences such as lists.
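
As a rough illustration of computing on a whole array without a Python for loop, the sketch below squares every element twice: once with an explicit loop and once with a single NumPy expression. The array size and variable names are arbitrary choices for this example, and it only checks that both approaches agree; timing them (for instance with timeit) is left to the reader.

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)

# Pure-Python approach: an explicit loop over every element.
squared_loop = [v * v for v in values.tolist()]

# NumPy approach: one vectorized expression, no Python-level loop.
squared_vec = values * values

# Both approaches produce the same numbers.
same_result = np.allclose(squared_loop, squared_vec)
print(same_result)
```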

The following 6 NumPy functions are quite useful in data analysis:

numpy.linspace
numpy.repeat
numpy.std
numpy.percentile
numpy.reshape
numpy.swapaxes

Let’s begin by importing NumPy and listing the functions covered in this notebook.

import numpy as np

# List of functions explained
function1 = np.linspace
function2 = np.repeat
function3 = np.std
function4 = np.percentile
function5 = np.reshape
function6 = np.swapaxes

Function 1 — np.linspace

numpy.linspace is a function that returns evenly spaced numbers over a specified interval.

Syntax: np.linspace(start, stop, num= 50, endpoint = True, retstep = False, dtype = None, axis = 0)

The np.linspace function returns num evenly spaced samples calculated over the interval from start to stop. By default, 50 samples are generated. The endpoint value determines whether the stop value is included (True, the default) or excluded (False). Only the start and stop values are required; the rest are optional.

# Example 1:

np.linspace(2, 20)

array([ 2.        ,  2.36734694,  2.73469388,  3.10204082,  3.46938776,
        3.83673469,  4.20408163,  4.57142857,  4.93877551,  5.30612245,
        5.67346939,  6.04081633,  6.40816327,  6.7755102 ,  7.14285714,
        7.51020408,  7.87755102,  8.24489796,  8.6122449 ,  8.97959184,
        9.34693878,  9.71428571, 10.08163265, 10.44897959, 10.81632653,
       11.18367347, 11.55102041, 11.91836735, 12.28571429, 12.65306122,
       13.02040816, 13.3877551 , 13.75510204, 14.12244898, 14.48979592,
       14.85714286, 15.2244898 , 15.59183673, 15.95918367, 16.32653061,
       16.69387755, 17.06122449, 17.42857143, 17.79591837, 18.16326531,
       18.53061224, 18.89795918, 19.26530612, 19.63265306, 20.        ])

Here, the code generated 50 evenly spaced numbers from 2 to 20: the two endpoints plus 48 values between them. The difference between any two consecutive elements is the same. Since num and endpoint were not specified, num defaults to 50 (a total of 50 elements are generated) and endpoint defaults to True (the stop value, 20, is included).

# Example 2:

np.linspace(10, 101, num=10, endpoint=False, retstep=True, dtype=float)

(array([10. , 19.1, 28.2, 37.3, 46.4, 55.5, 64.6, 73.7, 82.8, 91.9]), 9.1)

Here, the function returned 10 evenly spaced elements over the interval from 10 to 101. It returned 10 elements because the num value is 10, and it excluded 101 because the endpoint value is False. Since the retstep value is True, we got a tuple (samples, step), where step is the spacing between the elements, which in this case is 9.1. Finally, the elements are floats because the dtype value is float.
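
The step of 9.1 can be checked by hand. A minimal sketch, assuming the same start, stop, and num as in Example 2: with endpoint=False the interval is split into num steps, while with endpoint=True the stop value is the last sample, so there are only num - 1 steps. The variable names here are illustrative.

```python
import numpy as np

start, stop, num = 10, 101, 10

# endpoint=False: the interval is divided into `num` equal steps.
samples_open, step_open = np.linspace(start, stop, num, endpoint=False, retstep=True)
expected_open = (stop - start) / num

# endpoint=True (default): `stop` is the last sample, so there are num - 1 steps.
samples_closed, step_closed = np.linspace(start, stop, num, retstep=True)
expected_closed = (stop - start) / (num - 1)

print(step_open, expected_open)
print(step_closed, expected_closed)
```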

# Example 3 - breaking (to illustrate when it breaks)

np.linspace('z', 'a')
np.linspace(1, 'a')

---------------------------------------------------------------------------

UFuncTypeError                            Traceback (most recent call last)

<ipython-input-25-9a150456fb6e> in <module>
      1 # Example 3 - breaking (to illustrate when it breaks)
      2 
----> 3 np.linspace('z', 'a')
      4 np.linspace(1, 'a')


<__array_function__ internals> in linspace(*args, **kwargs)


/opt/conda/lib/python3.8/site-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype, axis)
    118     # Convert float/complex array scalars to float, gh-3504
    119     # and make sure one can use variables that have an __array_interface__, gh-6634
--> 120     start = asanyarray(start) * 1.0
    121     stop  = asanyarray(stop)  * 1.0
    122 


UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')

Here, the first call raises an error, so the second call is never reached; it would fail for the same reason. This is because we passed a string as the start or stop argument. np.linspace can only generate evenly spaced samples between two numbers, not between strings. So we can eliminate this error by using an integer or a float value instead of a string.
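
One way to surface this mistake with a clearer message is to check the endpoints before calling np.linspace. The helper below is hypothetical (checked_linspace is not part of NumPy) and deliberately restricts start and stop to plain real numbers, even though np.linspace itself also accepts array-like endpoints.

```python
import numbers
import numpy as np

def checked_linspace(start, stop, num=50):
    """Hypothetical helper: reject non-numeric endpoints before calling np.linspace."""
    for name, value in (("start", start), ("stop", stop)):
        if not isinstance(value, numbers.Real):
            raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    return np.linspace(start, stop, num)

samples = checked_linspace(1, 10, num=4)
print(samples)

try:
    checked_linspace(1, "a")
except TypeError as err:
    message = str(err)
    print(message)
```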

Thus, we use np.linspace function when we need to generate a sequence with equally distributed/spaced numbers.

Function 2 — np.repeat

np.repeat is a function that repeats the elements of an array.

Syntax: np.repeat(a, repeats, axis=None)

The number of repetitions is specified by the second argument repeats.

# Example 1:

np.repeat(2, 5)

array([2, 2, 2, 2, 2])

Here, the element ‘2’ is repeated 5 times since the value of the argument repeats is 5.

# Example 2:

arr = np.array([[1,4], [5,7]])
print(arr)
output1 = np.repeat(arr, 3)
print(output1)
output2 = np.repeat(arr, 3, axis=1)
print(output2)

[[1 4]
 [5 7]]
[1 1 1 4 4 4 5 5 5 7 7 7]
[[1 1 1 4 4 4]
 [5 5 5 7 7 7]]

In this example, we have a two-dimensional array. In output1, the array is flattened into a 1D array and each element is repeated 3 times (the axis value was not given, so the function returns a flat output array by default). In output2, the result stays two-dimensional: since the value of axis is 1, each element is repeated 3 times along that axis, that is, within its own row.
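
For comparison, here is a short sketch of the remaining case, axis=0, using the same arr as above. It also shows that repeats can be a list of counts, one per row; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 4], [5, 7]])

# axis=0 repeats whole rows instead of the elements within a row.
rows_repeated = np.repeat(arr, 2, axis=0)

# A list of counts repeats each row a different number of times:
# the first row once, the second row twice.
rows_uneven = np.repeat(arr, [1, 2], axis=0)

print(rows_repeated)
print(rows_uneven)
```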

# Example 3 - breaking:

np.repeat('bikesh', 3, axis=2)

---------------------------------------------------------------------------

AxisError                                 Traceback (most recent call last)

<ipython-input-24-779b0ddd346e> in <module>
      1 # Example 3 - breaking:
      2 
----> 3 np.repeat('bikesh', 3, axis=2)


<__array_function__ internals> in repeat(*args, **kwargs)


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
    477 
    478     """
--> 479     return _wrapfunc(a, 'repeat', repeats, axis=axis)
    480 
    481 


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     53     bound = getattr(obj, method, None)
     54     if bound is None:
---> 55         return _wrapit(obj, method, *args, **kwds)
     56 
     57     try:


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
     42     except AttributeError:
     43         wrap = None
---> 44     result = getattr(asarray(obj), method)(*args, **kwds)
     45     if wrap:
     46         if not isinstance(result, mu.ndarray):


AxisError: axis 2 is out of bounds for array of dimension 1

This example breaks because I have given 2 as the value of axis, but since the array has only one dimension, the value 2 is out of bounds. We can fix this by changing the value of axis to 0, or by using an array with at least 3 dimensions.
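
A minimal sketch of the first fix, assuming we wrap the string in an explicit 1D array so that the number of available axes is easy to inspect with ndim:

```python
import numpy as np

names = np.array(["bikesh"])   # explicit 1D array with a single element
print(names.ndim)              # number of axes available

# axis must be smaller than names.ndim, so axis=0 is the only valid choice here.
repeated = np.repeat(names, 3, axis=0)
print(repeated)
```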

We can use this function when we need to repeat certain values a certain number of times.

Function 3 — np.std

np.std computes the standard deviation (SD) of the array elements along the specified axis. If no axis is given, it is computed over the flattened array by default.

Syntax: np.std(arr, axis = None)

axis = 0 computes the SD of each column (one value per column), and axis = 1 computes the SD of each row (one value per row).

# Example 1:

arr = [10, 5, 33, 32, 1]
np.std(arr)

13.614697940094006

This gives the standard deviation of the elements in arr.
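
To make the number above less opaque, the sketch below recomputes it by hand as the square root of the mean squared deviation from the mean. It also shows the ddof parameter of np.std, which is not used in the original example: ddof=1 divides by n - 1 instead of n.

```python
import numpy as np

arr = np.array([10, 5, 33, 32, 1])

# Population standard deviation: square root of the mean squared deviation.
manual_std = np.sqrt(np.mean((arr - arr.mean()) ** 2))

# ddof=1 divides by (n - 1) instead of n, giving the sample standard deviation.
sample_std = np.std(arr, ddof=1)

print(manual_std, np.std(arr))
print(sample_std)
```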

# Example 2:

arr = [[1,3,4,6], [2,55,4,33], [1.44, 3.5, 3, 2], [1, 0.2, 2.3, 4.3],  [4,6,8,7.7]]

print(np.std(arr))
print(np.std(arr, axis = 0))
print(np.std(arr,axis=1))
print(np.std(arr[0:2]))

12.799863124268164
[ 1.1181127  20.81168902  1.97747314 11.3576406 ]
[ 1.80277564 21.93741097  0.80973761  1.55        1.59432588]
18.5

Here, the first print statement gives the SD of all the elements in the array. The second one gives the SD of each column, since axis = 0, which is why it returns four values. The third one gives the SD of each row, since axis = 1, which is why it returns five values. Finally, the fourth one gives the SD of the elements in the first and second lists only, as selected by the slice arr[0:2].
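
A quick way to keep the two axis values apart is to look at the shape of the result. This sketch reuses the data from Example 2 (converted to a NumPy array so that column slicing works); per_column and per_row are illustrative names.

```python
import numpy as np

arr = np.array([[1, 3, 4, 6], [2, 55, 4, 33], [1.44, 3.5, 3, 2],
                [1, 0.2, 2.3, 4.3], [4, 6, 8, 7.7]])

per_column = np.std(arr, axis=0)   # collapses the 5 rows: one value per column
per_row = np.std(arr, axis=1)      # collapses the 4 columns: one value per row

print(arr.shape, per_column.shape, per_row.shape)

# The first entry of per_column is simply the SD of the first column.
first_column_matches = np.isclose(per_column[0], np.std(arr[:, 0]))
print(first_column_matches)
```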

# Example 3 - breaking:

arr = [[1, -4, 4, -5], [1, 4, 3, 2, 5]]

np.std(arr, axis = 1)

---------------------------------------------------------------------------

AxisError                                 Traceback (most recent call last)

<ipython-input-25-42a000c96883> in <module>
      3 arr = [[1, -4, 4, -5], [1, 4, 3, 2, 5]]
      4 
----> 5 np.std(arr, axis = 1)


<__array_function__ internals> in std(*args, **kwargs)


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in std(a, axis, dtype, out, ddof, keepdims)
   3494             return std(axis=axis, dtype=dtype, out=out, ddof=ddof, **kwargs)
   3495 
-> 3496     return _methods._std(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
   3497                          **kwargs)
   3498 


/opt/conda/lib/python3.8/site-packages/numpy/core/_methods.py in _std(a, axis, dtype, out, ddof, keepdims)
    231 
    232 def _std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
--> 233     ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
    234                keepdims=keepdims)
    235 


/opt/conda/lib/python3.8/site-packages/numpy/core/_methods.py in _var(a, axis, dtype, out, ddof, keepdims)
    177     arr = asanyarray(a)
    178 
--> 179     rcount = _count_reduce_items(arr, axis)
    180     # Make this warning show up on top.
    181     if ddof >= rcount:


/opt/conda/lib/python3.8/site-packages/numpy/core/_methods.py in _count_reduce_items(arr, axis)
     64     items = 1
     65     for ax in axis:
---> 66         items *= arr.shape[mu.normalize_axis_index(ax, arr.ndim)]
     67     return items
     68 


AxisError: axis 1 is out of bounds for array of dimension 1

In this case, the program breaks and shows an error because the two lists have different numbers of elements: four in the first and five in the second. NumPy cannot arrange them into a 2D array, so it treats arr as a one-dimensional array, and axis 1 does not exist. So it is not possible to compute the SD along axis 1 in this case. We can fix this by making the number of elements equal in both lists.
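
The sketch below checks whether the rows have equal lengths before calling np.std. Trimming the longer row to the length of the shorter one is only one possible repair and is an assumption made for this example; which values to keep depends on the data.

```python
import numpy as np

rows = [[1, -4, 4, -5], [1, 4, 3, 2, 5]]

# Rows of different lengths cannot form a rectangular 2D array.
lengths = {len(row) for row in rows}
is_rectangular = len(lengths) == 1
print(is_rectangular)

# One possible fix (an assumption about intent): drop the extra trailing value
# so both rows have four elements.
shortest = min(lengths)
trimmed = np.array([row[:shortest] for row in rows])
row_std = np.std(trimmed, axis=1)
print(row_std)
```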

We use this function when we need to calculate the Standard Deviation or the spread of data distribution in any given data set.

Function 4- np.percentile

np.percentile is used to compute the nth percentile of the given data(array elements) along the specified axis.

Syntax: np.percentile(arr, n, axis= None, out = None)

This function returns the nth percentile(s) of the array elements. As with np.std, axis = 0 gives one result per column and axis = 1 gives one result per row.

# Example 1:

arr = [1,3,5,8,8.7,7,6,5]
np.percentile(arr, 13)

2.82

The above example gives the 13th percentile of the elements in the array arr.
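
The value 2.82 is not an element of arr; it comes from interpolating between two neighbouring sorted values. The following sketch reproduces it by hand, assuming the default linear interpolation of np.percentile. The variable names are illustrative.

```python
import numpy as np

arr = [1, 3, 5, 8, 8.7, 7, 6, 5]
q = 13

data = np.sort(arr)
# Fractional position of the q-th percentile among the sorted values.
position = (q / 100) * (len(data) - 1)
lower = int(np.floor(position))
upper = min(lower + 1, len(data) - 1)
fraction = position - lower

# Linear interpolation between the two neighbouring sorted values.
manual = data[lower] + fraction * (data[upper] - data[lower])

print(manual, np.percentile(arr, q))
```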

# Example 2:

arr = [[1,3,4.5,6.7], [4,6,6.8,9.9], [1,3,0.9,5], [1.2,3.3, 4, 5]]
print(np.percentile(arr, 45))
print(np.percentile(arr, 33, axis =0))
print(np.percentile(arr, 79, axis = 1))
arr1 = np.zeros_like(arr)
print(np.percentile(arr, 12, axis=0, out = arr1))

3.825
[1.    3.    3.969 5.   ]
[5.314 7.947 3.74  4.37 ]
[[1.    3.    2.016 5.   ]
 [1.    3.    2.016 5.   ]
 [1.    3.    2.016 5.   ]
 [1.    3.    2.016 5.   ]]

Here, the first print statement calculates the 45th percentile across all the elements in the array arr. The second one gives the 33rd percentile of each column, since axis = 0. The third one gives the 79th percentile of each row, since axis = 1. Finally, the fourth print statement calculates the 12th percentile of each column (since axis = 0) and places the result in arr1, since out = arr1. For this, we first defined arr1 as an array of zeros with the same shape and type as arr. Because arr1 has the 4 x 4 shape of arr while the result has only four values, the same four values appear in every row of arr1.

# Example 3 - breaking:

np.percentile(arr)

np.percentile(arr, 55, axis = 1, out= arr2)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-40-862ca3c237da> in <module>
      1 # Example 3 - breaking:
      2 
----> 3 np.percentile(arr)
      4 
      5 np.percentile(arr, 55, axis = 1, out= arr2)


<__array_function__ internals> in percentile(*args, **kwargs)


TypeError: _percentile_dispatcher() missing 1 required positional argument: 'q'

In this example, there are two cases where the program breaks. The first one, shown in the traceback, is when we do not pass the value of n (the second argument) to the function. We can fix this by adding an argument that defines which percentile we want the function to calculate. The second call is never reached, but it would also fail, because we have set out = arr2 without defining arr2. We first need to create arr2, the array we want to place the final result in. The NumPy documentation asks for an output array with the same shape as the expected result.
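
A minimal sketch of the second fix, assuming the same arr as in Example 2. With axis=1 the result has one value per row, so arr2 is created with that length before it is passed as out.

```python
import numpy as np

arr = np.array([[1, 3, 4.5, 6.7], [4, 6, 6.8, 9.9], [1, 3, 0.9, 5], [1.2, 3.3, 4, 5]])

# With axis=1 the result has one value per row, so the output buffer
# needs room for arr.shape[0] floats.
arr2 = np.empty(arr.shape[0])

result = np.percentile(arr, 55, axis=1, out=arr2)
print(arr2)
```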

Each percentile is referred to by the percentage with which it splits the data, so the nth percentile is the value that is n% of the way through the sorted data. We use the function np.percentile to compute the nth percentile of the given array elements. We can also use it to compute the median (n = 50), the first quartile (n = 25), and the third quartile (n = 75).
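
As a small check of the median and quartile case, the sketch below passes a list of percentiles in one call (np.percentile accepts a sequence for its second argument) and compares the 50th percentile with np.median. The data is the array from Example 1.

```python
import numpy as np

arr = np.array([1, 3, 5, 8, 8.7, 7, 6, 5])

# Passing a list of percentiles returns all of them in one call.
q1, median, q3 = np.percentile(arr, [25, 50, 75])

print(q1, median, q3)
print(np.isclose(median, np.median(arr)))
```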

Function 5- np.reshape

The np.reshape function is used to give an array a new shape without changing the data inside the array.

Syntax: np.reshape(array, shape, order = 'C')

# Example 1:

array = np.arange(20)
print(array)
print(array.reshape(5,4))
print(array.reshape(4,5))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Here, we start with a 1D array of 20 elements. The first reshape call returns a two-dimensional array with 5 rows and 4 columns, and the second returns one with 4 rows and 5 columns. Both calls work on the original array, which itself is left unchanged.
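
The order argument from the syntax line is not used in the examples, so here is a brief sketch of what it changes. It uses a smaller array of 6 elements, chosen only to keep the output short.

```python
import numpy as np

array = np.arange(6)

row_major = np.reshape(array, (2, 3), order="C")     # fill row by row (default)
column_major = np.reshape(array, (2, 3), order="F")  # fill column by column

print(row_major)
print(column_major)

# Reshaping does not modify the original array.
print(array.shape)
```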

# Example 2:

print(np.arange(18).reshape(2,9))
print(np.arange(18).reshape(3,2,3))

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]]
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]]]

In this example, we took an array with 18 elements and first converted it into a 2D array with 2 rows and 9 columns. Then, we converted it into a 3D array of shape (3, 2, 3): 3 blocks, each with 2 rows and 3 columns.

# Example 3 - breaking:

np.arange(28).reshape(2,3)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-14-f05afcbc132d> in <module>
      1 # Example 3 - breaking:
      2 
----> 3 np.arange(28).reshape(2,3)


ValueError: cannot reshape array of size 28 into shape (2,3)

Here, the program breaks because an array of size 28 cannot be reshaped into (2,3): that shape holds only 2 x 3 = 6 elements. The product of the new dimensions must equal the number of elements in the array. So we can fix this by using a compatible shape like (2,14), (28,1), (4,7), etc.
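
Two small aids for choosing a compatible shape are sketched below. Passing -1 for one dimension lets NumPy infer it from the array size, and a product check tells us in advance whether a shape fits. The can_reshape helper is hypothetical and written only for this example.

```python
import numpy as np

array = np.arange(28)

# -1 asks NumPy to infer that dimension from the array size: 28 / 2 = 14.
inferred = array.reshape(2, -1)
print(inferred.shape)

def can_reshape(size, shape):
    """Hypothetical helper: True when the shape holds exactly `size` elements."""
    return int(np.prod(shape)) == size

print(can_reshape(array.size, (2, 3)))
print(can_reshape(array.size, (4, 7)))
```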

We use np.reshape to give a new shape to an array without changing its data.

Function 6 — np.swapaxes

np.swapaxes is used to interchange two axes of an array.

Syntax: np.swapaxes(a, axis1, axis2)

# Example 1:

a = np.arange(14).reshape(2,7)
print(a)
np.swapaxes(a,0,1)

[[ 0  1  2  3  4  5  6]
 [ 7  8  9 10 11 12 13]]

array([[ 0,  7],
       [ 1,  8],
       [ 2,  9],
       [ 3, 10],
       [ 4, 11],
       [ 5, 12],
       [ 6, 13]])

Here, we first have a 2D array with 2 rows and 7 columns. After using the np.swapaxes function, the axes are swapped and the array is converted into a 2D array with 7 rows and 2 columns.
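
For a 2D array, swapping axes 0 and 1 is the same as taking the transpose. The sketch below checks this on the array from Example 1.

```python
import numpy as np

a = np.arange(14).reshape(2, 7)
swapped = np.swapaxes(a, 0, 1)

print(swapped.shape)
print(np.array_equal(swapped, a.T))

# Element (i, j) of the original appears at (j, i) after the swap.
print(a[1, 3] == swapped[3, 1])
```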

# Example 2:

a = np.arange(12).reshape(3,2,2)
print(a)
print(np.swapaxes(a,0,2))
print(np.swapaxes(a,1,2))
print(np.swapaxes(a,0,1))

[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
[[[ 0  4  8]
  [ 2  6 10]]

 [[ 1  5  9]
  [ 3  7 11]]]
[[[ 0  2]
  [ 1  3]]

 [[ 4  6]
  [ 5  7]]

 [[ 8 10]
  [ 9 11]]]
[[[ 0  1]
  [ 4  5]
  [ 8  9]]

 [[ 2  3]
  [ 6  7]
  [10 11]]]

In this example, we have a 3D array of shape (3, 2, 2): 3 blocks, each with 2 rows and 2 columns. Each call works on the original array a. The first swaps axes 0 and 2, giving shape (2, 2, 3). The second swaps axes 1 and 2, which transposes each block and keeps the shape (3, 2, 2). The third swaps axes 0 and 1, giving shape (2, 3, 2).
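
With three dimensions the printed output is hard to read, so it can help to print only the shapes. This sketch loops over the same three axis pairs as Example 2.

```python
import numpy as np

a = np.arange(12).reshape(3, 2, 2)

# Print how each swap rearranges the shape of the original array.
for axis1, axis2 in [(0, 2), (1, 2), (0, 1)]:
    swapped = np.swapaxes(a, axis1, axis2)
    print((axis1, axis2), a.shape, "->", swapped.shape)

shape_02 = np.swapaxes(a, 0, 2).shape
shape_12 = np.swapaxes(a, 1, 2).shape
shape_01 = np.swapaxes(a, 0, 1).shape
```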

# Example 3 - breaking:

a = np.arange(10)
np.swapaxes(a,0,1)

---------------------------------------------------------------------------

AxisError                                 Traceback (most recent call last)

<ipython-input-28-a5bfed914934> in <module>
      2 
      3 a = np.arange(10)
----> 4 np.swapaxes(a,0,1)


<__array_function__ internals> in swapaxes(*args, **kwargs)


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in swapaxes(a, axis1, axis2)
    592 
    593     """
--> 594     return _wrapfunc(a, 'swapaxes', axis1, axis2)
    595 
    596 


/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     56 
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:
     60         # A TypeError occurs if the object does have such a method in its


AxisError: axis2: axis 1 is out of bounds for array of dimension 1

In this case, the program breaks because we are trying to swap axes 0 and 1 of a 1D array, which has only axis 0. To fix this, we have to use an array that has more than one dimension.

We use this function when we need to interchange two axes of an array.
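
One possible fix, sketched below, is to promote the 1D array to 2D with np.atleast_2d before swapping. Whether a single-row 2D array is what the surrounding code needs is an assumption of this example.

```python
import numpy as np

a = np.arange(10)

# Promote the 1D array to 2D (shape (1, 10)) so that axis 1 exists.
a2d = np.atleast_2d(a)
column = np.swapaxes(a2d, 0, 1)

print(a2d.shape, column.shape)
```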

Conclusion

In this notebook, I have discussed 6 different NumPy functions that can be used in Data analysis. I have listed them below:

numpy.linspace
numpy.repeat
numpy.std
numpy.percentile
numpy.reshape
numpy.swapaxes

To conclude, I learned a lot about the workflow of NumPy functions while working on this assignment. I can safely say that I now have a very good understanding of the six functions I wrote about.
