A (sligthly more detailed) Numpy introduction#

As we learned in the introduction numpy provides an array object (technically correct: ndarray). It is the most important module for numerical and scientific programming.

Learning Outcomes#

In this section we will look with a bit more detail into the numpy module and how to use arrays. At the end of this section you should get an understanding of:

  • what are numpy ndarrays

  • how to generate ndarrays

  • how to do calculations using arrays

  • understand basic indexing

  • understand more advanced indexing like slicing and broadcasting (advanced)

  • get a rough overview of included numpy functionality (advanced)

Further Reading#

Numpy module#

As discussed in the previous section the numpy module provides an N-dimensional array object that forms the fundamental basis of most numerical/scientific Python programs.

To make the working with arrays easy numpy also contains

  • powerful broadcasting versions of most fundamental mathematical functions

  • tools for integrating with C, C++ and Fortran code

  • some useful linear algebra, Fourier transform and random number generation routines

The Numpy array object#

The central part of numpy is the ndarray object which is a homogenous multi-dimensional array. Essentially, it is a table of elements (typically, but not necessarily numeric) of the same type (usually but not strictly). All elements can be accessed by integer indices. Note that often ndarrays are just called “arrays”, however they are distinct (and more powerful) than the Python standard libraries array.array object.

The following two examples are a one-dimensional array and a 2-dimensional (3x3) array

[1, 3, 1]

[[ 2, 3, 4],
 [ 4, 4, 4],
 [ 5, 5, 5]]

To use arrays efficiently, it is useful to know about their most important attributes which are:

ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
ndarray.dtype
an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
ndarray.itemsize
the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type complex32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
ndarray.data
the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

Example#

import numpy as np # this is convention, less typing

x1 = np.array([1., 3., 1.]) # the first array above
x2 = np.array([[ 2, 3, 4],[ 4, 4, 4],[ 5, 5, 5]])
print("x1 has {} dimenions".format(x1.ndim)) # 1D
print("x2 has {} dimenions".format(x2.ndim)) # 1D
print("The shape of x1 is {}".format(x1.shape))
print("The shape of x2 is {}".format(x2.shape))
print("The size of x1 is {}".format(x1.size))
print("The size of x2 is {}".format(x2.size))
print("x1.dtype={}".format(x1.dtype))
print("x2.dtype={}".format(x2.dtype)) # note the dtype for x1 and x2 is different ... Why?
print("x1.itemsize={}".format(x1.itemsize)) 
print("x2.itemsize={}".format(x2.itemsize)) 
print("x1.data={}".format(x1.data)) # not very useful to print
print(type(x1))
x1 has 1 dimenions
x2 has 2 dimenions
The shape of x1 is (3,)
The shape of x2 is (3, 3)
The size of x1 is 3
The size of x2 is 9
x1.dtype=float64
x2.dtype=int64
x1.itemsize=8
x2.itemsize=8
x1.data=<memory at 0x7805401c8940>
<class 'numpy.ndarray'>

Creating Arrays#

Arrays can be generated in a number of ways. As we have seen above we can create arrays from lists (or other python objects)

print(np.array([1, 2,3]))
print(np.array([1., 2, 3]))  
print(np.array([1, 2, 3+0j]))

# important see what happens if the elements have different types
print("--------------------")
print(np.array([1, 2,3]).dtype)
print(np.array([1., 2, 3]).dtype)
print(np.array([1, 2, 3+0j]).dtype)

# we can also use tuples and mix them with lists and initialize multi-dimensional arrays
print("--------------------")
print(np.array([[1,2,3], (3, 4, 5)]))

# We can also explicitly specify the dtype
print("--------------------")
print(np.array([1, 2, 3], dtype=complex))
[1 2 3]
[1. 2. 3.]
[1.+0.j 2.+0.j 3.+0.j]
--------------------
int64
float64
complex128
--------------------
[[1 2 3]
 [3 4 5]]
--------------------
[1.+0.j 2.+0.j 3.+0.j]

Some caveats#

# this does not work
x = np.array(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 2
      1 # this does not work
----> 2 x = np.array(1, 2, 3)

TypeError: array() takes from 1 to 2 positional arguments but 3 were given
# this does not work as expected
print(np.array([[1, 2],
               [3, 4, 5]]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 2
      1 # this does not work as expected
----> 2 print(np.array([[1, 2],
      3                [3, 4, 5]]))

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Better array initialization#

Creating arrays from lists is often tedious. Fortunately numpy provides a multitude of ways to create arrays in a convenient way

x = np.zeros(10)
print(x)
y = np.zeros((3, 4)) # note we are passing a tuple to generate a multi-dimensional array
print(y)
z = np.ones(5)
print(z)
u = np.empty(4)
print(u)
r1 = np.arange(10)
print(r1)
r2 = np.linspace(-5, 5, 20)
print(r2)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[1. 1. 1. 1. 1.]
[5.38288797e-310 0.00000000e+000             nan 1.27319747e-313]
[0 1 2 3 4 5 6 7 8 9]
[-5.         -4.47368421 -3.94736842 -3.42105263 -2.89473684 -2.36842105
 -1.84210526 -1.31578947 -0.78947368 -0.26315789  0.26315789  0.78947368
  1.31578947  1.84210526  2.36842105  2.89473684  3.42105263  3.94736842
  4.47368421  5.        ]

Importantly, these functions also accept the dtype parameter and otherwise work in the same way as np.array in how they determine the dtype

x = np.zeros(3, dtype=float)
print(x)
y = np.arange(5, dtype=complex)
print(y)
z = np.arange(5.)
print(z.dtype)
[0. 0. 0.]
[0.+0.j 1.+0.j 2.+0.j 3.+0.j 4.+0.j]
float64

It is worth looking in a bit more detail at the linspace and arange commands

print(np.arange(0, 20, 2))
print(np.linspace(-5, 5, 10)) # note contains end point and first point
print(np.linspace(-5, 5, 10, endpoint=False))
[ 0  2  4  6  8 10 12 14 16 18]
[-5.         -3.88888889 -2.77777778 -1.66666667 -0.55555556  0.55555556
  1.66666667  2.77777778  3.88888889  5.        ]
[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.]

A very useful operation is to reshape arrays into a different shape. The most flexible way to achieve this is via the reshape method.

x = np.arange(10)
y = x.reshape(2,5)
print(y)
z = y.reshape(10,1)
print(z) # this is different to x!
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]

Accessing array elements#

So now that we have learned how to create arrays, how do we address elements?

numpy has a straight forward indexing scheme, which looks very similar to lists.

x = np.arange(10)
print(x[1])
x[2] = 11
print(x)
1
[ 0  1 11  3  4  5  6  7  8  9]

Important as you can see above unlike Matlab indexing starts at 0, i.e. x[0] is the first element in an array.

# accessing elements in multi-dimensional arrays is straight forward as well. 
x = np.array([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])
print(x[1, 1]) # separate the different dimensions by ','
x[2, 0] = 11
print(x)
5
[[ 1  2  3]
 [ 4  5  6]
 [11  8  9]]

The matrix class#

Be aware that numpy also supplies a matrix class, which works more like Matlab. Generally, it is recommended to use the array class instead (see Numpy for Matlab users) for a more detailed discussion of pros and cons

Numpy functions and operators#

On top of the array class numpy provides mathematical functions which “broadcast” for working with arrays. All fundamental functions are included.

x = np.arange(10)
x*2
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
x/2 # note that we perform a conversion to float 
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
x+1j
array([0.+1.j, 1.+1.j, 2.+1.j, 3.+1.j, 4.+1.j, 5.+1.j, 6.+1.j, 7.+1.j,
       8.+1.j, 9.+1.j])
x = np.array([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])
x.T #transpose
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
x = np.arange(10)
x.T # transpose for 1D arrays does not work
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
%pylab inline 
# we talk about the above line later
x = np.linspace(-2*np.pi, 2*np.pi, 100, endpoint=False)
y = np.sin(x)
plt.plot(x,y)
%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib
[<matplotlib.lines.Line2D at 0x780527e4d3c0>]
_images/65ce531bca395058acf1b07fb836fc7e87b92fe6c4fb1bdb9eb786ce5f5d4e48.png

Important difference to Matlab. Operations are element-wise by default

x = np.arange(5)
print(x)
y = np.ones(5)*2
print(y)
print(x*y)
print(x**2)
[0 1 2 3 4]
[2. 2. 2. 2. 2.]
[0. 2. 4. 6. 8.]
[ 0  1  4  9 16]

If you want to have matrix operations you have to explicitly call the linear algebra functions (since python-3.5 the @ operator works as well).

print(np.dot(x,y)) # Note that transpose is not needed if both vectors are 1D
20.0
print(x@y) #works since python 3.5
20.0
print(np.inner(x,y)) # equivalent with dot if arrays are 1D
print(np.dot(x,y)) 
20.0
20.0
print(np.outer(x,y)) # equivalent to dot 
print(np.dot(x.reshape(-1,1), y.reshape(1,-1)))
[[0. 0. 0. 0. 0.]
 [2. 2. 2. 2. 2.]
 [4. 4. 4. 4. 4.]
 [6. 6. 6. 6. 6.]
 [8. 8. 8. 8. 8.]]
[[0. 0. 0. 0. 0.]
 [2. 2. 2. 2. 2.]
 [4. 4. 4. 4. 4.]
 [6. 6. 6. 6. 6.]
 [8. 8. 8. 8. 8.]]

Slicing (Supplementary material)#

  • So what if I want to access more than one element?

  • Or how do I access every second element?

The numpy indexing allows much more than simple accessing of indices. Generally for every dimension it follows the convention of x[start:end:step], with the important point that this includes the start but excludes the stop element.

x = np.arange(10)
print(x)
print(x[1:3])
print(x[0:3])
print(x[:3]) # the 0 can be omitted if we start from the beginning
print(x[5:10])
print(x[5:]) # if we want to index to the end we can leave out the last element
[0 1 2 3 4 5 6 7 8 9]
[1 2]
[0 1 2]
[0 1 2]
[5 6 7 8 9]
[5 6 7 8 9]
print(x[-3:]) # we can also count backwards from the end
print(x[-1]) # -1 is the last element
print(x[-6:-1]) # but not included when end
[7 8 9]
9
[4 5 6 7 8]
print(x[::2]) # going over whole array we can omit start and end 
[0 2 4 6 8]
print(x[::-2]) # we can also go backwards
[9 7 5 3 1]

Array indexing can do even fancier things, see Numpy indexing for a detailed overview.

Reshaping revisited (Supplementary material)#

If one wants to reshape an array into a row or column vector, one can also use the reshape method, a -1 indicates all remaining elements. Often it is however more convenient to use newaxis.

x = np.arange(10)
y = x[:,np.newaxis] # this generates a column vector
print(y.shape)
print(y)
z = x.reshape(-1, 1)# equivalent to the above note the -1
print(z)
(10, 1)
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]

Broadcasting (Supplementary material)#

Generally, when multiplying different arrays they need to have the same dimensions/elements

x = np.ones(4)
y = np.arange(3)
x*y # does not work
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 3
      1 x = np.ones(4)
      2 y = np.arange(3)
----> 3 x*y # does not work

ValueError: operands could not be broadcast together with shapes (4,) (3,) 

One of the most common ways of using the newaxis above is for broadcasting, which is if one wants to e.g. multiply every element of an array with a vector to create a matrix.

z = x[:,np.newaxis]*y[np.newaxis,:]
print(z)
[[0. 1. 2.]
 [0. 1. 2.]
 [0. 1. 2.]
 [0. 1. 2.]]

However, broadcasting does not require newaxis. Have a look at the following example.

x = np.ones(6).reshape(2,3)
y = np.arange(1,4)
z = x * y # multiplies every row of x with y
print(z)
x = x.reshape(3,2)
y = y.reshape(3,1)
z = x * y # multiplication of every column of x with y
print(z)
[[1. 2. 3.]
 [1. 2. 3.]]
[[1. 1.]
 [2. 2.]
 [3. 3.]]

The rules for broadcasting are detailed in the numpy documentation here. It allows easy and powerful operations of a smaller array with elements of a higher dimensional array

Important notes on included functions (Supplementary material)#

Functions will “upcast” (return a more precise numeric type) when necessary, see e.g.

x = np.arange(10)
y = sin(x)
print(x.dtype)
print(y.dtype)
int64
float64

However, numpy by default does not upcast to complex numbers but instead returns nan (not a number). See below:

print(np.sqrt(-1))
print(np.sqrt(np.linspace(-1,1,5)))
nan
[       nan        nan 0.         0.70710678 1.        ]
/tmp/ipykernel_96397/3218816100.py:1: RuntimeWarning: invalid value encountered in sqrt
  print(np.sqrt(-1))
/tmp/ipykernel_96397/3218816100.py:2: RuntimeWarning: invalid value encountered in sqrt
  print(np.sqrt(np.linspace(-1,1,5)))

If we want the “mathematically correct” functions that return a complex value you need to import this specifically.

from numpy.lib import scimath

print(scimath.sqrt(-1))
print(scimath.log(-1))
1j
3.141592653589793j

Important Be aware that if you use scipy the default is the opposite

Why use Numpy arrays#

Apart from the convenience of having functions act in all dimensions, vector operations (operations on the full arrays) are much faster than element operations, e.g. on a list.

Let’s rerun the example from before.

def listmult(x,y):
    z = []
    for i in range(len(x)):
        z.append(x[i]*y[i])
    return z
x = [1]*100
y = [2]*100
%timeit listmult(x,y)
4.63 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
x = np.ones(100)
y = np.ones(100)*2
%timeit x*y
342 ns ± 0.227 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Numpy Linear Algebra#

While if you need to do linear algebra we generally recommend using scipy, numpy does contain a limited set of functions that do fast matrix operations, see here for the complete documentation. Below are some examples.

x=np.random.randn(3,3)
y = np.linalg.inv(x)
z = np.dot(x,y)
print(z)
[[ 1.00000000e+00 -1.72176418e-17 -1.57103354e-15]
 [ 2.43136249e-17  1.00000000e+00  1.98942999e-15]
 [ 4.07423571e-17 -1.36312715e-17  1.00000000e+00]]
print(np.eye(8))
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]]

Numpy FFT#

numpy also contains an FFT library. Note that it is reasonably fast for inputs that are a power of 2, but significantly slower for others. If you need high-speed FFTs we recommend using pyfftw which is a wrapper around the extremely fast FFTW library.

Other useful functions#

Three very useful functions when looking for code or trying to understand what some numpy function does are: info, lookfor and source.

np.info(np.eye) # returns numpy specific help (used by ipython and jupyter)
 eye(N, M=None, k=0, dtype=<class 'float'>, order='C', *, like=None)

Return a 2-D array with ones on the diagonal and zeros elsewhere.

Parameters
----------
N : int
  Number of rows in the output.
M : int, optional
  Number of columns in the output. If None, defaults to `N`.
k : int, optional
  Index of the diagonal: 0 (the default) refers to the main diagonal,
  a positive value refers to an upper diagonal, and a negative value
  to a lower diagonal.
dtype : data-type, optional
  Data-type of the returned array.
order : {'C', 'F'}, optional
    Whether the output should be stored in row-major (C-style) or
    column-major (Fortran-style) order in memory.

    .. versionadded:: 1.14.0
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

Returns
-------
I : ndarray of shape (N,M)
  An array where all elements are equal to zero, except for the `k`-th
  diagonal, whose values are equal to one.

See Also
--------
identity : (almost) equivalent function
diag : diagonal 2-D array from a 1-D array specified by the user.

Examples
--------
>>> np.eye(2, dtype=int)
array([[1, 0],
       [0, 1]])
>>> np.eye(3, k=1)
array([[0.,  1.,  0.],
       [0.,  0.,  1.],
       [0.,  0.,  0.]])
np.source(eye) #prints the source of a function or method
In file: /home/jschrod/.virtualenvs/jupyterbook2/lib/python3.10/site-packages/numpy/lib/twodim_base.py

@set_array_function_like_doc
@set_module('numpy')
def eye(N, M=None, k=0, dtype=float, order='C', *, like=None):
    """
    Return a 2-D array with ones on the diagonal and zeros elsewhere.

    Parameters
    ----------
    N : int
      Number of rows in the output.
    M : int, optional
      Number of columns in the output. If None, defaults to `N`.
    k : int, optional
      Index of the diagonal: 0 (the default) refers to the main diagonal,
      a positive value refers to an upper diagonal, and a negative value
      to a lower diagonal.
    dtype : data-type, optional
      Data-type of the returned array.
    order : {'C', 'F'}, optional
        Whether the output should be stored in row-major (C-style) or
        column-major (Fortran-style) order in memory.

        .. versionadded:: 1.14.0
    ${ARRAY_FUNCTION_LIKE}

        .. versionadded:: 1.20.0

    Returns
    -------
    I : ndarray of shape (N,M)
      An array where all elements are equal to zero, except for the `k`-th
      diagonal, whose values are equal to one.

    See Also
    --------
    identity : (almost) equivalent function
    diag : diagonal 2-D array from a 1-D array specified by the user.

    Examples
    --------
    >>> np.eye(2, dtype=int)
    array([[1, 0],
           [0, 1]])
    >>> np.eye(3, k=1)
    array([[0.,  1.,  0.],
           [0.,  0.,  1.],
           [0.,  0.,  0.]])

    """
    if like is not None:
        return _eye_with_like(like, N, M=M, k=k, dtype=dtype, order=order)
    if M is None:
        M = N
    m = zeros((N, M), dtype=dtype, order=order)
    if k >= M:
        return m
    # Ensure M and k are integers, so we don't get any surprise casting
    # results in the expressions `M-k` and `M+1` used below.  This avoids
    # a problem with inputs with type (for example) np.uint64.
    M = operator.index(M)
    k = operator.index(k)
    if k >= 0:
        i = k
    else:
        i = (-k) * M
    m[:M-k].flat[i::M+1] = 1
    return m
np.lookfor("convolution") # searches the numpy documentation for a keyword
Search results for 'convolution'
--------------------------------
numpy.convolve
    Returns the discrete, linear convolution of two one-dimensional sequences.
numpy.ma.convolve
    Returns the discrete, linear convolution of two one-dimensional sequences.
numpy.polymul
    Find the product of two polynomials.
numpy.bartlett
    Return the Bartlett window.
numpy.correlate
    Cross-correlation of two 1-dimensional sequences.
numpy.vectorize
    vectorize(pyfunc=np._NoValue, otypes=None, doc=None, excluded=None,

Important differences to Matlab#

  • array operations are by element by default

  • indexing starts at zero

  • pass by reference (numpy) compared to pass by value (Matlab)

  • Matlab arrays are double by default, numpy arrays have different dtypes

  • the & and | operators are bitwise “AND” and “OR” operators (not logical, use np.logical_and and np.logical_or)

Exercises#

  • Construct a 8x8 array of ones and make the 4th row 4

  • Create an array that repeats np.arange(5) 6-times (hint: use the lookfor function)

  • Generate a 3D array of shape 3x2x4 and make all elements 1, multiply all dimensions with the [[1,2,3],[4,5,6],[7,8,9],[10,11,12]] (don’t use np.array to generate this array) array (hint: use np.newaxis)