NumPy (Numerical Python): a library for scientific computing in Python

Dilip Kumar
8 min read · Jan 17, 2025


Chapter 1: Introduction to NumPy

1. Introduction

1.1 What is NumPy?

  • NumPy (Numerical Python) is a fundamental library for scientific computing in Python.
  • It provides support for multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
  • It is the foundation for many other libraries in the Python data science ecosystem, such as Pandas, SciPy, Scikit-learn, and TensorFlow.

1.2 Installing and Importing NumPy

Installation: If you don’t have NumPy installed, you can install it using pip:

pip install numpy
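Importing: once the package is installed, it is imported under the conventional alias np, which the rest of this post relies on. A minimal sketch, assuming the installation above succeeded:

```python
import numpy as np  # "np" is the conventional alias

# Print the installed version to confirm the import worked
print(np.__version__)
```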

2. NumPy Arrays

2.1 Creating Arrays

NumPy arrays are the core data structure in NumPy. Here’s how you can create them:

  • From a Python list:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr) # [1 2 3 4 5]
  • np.zeros(): Creates an array filled with zeros.
zeros_arr = np.zeros(5)  # 1D array with 5 zeros
print(zeros_arr) # [0. 0. 0. 0. 0.]
  • np.ones(): Creates an array filled with ones.
ones_arr = np.ones((3, 3))  # 2D array (3x3) with ones
print(ones_arr)
# Output
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]
  • np.arange(): Creates an array with evenly spaced values within a range.
range_arr = np.arange(0, 10, 2)  # Start, Stop, Step
print(range_arr) # [0 2 4 6 8]
  • np.linspace(): Creates an array with a specified number of evenly spaced values.
linspace_arr = np.linspace(0, 1, 5)  # Start, Stop, Number of points
print(linspace_arr) # [0. 0.25 0.5 0.75 1. ]

2.2 Array Attributes

NumPy arrays have several attributes that provide useful information:

  • shape: Returns the dimensions of the array.
  • dtype: Returns the data type of the array elements.
  • size: Returns the total number of elements in the array.
  • ndim: Returns the number of dimensions (axes) of the array.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape) # (2, 3)
print("Data type:", arr.dtype) # int64 on most platforms (int32 on Windows with NumPy 1.x)
print("Size:", arr.size) # 6
print("Number of dimensions:", arr.ndim) # 2

3. Array Indexing and Slicing

Indexing: Accessing individual elements of an array.

arr = np.array([1, 2, 3, 4, 5])
print(arr[0]) # First element: 1
print(arr[-1]) # Last element: 5

Slicing: Accessing a subset of an array.

print(arr[1:4])  # Elements from index 1 to 3: [2, 3, 4]
print(arr[:3]) # Elements from start to index 2: [1, 2, 3]
print(arr[::2]) # Every second element: [1, 3, 5]

Boolean Indexing: Filtering elements using a boolean condition.

arr = np.array([1, 2, 3, 4, 5])
print(arr[arr > 3]) # Elements greater than 3: [4, 5]
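The same indexing ideas carry over to 2D arrays, where one index or slice is given per axis. The sketch below uses a small made-up 3x3 array named grid (not part of the examples above) to show element access, row and column selection, and a combined boolean condition.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(grid[1, 2])      # row 1, column 2: 6
print(grid[0])         # first row: [1 2 3]
print(grid[:, 1])      # second column: [2 5 8]
print(grid[:2, 1:])    # top-right 2x2 block

# Conditions combine with & and |, each wrapped in parentheses
print(grid[(grid > 2) & (grid % 2 == 0)])  # [4 6 8]
```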

4. Basic Operations

Arithmetic Operations: Element-wise addition, subtraction, multiplication, and division.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5, 7, 9]
print(a * b) # [4, 10, 18]

Aggregation Functions: Functions like sum, mean, min, max, etc.

arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr)) # 15
print(np.mean(arr, axis=0)) # 3.0 (axis=0 is the only axis of a 1D array)
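The axis argument only becomes interesting with two or more dimensions. As a small sketch (the 2x3 array named matrix is made up for illustration), axis=0 collapses the rows and yields one value per column, while axis=1 collapses the columns and yields one value per row:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_means = np.mean(matrix, axis=0)  # collapse rows: one mean per column
row_means = np.mean(matrix, axis=1)  # collapse columns: one mean per row

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
print(matrix.min(), matrix.max())  # 1 6
```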

Chapter 2: Intermediate NumPy

2.1 Broadcasting

What is Broadcasting?

  • Broadcasting allows NumPy to perform arithmetic operations on arrays of different shapes.
  • It automatically “stretches” smaller arrays to match the shape of larger arrays, without actually copying data.

Rules of Broadcasting:

  1. If the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on its left side.
  2. If the sizes differ in a dimension and one of them is 1, the array with size 1 in that dimension is stretched to match the other.
  3. If the sizes differ in a dimension and neither is 1, an error is raised.

Example:

a = np.array([1, 2, 3])  # Shape: (3,)
b = np.array([[10], [20]]) # Shape: (2, 1)
print(a + b)
# Output
# [[11 12 13]
#  [21 22 23]]

Here, a is first treated as shape (1, 3); then both a and b are stretched to shape (2, 3) before the addition.
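The rules can also be checked directly. The sketch below assumes NumPy 1.20 or later for np.broadcast_shapes; it shows a compatible pair of shapes, a pair that fails rule 3, and one way to repair the failing case by adding an axis.

```python
import numpy as np

# Result shapes can be checked without allocating any data
# (np.broadcast_shapes is available in NumPy 1.20 and later)
print(np.broadcast_shapes((3,), (2, 1)))  # (2, 3)

# Rule 3: trailing sizes 3 and 2 differ and neither is 1
a = np.ones((2, 3))
b = np.ones((2,))
try:
    a + b
except ValueError as err:
    print("Broadcast failed:", err)

# Adding an axis makes b's shape (2, 1), which is compatible with (2, 3)
print((a + b[:, np.newaxis]).shape)  # (2, 3)
```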

2.2 Advanced Indexing

Integer Array Indexing:

Use integer arrays to index into another array.

arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])
print(arr[indices]) # [10, 30, 50]

Fancy Indexing:

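This is the same mechanism, often called fancy indexing, applied to a 2D array. With one index array per axis, the arrays are paired element by element, so the example below selects arr[0, 1] and arr[2, 0].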

arr = np.array([[1, 2], [3, 4], [5, 6]])
rows = np.array([0, 2])
cols = np.array([1, 0])
print(arr[rows, cols]) # [2, 5]

2.3 Universal Functions (ufuncs)

What are ufuncs?

  • Universal functions are functions that operate element-wise on arrays.
  • They are highly optimized and written in C, making them very fast.

Common ufuncs:

  • Mathematical functions: np.sin, np.cos, np.exp, np.log, etc.
arr = np.array([0, np.pi/2, np.pi])
print(np.sin(arr)) # approximately [0, 1, 0]; sin(pi) prints as about 1.22e-16, not exactly 0
  • Comparison functions: np.greater, np.less, np.equal, etc.
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
print(np.greater(a, b)) # [False, False, True]
  • Custom ufuncs:

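You can wrap your own Python function with np.frompyfunc (which returns a true ufunc that produces object arrays) or np.vectorize (a convenience wrapper, not a true ufunc). Both still call the Python function once per element, so they do not run at C speed.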
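A minimal sketch of both wrappers follows. The scalar function clip_add is hypothetical and exists only for this illustration. The otypes argument of np.vectorize is used here to pin the output dtype.

```python
import numpy as np

def clip_add(x, y):
    """Hypothetical scalar function: add two numbers, capping the sum at 10."""
    return min(x + y, 10)

# frompyfunc(func, nin, nout) returns a real ufunc, but its results
# always have dtype=object
clip_add_ufunc = np.frompyfunc(clip_add, 2, 1)
obj_result = clip_add_ufunc(np.array([1, 5, 9]), np.array([2, 5, 9]))
print(obj_result.dtype)  # object

# np.vectorize is a convenience wrapper; otypes fixes the output dtype
clip_add_vec = np.vectorize(clip_add, otypes=[np.int64])
int_result = clip_add_vec(np.array([1, 5, 9]), np.array([2, 5, 9]))
print(int_result)  # [ 3 10 10]
```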

2.4 Matrix Operations

Matrix Multiplication:

Use np.dot or the @ operator for matrix multiplication.

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b)) # or a @ b
# Output
# [[19 22]
#  [43 50]]

Matrix Inversion:

Use np.linalg.inv to compute the inverse of a matrix.

a = np.array([[1, 2], [3, 4]])
inv_a = np.linalg.inv(a)
print(inv_a)
# Output
# [[-2.   1. ]
#  [ 1.5 -0.5]]

Determinant:

Use np.linalg.det to compute the determinant of a matrix.

det_a = np.linalg.det(a)
print(det_a) # approximately -2.0 (floating-point rounding may show -2.0000000000000004)

Eigenvalues and Eigenvectors:

Use np.linalg.eig to compute eigenvalues and eigenvectors.

eigenvalues, eigenvectors = np.linalg.eig(a)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)
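One way to sanity-check the result is to confirm the defining relation A v = λ v for each pair. This sketch reuses the matrix a from above and relies on the fact that np.linalg.eig stores the eigenvectors as columns.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(a)

# Column i of `eigenvectors` pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    # a @ v should equal lambda * v up to floating-point error
    print(np.allclose(a @ v, eigenvalues[i] * v))  # True
```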

2.5 Random Module

Generating Random Numbers:

  • np.random.rand: Uniform distribution over [0, 1).
print(np.random.rand(3))  # [0.123, 0.456, 0.789]
  • np.random.randn: Standard normal distribution (mean=0, variance=1).
print(np.random.randn(3))  # [-0.123, 0.456, -0.789]
  • np.random.randint: Random integers within a range.
print(np.random.randint(0, 10, size=5))  # [3, 7, 2, 8, 1]

Random Sampling:

  • np.random.choice: Randomly sample from a given array.
arr = np.array([1, 2, 3, 4, 5])
print(np.random.choice(arr, size=3)) # [2, 5, 1]
  • np.random.shuffle: Shuffle an array in place.
np.random.shuffle(arr)
print(arr) # [3, 1, 5, 2, 4]

Setting Random Seeds:

Use np.random.seed to ensure reproducibility.

np.random.seed(42)
print(np.random.rand(3)) # [0.3745, 0.9507, 0.7320]
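The functions above belong to NumPy's legacy random interface. NumPy also provides a newer Generator interface through np.random.default_rng, which keeps the seed inside an object instead of in global state. A rough equivalent of the calls above, with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator, independent of np.random.seed

uniform = rng.random(3)                  # floats in [0, 1)
normal = rng.standard_normal(3)          # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=5)   # integers in [0, 10)
sample = rng.choice(np.array([1, 2, 3, 4, 5]), size=3, replace=False)

# Re-creating the generator with the same seed repeats the same sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(3)))  # True
```

The values differ from those produced by np.random.seed(42) followed by np.random.rand, because the two interfaces use different underlying generators.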

Chapter 3: Advanced NumPy

3.1 Structured Arrays

What are Structured Arrays?

  • Structured arrays allow you to store heterogeneous data (e.g., integers, floats, strings) in a single NumPy array.
  • Each element of the array is a structure (similar to a row in a table).

Creating Structured Arrays:

  • Define a custom data type using dtype.
data = np.array([(1, 2.5, 'Hello'), (2, 3.7, 'World')],
                dtype=[('id', 'i4'), ('value', 'f4'), ('label', 'U10')])
print(data)
# Output
# [(1, 2.5, 'Hello') (2, 3.7, 'World')]

Accessing Fields:

  • Use field names to access specific columns.
print(data['id'])    # [1, 2]
print(data['value']) # [2.5, 3.7]
print(data['label']) # ['Hello', 'World']

3.2 Memory Management

Views vs Copies:

  • A view is a new array object that references the same data as the original array.
  • A copy is a new array object with its own copy of the data.
arr = np.array([1, 2, 3, 4])
view = arr[1:3] # View (references original data)
copy = arr[1:3].copy() # Copy (new data)
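The practical difference shows up when you write to the result. In this sketch, writing through the view changes the original array, while writing to the copy does not. np.shares_memory and the base attribute are two ways to check which one you have.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
view = arr[1:3]          # basic slicing returns a view
copy = arr[1:3].copy()   # explicit copy owns its data

view[0] = 99             # writes through to arr
copy[1] = -1             # leaves arr untouched

print(arr)                          # [ 1 99  3  4]
print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True
```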

Memory Layout:

  • NumPy arrays can be stored in C-order (row-major) or F-order (column-major).
  • Use np.ascontiguousarray or np.asfortranarray to control memory layout.
arr = np.array([[1, 2], [3, 4]], order='C')  # C-order (default)
print(arr.flags['C_CONTIGUOUS']) # True
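As a short sketch of the conversion functions named above (the array names are illustrative), the flags attribute reports the layout, and strides shows how many bytes separate neighbouring elements along each axis:

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]])   # C-order by default
f_arr = np.asfortranarray(c_arr)            # same values, column-major copy

print(f_arr.flags['F_CONTIGUOUS'])  # True
print(c_arr.strides, f_arr.strides)  # byte steps per axis; values depend on dtype

# The transpose of a C-ordered array is a view that is F-contiguous
print(c_arr.T.flags['F_CONTIGUOUS'])  # True

# ascontiguousarray converts back to row-major
back = np.ascontiguousarray(f_arr)
print(back.flags['C_CONTIGUOUS'])  # True
```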

3.3 Performance Optimization

Vectorization

  • Replace explicit loops with vectorized operations for better performance.
# Non-vectorized (slow)
result = []
for i in range(1000):
    result.append(i * 2)

# Vectorized (fast)
result = np.arange(1000) * 2

Using np.vectorize:

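  • Convert a Python scalar function into one that accepts arrays. np.vectorize is provided for convenience, not speed: it still loops in Python under the hood.
def my_func(x):
    return x ** 2 + 1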

vectorized_func = np.vectorize(my_func)
print(vectorized_func(np.array([1, 2, 3]))) # [2, 5, 10]

Profiling NumPy Code:

  • Use tools like timeit or cProfile to measure performance.
import timeit
setup = "import numpy as np; arr = np.random.rand(1000)"
print(timeit.timeit("arr * 2", setup=setup, number=1000))

3.4 Linear Algebra

Solving Linear Equations:

  • Use np.linalg.solve to solve systems of linear equations.
A = np.array([[3, 2], [1, 4]])
b = np.array([8, 9])
x = np.linalg.solve(A, b)
print(x) # approximately [1.4 1.9]

Singular Value Decomposition (SVD):

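  • Decompose a matrix into three factors: U, Σ, and Vᵀ. np.linalg.svd returns the singular values as a 1-D array and the third factor already transposed.
A = np.array([[1, 2], [3, 4], [5, 6]])
U, S, Vh = np.linalg.svd(A)
print("U:", U)
print("S:", S)
print("Vh:", Vh)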
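A quick way to confirm the decomposition is to multiply the factors back together. This sketch assumes the reduced form (full_matrices=False) so the shapes line up without padding. The third factor is named Vh here because np.linalg.svd returns it already transposed.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# full_matrices=False gives the reduced SVD: U is (3, 2), S is (2,), Vh is (2, 2)
U, S, Vh = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vh.shape)  # (3, 2) (2,) (2, 2)

# S holds the singular values as a 1-D array, so rebuild the diagonal matrix
A_rebuilt = U @ np.diag(S) @ Vh
print(np.allclose(A, A_rebuilt))  # True
```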

QR Decomposition:

  • Decompose a matrix into a matrix Q with orthonormal columns and an upper triangular matrix R.
Q, R = np.linalg.qr(A)
print("Q:", Q)
print("R:", R)
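The properties of the two factors can be verified numerically. A small sketch, using the same 3x2 matrix A as the SVD example and the default mode of np.linalg.qr:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
Q, R = np.linalg.qr(A)   # default "reduced" mode: Q is (3, 2), R is (2, 2)

print(np.allclose(Q @ R, A))            # True: the factors reproduce A
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))       # True: R is upper triangular
```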

3.5 Masked Arrays

What are Masked Arrays?

  • Masked arrays are arrays that have a mask to indicate missing or invalid data.
  • Useful for handling incomplete datasets.

Creating Masked Arrays:

  • Use np.ma.masked_array to create a masked array.
data = np.array([1, 2, 3, -999, 5])
masked_data = np.ma.masked_array(data, mask=[0, 0, 0, 1, 0])
print(masked_data) # [1, 2, 3, --, 5]

Operations on Masked Arrays:

  • Masked arrays support most NumPy operations while ignoring masked values.
print(masked_data.mean())  # 2.75 (ignores masked value)
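Writing the mask by hand does not scale to real datasets. As a sketch, np.ma.masked_equal builds the mask from a sentinel value (-999, as in the data above), and np.ma.masked_invalid does the same for NaN and inf. The readings array is made up for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, -999, 5])

# Mask every entry equal to the sentinel value
masked = np.ma.masked_equal(data, -999)
print(masked.sum())        # 11
print(masked.filled(0))    # [1 2 3 0 5]

# NaN and inf can be masked in one step
readings = np.array([1.0, np.nan, 3.0])
print(np.ma.masked_invalid(readings).mean())  # 2.0
```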

3.6 File I/O

Saving and Loading Arrays:

  • Use np.save and np.load to save and load arrays in .npy format.
arr = np.array([1, 2, 3])
np.save('my_array.npy', arr)
loaded_arr = np.load('my_array.npy')
print(loaded_arr)

Use np.savetxt and np.loadtxt for text files.

np.savetxt('my_array.txt', arr)
loaded_arr = np.loadtxt('my_array.txt')
print(loaded_arr)

Chapter 4: NumPy for Machine Learning

4.1 Data Preprocessing

Normalization and Standardization:

  • Normalization scales data to a range of [0, 1].
data = np.array([1, 2, 3, 4, 5])
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
print(normalized_data) # [0., 0.25, 0.5, 0.75, 1.]
  • Standardization scales data to have a mean of 0 and a standard deviation of 1.
standardized_data = (data - np.mean(data)) / np.std(data)
print(standardized_data)
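Real feature matrices usually have one column per feature, and each column is scaled separately. The sketch below assumes a made-up matrix X with samples in rows. Passing axis=0 gives one statistic per column, and broadcasting applies it to every row.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# axis=0 computes one statistic per column; broadcasting applies it to every row
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)
print(X_std.mean(axis=0))  # close to [0, 0]
print(X_std.std(axis=0))   # close to [1, 1]
```

A column with a single repeated value has zero range and zero standard deviation, so it would need special handling before dividing.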

Handling Missing Values:

Replace missing values (e.g., NaN) with a specific value or an aggregate (e.g., mean).

data = np.array([1, 2, np.nan, 4, 5])
data[np.isnan(data)] = np.nanmean(data) # Replace NaNs with mean
print(data) # [1., 2., 3., 4., 5.]

4.2 Feature Engineering

Creating Polynomial Features:

Generate polynomial features for regression tasks.

from numpy.polynomial.polynomial import polyvander
data = np.array([1, 2, 3])
poly_features = polyvander(data, 2) # Degree 2: columns are 1, x, x^2
print(poly_features)
# Output
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]

One-Hot Encoding:

Convert categorical data into binary vectors.

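categories = np.array(['red', 'blue', 'green'])
# np.unique sorts the labels (blue, green, red) and returns each item's index
labels, codes = np.unique(categories, return_inverse=True)
one_hot = np.eye(len(labels))[codes]
print(one_hot)
# Output (columns are blue, green, red)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]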

4.3 Distance Metrics

Euclidean Distance:

  • Compute the Euclidean distance between two vectors.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
distance = np.linalg.norm(a - b)
print(distance) # 5.196

Manhattan Distance:

  • Compute the Manhattan distance (sum of absolute differences).
distance = np.sum(np.abs(a - b))
print(distance) # 9

Cosine Similarity:

  • Compute the cosine of the angle between two vectors.
dot_product = np.dot(a, b)
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)
cosine_sim = dot_product / (norm_a * norm_b)
print(cosine_sim) # approximately 0.9746

4.4 Matrix Factorization

Principal Component Analysis (PCA):

Use SVD to perform PCA for dimensionality reduction.

data = np.array([[1, 2], [3, 4], [5, 6]])
mean = np.mean(data, axis=0)
centered_data = data - mean
U, S, Vt = np.linalg.svd(centered_data)
print("Principal Components:", Vt) # rows of Vt are the principal directions
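To actually reduce the dimensionality, the centered data is projected onto the leading directions. This is a sketch under the usual PCA conventions, not a full implementation. The third SVD factor is returned already transposed, so its rows are the principal directions, and the squared singular values are proportional to the variance along each one.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
centered = data - data.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance captured by each component
explained_ratio = S**2 / np.sum(S**2)
print(explained_ratio)

# Project onto the first component to go from 2 features to 1
projected = centered @ Vt[0]
print(projected.shape)  # (3,)
```

Because the three sample points lie on a straight line, the first component captures essentially all of the variance here.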

4.5 Gradient Calculations

Computing Gradients:

  • Use NumPy to compute gradients for optimization (e.g., in gradient descent).
def loss_function(x):
    return x ** 2 + 3 * x + 2

def gradient(x):
    # Analytic derivative of loss_function
    return 2 * x + 3

x = 2.0
print("Loss:", loss_function(x))
print("Gradient:", gradient(x))
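The two functions can be tied together in a short gradient descent loop. This is a minimal sketch: learning_rate, the step count, and the finite-difference step h are illustrative choices, not tuned values. The central-difference check at the start is a common way to confirm that a hand-written gradient is correct.

```python
import numpy as np

def loss_function(x):
    return x ** 2 + 3 * x + 2

def gradient(x):
    return 2 * x + 3

# Central-difference check of the analytic gradient at x = 2.0
h = 1e-5
numeric = (loss_function(2.0 + h) - loss_function(2.0 - h)) / (2 * h)
print(np.isclose(numeric, gradient(2.0)))  # True

# Plain gradient descent; learning_rate and the step count are illustrative choices
x = 2.0
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * gradient(x)

print(x)  # approaches the minimizer x = -1.5
```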

4.6 Simulating Data

Generating Synthetic Datasets:

  • Create synthetic datasets for testing machine learning models.
# Linear dataset with noise
X = np.linspace(0, 10, 100)
y = 2 * X + 3 + np.random.normal(0, 1, 100)
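A synthetic dataset is useful because the true parameters are known. As a sketch, a degree-1 least-squares fit with np.polyfit should recover values near the slope 2 and intercept 3 used to generate the data. The seeded generator is an assumption added here so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for repeatability
X = np.linspace(0, 10, 100)
y = 2 * X + 3 + rng.normal(0, 1, 100)

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(X, y, 1)
print(slope, intercept)  # should land near 2 and 3
```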

Chapter 5: Integration with Machine Learning Libraries

NumPy and Pandas:

  • Convert between NumPy arrays and Pandas DataFrames.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
numpy_array = df.to_numpy()
print(numpy_array)

NumPy and Scikit-Learn:

Use NumPy arrays as input to Scikit-Learn models.

from sklearn.linear_model import LinearRegression
X = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])
model = LinearRegression()
model.fit(X, y)

NumPy and TensorFlow/PyTorch:

Convert between NumPy arrays and TensorFlow/PyTorch tensors. The example below uses TensorFlow; PyTorch offers torch.from_numpy for the same purpose.

import tensorflow as tf
numpy_array = np.array([1, 2, 3])
tensor = tf.convert_to_tensor(numpy_array)
print(tensor)

This post is based on interaction with https://chat.deepseek.com.

Happy learning :-)

Written by Dilip Kumar

With 18+ years of experience as a software engineer. Enjoy teaching, writing, leading team. Last 4+ years, working at Google as a backend Software Engineer.
