# Notation and Conventions

This book uses consistent notation throughout to aid readability and comprehension.
## General Mathematical Notation
| Symbol | Meaning |
|---|---|
| $a, b, c$ | Scalars (lowercase italic) |
| $n, m, d$ | Integer scalars (dimensions, indices) |
| $\vx, \vy, \vz$ | Vectors (lowercase bold) |
| $\mA, \mB, \mC$ | Matrices (uppercase bold) |
| $\mathcal{X}, \mathcal{D}$ | Sets (uppercase calligraphic) |
| $f, g, h$ | Functions (lowercase italic) |
| $\R, \N, \Z, \C$ | Number sets (blackboard bold) |
## Linear Algebra
| Symbol | Meaning |
|---|---|
| $\vx \in \R^n$ | Vector $\vx$ with $n$ components |
| $\mA \in \R^{m \times n}$ | Matrix $\mA$ with $m$ rows and $n$ columns |
| $a_{i,j}$ or $[\mA]_{i,j}$ | Element in row $i$, column $j$ of matrix $\mA$ |
| $\mA\transpose$ | Transpose of matrix $\mA$ |
| $\mA^{-1}$ | Inverse of matrix $\mA$ |
| $\mA \mB$ | Matrix multiplication |
| $\mA \odot \mB$ | Element-wise (Hadamard) product |
| $\vx\transpose \vy$ | Dot product of vectors $\vx$ and $\vy$ |
| $\norm{\vx}_2$ | Euclidean (L2) norm |
| $\norm{\vx}_1$ | L1 norm |
| $\norm{\mA}_F$ | Frobenius norm of matrix $\mA$ |
| $\text{tr}(\mA)$ | Trace of matrix $\mA$ |
| $\det(\mA)$ | Determinant of matrix $\mA$ |
| $\mI$ or $\mI_n$ | Identity matrix ($n \times n$ when subscripted) |
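The linear-algebra operations above map directly onto array-library primitives. As a quick illustration (NumPy is our choice here; the notation itself is framework-agnostic):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

At = A.T                        # A^T: transpose
AB = A @ B                      # matrix multiplication A B
hadamard = A * B                # A ⊙ B: element-wise (Hadamard) product
dot = x @ y                     # x^T y: dot product
l2 = np.linalg.norm(x, 2)       # ||x||_2: Euclidean norm
l1 = np.linalg.norm(x, 1)       # ||x||_1
fro = np.linalg.norm(A, "fro")  # ||A||_F: Frobenius norm
trace = np.trace(A)             # tr(A): sum of diagonal entries
I = np.eye(2)                   # identity matrix I_2

assert np.allclose(A @ I, A)    # multiplying by I leaves A unchanged
assert np.isclose(dot, 11.0)    # 1*3 + 2*4 = 11
```

Note that `A * B` is the element-wise product, not matrix multiplication; the `@` operator performs the latter.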
## Deep Learning Specific
| Symbol | Meaning |
|---|---|
| $\vx^{(i)}$ | $i$-th training example |
| $\vx_t$ | Input at time step $t$ |
| $\vh^{(\ell)}$ | Hidden state at layer $\ell$ |
| $\mW^{(\ell)}$ | Weight matrix at layer $\ell$ |
| $\vb^{(\ell)}$ | Bias vector at layer $\ell$ |
| $\sigma(\cdot)$ | Activation function (generic) |
| $\text{ReLU}(x)$ | Rectified Linear Unit: $\max(0, x)$ |
| $\text{softmax}(\vx)$ | Softmax function |
| $N$ or $B$ | Batch size |
| $d_{\text{model}}$ | Model dimension |
| $d_k, d_v$ | Dimension of keys and values |
| $h$ | Number of attention heads |
| $L$ | Number of layers |
| $V$ | Vocabulary size |
| $n$ or $T$ | Sequence length |
| $\eta$ | Learning rate |
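The two named activation functions can be sketched in a few lines of NumPy. This is an illustrative sketch, not the book's reference implementation; the max-shift in `softmax` is the standard numerical-stability trick:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def softmax(x):
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j).

    Shifting by max(x) leaves the result unchanged (the factor
    exp(-max) cancels in numerator and denominator) but prevents
    overflow for large inputs.
    """
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

v = np.array([1.0, -2.0, 3.0])
p = softmax(v)
assert np.isclose(p.sum(), 1.0)   # softmax outputs a probability vector
assert np.all(relu(v) >= 0.0)     # ReLU clips negatives to zero
```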
## Dimension Conventions
Throughout this book, we explicitly annotate dimensions:
- For $\mW \in \R^{m \times n}$: $m$ rows, $n$ columns
- Batch dimensions listed first: $\mX \in \R^{B \times n \times d}$
- Superscripts for layer indices: $\vh^{(\ell)}$
- Subscripts for time/position indices: $\vx_t$
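The batch-first shape convention can be checked concretely. In this sketch the sizes `B`, `n`, `d`, `m` are arbitrary illustrative values, not from the book:

```python
import numpy as np

# Batch dimension first, then sequence length, then feature dimension.
B, n, d, m = 4, 10, 16, 8

X = np.zeros((B, n, d))    # X in R^{B x n x d}: a batch of B sequences
W = np.random.randn(d, m)  # W in R^{d x m}: d rows, m columns

# Matrix multiplication acts on the trailing axis, so every position
# in every sequence is mapped from d features to m features.
H = X @ W
assert H.shape == (B, n, m)
```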