# Notation and Conventions

This book adopts consistent notation throughout to enhance readability and comprehension.

## General Mathematical Notation

| Symbol | Meaning |
|--------|---------|
| $a, b, c$ | Scalars (lowercase italic) |
| $n, m, d$ | Integer scalars (dimensions, indices) |
| $\vx, \vy, \vz$ | Vectors (lowercase bold) |
| $\mA, \mB, \mC$ | Matrices (uppercase bold) |
| $\mathcal{X}, \mathcal{D}$ | Sets (uppercase calligraphic) |
| $f, g, h$ | Functions (lowercase italic) |
| $\R, \N, \Z, \C$ | Number sets (blackboard bold) |

## Linear Algebra

| Symbol | Meaning |
|--------|---------|
| $\vx \in \R^n$ | Vector $\vx$ with $n$ components |
| $\mA \in \R^{m \times n}$ | Matrix $\mA$ with $m$ rows and $n$ columns |
| $a_{i,j}$ or $[\mA]_{i,j}$ | Element in row $i$, column $j$ of matrix $\mA$ |
| $\mA\transpose$ | Transpose of matrix $\mA$ |
| $\mA^{-1}$ | Inverse of matrix $\mA$ |
| $\mA \mB$ | Matrix multiplication |
| $\mA \odot \mB$ | Element-wise (Hadamard) product |
| $\vx\transpose \vy$ | Dot product of vectors $\vx$ and $\vy$ |
| $\norm{\vx}_2$ | Euclidean (L2) norm |
| $\norm{\vx}_1$ | L1 norm |
| $\norm{\mA}_F$ | Frobenius norm of matrix $\mA$ |
| $\text{tr}(\mA)$ | Trace of matrix $\mA$ |
| $\det(\mA)$ | Determinant of matrix $\mA$ |
| $\mI$ or $\mI_n$ | Identity matrix |
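As a concrete illustration of how these operations map onto code, here is a small NumPy sketch (NumPy is an assumption here; the book's actual code framework may differ):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # A ∈ R^{2×2}
B = np.array([[0.0, 1.0], [1.0, 0.0]])  # B ∈ R^{2×2}
x = np.array([1.0, 2.0])                # x ∈ R^2
y = np.array([3.0, 4.0])                # y ∈ R^2

A_T   = A.T                       # transpose A^T
A_inv = np.linalg.inv(A)          # inverse A^{-1}
AB    = A @ B                     # matrix multiplication AB
had   = A * B                     # Hadamard product A ⊙ B
dot   = x @ y                     # dot product x^T y = 11.0
l2    = np.linalg.norm(x)         # L2 norm, sqrt(5)
l1    = np.linalg.norm(x, 1)      # L1 norm, 3.0
fro   = np.linalg.norm(A, "fro")  # Frobenius norm
tr    = np.trace(A)               # trace, 1 + 4 = 5.0
det   = np.linalg.det(A)          # determinant, 1·4 − 2·3 = −2
I     = np.eye(2)                 # identity matrix I_2
```

Note that `A * B` in NumPy is element-wise, not matrix multiplication; the `@` operator performs the matrix product denoted $\mA \mB$.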

## Deep Learning Specific

| Symbol | Meaning |
|--------|---------|
| $\vx^{(i)}$ | $i$-th training example |
| $\vx_t$ | Input at time step $t$ |
| $\vh^{(\ell)}$ | Hidden state at layer $\ell$ |
| $\mW^{(\ell)}$ | Weight matrix at layer $\ell$ |
| $\vb^{(\ell)}$ | Bias vector at layer $\ell$ |
| $\sigma(\cdot)$ | Activation function (generic) |
| $\text{ReLU}(x)$ | Rectified Linear Unit: $\max(0, x)$ |
| $\text{softmax}(\vx)$ | Softmax function |
| $N$ or $B$ | Batch size |
| $d_{\text{model}}$ | Model dimension |
| $d_k, d_v$ | Dimension of keys and values |
| $h$ | Number of attention heads |
| $L$ | Number of layers |
| $V$ | Vocabulary size |
| $n$ or $T$ | Sequence length |
| $\eta$ | Learning rate |
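The two named functions above, ReLU and softmax, can be sketched in a few lines of NumPy (a sketch for illustration, not the book's reference implementation):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the max before exponentiating is a standard
    # numerical-stability trick; it does not change the result.
    z = np.exp(x - np.max(x))
    return z / z.sum()

r = relu(np.array([-1.0, 0.0, 2.0]))   # → array([0., 0., 2.])
s = softmax(np.array([1.0, 2.0, 3.0])) # positive entries summing to 1
```

Softmax maps an arbitrary real vector to a probability distribution, which is why it appears at the output of classifiers and inside attention.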

## Dimension Conventions

Throughout this book, we explicitly annotate the dimensions of vectors, matrices, and tensors.
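In code, this convention amounts to recording each array's shape alongside the operation that produces it. A minimal NumPy sketch, using the symbols from the tables above ($B$, $T$, $d_{\text{model}}$ are illustrative values chosen here):

```python
import numpy as np

B, T, d_model = 4, 16, 32  # batch size, sequence length, model dimension

X = np.random.randn(B, T, d_model)     # X:       (B, T, d_model)
W = np.random.randn(d_model, d_model)  # W^(l):   (d_model, d_model)
b = np.zeros(d_model)                  # b^(l):   (d_model,)

# One layer's hidden state h^(l) = ReLU(X W + b); broadcasting adds b
# to the last axis, so the shape (B, T, d_model) is preserved.
H = np.maximum(0.0, X @ W + b)
```

Tracking shapes this way catches dimension mismatches early, since most deep-learning bugs surface as a shape error at the first mis-annotated operation.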
