Product Quantization
Product quantization (PQ) is a lossy vector compression technique that splits each vector into subvectors and encodes each with a small learned codebook. It shrinks high-dimensional embeddings by 90–97% while still supporting approximate distance computation in the compressed domain, making it a backbone of large-scale ANN indexes.
Product quantization (PQ) is a lossy compression scheme for high-dimensional vectors introduced by Hervé Jégou, Matthijs Douze, and Cordelia Schmid in 2011. It enables approximate nearest neighbor search over billion-scale datasets that would not fit in memory in their raw form, and is one of the core algorithms in FAISS. A d-dimensional vector is split into m equal-length subvectors. Each subspace is clustered independently with k-means, typically using 256 centroids so each subvector is encoded with a single byte. A full vector is then represented by m bytes — for example, a 1024-dimensional float32 vector of 4 KB compresses to 64 bytes at m=64, a 64x reduction. Decoding looks up centroids per subspace and concatenates them; reconstruction error depends on m and the number of centroids. PQ supports two fast distance computations on compressed codes. Symmetric distance compares two codes via a precomputed centroid distance table. Asymmetric distance keeps the query uncompressed and sums per-subspace distances from the query subvectors to centroids, giving better accuracy and dominating in practice. PQ is usually combined with an inverted-file index (IVFPQ) that first prunes to a few coarse clusters, or stacked under a graph index like HNSW. Extensions include optimized PQ (OPQ), which rotates the space to balance subspace variance, and residual PQ, which encodes the quantization error of a coarse index.