Rethinking Deep Learning Compression Using Information Theoretic Structures
Neural compression has brought tremendous progress in designing lossy compressors with good rate-distortion (RD) performance at low complexity. Thus far, neural compression designs transform the source to a latent vector, which is then rounded to integers and entropy coded. While this approach has been shown to be optimal in a one-shot sense on certain sources, we show that it is highly suboptimal in general: on independent and identically distributed (IID) source sequences, it may do no better than scalar quantization of the original source. We demonstrate that the suboptimality is due to the choice of quantization scheme in the latent space, and not the transform design. By employing lattice quantization instead of scalar quantization in the latent space, we propose Lattice Transform Coding (LTC) and demonstrate that it is able to approximately recover optimal vector quantization at various dimensions.
Nonlinear Transform Coding (NTC), the standard paradigm in lossy neural compression, operates on a source realization x by first transforming it to a latent vector y via an analysis transform g_a. The latent vector is then scalar quantized by Q and entropy coded. To decode, a synthesis transform g_s maps the quantized latent to the reconstruction x̂. One theory for why NTC performs well is that, while the source x is high-dimensional, it typically has an intrinsic low-dimensional latent representation, which we refer to as the latent source. For some theoretical sources with latent dimension one, prior work shows that NTC can (i) successfully recover the latent source, (ii) optimally quantize it, and (iii) map back to the original space, providing an optimal one-shot coding scheme.
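To make the pipeline concrete, the following is a minimal, illustrative sketch of an NTC-style model. The transform architectures, layer sizes, and the straight-through handling of rounding are assumptions made for the example, not the specific design evaluated in this paper.

```python
# Minimal sketch of the NTC pipeline: analysis transform g_a, coordinate-wise
# rounding of the latent, and synthesis transform g_s. Illustrative only; the
# actual transforms and entropy model may differ.
import torch
import torch.nn as nn

class NTC(nn.Module):
    def __init__(self, source_dim=8, latent_dim=8, hidden=64):
        super().__init__()
        # Analysis transform g_a: source x -> latent y (assumed MLP)
        self.g_a = nn.Sequential(
            nn.Linear(source_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Synthesis transform g_s: quantized latent -> reconstruction x_hat
        self.g_s = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, source_dim),
        )

    def forward(self, x):
        y = self.g_a(x)
        # Scalar (integer) quantization of each latent coordinate, written with
        # a straight-through trick so gradients pass through the rounding;
        # additive uniform noise is another common training-time proxy.
        y_hat = y + (torch.round(y) - y).detach()
        x_hat = self.g_s(y_hat)
        return x_hat, y_hat

# Toy usage on a batch of 8-dimensional source vectors.
x = torch.randn(16, 8)
model = NTC()
x_hat, y_hat = model(x)
distortion = torch.mean((x - x_hat) ** 2)
```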
On the other hand, experimental work has shown suboptimality of NTC on real-world sources. Information-theoretically, optimal performance can in general be achieved through vector quantization (VQ), a classical technique that has recently been investigated for neural compression as well. However, VQ requires computational complexity that is exponential in the rate and dimension: a codebook at rate R bits per dimension in n dimensions contains 2^(nR) entries, e.g., over four billion entries already at R = 2 and n = 16. We therefore seek a near-optimal method that maintains low complexity, and propose to perform lattice quantization in the latent space. We refer to the proposed architecture as Lattice Transform Coding (LTC). Our contributions are as follows.
- We first demonstrate the inability of NTC to optimally compress IID sequences. We show that this is due to the choice of scalar quantization in the latent space.
- We propose Lattice Transform Coding (LTC), and show that it is able to optimally compress IID sequences, yet still maintain reasonable complexity. We discuss various design choices in the transform design as well as entropy modeling that are required to recover optimality.
- We demonstrate LTC on IID blocks of vector sources, using lattice quantization in the latent space, and show that LTC approaches the rate-distortion function of the vector source. We additionally demonstrate LTC on more general sources, such as correlated vector sources, and on real-world data such as image patches and audio.
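As proposed above, LTC replaces coordinate-wise rounding in the latent space with quantization to the nearest point of a lattice. Below is a minimal, illustrative sketch of such a quantizer for the D_n lattice (integer vectors with even coordinate sum), using the standard Conway-Sloane nearest-point rule; the lattice family, dimension, and scaling actually used in LTC are design choices of the method and are not specified by this sketch.

```python
# Nearest-neighbor quantization onto the D_n lattice: round each coordinate,
# then, if the coordinate sum is odd, re-round the worst coordinate the other
# way. Illustrative only.
import numpy as np

def quantize_Dn(y):
    """Return the nearest point of the D_n lattice to the vector y."""
    f = np.round(y)                       # coordinate-wise rounding (Z^n point)
    if int(np.sum(f)) % 2 == 0:
        return f                          # already in D_n
    # Re-round the coordinate with the largest rounding error in the opposite
    # direction, which yields the nearest D_n point.
    k = np.argmax(np.abs(y - f))
    g = f.copy()
    g[k] += 1.0 if y[k] > f[k] else -1.0
    return g

# Toy usage: quantize a latent vector with D_4 instead of plain rounding.
y = np.array([0.4, 1.6, 0.2, 0.9])
print(quantize_Dn(y))                     # [1. 2. 0. 1.]
```

Because decoding to the nearest D_n point only requires rounding and a single correction, the per-vector complexity stays linear in the dimension, in contrast to the exponential codebook search of unstructured VQ.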