Rare Gems: Finding Lottery Tickets at Initialization
Abstract: Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming “train, prune, re-train” approach. Frankle & Carbin (2019) conjectured that we can avoid this by training lottery tickets, i.e., special sparse subnetworks found at initialization that can be trained to high accuracy. However, a subsequent line of work presents concrete evidence that current algorithms for finding trainable networks at initialization fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to higher accuracy than simple baselines remains an open problem. In this work, we resolve this open problem by discovering Rare Gems: sparse, trainable networks at initialization that achieve high accuracy even before training. When Rare Gems are trained with SGD, they achieve accuracy competitive with or better than that of Iterative Magnitude Pruning (IMP) with warmup.
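The two reference points named above, IMP and random sparse subnetworks, can be illustrated with a minimal sketch. This is not the Rare Gems method, which the abstract does not describe. It assumes flat weight lists, a hypothetical `train(init_weights, mask)` callable that trains the masked network from a fixed starting point and returns its trained weights, and that "warmup" means passing in weights taken after a short initial training phase. The helper names `magnitude_mask`, `random_mask`, and `imp` are illustrative only.

```python
import random


def magnitude_mask(weights, sparsity):
    """One-shot magnitude pruning: zero out the `sparsity` fraction of smallest |w|."""
    n_prune = round(sparsity * len(weights))
    # Indices ordered by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def random_mask(n, sparsity, seed=0):
    """Baseline: prune the same number of weights, chosen uniformly at random."""
    n_prune = round(sparsity * n)
    rng = random.Random(seed)
    pruned = set(rng.sample(range(n), n_prune))
    return [0 if i in pruned else 1 for i in range(n)]


def imp(init_weights, train, final_sparsity, rounds):
    """Hypothetical IMP loop: train, prune surviving weights by magnitude, rewind, repeat.

    `init_weights` is the rewind point (e.g. weights after warmup, an assumption);
    `train` is a hypothetical callable returning trained weights for a given mask.
    """
    n = len(init_weights)
    mask = [1] * n
    # Keep the same fraction of surviving weights each round so that the
    # final density equals (1 - final_sparsity).
    keep_per_round = (1.0 - final_sparsity) ** (1.0 / rounds)
    for r in range(1, rounds + 1):
        trained = train(init_weights, mask)
        target_kept = round(n * keep_per_round ** r)
        # Rank only the surviving weights by their trained magnitude.
        alive = [i for i in range(n) if mask[i]]
        alive.sort(key=lambda i: abs(trained[i]), reverse=True)
        keep = set(alive[:target_kept])
        mask = [1 if i in keep else 0 for i in range(n)]
    return mask


if __name__ == "__main__":
    w = [0.1, -0.5, 0.3, 2.0, -1.0, 0.05, 0.7, -0.2]
    # Toy "training" that just applies the mask, so pruning follows initial magnitudes.
    toy_train = lambda weights, m: [x * mi for x, mi in zip(weights, m)]
    print(magnitude_mask(w, 0.5))
    print(random_mask(len(w), 0.5))
    print(imp(w, toy_train, final_sparsity=0.75, rounds=2))
```

A fair comparison in this spirit trains the masks from `magnitude_mask`, `random_mask`, and `imp` at the same sparsity and compares their accuracy. A method for finding lottery tickets at initialization is only useful if it clearly beats the random baseline.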