Venue: arXiv
Year: 2020
Paper: https://arxiv.org/abs/2006.07990
Abstract
The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width d and depth l by pruning a random one that is a factor O(d^4 l^2) wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are only a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width d and depth l can be approximated by pruning a random network that is a factor O(log(dl)) wider and twice as deep. Our analysis relies heavily on connecting the pruning of random ReLU networks to random instances of the SubsetSum problem. We then show that this logarithmic over-parameterization is essentially optimal for constant-depth networks. Finally, we verify several of our theoretical insights with experiments.
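
The SubsetSum connection can be illustrated numerically: the key step is that, given on the order of log(1/eps) i.i.d. Uniform[-1, 1] samples, with high probability some subset of them sums to within eps of any target in a bounded range, which is what allows a single target weight to be approximated by pruning a small block of random weights. The Python sketch below only sanity-checks that statement and is not code from the paper; the sample count n = 16, the target range, and the helper best_subset_sum_error are illustrative assumptions, and it brute-forces all 2^n subsets, which is feasible only for small n.

    import itertools
    import numpy as np

    def best_subset_sum_error(samples, target):
        # Smallest |sum(S) - target| over all subsets S of `samples`
        # (the empty subset gives |target| as the baseline).
        best = abs(target)
        for r in range(1, len(samples) + 1):
            for subset in itertools.combinations(samples, r):
                best = min(best, abs(sum(subset) - target))
        return best

    rng = np.random.default_rng(0)
    eps = 1e-3
    n = 16  # roughly C * log(1/eps) for a modest constant C (illustrative choice)
    targets = rng.uniform(-0.5, 0.5, size=10)

    for z in targets:
        samples = rng.uniform(-1.0, 1.0, size=n)
        err = best_subset_sum_error(samples, z)
        print(f"target {z:+.3f}  best subset-sum error {err:.2e}  (goal eps = {eps})")

If the logarithmic over-parameterization claim holds, errors below eps should already be typical at n = 16, since the number of random samples only needs to grow logarithmically in 1/eps.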
Additional information
Twitter thread: https://twitter.com/DimitrisPapail/status/1272723529222492168