On Pruning Large Language Models

Apr 1, 2023

This blog post describes the compression of BERT by pruning, in the context of the Lottery Ticket Hypothesis. Empirical evidence from fine-tuning on several tasks shows that 30-40% of the parameters in the BERT model can be discarded.
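To make the idea concrete, here is a minimal sketch of a single lottery-ticket-style pruning round in plain Python. It assumes one-shot global magnitude pruning: the smallest-magnitude weights across all layers are removed, and the surviving weights are then rewound to their values before fine-tuning. The experiments described here may instead prune iteratively or use a different criterion. The functions `global_magnitude_mask` and `rewind_to_init` and the toy weight dictionaries are hypothetical illustrations, not part of any library.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]
Mask = Dict[str, List[bool]]


def global_magnitude_mask(trained: Weights, sparsity: float) -> Mask:
    """Prune the `sparsity` fraction of weights with the smallest |w|, ranked globally across layers."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    # Flatten every weight into (magnitude, layer name, index) and sort by magnitude.
    flat = [(abs(w), name, i) for name, ws in trained.items() for i, w in enumerate(ws)]
    flat.sort(key=lambda t: t[0])
    n_prune = int(round(sparsity * len(flat)))
    # Start with every weight kept, then switch off the n_prune smallest.
    mask = {name: [True] * len(ws) for name, ws in trained.items()}
    for _, name, i in flat[:n_prune]:
        mask[name][i] = False
    return mask


def rewind_to_init(init: Weights, mask: Mask) -> Weights:
    """Lottery-ticket rewind: kept weights return to their initial values, pruned ones become 0."""
    return {
        name: [w if keep else 0.0 for w, keep in zip(init[name], mask[name])]
        for name in init
    }


# Toy example (hypothetical numbers): weights before and after fine-tuning.
init = {"layer1": [0.1, -0.2, 0.3, 0.05], "layer2": [0.4, -0.01, 0.2, 0.3, 0.6]}
trained = {"layer1": [0.5, -0.02, 0.9, 0.01], "layer2": [-0.7, 0.03, 0.2, 0.8, -0.04]}

mask = global_magnitude_mask(trained, sparsity=0.4)  # discard 40% of the weights
ticket = rewind_to_init(init, mask)
print(ticket)
```

The mask is computed from the fine-tuned weights, but the resulting subnetwork (the "ticket") starts from the pre-fine-tuning values. Under the hypothesis, retraining this ticket should approach the accuracy of the full model. Ranking globally rather than per layer lets some layers lose more weights than others.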