This blog post describes compressing BERT in the context of the Lottery Ticket Hypothesis. Empirical evidence from fine-tuning on several tasks shows that 30-40% of the parameters in the BERT model can be discarded.
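To make "discarding parameters" concrete, here is a minimal sketch of one common way to do it: magnitude pruning, which zeroes the weights with the smallest absolute values. This is an assumption for illustration, since the summary above does not say which pruning criterion is used. The function name `magnitude_prune`, the toy weight vector, and the 35% sparsity level (chosen from the middle of the 30-40% range) are all hypothetical.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and a 0/1 mask marking which weights were kept.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Number of weights to discard.
    k = round(sparsity * len(weights))
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:k]:
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


# Toy stand-in for one weight matrix, flattened (illustrative values only).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]

pruned, mask = magnitude_prune(weights, sparsity=0.35)
kept_fraction = sum(mask) / len(mask)
print(f"kept {kept_fraction:.0%} of the weights")
```

In a Lottery Ticket style experiment, the mask is the interesting output: it identifies the subnetwork that is kept, and fine-tuning then continues with the masked-out weights held at zero.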