
An Empirical Study on Model Pruning and Quantization

Posted on 2023-02-23, authored by Yuzhe Tian

In machine learning, model compression is vital for resource-constrained Internet of Things (IoT) devices, such as unmanned aerial vehicles (UAVs) and wearable devices. Several state-of-the-art (SOTA) compression methods exist, but few studies have evaluated these techniques across different models and datasets.

In this paper, we present an in-depth study of two SOTA model compression methods: pruning and quantization. We apply these methods to AlexNet, ResNet18, VGG16BN, and VGG19BN on three well-known datasets: Fashion-MNIST, CIFAR-10, and UCI-HAR. From our study we draw the following conclusions: pruning followed by retraining preserves performance (less than 0.5% average degradation) while reducing model size (at a 10x compression rate) on spatial-domain datasets (e.g., images); performance on temporal-domain datasets (e.g., motion-sensor data) degrades more (about 5.0% on average); and the performance of quantization depends on the pruning rate, the network architecture, and the clustering method. We also conduct comparative experiments on knowledge distillation. The results indicate that more prerequisites must be satisfied for knowledge distillation to achieve comparable average performance.
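To make the two compression steps concrete, below is a minimal pure-Python sketch of magnitude-based weight pruning and weight quantization. This is an illustration only: the sparsity level and bit-width are arbitrary assumptions, and uniform (evenly spaced) quantization levels stand in for the clustering-based quantization the thesis evaluates.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    After pruning, a model is typically retrained (fine-tuned) so the
    remaining weights can compensate for the removed ones.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_uniform(weights, bits):
    """Map each weight to the nearest of 2**bits evenly spaced levels.

    A stand-in for clustering-based quantization: instead of k-means
    centroids, the shared values are evenly spaced over [min, max].
    """
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    return [lo + round((w - lo) / step) * step for w in weights]


# Toy weight vector (illustrative values, not from the thesis).
w = [0.02, -0.91, 0.45, -0.03, 0.67, -0.08, 0.33, 0.11]
pruned = prune_by_magnitude(w, 0.5)      # half the weights become zero
quantized = quantize_uniform(pruned, 2)  # at most 4 representable values
```

A pruned-and-quantized layer can be stored compactly because zeros compress well and each surviving weight needs only a small index into the shared value table, which is the source of the compression rates reported above.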

Finally, we provide some interesting directions for future research.


Table of Contents

1. Introduction -- 2. Literature Review -- 3. An Empirical Study on Model Pruning and Quantization -- 4. Conclusion -- Appendix -- Bibliography

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Department, Centre or School

School of Computing

Year of Award


Principal Supervisor

James Xi Zheng


Copyright: The Author




63 pages
