Pneumonia affects approximately 7% of the global population (Ruuskanen et al. [2011]). Diagnosing pneumonia requires a review of chest radiographs by specially trained radiologists. Improving the efficiency and reach of diagnostic services is of great importance in developing countries.

Deep Neural Networks (DNNs) have progressed rapidly, owing to the availability of large datasets. DNNs such as ChexNet (Rajpurkar et al. [2017]) and TieNet (Wang et al. [2018]) have been proposed to aid the early detection and diagnosis of pneumonia, with great success. These networks, however, have deep architectures that result in large memory and computation requirements, which poses a great challenge for portable medical devices and embedded systems.

Model compression is a family of techniques for reducing these computation and memory requirements. Several methods have been proposed, including filter pruning (Li et al. [2016]) and channel pruning (He et al. [2017]). Most of these pruning techniques have been evaluated on large DNN models for different tasks. To our knowledge, little has been done for models trained for medical image detection.

Hence, the goal of this work is to build a model compression algorithm for ChexNet. ChexNet is chosen as the base model because it is the current state of the art for detecting pneumonia on chest X-rays.


Most model compression pipelines involve training a model, pruning it, and then fine-tuning it. In this work, we have the additional constraint of targeting an embedded device that simulates a low-power medical diagnostic device. Our proposed method therefore uses a unified model compression algorithm.

We propose two components for the model compression: weight pruning and structured pruning. The objective of weight pruning is to minimize the loss function while satisfying the pruning criteria.

The pruning criteria are constraints, one of which is that the absolute value of each weight must meet a particular threshold. Because we target a low-power device, weight pruning alone is not sufficient: it may reduce the computation, but the memory footprint could still be large. Hence, the other component is structured pruning, where we aim to compress the convolutional layers of the deep network, as this is where the computational overhead lies.
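The difference between the two components can be illustrated with a minimal NumPy sketch. It is not the proposed algorithm: the function names `magnitude_prune` and `prune_filters`, the threshold of 0.5, the keep ratio of 0.5, and the random weight tensor are all assumptions made for illustration, and the filter ranking by L1 norm follows the general idea of filter pruning (Li et al. [2016]) rather than any method specified here.

```python
import numpy as np


def magnitude_prune(weights, threshold):
    """Unstructured pruning: zero every weight with |w| below `threshold`.

    Returns the pruned tensor and the boolean mask of surviving weights.
    The tensor keeps its original shape, so dense storage does not shrink.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask, mask


def prune_filters(conv_weights, keep_ratio):
    """Structured pruning: keep the filters with the largest L1 norm.

    `conv_weights` is assumed to have shape
    (out_channels, in_channels, kernel_h, kernel_w).
    Returns the smaller tensor and the indices of the kept filters.
    """
    n_filters = conv_weights.shape[0]
    n_keep = max(1, int(round(n_filters * keep_ratio)))
    # One L1 norm per output filter.
    l1 = np.abs(conv_weights).reshape(n_filters, -1).sum(axis=1)
    # Indices of the n_keep largest norms, restored to original order.
    keep = np.sort(np.argsort(l1)[-n_keep:])
    return conv_weights[keep], keep


# Hypothetical convolutional layer: 8 filters over 3 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
conv = rng.normal(size=(8, 3, 3, 3))

sparse_conv, mask = magnitude_prune(conv, threshold=0.5)
small_conv, kept = prune_filters(conv, keep_ratio=0.5)

print("weight-pruned shape:", sparse_conv.shape)
print("filter-pruned shape:", small_conv.shape)
```

In this sketch the weight-pruned tensor has the same shape as the original, so without a sparse storage format its memory footprint is unchanged, whereas the filter-pruned tensor is physically smaller. In a real network, removing filters from one layer also requires removing the matching input channels of the following layer, and fine-tuning is typically needed afterwards to recover accuracy.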

Long-term vision

In most African countries, diagnostic services are not easily accessible to low-income earners. This project therefore has a long-term vision of developing low-power embedded diagnostic devices. Since these devices would rely on machine learning models, our methodology can be extended to various models to reduce their memory and computation requirements.

Importantly, our model compression algorithm would be used for multiple models deployed to edge devices. Diagnosing skin diseases on mobile phones is part of the suite of projects the team is working on. Several medical imaging projects are ongoing in the team, and this model compression work is therefore an essential component for other projects in the pipeline.