Deep learning has had an impact on many fields, and healthcare is perhaps one of its most important applications. Technologies built on deep learning models have recently shown much promise in helping doctors treat patients better.

Radiologists are highly trained professionals who diagnose diseases using medical imaging. Mastering this skill takes years of training. Poorer countries often cannot afford high-quality healthcare and do not have enough highly qualified personnel to cover the entire population. Technology can be a game changer here.

In this blog we will try to develop models for detecting pneumothorax, a chest condition, using techniques from computer vision and machine learning.

Table of Contents

  1. Business Problem
  2. Mapping to a DL Problem
  3. Understanding the data
  4. Exploratory Data Analysis
  5. Plan for tackling the problem
  6. Data Pipeline
  7. Segmentation Model
  8. Classification Model
  9. Quantization of Models
  10. Predictions/Results
  11. Deployment
  12. Conclusion
  13. Future Work

1. Business Problem


A pneumothorax (noo-moe-THOR-aks) is a collapsed lung. It occurs when air leaks into the space between your lung and chest wall. This air pushes on the outside of your lung and makes it collapse. A pneumothorax can be a complete lung collapse or a collapse of only a portion of the lung. It is usually diagnosed from a chest X-ray by a radiologist with several years of experience, and it can sometimes be very difficult to confirm. Our goal is to classify pneumothorax, and segment it if present, in a set of chest radiographic images.

This is essentially a Kaggle problem.

2. Mapping to a DL Problem

Here we have to segment pneumothorax from chest radiographic images, which is a binary image segmentation problem. Image segmentation is a computer vision task in which every pixel of the image is assigned a label indicating what is present at that pixel.

In our case the whole image can be segmented with two labels: areas having pneumothorax and areas free of it. We will use convolutional neural networks for this task.

Various metrics are available for image segmentation problems, such as IoU score and Dice coefficient. Image segmentation is essentially a classification problem for each pixel. In our problem the classes are severely imbalanced, so we can use the Dice coefficient: like an F1 score, it rewards both high precision and high recall.

The Dice coefficient ranges from 0 to 1, where a score of 1 means complete overlap. It can be calculated as

$$\text{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

where |A∩B| represents the number of elements common to sets A and B, and |A| represents the number of elements in set A (and likewise for set B).

We can also use the Dice coefficient as a loss function, known as soft Dice loss, which is formulated as 1 − Dice coefficient. This loss also ranges between 0 and 1, with a lower loss corresponding to a higher Dice coefficient.
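To make the metric concrete, here is a minimal NumPy sketch of the Dice coefficient and the soft Dice loss. It is an illustration rather than the training code used in this project; the function names and the small smoothing term `eps` are assumptions added so that two empty masks score 1 instead of dividing by zero.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for two masks with values in [0, 1]."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    # eps (an assumption) keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def soft_dice_loss(y_true, y_pred):
    """1 - Dice; works on predicted probabilities as well as hard labels."""
    return 1.0 - dice_coefficient(y_true, y_pred)

truth = np.array([[0, 1, 1], [0, 0, 1]])
pred = np.array([[0, 1, 0], [0, 1, 1]])
# Two pixels overlap and each mask has three foreground pixels,
# so the score is approximately 2 * 2 / (3 + 3).
print(dice_coefficient(truth, pred))
```

Because the products and sums are also defined for probabilities, the same function can serve as a differentiable ("soft") loss when written with a deep learning framework's tensor operations.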

For this problem, we used a combination of soft Dice loss and binary cross-entropy as the loss function.
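The combined loss can be sketched in NumPy as follows. The post does not say how the two terms were weighted, so the plain unweighted sum here is an assumption, as are the function name and the clipping constant `eps`.

```python
import numpy as np

def bce_dice_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss (unweighted sum, an assumption)."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    # Clip probabilities so that log() never receives exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64).ravel(), eps, 1.0 - eps)

    # Per-pixel cross-entropy, averaged over all pixels.
    bce = -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

    # Soft Dice on probabilities measures overlap over the whole mask.
    dice = (2.0 * np.sum(y_true * y_prob) + eps) / (np.sum(y_true) + np.sum(y_prob) + eps)
    return bce + (1.0 - dice)

target = [0, 1, 1, 0]
print(bce_dice_loss(target, [0.1, 0.9, 0.8, 0.2]))  # mostly right: small loss
print(bce_dice_loss(target, [0.9, 0.1, 0.2, 0.8]))  # mostly wrong: large loss
```

A common motivation for mixing the two is that cross-entropy gives a per-pixel signal while the Dice term is less affected by the large number of background pixels.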

3. Understanding the data

The dataset for the problem can be downloaded from here. The dataset consists of images in folders named train and test images. It also contains a file named train-rle.csv, which has information about the areas having pneumothorax for each image in the train dataset.

The images are in DICOM file format. DICOM stands for Digital Imaging and Communications in Medicine, which is the standard format in medical imaging. Along with the image, each DICOM file also contains other information about the patient.

Each entry in train-rle.csv defines the mask, i.e. the areas having pneumothorax, in the column EncodedPixels. Images that do not have pneumothorax have -1 as their mask. The masks are defined using run-length encoding, a lossless compression technique that replaces each run of repeated values with the value and the length of the run.

For example, say we have the binary data WWWBWWW (W for white, B for black). This data can be encoded as W3B1W3, reducing it from 7 characters to 6.
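The toy example above can be reproduced with a few lines of Python. This is only an illustration of the idea on strings; the helper names are hypothetical, and the decoder assumes single-character symbols followed by decimal counts.

```python
import re
from itertools import groupby

def run_length_encode(text):
    """Replace each run of identical characters with <character><run length>."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))

def run_length_decode(encoded):
    """Inverse of run_length_encode for single-character symbols."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(symbol * int(count) for symbol, count in pairs)

print(run_length_encode("WWWBWWW"))   # W3B1W3
print(run_length_decode("W3B1W3"))    # WWWBWWW
```

The saving grows with the length of the runs, which is why the scheme suits masks that are mostly background.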

In order to create masks for the images we need to take each mask's encoded information, iterate through it and mark the pixels. Also, for the submission we need to predict the mask for each image and encode it in RLE format. The functions required to achieve this were provided in a helper file, which contains two methods named mask2rle and rle2mask. The purpose of these functions is evident from their names.
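As a rough idea of what such helpers do, here is a sketch of decoding and encoding a mask. It is not the provided mask2rle/rle2mask code: the names `decode_rle` and `encode_rle` are hypothetical, and the format details are assumptions (space-separated "offset length" pairs, each offset counted from the end of the previous run, row-major pixel order, foreground stored as 1, and "-1" for an empty mask).

```python
import numpy as np

def decode_rle(rle, height, width):
    """Turn 'offset length offset length ...' into a (height, width) binary mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle.strip() == "-1":          # assumed marker for "no pneumothorax"
        return mask.reshape(height, width)
    values = [int(v) for v in rle.split()]
    position = 0
    for offset, length in zip(values[0::2], values[1::2]):
        position += offset           # skip background pixels
        mask[position:position + length] = 1
        position += length           # the next offset starts after this run
    return mask.reshape(height, width)

def encode_rle(mask):
    """Inverse of decode_rle under the same assumptions."""
    flat = np.asarray(mask).ravel()
    if not flat.any():
        return "-1"
    # Pad with zeros so every run has a detectable start and end.
    padded = np.concatenate([[0], flat, [0]])
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    pieces, previous_end = [], 0
    for start, end in zip(changes[0::2], changes[1::2]):
        pieces += [str(start - previous_end), str(end - start)]
        previous_end = end
    return " ".join(pieces)

mask = decode_rle("1 2 2 1", height=2, width=3)
print(mask)
print(encode_rle(mask))
```

When working with the real dataset, the pixel order and foreground value should be checked against the provided helper functions before relying on a decoder like this.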

To extract data from the DICOM images we used a library named pydicom, which has all the essential methods for dealing with DICOM files. Here is one example.
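A minimal sketch of reading one file with pydicom might look like this. The function names are hypothetical and the selection of fields (age, sex, image size) is an assumption about what is useful here; `PatientAge`, `PatientSex`, `Rows` and `Columns` are standard DICOM keywords, but a given file may not contain all of them.

```python
def extract_metadata(ds):
    """Collect a few patient and image fields from a DICOM dataset object."""
    # getattr with a default guards against files that lack a tag.
    return {
        "age": getattr(ds, "PatientAge", None),
        "sex": getattr(ds, "PatientSex", None),
        "rows": getattr(ds, "Rows", None),
        "columns": getattr(ds, "Columns", None),
    }

def read_dicom(path):
    """Read one DICOM file and return (pixel array, metadata dict)."""
    import pydicom  # third-party package; imported here so the helper above works without it

    ds = pydicom.dcmread(path)
    return ds.pixel_array, extract_metadata(ds)
```

Calling `read_dicom` on the path of one of the training files would return the image as a NumPy array together with the metadata dictionary.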

4. Exploratory Data Analysis

In this section we will look under the hood and try to understand the dataset.

First, we created a dataset by extracting all the patient information from the DICOM files and storing it in a CSV file named train.csv.

This dataset contains the image path, age, sex, image ID and mask information. It also has a column has_pneumothorax indicating whether or not the image has pneumothorax.

4.1 Checking if there are any null values and replacing them

  • There are 42 images that do not have any mask.
  • We can assume that these images do not have pneumothorax.

4.2 Visualizing the mask

Here we have taken a random image from the train dataset and visualized the mask using the helper methods described above.

It is very difficult to make out from the picture what really constitutes pneumothorax.

We can also apply CLAHE (Contrast Limited Adaptive Histogram Equalization) to the images, primarily to improve their contrast.

4.3 Analysis of data other than images

  • Out of 12089 images, 2669 images have pneumothorax.
  • More data is present for the male population.
  • Within each gender, most of the images do not have pneumothorax.

We converted all the DICOM images to JPG format and saved them under a simpler path.

5. Plan for tackling the problem

The dataset given to us is severely imbalanced. So what we are going to do is train two different models: a classifier and a segmentation model.

6. Data Pipeline

For training the model we are going to build an input data pipeline with TensorFlow. You can read more about it here.

We pre-computed the masks for all images, stored them in a separate folder and added their paths to the train dataset. To train the model we keep the size of the images at (256, 256), and we apply skimage's equalize_adapthist method to each image to correct the contrast. We also augment the images randomly using tf.image's augmentation methods.
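One detail worth illustrating is that, for segmentation, any geometric augmentation has to be applied to the image and its mask together, otherwise the mask no longer lines up with the lung. The sketch below uses NumPy as a stand-in for the tf.image operations; the choice of a horizontal flip and a brightness shift, the probability of 0.5 and the shift range are all assumptions, since the post does not list the augmentations it used.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Randomly augment an image (values in [0, 1]) together with its mask."""
    if rng.random() < 0.5:
        # Geometric change: flip both so the mask stays aligned with the image.
        image, mask = image[:, ::-1], mask[:, ::-1]
    # Photometric change: applies to the image only, the mask is unaffected.
    delta = rng.uniform(-0.1, 0.1)
    image = np.clip(image + delta, 0.0, 1.0)
    return image, mask

rng = np.random.default_rng(0)
image = np.zeros((4, 6))
mask = np.zeros((4, 6), dtype=np.uint8)
image[1, 4] = 0.5   # a single bright "lesion" pixel
mask[1, 4] = 1      # and its label
aug_image, aug_mask = augment_pair(image, mask, rng)
# The brightest pixel and the labelled pixel still coincide.
print(np.argmax(aug_image) == np.argmax(aug_mask))
```

In a framework pipeline the same idea usually means drawing one random decision per example and applying it to both tensors, rather than calling a random operation separately on each.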

Here are a few examples of generated images:

7. Segmentation Model

Here we will train a segmentation model that takes an image and tries to predict the mask.

We first pass the image through a ResNet-34 architecture as a backbone and then pass its output through a U-Net architecture.

We implemented the plain ResNet-34 architecture from scratch. ResNet-34 came after VGG and contains 34 layers. VGG had one major drawback: vanishing gradients as the network became deeper, which stopped people from building larger networks. ResNet's residual (skip) connections make deeper networks easier to train. The original ResNet-34 has roughly 22 million parameters. Because of limited resources, we reduced the number of parameters by reducing the number of filters.

We took the output of ResNet-34 and passed it through the U-Net architecture.

U-Net is an architecture primarily developed for biomedical image segmentation. It can be separated into two parts: an encoder (contracting path) and a decoder (expanding path). The primary idea is that, given an input image, we downsample it while increasing the number of channels, in the hope that the extra channels capture the various segments, and then upsample the result back to the original resolution. We could also have kept the image at the same resolution throughout and applied various convolutions, but that is computationally more expensive. There are also skip connections from the downsampling path to the upsampling path, which help the architecture focus on both localization and context.

One important layer is the upsampling layer, which increases the image resolution. There are various ways to achieve this. In our case we used UpSampling2D from the TensorFlow library, which simply repeats the rows and columns. Other techniques exist, such as Conv2DTranspose, which learns filters to upsample; it can loosely be thought of as the inverse of convolution.
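"Repeating the rows and columns" is nearest-neighbour upsampling, and it is easy to see what it does with a small NumPy sketch. This is an illustration of the operation, not the Keras layer itself; the function name is hypothetical and the (height, width, channels) layout is assumed.

```python
import numpy as np

def upsample_nearest(feature_map, factor=2):
    """Repeat every row and column `factor` times: (H, W, C) -> (factor*H, factor*W, C)."""
    rows_repeated = np.repeat(feature_map, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)

x = np.array([[1, 2],
              [3, 4]])[..., np.newaxis]   # one channel
print(upsample_nearest(x)[..., 0])
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Nothing is learned in this step, which is the trade-off against a transposed convolution: fewer parameters, but the following convolutions have to do all the refinement.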

7.1 Approach 1:

We are going to train a classifier to predict whether an image contains pneumothorax, and then pass only those images through the segmentation model. So it makes sense to train the segmentation model using only the images that contain pneumothorax.
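At inference time this two-stage idea amounts to using the classifier as a gate in front of the segmentation model. A minimal sketch, assuming `classifier` and `segmenter` are callables that return a probability and a per-pixel probability map respectively (both hypothetical stand-ins for the trained models, as are the two default thresholds):

```python
import numpy as np

def predict_mask(image, classifier, segmenter, threshold=0.5, mask_threshold=0.5):
    """Return a binary mask, running the segmenter only on likely positives."""
    probability = classifier(image)
    if probability < threshold:
        # Classified as healthy: report an empty mask without segmenting.
        return np.zeros(image.shape[:2], dtype=np.uint8)
    return (segmenter(image) >= mask_threshold).astype(np.uint8)

# Stub models, for illustration only.
image = np.ones((4, 4))
segmenter = lambda img: np.full(img.shape[:2], 0.8)
print(predict_mask(image, lambda img: 0.1, segmenter).sum())  # 0
print(predict_mask(image, lambda img: 0.9, segmenter).sum())  # 16
```

One consequence of this design is that a false negative from the classifier can never be recovered by the segmentation model, which is why the classifier's recall matters so much.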

After training this model for around 70 epochs we got the following results:

Some predictions:

7.2 Approach 2:

In this approach we took all the images having pneumothorax and sampled an equal number of non-pneumothorax images.
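Building such a balanced subset can be sketched with the standard library alone. The record layout (a list of dictionaries with a `has_pneumothorax` key, mirroring the column in train.csv), the function name and the fixed seed are assumptions for illustration.

```python
import random

def balanced_subset(records, seed=0):
    """Keep every positive record and sample an equal number of negatives."""
    positives = [r for r in records if r["has_pneumothorax"]]
    negatives = [r for r in records if not r["has_pneumothorax"]]
    rng = random.Random(seed)
    # Sample without replacement; never ask for more negatives than exist.
    sampled = rng.sample(negatives, k=min(len(positives), len(negatives)))
    subset = positives + sampled
    rng.shuffle(subset)
    return subset

records = [{"id": i, "has_pneumothorax": i < 3} for i in range(10)]
subset = balanced_subset(records)
print(len(subset), sum(r["has_pneumothorax"] for r in subset))  # 6 3
```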

We first tried training the model from scratch, but that did not give satisfactory results. So we initialized the model with the weights learned in the previous approach and trained it for 25 more epochs.

We got these results:

Some predictions:

On comparing the loss and Dice coefficient of both models, we found that the first approach gives better results than the second. So we selected the first one as our segmentation model.

8. Classification Model

For the classification model we decided to use the same ResNet-34 model we used in our segmentation model, with some dense layers added after it.

Since it is a classification problem we decided to use binary cross-entropy as our loss function.

8.1 Approach 1:

In this approach we took all the images and trained our classification model on them. We trained this model for about 25 epochs.


Here we got an AUC score of 0.69.

8.2 Approach 2:

In this approach we took all the images that were predicted very badly by the 1st approach and appended them to our main dataset again, so that they are oversampled.
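The selection step can be sketched as follows. The post does not say how "predicted very badly" was defined, so measuring the error as the absolute difference between label and predicted probability, and the cut-off of 0.7, are assumptions; the function name is hypothetical.

```python
import numpy as np

def oversample_hard_examples(labels, probabilities, max_error=0.7):
    """Return dataset indices with badly predicted examples listed twice."""
    labels = np.asarray(labels, dtype=np.float64)
    probabilities = np.asarray(probabilities, dtype=np.float64)
    error = np.abs(labels - probabilities)
    hard = np.flatnonzero(error > max_error)
    # Original indices first, then the hard ones appended again.
    return np.concatenate([np.arange(len(labels)), hard])

labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.1, 0.95]   # the last two are badly wrong
print(oversample_hard_examples(labels, probabilities))  # [0 1 2 3 2 3]
```

The returned indices can then be used to select rows from the training table, so that the hard examples appear twice in each epoch.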

We trained the same model on this modified dataset, starting from the weights learned by the previous model. We trained it for 5 more epochs, after which it started overfitting.


This model gave us an AUC score of 0.77. It also drastically reduced the number of highly misclassified points.

Comparison of scores:

In our problem recall is very important, so we tried to improve it by changing the classification threshold.

We tried several thresholds using a technique similar to grid search and picked the best one.

  • 0.38 is the best threshold.
  • Increasing the recall further results in a decrease in precision, so we stop at this point.
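A threshold search of this kind can be sketched in a few lines. The exact selection rule used in the post is not stated, so the rule here (highest recall among thresholds whose precision stays above a floor) is an assumption, as are the function names, the grid step of 0.01 and the `min_precision` value.

```python
import numpy as np

def precision_recall(labels, probabilities, threshold):
    """Precision and recall of the positive class at a given threshold."""
    predicted = probabilities >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def best_threshold(labels, probabilities, min_precision=0.5):
    """Grid-search thresholds; keep the one with the best recall above a precision floor."""
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities, dtype=np.float64)
    best = None
    for threshold in np.round(np.arange(0.05, 1.0, 0.01), 2):
        precision, recall = precision_recall(labels, probabilities, threshold)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (float(threshold), float(recall), float(precision))
    return best  # (threshold, recall, precision), or None if nothing qualifies

labels = [0, 0, 1, 1, 0, 1]
probabilities = [0.105, 0.405, 0.355, 0.805, 0.205, 0.605]
print(best_threshold(labels, probabilities, min_precision=0.7))
```

Lowering the threshold can only keep or raise recall, so the precision floor is what stops the search from simply predicting every image as positive.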

9. Quantization of Models

We also applied post-training quantization, which can reduce model size and improve latency, at the cost of some accuracy.

We tried post-training dynamic range quantization. You can learn more about it here.
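To see why quantization shrinks a model and costs a little accuracy, here is a rough NumPy illustration of storing float32 weights as 8-bit integers. It is a conceptual sketch only, not the conversion code used for this project and not the exact scheme of any particular toolkit; the symmetric per-tensor scaling and the function names are assumptions.

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights to int8 using one scale factor for the whole tensor."""
    scale = np.max(np.abs(weights)) / 127.0
    if scale == 0:
        scale = 1.0                      # all-zero tensor: any scale works
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 representation."""
    return quantized.astype(np.float32) * scale

weights = np.linspace(-1.0, 1.0, 11).astype(np.float32)
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(weights.nbytes, quantized.nbytes)        # 44 11: four times smaller
print(np.max(np.abs(restored - weights)))      # small rounding error
```

The storage drops by a factor of four, and the rounding error per weight is bounded by half the scale, which is the source of the accuracy loss mentioned above.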

10. Predictions/Results

We predicted some images using four model combinations:

  1. Predicted mask: here we used our non-oversampled classifier and the selected segmentation model.
  2. Predicted mask (quantized): here we used the quantized version of our non-oversampled classifier and the quantized version of the selected segmentation model.
  3. Predicted mask (oversampled): here we used our oversampled classifier and the selected segmentation model.
  4. Predicted mask (oversampled, quantized): here we used the quantized version of our oversampled classifier and the quantized version of the selected segmentation model.

Here are a few results:

11. Deployment

We have deployed the models using Streamlit. For segmentation, we deployed the model from the 1st approach in the segmentation section. For classification, we deployed the models from both approaches, for better comparison among them. We have also deployed the quantized models.

Below is the video recording of running some instances.

12. Conclusion

  1. Oversampling the misclassified points turned out to be very useful.
  2. It can sometimes be very difficult to detect pneumothorax.
  3. Deeper models would probably have yielded better results.

13. Future Work

  1. More recent models such as UNet++ can be tried in place of U-Net.
  2. Different encoders such as EfficientNet can be tried.
  3. We can also try different loss functions such as focal loss.

The full code can be found here. My LinkedIn page.

Hope you enjoyed reading the blog as much as I enjoyed writing it.

