According to the U.S. National Cancer Institute, histopathology analysis is the description of a tumor based on how abnormal the cancer cells and tissue look under a microscope. Histopathological examination is usually considered the best way to tell whether cancer is present. Manual identification of tumor tissue is laborious, and an individual pathologist’s previous experience may limit his or her ability to recognize certain tumors, given the wide heterogeneity of histologic images. This project trains a convolutional neural network (CNN) to help pathologists identify the presence of tumor tissue in histologic images, and analyzes the sensitivity, specificity and overall accuracy attained by CNN-based automated histopathological processing.
Description of Technology
The core technology used in this project is the convolutional neural network. Specifically, the model was built on a pre-trained image recognition network, VGG16. VGG16 has 16 weight layers (13 convolutional and 3 fully connected); the convolutional base used here keeps the 13 convolutional layers, with 14,714,688 parameters. By leveraging the pre-trained VGG16 network, we expect to attain better predictive power with a relatively small computational burden. Model training was conducted using keras and tensorflow-gpu on an Amazon Web Services (AWS) p3.2xlarge instance.
Data source and description
The images and labels were obtained from the Histopathologic Cancer Detection challenge hosted by Kaggle.
Examples of tumor positive images:
Examples of tumor negative images:
Preprocess data
Data preprocessing took two steps. First, we separated tumor-positive and tumor-negative images into different folders (named ‘pos’ and ‘neg’); second, we split the data into training, validation and test sections, using a 70:20:10 ratio.
import os
import pandas as pd
data_dir = r'C:\Users\zkuang\Desktop\histopathologic-cancer-detection'
labels = pd.read_csv('train_labels.csv',header=0)
train_cut = int(0.7*labels.shape[0]) # first cutpoint at 0.7
val_cut = int(0.9*labels.shape[0]) # second cutpoint at 0.9
def split_data(start,stop,labels,data_dir,sub_dir):
    # move rows start..stop-1 from 'raw' into sub_dir/pos or sub_dir/neg
    if not os.path.exists(os.path.join(data_dir,sub_dir,'pos')):
        os.makedirs(os.path.join(data_dir, sub_dir, 'pos'))
    if not os.path.exists(os.path.join(data_dir, sub_dir, 'neg')):
        os.makedirs(os.path.join(data_dir, sub_dir, 'neg'))
    for i in range(start, stop):
        filename = labels.loc[i, 'id']
        if labels.loc[i, 'label'] == 1:
            try:
                os.rename(os.path.join(data_dir, 'raw', filename + '.tif'),
                          os.path.join(data_dir, sub_dir,'pos', filename + '.tif'))
            except FileNotFoundError:
                print(filename + ' not found!')
        else:
            try:
                os.rename(os.path.join(data_dir, 'raw', filename + '.tif'),
                          os.path.join(data_dir, sub_dir,'neg',filename + '.tif'))
            except FileNotFoundError:
                print(filename + ' not found!')
split_data(0,train_cut,labels,data_dir,'train')# create training data
# range() excludes `stop`, so starting at cut+1 leaves out the row at the cut index
split_data(train_cut+1,val_cut,labels,data_dir,'validation')# create validation data
split_data(val_cut+1,labels.shape[0],labels,data_dir,'test')# create test data
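The 70:20:10 split comes down to two cut points over the row index. The following is a minimal sketch of that arithmetic, assuming contiguous half-open ranges; the helper name `split_ranges` is hypothetical and not part of the original notebook. Because Python's `range(start, stop)` excludes `stop`, each section can start exactly at the previous cut point and every row lands in exactly one section.

```python
def split_ranges(n_rows, train_pct=70, val_pct=20):
    """Return half-open (start, stop) index ranges for train/validation/test."""
    # integer arithmetic avoids floating-point surprises at the cut points
    train_cut = n_rows * train_pct // 100
    val_cut = n_rows * (train_pct + val_pct) // 100
    return {
        'train': (0, train_cut),
        'validation': (train_cut, val_cut),
        'test': (val_cut, n_rows),
    }

# toy example with 1000 rows (not the real data set size)
ranges = split_ranges(1000)
for name, (start, stop) in ranges.items():
    print(name, start, stop, stop - start)
```

Each pair could be passed as the `start` and `stop` arguments of a function like `split_data`.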
Breakdown of training data set-up
Training with pre-trained CNN
import keras
import logging
import os
Using TensorFlow backend.
#from google.colab import drive
#os.chdir(r'/content/gdrive/My Drive/FinalProject')
basedir = data_dir
#basedir = r'/home/ubuntu/FinalProject'
#logging.basicConfig(filename = r'/content/gdrive/My Drive/FinalProject/test.log',level = logging.INFO)
#logging.basicConfig(filename = r'C:\Users\zkuang\Google Drive\FinalProject\test.log',level = logging.INFO)
logging.basicConfig(filename = os.path.join(basedir,'test2.log'),level = logging.DEBUG)
This cell defines constants that describe our data: image size, number of channels and batch size.
im_size = 96
n_channel = 3
batch_size = 64
Using Keras ImageDataGenerator
Load pretrained model and specify trainable blocks. We allow the last block of VGG16 to be trainable.
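from keras.applications import VGG16
conv_base = VGG16(weights='imagenet', include_top=False,  # convolutional base only
                  input_shape=(im_size, im_size, n_channel))
conv_base.summary()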
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 96, 96, 3)         0
block1_conv1 (Conv2D)        (None, 96, 96, 64)        1792
block1_conv2 (Conv2D)        (None, 96, 96, 64)        36928
block1_pool (MaxPooling2D)   (None, 48, 48, 64)        0
block2_conv1 (Conv2D)        (None, 48, 48, 128)       73856
block2_conv2 (Conv2D)        (None, 48, 48, 128)       147584
block2_pool (MaxPooling2D)   (None, 24, 24, 128)       0
block3_conv1 (Conv2D)        (None, 24, 24, 256)       295168
block3_conv2 (Conv2D)        (None, 24, 24, 256)       590080
block3_conv3 (Conv2D)        (None, 24, 24, 256)       590080
block3_pool (MaxPooling2D)   (None, 12, 12, 256)       0
block4_conv1 (Conv2D)        (None, 12, 12, 512)       1180160
block4_conv2 (Conv2D)        (None, 12, 12, 512)       2359808
block4_conv3 (Conv2D)        (None, 12, 12, 512)       2359808
block4_pool (MaxPooling2D)   (None, 6, 6, 512)         0
block5_conv1 (Conv2D)        (None, 6, 6, 512)         2359808
block5_conv2 (Conv2D)        (None, 6, 6, 512)         2359808
block5_conv3 (Conv2D)        (None, 6, 6, 512)         2359808
block5_pool (MaxPooling2D)   (None, 3, 3, 512)         0
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
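The parameter counts in the summary can be checked by hand. The sketch below assumes the standard VGG16 layout of thirteen 3x3 convolutions with the channel widths listed in the summary; the names `VGG16_CONVS` and `conv_params` are illustrative only. Each filter has `kernel * kernel * c_in` weights plus one bias.

```python
# (input channels, output channels) for the 13 3x3 convolutions in the summary
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

def conv_params(c_in, c_out, kernel=3):
    """Weights (kernel*kernel*c_in per filter) plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out

per_layer = [conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS]
print(per_layer[0])    # 1792, matches block1_conv1
print(sum(per_layer))  # 14714688, matches "Total params"
```

Pooling and input layers contribute no parameters, which is why they show 0 in the table.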
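This part enables fine-tuning: every layer before block5_conv1 is frozen, and block 5 stays trainable.
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True  # from this layer onward, keep weights trainable
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False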
Model set up
We added a flattening layer and a dense network on top of the convolutional base. The last layer has a sigmoid activation function, since our output is binary.
from keras import models
from keras import layers
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
import json
model = models.Sequential()
model.add(conv_base)  # reuse the pre-trained convolutional base
model.add(layers.Flatten())  # (3, 3, 512) feature maps -> 4608-long vector
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
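To illustrate why a sigmoid suits a binary output, here is a small standalone sketch, independent of Keras. It maps a raw score to a probability in (0, 1) and then applies a 0.5 threshold, the same cut-off used later when predictions are converted to labels. The function names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Return 1 (tumor positive) if the probability exceeds the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0

for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), predict_label(score))
```

A score of exactly 0 gives a probability of 0.5, which the strict `>` comparison labels as negative.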
Set up the data generators. Note that pixel values were rescaled to the range [0, 1] before training (rescale = 1./255). The batch size used here is the global batch size of 64.
datagen = ImageDataGenerator(rescale=1./255)
train_dir = os.path.join(basedir,'train')
validation_dir = os.path.join(basedir,'validation')
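train_generator = datagen.flow_from_directory(
    train_dir,  # this is the target directory
    target_size=(im_size, im_size),  # all images are resized to im_size x im_size (96x96)
    batch_size=batch_size,
    class_mode='binary')  # binary labels to match the sigmoid output
validation_generator = datagen.flow_from_directory(
    validation_dir,
    target_size=(im_size, im_size),
    batch_size=batch_size,
    class_mode='binary')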
Found 154017 images belonging to 2 classes.
Found 44004 images belonging to 2 classes.
Model compilation and fitting. We saved our model as an h5 object. The if-else statement ensures that model fitting only takes place when no trained model is available, which was convenient for debugging and model diagnostics. We ran 80 epochs of training, and each epoch took 100 steps.
Layer (type)                 Output Shape              Param #
vgg16 (Model)                (None, 3, 3, 512)         14714688
flatten_1 (Flatten)          (None, 4608)              0
dense_1 (Dense)              (None, 256)               1179904
dense_2 (Dense)              (None, 1)                 257
Total params: 15,894,849
Trainable params: 8,259,585
Non-trainable params: 7,635,264
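The trainable and non-trainable totals follow from the freezing step: only the three block 5 convolutions and the two dense layers are updated during training. A short sketch of the arithmetic, with hypothetical helper names:

```python
def conv_params(c_in, c_out, kernel=3):
    """Parameters of a kernel x kernel convolution, including biases."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Parameters of a fully connected layer, including biases."""
    return (n_in + 1) * n_out

block5 = 3 * conv_params(512, 512)   # block5_conv1 .. block5_conv3
flat = 3 * 3 * 512                   # flattened VGG16 output
head = dense_params(flat, 256) + dense_params(256, 1)
trainable = block5 + head
total = 14714688 + head              # VGG16 base plus the dense head
print(flat)                # 4608
print(trainable)           # 8259585
print(total - trainable)   # 7635264
```

The three printed values match the Flatten output size and the trainable and non-trainable counts in the summary.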
Model run
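if os.path.isfile(os.path.join(basedir,'hist_path.h5')): # only train the model when it doesn't already exist
    model = load_model(os.path.join(basedir,'hist_path.h5'))
    with open('model_call_back.json', 'r') as f:
        history = json.load(f)
else:
    # the model must be compiled before this call; epochs and steps as stated in the text
    history = model.fit_generator(
        train_generator,
        steps_per_epoch=100,
        epochs=80,
        validation_data=validation_generator)
    history = history.history
    model.save(os.path.join(basedir,'hist_path.h5'))
    with open('model_call_back.json', 'w') as f:
        json.dump(history, f)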
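The load-or-train pattern above is a general caching idiom. The sketch below isolates it with the standard library only; `load_or_compute` and `fake_training` are hypothetical names, and the history values are made-up placeholders rather than results from this project.

```python
import json
import os
import tempfile

def load_or_compute(path, compute):
    """Return cached JSON from `path` if present, else compute and cache it."""
    if os.path.isfile(path):
        with open(path, 'r') as f:
            return json.load(f)
    result = compute()
    with open(path, 'w') as f:
        json.dump(result, f)
    return result

calls = []

def fake_training():
    """Stand-in for an expensive training run; records each invocation."""
    calls.append(1)
    return {'acc': [0.5, 0.6], 'val_acc': [0.4, 0.5]}  # placeholder values

cache_path = os.path.join(tempfile.mkdtemp(), 'history.json')
first = load_or_compute(cache_path, fake_training)
second = load_or_compute(cache_path, fake_training)  # served from the cache
print(first == second, len(calls))
```

The second call reads the JSON file instead of calling `fake_training` again, which is the behavior that makes repeated debugging runs cheap.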
Plotting loss and accuracy. In this part we conducted internal model diagnostics: loss and accuracy on the training and validation samples were plotted against epochs.
import matplotlib.pyplot as plt
%matplotlib inline
acc = history['acc']
val_acc = history['val_acc']
loss = history['loss']
val_loss = history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()  # start a second figure for the loss curves
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
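Per-epoch curves can be noisy, which makes trends hard to read. One common remedy, not used in the original notebook, is to smooth each curve with an exponential moving average before plotting. A minimal sketch with a hypothetical `smooth_curve` helper:

```python
def smooth_curve(points, factor=0.8):
    """Exponential moving average; a higher factor gives a smoother curve."""
    smoothed = []
    for point in points:
        if smoothed:
            # blend the previous smoothed value with the new observation
            smoothed.append(smoothed[-1] * factor + point * (1 - factor))
        else:
            smoothed.append(point)  # the first point is kept as-is
    return smoothed

# toy values, not taken from the training history
print(smooth_curve([0.0, 1.0, 1.0], factor=0.5))
```

The smoothed list has the same length as the input, so it can be plotted against the same `epochs` range.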
Investigate accuracy, sensitivity and specificity
The verification steps yielded the pictures that follow, indicating that the data were handled properly for the test step.
from PIL import Image
import numpy as np
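test_dir = os.path.join(basedir,'test')
# load the positive test images, rescaled to [0, 1] like the training data
test_n_pos = len(os.listdir(os.path.join(test_dir,'pos')))
pos_features = np.zeros(shape=(test_n_pos, im_size, im_size, 3))
i = 0
for file in os.listdir(os.path.join(test_dir,'pos')):
    pos_features[i] = np.divide(np.array(Image.open(os.path.join(test_dir,'pos',file))),255)
    i += 1
# load the negative test images the same way
test_n_neg = len(os.listdir(os.path.join(test_dir,'neg')))
neg_features = np.zeros(shape=(test_n_neg, im_size, im_size, 3))
i = 0
for file in os.listdir(os.path.join(test_dir,'neg')):
    neg_features[i] = np.divide(np.array(Image.open(os.path.join(test_dir,'neg',file))),255)
    i += 1
# verification: convert one image per class back to 0-255 integers and display it
last_tif = (pos_features[1]*255).astype(int)
plt.imshow(last_tif)
plt.show()
last_tif = (neg_features[1]*255).astype(int)
plt.imshow(last_tif)
plt.show()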
if os.path.isfile(os.path.join(basedir,'y_pos.txt')):
    # reuse cached predictions
    y_pos = np.loadtxt("y_pos.txt")
    y_neg = np.loadtxt("y_neg.txt")
else:
    y_pos = model.predict(pos_features)
    y_pos = (y_pos>0.5) *1  # threshold predicted probabilities at 0.5
    np.savetxt("y_pos.txt", y_pos, delimiter=",")
    y_neg = model.predict(neg_features)
    y_neg = (y_neg>0.5) *1
    np.savetxt("y_neg.txt", y_neg, delimiter=",")
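print('true positive = ')
print(sum(y_pos))      # positives predicted as 1
print('false negative = ')
print(sum(1-y_pos))    # positives predicted as 0
print('true negative = ')
print(sum(1-y_neg))    # negatives predicted as 0
print('false positive = ')
print(sum(y_neg))      # negatives predicted as 1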
true positive =
false negative =
true negative =
false positive =
print('sensitivity =')
print(sum(y_pos) / len(y_pos))
print('specificity =')
print(sum(1-y_neg) / len(y_neg))
print('overall accuracy = ')
print((sum(y_pos) + sum(1-y_neg)) /
      (len(y_pos) + len(y_neg)))
sensitivity =
specificity =
overall accuracy =
The model attained a sensitivity of 0.899 and a specificity of 0.932. The overall accuracy of our model is 0.918, which is comparable to state-of-the-art results for supervised machine learning on histopathological images. Given that our images are of lower resolution than those used in most studies (often ultrahigh-resolution images with more than 1 million pixels), the results are particularly notable. It is not surprising that we attained better specificity than sensitivity, given that our data are slightly skewed towards negative cases.
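For reference, the three metrics reported above can be computed from the two prediction vectors as in the following sketch. The function name `binary_metrics` is hypothetical, and the inputs are toy lists rather than the project's predictions.

```python
def binary_metrics(pred_on_positives, pred_on_negatives):
    """Compute sensitivity, specificity and accuracy from 0/1 predictions.

    pred_on_positives: predictions for images whose true label is positive.
    pred_on_negatives: predictions for images whose true label is negative.
    """
    tp = sum(pred_on_positives)              # positives predicted as 1
    fn = len(pred_on_positives) - tp         # positives predicted as 0
    fp = sum(pred_on_negatives)              # negatives predicted as 1
    tn = len(pred_on_negatives) - fp         # negatives predicted as 0
    return {
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
        'accuracy': (tp + tn) / (tp + fn + tn + fp),
    }

# toy example: 4 positive images and 6 negative images
toy = binary_metrics([1, 1, 1, 0], [0, 0, 0, 0, 1, 0])
print(toy)
```

Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so with more negative than positive cases it sits closer to the specificity.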
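# visualization of misclassified images
print('Mis-classified positive images')
j = 0
for i in range(len(y_pos)):
    if y_pos[i] == 0:  # true positive image predicted as negative
        tif = (pos_features[i]*255).astype(int)
        plt.imshow(tif)
        plt.show()
        j += 1
        if j == 5:  # show at most five examples
            break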
Mis-classified positive images
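print('Mis-classified negative images')
j = 0
for i in range(len(y_neg)):
    if y_neg[i] == 1:  # true negative image predicted as positive
        tif = (neg_features[i]*255).astype(int)
        plt.imshow(tif)
        plt.show()
        j += 1
        if j == 5:  # show at most five examples
            break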
Mis-classified negative images
It is difficult to explain intuitively why some of the images are misclassified (such as those shown in the previous section). One can conjecture that lightly stained images, such as 1) and 5) among the misclassified negative images, are more likely to be considered negative by our algorithm, whereas slides with high internal heterogeneity, as seen among the misclassified positive images, are more likely to be falsely classified as cancerous. It would therefore be interesting to bring together human pathologists and data scientists to better illuminate the strengths and weaknesses of CNN-based histopathological analyses.
Due to limited computational power, we did not try hyperparameter tuning for our model. We conjectured that a deeper convolutional neural net and a relatively large number of epochs (n = 120) should suffice to attain a reasonable approximation of the optimal model. Indeed, the internal diagnostics showed validation accuracy tapering off after roughly 80 epochs. Whether fine-tuning the learning rate can improve model performance remains to be examined. Similarly, we did not experiment with making different VGG16 blocks trainable, the effect of which also remains to be examined.
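The point where validation accuracy tapers off can also be located programmatically. The sketch below applies a simple early-stopping rule with a patience window, a technique the original notebook did not use; the function name and the accuracy values are hypothetical.

```python
def best_epoch_with_patience(val_acc, patience=5, min_delta=0.0):
    """Return (index of best epoch, index where training would stop).

    Training stops once `patience` epochs pass without validation accuracy
    improving on the best value by more than `min_delta`.
    """
    best_idx = 0
    for idx in range(1, len(val_acc)):
        if val_acc[idx] > val_acc[best_idx] + min_delta:
            best_idx = idx
        elif idx - best_idx >= patience:
            return best_idx, idx
    return best_idx, len(val_acc) - 1

# toy validation accuracies, not taken from the project's history
curve = [0.5, 0.6, 0.7, 0.7, 0.69, 0.7, 0.68]
print(best_epoch_with_patience(curve, patience=3))
```

Applied to the saved `val_acc` history, a rule of this kind would give a concrete stopping epoch to compare against the visual estimate.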