In this post we show some of our progress in applying deep neural networks in a small-data scenario, rather than the big-data setting typically associated with deep learning.
The example application is the segmentation of certain cell types in a transmission electron microscopy (TEM) image of a zebrafish (see image below). The image is freely available at nanotomy.org. The goal is to show that deep learning can be used even when only a (very) limited amount of annotated data is available.
The TEM image has a resolution of 3840 x 2920 pixels and was split into patches of 256 x 256 pixels for classification. The original TEM image has an even higher resolution, but that was not used in this example.
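To give an idea of the preprocessing, splitting an image into non-overlapping patches can be done with a few lines of numpy. This is an illustrative sketch, not the exact code used; incomplete patches at the borders are simply dropped here:

```python
import numpy as np

def extract_patches(image, patch_size=256):
    """Split a 2D image into non-overlapping square patches.

    Rows and columns that do not fill a complete patch are dropped.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

# A 3840 x 2920 image yields 15 x 11 = 165 full 256 x 256 patches.
image = np.zeros((2920, 3840), dtype=np.uint8)  # rows x columns
patches = extract_patches(image)
print(len(patches))  # 165
```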
Patches were annotated manually by outlining background and cell areas. Not all cell types were included, only the larger cells (see image). Only part of each patch was annotated: typically a few cells and some background area.
Patches were then divided into a training set and a test set. The training set was used to train a deep fully convolutional network for segmentation (Ronneberger et al. 2015). We extended the network with recurrence, so that the segmentation is further refined at the higher time steps of the recurrent loop. We also added support for learning from partially labeled patches. To make the most of small data we took a number of measures to prevent overfitting, such as extensive data augmentation, smart criteria for stopping training, and a modified optimization strategy that uses backtracking to find optimal models. Finally, we added a scheme that chooses the next patch to be labeled based on what the system has learned so far.
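One way to support partially labeled patches is to mask out unannotated pixels when computing a per-pixel loss, so that unlabeled regions contribute no gradient. The following numpy sketch illustrates the idea only; the function name and the convention of marking unlabeled pixels with -1 are our own, not necessarily what the system uses internally:

```python
import numpy as np

def masked_cross_entropy(prob_cell, labels, unlabeled=-1):
    """Per-pixel binary cross-entropy computed over labeled pixels only.

    prob_cell : predicted probability of "cell" per pixel, in (0, 1)
    labels    : 1 = cell, 0 = background, `unlabeled` = no annotation
    """
    mask = labels != unlabeled
    if not np.any(mask):
        return 0.0  # nothing annotated in this patch
    p = np.clip(prob_cell[mask], 1e-7, 1 - 1e-7)
    y = labels[mask].astype(float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# Only the first two pixels are labeled; the third is ignored.
loss = masked_cross_entropy(np.array([0.9, 0.1, 0.5]),
                            np.array([1, 0, -1]))
```

Because the unlabeled pixels are excluded entirely, the network is free to predict anything there without being penalized, which is what makes partial annotation cheap for the annotator.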
We then performed an experiment to simulate an interactive annotation process by a human, where the system is continually retrained while the human annotates patches. The experiment also shows the effect of the amount of annotated data on the performance of the system.
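The post does not spell out how the next patch to annotate is selected; a common choice in such interactive settings (our assumption here, not necessarily the scheme we implemented) is to pick the unlabeled patch where the model is most uncertain, i.e. where its probabilities are closest to 0.5:

```python
import numpy as np

def pick_next_patch(prob_maps):
    """Return the index of the unlabeled patch with the highest mean
    prediction uncertainty (binary entropy of the cell probability)."""
    scores = []
    for prob in prob_maps:
        p = np.clip(prob, 1e-7, 1 - 1e-7)
        entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
        scores.append(entropy.mean())
    return int(np.argmax(scores))

# The middle map is maximally uncertain (all 0.5), so it is selected.
maps = [np.full((4, 4), 0.99), np.full((4, 4), 0.5), np.full((4, 4), 0.01)]
chosen = pick_next_patch(maps)
```

Selecting uncertain patches tends to give the annotator's effort the largest effect per labeled patch, which matches the rapid improvement seen in the rounds below.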
In the first round only 3 patches were used (randomly chosen): one for training and two for evaluation/testing. The image below shows the output of the deep learning system when using these 3 patches to learn a model. Bright indicates a high probability of a cell being present, dark a high probability of background. Training patches are indicated in red, test patches in blue and unused patches in green. Note that the output is shown at the moment the model error is lowest on the test patches, which is not necessarily the best model for the training patch. With only 1 training patch it is difficult for the system to generalize and the classification result is not yet very good.
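Keeping the model at the moment the test error is lowest amounts to simple checkpoint tracking, which can also double as a stopping criterion. A minimal sketch (our illustration, not the exact training loop we use), where training stops once the test error has not improved for a given number of evaluations:

```python
def best_checkpoint(test_errors, patience=10):
    """Track the step with the lowest test error; stop once the error
    has not improved for `patience` consecutive evaluations.

    Returns (best_step, best_error).
    """
    best_step, best_err, since_improved = 0, float("inf"), 0
    for step, err in enumerate(test_errors):
        if err < best_err:
            best_step, best_err, since_improved = step, err, 0
        else:
            since_improved += 1
            if since_improved >= patience:
                break  # early stop: no recent improvement
    return best_step, best_err

result = best_checkpoint([1.0, 0.8, 0.6, 0.7, 0.65], patience=2)
```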
In the next round one training patch is added and the system is retrained. The classification performance is still limited, but at least a somewhat better distinction is made between cells and background.
The animation below shows the effect of adding more training patches, still a small number. Pay particular attention to the unused (green) patches, as these have not been involved in training the system in any way. It shows that after annotating only a few training patches the system already produces quite decent segmentations.
After 15 training patches the system has learned to segment most of the cells of the target type with good accuracy. Annotating these 15 patches would take an observer at most half an hour, also because not everything in a patch has to be annotated.
Obviously the system still makes some errors in this small experiment, and these need to be addressed to turn it into a full application. The astute observer might for example have noticed border effects due to the patches; these can easily be fixed, for example by using overlapping patches. Furthermore, we are working on making the annotation process smarter, so that in the interactive training process only those objects of interest (cells in this case) that are poorly segmented have to be annotated. We are also looking into extending this application to discriminate between more structures in the image, such as additional cell types or structures of the extracellular matrix.
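To sketch the overlapping-patches idea (an illustration under our own assumptions, not the exact fix): predict on patches that overlap, for example by half a patch, and average the per-pixel probabilities, so that every pixel is covered by several patch positions and border artifacts are smoothed out:

```python
import numpy as np

def stitch_overlapping(image_shape, patch_size, stride, predict):
    """Average per-pixel predictions over overlapping patches.

    predict((y, x)) -> (patch_size, patch_size) probability map for
    the patch whose top-left corner is at (y, x).
    """
    h, w = image_shape
    acc = np.zeros((h, w))
    count = np.zeros((h, w))
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            acc[y:y + patch_size, x:x + patch_size] += predict((y, x))
            count[y:y + patch_size, x:x + patch_size] += 1
    return acc / np.maximum(count, 1)  # avoid division by zero

# With a constant predictor the averaged output is constant as well.
out = stitch_overlapping((512, 512), 256, 128,
                         lambda tl: np.full((256, 256), 0.7))
```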
What can be learned from this experiment is that it is possible to automate a task most people find boring, cell delineation, with a small annotation effort. The trained system can readily be applied to other TEM images of similar appearance. This allows quick automatic labeling of large areas and supports the goal of a more quantitative cell biology. The system is general and can easily be extended to other applications, both 2D images and 3D or higher-dimensional data.
The experiment was developed and run on a setup inspired by the NVIDIA DIGITS DevBox, with two NVIDIA Titan X GPUs.