This thesis proposes different models for a variety of applications, such as semantic segmentation, in-the-wild face recognition, microscopy cell counting and detection, standardized re-orientation of 3D ultrasound fetal brain and Magnetic Resonance (MR) cardiac video segmentation. Our approach is to employ the large-scale machine learning models, in particular deep neural networks. Expert knowledge is either mathematically modelled as a differentiable hidden layer in the Artificial Neural Networks, or we tried to break the complex tasks into several small and easy-to-solve tasks. Multi-scale contextual information plays an important role in pixel-wise predic- tion, e.g. semantic segmentation. To capture the spatial contextual information, we present a new block for learning receptive field adaptively by within-layer recurrence. While interleaving with the convolutional layers, receptive fields are effectively enlarged, reaching across the entire feature map or image. The new block can be initialized as identity and inserted into any pre-trained networks, therefore taking benefit from the "pre-train and fine-tuning" paradigm. Current face recognition systems are mostly driven by the success of image classification, where the models are trained to by identity classification. We propose a multi-column deep comparator networks for face recognition. The architecture takes two sets (each contains an arbitrary number of faces) of images or frames as inputs, facial part-based (e.g. eyes, noses) representations of each set are pooled out, dynamically calibrated based on the quality of input images, and further compared with local "experts" in a pairwise way. Unlike the computer vision applications, collecting data and annotation is usually more expensive in biomedical image analysis. Therefore, the models that can be trained with fewer data and weaker annotations are of great importance. We approach the microscopy cell counting and detection based on density estimation, where only central dot annotations are needed. The proposed fully convolutional regression networks are first trained on a synthetic dataset of cell nuclei, later fine-tuned and shown to generalize to real data. In 3D fetal ultrasound neurosonography, establishing a coordinate system over the fetal brain serves as a precursor for subsequent tasks, e.g. localization of anatomical landmarks, extraction of standard clinical planes for biometric assessment of fetal growth, etc. To align brain volumes into a common reference coordinate system, we decompose the complex transformation into several simple ones, which can be easily tackled with Convolutional Neural Networks. The model is therefore designed to leverage the closely related tasks by sharing low-level features, and the task-specific predictions are then combined to reproduce the transformation matrix as the desired output. Finally, we address the problem of MR cardiac video analysis, in which we are interested in assisting clinical diagnosis based on the fine-grained segmentation. To facilitate segmentation, we present one end-to-end trainable model that achieves multi-view structure detection, alignment (standardized re-orientation), and fine- grained segmentation simultaneously. This is motivated by the fact that the CNNs in essence is not rotation equivariance or invariance, therefore, adding the pre-alignment into the end-to-end trainable pipeline can effectively decrease the complexity of segmentation for later stages of the model.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:757720 |
Date | January 2017 |
Creators | Xie, Weidi |
Contributors | Zisserman, Andrew P. ; Noble, Alison |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:5fcfb784-7b61-49cd-9561-64b5ffa5807a |
Page generated in 0.0022 seconds