About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

The stability of host-pathogen multi-strain models

Hawkins, Susan January 2017 (has links)
Previous multi-strain mathematical models have shown that the degree of cross-protective response between similar strains, acting as a form of immune selection, generates different behavioural states of the pathogen population. This thesis explores these multi-strain dynamic states to examine their robustness and stability in the face of intrinsic phenotypic variation in the pathogen and the extrinsic force of immune selection. This is achieved in two main ways: Chapter 2 introduces phenotypic variation in pathogen transmissibility, testing the robustness of a stable pathogen population to the emergence of an introduced strain of higher transmission potential; and Chapter 3 introduces a new model allowing immunity to both strain-specific and cross-strain (conserved) determinants, to investigate how heterogeneity in the specificity of the host immune response alters the pathogen population structure. A final investigation in Chapter 4 develops a method of reverse pattern-oriented modelling, using a machine learning algorithm to determine which intrinsic properties of the pathogen, and which combinations of them, lead to particular disease-like population patterns. This research offers novel techniques that complement previous and ongoing work on multi-strain modelling, with direct applications to a range of infectious agents such as Plasmodium falciparum, influenza A, and rotavirus, and with wider potential for other multi-strain systems.
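As a point of reference for the kind of system studied here, the sketch below integrates a minimal two-strain transmission model with partial cross-protection, in the spirit of classical multi-strain frameworks. The equations, parameter values, and the cross-protection parameter gamma are illustrative assumptions, not the thesis's actual model.

```python
# A minimal sketch (not the thesis's exact equations) of a two-strain
# transmission model with partial cross-protection. gamma is the degree of
# cross-protection between strains; all parameter values are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

beta, sigma, mu, gamma = 40.0, 10.0, 0.02, 0.6

def two_strain(t, x):
    z1, z2, w1, w2, y1, y2 = x
    lam1, lam2 = beta * y1, beta * y2          # forces of infection
    dz1 = lam1 * (1 - z1) - mu * z1            # hosts immune to strain 1
    dz2 = lam2 * (1 - z2) - mu * z2
    dw1 = (lam1 + lam2) * (1 - w1) - mu * w1   # hosts exposed to either strain
    dw2 = (lam1 + lam2) * (1 - w2) - mu * w2
    # cross-reactively immune hosts transmit at reduced rate (1 - gamma)
    dy1 = lam1 * ((1 - w1) + (1 - gamma) * (w1 - z1)) - sigma * y1
    dy2 = lam2 * ((1 - w2) + (1 - gamma) * (w2 - z2)) - sigma * y2
    return [dz1, dz2, dw1, dw2, dy1, dy2]

x0 = [0.0, 0.0, 0.0, 0.0, 1e-4, 2e-4]          # seed both strains
sol = solve_ivp(two_strain, (0, 200), x0, dense_output=True)
print(sol.y[4:, -1])                            # long-run prevalence of each strain
```

Varying gamma in a toy system like this is what moves the pathogen population between coexistence and competitive-exclusion-like states, which is the behaviour whose robustness the thesis probes.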
12

Designing energy-efficient computing systems using equalization and machine learning

Takhirov, Zafar 20 February 2018 (has links)
As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches, like near-threshold computing (NTC) and aggressive voltage scaling with shadow latches, have been proposed to get the most out of limited battery life, there is still no “silver bullet” for the increasing power-performance demands of mobile systems. Moreover, given that a mobile system may operate under a variety of environmental conditions (such as different temperatures) and with varying performance requirements, there is a growing need to design tunable/reconfigurable systems in order to achieve energy-efficient operation. In this work we propose to address the energy-efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms. Inspired by communication systems, we developed feedback-equalization-based digital logic that changes the threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enabled up to 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved a 30% reduction in energy dissipation for pass-transistor logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near-optimal operation of static complementary CMOS logic blocks over the entire voltage range, from near-threshold supply voltage to nominal supply voltage. Using energy-delay product (EDP) as a metric, we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold voltage operation with equalization, we can improve the operating frequency by up to 30% while the energy increase is less than 15%, for an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across the entire above-threshold voltage range. On the distributed adaptive algorithm front, we explored energy-efficient hardware implementations of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification operations for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, our adaptive classifier is ≈100× more energy efficient but has ≈1% higher error rate than a complex radial basis function classifier, and is ≈10× less energy efficient but has ≈40% lower error rate than a simple linear classifier, across a wide range of classification data sets. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) may be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (groves) and conditionally executing the rest of the forest, FoG is able to achieve much higher energy-efficiency levels for comparable error rates.
We also take advantage of the distributed nature of the FoG to achieve a high level of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification compared to conventional RF, SVM-RBF, Multi-Layer Perceptron (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR, but achieves 18% higher accuracy on average across all considered datasets.
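The conditional-execution idea behind FoG can be illustrated in software. The sketch below is a rough functional analogue, not the hardware architecture: it splits a scikit-learn random forest into groves and stops evaluating once the vote is confident enough. Grove size and the confidence threshold are illustrative assumptions.

```python
# A minimal software sketch of grove-wise conditional execution: easy inputs
# are resolved by the first grove(s); harder inputs trigger more groves.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=64, random_state=0).fit(X[:1500], y[:1500])
groves = [rf.estimators_[i:i + 8] for i in range(0, 64, 8)]  # 8 groves of 8 trees

def fog_predict(x, threshold=0.9):
    votes = np.zeros(rf.n_classes_)
    used = 0
    for grove in groves:
        for tree in grove:
            # sub-estimators predict encoded class indices into rf.classes_
            votes[int(tree.predict(x.reshape(1, -1))[0])] += 1
        used += len(grove)
        if votes.max() / votes.sum() >= threshold:  # confident: skip the rest
            break
    return rf.classes_[votes.argmax()], used

label, trees_used = fog_predict(X[1600])
print(label, trees_used)  # fewer trees evaluated on "easy" inputs -> energy saved
```

In hardware, skipping a grove means the corresponding logic never switches, which is where the energy savings for comparable error rates come from.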
13

A random forest approach to segmenting and classifying gestures

Joshi, Ajjen Das 12 March 2016 (has links)
This thesis investigates a gesture segmentation and recognition scheme that employs a random forest classification model. A complete gesture recognition system should localize and classify each gesture from a given gesture vocabulary, within a continuous video stream. Thus, the system must determine the start and end points of each gesture in time, as well as accurately recognize the class label of each gesture. We propose a unified approach that performs the tasks of temporal segmentation and classification simultaneously. Our method trains a random forest classification model to recognize gestures from a given vocabulary, as presented in a training dataset of video plus 3D body joint locations, as well as out-of-vocabulary (non-gesture) instances. Given an input video stream, our trained model is applied to candidate gestures using sliding windows at multiple temporal scales. The class label with the highest classifier confidence is selected, and its corresponding scale is used to determine the segmentation boundaries in time. We evaluated our formulation in segmenting and recognizing gestures from two different benchmark datasets: the NATOPS dataset of 9,600 gesture instances from a vocabulary of 24 aircraft handling signals, and the CHALEARN dataset of 7,754 gesture instances from a vocabulary of 20 Italian communication gestures. The performance of our method compares favorably with state-of-the-art methods that employ Hidden Markov Models or Hidden Conditional Random Fields on the NATOPS dataset. We conclude with a discussion of the advantages of using our model.
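The multiscale sliding-window step described above can be outlined in a few lines of code. In the sketch below, the window lengths, the feature descriptor, and the "non-gesture" class index are illustrative assumptions rather than the thesis's actual choices.

```python
# A hedged sketch of multiscale sliding-window gesture spotting with a
# random forest: score each candidate window at several temporal scales,
# keep the most confident in-vocabulary label and its scale.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(window):
    # placeholder descriptor: per-joint mean and std over the window
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

def spot_gestures(stream, clf, scales=(20, 30, 45), non_gesture=0, step=5):
    detections = []
    for start in range(0, len(stream) - max(scales), step):
        best = (0.0, None, None)                       # (confidence, label, scale)
        for L in scales:
            probs = clf.predict_proba(
                features(stream[start:start + L]).reshape(1, -1))[0]
            k = probs.argmax()
            if clf.classes_[k] != non_gesture and probs[k] > best[0]:
                best = (probs[k], clf.classes_[k], L)
        if best[1] is not None:
            detections.append((start, start + best[2], best[1], best[0]))
    return detections                                   # (start, end, label, conf)

# train on synthetic fixed-length windows (12 joints x 3D = 36 channels)
rng = np.random.default_rng(0)
train_windows = [rng.normal(size=(30, 36)) for _ in range(200)]
labels = rng.integers(0, 4, size=200)                   # class 0 = non-gesture
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(np.stack([features(w) for w in train_windows]), labels)
print(spot_gestures(rng.normal(size=(300, 36)), clf)[:3])
```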
14

Machine learning and statistical analysis of complex mathematical models : an application to epilepsy

Ferrat, L. January 2019 (has links)
The electroencephalogram (EEG) is a commonly used tool for studying the emergent electrical rhythms of the brain. It has wide utility in psychology, as well as providing a useful diagnostic aid for neurological conditions such as epilepsy. It is of growing importance to better understand the emergence of these electrical rhythms and, in the case of diagnosing neurological conditions, to find mechanistic differences between healthy individuals and those with a disease. Mathematical models are an important tool that offer the potential to reveal these otherwise hidden mechanisms. In particular, Neural Mass Models (NMMs), which describe the macroscopic activity of large populations of neurons, are increasingly used to uncover large-scale mechanisms of brain rhythms in both health and disease. The dynamics of these models depend upon the choice of parameters, and therefore it is crucial to understand how the dynamics change when parameters are varied. Although NMMs are considered low-dimensional in comparison to micro-scale neural network models, they are still prohibitively high-dimensional for classical approaches such as numerical continuation when it comes to understanding the relationship between parameters and dynamics. We need alternative methods to characterise the dynamics of NMMs in high-dimensional parameter spaces. The primary aim of this thesis is to develop a method to explore and analyse the high-dimensional parameter space of these mathematical models. We develop an approach based on statistics and machine learning called decision tree mapping (DTM). This method analyses the parameter space of a mathematical model by studying all the parameters simultaneously, so that the parameter space can be mapped efficiently in high dimensions. We use measures linked with this method to determine which parameters play a key role in the output of the model. The approach recursively splits the parameter space into smaller subspaces with increasing homogeneity of dynamics. The concepts of decision tree learning, random forests, measures of importance, statistical tests, and visual tools are introduced to explore and analyse the parameter space, and the theoretical background and methods are formally introduced with examples. The DTM approach is used in three distinct studies to:
• identify the role of parameters in the model dynamics — for example, which parameters play a role in the emergence of seizure dynamics;
• constrain the parameter space, so that regions of the parameter space which give implausible dynamics are removed;
• compare parameter sets fitted to different groups — how does the thalamocortical connectivity of people with and without epilepsy differ?
We demonstrate that classical studies have not taken into account the complexity of the parameter space. DTM can easily be extended to other fields that use mathematical models. We advocate the use of this method in the future to constrain high-dimensional parameter spaces in order to enable more efficient, person-specific model calibration.
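In outline, DTM amounts to sampling the parameter space, labelling each sample by the dynamics the model produces, and letting a tree partition the space into increasingly homogeneous regions. The sketch below illustrates that loop on a toy stand-in; the six-parameter "model", its seizure criterion, and the tree depth are illustrative assumptions, not an actual neural mass model.

```python
# A minimal sketch of the decision-tree-mapping idea: sample parameters,
# label each sample by the dynamics it produces, fit a tree whose splits
# carve the parameter space, and read off parameter importances.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
params = rng.uniform(0, 1, size=(5000, 6))         # 6 model parameters in [0, 1]

def dynamics_label(p):
    # stand-in for "simulate the NMM and classify its output"
    return int(p[0] * p[3] > 0.4 and p[5] < 0.5)   # 1 = seizure-like, 0 = healthy

labels = np.array([dynamics_label(p) for p in params])
tree = DecisionTreeClassifier(max_depth=4).fit(params, labels)
print(export_text(tree, feature_names=[f"p{i}" for i in range(6)]))
print(tree.feature_importances_)  # which parameters drive the dynamics
```

Each leaf of the fitted tree is a subspace with (nearly) homogeneous dynamics, which is what makes the mapping usable for constraining plausible parameter regions.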
15

In silico modeling for uncertain biochemical data

Gusenleitner, Daniel January 2009 (has links)
Analyzing and modeling data is a well-established research area and a vast variety of different methods have been developed over the last decades. Most of these methods assume fixed positions of data points; only recently has uncertainty in data caught attention as a potentially useful source of information. In order to provide deeper insight into this subject, this thesis concerns itself with the following essential question: can information on the uncertainty of feature values be exploited to improve in silico modeling? To this end, a state-of-the-art random forest algorithm is developed using MATLAB. In addition, three techniques for handling uncertain numeric features are presented and incorporated into different modified versions of random forests. To test the hypothesis, six real-world data sets were provided by AstraZeneca. The data describe biochemical features of chemical compounds, including the results of an Ames test, a widely used technique to determine the mutagenicity of chemical substances. Each of the datasets contains a single uncertain numeric feature, represented as an expected value and an error estimate. The modified algorithms are then applied to the six data sets in order to obtain classifiers able to predict the outcome of an Ames test. The hypothesis is tested using a paired t-test, and the results reveal that information on uncertainty can indeed improve the performance of in silico models.
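The abstract does not name its three techniques, but one natural way to use an (expected value, error estimate) pair is Monte-Carlo perturbation: train each forest in an ensemble on a different noise draw of the uncertain feature. The sketch below shows that idea on synthetic data; it is an assumed illustration, not the thesis's method.

```python
# One plausible way to exploit an uncertain feature given as
# (expected value, error estimate): train each forest on a Monte-Carlo
# perturbation of that feature and average the predicted probabilities.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 10))                 # 9 fixed features + 1 uncertain
err = np.abs(rng.normal(0.3, 0.1, size=n))   # per-sample error estimate
y = (X[:, 0] + X[:, 9] > 0).astype(int)      # synthetic Ames-like outcome

forests = []
for seed in range(10):                       # each forest sees one noise draw
    Xp = X.copy()
    Xp[:, 9] += rng.normal(0, err)           # perturb the uncertain feature
    forests.append(
        RandomForestClassifier(n_estimators=50, random_state=seed).fit(Xp, y))

proba = np.mean([f.predict_proba(X)[:, 1] for f in forests], axis=0)
print(proba[:5])                             # averaged class-1 probabilities
```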
17

Classification of terrain using superpixel segmentation and supervised learning / Klassificering av terräng med superpixelsegmentering och övervakad inlärning

Ringqvist, Sanna January 2014 (has links)
The usage of 3D modeling is expanding rapidly. Modeling from aerial imagery has become very popular due to its growing number of both civilian and military applications, such as urban planning, navigation, and target acquisition. This master thesis project was carried out at Vricon Systems at SAAB. The Vricon system produces high-resolution geospatial 3D data based on aerial imagery from manned aircraft, unmanned aerial vehicles (UAVs), and satellites. The aim of this work was to investigate to what degree superpixel segmentation and supervised learning can be applied to a terrain classification problem using imagery and digital surface models (DSMs). A further aim was to investigate how the height information from the digital surface model contributes compared to the information from the grayscale values. The goal was to identify buildings, trees, and ground. Another task was to evaluate existing methods and compare results. The approach was divided into several parts: first, the image was segmented using superpixel segmentation; then features were extracted; finally, the classifiers were created, trained, and evaluated. The classification method that obtained the best results in this thesis had approximately 90% correctly labeled superpixels. The results were comparable to, if not better than, other solutions available on the market.
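A compact version of that pipeline is sketched below using SLIC superpixels and per-superpixel grayscale/height statistics. SLIC, the four features, and the placeholder arrays are assumptions for illustration; the thesis's exact segmentation method and feature set may differ.

```python
# A hedged sketch of the pipeline: SLIC superpixels, per-superpixel
# features from grayscale and DSM height, then a random forest classifier.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
gray = rng.random((200, 200))          # grayscale orthophoto (placeholder)
dsm = rng.random((200, 200))           # digital surface model heights (placeholder)

segments = slic(gray, n_segments=400, compactness=0.1, channel_axis=None)

def superpixel_features(seg_id):
    m = segments == seg_id
    return [gray[m].mean(), gray[m].std(), dsm[m].mean(), dsm[m].std()]

ids = np.unique(segments)
X = np.array([superpixel_features(i) for i in ids])
y = rng.integers(0, 3, size=len(ids))  # 0=ground, 1=tree, 2=building (dummy labels)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict(X[:5]))              # predicted class per superpixel
```

Including the DSM statistics alongside the grayscale statistics is exactly where the thesis's question about the contribution of height information would be tested, e.g. by comparing accuracies with and without the two DSM columns.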
18

REMOTE SENSING BASED DETECTION OF FORESTED WETLANDS: AN EVALUATION OF LIDAR, AERIAL IMAGERY, AND THEIR DATA FUSION

Suiter, Ashley E. 01 May 2015 (has links)
Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However, in forested wetlands, hydrology is obscured by the tree canopy, making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetland types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology, but few studies have addressed the use of LiDAR intensity data for detecting hydrology in forested wetlands. Due to the tendency of the LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of LiDAR-derived data with multi-spectral imagery increased the accuracy of forested wetland classification compared with a classification performed with multi-spectral imagery alone. Four classifications were performed: Classification A - All Imagery, Classification B - All LiDAR, Classification C - LiDAR without Intensity, and Classification D - Fusion of All Data. These classifications were performed using random forest, and each resulted in a 3-foot-resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using the Kappa coefficient of agreement, and the importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. Classification B, performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index), was the most accurate classification, with a Kappa of 78.04%, indicating moderate to strong agreement. However, Classification C, performed with the LiDAR derivatives but without intensity data, had less agreement than would be expected by chance, indicating that intensity contributed significantly to the accuracy of Classification B.
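Both evaluation steps named above — Kappa agreement against reference data and per-dataset importance from the random forest — are easy to express in code. The sketch below does so on placeholder arrays; the band list and labels are illustrative assumptions, not the study's data.

```python
# A brief sketch of comparing a classification against reference labels with
# Cohen's kappa, and ranking the fused input datasets by RF importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
bands = ["red", "green", "blue", "nir", "intensity", "elevation", "slope", "twi"]
X = rng.random((1000, len(bands)))          # per-pixel fused predictors (placeholder)
y = rng.integers(0, 2, size=1000)           # 0 = upland, 1 = wetland (reference)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
print(cohen_kappa_score(y, rf.predict(X)))  # agreement with reference data
for name, imp in sorted(zip(bands, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:10s} {imp:.3f}")          # contribution of each input dataset
```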
19

Novel Methods of Biomarker Discovery and Predictive Modeling using Random Forest

January 2017 (has links)
Random forest (RF) is a popular and powerful technique. It can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals; however, they are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and used as the basis for two novel methods, one for biomarker discovery and one for generating prediction intervals. Firstly, a biodosimetry model is developed using RF to determine the absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve the prediction accuracy of the biodosimetry, day-specific models were built to deal with the day interaction effect, and a technique of nested modeling was proposed. The nested models can fit this complex data, with its large variability and non-linear relationships. Secondly, a panel of biomarkers was selected using a data-driven feature selection method as well as by hand, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method can incorporate domain knowledge as a penalty term to regulate the selection of candidate features in RF. It adds more flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide the selection of biomarkers. The method can also compete with existing methods when intrinsic data characteristics are used in place of domain knowledge in simulated datasets. Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets. (Doctoral Dissertation, Biomedical Informatics, 2017)
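The abstract gives only the outline of RFerr, so the sketch below should be read as one plausible realization of a non-parametric, error-based interval — widening each RF point prediction by quantiles of the forest's out-of-bag residuals — rather than the dissertation's actual construction, which may differ.

```python
# A hedged sketch of an RFerr-style idea: build non-parametric prediction
# intervals from the empirical distribution of RF out-of-bag errors.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0).fit(X, y)

residuals = y - rf.oob_prediction_             # out-of-bag errors
lo, hi = np.quantile(residuals, [0.05, 0.95])  # 90% empirical error band

pred = rf.predict(X[:5])
print(np.c_[pred + lo, pred, pred + hi])       # lower bound, point, upper bound
```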
20

Anomaly detection in mechanical components based on Deep Learning and Random Cut Forests / Detección de anomalías en componentes mecánicos en base a Deep Learning y Random Cut Forests

Aichele Figueroa, Diego Andrés January 2019 (has links)
Thesis for the degree of Ingeniero Civil Mecánico / In the field of maintenance, monitoring a machine can be of great use, since it makes it possible to notice any anomaly in the machine's internal operation, so that any defect can be corrected before a more serious failure occurs. In data mining, anomaly detection is the task of identifying anomalous elements, i.e., those that differ from what is common within a dataset. Anomaly detection has applications in many domains; for example, it is used today in banks to detect fraudulent purchases and possible scams through patterns of user behaviour. Because large amounts of data must be covered, development within probabilistic machine learning is essential. A variety of algorithms have been developed to find anomalies; among decision-tree methods, one of the best known is Isolation Forest. Several works derived from Isolation Forest propose improvements to it, such as Robust Random Cut Forest, which on the one hand improves the precision of anomaly detection and, on the other, offers the advantage of analysing data dynamically and finding anomalies in real time. Its disadvantage is that the more attributes a dataset contains, the more computation time it needs to detect an anomaly. Therefore, an attribute reduction method, also known as dimensionality reduction, is used, and its effect on the effectiveness and efficiency of the algorithm is studied relative to running it without reducing the dimension of the data. This thesis analyses the Robust Random Cut Forest algorithm in order to finally propose a possible improvement to it. To test the algorithm, an experiment with steel bars was carried out, in which their vibrations were recorded while excited by white noise. These data were processed in three scenarios: without dimensionality reduction, with principal component analysis (PCA), and with an autoencoder. The first scenario (no dimensionality reduction) serves as a reference point against which the anomaly-detection effectiveness and efficiency of scenarios two and three are compared. The results show an improvement when reducing the dimensions, both in computation time (efficiency) and in precision (effectiveness) for finding an anomaly; the best results are obtained with principal component analysis (PCA).
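The evaluated pipeline — reduce the vibration features, then score anomalies with a tree-based detector — can be sketched as follows. Since Robust Random Cut Forest is not in scikit-learn, IsolationForest stands in for it here, and the "spectral feature" arrays are synthetic; both substitutions are assumptions for illustration.

```python
# A minimal sketch: PCA dimensionality reduction followed by tree-based
# anomaly scoring (IsolationForest as a stand-in for Robust Random Cut Forest).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
healthy = rng.normal(0, 1, size=(500, 64))     # spectral features, healthy bar
damaged = rng.normal(0.8, 1.2, size=(20, 64))  # shifted spectra, damaged bar
X = np.vstack([healthy, damaged])

Z = PCA(n_components=8).fit_transform(X)       # reduce 64 attributes to 8
scores = IsolationForest(random_state=0).fit(Z).score_samples(Z)
print(np.argsort(scores)[:10])                 # lowest scores = most anomalous
```

Reducing the attribute count before scoring is what cuts the per-point computation time, which is the efficiency gain the thesis reports for the PCA scenario.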
