Global ETD Search

1291	Strojové učení pro odpovídání na otázky v češtině / Machine Learning for Question Answering in Czech Pastorek, Peter January 2020 This Master's thesis deals with teaching neural networks to answer questions in Czech. The neural networks are built in the Python programming language using the PyTorch library and are based on the LSTM architecture. They are trained on the Czech SQAD dataset. Because the Czech dataset is smaller than the English datasets, I opted to extend the neural networks with algorithmic procedures. To make these procedures easier to apply and to improve accuracy, I divide question answering into smaller parts.
1292	Optimal Seeding Rates for New Hard Red Spring Wheat Cultivars in Diverse Environments Stanley, Jordan D. January 2019 Seeding rate in hard red spring wheat (HRSW) (Triticum aestivum L.) production impacts input cost and grain yield. Predicting the optimal seeding rate (OSR) for HRSW cultivars can aid growers and eliminate the need for costly seeding rate research. Research was conducted to determine the OSR of newer HRSW cultivars (released in 2013 or later) in diverse environments. Nine cultivars with diverse genetic and phenotypic characteristics were evaluated at four seeding rates in 11 environments throughout the northern Great Plains region in 2017-2018. Results from ANOVA indicated that environment and cultivar were more important than seeding rate in determining grain yield. Though there was no environment × seeding rate interaction (P=0.37), OSR varied among cultivars within each environment. Cultivar × environment interactions were further explored with the objective of developing a decision support system (DSS) to aid growers in determining the OSR for the cultivar they select and for the environment in which it is sown. Data from seeding rate trials conducted in ND and MN from 2013 to 2015 were also used. A novel method for characterizing cultivars by tillering capacity was developed and proposed as a source of tillering information for statistical modelling. Ten statistical learning algorithms were evaluated on the seeding rate data using repeated 10-fold cross-validation to determine a model for predicting the OSR of newer cultivars. Models were similar in prediction accuracy (P=0.10). The decision tree model was considered the most reliable, as bias was minimized by pruning methods and model variance was acceptable for OSR predictions (RMSE=1.24). Findings from this model were used to develop the grower DSS for determining OSR dependent on cultivar straw strength, tillering capacity, and yield of the environment. 
Recommendations for OSR ranged from 3.1 to 4.5 million seeds ha⁻¹. Growers can benefit from using this DSS by sowing at the OSR relative to their average yields, especially when seeding new HRSW cultivars. machine learning modelling optimal predict seeding rates wheat
1293	Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques Kabir, Md Faisal January 2020 In healthcare, a large amount of medical data has emerged. To use these data effectively to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that could be applied to the types of data sets available in the healthcare area. The first part of the dissertation investigates DM methods on healthcare or medical data to find significant information in the form of rules. Class association rule mining, a variant of association rule mining, was used to obtain rules with targeted items or class labels. These rules can be used to improve public awareness of different cancer symptoms and could also be useful for initiating prevention strategies. In the second part of the dissertation, ML techniques were applied to healthcare or medical data to build a predictive model. Three different classification techniques were investigated on a real-world breast cancer risk factor data set. Because the data set is imbalanced, various resampling methods were used before applying the classifiers. Applying a resampling technique significantly improved performance compared with applying no resampling. Moreover, a super learning technique that uses multiple base learners was investigated to boost the performance of the classification models. Two different forms of super learner were investigated: the first uses two base learners, while the second uses three. 
The models were then evaluated against well-known benchmark data sets related to the healthcare domain, and the results showed that the super learner (SL) model performs better than the individual classifiers and the baseline ensemble. Finally, we assessed the cancer-relevant genes of prostate cancer that correlate most significantly with the clinical outcomes of sample type and overall survival. Rules were discovered from the RNA-sequencing data of prostate cancer patients. Moreover, we built a regression model, from which rules for predicting the survival time of patients were generated. classification data mining gene expression healthcare machine learning super learner
1294	Enabling statistical analysis of the main ionospheric trough with computer vision Starr, Gregory Walter Sidor 25 September 2021 The main ionospheric trough (MIT) is a key density feature in the mid-latitude ionosphere, and characterizing its structure is important for understanding GPS radio signal scintillation and HF wave propagation. While a number of previous studies have statistically investigated the properties of the trough, they have only examined its latitudinal cross sections and have not considered the instantaneous two-dimensional structure of the trough. In this work, we developed an automatic optimization-based method for identifying the trough in Total Electron Content (TEC) maps and quantified its agreement with the algorithm developed by Aa et al. (2020). Using the newly developed method, we created a labeled dataset and statistically examined the two-dimensional structure of the trough. Specifically, we investigated how Kp affects the trough’s occurrence probability at different local times. At low Kp, the trough tends to form in the postmidnight sector, and with increasing Kp, the trough occurrence probability increases and shifts premidnight. We explore the possibility that this is due to increased occurrence of troughs formed by subauroral polarization streams (SAPS). Additionally, using SuperDARN convection maps and solar wind data, we characterized the MIT's dependence on the interplanetary magnetic field (IMF) clock angle. Electrical engineering Data science Ionosphere Machine learning Magnetosphere
1295	Identification of Enhancers In Human: Advances In Computational Studies Kleftogiannis, Dimitrios A. 24 March 2016 Roughly 50% of the human genome contains noncoding sequences that serve as regulatory elements responsible for the diverse gene expression of the cells in the body. One very well-studied category of regulatory elements is enhancers. Enhancers increase the transcriptional output in cells through chromatin remodeling or recruitment of complexes of binding proteins. Identifying enhancers with computational techniques is an interesting area of research, and several approaches have been proposed to date. However, the current state-of-the-art methods face limitations because, although the function of enhancers has been clarified, their mechanism of function is not well understood. This PhD thesis presents a bioinformatics/computer science study that focuses on the problem of identifying enhancers in different human cells using computational techniques. The dissertation is decomposed into four main tasks, presented in separate chapters. First, because many enhancer functions are not well understood, we study the basic biological models by which enhancers trigger transcriptional functions, and we comprehensively survey over 30 bioinformatics approaches for identifying enhancers. Next, we elaborate on the availability of enhancer data produced by different enhancer identification methods and experimental procedures. In particular, we analyze the advantages and disadvantages of existing solutions and report obstacles that require further consideration. To mitigate these problems, we developed the Database of Integrated Human Enhancers (DENdb), a centralized online repository that archives enhancer data from 16 ENCODE cell lines. 
The integrated enhancer data are also combined with many other experimental data that can be used to interpret the enhancers' content and to generate a novel enhancer annotation that complements the existing integrative annotation proposed by the ENCODE consortium. Next, we propose the first deep-learning computational framework for identifying enhancers. The proposed system, called Dragon Ensemble Enhancer Predictor (DEEP), is based on a novel deep learning two-layer ensemble algorithm capable of identifying enhancers characterized by different cellular conditions. Experimental results using data from ENCODE and FANTOM5 demonstrate that DEEP surpasses the major enhancer prediction systems in recognition performance and shows very good generalization capabilities in unknown cell lines and tissues. Finally, we take a step further by developing a novel feature selection method suitable for defining a computational framework capable of analyzing the genomic content of enhancers and reporting cell-line-specific predictive signatures. Bioinformatics machine learning computer science Epigenomics transcription regulation
1296	An Empirical Study of the Distributed Ellipsoidal Trust Region Method for Large Batch Training Alnasser, Ali 10 February 2021 Neural network optimizers are dominated by first-order methods because of their inexpensive computational cost per iteration. However, it has been shown that first-order optimization is prone to reaching sharp minima when trained with large batch sizes. As the batch size increases, the statistical stability of the problem increases, a regime that is well suited for second-order optimization methods. In this thesis, we study a distributed ellipsoidal trust region model for neural networks. We use a block diagonal approximation of the Hessian, assigning consecutive layers of the network to each process. We solve in parallel for the update direction of each subset of the parameters. We show that our optimizer is fit for large batch training as well as for an increasing number of processes. optimization trust region distributed computing deep learning machine learning
1297	Computational Approaches Reveal New Insights into Regulation and Function of Non-coding RNAs and their Targets Alam, Tanvir 28 November 2016 Regulation and function of protein-coding genes are increasingly well understood, but no comparable evidence exists for non-coding RNA (ncRNA) genes, which appear to be more numerous than protein-coding genes. We developed a novel machine-learning model to distinguish promoters of long ncRNA (lncRNA) genes from those of protein-coding genes. This represents the first attempt to make this distinction based on properties of the associated gene promoters. From our analyses, several transcription factors (TFs) that are known to be regulated by lncRNAs also emerged as potential global regulators of lncRNAs, suggesting that lncRNAs and TFs may participate in a bidirectional feedback regulatory network. Our results also raise the possibility that, because of the historical dependence on protein-coding genes in defining the chromatin states of active promoters, an adjustment of these chromatin signature profiles to incorporate lncRNAs is warranted in the future. Secondly, we developed a novel method to infer functions for lncRNA and microRNA (miRNA) transcripts based on their transcriptional regulatory networks in 119 human tissues and 177 human primary cells. This method combines, for the first time, information on the cell/tissue-specific expression of a transcript with the TFs and transcription co-factors (TcoFs) that control activation of that transcript. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues, and an associated knowledgebase (FARNA) was developed. FARNA, which has the most comprehensive function annotation of the considered ncRNAs across the widest spectrum of cells/tissues, has the potential to contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. 
Thirdly, we developed a novel machine-learning model to identify the LD motif (a protein interaction motif) of paxillin, an ncRNA target that is involved in cell motility and cancer metastasis. Our recognition model identified new proteins not previously known to harbor LD motifs, and we experimentally confirmed some of our predicted motifs. This novel discovery will expand our knowledge of cancer metastasis and will facilitate therapeutic targeting that links specific ncRNAs, via paxillin proteins, to diseases. Finally, through bioinformatics approaches, we identified lncRNAs as markers that distinguish classical from alternative activation of macrophages. This result may be useful in the diagnosis of infectious diseases. Long non-coding RNA MiRNA LD motif Macrophage Machine Learning Bioinformatics
1298	A Hybrid Approach to General Information Extraction Grap, Marie Belen 01 September 2015 Information Extraction (IE) is the process of analyzing documents and identifying desired pieces of information within them. Many IE systems have been developed over the last couple of decades, but there is still room for improvement, as IE remains an open problem for researchers. This work discusses the development of a hybrid IE system that attempts to combine the strengths of rule-based and statistical IE systems while avoiding their unique pitfalls, in order to achieve high performance for any type of information on any type of document. Test results show that this system operates competitively when the target information belongs to a highly structured data type and when critical contextual information is in close proximity to the target. information extraction machine learning natural language processing Computational Engineering
1299	A Data-Driven Approach to Cubesat Health Monitoring Singh, Serbinder 01 June 2017 Spacecraft health monitoring is essential to ensure that a spacecraft is operating properly and has no anomalies that could jeopardize its mission. Many of the current methods of monitoring system health become difficult to use as the complexity of spacecraft increases, and are in many cases impractical on CubeSat satellites, which have strict size and resource limitations. To overcome these problems, new data-driven techniques such as the Inductive Monitoring System (IMS) use data mining and machine learning on archived system telemetry to create models that characterize nominal system behavior. The models that IMS creates take the form of clusters that capture the relationship between a set of sensors in time series data. Each of these clusters defines a nominal operating state of the satellite and the range of sensor values that represent it. These characterizations can then be autonomously compared against real-time telemetry on board the spacecraft to determine whether the spacecraft is operating nominally. This thesis presents an adaptation of IMS to create a spacecraft health monitoring system for CubeSat missions developed by the PolySat lab. This system is integrated into PolySat's flight software and provides real-time health monitoring of the spacecraft during its mission. Any anomalies detected are reported, and further analysis can be done to determine the cause. The system can also be used for the analysis of archived events. The IMS algorithms used by the system were validated, and ground testing was done to determine the performance, reliability, and accuracy of the system. The system successfully detected and identified known anomalies in archived flight telemetry from the IPEX mission. In addition, real-time monitoring performed on the satellite yielded great results that give us confidence in the use of this system in all future missions. 
Cubesat machine learning satellites clustering Computer and Systems Architecture
1300	Prédire la structure des forêts à partir d'images PolInSAR par apprentissage de descripteurs LIDAR / Prediction of forests structure from PolInSAR images by machine learning using LIDAR derived features Brigot, Guillaume 20 December 2017 The objective of this thesis is to predict the structural parameters of forests on a large scale using remote sensing images. The approach is to extend the accuracy of LIDAR full waveforms to the larger area covered by polarimetric and interferometric synthetic aperture radar (PolInSAR) images, using the LIDAR data, where available, as training data for machine learning methods. 
From the analysis of the geometric properties of the PolInSAR coherence shape, we proposed a set of parameters that are likely to have a strong correlation with the LIDAR density profiles on forest lands. These features were used as input data for SVM techniques, neural networks, and random forests, in order to learn a set of forest descriptors derived from LIDAR: the canopy height, the vertical profile type, and the canopy cover. The application of these techniques to airborne data over boreal forests in Sweden and Canada, and the evaluation of their accuracy, demonstrate the relevance of the method. This approach can soon be adapted for future satellite missions dedicated to the forest: Biomass, Tandem-L and NiSAR. PolInSAR LIDAR Forestry Machine learning

Search results