Global ETD Search

221	Genetic Programming Approach for Nonstationary Data Analytics Kuranga, Cry 16 February 2021 (has links) Nonstationary data with concept drift occurring is usually made up of different underlying data generating processes. Therefore, if the knowledge of the existence of different segments in the dataset is not taken into consideration, then the induced predictive model is distorted by the past existing patterns. Thus, the challenge posed to a regressor is to select an appropriate segment that depicts the current underlying data generating process to be used in a model induction. The proposed genetic programming approach for nonstationary data analytics (GPANDA) provides a piecewise nonlinear regression model for nonstationary data. GPANDA consists of three components: a dynamic differential evolution-based clustering algorithm that splits the parameter space into subspaces resembling the different data generating processes present in the dataset; a dynamic particle swarm optimization-based model induction technique that induces nonlinear models describing each generated cluster; and dynamic genetic programming that evolves model trees defining the boundaries of the nonlinear models, which are expressed as terminal nodes. If an environmental change is detected in a nonstationary dataset, the dynamic differential evolution-based clustering algorithm clusters the data. For the clusters that change, the dynamic particle swarm optimization-based model induction approach adapts nonlinear models or induces new models to create an updated genetic programming terminal set, and then the genetic programming evolves a piecewise predictive model to fit the dataset. To evaluate the effectiveness of GPANDA, experimental evaluations were conducted on both artificial and real-world datasets. Two stock market datasets, GDP and CPI were selected to benchmark the performance of the proposed model against the leading studies. 
GPANDA outperformed the genetic programming algorithms designed for dynamic environments and was competitive with state-of-the-art techniques. / Thesis (PhD)--University of Pretoria, 2020. / UP Postgraduate Research Bursary / Computer Science / PhD / Unrestricted Computational Intelligence Machine learning UCTD
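The piecewise structure described in the abstract above (a model tree whose internal nodes set boundaries and whose terminal nodes hold nonlinear models) can be sketched roughly as follows. This is only an illustration of the data structure: the class names `Split` and `Leaf`, the threshold, and the two leaf models are hypothetical, and nothing here reproduces the evolutionary search, clustering, or swarm-based model induction used in GPANDA.

```python
class Leaf:
    """Terminal node: holds one nonlinear model for one region of the input space."""

    def __init__(self, model):
        self.model = model  # any callable mapping a feature list to a prediction

    def predict(self, x):
        return self.model(x)


class Split:
    """Internal node: routes an input to a subtree by comparing one feature to a threshold."""

    def __init__(self, feature, threshold, left, right):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right

    def predict(self, x):
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)


# Hypothetical two-regime tree: a quadratic model below the boundary, a cubic one above it.
tree = Split(
    feature=0,
    threshold=1.0,
    left=Leaf(lambda x: x[0] ** 2),
    right=Leaf(lambda x: 0.5 * x[0] ** 3),
)
```

Calling `tree.predict([0.5])` evaluates the left-hand model and `tree.predict([3.0])` the right-hand one. In a setting with concept drift, one would presumably replace or re-fit only the leaves whose regions changed, which is the motivation the abstract gives for updating the terminal set.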
222	ASSESSING METHODS AND TOOLS TO IMPROVE REPORTING, INCREASE TRANSPARENCY, AND REDUCE FAILURES IN MACHINE LEARNING APPLICATIONS IN HEALTHCARE Unknown Date (has links) Artificial intelligence (AI) had a few false starts – the AI winters of the 1970s and 1980s. We are now in what looks like an AI summer. There are many useful applications of AI in the field. But there are still unfulfilled promises and outright failures. From self-driving cars that work only in constrained cases, to medical image analysis products that would replace radiologists but never did, we still struggle to translate successful research into successful real-world applications. The software engineering community has accumulated a large body of knowledge over the decades on how to develop, release, and maintain products. AI products, being software products, benefit from some of that accumulated knowledge, but not all of it. AI products diverge from traditional software products in fundamental ways: their main component is not a specific piece of code, written for a specific purpose, but a generic piece of code, a model, customized by a training process driven by hyperparameters and a dataset. Datasets are usually large and models are opaque. We cannot directly inspect them as we can inspect the code of traditional software products. We need other methods to detect failures in AI products. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection Machine learning Artificial intelligence Healthcare
223	Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings Bejaoui, Amine 23 April 2020 (has links) The use of quadratic discriminant analysis (QDA) or its regularized version (R-QDA) for classification is often not recommended, due to its well-acknowledged high sensitivity to the estimation noise of the covariance matrix. This is all the more the case in unbalanced data settings, for which it has been found that R-QDA becomes equivalent to the classifier that assigns all observations to the same class. In this paper, we propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias, properly chosen to avoid inappropriate behaviors of R-QDA in unbalanced settings and to ensure the best possible classification performance. The design of the proposed classifier builds on a refined asymptotic analysis of its performance when the number of samples and the number of features grow large simultaneously, which makes it possible to cope efficiently with the high dimensionality frequently met within the big data paradigm. The performance of the proposed classifier is assessed on both real and synthetic data sets and is shown to be much higher than what one would expect from a traditional R-QDA. statistics machine learning QDA RMT
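To make the ingredients named in the abstract above concrete (a regularization parameter per class and a bias term in the decision rule), here is a minimal sketch of a two-class quadratic discriminant in two dimensions. It assumes a simple ridge-style regularization, adding gamma times the identity to each class covariance, and an additive `bias` on the class-1 score. Both choices are illustrative assumptions; the thesis's actual regularization form and its asymptotically tuned parameters are not given in the abstract.

```python
import math


def regularize(cov, gamma):
    """Return cov + gamma * I for a 2x2 covariance matrix (assumed ridge-style shrinkage)."""
    return [[cov[0][0] + gamma, cov[0][1]],
            [cov[1][0], cov[1][1] + gamma]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


def qda_score(x, mean, cov, prior, gamma):
    """Gaussian log-discriminant for one class, using the regularized covariance."""
    reg = regularize(cov, gamma)
    prec = inv2(reg)
    dx = [x[0] - mean[0], x[1] - mean[1]]
    maha = (dx[0] * (prec[0][0] * dx[0] + prec[0][1] * dx[1])
            + dx[1] * (prec[1][0] * dx[0] + prec[1][1] * dx[1]))
    return -0.5 * math.log(det2(reg)) - 0.5 * maha + math.log(prior)


def classify(x, class0, class1, bias=0.0):
    """Return 1 if the class-1 score plus bias exceeds the class-0 score, else 0.

    class0 and class1 are (mean, cov, prior, gamma) tuples, so each class
    carries its own regularization parameter.
    """
    s0 = qda_score(x, *class0)
    s1 = qda_score(x, *class1)
    return 1 if s1 + bias > s0 else 0
```

With hypothetical parameters such as `class0 = ((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]], 0.9, 0.1)` and `class1 = ((3.0, 3.0), [[1.0, 0.0], [0.0, 1.0]], 0.1, 0.5)`, a point at the minority-class mean is still assigned to class 1 despite the 0.9 versus 0.1 priors. The `bias` argument shows where a corrected offset would enter the rule.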
224	Web Conference Summarization Through a System of Flags Ankola, Annirudh M 01 March 2020 (has links) In today’s world, we are always trying to find new ways to advance. This era has given rise to a global, distributed workforce since technology has allowed people to access and communicate with individuals all over the world. With the rise of remote workers, the need for quality communication tools has risen significantly. These communication tools come in many forms, and web-conference apps are among the most prominent for the task. Developing a system to automatically summarize the web-conference will save companies time and money, leading to more efficient meetings. Current approaches to summarizing multi-speaker web-conferences tend to yield poor or incoherent results, since conversations do not flow in the same manner that monologues or well-structured articles do. This thesis proposes a system of flags used to extract information from sentences, where the flags are fed into Machine Learning models to determine the importance of the sentence with which they are associated. The system of flags shows promise for multi-speaker conference summaries. NLP Data Science Machine Learning
225	Semantic Prioritization of Novel Causative Genomic Variants in Mendelian and Oligogenic Diseases Boudellioua, Imene 21 March 2019 (has links) Recent advances in Next Generation Sequencing (NGS) technologies have facilitated the generation of massive amounts of genomic data which in turn is bringing the promise that personalized medicine will soon become widely available. As a result, there is an increasing pressure to develop computational tools to analyze and interpret genomic data. In this dissertation, we present a systematic approach for interrogating patients’ genomes to identify candidate causal genomic variants of Mendelian and oligogenic diseases. To achieve that, we leverage the use of biomedical data available from extensive biological experiments along with machine learning techniques to build predictive models that rival the currently adopted approaches in the field. We integrate a collection of features representing molecular information about the genomic variants and information derived from biological networks. Furthermore, we incorporate genotype-phenotype relations by exploiting semantic technologies and automated reasoning inferred throughout a cross-species phenotypic ontology network obtained from human, mouse, and zebra fish studies. In our first developed method, named PhenomeNet Variant Predictor (PVP), we perform an extensive evaluation of a large set of synthetic exomes and genomes of diverse Mendelian diseases and phenotypes. Moreover, we evaluate PVP on a set of real patients’ exomes suffering from congenital hypothyroidism. We show that PVP successfully outperforms state-of-the-art methods, and provides a promising tool for accurate variant prioritization for Mendelian diseases. Next, we update the PVP method using a deep neural network architecture as a backbone for learning and illustrate the enhanced performance of the new method, DeepPVP on synthetic exomes and genomes. 
Furthermore, we propose OligoPVP, an extension of DeepPVP that prioritizes candidate oligogenic combinations in personal exomes and genomes by integrating knowledge from protein-protein interaction networks and we evaluate the performance of OligoPVP on synthetic genomes created by known disease-causing digenic combinations. Finally, we discuss some limitations and future steps for extending the applicability of our proposed methods to identify the genetic underpinning for Mendelian and oligogenic diseases. machine learning ontologies variant prioritization
226	Using Machine Learning and Text Mining Algorithms to Facilitate Research Discovery of Plant Food Metabolomics and Its Application for Human Health Benefit Targets Mathew, Jithin Jose January 2020 (has links) With the increase in scholarly articles published every day, the need for an automated systematic exploratory literature review tool is rising. With the advance in Text Mining and Machine Learning methods, such data exploratory tools are researched and developed in every scientific domain. This research aims at finding the best keyphrase extraction algorithm and topic modeling algorithm to serve as the foundation and main component of a tool that will aid in Systematic Literature Review. Based on experimentation on a set of highly relevant scholarly articles published in the domain of food science, two graph-based keyphrase extraction algorithms, TopicalPageRank and PositionRank, were picked as the best two among 9 keyphrase extraction algorithms for picking domain-specific keywords. Of the two topic modeling algorithms considered, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), the NMF method best classified the documents chosen in this research into suitable topics, as validated by a domain expert. This research lays the framework for faster tool development for Systematic Literature Review. data mining machine learning textmining
227	Predicting Depression and Suicide Ideation in the Canadian Population Using Social Media Data Skaik, Ruba 30 June 2021 (has links) The economic burden of mental illness costs Canada billions of dollars every year. Millions of people suffer from mental illness, and only a fraction receives adequate treatment. Identifying people with mental illness requires initiation from those in need, available medical services, and professional experts’ time allocation. These resources might not be available all the time. The common practice is to rely on clinical data, which is generally collected after the illness is developed and reported. Moreover, such clinical data is incomplete and hard to obtain. An alternative data source is conducting surveys through phone calls, interviews, or mail, but this is costly and time-consuming. Social media analysis has brought advances in leveraging population data to understand mental health problems. Thus, analyzing social media posts can be an essential alternative for identifying mental disorders throughout the Canadian population. Big data research of social media may also endorse standard surveillance approaches and provide decision-makers with usable information. More precisely, social media analysis has shown promising results for public health assessment and monitoring. In this research, we explore the task of automatically analysing social media textual data using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect signs of mental health disorders that need attention, such as depression and suicide ideation. Considering the lack of comprehensive annotated data in this field, we propose a methodology for transfer learning to utilize the information hidden in a training sample and leverage it on a different dataset to choose the best-generalized model to be applied at the population level. 
We also present evidence that ML models designed to predict suicide ideation using Reddit data can utilize the knowledge they encoded to make predictions on Twitter data, even though the two platforms differ in purpose, structure, and limitations. In our proposed models, we use feature engineering with supervised machine learning algorithms (such as SVM, LR, RF, XGBoost, and GBDT), and we compare their results with those of deep learning algorithms (such as LSTM, Bi-LSTM, and CNNs). We adopt the CNN model for depression classification that obtained the highest F1-score on the test dataset (0.898) and 0.941 recall. This model is later used to estimate the depression level of the population. For suicide ideation detection, we used the CNN model with pre-trained fastText word embeddings and linguistic features (LIWC). The model achieved an F1-score of 0.936 and a recall of 0.88 in predicting suicide ideation at the user level on the test set. To compare our models’ predictions with official statistics, we used the 2015-2016 population-based Canadian Community Health Survey (CCHS) on Mental Health and Well-being conducted by Statistics Canada. The data is used to estimate depression and suicidality in Canadian provinces and territories. For depression, respondents (n=53,050) from 8 provinces/territories filled in the Patient Health Questionnaire-9 (PHQ-9). Each survey respondent with a score ≥ 10 on the PHQ-9 was interpreted as having moderate to severe depression because this score is frequently used as a screening cut-point. The weighted percentage of depression prevalence during 2015 for females and males between the ages of 15 and 75 was 11.5% and 8.1%, respectively (with 54.2% females and 45.8% males). Our model was applied to a population-representative dataset that contains 24,251 Twitter users who posted 1,735,200 tweets during 2015, with a Pearson correlation of 0.88 for both sex and age within the seven provinces and NT territory included in the CCHS. 
A correlation of 0.95 was calculated for age and for sex (separately), and our model estimated that 10% of the sample dataset has evidence of depression (58.3% females and 41.7% males). For the second task, suicide ideation, Statistics Canada (2015) estimated the total number of people who reported serious suicidal thoughts as 3,396,700 persons, i.e., 9.514% of the total population, whereas our models estimated that 10.6% of the population sample were at risk of suicide ideation (59% females and 41% males). The Pearson correlation coefficients between the actual suicide ideation within the last 12 months and the model's predictions for each province, per age, per sex, and per both, were all above 0.62, which indicates a reasonable correlation. Machine Learning Natural Language Processing
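The two quantitative tools this abstract relies on for validation, the PHQ-9 screening cut-point and the Pearson correlation between survey estimates and model estimates, can be sketched as below. The cut-off of 10 comes from the abstract. The function names are hypothetical, and no survey or model data is reproduced here.

```python
import math

PHQ9_CUTOFF = 10  # a score >= 10 is read as moderate to severe depression (per the abstract)


def screens_positive(phq9_score):
    """Apply the PHQ-9 screening cut-point to a single respondent's total score."""
    return phq9_score >= PHQ9_CUTOFF


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In a comparison of this kind, `xs` would presumably hold the survey-based prevalence per province (or per age group, or per sex) and `ys` the corresponding model-based estimates, with one pair per stratum.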
228	Stratifying antimalarial compounds with similar mode of action using machine learning on chemo-transcriptomic profiles Van Heerden, Ashleigh January 2019 (has links) Malaria is a terrible disease caused by a protozoan parasite within the Plasmodium genus, claiming the lives of hundreds of thousands of people yearly, the majority of whom are children under the age of five. Of the five species of Plasmodium causing malaria in humans, P. falciparum is responsible for most of the death toll. An increase in malaria cases was detected between the years 2016 to 2017 according to the World Malaria Report of 2017, despite control efforts. The rapid development of resistance within P. falciparum against antimalarials has led to the use of artemisinin combinational therapy as the current gold standard for malaria treatment. Yet decreased parasite clearance demonstrates that using combination therapy is insufficient in maintaining current antimalarials’ effectiveness against these resistant parasites. Hence, novel compounds with a mode of action (MoA) different than current antimalarials are required. Though phenotypic screening has delivered thousands of promising hit compounds, hit-to-lead optimisation is still one of the rate-limiting steps in pre-clinical antimalarial drug development. While knowing the exact target or MoA is not required to progress a compound in a medicinal chemistry program, identifying the MoA early can accelerate hit prioritization, hit-to-lead optimisation and preclinical combination studies in malaria research. In this study, we assessed machine learning (ML) approaches for their ability to stratify antimalarials based on transcriptional responses associated with the treatments. From our results, we conclude that it is possible to identify biomarkers from the transcriptional responses that define the MoA of compounds. 
Moreover, only a limited set of 50 genes was required to build an ML model that can stratify compounds with similar MoA with a classification accuracy of 76.6 ± 6.4%. These biomarkers will help stratify new compounds with similar MoA to those already defined with our strategy. Additionally, the biomarkers can also be used to monitor if the MoA of a compound has changed during hit-to-lead optimisation. This work will contribute to accelerating antimalarial drug discovery during the hit-to-lead optimisation phase and help the identification of compounds with novel MoA. / Dissertation (MSc)--University of Pretoria, 2019. / Biochemistry / MSc / Unrestricted UCTD Machine learning Plasmodium falciparum
229	Learning-Based Approaches for Next-Generation Intelligent Networks Zhang, Liang 20 April 2022 (has links) The next-generation (6G) networks promise to provide extended 5G capabilities with enhanced performance at high data rates, low latency, low energy consumption, and rapid adaptation. 6G networks are also expected to support the unprecedented Internet of Everything (IoE) scenarios with highly diverse requirements. With the emerging applications of autonomous driving, virtual reality, and mobile computing, achieving better performance and fulfilling the diverse requirements of 6G networks are becoming increasingly difficult due to the rapid proliferation of wireless data and heterogeneous network structures. In this regard, learning-based algorithms are naturally powerful tools for handling large volumes of data and are expected to impact the evolution of communication networks. This thesis employs learning-based approaches to enhance the performance and fulfill the diverse requirements of the next-generation intelligent networks under various network structures. Specifically, we design the trajectory of the unmanned aerial vehicle (UAV) to provide energy-efficient, high data rate, and fair service for the Internet of things (IoT) networks by employing on/off-policy reinforcement learning (RL). Thereafter, we apply a deep RL-based approach for heterogeneous traffic offloading in the space-air-ground integrated network (SAGIN) to cover the co-existing requirements of ultra-reliable low-latency communication (URLLC) traffic and enhanced mobile broadband (eMBB) traffic. Precise traffic prediction can significantly improve the performance of 6G networks in terms of intelligent network operations, such as predictive network configuration control, traffic offloading, and communication resource allocation. Therefore, we investigate the wireless traffic prediction problem in edge networks by applying a federated meta-learning approach. 
Lastly, we design an importance-oriented clustering-based high quality of service (QoS) system with software-defined networking (SDN) by adopting unsupervised learning. Intelligent Network Machine Learning 6G
230	Stochastic gradient descent for pairwise learning : stability and optimization error Shen, Wei 19 August 2019 (has links) In this thesis, we study the stability and its trade-off with optimization error for stochastic gradient descent (SGD) algorithms in the pairwise learning setting. Pairwise learning refers to a learning task which involves a loss function depending on pairs of instances among which notable examples are bipartite ranking, metric learning, area under ROC curve (AUC) maximization and minimum error entropy (MEE) principle. Our contribution is twofold. Firstly, we establish the stability results for SGD for pairwise learning in the convex, strongly convex and non-convex settings, from which generalization errors can be naturally derived. Moreover, we also give the stability results of buffer-based SGD and projected SGD. Secondly, we establish the trade-off between stability and optimization error of SGD algorithms for pairwise learning. This is achieved by lower-bounding the sum of stability and optimization error by the minimax statistical error over a prescribed class of pairwise loss functions. From this fundamental trade-off, we obtain lower bounds for the optimization error of SGD algorithms and the excess expected risk over a class of pairwise losses. In addition, we illustrate our stability results by giving some specific examples and experiments of AUC maximization and MEE.
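As a rough illustration of the pairwise learning setting in the abstract above, the sketch below runs SGD on a loss defined over pairs of instances, here a logistic surrogate for AUC maximization with a linear scorer. The surrogate, the step size, the sampling scheme (one positive and one negative drawn uniformly per step), and the function names are assumptions made for illustration. The thesis's own algorithms, including the buffer-based and projected variants, are not reproduced.

```python
import math
import random


def pairwise_sgd(pos, neg, steps=2000, lr=0.1, seed=0):
    """SGD for a linear scorer w, minimizing log(1 + exp(-w . (x_pos - x_neg))) over sampled pairs."""
    rng = random.Random(seed)
    w = [0.0] * len(pos[0])
    for _ in range(steps):
        x_pos = rng.choice(pos)
        x_neg = rng.choice(neg)
        diff = [a - b for a, b in zip(x_pos, x_neg)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        # Gradient of the pair loss w.r.t. w is -sigmoid(-margin) * diff;
        # the margin is capped only to keep exp() from overflowing.
        coef = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
        w = [wi + lr * coef * di for wi, di in zip(w, diff)]
    return w


def auc(w, pos, neg):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly, ties counted as half."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    total = 0.0
    for p in pos:
        for n in neg:
            sp, sn = score(p), score(n)
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(pos) * len(neg))
```

Because each update depends on a pair rather than a single example, replacing one training instance can affect many pair losses at once, which is one intuition for why stability in this setting needs its own analysis.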
