1

Conservation by Consensus: Reducing Uncertainty from Methodological Choices in Conservation-based Models

Poos, Mark S. 01 September 2010 (has links)
Modeling species of conservation concern, such as those that are rare, declining, or have a conservation designation (e.g., endangered or threatened), remains an activity filled with uncertainty. Species of conservation concern are often found infrequently, in small numbers, and in spatially fragmented distributions, making accurate enumeration difficult and traditional statistical approaches often invalid. For example, there are numerous debates in the ecological literature regarding methodological choices in conservation-based models, such as how to measure functional traits to account for ecosystem function, the impact of including rare species in biological assessments, and whether species-specific dispersal can be measured using distance-based functions. This thesis addresses methodological choices in conservation-based models in two ways. In the first section, the impacts of methodological choices are examined across a broad selection of available approaches, from measuring functional diversity, to conducting bio-assessments in community ecology, to assessing dispersal in metapopulation analyses. The goal of this section is to establish the potential for methodological choices to affect conservation-based models, regardless of the scale, study system, or species involved. In the second section, consensus methods are developed as a potential tool for reducing the uncertainty introduced by methodological choices. Two applications of consensus methods are highlighted: reducing the uncertainty that comes from choosing a model type, and identifying when methodological choices may be a problem.
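As a hedged illustration of the consensus idea described in this abstract, the sketch below fits several model types to the same simulated presence/absence data and averages their predicted probabilities, using the spread between models as a flag for methodological sensitivity. The data, model choices, and variable names are illustrative assumptions, not those used in the thesis.

```python
# Consensus across model types: average predicted probabilities so the final
# prediction does not hinge on a single methodological choice. Data and models
# are illustrative stand-ins, not those used in the thesis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Simulated presence/absence records with few positives (a "rare" species).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": GaussianNB(),
}

probs = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    probs.append(model.predict_proba(X_te)[:, 1])

probs = np.vstack(probs)
consensus = probs.mean(axis=0)   # consensus prediction across model types
spread = probs.std(axis=0)       # disagreement flags methodological sensitivity
print("mean disagreement between model types:", spread.mean().round(3))
```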
2

EXPLORING ENSEMBLE MODELS AND GAN-BASED APPROACHES FOR AUTOMATED DETECTION OF MACHINE-GENERATED TEXT

Surbhi Sharma (18437877) 29 April 2024 (has links)
Automated detection of machine-generated text has become increasingly crucial in various fields such as cybersecurity, journalism, and content moderation due to the proliferation of generated content, including fake news, spam, and bot-generated comments. Traditional methods for detecting such content often rely on rule-based systems or supervised learning approaches, which may struggle to adapt to evolving generation techniques and sophisticated manipulations. In this thesis, we explore the use of ensemble models and Generative Adversarial Networks (GANs) for the automated detection of machine-generated text.

Ensemble models combine the strengths of different approaches, such as utilizing both rule-based systems and machine learning algorithms, to enhance detection accuracy and robustness. We investigate the integration of linguistic features, syntactic patterns, and semantic cues into machine learning pipelines, leveraging the power of Natural Language Processing (NLP) techniques. By combining multiple modalities of information, ensemble models can effectively capture the subtle characteristics and nuances inherent in machine-generated text, improving detection performance.

In my latest experiments, I examined the performance of a Random Forest classifier trained on TF-IDF representations in combination with RoBERTa embeddings to calculate probabilities for machine-generated text detection. Test1 results showed promising accuracy rates, indicating the effectiveness of combining TF-IDF with RoBERTa probabilities. Test2 further validated these findings, demonstrating improved detection performance compared to standalone approaches.

These results suggest that combining a Random Forest TF-IDF representation with RoBERTa-derived probabilities can enhance the detection accuracy of machine-generated text.

Furthermore, we delve into the application of GAN-RoBERTa, a class of deep learning models comprising a generator and a discriminator trained adversarially, for generating and detecting machine-generated text. GANs have demonstrated remarkable capabilities in generating realistic text, making them a potential tool for adversaries to produce deceptive content. However, this same adversarial nature can be harnessed for detection purposes, where the discriminator is trained to distinguish between genuine and machine-generated text.

Overall, our findings suggest that the use of ensemble models and GAN-RoBERTa architectures holds significant promise for the automated detection of machine-generated text. Through a combination of diverse approaches and adversarial training techniques, we have demonstrated improved detection accuracy and robustness, thereby addressing the challenges posed by the proliferation of generated content across various domains. Further research and refinement of these approaches will be essential to stay ahead of evolving generation techniques and ensure the integrity and trustworthiness of textual content in the digital landscape.
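A minimal sketch of the kind of combination described above: a Random Forest trained on TF-IDF features whose probability is averaged with the score of a RoBERTa-based detector. The checkpoint name, label handling, equal weighting, and toy training texts are assumptions for illustration, not the configuration used in the thesis.

```python
# Sketch: combine a TF-IDF + Random Forest classifier with probabilities from a
# RoBERTa-based detector by averaging the two scores. Checkpoint, label handling,
# weighting, and toy texts are illustrative assumptions, not the thesis setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

train_texts = ["a human written essay about travel",
               "generated boilerplate text about travel"]
train_labels = [0, 1]  # 0 = human, 1 = machine-generated

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, train_labels)

# Hypothetical RoBERTa-based detector; any text-classification checkpoint with a
# "machine-generated" style label could be substituted here.
detector = pipeline("text-classification", model="roberta-base-openai-detector")

def combined_score(text: str) -> float:
    """Average the Random Forest and RoBERTa probabilities of being machine-generated."""
    p_rf = rf.predict_proba(vectorizer.transform([text]))[0, 1]
    out = detector(text)[0]
    # Label mapping is an assumption about the chosen checkpoint.
    p_roberta = out["score"] if out["label"].lower() == "fake" else 1 - out["score"]
    return 0.5 * p_rf + 0.5 * p_roberta

print(combined_score("This text may or may not be machine generated."))
```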
3

Deep Learning for Spatiotemporal Nowcasting

Franch, Gabriele 08 March 2021 (has links)
Nowcasting – short-term forecasting using current observations – is a key challenge that human activities face on a daily basis. We heavily rely on short-term meteorological predictions in domains such as aviation, agriculture, mobility, and energy production. One of the most important and challenging tasks in meteorology is the nowcasting of extreme events, whose anticipation is essential to mitigate risks in terms of social or economic costs and human safety. The goal of this thesis is to contribute new machine learning methods that improve the spatio-temporal precision of nowcasting of extreme precipitation events. This work builds on recent advances in deep learning for nowcasting, adding ensemble-based methods trained on new, purpose-built data resources. A new curated multi-year radar scan dataset (TAASRAD19) is introduced, containing more than 350,000 labelled precipitation records over 10 years, to provide a baseline benchmark and foster reproducibility of machine learning modeling. A TrajGRU model is applied to TAASRAD19 and implemented in an operational prototype. The thesis also introduces a novel method for fast analog search based on manifold learning: the tool searches the entire dataset history in less than 5 seconds and demonstrates the feasibility of predictive ensembles. In the final part of the thesis, the new deep learning architecture ConvSG, based on stacked generalization, is presented, introducing novel concepts for deep learning in precipitation nowcasting: ConvSG is specifically designed to improve predictions of extreme precipitation regimes over published methods, and shows a 117% skill improvement on extreme rain regimes over a single member. Moreover, ConvSG shows superior or equal skill compared to Lagrangian extrapolation models for all rain rates, achieving a 49% average improvement in predictive skill over extrapolation in the higher precipitation regimes.
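ConvSG itself is a convolutional architecture, but the stacked-generalization principle it builds on can be sketched compactly: forecasts from several base nowcasting members become the inputs of a second-level model fit on held-out cases. Everything below (the synthetic rain rates, the three toy members, and the ridge combiner) is a simplified stand-in, not the thesis implementation.

```python
# Conceptual sketch of stacked generalization: the forecasts of several base
# "nowcasting members" become features for a second-level model fit on held-out
# cases. Data and models are simplified stand-ins, not ConvSG.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_cases = 500
truth = rng.gamma(shape=2.0, scale=3.0, size=n_cases)            # "observed" rain rates

# Three imperfect base members (e.g., extrapolation and learned models).
members = np.stack([
    truth + rng.normal(0, 2.0, n_cases),        # noisy member
    0.7 * truth + rng.normal(0, 1.0, n_cases),  # member that underestimates extremes
    truth.mean() + rng.normal(0, 1.5, n_cases)  # member close to climatology
], axis=1)

# Fit the second-level (stacking) model on the first half, evaluate on the rest.
train, test = slice(0, 250), slice(250, None)
stacker = Ridge(alpha=1.0).fit(members[train], truth[train])
blend = stacker.predict(members[test])

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

print("best single member RMSE:", min(rmse(members[test, k], truth[test]) for k in range(3)))
print("stacked ensemble RMSE:  ", rmse(blend, truth[test]))
```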
4

A Machine Learning-Based Heuristic to Explain Game-Theoretic Models

Baswapuram, Avinashh Kumar 17 July 2024 (has links)
This paper introduces a novel methodology that integrates Machine Learning (ML), Operations Research (OR), and Game Theory (GT) to develop an interpretable heuristic for principal-agent models (PAM). We extract solution patterns from ensemble tree models trained on solved instances of a PAM. Using these patterns, we develop a hierarchical tree-based approach that forms an interpretable ML-based heuristic to solve the PAM. This method ensures the interpretability, feasibility, and generalizability of ML predictions for game-theoretic models. The predicted solutions from this ensemble model-based heuristic are consistently high quality and feasible, significantly reducing computational time compared to traditional optimization methods for solving the PAM. Specifically, the computational results demonstrate the generalizability of the ensemble heuristic across varying problem sizes, achieving high prediction accuracy with optimality gaps between 1-2% and significant improvements in solution times. Our ensemble model-based heuristic, on average, requires only 4.5 of the 9 input features to explain its predictions effectively for a particular application. Therefore, our ensemble heuristic enhances the interpretability of game-theoretic optimization solutions, simplifying explanations and making them accessible to those without expertise in ML or OR. Our methodology adds to the approaches for interpreting ML predictions while also improving the numerical tractability of PAMs, thereby enhancing policy design and operational decisions and advancing real-time decision support where understanding and justifying decisions is crucial. / Master of Science / This paper introduces a new method that combines Machine Learning (ML) with Operations Research (OR) to create a clear and understandable approach for solving a principal-agent model (PAM). We use patterns from a group of decision trees to form an ML-based strategy for predicting solutions that greatly reduces the time to solve the problem compared to traditional optimization techniques. Our approach works well for problems of different sizes, maintaining high accuracy with very small differences in objective function value from the best possible solutions (1-2%). The predicted solutions are consistently high quality and practical, significantly reducing the time needed compared to traditional optimization methods. Remarkably, our heuristic typically uses only 4.5 of the 9 input features to explain its predictions, making it much simpler and more interpretable than other methods. The results show that our method is both efficient and effective, with faster solution times and better accuracy. Our method can make complex game-theoretic optimization solutions more understandable, even for those without expertise in ML or OR. By improving interpretability and making PAMs analytically explainable, our approach supports better policy design and operational decision-making, advancing real-time decision support where clarity and justification of decisions are essential.
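A rough sketch of the general recipe, under the assumption of a toy principal-agent setting: a tree ensemble is trained on parameters and solutions of previously solved instances, its predictions are checked against held-out optima, and its feature importances indicate how few inputs drive the heuristic. The features, targets, and data below are hypothetical placeholders rather than the model studied in the thesis.

```python
# Sketch of the general recipe: learn a tree-based mapping from problem
# parameters to solutions of previously solved instances, then inspect which
# features the trees actually use. PAM features, targets, and data here are
# hypothetical placeholders, not the model studied in the thesis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_instances, n_features = 1000, 9
X = rng.uniform(size=(n_instances, n_features))        # problem parameters
# Pretend the optimal contract variable depends mainly on a few parameters.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(0, 0.05, n_instances)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
heuristic = RandomForestRegressor(n_estimators=300, random_state=1).fit(X_tr, y_tr)

pred = heuristic.predict(X_te)
gap = np.abs(pred - y_te) / np.maximum(np.abs(y_te), 1e-9)
print("median relative gap vs. solved instances:", float(np.median(gap)))

# Feature importances indicate how few inputs are needed to explain predictions.
ranked = np.argsort(heuristic.feature_importances_)[::-1]
print("features ranked by importance:", ranked)
```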
5

Ensemble multi-label learning in supervised and semi-supervised settings

Gharroudi, Ouadie 21 December 2017 (has links)
Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. Multi-label learning is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging, and image annotation. In this thesis, we formulate multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and feature selection tasks, while remaining consistent with respect to any type of objective loss function. We first discuss why state-of-the-art multi-label algorithms using a committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate k-labelsets-based committees in the context of ensemble multi-label classification. We then analyze the effect of the aggregation step within ensemble multi-label approaches in depth and investigate how this aggregation impacts prediction performance with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features - among potentially irrelevant and redundant features - in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they consider label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only a few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm. The proposed model combines ideas from co-training and multi-label k-labelsets committee construction in tandem with an inner out-of-bag label feature importance evaluation. Satisfactorily tested on several benchmark datasets, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning.
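A small sketch of Random-Forest-based feature scoring in the multi-label setting: one forest per label, with importances averaged across labels. This ignores label dependence and uses standard impurity importances rather than out-of-bag evaluation, so it corresponds only to the simplest variant discussed above; the data are synthetic placeholders.

```python
# Sketch of Random-Forest-based multi-label feature scoring: fit one forest per
# label and average impurity importances across labels. Label dependence is
# ignored here; the data are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

X, Y = make_multilabel_classification(n_samples=400, n_features=20,
                                       n_classes=5, random_state=0)

importances = np.zeros(X.shape[1])
for j in range(Y.shape[1]):
    rf = RandomForestClassifier(n_estimators=200, random_state=j)
    rf.fit(X, Y[:, j])                  # one binary-relevance forest per label
    importances += rf.feature_importances_

importances /= Y.shape[1]
top = np.argsort(importances)[::-1][:5]
print("top-ranked features (averaged across labels):", top)
```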
6

Modeling Continental-Scale Outdoor Environmental Sound Levels with Limited Data

Pedersen, Katrina Lynn 01 January 2021 (has links) (PDF)
Modeling outdoor acoustic environments is a challenging problem because outdoor acoustic environments are the combination of diverse sources and propagation effects, including barriers to propagation such as buildings or vegetation. Outdoor acoustic environments are most commonly modeled on small geographic scales (e.g., within a single city). Extending modeling efforts to continental scales is particularly challenging due to the increased variety of geographic environments. Furthermore, acoustic data on which to train and validate models are expensive to collect and therefore relatively limited. It is unclear how models trained on this limited acoustic data will perform at continental scales, which likely contain unique geographic regions that are not represented in the training data. In this dissertation, we consider the problem of continental-scale outdoor environmental sound level modeling using the contiguous United States as our area of study. We use supervised machine learning methods to produce models of various acoustic metrics and unsupervised learning methods to study the natural structures in geospatial data. We present a validation study of two continental-scale models which demonstrates the need for better uncertainty quantification and tools to guide data collection. Using ensemble models, we investigate methods for quantifying uncertainty in continental-scale models. We also study methods of improving model accuracy, including dimensionality reduction, and explore the feasibility of predicting hourly spectral levels.
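A brief sketch of the ensemble-based uncertainty idea: several regressors are trained on bootstrap resamples of the monitored sites, and the spread of their predictions at a new location is used as an uncertainty estimate. The geospatial features and sound-level targets below are simulated placeholders, not the data used in the dissertation.

```python
# Sketch of ensemble-based uncertainty: train several regressors on resampled
# data and use their spread at a prediction site as an uncertainty estimate.
# Geospatial features and sound-level targets are simulated placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 6))                                   # e.g., land cover, population, terrain
y = 40 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 1.0, 500)    # dB-like target

members = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)   # bootstrap resample of training sites
    members.append(RandomForestRegressor(n_estimators=100, random_state=seed).fit(Xb, yb))

X_new = rng.uniform(size=(5, 6))                 # unmonitored locations
preds = np.stack([m.predict(X_new) for m in members])
print("ensemble mean prediction:", preds.mean(axis=0).round(1))
print("ensemble spread (uncertainty):", preds.std(axis=0).round(2))
```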
7

Strategies for Combining Tree-Based Ensemble Models

Zhang, Yi 01 January 2017 (has links)
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal is to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods – random forest, extremely randomized trees, and eXtreme gradient boosting – were used to generate a set of base models. Outputs from the classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public-domain data sets which have been extensively used for benchmarking classification models. The research established that using random forest as the final ensemble method to integrate the selected base models and the factor scores from multiple correspondence analysis was the best ensemble approach.
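A compact sketch of the combination strategy described above, treating three tree-based ensembles as base models and a random forest as the final combiner. GradientBoostingClassifier stands in for XGBoost, the multiple correspondence analysis step is omitted, and the benchmark dataset is a placeholder, so this is a hedged illustration rather than the dissertation's pipeline.

```python
# Sketch of combining tree-based ensembles: treat them as base models and use a
# random forest as the final combiner over their predicted probabilities.
# GradientBoostingClassifier stands in for XGBoost; MCA factor scores are omitted.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # benchmark-style placeholder data

base_models = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("extra_trees", ExtraTreesClassifier(n_estimators=200, random_state=0)),
    ("boosting", GradientBoostingClassifier(random_state=0)),
]
combiner = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    stack_method="predict_proba",            # combine base-model probabilities
)
print("cross-validated accuracy:", cross_val_score(combiner, X, y, cv=5).mean().round(3))
```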
8

Protein-drug binding affinity prediction with machine learning : Assessing the impact of features from molecular dynamic simulations

Guttormsson, Guðmundur Andri, Le Gallo, Léa January 2024 (has links)
The development of medicine is generally a long and costly process, and one major factor is estimating the affinity of protein-drug binding. Leveraging machine learning in this field is a promising approach, as it can streamline the prediction process and reduce the need for expensive experimental methods. Machine learning methods have already enabled significant advances in predicting protein-drug binding affinity, yet there remains room for improvement. The primary challenge is the quality of the data used for these machine learning models. In this work, two ensemble machine learning models, Random Forest and Extreme Gradient Boosting Machine, have been tested and compared on a recent database of protein-ligand complex features calculated from molecular dynamics simulations. Additional features were also extracted from the PDB database through PLIP (Protein-Ligand Interaction Profiler), aiming to improve the predictions further. The results indicate that while the features from the PDB database provided strong predictive power, including features from molecular dynamics simulations did not improve the models’ performance.
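A hedged sketch of the comparison described above: the same ensemble regressor is fit once on static (PLIP-style) features alone and once on static plus MD-derived features, and cross-validated errors are compared. The simulated data are placeholders constructed only to show the workflow, not to reproduce the thesis results.

```python
# Sketch of the feature-set comparison: fit the same ensemble regressor on
# static (PLIP-style) features alone and on static + MD-derived features, then
# compare cross-validated error. Data and feature names are simulated placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
static_features = rng.normal(size=(n, 10))   # e.g., interaction counts from PLIP
md_features = rng.normal(size=(n, 5))        # e.g., averages over MD trajectories
affinity = static_features[:, 0] - 0.5 * static_features[:, 1] + rng.normal(0, 0.3, n)

model = RandomForestRegressor(n_estimators=300, random_state=0)
for name, X in [("static only", static_features),
                ("static + MD", np.hstack([static_features, md_features]))]:
    rmse = -cross_val_score(model, X, affinity, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: CV RMSE = {rmse:.3f}")
```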
