31. A Survey of Systems for Predicting Stock Market Movements, Combining Market Indicators and Machine Learning Classifiers
Caley, Jeffrey Allan (14 March 2013)
In this work, we propose and investigate a series of methods for predicting stock market movements. These methods use stock market technical and macroeconomic indicators as inputs to different machine learning classifiers. The objective is to survey existing domain knowledge and combine multiple techniques into one method for predicting daily market movements of stocks. Approaches using nearest neighbor classification, support vector machine classification, K-means classification, principal component analysis, and genetic algorithms for feature reduction and for redefining the classification rule were explored. Ten securities, nine company stocks and one index, were used to evaluate each iteration of the trading method. The classification rate, modified Sharpe ratio, and profit gained over the test period are used to evaluate each strategy. The findings showed that nearest neighbor classification with genetic-algorithm input feature reduction produced the best results, achieving higher profits than buy-and-hold for a majority of the companies.
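The best-performing pipeline above, a genetic algorithm selecting an input feature subset for a nearest-neighbor classifier, can be sketched as follows. This is a minimal illustration under assumed parameters (population size, mutation rate, synthetic indicator data), not the author's exact configuration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for daily technical/macroeconomic indicators:
# 500 trading days x 20 indicators, with a binary up/down label.
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

def fitness(mask):
    """Cross-validated accuracy of kNN on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

# A deliberately tiny genetic algorithm: bit-mask chromosomes,
# truncation selection, uniform crossover, bit-flip mutation.
pop = rng.integers(0, 2, size=(20, X.shape[1]))
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]           # keep the best half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        flip = rng.random(X.shape[1]) < 0.05                  # bit-flip mutation
        child = np.where(flip, 1 - child, child)
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

In a real backtest the labels would come from next-day price movements and fitness could use the modified Sharpe ratio rather than classification accuracy.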
32. The role of model implementation in neuroscientific applications of machine learning
Abe, Taiga (January 2024)
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation.
Existing research has shown that the performance of large-scale machine learning models (far more than that of traditional models such as linear regression) is meaningfully entangled with implementation choices such as the hardware components, operating system, software dependencies, and random seed on which a given model depends. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis less reliable). This dissertation presents two broad research directions that address these shortcomings.
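Seed-level implementation variability of the kind described above can be made concrete with a toy experiment: train the same architecture on the same data several times, varying only the random seed, and measure how often the resulting models disagree on individual predictions. This sketch uses a small scikit-learn network and synthetic data as stand-ins, not the models or datasets studied in the thesis:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Identical data, identical architecture; only the random seed
# (weight initialization and minibatch shuffling) differs between runs.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

preds = []
for seed in range(5):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed)
    net.fit(X_tr, y_tr)
    preds.append(net.predict(X_te))
preds = np.array(preds)

# Fraction of test points on which at least two seeds disagree: a simple
# prediction-level measure of implementation variability.
disagreement = (preds.min(axis=0) != preds.max(axis=0)).mean()
print(f"seed-level disagreement on {disagreement:.1%} of test points")
```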
First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS.
Second, I conduct two large-scale studies of the behavior of deep ensembles. Deep ensembles are a class of machine learning models that use implementation variability to improve the quality of model predictions, specifically by aggregating the predictions of deep networks across stochastic initialization and training. Deep ensembles thus provide both a way to control the impact of implementation variability (by aggregating predictions across random seeds) and a way to understand what kind of predictive diversity this particular form of implementation variability generates. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles and the mechanisms behind their success, and show that in many respects the behavior of deep ensembles resembles that of an appropriately chosen single neural network.

As a whole, this dissertation presents novel methods and insights on the role of implementation variability in large-scale machine learning models and, more generally, on the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large-scale machine learning in neuroscience, as well as longer-term goals to speed the adoption and improve the reliability of such methods in a scientific context.
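The defining operation of a deep ensemble, averaging the predicted probabilities of networks trained under different seeds, can be sketched in a few lines. The architecture, ensemble size, and data below are illustrative assumptions, not those of the studies in the thesis:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Train the same architecture under several random seeds, then average
# predicted probabilities across members to form the ensemble prediction.
members = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=s).fit(X_tr, y_tr)
    for s in range(5)
]
probs = np.mean([m.predict_proba(X_te) for m in members], axis=0)
ensemble_acc = (probs.argmax(axis=1) == y_te).mean()
single_accs = [m.score(X_te, y_te) for m in members]
print(f"ensemble accuracy: {ensemble_acc:.3f}; "
      f"member range: {min(single_accs):.3f} to {max(single_accs):.3f}")
```

Comparing the ensemble's accuracy against the spread of its members is one simple way to ask how much the aggregation itself contributes beyond a well-chosen single network.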
33. Computational modeling for identification of low-frequency single nucleotide variants
Hao, Yangyang (16 November 2015)
Indiana University-Purdue University Indianapolis (IUPUI)
Reliable detection of low-frequency single nucleotide variants (SNVs) carries great significance in many applications. In cancer genetics, the frequencies of somatic variants from tumor biopsies tend to be low due to contamination with normal tissue and tumor heterogeneity. Circulating tumor DNA monitoring also faces the challenge of detecting low-frequency variants because of the small percentage of tumor DNA in blood. Moreover, in population genetics, although pooled sequencing is cost-effective compared with individual sequencing, pooling dilutes the signal of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by multiple sources of error, especially next-generation sequencing artifacts. Existing methods are limited in sensitivity, mainly focus on frequencies around 5%, and mostly fail to consider differential, context-specific sequencing artifacts. To address this challenge, we developed a computational and experimental framework, RareVar, to reliably identify low-frequency SNVs from high-throughput sequencing data. For optimized performance, RareVar uses a supervised learning framework to model artifacts originating from different components of a specific sequencing pipeline, enabled by a customized, comprehensive benchmark dataset enriched with known low-frequency SNVs from the sequencing pipeline of interest. A genomic-context-specific sequencing error model was trained on the benchmark data to characterize systematic sequencing artifacts and to derive a position-specific detection limit for sensitive low-frequency SNV detection. A machine-learning algorithm then uses sequencing quality features to refine SNV candidates for higher specificity. RareVar outperformed existing approaches, especially at 0.5% to 5% frequency.
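The idea of a position-specific detection limit can be illustrated with a simple binomial error model: given the estimated sequencing error rate at a position and the read depth, find the smallest alternate-allele count that cannot plausibly be explained by errors alone. This is a hedged sketch of the general concept, not RareVar's actual model (which is context-specific and learned from benchmark data); the error rates and depth below are hypothetical:

```python
from scipy.stats import binom

def detection_limit(error_rate, depth, alpha=1e-6):
    """Smallest alternate-allele count k such that P(X >= k) under
    Binomial(depth, error_rate) falls below alpha, i.e. the read support
    needed to call a variant rather than a sequencing artifact."""
    k = 0
    # binom.sf(k - 1, ...) gives P(X >= k)
    while binom.sf(k - 1, depth, error_rate) > alpha:
        k += 1
    return k

# A position with a high context-specific error rate needs more supporting
# reads than a clean position at the same sequencing depth.
print(detection_limit(0.001, 5000))   # clean context
print(detection_limit(0.01, 5000))    # error-prone context
```

Dividing the resulting count by depth gives the minimum variant allele frequency detectable at that position, which is why context-specific error modeling matters most in the 0.5% to 5% range.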
We further explored the influence of statistical modeling on position-specific error modeling and found the zero-inflated negative binomial to be the best-performing distribution. When the analysis was replicated on an Illumina MiSeq benchmark dataset, the method adapted seamlessly to a technology with different biochemistry. RareVar enables sensitive detection of low-frequency SNVs across sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, early cancer detection, prognostic assessment, metastatic monitoring, and identification of relapse or acquired resistance.
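Why a zero-inflated negative binomial helps can be seen by computing a calling threshold from its tail: the negative binomial accommodates overdispersed error counts, while the zero-inflation term captures positions that produce no errors at all. The parameter values below are illustrative assumptions, not fitted values from the study:

```python
from scipy.stats import nbinom

def zinb_sf(k, pi, r, p):
    """P(X > k) under a zero-inflated negative binomial: with probability
    pi the position yields zero error reads; otherwise counts ~ NB(r, p)."""
    return (1 - pi) * nbinom.sf(k, r, p)

def call_threshold(pi, r, p, alpha=1e-6):
    """Smallest count k with P(X >= k) below alpha under the ZINB model."""
    k = 0
    while zinb_sf(k - 1, pi, r, p) > alpha:
        k += 1
    return k

# r=2.5, p=1/3 gives an error-count mean of 5 but variance 15 (overdispersed),
# so the threshold sits higher than a Poisson model with the same mean implies.
print(call_threshold(pi=0.3, r=2.5, p=1 / 3))
```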
34. Forest dynamics and climate change: a multi-scale analysis of structure, degradation, recovery, and greenhouse gas fluxes
Cooley, Savannah S. (January 2025)
This dissertation examines the dynamics of tropical forest landscapes through a multi-scale analysis of forest structure, recovery processes, and climate mitigation potential. With tropical forests increasingly promoted as a natural climate solution, rigorous assessment of their recovery dynamics and carbon sequestration potential is critical for effective policy design. Through four interconnected studies combining remote sensing, local ecological knowledge, and greenhouse gas measurements, this research reveals key insights about forest regeneration across local to global scales.
The research employs NASA's Global Ecosystem Dynamics Investigation (GEDI) lidar data, thermal remote sensing from NASA's ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS), and multiple optical satellite datasets to analyze forest structure and function. Methods include classification of forest structural types co-produced with local ecological knowledge, spatial hierarchical Bayesian modeling of thermal stress patterns in degraded forests, machine learning and spatial hierarchical Bayesian modeling of water use efficiency trajectories, and a mixed-effects gamma regression meta-analysis of greenhouse gas fluxes in recovering ecosystems.
Results from Chapter 1 demonstrated substantial structural differences between forest types, with mature forests showing 41% higher mean canopy height (29.40 m vs. 20.82 m) and 38% lower height variance compared to secondary forests. Chapter 2 revealed that forest degradation impacts canopy structure and thermal conditions, with burned forests showing sustained elevated temperatures that exceed critical physiological thresholds in 92.5% of sampled canopy area compared to 65.5% in intact forests.
Chapter 3 suggested that water use efficiency strongly influences biomass accumulation during forest recovery, with high-stress conditions resulting in 150 Mg ha⁻¹ lower biomass after 120 years of regeneration. The global-scale meta-analysis in Chapter 4 showed that while regenerating forests release more nitrous oxide and absorb less methane than mature forests (11.29 ± 8.16 Pg CO₂e yr⁻¹), the carbon sequestration benefits outweigh these greenhouse gas emissions across all studied biomes for at least 100 years post-recovery. These findings provide insights for tropical forest conservation and restoration strategies while contributing to our understanding of the role of tropical forests in climate change mitigation.