51 |
Comparing machine learning models and physics-based models in groundwater science
Boerman, Thomas Christiaan, 25 January 2022 (has links)
The use of machine learning techniques in tackling hydrological problems has significantly
increased over the last decade. Machine learning tools can provide alternatives or surrogates to complex and comprehensive methodologies such as physics-based numerical models.
Machine learning algorithms have been used in hydrology for estimating streamflow, runoff,
water table fluctuations and calculating the impacts of climate change on nutrient loading
among many other applications. In recent years we have also seen arguments for and
advances in combining physics-based models and machine learning algorithms for mutual
benefit. This thesis contributes to these advances by addressing two different groundwater problems, developing a machine learning approach for each and comparing it with previously developed physics-based models: i) estimating groundwater and surface water depletion
caused by groundwater pumping using artificial neural networks and ii) estimating a global
steady-state map of water table depth using random forests.
The first chapter outlines the purpose of this thesis and how it contributes to the overall scientific knowledge on the topic. The results of this research contribute to three of the twenty-three major unsolved problems in hydrology, as summarized by a collective of hundreds of hydrologists.
In the second chapter, we tested the potential of artificial neural networks (ANNs), a deep-learning tool, as an alternative method for estimating the source water of groundwater
abstraction compared to conventional methods (analytical solutions and numerical models).
Surrogate ANN models of three previously calibrated numerical groundwater models were
developed using hydrologically meaningful input parameters (e.g., well-stream distance and
hydraulic diffusivity) selected by predictor parameter optimization, combining hydrological
expertise and statistical methodologies (ANCOVA). The output parameters were three
transient sources of groundwater abstraction (shallow and deep storage release, and local
surface-water depletion). We found that the optimized ANNs have a predictive skill of up to
0.84 (R2, 2σ = ± 0.03) when predicting water sources compared to physics-based numerical
(MODFLOW) models. Optimal ANN skill was obtained when using between five and seven
predictor parameters, with hydraulic diffusivity and mean aquifer thickness being the most
important predictor parameters. Even though the initial results are promising and computationally frugal, we found that the deep learning models were not yet sufficient to replace, or to outperform, the numerical model simulations.
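As a rough illustration of the surrogate idea described above (not the thesis's actual architecture, predictors, or MODFLOW training data, none of which are reproduced here), a minimal sketch with scikit-learn's MLPRegressor might map a handful of hydrologic predictors to the three source fractions and report held-out R2; all names and the synthetic data below are assumptions.

    # Hypothetical sketch of an ANN surrogate for source-water fractions.
    # The thesis trains on calibrated MODFLOW output; random numbers stand in here.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    # Assumed predictors: well-stream distance, hydraulic diffusivity, aquifer thickness, ...
    X = rng.random((2000, 6))
    # Assumed targets: fractions from shallow storage, deep storage, surface-water depletion
    Y = rng.dirichlet([2.0, 2.0, 2.0], size=2000)

    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    ann.fit(X_tr, Y_tr)                      # multi-output regression is supported
    print("held-out R2:", r2_score(Y_te, ann.predict(X_te)))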
The third chapter used random forests to map steady-state water table depth on a global scale (0.1° spatial resolution) and integrated the results to improve our understanding of scale and of perceptual modeling of global water table depth. In this study we used a spatially
biased ~1.5-million-point database of water table depth observations with a variety of
globally distributed above- and below-ground predictor variables with causal relationships to
steady-state water table depth. We mapped water table depth globally as well as at regional
to continental scales to interrogate performance, feature importance and hydrologic processes across scales and regions with varying hydrogeological landscapes and climates. The global water table depth map has a cross-validated correlation of R2 = 0.72, while our best-performing continental map (Australia) has a correlation of R2 = 0.86. The results of this study
surprisingly show that above-ground variables such as surface elevation, slope, drainage
density and precipitation are among the most important predictor parameters while
subsurface parameters such as permeability and porosity are notably less important. This is
contrary to conventional thought among hydrogeologists, who would assume that subsurface
parameters are very important. Overall, the machine learning results underestimate water table depth, similar to existing global physics-based groundwater models, and the differences are comparable to those between the physics-based groundwater models themselves.
The feature importance derived from our random forest models was used to develop
alternative perceptual models that highlight different water table depth controls between
areas with low relief and high relief. Finally, we considered the representativeness of the
prediction domain and the predictor database and found that 90% of the prediction domain
has a dissimilarity index lower than 0.75. We conclude that we see good extrapolation
potential for our random forest models to regions with unknown water table depth, except
for some high elevation regions.
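For readers unfamiliar with the workflow, a minimal sketch of a random-forest mapping of this kind is given below; the ~1.5-million-point observation database and global predictor grids are not reproduced, so the feature names and synthetic data are placeholders, and scikit-learn's RandomForestRegressor stands in for whatever implementation the thesis used.

    # Hypothetical sketch of the random-forest mapping workflow (illustrative only).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    features = ["elevation", "slope", "drainage_density", "precipitation",
                "permeability", "porosity"]          # assumed predictor names
    X = pd.DataFrame(rng.random((5000, len(features))), columns=features)
    y = rng.random(5000)                              # stand-in for observed water table depth

    rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=1)
    print("cross-validated R2:", cross_val_score(rf, X, y, cv=5, scoring="r2").mean())
    rf.fit(X, y)
    # Feature importances of this kind are what feed the perceptual models discussed above.
    for name, imp in sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1]):
        print(f"{name}: {imp:.3f}")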
Finally, in chapter four, the most important findings of chapters two and three are considered
as contributions to the unresolved questions in hydrology. Overall, this thesis has contributed
to advancing hydrological sciences through: i) mapping of global steady-state water table
depth using machine learning; ii) advancing hybrid modeling by using synthetic data derived
from physics-based models to train an artificial neural network for estimating storage
depletion; and iii) contributing to answering three unsolved problems in hydrology
involving themes of parameter scaling across temporal and spatial scales, extracting
hydrological insight from data, the use of innovative modeling techniques to estimate
hydrological fluxes/states and extrapolation of models to no-data regions. / Graduate
|
52 |
Next Stop Eastie: Using Machine Learning to Predict Socioeconomic Change in Boston and Beyond
LaPlante, Rita, January 2022 (has links)
Thesis advisor: Christopher Maxwell / This paper examines neighborhood socioeconomic ascent in both Boston and the Greater Boston metropolitan statistical area. Using random forests, a supervised machine learning algorithm, and a collection of physical and demographic neighborhood characteristics gathered from the American Community Survey, I model changes in neighborhood socioeconomic status and identify neighborhoods in my study area that experienced relative socioeconomic ascent or relative socioeconomic decline between 2010 and 2019. In order to gain a better understanding of future socioeconomic change throughout my study area, I use a random forests model to predict neighborhood socioeconomic status in 2028. I find that my best random forests model offers an improvement over traditional linear modeling techniques and, through mapping results for Boston specifically, that change in Boston is occurring in minority, working class neighborhoods, especially along the city’s waterfront. These findings, in combination with qualitative community data, can be used to inform policy concerning matters ranging from housing to transportation in the years to come. / Thesis (BA) — Boston College, 2022. / Submitted to: Boston College. College of Arts and Sciences. / Discipline: Departmental Honors. / Discipline: Economics.
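A hedged sketch of the kind of comparison described (a random forest versus a linear baseline for predicting change in a socioeconomic index) is given below; the ACS variables, tract data, and model settings are all stand-ins, not the paper's specification.

    # Hypothetical random forest vs. linear baseline on synthetic tract-level data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(2)
    X = rng.random((3000, 10))        # assumed ACS-style tract features (income, rent, tenure, ...)
    y = X @ rng.random(10) + 0.3 * np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(3000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)
    for name, model in [("linear", LinearRegression()),
                        ("random forest", RandomForestRegressor(n_estimators=300, random_state=2))]:
        model.fit(X_tr, y_tr)
        print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))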
|
53 |
Habitat Selection and Response to Disturbance by Pygmy Rabbits in Utah
Edgel, Robert John, 18 March 2013 (has links) (PDF)
The pygmy rabbit (Brachylagus idahoensis) is a sagebrush (Artemisia sp.) obligate that depends on sagebrush habitats for food and cover throughout its life cycle. Invasive species, frequent fires, overgrazing, conversion of land to agriculture, energy development, and many other factors have contributed to recent declines in both quantity and quality of the sagebrush-steppe habitats required by pygmy rabbits. Because of the many threats to these habitats and the believed decline of pygmy rabbit populations, there is a need to further understand habitat requirements for this species and how it responds to disturbance. This study evaluated habitat selection by pygmy rabbits in Utah and assessed the response of this small lagomorph to construction of a large-scale pipeline (i.e., the Ruby pipeline) in Utah. We collected habitat data across Utah at occupied sites (pygmy rabbit occupied burrows) and compared these data to similar measurements at unoccupied sites (random locations within sagebrush habitat where pygmy rabbits were not observed). Variables such as horizontal obscurity, elevation, percent understory composed of sagebrush and other shrubs, and sagebrush decadence best distinguished between occupied (active burrow) and unoccupied (randomly selected) sites. Occupied sites had greater amounts of horizontal obscurity, were located at higher elevations, had a greater percentage of understory composed of sagebrush and shrubs, and had less decadent sagebrush. When planning habitat alterations or management, these variables should be considered in order to enhance and protect existing habitat for pygmy rabbits. The Ruby pipeline was a large-scale pipeline project that required the removal of vegetation and the excavation of soil in a continuous linear path for the length of the pipeline. The area that was disturbed is referred to as the right of way (ROW). From our assessment of pygmy rabbit response to construction of the Ruby pipeline, we found evidence of habitat loss and fragmentation as a result of this disturbance. The size of pygmy rabbit space-use areas and home ranges decreased post-construction, rabbits shifted core-use areas away from the ROW, and there were fewer movements of collared rabbits across the ROW. Mitigation efforts should consider any action which may reduce restoration time and facilitate movements of rabbits across disturbed areas.
|
54 |
Predicting the Options Expiration Effect Using Machine Learning Models Trained With Gamma Exposure Data / Prediktion av inverkan på aktiemarknaden då optioner upphör med hjälp av maskininlärningsmodeller tränade med dagliga GEX värden
Dubois, Alexander, January 2022 (has links)
The option expiration effect is a well-studied phenomenon; however, few studies have implemented machine learning models to predict the effect of options expiration on the underlying stock market. In this paper, four machine learning models, SVM, random forest, AdaBoost, and LSTM, are evaluated on their ability to predict whether the underlying index rises or not on the day of option expiration. The options expiration effect is mainly driven by portfolio rebalancing made by market makers who aim to maintain delta-neutral portfolios. Whether or not market makers need to rebalance their portfolios depends on at least two variables: gamma and open interest. Hence, the machine learning models in this study use gamma exposure (i.e., a combination of gamma and open interest) to predict the options expiration effect. Furthermore, four architectures of LSTM are implemented and evaluated. The study shows that a three-layered many-to-one LSTM model achieves superior results with an F1 score of 62%. However, none of the models achieved better predictions than a model that predicts only positive classes. Some of the problems regarding gamma exposure are discussed and possible improvements for future studies are given. / Several studies have shown that the options market affects the stock market, especially at option expiration dates. However, few studies have examined the ability of machine learning models to predict this effect. In this study, four machine learning models (SVM, random forest, AdaBoost, and LSTM) are implemented and evaluated with the aim of predicting whether the underlying stock market rises at option expiration. The options market affects the stock market at expiration because market makers rebalance their portfolios to maintain a delta-neutral portfolio. Market makers' need to rebalance their portfolios depends on at least two variables: gamma and the number of active option contracts. The machine learning models in this study therefore use GEX, a combination of gamma and open interest, to predict whether the market rises at option expiration. Furthermore, four variants of LSTM models are implemented and evaluated. The study shows that a three-layer many-to-one LSTM model achieved the best results, with an F1 score of 62%. However, none of the models achieved better results than a model that predicts only positive classes. Finally, the problems of using GEX are discussed and recommendations for future studies are given.
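A minimal sketch of the two ingredients named above is given below: a per-strike gamma exposure (GEX) under one common sign convention (calls positive, puts negative), which may differ from the thesis's exact definition, and the F1 score of the always-positive baseline; all numbers are made up.

    # Hypothetical per-strike GEX under an assumed sign convention, plus the
    # F1 score of a baseline that always predicts "index rises".
    import numpy as np
    from sklearn.metrics import f1_score

    def gex(gamma, open_interest, is_call, spot, contract_size=100):
        # Sign convention is an assumption here, not the thesis's definition.
        sign = np.where(is_call, 1.0, -1.0)
        return sign * gamma * open_interest * contract_size * spot

    gamma = np.array([0.02, 0.015, 0.03])
    oi = np.array([1200, 800, 1500])
    is_call = np.array([True, False, True])
    print("total GEX:", gex(gamma, oi, is_call, spot=450.0).sum())

    y_true = np.array([1, 0, 1, 1, 0, 1, 1, 0])      # made-up expiration-day outcomes
    y_base = np.ones_like(y_true)                    # always-positive baseline
    print("baseline F1:", f1_score(y_true, y_base))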
|
55 |
Machine Learning for Inverse Design
Thomas, Evan, 08 February 2023 (links)
"Inverse design" formulates the design process as an inverse problem; optimal values of a parameterized design space are sought so to best reproduce quantitative outcomes from the forwards dynamics of the design's intended environment. Arguably, two subtasks are necessary to iteratively solve such a design problem; the generation and evaluation of designs. This thesis work documents two experiments leveraging machine learning (ML) to facilitate each subtask. Included first is a review of relevant physics and machine learning theory. Then, analysis on the theoretical foundations of ensemble methods realizes a novel equation describing the effect of Bagging and Random Forests on the expected mean squared error of a base model.
Complex models of design evaluation may capture environmental dynamics beyond those that are useful for a design optimization; capturing them incurs unnecessary time and computational costs. The first experiment attempts to avoid these costs by replacing EGSnrc, a Monte Carlo simulation of coupled electron-photon transport, with an efficient ML "surrogate model". To investigate the benefits of surrogate models, a simulated annealing design optimization is conducted twice to reproduce an arbitrary target design, once using EGSnrc and once using a random forest regressor as a surrogate model. It is found that using the surrogate model produced approximately a 100x speed-up and converged upon an effective design in fewer iterations. In conclusion, using a surrogate model is faster and (in this case) also more effective per iteration.
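A hedged sketch of the optimization loop described above, with a random-forest regressor standing in for the expensive simulator, is given below; the design parameterization, objective, and annealing schedule are assumptions, and EGSnrc itself is not involved.

    # Hypothetical simulated annealing over a design vector, scoring candidates with a
    # random-forest surrogate instead of the expensive simulator (EGSnrc in the thesis).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    X_sim = rng.random((500, 4))                       # assumed designs already simulated
    y_sim = ((X_sim - 0.5) ** 2).sum(axis=1)           # assumed objective: distance to a target design
    surrogate = RandomForestRegressor(n_estimators=200, random_state=3).fit(X_sim, y_sim)

    design = rng.random(4)
    cost = surrogate.predict(design.reshape(1, -1))[0]
    for temperature in np.geomspace(1.0, 1e-3, 1000):
        candidate = np.clip(design + 0.05 * rng.standard_normal(4), 0.0, 1.0)
        cand_cost = surrogate.predict(candidate.reshape(1, -1))[0]
        # Accept improvements outright, worse moves with Boltzmann probability.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temperature):
            design, cost = candidate, cand_cost
    print("best design found:", design, "surrogate cost:", cost)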
The second experiment of this thesis work leveraged machine learning for design generation. As a proof-of-concept design objective, the work seeks to efficiently sample 2D Ising spin model configurations from an optimized design space with a uniform distribution of internal energies. Randomly sampling configurations yields a narrow Gaussian distribution of internal energies. Convolutional neural networks (CNNs) trained with NeuroEvolution, a mutation-only genetic algorithm, were used to statistically shape the design space. Networks contribute to sampling by processing random inputs; their outputs are then regularized into acceptable configurations. Samples produced with CNNs had a more uniform distribution of internal energies and ranged across the entire space of possible values. In combination with conventional sampling methods, these CNNs can facilitate the sampling of configurations with uniformly distributed internal energies.
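As a concrete reference for the sampling objective, the sketch below computes the nearest-neighbour internal energy of 2D Ising configurations with periodic boundaries (coupling J = 1 assumed) and shows the narrow energy distribution obtained from uniform random sampling; the lattice size and sample count are arbitrary.

    # Hypothetical sketch: internal energy of 2D Ising configurations with periodic
    # boundaries; uniform random sampling concentrates energies near zero.
    import numpy as np

    def ising_energy(spins, J=1.0):
        # E = -J * sum of s_i * s_j over nearest-neighbour bonds, each bond counted once.
        right = np.roll(spins, -1, axis=1)
        down = np.roll(spins, -1, axis=0)
        return -J * np.sum(spins * right + spins * down)

    rng = np.random.default_rng(4)
    configs = rng.choice([-1, 1], size=(5000, 16, 16))
    energies = np.array([ising_energy(c) for c in configs])
    print("mean:", energies.mean(), "std:", energies.std())   # narrow band around zero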
|
56 |
Incorporating Climate Sensitivity for Eastern United States Tree Species into the Forest Vegetation Simulator
Jiang, Huiquan, 09 September 2015 (has links)
Detecting climate-induced effects in forest ecosystems becomes increasingly important as more evidence of greenhouse-gas-related climate change is found. The Forest Vegetation Simulator (FVS) is an important growth and yield model used to support management and planning on public forest lands across the southern United States; however, its prediction accuracy is limited by its climate-insensitive nature. The goal of this study was to develop species-specific prediction models for eastern U.S. forest tree species with climate and soil properties as predictors, in order to incorporate the effects of climate- and soils-based variables on forest growth and yield into FVS-Sn. Climate-sensitive models for site index, individual-tree mortality and diameter increment were developed separately, all using Random Forests on the basis of USDA Forest Service Forest Inventory and Analysis program data linked to contemporary climate data and soil properties mapped in the USDA Soil Survey Geographic (SSURGO) database. Results showed climate was a stronger driver of site index than soils. When soils and climate were used together, site index predictions for species grouped as conifers or hardwoods were almost as precise as species-specific models for many of the most common eastern forest tree species. A model comparison among Logistic Regression, Random Forests, and Artificial Neural Networks was conducted to identify the most suitable individual-tree mortality prediction model for the 20 most important species. Results showed that Random Forests with all indicators included generally performed well, especially for species with medium and high mortality. At a chosen threshold, it frequently achieved the equally highest values of sensitivity and specificity among the candidates. To evaluate the ability of the Random Forests model to predict individual-tree diameter increment, a Multiple Linear Regression model was built as a baseline for each of the 20 most common species in the eastern U.S. Comparison results showed that Random Forests had advantages in model validation and in future projection under climate change. Using the developed climate-sensitive models, multiple maps were produced to illustrate how forest tree growth, yield, and individual-tree mortality may change in the eastern U.S. over the 21st century under several climate change scenarios. / Ph. D.
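A hedged sketch of the mortality-model evaluation described above (a Random Forests classifier scored at a chosen probability threshold by sensitivity and specificity) is shown below; the predictors, data, and threshold are synthetic stand-ins rather than the study's inventory-based inputs.

    # Hypothetical individual-tree mortality classifier evaluated at an assumed threshold.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(5)
    X = rng.random((4000, 8))                       # assumed climate, soil and tree-size predictors
    y = (rng.random(4000) < 0.1 + 0.3 * X[:, 0]).astype(int)   # synthetic mortality indicator

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)
    rf = RandomForestClassifier(n_estimators=300, random_state=5).fit(X_tr, y_tr)
    prob = rf.predict_proba(X_te)[:, 1]
    pred = (prob >= 0.25).astype(int)               # assumed threshold balancing the two rates
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))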
|
57 |
Machine learning experiments with artificially generated big data from small immunotherapy datasets
Mahmoud, Ahsanullah Y., Neagu, Daniel, Scrimieri, Daniele, Abdullatif, Amr A.A., 13 December 2022 (has links)
Big data and machine learning result in agile and robust healthcare by expanding raw data into useful patterns for data-enhanced decision support. The available datasets are mostly small and unbalanced, resulting in non-optimal classification when the algorithms are applied. In this study, five novel machine learning experiments are conducted to address the challenges of small datasets by expanding these into big data and then utilising Random Forests. The experiments are based on personalised adaptable strategies for both balanced and unbalanced datasets. Multiple datasets from cryotherapy and immunotherapy are considered; however, only immunotherapy is used here. In the first experiment, artificially generated data is produced by increasing the observations of the dataset; each new dataset is four times larger than the previous one, resulting in better classification. In the second experiment, the effect of data volume on classification is considered based on the number of attributes. The attributes of each new dataset are built based on conditional probabilities. Increasing the number of attributes beyond 879 made no difference to the obtained classification. In the third experiment, a simulation, classes of data are assigned manually by dividing the data in a two-dimensional plane. This experiment is first performed on the small data and then on the expanded big data: by increasing observations, an accuracy of 73.68% is attained. In the fourth experiment, the visualisation of the enlarged data did not provide better insights. In the fifth experiment, the impact of correlations among the datasets' attributes on classification is examined; however, no improvements in performance are achieved. Overall, comparing the classification results obtained with the original and artificial data, the experiments generally improved performance.
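One simple way to realise the "expand a small dataset fourfold, then apply Random Forests" step is sketched below; it uses jittered copies of the original rows purely for illustration and is not the paper's conditional-probability generation scheme.

    # Hypothetical fourfold expansion of a small dataset with noisy copies, then Random Forests.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    X_small = rng.random((90, 7))                  # stand-in for a small immunotherapy dataset
    y_small = (X_small[:, 0] + X_small[:, 1] > 1.0).astype(int)

    def expand(X, y, factor=4, noise=0.02):
        # Return a dataset `factor` times larger, made of jittered copies of the original rows.
        Xs = np.vstack([X + noise * rng.standard_normal(X.shape) for _ in range(factor)])
        ys = np.tile(y, factor)
        return Xs, ys

    X_big, y_big = expand(X_small, y_small)
    rf = RandomForestClassifier(n_estimators=200, random_state=6)
    print("small   :", cross_val_score(rf, X_small, y_small, cv=5).mean())
    print("expanded:", cross_val_score(rf, X_big, y_big, cv=5).mean())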
|
58 |
Forecasting channel ranks in simulated 5G networks for carrier aggregation
Karlsson, Sebastian, January 2024 (links)
Carrier aggregation is a technology in wireless communications which allows a user to use multiple cells simultaneously for communication. In order to select cells, it is crucial to estimate their potential throughput for a given user. As a part of this estimate, we investigate how many MIMO layers a given channel can expect to use in the future, and whether machine learning can be used to predict the number of layers. Simulated user traces are used to generate training data, and special attention is directed at the construction of features based on user history. Random forests and multi-layer perceptrons are trained on the generated data, and we show that the random forests achieve better performance than baseline models, while the MLP models fail to learn and do not reach the expected performance. The importance of the used features is analysed, and we find that the history-based features are especially useful for predicting future channel ranks and thus are promising for use in a cell set selection system for carrier aggregation.
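A hedged sketch of the history-feature idea is given below: the last few observed ranks form the feature vector for a random forest that predicts the next rank, compared against a majority-class baseline; the trace, window length, and rank range are assumptions rather than the thesis's simulated 5G data.

    # Hypothetical rank forecasting from history-based features with a random forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(7)
    trace = rng.integers(1, 5, size=20000)          # stand-in for per-slot channel ranks (1..4)
    window = 8                                      # assumed history length
    X = np.array([trace[i:i + window] for i in range(len(trace) - window)])
    y = trace[window:]                              # next rank to forecast

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, shuffle=False)
    rf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_tr, y_tr)
    majority = np.bincount(y_tr).argmax()           # naive baseline: always the most common rank
    print("random forest:", accuracy_score(y_te, rf.predict(X_te)))
    print("baseline     :", accuracy_score(y_te, np.full_like(y_te, majority)))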
|
59 |
[pt] ENSAIOS SOBRE TAXAS DE JUROS NEGATIVAS E PROJEÇÃO DO PIB / [en] ESSAYS ON NEGATIVE INTEREST RATES AND GDP FORECASTING
FERNANDA MAGALHAES RUMENOS GUARDADO, 02 July 2021 (has links)
[pt] This thesis is composed of three essays. The first builds a DSGE model based on Gertler and Karadi (2011) to study the effects of adopting negative interest rate policies together with liquidity interventions by the Central Bank, in a scenario in which the zero lower bound (ZLB) is transferred from central banks to private banks. We show that, during a recession, if private banks do not pass negative rates on to their depositors in an environment of large liquidity injections by the central bank, the negative consequences of the original ZLB persist and the recovery is slower. The second essay uses a more simplified version of the same model to study the adoption of central bank digital currencies, which could reestablish monetary policy transmission under negative interest rates, and analyses the responses of the economy to monetary policy shocks under this regime. We show that, although it is an interesting additional tool for the central bank, the wealth effects involved with changes exclusively in the digital currency's interest rate make it a less reliable counter-cyclical instrument. The third essay tests different forecasting models for medium-term US GDP growth. We use new methods, such as adaLASSO and Random Forest, together with a large set of regressors, to improve accuracy over traditional forecasting models, such as autoregressions and DSGE models. The essay shows that Random Forest delivers superior forecasts over a two-year horizon, but does not perform consistently better when forecasting potential output growth or the output gap. / [en] The thesis is composed of three essays. The first designs a DSGE model based on Gertler and Karadi (2011) to study the effects of the adoption of negative interest rate policies along with liquidity intervention, in a scenario where the ZLB is transferred to private banks instead of central banks. We show that, during a recession, if banks do not pass along negative rates to depositors in an environment of heavy liquidity injection by the Central Bank, the main negative economic effects of the original ZLB are maintained and the recovery is slower. The second essay uses the same model in a simpler setting to study how the adoption of central bank digital currencies (CBDCs) might reestablish the traditional monetary policy transmission under negative interest rates, and analyses the responses of the economy under such a regime to monetary policy shocks. We show that while the adoption of a CBDC might improve the monetary policy toolkit, the wealth effects involved with changes exclusively in its interest rates make it a less reliable counter-cyclical tool. The third essay tries different models for the forecast of medium-term output growth. We use new methods such as adaLASSO and Random Forest, along with a very large data set of regressors, in order to improve accuracy over traditional long-term forecasting models such as autoregressions and DSGE models, which have a very good track record. We show that Random Forest is able to better predict output growth over the two-year horizon, but has mixed results in forecasting trend GDP growth and the output gap.
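A minimal sketch of the kind of comparison described in the third essay (a Random Forest on a wide regressor set versus a simple autoregressive baseline for a direct two-year-ahead forecast) is given below on synthetic data; none of the series, horizons, or tuning choices correspond to the thesis's actual setup.

    # Hypothetical direct h-step-ahead forecast comparison on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(8)
    T, h = 400, 8                                   # quarterly sample, two-year horizon (assumed)
    Z = rng.standard_normal((T, 30))                # stand-in for a large regressor panel
    growth = 0.5 * Z[:, 0] + 0.3 * np.roll(Z[:, 1], 2) + 0.2 * rng.standard_normal(T)

    X_rf = Z[: T - h]                               # wide regressor set for the random forest
    X_ar = growth[: T - h].reshape(-1, 1)           # AR-style baseline on lagged growth only
    y = growth[h:]
    split = int(0.8 * (T - h))
    for name, model, X in [("random forest", RandomForestRegressor(n_estimators=300, random_state=8), X_rf),
                           ("autoregression", LinearRegression(), X_ar)]:
        model.fit(X[:split], y[:split])
        print(name, "out-of-sample MSE:", mean_squared_error(y[split:], model.predict(X[split:])))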
|
60 |
Discriminative hand-object pose estimation from depth images using convolutional neural networks
Goudie, Duncan, January 2018 (links)
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image. We show that this is a superior approach to random decision forests for this task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
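A hedged sketch of the two-stage idea is given below in PyTorch: the segmentation stage's "hand-minus-object" depth map is stacked with the original depth map into a 2-channel input for a small pose-regression network; the layer sizes, 21-joint output, and input resolution are assumptions, not the thesis's architecture.

    # Hypothetical 2-channel pose-regression CNN fed by a depth map and a segmented depth map.
    import torch
    import torch.nn as nn

    class PoseNet(nn.Module):
        def __init__(self, n_joints=21):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),   # 2 input channels
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
            self.head = nn.Linear(32 * 4 * 4, n_joints * 3)            # (x, y, z) per joint

        def forward(self, depth, hand_minus_object):
            x = torch.cat([depth, hand_minus_object], dim=1)           # stack into 2 channels
            return self.head(self.features(x))

    depth = torch.rand(1, 1, 128, 128)                 # stand-in input depth image
    segmented = torch.rand(1, 1, 128, 128)             # stand-in output of the segmentation stage
    print(PoseNet()(depth, segmented).shape)           # torch.Size([1, 63])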
|