51 |
Comparing machine learning models and physics-based models in groundwater science
Boerman, Thomas Christiaan, 25 January 2022 (has links)
The use of machine learning techniques in tackling hydrological problems has significantly
increased over the last decade. Machine learning tools can provide alternatives or surrogates to complex and comprehensive methodologies such as physics-based numerical models.
Machine learning algorithms have been used in hydrology for estimating streamflow, runoff,
water table fluctuations and calculating the impacts of climate change on nutrient loading
among many other applications. In recent years we have also seen arguments for and
advances in combining physics-based models and machine learning algorithms for mutual
benefit. This thesis contributes to these advances by addressing two different groundwater problems, developing a machine learning approach for each and comparing it with previously developed physics-based models: i) estimating groundwater and surface water depletion
caused by groundwater pumping using artificial neural networks and ii) estimating a global
steady-state map of water table depth using random forests.
The first chapter outlines the purpose of this thesis and how it contributes to the overall scientific knowledge on the topic. The results of this research contribute to three of the twenty-three major unsolved problems in hydrology, as summarized by a collective of hundreds of hydrologists.
In the second chapter, we tested the potential of artificial neural networks (ANNs), a deep-learning tool, as an alternative method for estimating the source water of groundwater
abstraction compared to conventional methods (analytical solutions and numerical models).
Surrogate ANN models of three previously calibrated numerical groundwater models were
developed using hydrologically meaningful input parameters (e.g., well-stream distance and
hydraulic diffusivity) selected by predictor parameter optimization, combining hydrological
expertise and statistical methodologies (ANCOVA). The output parameters were three
transient sources of groundwater abstraction (shallow and deep storage release, and local
surface-water depletion). We found that the optimized ANNs have a predictive skill of up to
0.84 (R2, 2σ = ± 0.03) when predicting water sources compared to physics-based numerical
(MODFLOW) models. Optimal ANN skill was obtained when using between five and seven
predictor parameters, with hydraulic diffusivity and mean aquifer thickness being the most
important predictor parameters. Even though the initial results are promising and computationally frugal, we found that the deep learning models were not yet sufficient to replace, or to outperform, the numerical model simulations.
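As a rough illustration of the surrogate idea described above (not the thesis's actual architecture, predictors, or MODFLOW training data, none of which are reproduced here), a minimal sketch with scikit-learn's MLPRegressor might map a handful of hydrologic predictors to the three source fractions and report held-out R2; all names and the synthetic data below are assumptions.

    # Hypothetical sketch of an ANN surrogate for source-water fractions.
    # The thesis trains on calibrated MODFLOW output; random numbers stand in here.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    # Assumed predictors: well-stream distance, hydraulic diffusivity, aquifer thickness, ...
    X = rng.random((2000, 6))
    # Assumed targets: fractions from shallow storage, deep storage, surface-water depletion
    Y = rng.dirichlet([2.0, 2.0, 2.0], size=2000)

    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    ann.fit(X_tr, Y_tr)                      # multi-output regression is supported
    print("held-out R2:", r2_score(Y_te, ann.predict(X_te)))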
The third chapter used random forests to map steady-state water table depth on a global scale (0.1° spatial resolution) and integrated the results to improve our understanding of scale and of perceptual modeling of global water table depth. In this study we used a spatially
biased ~1.5-million-point database of water table depth observations with a variety of
globally distributed above- and below-ground predictor variables with causal relationships to
steady-state water table depth. We mapped water table depth globally as well as at regional
to continental scales to interrogate performance, feature importance and hydrologic processes across scales and regions with varying hydrogeological landscapes and climates. The global water table depth map has a cross-validated correlation of R2 = 0.72, while our best-performing continental map (Australia) has a correlation of R2 = 0.86. The results of this study
surprisingly show that above-ground variables such as surface elevation, slope, drainage
density and precipitation are among the most important predictor parameters while
subsurface parameters such as permeability and porosity are notably less important. This is
contrary to conventional thought among hydrogeologists, who would assume that subsurface
parameters are very important. Overall, the machine learning results underestimate water table depth, similar to existing global physics-based groundwater models, and the differences are comparable to those between the physics-based groundwater models themselves.
The feature importance derived from our random forest models was used to develop
alternative perceptual models that highlight different water table depth controls between
areas with low relief and high relief. Finally, we considered the representativeness of the
prediction domain and the predictor database and found that 90% of the prediction domain
has a dissimilarity index lower than 0.75. We conclude that we see good extrapolation
potential for our random forest models to regions with unknown water table depth, except
for some high elevation regions.
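For readers unfamiliar with the workflow, a minimal sketch of a random-forest mapping of this kind is given below; the ~1.5-million-point observation database and global predictor grids are not reproduced, so the feature names and synthetic data are placeholders, and scikit-learn's RandomForestRegressor stands in for whatever implementation the thesis used.

    # Hypothetical sketch of the random-forest mapping workflow (illustrative only).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    features = ["elevation", "slope", "drainage_density", "precipitation",
                "permeability", "porosity"]          # assumed predictor names
    X = pd.DataFrame(rng.random((5000, len(features))), columns=features)
    y = rng.random(5000)                              # stand-in for observed water table depth

    rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=1)
    print("cross-validated R2:", cross_val_score(rf, X, y, cv=5, scoring="r2").mean())
    rf.fit(X, y)
    # Feature importances of this kind are what feed the perceptual models discussed above.
    for name, imp in sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1]):
        print(f"{name}: {imp:.3f}")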
Finally, in chapter four, the most important findings of chapters two and three are considered
as contributions to the unresolved questions in hydrology. Overall, this thesis has contributed
to advancing hydrological sciences through: i) mapping of global steady-state water table
depth using machine learning; ii) advancing hybrid modeling by using synthetic data derived
from physics-based models to train an artificial neural network for estimating storage
depletion; and iii) contributing to answering three unsolved problems in hydrology
involving themes of parameter scaling across temporal and spatial scales, extracting
hydrological insight from data, the use of innovative modeling techniques to estimate
hydrological fluxes/states and extrapolation of models to no-data regions. / Graduate
|
52 |
Next Stop Eastie: Using Machine Learning to Predict Socioeconomic Change in Boston and Beyond
LaPlante, Rita, January 2022 (has links)
Thesis advisor: Christopher Maxwell / This paper examines neighborhood socioeconomic ascent in both Boston and the Greater Boston metropolitan statistical area. Using random forests, a supervised machine learning algorithm, and a collection of physical and demographic neighborhood characteristics gathered from the American Community Survey, I model changes in neighborhood socioeconomic status and identify neighborhoods in my study area that experienced relative socioeconomic ascent or relative socioeconomic decline between 2010 and 2019. In order to gain a better understanding of future socioeconomic change throughout my study area, I use a random forests model to predict neighborhood socioeconomic status in 2028. I find that my best random forests model offers an improvement over traditional linear modeling techniques and, through mapping results for Boston specifically, that change in Boston is occurring in minority, working class neighborhoods, especially along the city’s waterfront. These findings, in combination with qualitative community data, can be used to inform policy concerning matters ranging from housing to transportation in the years to come. / Thesis (BA) — Boston College, 2022. / Submitted to: Boston College. College of Arts and Sciences. / Discipline: Departmental Honors. / Discipline: Economics.
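A hedged sketch of the kind of comparison described (a random forest versus a linear baseline for predicting change in a socioeconomic index) is given below; the ACS variables, tract data, and model settings are all stand-ins, not the paper's specification.

    # Hypothetical random forest vs. linear baseline on synthetic tract-level data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(2)
    X = rng.random((3000, 10))        # assumed ACS-style tract features (income, rent, tenure, ...)
    y = X @ rng.random(10) + 0.3 * np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(3000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)
    for name, model in [("linear", LinearRegression()),
                        ("random forest", RandomForestRegressor(n_estimators=300, random_state=2))]:
        model.fit(X_tr, y_tr)
        print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))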
|
53 |
Habitat Selection and Response to Disturbance by Pygmy Rabbits in Utah
Edgel, Robert John, 18 March 2013 (has links) (PDF)
The pygmy rabbit (Brachylagus idahoensis) is a sagebrush (Artemisia sp.) obligate that depends on sagebrush habitats for food and cover throughout its life cycle. Invasive species, frequent fires, overgrazing, conversion of land to agriculture, energy development, and many other factors have contributed to recent declines in both quantity and quality of the sagebrush-steppe habitats required by pygmy rabbits. Because of the many threats to these habitats and the believed decline of pygmy rabbit populations, there is a need to further understand habitat requirements for this species and how it responds to disturbance. This study evaluated habitat selection by pygmy rabbits in Utah and assessed the response of this small lagomorph to construction of a large-scale pipeline (i.e., the Ruby pipeline) in Utah. We collected habitat data across Utah at occupied sites (pygmy rabbit occupied burrows) and compared these data to similar measurements at unoccupied sites (random locations within sagebrush habitat where pygmy rabbits were not observed). Variables such as horizontal obscurity, elevation, percent understory composed of sagebrush and other shrubs, and sagebrush decadence best distinguished between occupied (active burrow) and unoccupied (randomly selected) sites. Occupied sites had greater amounts of horizontal obscurity, were located at higher elevations, had a greater percentage of understory composed of sagebrush and shrubs, and had less decadent sagebrush. When planning habitat alterations or management, these variables should be considered in order to enhance and protect existing habitat for pygmy rabbits. The Ruby pipeline was a large-scale pipeline project that required the removal of vegetation and the excavation of soil in a continuous linear path for the length of the pipeline. The area that was disturbed is referred to as the right of way (ROW). From our assessment of pygmy rabbit response to construction of the Ruby pipeline, we found evidence of habitat loss and fragmentation as a result of this disturbance. The size of pygmy rabbit space-use areas and home ranges decreased post-construction, rabbits shifted core-use areas away from the ROW, and there were fewer movements of collared rabbits across the ROW. Mitigation efforts should consider any action which may reduce restoration time and facilitate movements of rabbits across disturbed areas.
|
54 |
Predicting the Options Expiration Effect Using Machine Learning Models Trained With Gamma Exposure Data / Prediktion av inverkan på aktiemarknaden då optioner upphör med hjälp av maskininlärningsmodeller tränade med dagliga GEX värden
Dubois, Alexander, January 2022 (has links)
The option expiration effect is a well-studied phenomenon; however, few studies have implemented machine learning models to predict the effect of options expiration on the underlying stock market. In this paper, four machine learning models, SVM, random forest, AdaBoost, and LSTM, are evaluated on their ability to predict whether the underlying index rises or not on the day of option expiration. The options expiration effect is mainly driven by portfolio rebalancing made by market makers who aim to maintain delta-neutral portfolios. Whether or not market makers need to rebalance their portfolios depends on at least two variables: gamma and open interest. Hence, the machine learning models in this study use gamma exposure (i.e., a combination of gamma and open interest) to predict the options expiration effect. Furthermore, four architectures of LSTM are implemented and evaluated. The study shows that a three-layered many-to-one LSTM model achieves superior results with an F1 score of 62%. However, none of the models achieved better predictions than a model that predicts only positive classes. Some of the problems regarding gamma exposure are discussed and possible improvements for future studies are given. / Several studies have shown that the options market affects the stock market, especially at option expiration dates. However, few studies have examined the ability of machine learning models to predict this effect. In this study, four machine learning models (SVM, random forest, AdaBoost, and LSTM) are implemented and evaluated with the aim of predicting whether the underlying stock market rises at option expiration. The options market affects the stock market at expiration because market makers rebalance their portfolios to maintain a delta-neutral portfolio. Market makers' need to rebalance their portfolios depends on at least two variables: gamma and the number of active option contracts. The machine learning models in this study therefore use GEX, a combination of gamma and open interest, to predict whether the market rises at option expiration. Furthermore, four variants of LSTM models are implemented and evaluated. The study shows that a three-layer many-to-one LSTM model achieved the best results, with an F1 score of 62%. However, none of the models achieved better results than a model that predicts only positive classes. Finally, the problems of using GEX are discussed and recommendations for future studies are given.
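A minimal sketch of the two ingredients named above is given below: a per-strike gamma exposure (GEX) under one common sign convention (calls positive, puts negative), which may differ from the thesis's exact definition, and the F1 score of the always-positive baseline; all numbers are made up.

    # Hypothetical per-strike GEX under an assumed sign convention, plus the
    # F1 score of a baseline that always predicts "index rises".
    import numpy as np
    from sklearn.metrics import f1_score

    def gex(gamma, open_interest, is_call, spot, contract_size=100):
        # Sign convention is an assumption here, not the thesis's definition.
        sign = np.where(is_call, 1.0, -1.0)
        return sign * gamma * open_interest * contract_size * spot

    gamma = np.array([0.02, 0.015, 0.03])
    oi = np.array([1200, 800, 1500])
    is_call = np.array([True, False, True])
    print("total GEX:", gex(gamma, oi, is_call, spot=450.0).sum())

    y_true = np.array([1, 0, 1, 1, 0, 1, 1, 0])      # made-up expiration-day outcomes
    y_base = np.ones_like(y_true)                    # always-positive baseline
    print("baseline F1:", f1_score(y_true, y_base))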
|
55 |
Machine Learning for Inverse Design
Thomas, Evan, 08 February 2023 (links)
"Inverse design" formulates the design process as an inverse problem; optimal values of a parameterized design space are sought so to best reproduce quantitative outcomes from the forwards dynamics of the design's intended environment. Arguably, two subtasks are necessary to iteratively solve such a design problem; the generation and evaluation of designs. This thesis work documents two experiments leveraging machine learning (ML) to facilitate each subtask. Included first is a review of relevant physics and machine learning theory. Then, analysis on the theoretical foundations of ensemble methods realizes a novel equation describing the effect of Bagging and Random Forests on the expected mean squared error of a base model.
Complex models of design evaluation may capture environmental dynamics beyond those that are useful for a design optimization; capturing them incurs unnecessary time and computational costs. The first experiment attempts to avoid these costs by replacing EGSnrc, a Monte Carlo simulation of coupled electron-photon transport, with an efficient ML "surrogate model". To investigate the benefits of surrogate models, a simulated annealing design optimization is conducted twice to reproduce an arbitrary target design, once using EGSnrc and once using a random forest regressor as a surrogate model. It is found that using the surrogate model produced approximately a 100x speed-up and converged upon an effective design in fewer iterations. In conclusion, using a surrogate model is faster and (in this case) also more effective per iteration.
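A hedged sketch of the optimization loop described above, with a random-forest regressor standing in for the expensive simulator, is given below; the design parameterization, objective, and annealing schedule are assumptions, and EGSnrc itself is not involved.

    # Hypothetical simulated annealing over a design vector, scoring candidates with a
    # random-forest surrogate instead of the expensive simulator (EGSnrc in the thesis).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    X_sim = rng.random((500, 4))                       # assumed designs already simulated
    y_sim = ((X_sim - 0.5) ** 2).sum(axis=1)           # assumed objective: distance to a target design
    surrogate = RandomForestRegressor(n_estimators=200, random_state=3).fit(X_sim, y_sim)

    design = rng.random(4)
    cost = surrogate.predict(design.reshape(1, -1))[0]
    for temperature in np.geomspace(1.0, 1e-3, 1000):
        candidate = np.clip(design + 0.05 * rng.standard_normal(4), 0.0, 1.0)
        cand_cost = surrogate.predict(candidate.reshape(1, -1))[0]
        # Accept improvements outright, worse moves with Boltzmann probability.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temperature):
            design, cost = candidate, cand_cost
    print("best design found:", design, "surrogate cost:", cost)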
The second experiment of this thesis work leveraged machine learning for design generation. As a proof-of-concept design objective, the work seeks to efficiently sample 2D Ising spin model configurations from an optimized design space with a uniform distribution of internal energies. Randomly sampling configurations yields a narrow Gaussian distribution of internal energies. Convolutional neural networks (CNNs) trained with NeuroEvolution, a mutation-only genetic algorithm, were used to statistically shape the design space. Networks contribute to sampling by processing random inputs; their outputs are then regularized into acceptable configurations. Samples produced with CNNs had a more uniform distribution of internal energies and ranged across the entire space of possible values. In combination with conventional sampling methods, these CNNs can facilitate the sampling of configurations with uniformly distributed internal energies.
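As a concrete reference for the sampling objective, the sketch below computes the nearest-neighbour internal energy of 2D Ising configurations with periodic boundaries (coupling J = 1 assumed) and shows the narrow energy distribution obtained from uniform random sampling; the lattice size and sample count are arbitrary.

    # Hypothetical sketch: internal energy of 2D Ising configurations with periodic
    # boundaries; uniform random sampling concentrates energies near zero.
    import numpy as np

    def ising_energy(spins, J=1.0):
        # E = -J * sum of s_i * s_j over nearest-neighbour bonds, each bond counted once.
        right = np.roll(spins, -1, axis=1)
        down = np.roll(spins, -1, axis=0)
        return -J * np.sum(spins * right + spins * down)

    rng = np.random.default_rng(4)
    configs = rng.choice([-1, 1], size=(5000, 16, 16))
    energies = np.array([ising_energy(c) for c in configs])
    print("mean:", energies.mean(), "std:", energies.std())   # narrow band around zero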
|
56 |
Incorporating Climate Sensitivity for Eastern United States Tree Species into the Forest Vegetation Simulator
Jiang, Huiquan, 09 September 2015 (has links)
Detecting climate-induced effects in forest ecosystems becomes increasingly important as more evidence of greenhouse-gas-related climate change is found. The Forest Vegetation Simulator (FVS) is an important growth and yield model used to support management and planning on public forest lands across the southern United States; however, its prediction accuracy is limited by its climate-insensitive nature. The goal of this study was to develop species-specific prediction models for eastern U.S. forest tree species with climate and soil properties as predictors, in order to incorporate the effects of climate- and soils-based variables on forest growth and yield into FVS-Sn. Climate-sensitive models for site index, individual-tree mortality and diameter increment were developed separately, all using Random Forests on the basis of USDA Forest Service Forest Inventory and Analysis program data linked to contemporary climate data and soil properties mapped in the USDA Soil Survey Geographic (SSURGO) database. Results showed climate was a stronger driver of site index than soils. When soils and climate were used together, site index predictions for species grouped as conifers or hardwoods were almost as precise as species-specific models for many of the most common eastern forest tree species. A model comparison among Logistic Regression, Random Forests, and Artificial Neural Networks was conducted to identify the most suitable individual-tree mortality prediction model for the 20 most important species. Results showed that Random Forests with all indicators included generally performed well, especially for species with medium and high mortality. At a chosen threshold, it frequently achieved the equally highest values of sensitivity and specificity among the candidates. To evaluate the ability of the Random Forests model to predict individual-tree diameter increment, a Multiple Linear Regression model was built as a baseline for each of the 20 most common species in the eastern U.S. Comparison results showed that Random Forests had advantages in model validation and in future projection under climate change. Using the developed climate-sensitive models, multiple maps were produced to illustrate how forest tree growth, yield, and individual-tree mortality may change in the eastern U.S. over the 21st century under several climate change scenarios. / Ph. D.
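A hedged sketch of the mortality-model evaluation described above (a Random Forests classifier scored at a chosen probability threshold by sensitivity and specificity) is shown below; the predictors, data, and threshold are synthetic stand-ins rather than the study's inventory-based inputs.

    # Hypothetical individual-tree mortality classifier evaluated at an assumed threshold.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(5)
    X = rng.random((4000, 8))                       # assumed climate, soil and tree-size predictors
    y = (rng.random(4000) < 0.1 + 0.3 * X[:, 0]).astype(int)   # synthetic mortality indicator

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)
    rf = RandomForestClassifier(n_estimators=300, random_state=5).fit(X_tr, y_tr)
    prob = rf.predict_proba(X_te)[:, 1]
    pred = (prob >= 0.25).astype(int)               # assumed threshold balancing the two rates
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))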
|
57 |
Machine learning experiments with artificially generated big data from small immunotherapy datasets
Mahmoud, Ahsanullah Y., Neagu, Daniel, Scrimieri, Daniele, Abdullatif, Amr A.A., 13 December 2022 (has links)
Big data and machine learning result in agile and robust healthcare by expanding raw data into useful patterns for data-enhanced decision support. The available datasets are mostly small and unbalanced, resulting in non-optimal classification when the algorithms are applied. In this study, five novel machine learning experiments are conducted to address the challenges of small datasets by expanding these into big data and then utilising Random Forests. The experiments are based on personalised adaptable strategies for both balanced and unbalanced datasets. Multiple datasets from cryotherapy and immunotherapy are considered; however, only immunotherapy is used here. In the first experiment, artificially generated data is produced by increasing the observations of the dataset; each new dataset is four times larger than the previous one, resulting in better classification. In the second experiment, the effect of data volume on classification is considered based on the number of attributes. The attributes of each new dataset are built based on conditional probabilities. Increasing the number of attributes beyond 879 made no difference to the obtained classification. In the third experiment, a simulation, classes of data are assigned manually by dividing the data in a two-dimensional plane. This experiment is first performed on the small data and then on the expanded big data: by increasing observations, an accuracy of 73.68% is attained. In the fourth experiment, the visualisation of the enlarged data did not provide better insights. In the fifth experiment, the impact of correlations among the datasets' attributes on classification is examined; however, no improvements in performance are achieved. Overall, comparing the classification results obtained with the original and artificial data, the experiments generally improved performance.
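One simple way to realise the "expand a small dataset fourfold, then apply Random Forests" step is sketched below; it uses jittered copies of the original rows purely for illustration and is not the paper's conditional-probability generation scheme.

    # Hypothetical fourfold expansion of a small dataset with noisy copies, then Random Forests.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    X_small = rng.random((90, 7))                  # stand-in for a small immunotherapy dataset
    y_small = (X_small[:, 0] + X_small[:, 1] > 1.0).astype(int)

    def expand(X, y, factor=4, noise=0.02):
        # Return a dataset `factor` times larger, made of jittered copies of the original rows.
        Xs = np.vstack([X + noise * rng.standard_normal(X.shape) for _ in range(factor)])
        ys = np.tile(y, factor)
        return Xs, ys

    X_big, y_big = expand(X_small, y_small)
    rf = RandomForestClassifier(n_estimators=200, random_state=6)
    print("small   :", cross_val_score(rf, X_small, y_small, cv=5).mean())
    print("expanded:", cross_val_score(rf, X_big, y_big, cv=5).mean())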
|
58 |
Forecasting channel ranks in simulated 5G networks for carrier aggregation
Karlsson, Sebastian, January 2024 (links)
Carrier aggregation is a technology in wireless communications which allows a user to use multiple cells simultaneously for communication. In order to select cells, it is crucial to estimate their potential throughput for a given user. As a part of this estimate, we investigate how many MIMO layers a given channel can expect to use in the future, and whether machine learning can be used to predict the number of layers. Simulated user traces are used to generate training data, and special attention is directed at the construction of features based on user history. Random forests and multi-layer perceptrons are trained on the generated data, and we show that the random forests achieve better performance than baseline models, while the MLP models fail to learn and do not reach the expected performance. The importance of the used features is analysed, and we find that the history-based features are especially useful for predicting future channel ranks and thus are promising for use in a cell set selection system for carrier aggregation.
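A hedged sketch of the history-feature idea is given below: the last few observed ranks form the feature vector for a random forest that predicts the next rank, compared against a majority-class baseline; the trace, window length, and rank range are assumptions rather than the thesis's simulated 5G data.

    # Hypothetical rank forecasting from history-based features with a random forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(7)
    trace = rng.integers(1, 5, size=20000)          # stand-in for per-slot channel ranks (1..4)
    window = 8                                      # assumed history length
    X = np.array([trace[i:i + window] for i in range(len(trace) - window)])
    y = trace[window:]                              # next rank to forecast

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, shuffle=False)
    rf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_tr, y_tr)
    majority = np.bincount(y_tr).argmax()           # naive baseline: always the most common rank
    print("random forest:", accuracy_score(y_te, rf.predict(X_te)))
    print("baseline     :", accuracy_score(y_te, np.full_like(y_te, majority)))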
|
59 |
[pt] ENSAIOS SOBRE TAXAS DE JUROS NEGATIVAS E PROJEÇÃO DO PIB / [en] ESSAYS ON NEGATIVE INTEREST RATES AND GDP FORECASTING
FERNANDA MAGALHAES RUMENOS GUARDADO, 02 July 2021 (has links)
[pt] This thesis is composed of three essays. The first builds a DSGE model based on Gertler and Karadi (2011) to study the effects of adopting negative interest rate policies together with liquidity interventions by the Central Bank, in a scenario in which the zero lower bound (ZLB) is transferred from central banks to private banks. We show that, during a recession, if private banks do not pass negative rates on to their depositors in an environment of large liquidity injections by the central bank, the negative consequences of the original ZLB persist and the recovery is slower. The second essay uses a more simplified version of the same model to study the adoption of central bank digital currencies, which could reestablish monetary policy transmission under negative interest rates, and analyses the responses of the economy to monetary policy shocks under this regime. We show that, although it is an interesting additional tool for the central bank, the wealth effects involved with changes exclusively in the digital currency's interest rate make it a less reliable counter-cyclical instrument. The third essay tests different forecasting models for medium-term US GDP growth. We use new methods, such as adaLASSO and Random Forest, together with a large set of regressors, to improve accuracy over traditional forecasting models, such as autoregressions and DSGE models. The essay shows that Random Forest delivers superior forecasts over a two-year horizon, but does not perform consistently better when forecasting potential output growth or the output gap. / [en] The thesis is composed of three essays. The first designs a DSGE model based on Gertler and Karadi (2011) to study the effects of the adoption of negative interest rate policies along with liquidity intervention, in a scenario where the ZLB is transferred to private banks instead of central banks. We show that, during a recession, if banks do not pass along negative rates to depositors in an environment of heavy liquidity injection by the Central Bank, the main negative economic effects of the original ZLB are maintained and the recovery is slower. The second essay uses the same model in a simpler setting to study how the adoption of central bank digital currencies (CBDCs) might reestablish the traditional monetary policy transmission under negative interest rates, and analyses the responses of the economy under such a regime to monetary policy shocks. We show that while the adoption of a CBDC might improve the monetary policy toolkit, the wealth effects involved with changes exclusively in its interest rates make it a less reliable counter-cyclical tool. The third essay tries different models for the forecast of medium-term output growth. We use new methods such as adaLASSO and Random Forest, along with a very large data set of regressors, in order to improve accuracy over traditional long-term forecasting models such as autoregressions and DSGE models, which have a very good track record. We show that Random Forest is able to better predict output growth over the two-year horizon, but has mixed results in forecasting trend GDP growth and the output gap.
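A minimal sketch of the kind of comparison described in the third essay (a Random Forest on a wide regressor set versus a simple autoregressive baseline for a direct two-year-ahead forecast) is given below on synthetic data; none of the series, horizons, or tuning choices correspond to the thesis's actual setup.

    # Hypothetical direct h-step-ahead forecast comparison on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(8)
    T, h = 400, 8                                   # quarterly sample, two-year horizon (assumed)
    Z = rng.standard_normal((T, 30))                # stand-in for a large regressor panel
    growth = 0.5 * Z[:, 0] + 0.3 * np.roll(Z[:, 1], 2) + 0.2 * rng.standard_normal(T)

    X_rf = Z[: T - h]                               # wide regressor set for the random forest
    X_ar = growth[: T - h].reshape(-1, 1)           # AR-style baseline on lagged growth only
    y = growth[h:]
    split = int(0.8 * (T - h))
    for name, model, X in [("random forest", RandomForestRegressor(n_estimators=300, random_state=8), X_rf),
                           ("autoregression", LinearRegression(), X_ar)]:
        model.fit(X[:split], y[:split])
        print(name, "out-of-sample MSE:", mean_squared_error(y[split:], model.predict(X[split:])))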
|
60 |
Discriminative hand-object pose estimation from depth images using convolutional neural networks
Goudie, Duncan, January 2018 (links)
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image. We show that this is a superior approach to random decision forests for this task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
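A hedged sketch of the two-stage idea is given below in PyTorch: the segmentation stage's "hand-minus-object" depth map is stacked with the original depth map into a 2-channel input for a small pose-regression network; the layer sizes, 21-joint output, and input resolution are assumptions, not the thesis's architecture.

    # Hypothetical 2-channel pose-regression CNN fed by a depth map and a segmented depth map.
    import torch
    import torch.nn as nn

    class PoseNet(nn.Module):
        def __init__(self, n_joints=21):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),   # 2 input channels
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
            self.head = nn.Linear(32 * 4 * 4, n_joints * 3)            # (x, y, z) per joint

        def forward(self, depth, hand_minus_object):
            x = torch.cat([depth, hand_minus_object], dim=1)           # stack into 2 channels
            return self.head(self.features(x))

    depth = torch.rand(1, 1, 128, 128)                 # stand-in input depth image
    segmented = torch.rand(1, 1, 128, 128)             # stand-in output of the segmentation stage
    print(PoseNet()(depth, segmented).shape)           # torch.Size([1, 63])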
|