Global ETD Search

291	Application of machine learning for soil survey updates: A case study in southeastern Ohio Subburayalu, Sakthi Kumaran 18 March 2008 No description available. Agriculture, Soil Science machine learning data mining soil survey SSURGO updates soil-landscape modeling predictive soil modeling Random Forest
292	Оценка кредитных рисков с применением методов машинного обучения : магистерская диссертация / Credit risk assessment using machine learning methods Spirova, A. S. January 2023 The study analyzed data on credit transactions provided by commercial banks. Detailed pre-processing and normalization of the data were carried out to prepare it for further analysis and use in machine learning models. The main focus of the work was on the use of two models: logistic regression and random forest. Logistic regression was chosen for its simplicity and interpretability, and random forest for its ability to handle large amounts of data and identify complex relationships. The experiments showed that both models cope successfully with the task of assessing credit risk. Logistic regression demonstrated good performance, speed, and accuracy, making it suitable for real-time use, for example when an application is submitted in person at a bank or online. Random forest, in turn, achieved high accuracy, although it requires more computing resources. Additionally, the work used genetic programming to create new features from the original data. This approach significantly improved the model's performance and accuracy. Although not all of the generated features were among the top 5 most important, genetic programming proved to be an effective way to generate features, which is important in the field of credit risk assessment. MASTER'S THESIS MACHINE LEARNING LOGISTIC REGRESSION RANDOM FOREST GENETIC PROGRAMMING
293	Analysis of Vegetation Vulnerability Dynamics and Driving Forces to Multiple Drought Stresses in a Changing Environment Wei, Xiaoting, Huang, Shengzhi, Huang, Qiang, Liu, Dong, Leng, Guoyong, Yang, Haibo, Duan, Weili, Li, Jianfeng, Bai, Qingjun, Peng, Jian 15 January 2024 Quantifying changes in the vulnerability of vegetation to various drought stresses in different seasons is important for rational and effective ecological conservation and restoration. However, the vulnerability of vegetation and its dynamics in a changing environment are still unknown, and quantitative attribution analysis of vulnerability changes has been rarely studied. To this end, this study explored the changes of vegetation vulnerability characteristics under various drought stresses in Xinjiang and conducted quantitative attribution analysis using the random forest method. In addition, the effects of ecological water transport and increased irrigation areas on vegetation vulnerability dynamics were examined. The standardized precipitation index (SPI), standardized precipitation-evapotranspiration index (SPEI), and standardized soil moisture index (SSMI) represent atmospheric water supply stress, water and heat supply stress, and soil water supply stress, respectively. 
The results showed that: (1) different vegetation types responded differently to water stress, with grasslands being more sensitive than forests and croplands in summer; (2) increased vegetation vulnerability under drought stresses dominated in Xinjiang after 2003, with vegetation growth and near-surface temperature being the main drivers, while increased soil moisture in the root zone was the main driver of decreased vegetation vulnerability; (3) vulnerability of cropland to SPI/SPEI/SSMI-related water stress increased due to the rapid expansion of irrigation areas, which led to increasing water demand in autumn that was difficult to meet; and (4) after ecological water transport of the Tarim River Basin, the vulnerability of its downstream vegetation to drought was reduced. info:eu-repo/classification/ddc/551 ddc:551
294	Machine Learning Methods for Predicting Trading Behaviour of an Actively Managed Mutual Fund Forslund, Herman, Johnson, Marcus January 2021 This paper aims to reverse engineer the trading strategy of an actively managed mutual fund by identifying technical patterns in its trading. Investment strategies for many institutional investors consist of both fundamental and technical analysis. The purpose of the paper is to explore to what extent the latter can be used to predict the trading actions by taking some commonly used technical indicators as input in various machine learning algorithms to assess patterns between them and the trading of the fund. Furthermore, the technical indicators' ability to predict future prices is analysed using the same methods. The results are not sufficiently clear to suggest that the fund uses technical indicators to begin with, let alone which ones. As for the prediction of future prices, the technical indicators appear to have some predictive ability. / The purpose of this report is to predict the trading of an actively managed equity fund using four machine learning methods. Investment strategies generally combine two methods of analysis, fundamental and technical analysis. The aim of the report is to explore whether the latter can be used to predict the fund's trading by taking a number of commonly used technical indicators and using machine learning methods to search for patterns between these and the trading. The study also includes an analysis of how well technical indicators predict rises and falls in stock prices. As for the investment strategies, no clear relationships were found between the selected indicators and the transactions. The results of the second part of the study indicate some predictive ability of technical indicators for market movements. 
/ Bachelor's thesis project in electrical engineering 2021, KTH, Stockholm Machine Learning Random Forest XGBoost Long Short-Term Memory AdaBoost Allocation Strategies Electrical Engineering and Electronics
295	Machine Learning-based Biometric Identification Israelsson, Hanna, Wrife, Andreas January 2021 With the rapid development of computers and models for machine learning, image recognition has, in recent years, become widespread in various areas. In this report, image recognition is discussed in relation to biometric identification using fingerprint images. The aim is to investigate how well a biometric identification model can be trained with an extended dataset, which resulted from rotating and shifting the images in the original dataset consisting of very few images. Furthermore, it is investigated how the accuracy of this single-stage model differs from the accuracy of a model with two-stage identification. We chose Random Forest (RF) as the machine learning model and Scikit default values for the hyperparameters. We further included five-fold cross-validation in the training process. The performance of the trained machine learning model is evaluated with testing accuracy and confusion matrices. It was shown that the method for extending the dataset was successful. A greater number of images gave a greater accuracy in the predictions. Two-stage identification gave approximately the same accuracy as the single-stage method, but both methods would need to be tested on datasets with images from a greater number of individuals before any final conclusions can be drawn. / Thanks to the rapid development of computers and models for machine learning, image recognition has in recent years become widespread in society. This report deals with image recognition in relation to biometric identification in the form of fingerprint reading. 
The goal is to investigate how well a model for biometric identification can be trained and tested on a dataset that originally contains very few images, if the dataset is first expanded by creating multiple copies of the original images and then rotating and shifting them in different directions. Furthermore, it is investigated how the accuracy of this single-stage model differs from identification in two stages. We chose Random Forest (RF) as the machine learning model and Scikit's default settings for the hyperparameters. Five-fold cross-validation was also included in the training process. The performance of the trained machine learning model was assessed using test accuracy and confusion matrices. The method for expanding the dataset proved successful. A larger number of images gave greater accuracy in the predictions. Two-stage identification gave approximately the same accuracy as single-stage identification, but the methods would need to be tested on datasets with images from a larger number of individuals before any final conclusions can be drawn. / Bachelor's thesis project in electrical engineering 2021, KTH, Stockholm Machine Learning Biometric identification Classification Dataset expansion Random forest Electrical Engineering and Electronics
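The workflow described in entry 295 (expand a tiny image set by rotating and shifting, train a Random Forest with default hyperparameters, evaluate with five-fold cross-validation and a confusion matrix) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes "Scikit" means scikit-learn, uses random arrays as hypothetical stand-ins for fingerprint images, and restricts rotations to multiples of 90 degrees and shifts to a few wrap-around pixels; the helper `augment` and all sizes are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)


def augment(image, n_copies, rng):
    """Return n_copies rotated-and-shifted variants of a 2-D image.

    Assumption for this sketch: rotations are multiples of 90 degrees and
    shifts wrap around the image border (np.roll).
    """
    variants = []
    for _ in range(n_copies):
        k = int(rng.integers(0, 4))        # number of 90-degree rotations
        dy = int(rng.integers(-2, 3))      # vertical shift in pixels
        dx = int(rng.integers(-2, 3))      # horizontal shift in pixels
        rotated = np.rot90(image, k)
        variants.append(np.roll(rotated, shift=(dy, dx), axis=(0, 1)))
    return variants


# Hypothetical stand-in data: one random 16x16 "fingerprint" per person.
n_people, copies_per_image = 4, 30
originals = [rng.random((16, 16)) for _ in range(n_people)]

# Build the expanded dataset: flattened pixels as features, person as label.
X, y = [], []
for label, image in enumerate(originals):
    for variant in augment(image, copies_per_image, rng):
        X.append(variant.ravel())
        y.append(label)
X, y = np.asarray(X), np.asarray(y)

# Default hyperparameters; random_state is fixed only for repeatability.
model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5)        # one accuracy per fold
predicted = cross_val_predict(model, X, y, cv=5)   # out-of-fold predictions
matrix = confusion_matrix(y, predicted)            # rows: true, cols: predicted
```

Because the stand-in images are random noise, the resulting accuracies say nothing about real fingerprint data; the sketch only shows how the pieces named in the abstract fit together.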
296	Application of data-driven models in exploring cyanobacterial bloom risks in Lake Mälaren / Tillämpning av datadrivna modeller för att utforska cyanobakterieblomningsrisker i Mälaren Herrera, Abigail Huertas January 2021 Cyanobacteria are unique organisms: bacteria that perform photosynthesis and therefore contain chlorophyll, a pigment commonly associated with algae. For this reason, cyanobacteria are also known as blue-green algae. One of the secondary metabolites of cyanobacteria is cyanotoxin, a substance which is hepatotoxic, neurotoxic, and dermatotoxic. The frequency and intensity of cyanobacterial blooms have been of increasing concern for drinking water supply in recent decades. There is a need to improve monitoring of cyanobacteria content in source water for drinking water supply, of its indicators, and of its correlation with other chemical, physical and biological parameters. This study aims to identify the potential cyanobacterial bloom risk in Lake Mälaren by determining the influential chemical and physical parameters using Random Forest in classification mode. The classification was done using the WHO Alert Level Frameworks and case studies for lakes in Sweden. The data used for modelling were downloaded from the website of the Swedish University of Agricultural Sciences. They comprise 33 monitoring stations from 1964 to 2020 and 21 chemical parameters, including cyanobacteria biovolume and chlorophyll content. Given the heterogeneity of the data, the monitoring stations were grouped into clusters. Using the data, statistical, correlation, time series, and principal component analyses were performed. Through these methods, spatial, distributional, and temporal analyses were obtained. Afterwards, several models were built using Random Forest. Although the mean values of cyanobacteria over time indicated a medium risk, the maximum values suggested high risk in several areas of the lake. 
Maximum concentrations were present in the west and northeast of the lake, where the major inflows from the watershed are discharged. As the water flows through the basin, the concentration of cyanobacteria is reduced by half, which suggests that the large and deep bays act as sedimentation ponds. A very high correlation was found between Clusters 5 and 6, in the east and middle northeast of the lake, respectively. Finally, the contributing factors identified after modelling cyanobacteria as the target factor were chlorophyll, month, water temperature, oxygen content, transparency, NO2NO3N, TN/TP, Ca, Mg and Cl. / Cyanobacteria are unique organisms, bacteria that perform photosynthesis and therefore contain chlorophyll, a pigment usually associated with algae. For this reason, cyanobacteria are also known as blue-green algae. One of the secondary metabolites of cyanobacteria is cyanotoxin, a substance that is hepatotoxic, neurotoxic, and dermatotoxic. The frequency and intensity of cyanobacterial blooms have been a growing problem for drinking water supply in recent decades. Many water treatment plants do not measure the cyanobacteria content of water, while other chemical, physical, and biological parameters are measured. This study aims to identify the potential risk of cyanobacterial blooms in Lake Mälaren by determining the most influential chemical and physical parameters using the Random Forest method in classification mode. The classification was done using the WHO Alert Level Frameworks and various studies of lakes in Sweden. The data used for modelling were downloaded from the website of the Swedish University of Agricultural Sciences. They cover 33 monitoring stations from 1964 to 2020, with 21 chemical parameters, including cyanobacteria biovolume and chlorophyll content. Given the heterogeneity of the data, the monitoring stations were grouped into clusters. 
Using the data, statistical analysis, correlation, time series, and principal component analysis were performed. Through these methods, spatial, distributional, and temporal analyses were obtained. Afterwards, several models were built using Random Forest. The mean values of cyanobacteria over time indicated a medium risk, while the maximum values suggested otherwise. Maximum concentrations were found in the west and northeast of Lake Mälaren, where the major inflows from the watershed are discharged. As the water flows through the basin, the concentration of cyanobacteria is halved, which suggests that the large, deep bays act as sedimentation ponds. A very high correlation was found between Clusters 5 and 6, in the east and middle northeast of the lake, respectively. Finally, the most important factors identified after modelling cyanobacteria as the target factor were chlorophyll, month, water temperature, oxygen content, transparency, NO2NO3N, TN/TP, Ca, Mg, and Cl. Random Forest blue-green algae WHO Alert Level Framework drinking water supply cyanobacterial blooms Engineering and Technology
297	Machine learning assisted convective wall heat transfer models for fire modeling along vertical walls, ceilings and floors Jie Tao (18859882) 24 June 2024 Fires cause significant casualties and property damage. As a critical component of indoor and building fires, fires along a surface (vertical or horizontal) contribute significantly to fire spread and the resulting damage. Accurately predicting the interactions between a wall surface and fire is crucial to minimizing losses. Computational methods, such as large-eddy simulations (LES), can produce errors in fire modeling along a surface due to various model and numerical errors, among which the error in the convective wall heat transfer model is an important source. The convective heat transfer model error grows when the grid resolution near a thermal boundary layer along a wall surface decreases. Traditional wall-function based heat transfer models, mostly developed for forced convection heat transfer problems, tend to fail in buoyancy-driven fire wall heat transfer. It is imperative to develop accurate and efficient convective wall heat transfer models for fire modeling.
In this study, machine learning is employed as an alternative to the traditional physics-based modeling approach for wall heat transfer in fire modeling. A significant advantage of machine learning over physics-based modeling is that it does not require thorough knowledge of fire wall heat transfer, which is generally hard to acquire due to the complexity of the problem. A machine-learning assisted convective wall heat transfer model, aiming to enhance wall fire predictions, is developed in this work. The objective is to improve predictions of convective heat flux to a wall in under-resolved LES of wall fires. An amplification factor (β) is introduced to compensate for the under-prediction of temperature gradients normal to a wall surface in coarse grid simulations. Machine learning is then employed to assist the construction of models for β, with the training data obtained directly from fine-resolution LES. Extensive studies are conducted to identify suitable machine learning architectures, input features, training data generation strategies, training procedures, and testing and validation approaches.
A vertical wall fire test case is considered first to develop a baseline machine learning model. The focus is on identifying suitable input features and training strategies for machine learning of convective wall fire heat transfer. A four-parameter (input) machine learning model for β is constructed. Both a priori and a posteriori tests are conducted in the vertical wall fire case to provide a preliminary assessment of model performance. The fully tested model is also examined in an intermediate-scale parallel-wall fire spreading case that was not seen in the model training, to assess the applicability of the developed machine learning model. In general, excellent model performance is observed in the vertical wall fire case.
The established machine learning approach for the vertical wall case is then extended to horizontal surfaces, such as floor fires and ceiling fires, to expand the training scope of the machine learning model. The unique challenges in these new fire scenarios are investigated separately to identify the need for additional input features and training strategies. It is found that a fifth input parameter, in addition to the four parameters identified in the vertical wall case, is generally needed in order to correctly identify different fire scenarios. Data augmentation is also found to be useful for handling data sparsity during model training. Different machine learning architectures, such as random forest and deep neural network, are also compared.
The above studies are finally integrated into a unified machine learning model suitable for both vertical and horizontal surfaces. Extensive testing shows that the unified model reproduces the performance of the separately trained models. The work is significant in demonstrating the feasibility of using machine learning approaches to enhance fire simulations. The developed machine learning modeling techniques improve predictions in various fire scenarios while using a relatively coarse grid to maintain low computational cost, a critical consideration when simulation approaches are employed in real fire simulations. machine learning wall heat flux random forest wall fires fire modeling LES CFD
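One possible reading of the amplification factor in entry 297 is sketched below. The abstract does not state the formula, so everything here is an assumption made for illustration: the Fourier-law form of the wall flux, the symbols k (gas thermal conductivity at the wall) and n (wall-normal coordinate pointing into the gas), and the "fine"/"coarse" labels.

```latex
% Assumed form, not taken from the thesis.
\begin{align}
  \dot{q}''_{w} &= k \left.\frac{\partial T}{\partial n}\right|_{w}^{\text{fine}}
      && \text{(convective flux to the wall, resolved grid)} \\
  \beta &= \frac{\left.\partial T/\partial n\right|_{w}^{\text{fine}}}
                {\left.\partial T/\partial n\right|_{w}^{\text{coarse}}}
      && \text{(amplification factor)} \\
  \dot{q}''_{w} &\approx k\,\beta \left.\frac{\partial T}{\partial n}\right|_{w}^{\text{coarse}}
      && \text{(coarse-grid estimate)}
\end{align}
```

Under this reading, a coarse grid that under-predicts the wall-normal temperature gradient corresponds to a factor greater than one, and the machine learning model is trained on fine-resolution LES data to predict that factor from quantities available on the coarse grid.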
298	Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology Mistry, Pritesh, Neagu, Daniel, Trundle, Paul R., Vessey, J.D. 22 October 2015 Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work in using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug's toxicity. Using data acquired from the National Institutes of Health's (NIH) Developmental Therapeutics Program (DTP) we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80 % using random forest models whilst the decision tree models produce accuracies in the 70 % region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables. Big data in toxicology Computational toxicology Classification Vehicle-toxicity modelling Area under the curve Decision tree Random forest Data mining
299	The Foundation of Pattern Structures and their Applications Lumpe, Lars 06 October 2021 This thesis is divided into a theoretical part, aimed at developing statements around the newly introduced concept of pattern morphisms, and a practical part, where we present use cases of pattern structures. A first insight of our work clarifies the facts on projections of pattern structures. We discovered that a projection of a pattern structure does not always lead again to a pattern structure. A solution to this problem, and one of the most important points of this thesis, is the introduction of pattern morphisms in Chapter 4. Pattern morphisms make it possible to describe relationships between pattern structures, and thus enable a deeper understanding of pattern structures in general. They also provide the means to describe projections of pattern structures that lead to pattern structures again. In Chapter 5 and Chapter 6, we looked at the impact of morphisms between pattern structures on concept lattices and on their representations and thus clarified the theoretical background of existing research in this field. The application part reveals that random forests can be described through pattern structures, which constitutes another central achievement of our work. In order to demonstrate the practical relevance of our findings, we included a use case where this finding is used to build an algorithm that solves a real world classification problem of red wines. The prediction accuracy of the random forest is better, but the high interpretability makes our algorithm valuable. Another approach to the red wine classification problem is presented in Chapter 8, where, starting from an elementary pattern structure, we built a classification model that yielded good results. info:eu-repo/classification/ddc/510 ddc:510
300	A random forest model for predicting soil properties using Landsat 9 bare soil images Tokeshi Muller, Ivo 13 August 2024 Digital soil mapping (DSM) provides a cost-effective approach for characterizing the spatial variation in soil properties, which contributes to inconsistent productivity. This study utilized Random Forest (RF) models to facilitate DSM of apparent soil electrical conductivity (ECa), estimated cation exchange capacity (CEC), and soil organic matter (SOM) in agricultural fields across the Lower Mississippi Alluvial Valley. The RF models were trained and tested using ECa, CEC, and SOM data collected in situ, paired with a bare soil composite of Landsat 9 imagery. Field data and imagery were collected during the study period of 2019 through 2023. Models ranged from fair to moderate in accuracy (R² from 0.27 to 0.68). The contrasting performance between the CEC/SOM and ECa models is likely due to the dynamic nature of soil properties. Accordingly, models could have benefitted from covariates such as soil moisture, topography, and climatic factors, or from imagery with higher spectral resolution, such as hyperspectral imagery.
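For readers unfamiliar with the accuracy measure quoted in entry 300, the snippet below computes a coefficient of determination by hand. It is a sketch under the assumption that the thesis reports the standard definition (one minus the ratio of residual to total sum of squares); the function name `r_squared` and the numbers are hypothetical and are not data from the thesis.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical soil organic matter values (percent), invented for the example.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
score = r_squared(observed, predicted)
```

With these made-up numbers the score is about 0.8, so a reported range of 0.27 to 0.68 means the models left between roughly three quarters and one third of the observed variance unexplained.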
