  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Sequential Machine Learning in Material Science / Sekventiell maskininlärning inom materialvetenskap

Bellander, Victor January 2023
This report evaluates the possibility of using sequential learning in a material development setting to help predict material properties and speed up the development of new materials. To do this, a random forest model was built that incorporates carefully calibrated prediction uncertainty estimates. The idea behind the model is to use the few data points available in this field and to leverage that data to build a better representation of the input-output space as each experiment is performed. With both predictions and uncertainties to evaluate, several different strategies were developed to investigate performance. These strategies gave promising results regarding feasibility and potential cost-cutting. Within a specific performance region of the output space, the mean difference in alloying component price between the cheapest and most expensive material could be as high as 100 %. The model also extrapolated quickly to previously unknown output regions, meaning that new, differently performing materials could be found even with very poor initial data. / This report evaluates the possibility of using sequential machine learning in materials development to predict material properties and thereby shorten the materials development process. To do this, a random forest regression model was built that also included an estimate of the prediction uncertainty. The idea behind the model is to use the relatively few data points that are generally available in materials science and, with their help, build a better representation of the input-output space with each experiment performed. With both predictions and uncertainties to evaluate, several different strategies were developed to investigate the performance of the different candidate materials. Using these strategies, promising results regarding feasibility and potential cost savings were found.
It turned out that, for specific performance requirements, the average difference in price between the cheapest and the most expensive material chemistry can be as high as 100 %. As for the other results, the model managed to extrapolate quickly from the initial data to previously unknown regions of the output space. This means that new materials with new types of performance could be found even with very poorly matched initial training data.
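The abstract does not name the strategies used to choose the next experiment. As a rough sketch only, the block below shows three common ways to rank candidate materials from a prediction and an uncertainty. The function name `pick_next`, the `kappa` weight, the candidate format and the toy numbers are all assumptions made for illustration, not the thesis's method.

```python
def pick_next(candidates, strategy="ucb", kappa=1.0):
    """Return the name of the candidate to test next.

    candidates: dict mapping name -> (predicted_value, predicted_std).
    This input format is hypothetical and chosen for illustration.
    """
    def score(item):
        mean, std = item[1]
        if strategy == "exploit":   # trust the prediction only
            return mean
        if strategy == "explore":   # go where the model is least certain
            return std
        if strategy == "ucb":       # optimistic blend of prediction and uncertainty
            return mean + kappa * std
        raise ValueError(f"unknown strategy: {strategy}")

    return max(candidates.items(), key=score)[0]


# Toy pool of three hypothetical alloys: (predicted property, uncertainty)
pool = {"alloy_a": (10.0, 0.5), "alloy_b": (9.0, 3.0), "alloy_c": (9.5, 1.0)}
```

In a sequential loop, the chosen candidate would be tested, its measured value added to the training data, and the model refitted before the next pick.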
72

Analys av luftkvaliteten på Hornsgatan med hjälp av maskininlärning utifrån trafikflödesvariabler / Air Quality Analysis on Hornsgatan using Machine Learning with regards to Traffic Flow Variables

Treskog, Paulina, Teurnberg, Ellinor January 2023
The aim of this study is to investigate the relationship between air pollutants and different vehicle variables, such as model year, fuel type and vehicle type, on Hornsgatan in Stockholm. The study seeks to determine which factors have the greatest impact on air quality. The work is based on the machine learning algorithms Random Forest and Support Vector Regression, which are compared using R^2 and RMSE. The models built with Random Forest outperform Support Vector Regression for the various air pollutants. The best-performing model was the one for carbon monoxide, which had an R^2 value of 99.7%. The model whose predictions had the lowest R^2 value, 68.4%, was the one for nitrogen dioxide. Overall, the results were good in relation to previous studies. Based on the models, the influence of the variables is discussed, along with various measures that could be introduced in the City of Stockholm and on Hornsgatan to improve air quality. / This study aims to investigate the relationship between multiple air pollutants and different vehicle variables, such as vehicle year, fuel type and vehicle type, on Hornsgatan in Stockholm. The study intends to answer which factors have the greatest impact on air quality. The implementation is based on the two machine learning algorithms Random Forest and Support Vector Regression, which are compared based on R^2 and RMSE. The models created with Random Forest outperform Support Vector Regression for the various air pollutants. The best-performing model was the carbon monoxide model, which had an R^2 value of 99.7%. The model that gave predictions with the lowest R^2 value, 68.4%, was the model for nitrogen dioxide. Overall, the results were good in relation to previous studies. With regard to these models, the impact of the variables and different measures that can be introduced in the City of Stockholm and on Hornsgatan to improve air quality are discussed.
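For readers unfamiliar with the two comparison metrics, the following is a minimal sketch of how R^2 and RMSE are usually computed from observed and predicted values. The function names and the toy numbers are assumptions for illustration; the thesis's own data and tooling are not shown in the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not from the study)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
```

An R^2 of 99.7% as reported for carbon monoxide corresponds to `r_squared` returning 0.997 on the evaluation data.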
73

General Attitudes and Mode Choice : A mode choice study in Stockholm using Schwartz value-items grouped by personal characteristics / Generella attityder och färdmedel : En färdmedelvalsstudie i Stockholm med Schwartz värdeobjekt grupperade efter personliga egenskaper

Andersson, Malin January 2021
Value-items from the Schwartz scale of values were added to travel data to investigate whether the value-items can be used to model mode choice. Two kinds of mode choice models were constructed, multinomial discrete choice models (MNL) and the machine learning model Random Forest (RF), using travel diary data (RVU) and additional data from the European Social Survey (ESS). The additional data was connected to the base data by grouping the individuals using three key variables: gender, age, and household income. Models were then created with and without the value-item data to screen for variables that had an impact on the model. The RF model predicted the correct modes for all but the smaller groups, car passengers and cyclists, while the MNL model had less success in accurately assessing which mode someone had chosen. The MNL model improved when grouped value-items were added, while the accuracy of the Random Forest models did not change with the addition. Even though some value-items were significant in the MNL models, their expected consequences are small, because the base model specification may lack other, more relevant variables. Because the Random Forest gained nothing from the value-items and all of them were of similar importance, no value-items stood out for further testing. The main findings were thus that no value-items of particular interest could be found with the RF model, while the results for the MNL model were inconclusive. Suggested improvements for further similar studies would be to perform the grouping using data for a longer time frame and/or to use a value model as input for the mode choice modelling. It is deemed appropriate to study directly what values people associate with specific modes, and to investigate whether other models, such as car ownership models or models of choices between different versions of the same mode, could be more suitable for additional value data.
/ Value items from the Schwartz value scale were combined with travel data to investigate whether the value items can be used when modelling mode choice. Two types of mode choice models, multinomial models (MNL) and Random Forests, were constructed. The data used were travel survey data (RVU), supplemented with value data from the European Social Survey (ESS). The ESS data were linked to the base data by grouping individuals using three key variables: gender, age and household income. Models were then created with and without the supplementary data to see whether the models were affected. The RF model's results agreed well with the actual choices except for the smaller groups: car passengers and cyclists. The MNL model was less successful at determining which mode an individual had chosen. The MNL model with additional grouped value items improved compared with the base model, while the models created with Random Forest did not differ noticeably from one another. Although the value items in the MNL models were significant, their expected consequences are small, since the base model specification is believed to lack other, more relevant variables. The RF model did not benefit from the value items, and no value items were important to the model. The main findings were that no value items of particular interest could be found with the RF model, while the results for the MNL model were inconclusive. Suggested improvements for further similar studies would be to perform the grouping using data covering a longer time span, or to introduce a value model as input to the mode choice modelling. It is considered appropriate to study directly which values people associate with specific modes, and to investigate whether other models, such as models of car ownership or of choices between different versions of the same mode, would be better suited to modelling with value data.
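The abstract refers to "multinomial models (MNL)" without giving the specification. Assuming this means the standard multinomial logit, the choice probability of each mode is the softmax of the modes' utilities, as in the sketch below. The function name, the mode labels and the utility values are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit: P(mode) = exp(V_mode) / sum over modes of exp(V).

    utilities: dict mapping mode -> systematic utility V (hypothetical values).
    The maximum is subtracted before exponentiating for numerical stability;
    this does not change the resulting probabilities.
    """
    v_max = max(utilities.values())
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}


# Toy utilities for three modes (illustrative only)
probs = mnl_probabilities({"car": 1.0, "transit": 0.5, "bike": -0.5})
```

In a study like this one, a value-item would enter as an extra term in the utility of one or more modes, and its estimated coefficient would indicate whether it shifts the probabilities.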
74

Property Valuation by Machine Learning and Hedonic Pricing Models : A Case study on Swedish Residential Property / Fastighetsvärderingar efter maskininlärning och hedoniska prissättningsmodeller : En fallstudie om svensk bostadsfastigheter

Teang, Kanha, Lu, Yiran January 2021
Property valuation is a critical concept for a variety of applications in the real estate market, such as transactions, taxes, investments, and mortgages. However, there is little consensus on which method is best for estimating property value. This paper aims to investigate and compare the differences in Stockholm residential property valuation results among parametric hedonic pricing models (HPM), including linear and log-linear regression models, and Random Forest (RF) as the machine learning algorithm. The data consists of 114,293 arm's-length transactions of tenant-owned apartments between January 2005 and December 2014. The same variables are applied to both the HPM regression models and RF. Two techniques are adopted for splitting the data into training and testing datasets: random splitting and splitting based on the transaction years. These datasets are used to train and test all the models. The performance of each model is evaluated using four performance indicators: R-squared, MSE, RMSE, and MAPE. The results from both data splitting settings show that the accuracy of random forest is the highest among the regression models. The discussion points out the causes of the changes in the models' performance when they are applied to different datasets obtained from different data splitting techniques. Limitations are also pointed out at the end of the study for future improvements. / Property valuation is a critical concept for a variety of applications in the real estate market, such as transactions, taxes, investments and mortgages. However, there is little consistency regarding which method is best for estimating property value. This thesis aims to investigate and compare the differences in Stockholm property valuation results among parametric hedonic pricing models (HPM), including linear and log-linear regression models, and Random Forest (RF) as the machine learning algorithm.
The data consist of 114,293 arm's-length transactions of tenant-owned apartments from January 2005 to December 2014. The same variables are applied to both the HPM regression models and RF. Two techniques are adopted for splitting the data into training and testing datasets: random splitting and splitting based on transaction year. These datasets are used to train and test all the models. The performance evaluation of each model is based on four performance indicators: R-squared, MSE, RMSE and MAPE. The results from both splitting settings show that the accuracy of random forest is the highest among the regression models. The discussion points out the causes of the changes in the models' performance when they are applied to the different datasets obtained from the different data splitting techniques. Limitations are also pointed out at the end of the study for future improvement.
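As an illustration of two ingredients named in the abstract, the sketch below computes MAPE and performs a split based on transaction year. The record layout (a `year` key), the cut-off year and the toy prices are assumptions; the study's actual variables and split points are not given in the abstract.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def split_by_year(rows, first_test_year):
    """Train on transactions before first_test_year, test on the rest.

    rows: list of dicts with a 'year' key (hypothetical record layout).
    """
    train = [r for r in rows if r["year"] < first_test_year]
    test = [r for r in rows if r["year"] >= first_test_year]
    return train, test


# Toy transactions (illustrative only)
sales = [{"year": y, "price": 100 + y - 2005} for y in range(2005, 2015)]
train_rows, test_rows = split_by_year(sales, first_test_year=2013)
```

A year-based split tests the model on a period it has never seen, which is usually a harder and more realistic test for price data than a random split.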
75

Process monitoring and fault diagnosis using random forests

Auret, Lidia 12 1900
Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010. / Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch / ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost efficient processes. Data-driven unsupervised (or feature extractive) approaches to fault diagnosis exploit the many measurements available on modern plants. Certain current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring. Random forests are recently proposed statistical inference tools, deriving their predictive accuracy from the nonlinear nature of their constituent decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate the relation of important variables to said response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a potentially viable contender in the process monitoring statistical tool family. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. 
Random forest features were shown to be generally difficult to interpret in terms of geometry present in the original variable space. Random forest mapping and demapping models were shown to be very accurate on training data, and to extrapolate weakly to unseen data that do not fall within regions populated by training data. Random forest feature extraction was applied to unsupervised fault diagnosis for process data, and compared to linear and nonlinear methods. Random forest results were comparable to existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the residual detection success of random forests originates from the constrained responses and poor generalization artifacts of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization tool to allow for the interpretation of fault conditions. A dynamic change point detection application with random forests proved more successful than an existing principal component analysis-based approach, with the success of the random forest method again residing in reconstruction errors. The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets. / AFRIKAANS SUMMARY: Fault diagnosis is an important component of process monitoring, and is relevant within the greater context of the development of safer, cleaner and more cost-effective processes. Data-driven unsupervised or feature extraction approaches to fault diagnosis exploit the many measurements available on modern process plants.
Some of the current unsupervised approaches are hampered by assumptions regarding linearity, which serves as motivation to investigate nonlinear methods. The diversity of data structures is further motivation for investigating new feature extraction methods in process monitoring. Random forests are a new statistical inference technique, whose accuracy can be attributed to the nonlinear nature of the decision tree members and the capability of ensembles. Random forest committees provide more than just predictions; model information on data point proximity can be exploited to provide random forest features. Variable importance indicators show which variables stand in a close relationship with a chosen output variable, while partial dependencies indicate the relationship of an important variable to the chosen output variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method for process monitoring based on random forests. The hypothesis investigated states: unsupervised process monitoring and fault diagnosis can be improved by using features extracted with random forests, where the further interpretation of fault conditions is assisted by additional random forest techniques. Experimental results presented in this work support this hypothesis. An initial study was done to assess the quality of random forest features. It was found that it is difficult to interpret random forest features in terms of the geometry of the original variable space. Furthermore, it was found that random forest mapping and demapping are very accurate for training data, but show weak extrapolation properties for unseen data that fall in regions outside those of the training data. Random forest feature extraction was applied in unsupervised fault diagnosis for steady-state processes, and was compared with linear and nonlinear methods.
Results with random forests are comparable to those of existing methods, and the majority of random forest detections can be attributed to variable reconstruction errors. Further investigation showed that the success of residual detection rests on the constrained output values and poor generalization properties of decision trees. Random forest variable importance indicators and partial dependencies were incorporated in a visualization technique that allows for the interpretation of fault conditions. A dynamic application of change point detection with random forests proved more successful than an existing method based on principal component analysis. The success of the random forest method can once again be attributed to reconstruction residuals. A recommendation made on the basis of this study is that the random forest change point and fault detection methods can be added to a similar set of methods. It was found in this work that the distance-to-model diagnostic based on random forest mapping and demapping is successful for fault detection. The theoretical understanding uncovered supports the application of these methods to further data sets.
76

Generation of Individualized Treatment Decision Tree Algorithm with Application to Randomized Control Trials and Electronic Medical Record Data

Doubleday, Kevin January 2016
With new treatments and novel technology available, personalized medicine has become a key topic in the new era of healthcare. Traditional statistical methods for personalized medicine and subgroup identification primarily focus on single-treatment or two-arm randomized control trials (RCTs). With restricted inclusion and exclusion criteria, data from RCTs may not reflect real-world treatment effectiveness. However, electronic medical records (EMR) offer an alternative venue. In this paper, we propose a general framework to identify individualized treatment rules (ITR), which connects subgroup identification methods and ITR. It is applicable to both RCT and EMR data. Given the large scale of EMR datasets, we develop a recursive partitioning algorithm to solve the problem (ITR-Tree). A variable importance measure is also developed for personalized medicine using random forest. We demonstrate our method through simulations, and apply ITR-Tree to datasets from diabetes studies using both RCT and EMR data. A software package is available at https://github.com/jinjinzhou/ITR.Tree.
77

A fine-scale lidar-based habitat suitability mapping methodology for the marbled murrelet (Brachyramphus marmoratus) on Vancouver Island, British Columbia

Clyde, Georgia Emily 18 April 2017
The marbled murrelet (Brachyramphus marmoratus) is a Threatened seabird with very particular nesting requirements. They choose to nest almost exclusively on mossy platforms, provided by large branches or deformities, in the upper canopies of coniferous old-growth trees located within 50 km of the ocean. Due primarily to a loss of this nesting habitat, populations in B.C. have seen significant decline over the past several decades. As such, reliable spatial habitat data are required to facilitate efficient management of the species and its remaining habitats. Current habitat mapping methodologies are limited by their qualitative assessment of habitat attributes and the large, stand-based spatial scale at which they classify and map habitat. This research aimed to address these limitations by utilizing light detection and ranging (lidar) technologies to develop an object-based habitat mapping methodology capable of quantitatively mapping habitat suitability at the scale of an individual tree on Northern Vancouver Island, British Columbia (B.C.). Using a balanced random forest (BRF) classification algorithm and in-field habitat suitability data derived from low-level aerial surveys (LLAS), a series of lidar-derived terrain and canopy descriptors were used to predict the habitat suitability (Rank 1: Very High Suitability – Rank 6: Nil Suitability) of lidar-derived individual tree objects. The classification model reported an overall classification accuracy of 71%, with Rank 1 – Rank 5 reporting individual class accuracies of 90%, 86%, 74%, 67%, and 98%, respectively. Evaluation of the object-based predictive habitat suitability maps provided evidence that this new methodology is capable of identifying and quantifying within-stand habitat variability at the scale of an individual tree. This improved quantification provides a superior level of habitat differentiation currently unattainable using existing habitat mapping methods. 
As the total amount of suitable nesting habitat in B.C. is expected to continue to decline, this improved quantification is a critical advancement for strategic managers, facilitating improved habitat and species management. / Graduate / 2018-04-07 / 0329 / 0368 / 0478 / gclyde@uvic.ca
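The abstract does not describe how the balanced random forest (BRF) was configured. In one common formulation, each tree is grown on a bootstrap sample that draws the same number of cases from every class, so that rare habitat ranks are not swamped by common ones. The sketch below shows only that sampling step; the function name and the toy labels are assumptions, and the thesis may have used a different variant.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return sample indices with an equal number of draws from each class.

    Each class contributes as many draws (with replacement) as the size of
    the smallest class. One tree would be grown per such sample.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    n_min = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_min))  # draws with replacement
    return sample


# Toy labels: 2 trees of rank 1 and 10 trees of rank 6 (illustrative only)
ranks = [1] * 2 + [6] * 10
drawn = balanced_bootstrap(ranks, random.Random(0))
```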
78

A machine learning approach to fundraising success in higher education

Ye, Liang 01 May 2017
New donor acquisition and current donor promotion are the two major programs in fundraising for higher education, and developing proper targeting strategies plays an important role in both programs. This thesis presents machine learning solutions as targeting strategies for both programs, based on alumni data that is readily available in almost any institution. The targeting strategy for new donor acquisition is modeled as a donor identification problem. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are used and evaluated. The test results show that, having been trained with enough samples, all three algorithms can distinguish donors from rejectors well, and big donors are identified more often than others. While there is a trade-off between the cost of soliciting candidates and the success of donor acquisition, the results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of new donors and more than 90% of new big donors can be acquired when only 40% of the candidates are solicited. The targeting strategy for donor promotion is modeled as a promising donor (i.e., those who will upgrade their pledge) prediction problem in machine learning. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are tested. The test results show that all three algorithms can distinguish promising donors from non-promising donors (i.e., those who will not upgrade their pledge). When the age information is known, the best model produces an overall accuracy of 97% on the test set. The results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of promising donors can be acquired when only 26% of the candidates are solicited. / Graduate / liangye714@gmail.com
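Statements such as "more than 85% of new donors when only 40% of the candidates are solicited" describe a capture rate at a solicitation budget. A minimal sketch of that calculation, assuming candidates are ranked by a model score and the top fraction is solicited, is given below. The function name, the scores and the donor flags are hypothetical.

```python
def capture_rate(scores, is_donor, fraction):
    """Share of all true donors found among the top-scoring fraction of candidates.

    scores: model scores, higher means more likely to donate (hypothetical).
    is_donor: 1 if the candidate actually donated, else 0.
    fraction: share of candidates that are solicited, between 0 and 1.
    """
    ranked = sorted(zip(scores, is_donor), key=lambda pair: pair[0], reverse=True)
    n_solicit = int(len(ranked) * fraction)
    found = sum(flag for _, flag in ranked[:n_solicit])
    return found / sum(is_donor)


# Toy example with five candidates, three of whom are donors
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_donors = [1, 1, 0, 1, 0]
```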
79

Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests

Kimes, Ryan Vincent 01 January 2006
Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data that are difficult to analyze using traditional statistical models. This study examined the effectiveness of variable importance estimates from the random forest algorithm in identifying the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the response. We conclude that the random forest method is an effective classification tool when the goals of a study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables. These goals are common in gene expression analysis; therefore, we apply the random forest method to estimate variable importance on a microarray data set.
80

Machine Learning for Beam Based Mobility Optimization in NR

Ekman, Björn January 2017
One option for enabling mobility between 5G nodes is to use a set of area-fixed reference beams in the downlink direction from each node. To save power, these reference beams should be turned on only on demand, i.e. only if a mobile needs them. A User Equipment (UE) moving out of a beam's coverage will require a switch from one beam to another, preferably without having to turn on all possible beams to find out which one is the best. This thesis investigates how to transform the beam selection problem into a format suitable for machine learning, and how good such solutions are compared to baseline models. The baseline models considered were beam overlap and average Reference Signal Received Power (RSRP), both building beam-to-beam maps. The emphasis of the thesis was on handovers between nodes and on finding the beam with the highest RSRP. Beam-hit-rate and RSRP-difference (selected minus best) were the key performance indicators and were compared for different numbers of activated beams. The problem was modeled as a Multiple Output Regression (MOR) problem and as a Multi-Class Classification (MCC) problem. Both problems can be solved with the random forest model, which was the learning model of choice in this work. An Ericsson simulator was used to simulate and collect data from a seven-site scenario with 40 UEs. The primary features available were the current serving beam index and its RSRP. Additional features, like position and distance, were suggested, though many ended up being limited either by the simulated scenario or by the cost of acquiring the feature in a real-world scenario. Using primary features only, the learned models' performance was equal to or worse than that of the baseline models. Adding distance improved the performance considerably, beating the baseline models, but still leaving room for further improvement.
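The two key performance indicators can be sketched as follows, under one plausible reading of the abstract: the model ranks the candidate beams, the top k are activated and measured, and the strongest activated beam is selected. The function name, the data layout and the toy RSRP values are assumptions; the thesis's exact definitions may differ.

```python
def beam_kpis(samples, k):
    """Return (beam hit rate, mean RSRP difference) when k beams are activated.

    samples: list of (predicted_scores, true_rsrp) pairs, each a list indexed
    by beam (hypothetical layout). A hit means the truly best beam is among
    the k activated beams. The RSRP difference is selected minus best, so it
    is zero on a hit and negative otherwise.
    """
    hits = 0
    diffs = []
    for predicted, true_rsrp in samples:
        beams = range(len(predicted))
        activated = sorted(beams, key=lambda b: predicted[b], reverse=True)[:k]
        best = max(beams, key=lambda b: true_rsrp[b])
        selected = max(activated, key=lambda b: true_rsrp[b])
        hits += best in activated
        diffs.append(true_rsrp[selected] - true_rsrp[best])
    return hits / len(samples), sum(diffs) / len(diffs)


# Two toy samples with three candidate beams each (RSRP in dBm, illustrative)
toy = [
    ([0.1, 0.7, 0.2], [-90.0, -80.0, -85.0]),
    ([0.6, 0.3, 0.1], [-100.0, -95.0, -70.0]),
]
```

Sweeping `k` from 1 up to the number of beams gives the trade-off the abstract describes: more activated beams cost more power but raise the hit rate and shrink the RSRP loss.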
