Global ETD Search

251	Kan en bättre prediktion uppnås genom en kategorispecifik modell? : Teknologiprojekt på Kickstarter och maskininlärning [Can a better prediction be achieved with a category-specific model? Technology projects on Kickstarter and machine learning]
Appelquist, Niklas, Karlsson, Emelia. January 2020
Crowdfunding is used to raise money over the internet for proposed projects, with a large number of backers each contributing small pledges. Kickstarter is one of the largest crowdfunding platforms today. Despite the great interest in crowdfunding, many launched campaigns fail to reach their funding goal, and projects in the technology category have the highest failure rate on Kickstarter. It is therefore of interest to be able to predict which campaigns are likely to succeed or fail. This thesis explores whether machine learning can predict the success of launched Kickstarter projects with higher accuracy when it is trained on a smaller amount of category-specific data. The data, collected from www.kaggle.com, covers 192 548 launched Kickstarter projects. Two Random Forest models were trained: one on data for all projects in the data set and one on data for technology projects only, so that their performance in classifying technology projects could be compared. The results show that the technology model achieved a higher accuracy, 68.37%, than the reference model, 68.00%.
Crowdfunding; Kickstarter; machine learning; predictive modelling; Random Forest; Information Systems, Social aspects
252	Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries
Geylan, Gökçe. January 2021
DNA-encoded chemical libraries (DEL) allow exhaustive sampling of chemical space, yielding large-scale data on compounds produced through combinatorial synthesis. This novel technology is used in the early stages of drug discovery for robust hit identification and lead optimization. The aim of this project was to build a machine learning-based QSAR model with conformal prediction for hit identification on two different target proteins that the DEL was assayed on. An initial investigation was conducted as a pilot project with 1000 compounds, and the analyses and conclusions from that part were later applied to a larger dataset with 1.2 million compounds. The classification model was used to predict compound activity in the DEL and in an external dataset, and the top hits were identified in order to evaluate the model's performance and applicability. Support Vector Machine (SVM) and Random Forest (RF) models were built on both the pilot and the main datasets with different descriptor sets: Signature Fingerprints, RDKit and CDK. In addition, an autoencoder was used to supply data-driven descriptors for the pilot data. The LIBSVM and LIBLINEAR implementations were explored and compared on the basis of model performance. The comparisons considered key concepts of conformal prediction such as the trade-off between validity and efficiency, observed fuzziness, and calibration over a range of significance levels. The top hits were determined by two sorting methods: credibility, and the difference in p-values between the binary classes.
The models were confirmed to assign correct single labels to the true actives over a wide range of significance levels, regardless of how similar the test compounds were to the training set. Furthermore, the true actives accumulated in the models' top hit selections under the latter sorting method, and additional investigations of similarity and building block enrichment were conducted on the top 50 and top 100 compounds. The Tanimoto similarity analysis demonstrated the model's predictive power in selecting structurally dissimilar compounds, while the building block enrichment analysis showed the selectivity of the binding pocket, with target protein B found to be the more selective. Together, these comparison methods enabled an extensive study of model evaluation and performance. In conclusion, the LIBLINEAR model with Signature Fingerprints gave the best performance on both the pilot and the main datasets when both model performance and computational requirements were considered. However, prediction on an external set was not successful because of the low structural diversity of the DEL on which the model was trained.
Machine Learning; DNA-Encoded Chemical Library; Support Vector Machine; Random Forest; Conformal Prediction; QSAR; Pharmaceutical Sciences
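
As a rough illustration of the two sorting methods mentioned in the abstract above, the following sketch computes class-wise conformal p-values from calibration nonconformity scores and derives credibility and the p-value difference from them. It is not the thesis implementation: the function names, the example scores, and the reading of "p-value difference" as p(active) minus p(inactive) are all assumptions made for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, with the test point itself counted (the +1 terms)."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def predict(calibration, test_scores, significance):
    """calibration and test_scores are dicts keyed by class label.
    The labels 'active' and 'inactive' are assumed for this sketch."""
    p = {label: conformal_p_value(calibration[label], test_scores[label])
         for label in calibration}
    return {
        "p_values": p,
        # Labels that cannot be rejected at the chosen significance level.
        "prediction_set": {label for label, value in p.items() if value > significance},
        # Credibility: the largest p-value over all labels.
        "credibility": max(p.values()),
        # Assumed ranking score: how much more plausible 'active' is.
        "p_difference": p["active"] - p["inactive"],
    }


# Hypothetical nonconformity scores, per class, from a calibration set.
calibration = {"active": [0.1, 0.2, 0.4, 0.7], "inactive": [0.1, 0.3, 0.5, 0.9]}
result = predict(calibration, {"active": 0.15, "inactive": 0.95}, significance=0.2)
print(result)
```

A compound whose prediction set contains only one label counts as a single-label prediction at that significance level; sorting compounds by `p_difference` in descending order would give one possible top-hit ranking.
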
253	Tyre sound classification with machine learning
Jabali, Aghyad, Mohammedbrhan, Husein Abdelkadir. January 2021
Having enough data about which tyre types are used on the road can lead to a better understanding of the environmental consequences of studded tyres. This paper focuses on training and testing a machine learning model that can later be integrated into a larger system for automating the data collection process. Three machine learning algorithms, CNN, SVM, and Random Forest, were compared in this experiment. The method used is empirical. First, sound data for studded and non-studded tyres was collected at three different locations in the city of Gävle, Sweden. A total of 760 Mel spectrograms from both classes was generated to train and test a well-known CNN model (AlexNet) in MATLAB. Sound features for both classes were extracted using JAudio to train and test models that use SVM and Random Forest classifiers in Weka. Unnecessary features were removed one by one from the list of features to improve the performance of the classifiers. The results show that the CNN achieved an accuracy of 84%, that SVM performed best both with and without the removal of some audio features (94% and 92%, respectively), and that Random Forest reached 89% accuracy. The test data consists of 51% studded and 49% non-studded samples, and the SVM model achieved more than 94% on it. This can therefore be considered an acceptable result that can be used in practice.
Sound Classification; Machine learning; Support vector machine (SVM); Convolutional Neural Network (CNN); Random Forest; Computer Sciences
254	Geospatial Modeling of Land Cover Change in the Chocó-Darien Global Ecoregion of South America: Assessing Proximate Causes and Underlying Drivers of Deforestation and Reforestation
Fagua, José Camilo. 01 December 2018
The Chocó-Darien Global Ecoregion (CGE) in South America is one of 25 global biodiversity hotspots prioritized for conservation. In this dissertation I performed the first land-use and land-cover (LULC) change analysis for the entire CGE. There were three main objectives: 1) Select the best available imagery to build annual land-use and land-cover maps from 2001 to 2015 across the CGE. 2) Model LULC across the CGE to assess forest change trends from 2002 to 2015 and identify the effect of proximate causes of deforestation and reforestation. 3) Estimate the effects of underlying drivers on deforestation and reforestation across the CGE between 2002 and 2015. I developed annual LULC maps across the CGE from 2002 to 2015 using MODIS (Moderate Resolution Imaging Spectroradiometer) vegetation index products and random forest classification. The LULC maps had high accuracy (Kappa = 0.87; SD = 0.008). We detected a gradual replacement of forested areas with agriculture and secondary vegetation (agriculture reverting to early regeneration of natural vegetation) across the CGE. Forest loss was higher in 2010-2015 than in 2002-2010. LULC change trends, proximate causes, and reforestation transitions varied by administrative authority (countries: Panamanian CGE, Colombian CGE, and Ecuadorian CGE). Population growth and road density were underlying drivers of deforestation. Armed conflicts, Gross Domestic Product, and average annual rain were proximate causes and underlying drivers related to reforestation.
Forest change; land use - land cover; drivers of forest change; Bayesian structural equation model; Random forest; Ecology and Evolutionary Biology
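
The map accuracy in the abstract above is reported as a Kappa value. As a minimal sketch of how Cohen's kappa is obtained from a confusion matrix (observed agreement corrected for the agreement expected by chance), assuming a hypothetical two-class matrix that does not come from the dissertation:

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for two classes, e.g. forest and non-forest.
confusion = [[45, 5],
             [10, 40]]
kappa = cohens_kappa(confusion)
print(round(kappa, 2))
```
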
255	Data Mining for Accurately Estimating Residential Natural Gas Energy Consumption and Savings Using a Random Forest Approach
Naji, Adel Ali. 30 May 2019
No description available.
Mechanical Engineering; Economics; Energy; Random Forest; Building Energy Efficiency; Data Mining; Levelized Cost of Fuel Saving; Worst-to-First Strategy
256	A Comparison of Machine Learning Techniques to Predict University Rates
Park, Samuel M. 06 September 2019
No description available.
Mathematics; Statistics
257	A Statistical Analysis of Medical Data for Breast Cancer and Chronic Kidney Disease
Yang, Kaolee. 05 May 2020
No description available.
Statistics; Breast cancer; Chronic kidney disease; Medical data; Logistic regression; Decision tree; Bagging; Random forest; Neural networks; Model selection
258	Anomaly Detection for Network Traffic in a Resource Constrained Environment
Lidholm, Pontus, Ingletto, Gaia. January 2023
Networks connected to the internet are under constant threat of attack. This thesis shows that new techniques which make use of already connected hardware are a viable way to protect against such threats. When network switches are equipped with lightweight machine learning models, such as Decision Tree and Random Forest, no additional devices need to be installed on the network. When an attack is detected, the device may send a notification or act directly on the network to protect vulnerable systems. A model has been integrated on Westermo's devices using container software, which limits its computational resources. Such a system, and its building blocks, are what this thesis has researched and implemented. The system has been validated with several different models over a range of parameters. These models were trained offline on datasets with pre-recorded attacks. The recordings are converted into flows, which decreases dataset size and increases information density. The flows contain features corresponding to information about the packets and statistics about the flows. During training, a subset of features was selected using a Genetic Algorithm, which decreases the time needed to process each packet. After training, the models are converted to C code, which runs on a network switch. The models are verified online in a simulated factory in which different attacks are launched on the network. The results show that the hardware is sufficient for smaller models and that the system is capable of detecting certain types of attacks.
Network Traffic; Anomaly Detection; Embedded Systems; Machine Learning; Random Forest; Communication Systems; Computer Systems
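
The conversion of packet recordings into flows described in the abstract above can be sketched roughly as follows. The packet fields, the five-tuple flow key, and the chosen statistics are assumptions for illustration; the thesis may define its flows and features differently.

```python
from collections import defaultdict


def packets_to_flows(packets):
    """Group packets by an assumed five-tuple key and summarize each group
    as one flow record with a few simple statistics."""
    grouped = defaultdict(list)
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        grouped[key].append(pkt)

    flows = []
    for key, pkts in grouped.items():
        times = [p["time"] for p in pkts]
        sizes = [p["size"] for p in pkts]
        flows.append({
            "key": key,
            "packets": len(pkts),
            "bytes": sum(sizes),
            "duration": max(times) - min(times),
            "mean_size": sum(sizes) / len(sizes),
        })
    return flows


# Hypothetical packet records; a real capture would be parsed from a recording.
packets = [
    {"time": 0.0, "src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 502, "proto": "tcp", "size": 60},
    {"time": 0.5, "src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 502, "proto": "tcp", "size": 100},
    {"time": 0.7, "src": "10.0.0.3", "dst": "10.0.0.2", "sport": 4321, "dport": 53, "proto": "udp", "size": 80},
]
flows = packets_to_flows(packets)
for flow in flows:
    print(flow)
```

Three packets collapse into two flow records here, which is the reduction in dataset size the abstract refers to; each record is a fixed-length feature vector that a Decision Tree or Random Forest can consume.
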
259	Classification of weather conditions based on supervised learning
Safia, Mohamad, Abbas, Rodi. January 2023
Forecasting the weather remains a challenging task because of the atmosphere's complexity and unpredictable nature. The factors that determine weather conditions such as rain, clouds, clear skies, and sunshine include temperature, pressure, humidity, wind speed, and wind direction. Sophisticated physical models are currently used to forecast the weather, but they have several limitations, particularly in terms of computational time. In the past few years, supervised machine learning algorithms have shown great promise for the precise forecasting of meteorological events. These methods use historical weather data to train a model that predicts future weather. This study employs supervised machine learning techniques, including k-nearest neighbors (KNNs), support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs), to improve weather forecast accuracy. The study uses historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and wind direction. The data is preprocessed, which includes normalizing it and dividing it into separate training and testing sets. Finally, the effectiveness of the different models is examined to determine which is best for producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods to weather forecasting and support the creation of better weather prediction models.
Machine learning; Neural networks; Support vector machines; K-nearest neighbours; Random forest; Weather prediction; Computer Sciences
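
The preprocessing step mentioned in the abstract above (normalization and a split into training and testing sets) might look roughly like the sketch below. The min-max scaling, the 80/20 random split, and the sample rows are assumptions; the thesis does not state which normalization or split it used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off the first part as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(rows):
    """Per-column (min, max), computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, ranges):
    """Scale each column with the given ranges; constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Hypothetical rows: temperature (deg C), pressure (hPa), humidity (%).
rows = [[float(10 + i), float(1000 + 2 * i), float(40 + 3 * i)] for i in range(10)]
train, test = train_test_split(rows)
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
print(len(train_scaled), len(test_scaled))
```

Fitting the ranges on the training rows only keeps information about the test set out of the model. For time-ordered weather data, a chronological split may be more appropriate than the random split assumed here.
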
260	Network Interconnectivity Prediction from SCADA System Data : A Case Study in the Wastewater Industry
Isacson, Jonas. January 2019
Increased strain on existing wastewater distribution networks, caused by population growth and climate change, calls for better resource utilization. Being able to predict network interconnectivity accurately is vital in the wastewater industry, because it enables operational management strategies that optimize the performance of the wastewater system. This thesis evaluates how well two machine learning models, the multilayer perceptron (MLP) and the support vector machine (SVM), predict network interconnectivity from supervisory control and data acquisition (SCADA) system data for a wastewater system. The results indicate that the MLP gives the best predictions of network interconnectivity. The thesis concludes that the MLP is the superior model and that the highest achievable accuracy for network interconnectivity prediction is 56%, attained by the MLP model.
MLP; SVM; IoT; Binary Classification; Random Forest; Network Prediction; Wastewater Distribution Network; SCADA; Industry 4.0; Engineering and Technology
