Spelling suggestions: "subject:"[een] BOOSTING"" "subject:"[enn] BOOSTING""
181 |
Gradient Boosting Machine and Artificial Neural Networks in R and H2O / Gradient Boosting Machine and Artificial Neural Networks in R and H2OSabo, Juraj January 2016 (has links)
Artificial neural networks are fascinating machine learning algorithms. They used to be considered unreliable and computationally very expensive. Now it is known that modern neural networks can be quite useful, but their computational expensiveness unfortunately remains. Statistical boosting is considered to be one of the most important machine learning ideas. It is based on an ensemble of weak models that together create a powerful learning system. The goal of this thesis is the comparison of these machine learning models on three use cases. The first use case deals with modeling the probability of burglary in the city of Chicago. The second use case is the typical example of customer churn prediction in telecommunication industry and the last use case is related to the problematic of the computer vision. The second goal of this thesis is to introduce an open-source machine learning platform called H2O. It includes, among other things, an interface for R and it is designed to run in standalone mode or on Hadoop. The thesis also includes the introduction into an open-source software library Apache Hadoop that allows for distributed processing of big data. Concretely into its open-source distribution Hortonworks Data Platform.
|
182 |
Quantitative Retrieval of Organic Soil Properties from Visible Near-Infrared Shortwave Infrared (Vis-NIR-SWIR) Spectroscopy Using Fractal-Based Feature Extraction.Liu, Lanfa, Buchroithner, Manfred, Ji, Min, Dong, Yunyun, Zhang, Rongchung 27 March 2017 (has links)
Visible and near-infrared diffuse reflectance spectroscopy has been demonstrated to be a fast and cheap tool for estimating a large number of chemical and physical soil properties, and effective features extracted from spectra are crucial to correlating with these properties. We adopt a novel methodology for feature extraction of soil spectroscopy based on fractal geometry. The spectrum can be divided into multiple segments with different step–window pairs. For each segmented spectral curve, the fractal dimension value was calculated using variation estimators with power indices 0.5, 1.0 and 2.0. Thus, the fractal feature can be generated by multiplying the fractal dimension value with spectral energy. To assess and compare the performance of new generated features, we took advantage of organic soil samples from the large-scale European Land Use/Land Cover Area Frame Survey (LUCAS). Gradient-boosting regression models built using XGBoost library with soil spectral library were developed to estimate N, pH and soil organic carbon (SOC) contents. Features generated by a variogram estimator performed better than two other estimators and the principal component analysis (PCA). The estimation results for SOC were coefficient of determination (R2) = 0.85, root mean square error (RMSE) = 56.7 g/kg, the ratio of percent deviation (RPD) = 2.59; for pH: R2 = 0.82, RMSE = 0.49 g/kg, RPD = 2.31; and for N: R2 = 0.77, RMSE = 3.01 g/kg, RPD = 2.09. Even better results could be achieved when fractal features were combined with PCA components. Fractal features generated by the proposed method can improve estimation accuracies of soil properties and simultaneously maintain the original spectral curve shape.
|
183 |
Implementace obrazových klasifikátorů v FPGA / Implementation of Image Classifiers in FPGAsKadlček, Filip January 2010 (has links)
The thesis deals with image classifiers and their implementation using FPGA technology. There are discussed weak and strong classifiers in the work. As an example of strong classifiers, the AdaBoost algorithm is described. In the case of weak classifiers, basic types of feature classifiers are shown, including Haar and Gabor wavelets. The rest of work is primarily focused on LBP, LRP and LR classifiers, which are well suitable for efficient implementation in FPGAs. With these classifiers is designed pseudo-parallel architecture. Process of classifications is divided on software and hardware parts. The thesis deals with hardware part of classifications. The designed classifier is very fast and produces results of classification every clock cycle.
|
184 |
Entwicklung einer offenen Softwareplattform für Visual ServoingSprößig, Sören 28 June 2010 (has links)
Ziel dieser Diplomarbeit ist es, eine flexibel zu verwendende Plattform für Visual Servoing-Aufgaben zu Erstellen, mit der eine Vielzahl von verschiedenen Anwendungsfällen abgedeckt werden kann.
Kernaufgabe der Arbeit ist es dabei, verschiedene Verfahren der Gesichtserkennung (face detection) am Beispiel der Haar-Kaskade und -wiedererkennung (face recognition) am Beispiel von Eigenfaces und Fisherfaces zu betrachten und an ausführlichen Beispielen vorzustellen.
Dabei sollen allgemeine Grundbegriffe der Bildverarbeitung und bereits bekannte Verfahren vorgestellt und ihre Implementierung im Detail dargestellt werden.
Aus den dadurch gewonnen Erkenntnissen und dem sich ergebenden Anforderungsprofil an die zu entwickelnde Plattform leitet sich anschließend die Realisierung als eigenständige Anwendung ab.
Hierbei ist weiterhin zu untersuchen, wie die neu zu entwickelnde Software zukunftssicher und in Hinblick auf einen möglichen Einsatz in Praktika einfach zu verwenden realisiert werden kann.
Sämtliche während der Arbeit entstandenen Programme und Quellcodes werden auf einem separaten Datenträger zur Verfügung gestellt. Eine komplett funktionsfähige Entwicklungsumgebung wird als virtuelle Maschine beigelegt.
|
185 |
Using supervised learning methods to predict the stop duration of heavy vehicles.Oldenkamp, Emiel January 2020 (has links)
In this thesis project, we attempt to predict the stop duration of heavy vehicles using data based on GPS positions collected in a previous project. All of the training and prediction is done in AWS SageMaker, and we explore possibilities with Linear Learner, K-Nearest Neighbors and XGBoost, all of which are explained in this paper. Although we were not able to construct a production-grade model within the time frame of the thesis, we were able to show that the potential for such a model does exist given more time, and propose some suggestions for the paths one can take to improve on the endpoint of this project.
|
186 |
Introduction à l’apprentissage automatique en pharmacométrie : concepts et applicationsLeboeuf, Paul-Antoine 05 1900 (has links)
L’apprentissage automatique propose des outils pour faire face aux problématiques d’aujourd’hui et de demain. Les récentes percées en sciences computationnelles et l’émergence du phénomène des mégadonnées ont permis à l’apprentissage automatique d’être mis à l’avant plan tant dans le monde académique que dans la société. Les récentes réalisations de l’apprentissage automatique dans le domaine du langage naturel, de la vision et en médecine parlent d’eux-mêmes. La liste des sciences et domaines qui bénéficient des techniques de l’apprentissage automatique est longue.
Cependant, les tentatives de coopération avec la pharmacométrie et les sciences connexes sont timides et peu nombreuses. L’objectif de ce projet de maitrise est d’explorer le potentiel de l’apprentissage automatique en sciences pharmaceutiques. Cela a été réalisé par l’application de techniques et des méthodes d’apprentissage automatique à des situations de pharmacologie clinique et de pharmacométrie. Le projet a été divisé en trois parties. La première partie propose un algorithme pour renforcer la fiabilité de l’étape de présélection des covariables d’un modèle de pharmacocinétique de population. Une forêt aléatoire et l’XGBoost ont été utilisés pour soutenir la présélection des covariables. Les indicateurs d’importance relative des variables pour la forêt aléatoire et pour l’XGBoost ont bien identifié l’importance de toutes les covariables qui avaient un effet sur les différents paramètres du modèle PK de référence. La seconde partie confirme qu’il est possible d’estimer des concentrations plasmatiques avec des méthodes différentes de celles actuellement utilisés en pharmacocinétique. Les mêmes algorithmes ont été sélectionnés et leur ajustement pour la tâche était appréciable. La troisième partie confirme la possibilité de faire usage des méthodes d'apprentissage automatique pour la prédiction de relations complexes et typiques à la pharmacologie clinique. Encore une fois, la forêt aléatoire et l’XGBoost ont donné lieu à un ajustement appréciable. / Machine learning offers tools to deal with current problematics. Recent breakthroughs in computational sciences and the emergence of the big data phenomenon have brought machine learning to the forefront in both academia and society. The recent achievements of machine learning in natural language, computational vision and medicine speak for themselves. The list of sciences and fields that benefit from machine learning techniques is long.
However, attempts to cooperate with pharmacometrics and related sciences are timid and limited. The aim of this Master thesis is to explore the potential of machine learning in pharmaceutical sciences. This has been done through the application of machine learning techniques and methods to situations of clinical pharmacology and pharmacometrics. The project was divided into three parts. The first part proposes an algorithm to enhance the reliability of the covariate pre-selection step of a population pharmacokinetic model. Random forest and XGBoost were used to support the screening of covariates. The indicators of the relative importance of the variables for the random forest and for XGBoost recognized the importance of all the covariates that influenced the various parameters of the PK model of reference. The second part exemplifies the estimation of plasma concentrations using machine learning methods. The same algorithms were selected and their fit for the task was appreciable. The third part confirms the possibility to apply machine learning methods in the prediction of complex relationships, as some typical clinical pharmacology relationships. Again, random forest and XGBoost got a nice adjustment.
|
187 |
Ensemble Classifier Design and Performance Evaluation for Intrusion Detection Using UNSW-NB15 DatasetZoghi, Zeinab 30 November 2020 (has links)
No description available.
|
188 |
Data Driven Energy Efficiency of ShipsTaspinar, Tarik January 2022 (has links)
Decreasing the fuel consumption and thus greenhouse gas emissions of vessels has emerged as a critical topic for both ship operators and policy makers in recent years. The speed of vessels has long been recognized to have highest impact on fuel consumption. The solution suggestions like "speed optimization" and "speed reduction" are ongoing discussion topics for International Maritime Organization. The aim of this study are to develop a speed optimization model using time-constrained genetic algorithms (GA). Subsequent to this, this paper also presents the application of machine learning (ML) regression methods in setting up a model with the aim of predicting the fuel consumption of vessels. Local outlier factor algorithm is used to eliminate outlier in prediction features. In boosting and tree-based regression prediction methods, the overfitting problem is observed after hyperparameter tuning. Early stopping technique is applied for overfitted models.In this study, speed is also found as the most important feature for fuel consumption prediction models. On the other hand, GA evaluation results showed that random modifications in default speed profile can increase GA performance and thus fuel savings more than constant speed limits during voyages. The results of GA also indicate that using high crossover rates and low mutations rates can increase fuel saving.Further research is recommended to include fuel and bunker prices to determine more accurate fuel efficiency.
|
189 |
Multi-level Safety Performance Functions For High Speed FacilitiesAhmed, Mohamed 01 January 2012 (has links)
High speed facilities are considered the backbone of any successful transportation system; Interstates, freeways, and expressways carry the majority of daily trips on the transportation network. Although these types of roads are relatively considered the safest among other types of roads, they still experience many crashes, many of which are severe, which not only affect human lives but also can have tremendous economical and social impacts. These facts signify the necessity of enhancing the safety of these high speed facilities to ensure better and efficient operation. Safety problems could be assessed through several approaches that can help in mitigating the crash risk on long and short term basis. Therefore, the main focus of the research in this dissertation is to provide a framework of risk assessment to promote safety and enhance mobility on freeways and expressways. Multi-level Safety Performance Functions (SPFs) were developed at the aggregate level using historical crash data and the corresponding exposure and risk factors to identify and rank sites with promise (hot-spots). Additionally, SPFs were developed at the disaggregate level utilizing real-time weather data collected from meteorological stations located at the freeway section as well as traffic flow parameters collected from different detection systems such as Automatic Vehicle Identification (AVI) and Remote Traffic Microwave Sensors (RTMS). These disaggregate SPFs can identify real-time risks due to turbulent traffic conditions and their interactions with other risk factors. In this study, two main datasets were obtained from two different regions. Those datasets comprise historical crash data, roadway geometrical characteristics, aggregate weather and traffic parameters as well as real-time weather and traffic data. iii At the aggregate level, Bayesian hierarchical models with spatial and random effects were compared to Poisson models to examine the safety effects of roadway geometrics on crash occurrence along freeway sections that feature mountainous terrain and adverse weather. At the disaggregate level; a main framework of a proactive safety management system using traffic data collected from AVI and RTMS, real-time weather and geometrical characteristics was provided. Different statistical techniques were implemented. These techniques ranged from classical frequentist classification approaches to explain the relationship between an event (crash) occurring at a given time and a set of risk factors in real time to other more advanced models. Bayesian statistics with updating approach to update beliefs about the behavior of the parameter with prior knowledge in order to achieve more reliable estimation was implemented. Also a relatively recent and promising Machine Learning technique (Stochastic Gradient Boosting) was utilized to calibrate several models utilizing different datasets collected from mixed detection systems as well as real-time meteorological stations. The results from this study suggest that both levels of analyses are important, the aggregate level helps in providing good understanding of different safety problems, and developing policies and countermeasures to reduce the number of crashes in total. At the disaggregate level, real-time safety functions help toward more proactive traffic management system that will not only enhance the performance of the high speed facilities and the whole traffic network but also provide safer mobility for people and goods. In general, the proposed multi-level analyses are useful in providing roadway authorities with detailed information on where countermeasures must be implemented and when resources should be devoted. The study also proves that traffic data collected from different detection systems could be a useful asset that should be utilized iv appropriately not only to alleviate traffic congestion but also to mitigate increased safety risks. The overall proposed framework can maximize the benefit of the existing archived data for freeway authorities as well as for road users.
|
190 |
Machine Learning based Predictive Data Analytics for Embedded Test SystemsAl Hanash, Fayad January 2023 (has links)
Organizations gather enormous amounts of data and analyze these data to extract insights that can be useful for them and help them to make better decisions. Predictive data analytics is a crucial subfield within data analytics that make accurate predictions. Predictive data analytics extracts insights from data by using machine learning algorithms. This thesis presents the supervised learning algorithm to perform predicative data analytics in Embedded Test System at the Nordic Engineering Partner company. Predictive Maintenance is a concept that is often used in manufacturing industries which refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifier have been provided to fit the models, and cross-validation, sampling techniques, and a confusion matrix have been provided to accurately measure their performance. In addition to accuracy, recall, precision, f1, kappa, mcc, and roc auc measurements are used as well. The prediction models that are fitted achieve high accuracy.
|
Page generated in 0.0597 seconds