Global ETD Search

1	Anomaly Detection in Categorical Data with Interpretable Machine Learning : A random forest approach to classify imbalanced data
Yan, Ping January 2019

Metadata is "data about data": it contains the information needed to understand the process of data collection. In this thesis, we investigate whether metadata features can be used to detect broken data and how a tree-based interpretable machine learning algorithm can be used for effective classification. The goal of this thesis is two-fold. First, we apply a classification schema that uses metadata features to detect broken data. Second, we generate feature importance rates to understand the model's logic and reveal the key factors that lead to broken data.

The task, given by the Swedish automotive company Veoneer, is a typical problem of learning from an extremely imbalanced data set: 97 percent of the data is healthy and only 3 percent is broken. Furthermore, the whole data set contains only categorical variables on nominal scales, which poses challenges for the learning algorithm. Handling imbalance in continuous data is relatively well studied, but for categorical data the solution is not straightforward.

In this thesis, we propose a combination of tree-based supervised learning and hyperparameter tuning to identify broken data in a large data set. Our method consists of three phases: data cleaning, which eliminates ambiguous and redundant instances; supervised learning with a random forest; and a random search for hyperparameter optimization of the random forest model.

Our results show empirically that the tree-based ensemble method, together with a random search for hyperparameter optimization, improved random forest performance in terms of the area under the ROC curve. The model surpassed an acceptable classification result and showed that metadata features are capable of detecting broken data and of providing an interpretable result by identifying the key features of the classification model.

machine learning, decision tree, imbalanced data, anomaly detection, random forest
Probability Theory and Statistics
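The abstract above reports performance as the area under the ROC curve rather than as accuracy. The following minimal sketch, which is not taken from the thesis, illustrates why that choice matters for a 97/3 class split. The function names, the labels, and all scores are hypothetical and chosen only for illustration; the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 0 = healthy, 1 = broken. scores: higher means "more likely broken".
    A tie between a broken and a healthy record counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical data set mirroring the 97 % healthy / 3 % broken split.
labels = [0] * 97 + [1] * 3

# A trivial model that calls every record healthy looks good on accuracy
# but carries no ranking information at all.
trivial_accuracy = accuracy(labels, [0] * 100)
trivial_auc = roc_auc(labels, [0.0] * 100)

# Made-up scores from a model that ranks two of the three broken records
# above every healthy record and misses the third.
informative_scores = [0.1] * 97 + [0.9, 0.8, 0.05]
informative_auc = roc_auc(labels, informative_scores)

print(f"trivial model:     accuracy={trivial_accuracy:.2f}, AUC={trivial_auc:.2f}")
print(f"informative model: AUC={informative_auc:.2f}")
```

The pairwise loop is quadratic in the number of records, which is fine for a sketch; a rank-based implementation would be preferable on a large data set.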
2	GNSS Position Error Estimated by Machine Learning Techniques with Environmental Information Input / GNSS Positionsfelestimering genom Maskinlärningstekniker med Indata om Kringliggande Miljö
Kuratomi, Alejandro January 2019

In Intelligent Transport Systems (ITS), specifically in autonomous driving operations, accurate vehicle localization is essential for safe operation. Localization accuracy depends on both the position estimate and the positioning error estimate. Technologies that improve positioning error estimation are required and are currently being researched. This project investigated machine learning algorithms for positioning error estimation, using relevant information obtained from a GNSS receiver together with environmental information from a camera. Both were mounted on a radio-controlled vehicle used as a testing platform.

The research was done in two stages. In the first stage, the machine learning algorithms were trained and tested on existing GNSS data from Waysure's database, collected in tests run in 2016, which did not take into account the environment surrounding the GNSS receiver. In the second stage, the algorithms were trained and tested on GNSS data from new test runs carried out in May 2019, which include information about the environment surrounding the GNSS receiver. The results of both stages are compared, and the relevant features identified by the decision tree algorithm are presented. The report concludes that there is no statistical evidence that the tested environmental input from the camera improves the accuracy of positioning error estimation with the machine learning models built.

Global Navigation Satellite Systems, GNSS, Position, Positioning error, Machine Learning, Decision Trees, Support Vector Machines
Engineering and Technology
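The abstract above concludes that there is no statistical evidence that the camera input helps, but it does not say which test was used. As one possible illustration, and purely as an assumption, the sketch below compares two models on the same test runs with an exact paired sign-flip permutation test. The function name and all error values are hypothetical and made up for the example; they are not results from the thesis.

```python
from itertools import product


def paired_sign_flip_p_value(errors_a, errors_b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both models perform equally well, each
    paired difference is as likely to be positive as negative, so every
    combination of signs is enumerated (2**n cases, small n only).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    if not diffs:
        raise ValueError("need at least one paired observation")
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        mean = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance so the observed arrangement always counts itself.
        if abs(mean) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Made-up mean absolute errors (metres) of the positioning error estimate
# on eight hypothetical test runs, without and with the camera features.
without_camera = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.40, 0.58]
with_camera = [0.44, 0.51, 0.39, 0.63, 0.45, 0.53, 0.41, 0.56]

p_value = paired_sign_flip_p_value(without_camera, with_camera)
print(f"two-sided p-value: {p_value:.3f}")
```

A large p-value here would mean the observed difference is compatible with chance, which is the kind of outcome the abstract describes. It would not show that the camera input is useless, only that these runs do not demonstrate a benefit.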
