Global ETD Search

1	Evaluating Random Forest and k-Nearest Neighbour Algorithms on Real-Life Data Sets / Utvärdering av slumpmässig skog och k-närmaste granne algoritmer på verkliga datamängder
Salim, Atheer; Farahani, Milad. January 2023.

Computers can be used to classify many types of data, for example to filter email messages, detect computer viruses, or detect diseases. This thesis explores two classification algorithms, random forest and k-nearest neighbour, to understand how accurately and how quickly they classify data. A literature study was conducted to identify the prerequisites and to find suitable data sets. Five data sets (leukemia, credit card, heart failure, mushrooms and breast cancer) were gathered and classified by each algorithm, using both a train/test split and 4-fold cross-validation for each data set. The classification was performed with the Rust library SmartCore, which provides numerous classification methods and tools. The results indicated that the train/test split yielded better classification results than 4-fold cross-validation. However, it could not be determined whether any attributes of a data set affect the classification accuracy. Random forest achieved the best classification results on two data sets, heart failure and leukemia, whilst k-nearest neighbour achieved the best results on the remaining three. In general, the two algorithms achieved similar classification results. The execution time of random forest depended on the number of trees in the "forest": more trees resulted in a longer execution time. In contrast, a higher k value did not increase the execution time of k-nearest neighbour. Random forest also ran much faster on data sets containing only binary values (0 and 1) than on data sets with arbitrary values. A larger number of instances likewise increased the execution time of random forest, even when the number of features was small. The same applied to k-nearest neighbour, whose execution time additionally depended on the number of features, since distances between data points must be computed over every feature. Random forest achieved the fastest execution time on two data sets, credit card and mushrooms, whilst k-nearest neighbour was faster on the remaining three. The difference in execution time between the algorithms varied considerably and depended on the parameter values chosen for each algorithm.

Keywords: Random Forest; k-Nearest Neighbour; Evaluation; Machine Learning; Classification; Execution Time
Subject: Computer and Information Sciences
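
To make the two evaluation schemes and the k-nearest neighbour cost argument concrete, here is a minimal pure-Python sketch. It is not the authors' code and does not use SmartCore; the brute-force classifier, Euclidean distance, majority vote, and the helper names (`knn_predict`, `holdout_accuracy`, `cross_val_accuracy`) are assumptions chosen for illustration. Each prediction computes one distance per training instance, and each distance loops over every feature, which is consistent with the run time growing with both instances and features. Because the sketch sorts all distances regardless of k, k only changes how many labels are counted.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    # One pass over all features: cost grows with the number of features.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    # Brute force: one distance per training instance, then a majority vote
    # among the k closest labels (on a tie, Counter.most_common keeps the
    # label encountered first, i.e. the nearer one).
    ranked = sorted((euclidean(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def _subset(X, y, idx):
    return [X[i] for i in idx], [y[i] for i in idx]


def holdout_accuracy(X, y, k, test_fraction=0.25, seed=0):
    # Single train/test split: shuffle once, hold out a fraction for testing.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    train_X, train_y = _subset(X, y, idx[n_test:])
    test_X, test_y = _subset(X, y, idx[:n_test])
    return accuracy(train_X, train_y, test_X, test_y, k)


def cross_val_accuracy(X, y, k, n_folds=4, seed=0):
    # n-fold cross-validation: every instance is tested exactly once and the
    # reported score is the mean accuracy over the folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_X, train_y = _subset(X, y, train_idx)
        test_X, test_y = _subset(X, y, test_idx)
        scores.append(accuracy(train_X, train_y, test_X, test_y, k))
    return sum(scores) / n_folds


# Toy usage: two well-separated clusters of 10 points each.
X = [(i * 0.1, i * 0.1) for i in range(10)] + [(10 + i * 0.1, 10 + i * 0.1) for i in range(10)]
y = [0] * 10 + [1] * 10
print(holdout_accuracy(X, y, k=3), cross_val_accuracy(X, y, k=3))  # 1.0 1.0
```

On such trivially separable data both estimates are perfect; on real data sets the two schemes generally give different numbers, which is what the comparison above measures.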

2	A Comparative Study of Machine Learning Algorithms for Angular Position Estimation in Assembly Tools / Jämförande studie av maskininlärningsalgoritmer för skattning av vinkelposition hos monteringsverktyg
Fagerlund, Henrik. January 2023.

The threaded fastener is by far the most common method of securing components together and plays a significant role in determining the quality of a product. Atlas Copco offers industrial tools for tightening these fasteners, which currently suffer from errors in the applied torque. These errors have been found to follow periodic patterns, which indicates that they can be predicted and therefore compensated for. However, compensation is only possible if the rotational position of the tool is known, and Atlas Copco is interested in acquiring this position without installing sensors inside the tools. To address this challenge, the thesis explores the feasibility of estimating the rotational position by analysing the behaviour of the errors and finding periodicities in the data. The objective is to determine whether these periodicities can be used to accurately estimate the rotation of the torque errors in unknown data relative to the errors in data whose rotational position is known. The tool analysed in this thesis exhibits a periodic pattern in the torque error with a period of 11 revolutions. Two methods for estimating the rotational position were evaluated: a simple nearest neighbour method that uses mean squared error (MSE) as the distance measure, and a more complex circular fully convolutional network (CFCN). Data were collected from a custom-built test rig, which was never fully completed, so the models were evaluated on a limited dataset. The CFCN method was not able to identify the rotational position of the signal; the thesis attributes this to the insufficient size of the dataset. The nearest neighbour method, however, estimated the rotational position with 100% accuracy across 1000 iterations, even when only a fragment covering as little as 40% of the signal was used. Unfortunately, this method is computationally demanding and slow on large datasets, so adjustments are required before it is practically applicable. In summary, the findings suggest that the nearest neighbour method is a promising approach for estimating the rotational position and could help improve the accuracy of the tools.

Keywords: applied mathematics; circular fully convolutional network; nearest neighbour method; power tools; threaded fasteners; neural network; machine learning; convolution
Subject: Other Mathematics
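
One plausible reading of the nearest neighbour method is template matching over circular shifts: slide the observed error fragment around one 11-revolution period of a reference signal whose position is known, and pick the offset with the smallest MSE. The sketch below assumes exactly that; the synthetic error profile, the resolution of 36 samples per revolution, and the names `synthetic_error` and `estimate_shift` are hypothetical and not taken from the thesis. It also shows where the computational cost comes from: every candidate offset costs one MSE over the fragment, so a 40% fragment (158 samples) of a 396-sample period needs 396 × 158 = 62,568 squared differences per estimate, and the count grows with both the number of candidate offsets and the fragment length.

```python
import math

SAMPLES_PER_REV = 36            # hypothetical angular resolution (10 degrees per sample)
PERIOD_REVS = 11                # the torque-error pattern repeats every 11 revolutions
N = SAMPLES_PER_REV * PERIOD_REVS


def synthetic_error(t):
    # Hypothetical torque-error profile with a period of exactly N samples.
    phase = 2 * math.pi * t / N
    return math.sin(phase) + 0.5 * math.sin(3 * phase + 1.0) + 0.3 * math.sin(11 * phase + 0.5)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def estimate_shift(reference, fragment):
    # Nearest neighbour over circular shifts: compare the fragment with every
    # window of the (wrapped) reference period and keep the lowest-MSE offset.
    n = len(reference)
    best_shift, best_err = 0, math.inf
    for s in range(n):
        window = [reference[(s + i) % n] for i in range(len(fragment))]
        err = mse(window, fragment)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift


reference = [synthetic_error(t) for t in range(N)]   # signal with known position

# Toy usage: a noise-free fragment covering 40% of the period at an "unknown" offset.
true_shift = 123
fragment = [reference[(true_shift + i) % N] for i in range(int(0.4 * N))]
shift = estimate_shift(reference, fragment)
# Prints the offset, the revolution index within the period, and degrees into it.
print(shift, shift // SAMPLES_PER_REV, (shift % SAMPLES_PER_REV) * 360 // SAMPLES_PER_REV)  # 123 3 150
```

With measured data the fragment would be noisy, but the estimate is still the argmin of the MSE; the exhaustive loop over all offsets is the part that would need optimising for large datasets.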

3	A deep learning based anomaly detection pipeline for battery fleets
Khongbantabam, Nabakumar Singh. January 2021.

This thesis proposes a deep learning anomaly detection pipeline for detecting possible anomalies during the operation of a fleet of batteries, and presents its development and evaluation. The pipeline uses sensors connected to each battery in the fleet to remotely collect real-time measurements of operating characteristics such as voltage, current and temperature. The deep learning time-series anomaly detection model is based on a Variational Autoencoder (VAE) architecture whose encoder and decoder networks use either Long Short-Term Memory (LSTM) or its cousin, the Gated Recurrent Unit (GRU), giving two variants: LSTMVAE and GRUVAE. Both variants were evaluated against three well-known conventional anomaly detection algorithms: Isolation Nearest Neighbour (iNNE), Isolation Forest (iForest) and kth Nearest Neighbour (k-NN). All five models were trained on two versions of the training dataset (a full-year dataset and a partial, recent dataset), producing a total of 10 model variants. The models were trained in an unsupervised manner, and the results were evaluated on a test dataset containing a few known anomaly days from the past operation of the customer's battery fleet. k-NN and GRUVAE performed close to each other and outperformed the remaining models by a notable margin. LSTMVAE and iForest performed moderately, while iNNE and the iForest variant trained on the full dataset performed worst. In general, limiting the training dataset to a recent period produced better results almost consistently across all models.

Keywords: Forklift batteries; Battery sensors; Data pipeline; Predictive maintenance; Anomaly detection; Deep learning; Battery failure prediction; Time-series; Variational autoencoder; Long short-term memory (LSTM); Gated recurrent unit (GRU); Isolation nearest neighbor (iNNE); Isolation forest (iForest); kth nearest neighbor (kNN)
Subject: Computer and Information Sciences
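
Of the conventional baselines, kth nearest neighbour is the easiest to sketch: in its common unsupervised form, a sample's anomaly score is its distance to the k-th nearest training sample, and samples scoring above a threshold learned from the training data are flagged. The sketch below assumes that formulation, a leave-one-out 95th-percentile threshold, and two hypothetical standardised daily features per battery; none of these details, nor the function names, come from the thesis. No anomaly labels are needed to fit it, matching the unsupervised training described above, and restricting `train` to recent days would correspond to the "partial recent dataset" variant.

```python
import math


def kth_nn_distance(train, point, k):
    # Anomaly score: Euclidean distance to the k-th nearest training sample.
    return sorted(math.dist(point, x) for x in train)[k - 1]


def fit_threshold(train, k, quantile=0.95):
    # Score every training sample against the others (leave-one-out) and
    # use a high quantile of those scores as the anomaly threshold.
    scores = sorted(
        kth_nn_distance(train[:i] + train[i + 1:], x, k) for i, x in enumerate(train)
    )
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def detect(train, test, k=2, quantile=0.95):
    # True marks a test sample whose score exceeds the training threshold.
    threshold = fit_threshold(train, k, quantile)
    return [kth_nn_distance(train, x, k) > threshold for x in test]


# Toy usage with two hypothetical standardised features per day:
# normal days lie on a regular grid, the last test day is far away from it.
train = [(float(a), float(b)) for a in range(5) for b in range(5)]
test = [(2.0, 2.0), (1.5, 1.5), (10.0, 10.0)]
print(detect(train, test))  # [False, False, True]
```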
