191 |
Indoor positioning with wideband radio (Inomhuspositionering med bredbandig radio). Gustavsson, Oscar; Miksits, Adam. January 2019
This report evaluates whether a higher-dimensional fingerprint vector increases the accuracy of an indoor localisation algorithm. Many solutions use a Received Signal Strength Indicator (RSSI) to estimate a position; here we study whether using the Channel State Information (CSI), i.e. the channel's frequency response, benefits accuracy. The localisation algorithm estimates the position of a new measurement by comparing it to previous measurements using k-Nearest Neighbour (k-NN) regression. The mean power was used as RSSI and 100 samples of the frequency response as CSI. Reducing the dimension of the CSI vector with statistical moments and Principal Component Analysis (PCA) was also tested. No improvement in accuracy was observed from using a fingerprint vector of higher dimension than RSSI. A standardised Euclidean or Mahalanobis distance measure in the k-NN algorithm seemed to perform better than plain Euclidean distance, and taking the logarithm of the frequency response samples before any other calculation also seemed to improve accuracy.
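A minimal sketch of the fingerprinting pipeline described above, assuming scikit-learn; the synthetic data, the PCA size, and the neighbour count are illustrative assumptions rather than the study's actual measurements or parameters:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey data: 200 reference measurements,
# each with 100 frequency-response magnitude samples, at known 2-D positions.
csi_magnitude = np.abs(rng.standard_normal((200, 100))) + 1e-9
positions = rng.uniform(0.0, 10.0, size=(200, 2))

# Take the logarithm of the frequency-response samples before any other
# calculation, which the study found to improve accuracy.
log_csi = np.log(csi_magnitude)

# Reduce the 100-dimensional CSI vector to a low-dimensional fingerprint with PCA.
pca = PCA(n_components=5)
fingerprints = pca.fit_transform(log_csi)

# k-NN regression with a standardised Euclidean distance, one of the measures
# the study found to outperform plain Euclidean distance.
knn = KNeighborsRegressor(
    n_neighbors=3,
    metric="seuclidean",
    metric_params={"V": fingerprints.var(axis=0)},
    algorithm="brute",
)
knn.fit(fingerprints, positions)

# Estimate the position of a new measurement by comparison with the references.
new_measurement = np.log(np.abs(rng.standard_normal((1, 100))) + 1e-9)
print(knn.predict(pca.transform(new_measurement)))
```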
|
192 |
Predicting demand in the motion picture industry using machine learning (Prediktion av efterfrågan i filmbranschen baserat på maskininlärning). Liu, Julia; Lindahl, Linnéa. January 2018
Machine learning is a central technology in data-driven decision making. This study investigates machine learning for demand forecasting in the motion picture industry from the film exhibitor's perspective. More specifically, it examines to what extent the technology can help estimate public interest, in terms of revenue levels, for unreleased movies. Three machine learning models are implemented to forecast cumulative opening-weekend revenue levels for movies released in Sweden in 2010-2017. The forecast is based on ten attributes, ranging from public online user-generated data to movie-specific characteristics such as production budget and cast. The results indicate that the choice of attributes and models was not optimal for the Swedish market, as the precision metrics obtained were inadequate, albeit for identifiable underlying reasons.
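The abstract does not name the three models or the exact attributes; as one hedged illustration of the general setup, the sketch below trains a random forest on synthetic stand-ins, assuming scikit-learn. All feature and class definitions are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical feature matrix: one row per movie, ten columns standing in for
# attributes such as production budget, cast popularity and online buzz.
X = rng.standard_normal((500, 10))
# Hypothetical target: opening-weekend revenue level (0 = low, 1 = medium, 2 = high).
y = rng.integers(0, 3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X_train, y_train)

# Precision per revenue level, the kind of metric the study used to judge fit.
print(precision_score(y_test, model.predict(X_test), average=None, zero_division=0))
```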
|
193 |
Identifying the beginning of a kayak race using velocity signal data. Kvedaraite, Indre. January 2023
A kayak is a small watercraft propelled by a person sitting inside the hull and paddling with a double-bladed paddle. While kayaking can be casual, it is also a competitive sport in races, including the Olympic Games, so it is important to be able to analyse athletes' performance during a race. To study races better, some kayaking teams and organizations have attached sensors to their kayaks. These sensors record various data, which is later used to generate performance reports. However, to generate such reports the coach must manually pinpoint the beginning of the race, because the sensors collect data before the actual race begins, which may include practice runs, warm-up sessions, or simply standing and waiting. Identifying the race start and the race sequence in the data is tedious, time-consuming work that could be automated. This project proposes an approach to identify kayak races from velocity signal data with the help of a machine learning algorithm. The approach combines several techniques: signal preprocessing, a machine learning algorithm, and a programmatic step. Three machine learning algorithms were evaluated for detecting the race sequence: Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and Random Forest (RF). SVM outperformed the other algorithms with an accuracy of 95%. A programmatic approach was proposed to identify the start time of the race, with an average error of 0.24 seconds. The approach was used in a web-based application with a user interface that lets coaches automatically detect the beginning of a kayak race and the race signal sequence.
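A minimal sketch of such a pipeline, assuming scikit-learn; the windowing, the two summary features, and every constant are illustrative assumptions, not the thesis's actual preprocessing. For brevity the model is evaluated on the same signal it was trained on:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic 1 Hz velocity signal: low-speed warm-up, then a race starting at t = 300 s.
warmup = np.abs(rng.normal(1.0, 0.5, 300))
race = rng.normal(5.0, 0.3, 240)
velocity = np.concatenate([warmup, race])

def window_features(signal, size=10):
    """Split the signal into fixed windows and summarise each with mean and std."""
    windows = signal[: len(signal) // size * size].reshape(-1, size)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

X = window_features(velocity)
y = (np.arange(len(X)) * 10 >= 300).astype(int)  # 1 = race window

clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict(X)

# Programmatic start detection: the first window classified as race.
start_window = int(np.argmax(pred == 1))
print(f"Estimated race start: {start_window * 10} s")
```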
|
194 |
Data mining in the manufacturing industry: A case study on predicting quality outcomes in production lines (Data mining inom tillverkningsindustrin: En fallstudie om möjligheten att förutspå kvalitetsutfall i produktionslinjer). Janson, Lisa; Mathisson, Minna. January 2021
As the adaptation towards Industry 4.0 proceeds, the possibility of using machine learning as a tool for further development of industrial production becomes increasingly relevant. In this paper, a case study was conducted at Volvo Group in Köping to investigate the feasibility of predicting quality outcomes in the compression of hub and mainshaft. Three different machine learning models were implemented and compared against each other, trained and evaluated on a dataset from Volvo's production site in Köping. The low evaluation scores obtained indicate that the quality outcome of the compression could not be predicted from the variables included in that dataset alone. Therefore, a fabricated dataset was also used, containing three additional variables with fabricated values and a known causal link between two of the variables and the quality outcome. The purpose was to determine whether the poor evaluation metrics resulted from there being no pattern between the included variables and the quality outcome, or from the models being unable to find one. The models' performance on the fabricated dataset shows that they were in fact able to find the pattern known to exist. Support vector machine was the model that performed best, given the evaluation metrics chosen in this study. Consequently, if the traceability of the components were enhanced in the future, and more machines in the production line transmitted production data to a connected system, the study could be repeated with additional variables and a larger dataset. The fact that the models succeeded in finding patterns in the dataset when such patterns were known to exist motivates their use in future studies. Furthermore, with enhanced component traceability and a more connected factory, machine learning models could serve as components in larger business monitoring systems to achieve efficiency gains.
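A sketch of the fabricated-data check described above, assuming scikit-learn: two of five variables are given a synthetic causal link to the quality outcome and a support vector machine is scored by cross-validation. All names and constants are illustrative, not Volvo's actual variables:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Fabricated press data: the first two columns are the variables given a
# synthetic causal link to the quality outcome; the rest are pure noise.
X = rng.standard_normal((1000, 5))
quality_ok = (0.8 * X[:, 0] - 0.6 * X[:, 1] > 0).astype(int)

svm = SVC(kernel="rbf")
scores = cross_val_score(svm, X, quality_ok, cv=5)

# Accuracy well above the ~0.5 baseline shows the model recovers the planted
# relationship; on the real assembly data no such lift appeared.
print(scores.mean())
```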
|
195 |
Efficient Algorithms for Data Mining with Federated Databases. Young, Barrington R. St. A. 03 July 2007
No description available.
|
196 |
A Parallel Algorithm for Query Adaptive, Locality Sensitive Hash Search. Carraher, Lee A. 17 September 2012
No description available.
|
197 |
Statistics of Quantum Energy Levels of Integrable Systems and a Stochastic Network Model with Applications to Natural and Social Sciences. Ma, Tao. 18 October 2013
No description available.
|
198 |
Predicting basketball performance based on draft pick: A classification analysis. Harmén, Fredrik. January 2022
In this thesis, we predict how a basketball player will perform in the NBA depending on where the player was picked in the NBA draft. Different machine learning models are tested on data from the previous 35 NBA drafts and compared to see which achieves the highest classification accuracy. The methods used are Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machines and Random Forests. The results show that Random Forests achieved the highest classification accuracy, at 42%.
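A hedged sketch of that model comparison, assuming scikit-learn and synthetic stand-in data; the real draft features and performance classes are not specified in the abstract, so the columns and labels below are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# Hypothetical stand-in: draft position (1-60) plus a few per-player attributes,
# and a performance class label (e.g. 0 = bust, 1 = role player, 2 = star).
X = np.column_stack([rng.integers(1, 61, 700), rng.standard_normal((700, 4))])
y = rng.integers(0, 3, size=700)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=300, random_state=4),
}
for name, model in models.items():
    # Mean cross-validated classification accuracy per model.
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.2f}")
```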
|
199 |
Investigating the performance of matrix factorization techniques applied on purchase data for recommendation purposes. Holländer, John. January 2015
Automated systems for producing product recommendations to users are a relatively new area within the field of machine learning. Matrix factorization techniques have been studied to a large extent on data consisting of explicit feedback such as ratings, but to a lesser extent on implicit feedback data consisting of, for example, purchases. The aim of this study is to investigate how well matrix factorization techniques perform compared to other techniques when producing recommendations based on purchase data. We conducted experiments on data from an online bookstore as well as an online fashion store, running algorithms on the data and comparing the results with evaluation metrics. We present results showing that for many types of implicit feedback data, matrix factorization techniques are inferior to various neighborhood- and association-rules techniques for producing product recommendations. We also present a variant of a user-based neighborhood recommender algorithm (UserNN), which in all our tests outperformed both the matrix factorization algorithms and the k-nearest neighbors algorithm in both accuracy and speed. Depending on the dataset, UserNN achieved a precision approximately 2-22 percentage points higher than the matrix factorization algorithms, and 2 percentage points higher than the k-nearest neighbors algorithm. UserNN was also fastest, with running times 3.5-5 times lower than the k-nearest neighbors algorithm and several orders of magnitude lower than the matrix factorization algorithms.
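The thesis does not spell out UserNN's internals; the following is a minimal sketch of a generic user-based neighborhood recommender on binary purchase data, assuming NumPy. The co-purchase-count similarity and all sizes are illustrative choices, not the thesis's method:

```python
import numpy as np

rng = np.random.default_rng(5)

# Binary purchase matrix: rows are users, columns are products (1 = bought).
purchases = (rng.random((50, 30)) < 0.1).astype(float)

def recommend(purchases, user, k=5, n_items=3):
    """Score items by how often the user's k most similar users bought them."""
    sims = purchases @ purchases[user]        # co-purchase counts with every user
    sims[user] = -np.inf                      # exclude the user themself
    neighbours = np.argsort(sims)[-k:]        # k nearest users by similarity
    scores = purchases[neighbours].sum(axis=0)
    scores[purchases[user] > 0] = -np.inf     # do not re-recommend owned items
    return np.argsort(scores)[-n_items:][::-1]

print(recommend(purchases, user=0))
```

Because the scoring is a single sparse-friendly matrix-vector product per query, this family of methods tends to be much cheaper at prediction time than factorizing the full user-item matrix, which is consistent with the speed results reported above.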
|
200 |
A study of earthquake declustering in Taiwan (台灣地震散群之研究). 吳東陽. Unknown date
The Chi-Chi earthquake caused some of the greatest casualties of the past several decades in Taiwan. According to the Central Weather Bureau, most of the earthquakes that occurred within six to twelve months after the Chi-Chi earthquake were its aftershocks. But in general, how do we decide whether a given earthquake is a main shock or the aftershock of another? This study approaches the question through statistical data analysis rather than seismological theory, comparing four methods for separating main shocks from aftershocks: Global Distance, Negative Correlation, Nearest Neighbors and Window. The time and space parameters these methods require were selected using Taiwan earthquake data with magnitude greater than 5.0 recorded between January 1, 1991 and December 31, 2003, by defining a decreasing-earthquake-percent criterion to choose the most suitable model parameters. With the selected parameters, a computer simulation of earthquakes was used to evaluate the four declustering methods against criteria such as false positives (misclassified main shocks), false negatives (misclassified aftershocks) and the overall error rate; each of the four methods was found to have its own strengths and weaknesses.
Key Words: Decluster, Aftershock, Spatial Statistics, Nearest Neighbors, Simulation
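As a hedged illustration of the simplest of the four approaches, the Window method, the sketch below labels an event an aftershock when it falls inside a space-time window of a larger preceding event. The linear magnitude scaling of the windows and all constants are illustrative assumptions, not the thesis's calibrated parameters (which were chosen with the decreasing-earthquake-percent criterion):

```python
import numpy as np

# Toy catalogue sorted by time: time (days), x/y location (km), magnitude.
events = np.array([
    [0.0,    0.0,  0.0, 7.3],   # main shock
    [0.5,    5.0,  3.0, 5.4],
    [2.0,    8.0, -2.0, 5.1],
    [400.0, 90.0, 40.0, 5.6],   # far away in space and time
])

def window_decluster(events, days_per_mag=40.0, km_per_mag=10.0):
    """Label each event main shock (True) or aftershock (False) using a simple
    magnitude-scaled space-time window around every larger preceding event."""
    t, x, y, mag = events.T
    is_main = np.ones(len(events), dtype=bool)
    for i in range(len(events)):
        for j in range(i):
            if mag[j] <= mag[i]:
                continue  # only a larger earlier event can claim an aftershock
            dt = t[i] - t[j]
            dist = np.hypot(x[i] - x[j], y[i] - y[j])
            if dt <= days_per_mag * mag[j] and dist <= km_per_mag * mag[j]:
                is_main[i] = False
    return is_main

print(window_decluster(events))  # expected: [True, False, False, True]
```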
|