71 |
Découverte d'évènements par contenu visuel dans les médias sociaux / Visual-based event mining in social media
Trad, Riadh 05 June 2013
The evolution of the web, from what was typically known as a one-way means of communication to a conversational mode, has radically changed the way we handle information. Social media sites such as Flickr and Facebook offer spaces for exchanging and disseminating information. This information is increasingly rich, but also personal, and is most often organized around real-life events. An event can thus be seen as a set of personal and local views captured by different users. Identifying these different instances would therefore make it possible to reconstruct a global view of the event. In particular, linking different instances of the same event would benefit many applications, such as search, browsing, or content filtering and suggestion. The main objective of this thesis is to identify the multimedia content associated with an event in large image collections. The first contribution is an event retrieval method based on visual content. The second contribution is a scalable and distributed approach for constructing K-nearest neighbor graphs. The third contribution is a collaborative method for selecting relevant content. In particular, we address the problems of automatic event summary generation and content suggestion in social media. / The ease of publishing content on social media sites brings to the Web an ever-increasing amount of user-generated content captured during, and associated with, real-life events. Social media documents shared by users often reflect their personal experience of the event. Hence, an event can be seen as a set of personal and local views, recorded by different users. These event records are likely to exhibit similar facets of the event but also specific aspects.
By linking different records of the same event occurrence we can enable rich search and browsing of social media event content. Specifically, linking all the occurrences of the same event would provide a general overview of the event. In this dissertation we present a content-based approach for leveraging the wealth of social media documents available on the Web for event identification and characterization. To match event occurrences in social media, we develop a new visual-based method for retrieving events in huge photo collections, typically in the context of User Generated Content. The main contributions of the thesis are the following: (1) a new visual-based method for retrieving events in photo collections, (2) a scalable and distributed framework for Nearest Neighbors Graph construction for high-dimensional data, (3) a collaborative content-based filtering technique for selecting relevant social media documents for a given event.
|
72 |
Classification of weather conditions based on supervised learning
Safia, Mohamad, Abbas, Rodi January 2023
Forecasting the weather remains a challenging task because of the atmosphere's complexity and unpredictable nature. The factors that determine weather conditions, such as rain, clouds, clear skies, and sunshine, include temperature, pressure, humidity, wind speed, and direction. Currently, sophisticated physical models are used to forecast weather, but they have several limitations, particularly in terms of computational time. In the past few years, supervised machine learning algorithms have shown great promise for the precise forecasting of meteorological events. Using historical weather data, these strategies train a model to predict the weather in the future. This study employs supervised machine learning techniques, including k-nearest neighbors (KNNs), support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs), for better weather forecast accuracy. To conduct this study, we employed historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and direction. The data is preprocessed, which includes normalizing it and dividing it into separate training and testing sets. Finally, the effectiveness of different models is examined to determine which is best for producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods for weather forecasting and support the creation of better weather prediction models. / Predicting the weather remains a challenging task because of the complexity and unpredictable nature of the atmosphere. Some of the factors that influence weather conditions, such as rain, clouds, clear weather, and sunshine, include temperature, pressure, humidity, wind speed, and direction. Sophisticated physical models are currently used to predict the weather, but they have several limitations, especially regarding computation time. In recent years, supervised machine learning algorithms have shown great potential for accurately predicting meteorological events. Using historical weather data, these strategies train a model to predict future weather. This study uses supervised machine learning techniques, including k-nearest neighbors (KNNs), support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs), to improve the accuracy of weather forecasts. To carry out this study, we used historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and direction. The data is processed in advance, which includes normalization and splitting into separate training and test sets. Finally, the effectiveness of the different models is examined to determine which is best at producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods to weather forecasting and support the creation of better weather forecasting models.
|
73 |
Inomhuspositionering med bredbandig radio
Gustavsson, Oscar, Miksits, Adam January 2019
This report evaluates whether a higher-dimensional fingerprint vector increases the accuracy of an algorithm for indoor localisation. Many solutions use a Received Signal Strength Indicator (RSSI) to estimate a position. It was studied whether using the Channel State Information (CSI), i.e. the channel’s frequency response, benefits accuracy. The localisation algorithm estimates the position of a new measurement by comparing it to previous measurements using k-Nearest Neighbour (k-NN) regression. The mean power was used as RSSI and 100 samples of the frequency response as CSI. Reduction of the dimension of the CSI vector with statistical moments and Principal Component Analysis (PCA) was tested. No improvement in accuracy could be observed from using a higher-dimensional fingerprint vector than RSSI. A standardised Euclidean or Mahalanobis distance measure in the k-NN algorithm seemed to perform better than Euclidean distance. Taking the logarithm of the frequency response samples before doing any calculation also seemed to improve accuracy. / This report evaluates whether higher-dimensional data increases the accuracy of an algorithm for indoor positioning. Many solutions use a received signal strength indicator (RSSI) to estimate a position. It was studied whether using the channel state information (CSI), that is, the channel's frequency response, benefits accuracy. The positioning algorithm estimates the position of a new measurement by comparing it with previous measurements using k-Nearest Neighbour (k-NN) regression. The mean power was used as RSSI and 100 samples of the frequency response as CSI. Reducing the dimension of the CSI vector with statistical moments and principal component analysis (PCA) was tested. No improvement in accuracy could be observed from using data of higher dimension than RSSI. A standardised Euclidean or Mahalanobis distance measure in the k-NN algorithm appeared to perform better than Euclidean distance. Taking the logarithm of the frequency response samples before other calculations were made also appeared to improve accuracy.
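The k-NN regression with a standardised Euclidean distance mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `knn_position`, the plain averaging of the k nearest stored positions, and the toy RSSI values are all hypothetical. Only the general idea (compare a new fingerprint to stored ones, with each feature scaled by its standard deviation) comes from the abstract.

```python
import math

def knn_position(fingerprints, positions, query, k=3):
    """Estimate a position as the mean of the positions of the k stored
    fingerprints closest to `query` (standardised Euclidean distance).
    Hypothetical helper for illustration only."""
    n = len(fingerprints)
    dims = len(fingerprints[0])

    # Per-feature standard deviation over the stored fingerprints.
    stds = []
    for j in range(dims):
        mean_j = sum(f[j] for f in fingerprints) / n
        var_j = sum((f[j] - mean_j) ** 2 for f in fingerprints) / n
        stds.append(math.sqrt(var_j) or 1.0)  # avoid division by zero

    def dist(a, b):
        # Euclidean distance after dividing each feature by its std.
        return math.sqrt(sum(((a[j] - b[j]) / stds[j]) ** 2 for j in range(dims)))

    # Indices of the k stored fingerprints nearest to the query.
    nearest = sorted(range(n), key=lambda i: dist(fingerprints[i], query))[:k]

    # Average the corresponding positions coordinate by coordinate.
    coords = len(positions[0])
    return tuple(sum(positions[i][c] for i in nearest) / len(nearest)
                 for c in range(coords))

# Toy data (assumed): one RSSI value per fingerprint, positions in metres.
rssi = [[-40.0], [-50.0], [-60.0], [-70.0]]
xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimate = knn_position(rssi, xy, [-52.0], k=2)
```

With a CSI fingerprint, each stored vector would simply have more components (for example 100 frequency response samples), and the same distance would apply.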
|
74 |
Prediktion av efterfrågan i filmbranschen baserat på maskininlärning
Liu, Julia, Lindahl, Linnéa January 2018
Machine learning is a central technology in data-driven decision making. This study investigates machine learning in the context of demand forecasting in the motion picture industry from film exhibitors’ perspective. More specifically, it investigates to what extent the technology can assist in estimating public interest, in terms of revenue levels, for unreleased movies. Three machine learning models are implemented with the aim of forecasting cumulative revenue levels during the opening weekend of various movies released in Sweden in 2010-2017. The forecast is based on ten attributes, which range from public online user-generated data to specific movie characteristics such as production budget and cast. The results indicate that the choice of attributes and models in this study was not optimal for the Swedish market, as the values obtained for the relevant precision metrics were inadequate, although for valid underlying reasons. / Machine learning is a central technology in data-driven decision making. This report investigates machine learning in the context of demand prediction in the film industry from the perspective of cinemas. More specifically, it examines the extent to which the technology can assist in estimating audience interest, in terms of revenue, for unreleased films at cinemas. Three machine learning models are implemented with the aim of forecasting cumulative revenue levels during the opening weekend for films that premiered in Sweden in 2010-2017. The forecasting is based on a range of attributes, from public user-generated online data to film-specific variables such as production budget and cast. The results show that the choices of attributes and models were not optimal for the Swedish market, as the precision metrics obtained from the models took low values, for relevant underlying reasons.
|
75 |
Identifying the beginning of a kayak race using velocity signal data
Kvedaraite, Indre January 2023
A kayak is a small watercraft that moves over the water. The kayak is propelled by a person sitting inside the hull and paddling with a double-bladed paddle. While kayaking can be casual, it is also a competitive sport, featured in races and even the Olympic Games. Therefore, it is important to be able to analyse athletes’ performance during the race. To study the race better, some kayaking teams and organizations have attached sensors to their kayaks. These sensors record various data, which is later used to generate performance reports. However, to generate such reports, the coach must manually pinpoint the beginning of the race, because the sensors collect data before the actual race begins, which may include practice runs, warm-up sessions, or simply standing in a waiting position. Identifying the race start and the race sequence in the data is tedious and time-consuming work that could be automated. This project proposes an approach to identify kayak races from velocity signal data with the help of a machine learning algorithm. The proposed approach is a combination of several techniques: signal preprocessing, a machine learning algorithm, and a programmatic approach. Three machine learning algorithms were evaluated to detect the race sequence: Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and Random Forest (RF). SVM outperformed the other algorithms with an accuracy of 95%. A programmatic approach was proposed to identify the start time of the race. The average error of the proposed approach is 0.24 seconds. The proposed approach was used in a web-based application with a user interface that lets coaches automatically detect the beginning of a kayak race and the race signal sequence.
|
76 |
Data mining inom tillverkningsindustrin : En fallstudie om möjligheten att förutspå kvalitetsutfall i produktionslinjer
Janson, Lisa, Mathisson, Minna January 2021
In this work, a case study was carried out at Volvo Group in Köping. As the transition to Industry 4.0 proceeds, the opportunities to use machine learning as a tool in the analysis of industrial data and the further development of industrial production increase. This work aims to investigate the possibility of predicting quality outcomes in the compression of hub and mainshaft. The method comprises implementing three machine learning models and evaluating their performance relative to one another. Applying the models to assembly data from the factory yielded a poor result, which indicates that the quality outcome cannot be predicted from the included variables. The causes underlying the result were examined, and the conclusion was that it was probably due either to the models being unable to find relationships in the data or to there being no relationship in the dataset. To determine which of these two factors was decisive, a fabricated dataset was created in which three new variables were introduced. The fabricated values of these variables were created so that there was synthetic causality between two of the variables and the quality outcome. When the models were applied to the fabricated data, all of them succeeded in identifying the synthetic relationship. From this it was concluded that the poor result was due not to the models' performance but to the absence of any relationship in the dataset of real assembly data. This contributed to the assessment that if the traceability of the components were to increase in the future, and more machines in the production line generated data to a connected system, this study could be carried out again with more variables and a larger dataset. Support vector machine was the model that performed best, given the performance metrics used in this study. The fact that the models included in this study succeeded in identifying the relationship in the data, when it was known that the relationship existed, motivates the use of these models in future studies. Finally, it can be concluded that with improved traceability and an increasingly connected factory, machine learning models could be used as components in larger systems to achieve efficiency gains. / As the adaptation towards Industry 4.0 proceeds, the possibility of using machine learning as a tool for further development of industrial production becomes increasingly significant. In this paper, a case study has been conducted at Volvo Group in Köping to investigate the feasibility of predicting quality outcomes in the compression of hub and mainshaft. In this study, three different machine learning models were implemented and compared with each other. A dataset containing data from Volvo’s production site in Köping was used to train and evaluate the models. However, the low evaluation scores obtained indicate that the quality outcome of the compression could not be predicted given solely the variables included in that dataset. Therefore, a dataset containing three additional variables, consisting of fabricated values with a known causality between two of the variables and the quality outcome, was also used. The purpose of this was to investigate whether the poor evaluation metrics resulted from a non-existent pattern between the included variables and the quality outcome, or from the models not being able to find the pattern. The performance of the models, when trained and evaluated on the fabricated dataset, indicates that the models were in fact able to find the pattern that was known to exist. Support vector machine was the model that performed best, given the evaluation metrics that were chosen in this study.
Consequently, if the traceability of the components were to be enhanced in the future, and more machines in the production line were to transmit production data to a connected system, it would be possible to conduct the study again with additional variables and a larger dataset. The fact that the models included in this study succeeded in finding patterns in the dataset when such patterns were known to exist motivates the use of the same models. Furthermore, it can be concluded that with enhanced traceability of the components and a larger number of machines transmitting production data to a connected system, machine learning models could be used as components in larger business monitoring systems in order to achieve efficiency gains.
|
77 |
Efficient Algorithms for Data Mining with Federated Databases
Young, Barrington R. St. A. 03 July 2007
No description available.
|
78 |
Predicting basketball performance based on draft pick : A classification analysis
Harmén, Fredrik January 2022
In this thesis, we will look to predict the performance of a basketball player coming into the NBA depending on where the player was picked in the NBA draft. This will be done by testing different machine learning models on data from the previous 35 NBA drafts and then comparing the models in order to see which model had the highest accuracy of classification. The machine learning methods used are Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machines and Random Forests. The results show that the method with the highest accuracy of classification was Random Forests, with an accuracy of 42%.
|
79 |
Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nerarest neighbors queries in metric space
Oliveira, Willian Dener de 19 March 2010
The complexity of the data stored in large databases keeps increasing, creating the need for new query operations. One class of operations that has attracted growing interest is the so-called Similarity Queries, the best known being range queries (R_q) and k-Nearest Neighbor queries (kNN), the latter retrieving the k stored elements most similar to a given reference element. Another query that is useful both for direct queries and as part of more complex analysis operations is the Reverse k-Nearest Neighbor query (RkNN). Its goal is to retrieve all stored elements that have a given reference element as one of their k most similar elements. Because of the execution complexity of the RkNN operation, the great majority of existing solutions are restricted to data represented in Euclidean multidimensional spaces (in which cardinal and topological operations are also defined, and similarity is taken to be the Euclidean distance between two elements), or else obtain only approximate answers, subject to false negatives. Several applications that analyze scientific, medical, engineering, and financial data, among others, require efficient solutions to the RkNN problem over data represented in metric spaces, where the elements cannot be assumed to lie in a space that is either Euclidean or multidimensional. In a metric space, besides the stored elements themselves, there is only a metric comparison function between pairs of objects. In this work, new search-space pruning rules are proposed, along with the RkNN-MG algorithm, which uses these rules to solve exact RkNN queries in metric spaces without limitations. The whole proposal assumes that the data set lies in a metric space isometrically embedded in a Euclidean space and uses metric geometry properties valid in that space to perform efficient pruning based on the law of cosines, combined with traditional pruning based on the triangle inequality. The experiments show comparatively that the new pruning rules are more efficient than the traditional triangle-inequality pruning, with equivalent performance on data sets of high dimensionality or high fractal dimension. The results thus confirm the proposed pruning rules as efficient alternative solutions for the RkNN query problem. / Data stored in large databases present an ever-increasing complexity, pressing for the development of new classes of query operators. One such class, which is attracting increasing interest, is the so-called Similarity Queries, of which the most common are similarity range queries (R_q) and k-nearest neighbor queries (kNN). A k-nearest neighbor query aims at retrieving the k stored elements nearest (or most similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor (RkNN) query, useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor query is to obtain the stored elements that have the query reference element as one of their k-nearest neighbors. As the RkNN operation is rather expensive from the computational standpoint, most existing solutions only solve the query when applied over Euclidean multidimensional spaces (as these spaces also define cardinal and topological operations besides the Euclidean distance between pairs of elements) or retrieve only approximate answers, where false negatives can occur.
Several applications, such as the analysis of scientific, medical, engineering, or financial data, require efficient and exact answers to RkNN queries over data that is frequently represented in metric spaces, that is, where no property other than the similarity measure exists. Therefore, for applications handling metric data, the assumption of a Euclidean metric or even of multidimensional data cannot be used. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, not limited to any value of k, and applicable over both static and dynamic datasets. The new pruning rules assume that the data set is in a metric space that can be embedded into a Euclidean space and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines combined with the traditional pruning based on the triangle inequality property. The experiments show that the new pruning rules are always more efficient than the traditional pruning rules based solely on the triangle inequality. The experiments also show that for high-dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for lower-dimensionality datasets, but it is never worse. Thus, the results confirm that our pruning rules are an efficient alternative for solving RkNN queries in general.
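To make the RkNN definition in this abstract concrete, the following brute-force sketch returns every stored element that has the query among its k nearest neighbors, using nothing but a distance function, as in a metric space. It is an assumption-laden illustration only: the name `reverse_knn` and the toy data are hypothetical, ties are resolved in favor of the query, and the code uses no pruning at all, so it is not the RkNN-MG algorithm or its law-of-cosines rules.

```python
def reverse_knn(dataset, query, k, dist):
    """Return the stored elements that have `query` among their k nearest
    neighbors. Brute force, O(n^2) distance computations; illustrative only."""
    result = []
    for i, o in enumerate(dataset):
        d_oq = dist(o, query)
        # Count the other stored elements strictly closer to o than the query.
        closer = sum(1 for j, p in enumerate(dataset)
                     if j != i and dist(o, p) < d_oq)
        # If fewer than k elements beat the query, the query is in o's kNN set.
        if closer < k:
            result.append(o)
    return result

# Toy example (assumed data): points on a line, absolute difference as metric.
points = [0.0, 1.0, 5.0, 6.0]
metric = lambda a, b: abs(a - b)
r1 = reverse_knn(points, 2.0, 1, metric)
r2 = reverse_knn(points, 2.0, 2, metric)
```

Here only the point 1.0 has the query 2.0 as its single nearest neighbor, while with k = 2 every stored point qualifies, which shows that an RkNN answer is not simply the query's own kNN set.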
|
80 |
Aplicação de classificadores para determinação de conformidade de biodiesel / Attesting compliance of biodiesel quality using classification methods
LOPES, Marcus Vinicius de Sousa 26 July 2017
The growing demand for energy and the limitations of oil reserves have led to the search for renewable and sustainable energy sources to replace, even partially, fossil fuels. In recent decades, biodiesel has become the main alternative to petroleum diesel. Its quality is evaluated by given parameters and specifications, which vary according to country or region, for example Europe (EN 14214), the US (ASTM D6751), and Brazil (RANP 45/2014), among others. Some of these parameters, such as viscosity, density, oxidative stability, and iodine value, are intrinsically related to the composition of the fatty acid methyl esters (FAMEs) of biodiesel, which makes it possible to relate the behavior of these properties to the size of the carbon chain and the presence of unsaturation in the molecules. In the present work, four methods for direct classification (support vector machine, K-nearest neighbors, decision tree classifier, and artificial neural networks) were optimized and compared for classifying biodiesel samples according to their compliance with viscosity, density, oxidative stability, and iodine value limits, taking the composition of fatty acid methyl esters as input. The classifications were carried out under the specifications of the standards EN 14214, ASTM D6751, and RANP 45/2014. A comparison between these methods of direct classification and empirical equations (indirect classification) favored the direct classification methods for the problem addressed, especially when the biodiesel samples have property values very close to the limits of the considered specifications. / The growing demand for renewable energy sources as an alternative to fossil fuels makes biodiesel one of the main alternatives for replacing petroleum derivatives. Quality control of biodiesel during production and distribution is extremely important to guarantee a fuel of reliable quality and satisfactory performance for the end user. Biodiesel is characterized by measuring certain properties according to international standards. Using machine learning methods to characterize biodiesel saves time and money. This work shows that, for determining the compliance of a biodiesel, the SVM, KNN, and decision tree classifiers give better results than the prediction methods of previous work. For the properties of viscosity, density, iodine value, and oxidative stability (RANP 45/2014, EN14214:2014, and ASTM D6751-15), the KNN and decision tree classifiers proved to be the best options. These results show that the classifiers can be applied in practice to save time and financial and human resources.
|