281 |
A Study of Several Statistical Methods for Classification with Application to Microbial Source Tracking. Zhong, Xiao, 30 April 2004 (has links)
With the advent of computers and the information age, the vast amounts of data generated across science and industry demand further statistical exploration. In particular, statistical and computational problems in biology and medicine have created the new field of bioinformatics, which is attracting more and more statisticians, computer scientists, and biologists. Several procedures have been developed for tracing the source of fecal pollution in water resources based on characteristics of certain microorganisms; this collection of techniques has been termed microbial source tracking (MST). Most current methods for MST are based on patterns of either phenotypic or genotypic variation in indicator organisms. Studies also suggest that patterns of genotypic variation may be more reliable, because they are less associated with environmental factors than patterns of phenotypic variation. Among the genotypic methods for source tracking, fingerprinting via rep-PCR is the most common. Thus, identifying specific pollution sources in contaminated waters based on rep-PCR fingerprinting, viewed as a classification problem, has become an increasingly popular research topic in bioinformatics. In this project, several statistical methods for classification were studied: linear discriminant analysis, quadratic discriminant analysis, logistic regression, $k$-nearest-neighbor rules, neural networks, and support vector machines. This project report summarizes each of these methods and the relevant statistical theory. In addition, an application of these methods to a particular set of MST data is presented and comparisons are made.
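As a rough illustration only (none of the report's code appears in this abstract), the six classifier families named above could be compared on a fingerprint-style feature matrix along these lines; the data is synthetic, and scikit-learn and every parameter choice are assumptions:

```python
# Illustrative comparison of the six classifier families named in the report,
# on synthetic stand-in data (the real rep-PCR fingerprint data is not shown
# here). Library and parameter choices are assumptions.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# 200 fingerprints, 30 band features, 4 candidate pollution sources
X, y = make_classification(n_samples=200, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "logistic": LogisticRegression(max_iter=1000),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "neural net": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                                random_state=0),
    "SVM (RBF)": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold cross-validation
    print(f"{name:12s} mean accuracy: {acc:.3f}")
```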
|
282 |
Data, learning and privacy in recommendation systems / Données, apprentissage et respect de la vie privée dans les systèmes de recommandation. Mittal, Nupur, 25 November 2016 (has links)
Recommendation systems have gained tremendous popularity, both in academia and industry. They have evolved into many different varieties depending mostly on the techniques and ideas used in their implementation, and this categorization also marks the boundary of their application domain. Regardless of type, recommendation systems are complex and multi-disciplinary in nature, involving subjects like information retrieval, data cleansing and preprocessing, and data mining. In our work, we identify three challenges (among many possible) involved in the process of making recommendations, common to all types of recommendation systems, and provide their solutions: data, learning models, and privacy. We elaborate the challenges involved in obtaining user-demographic data and processing it to render it useful for making recommendations. The focus here is on using Online Social Networks to access publicly available user data to help recommendation systems. Using user-demographic data to improve personalized recommendations has many other advantages, such as dealing with the famous cold-start problem, and it is one of the founding pillars of hybrid recommendation systems. With this work, we underline the importance of a user's publicly available information, like tweets, posts, and votes, for inferring more private details about her.
As the second challenge, we aim at improving the learning process of recommendation systems. Our goal is to provide a k-nearest-neighbor method that deals with very large datasets, surpassing billions of users. We propose a generic, fast, and scalable k-NN graph construction algorithm, KIFF, that significantly improves performance compared with state-of-the-art approaches. Our idea is to leverage the bipartite nature of the underlying user-item dataset and use a preprocessing phase to reduce the number of similarity computations in later iterations. As a result, we gain a speed-up of 14 compared with other significant approaches from the literature. Finally, we also consider the issue of privacy. Instead of viewing it directly in conventional recommendation systems, we analyze it on Online Social Networks. First, we reason how OSNs can be seen as a form of recommendation system and how information dissemination is similar to broadcasting opinions and reviews in conventional recommendation systems. Following this parallelism, we identify a privacy threat in information diffusion in OSNs and provide a privacy-preserving algorithm for it. Our algorithm, Riposte, quantifies privacy in terms of differential privacy, and with the help of experimental datasets we demonstrate how Riposte maintains the desirable information-diffusion properties of a network.
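KIFF itself is not reproduced in the abstract; the following is only a toy sketch of the stated bipartite idea, namely that a user's k-NN candidates can be restricted to users who share at least one item, which prunes most similarity computations. The dataset and the Jaccard similarity are assumed for illustration:

```python
# Sketch of the bipartite pruning idea stated above, not the KIFF code itself:
# restrict each user's k-NN candidates to users sharing at least one item,
# instead of scoring every pair. Data and the Jaccard metric are assumptions.
from collections import defaultdict

ratings = {  # toy user -> set-of-items dataset
    "u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"c", "d"}, "u4": {"a", "d"},
}

# Preprocessing pass: invert to item -> users, so that candidate neighbours
# are exactly the users reachable through a shared item.
item_users = defaultdict(set)
for user, items in ratings.items():
    for item in items:
        item_users[item].add(user)

def jaccard(s, t):
    return len(s & t) / len(s | t)

def knn(user, k=2):
    candidates = set()
    for item in ratings[user]:
        candidates |= item_users[item]   # only users sharing an item
    candidates.discard(user)
    scored = [(jaccard(ratings[user], ratings[c]), c) for c in candidates]
    return sorted(scored, reverse=True)[:k]

print(knn("u1"))   # u1's nearest neighbours among overlapping users
```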
|
283 |
An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification. Alsouda, Yasser, January 2019 (has links)
Noise is defined as any undesired sound. Urban noise and its effect on citizens are a significant environmental problem, and the increasing level of noise has become a critical problem in some cities. Fortunately, noise pollution can be mitigated by better planning of urban areas or controlled by administrative regulations. However, the execution of such actions requires well-established systems for noise monitoring. In this thesis, we present a solution for noise measurement and classification using a low-power and inexpensive IoT unit. To measure the noise level, we implement an algorithm for calculating the sound pressure level in dB, achieving a measurement error of less than 1 dB. Our machine-learning-based method for noise classification uses Mel-frequency cepstral coefficients for audio feature extraction and four supervised classification algorithms (support vector machine, k-nearest neighbors, bootstrap aggregating, and random forest). We evaluate our approach experimentally with a dataset of about 3000 sound samples grouped into eight sound classes (such as car horn, jackhammer, or street music). We explore the parameter space of the four algorithms to estimate the optimal parameter values for the classification of sound samples in the dataset under study. We achieve a noise classification accuracy in the range of 88% to 94%.
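A minimal sketch of the two stated stages follows; the thesis's code and parameters are not given here, librosa and scikit-learn are assumed libraries, and the calibration constant and dataset loading are placeholders:

```python
# Rough sketch of the two stages described above; not the thesis code.
# Assumes librosa and scikit-learn; calibration and dataset are placeholders.
import numpy as np
import librosa
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def spl_db(frame, p_ref=20e-6):
    """Sound pressure level of a calibrated frame (in pascals), dB re 20 uPa."""
    rms = np.sqrt(np.mean(np.asarray(frame, dtype=float) ** 2))
    return 20.0 * np.log10(rms / p_ref)

def mfcc_features(path, n_mfcc=13):
    """One fixed-length MFCC feature vector per audio clip."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# The four supervised classifiers named in the abstract:
models = {
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
    "bagging": BaggingClassifier(),
    "random forest": RandomForestClassifier(),
}
# X = np.array([mfcc_features(p) for p in clip_paths])  # hypothetical loader
# for name, m in models.items(): m.fit(X_train, y_train)
```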
|
284 |
Um estudo sobre a extração de características e a classificação de imagens invariantes à rotação extraídas de um sensor industrial 3D / A study on feature extraction and the classification of rotation-invariant images from a 3D industrial sensor. Rodrigo Dalvit Carvalho da Silva, 08 May 2014 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In this work, the problem of recognizing objects in images extracted from a 3D industrial sensor is discussed. We focus on 9 feature extractors (seven based on invariant moments: Hu, Zernike, Legendre, Fourier-Mellin, Tchebichef, Bessel-Fourier, and Gaussian-Hermite; one based on the Hough transform; and one on independent component analysis) and 4 classifiers (Naive Bayes, k-Nearest Neighbors, Support Vector Machine, and Artificial Neural Network/Multi-Layer Perceptron). To choose the best feature extractor, their performance was compared in terms of classification accuracy and extraction time, using the k-nearest-neighbors classifier with Euclidean distance. The feature extractor based on Zernike moments achieved the best hit rate, 98.00%, with a relatively low feature-extraction time of 0.3910 seconds. The features it generated were then presented to the different classification heuristics. Among the classifiers tested, the k-nearest-neighbors classifier achieved the highest average hit rate, 98.00%, with a relatively low average classification time of 0.0040 seconds, making it the most suitable classifier for the application in this study.
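As a hypothetical rendering of the winning pipeline (Zernike moments fed to a Euclidean k-NN), the sketch below uses the mahotas library; the radius and degree values and the toy image are assumptions, not the thesis's implementation:

```python
# Sketch of the winning pipeline (Zernike moments + Euclidean 1-NN). The use
# of mahotas, the radius/degree values, and the toy image are all assumptions,
# not the thesis's implementation.
import numpy as np
import mahotas
from sklearn.neighbors import KNeighborsClassifier

def zernike_vector(binary_image, radius=64, degree=8):
    # Magnitudes of Zernike moments are rotation-invariant, which suits
    # objects seen at arbitrary orientations by the 3D sensor.
    return mahotas.features.zernike_moments(binary_image, radius, degree=degree)

img = np.zeros((128, 128), dtype=bool)
img[32:96, 48:80] = True                     # toy rectangular blob
print(zernike_vector(img)[:5])               # first few moment magnitudes

knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
# knn.fit(X_train, y_train)                  # X_train: one vector per image
# knn.predict([zernike_vector(test_image)])  # hypothetical test image
```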
|
285 |
CircularTrip and ArcTrip: effective grid access methods for continuous spatial queries. Cheema, Muhammad Aamir, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2007 (has links)
A k nearest neighbor query q retrieves the k objects that lie closest to the query point q among a given set of objects P. With the availability of inexpensive location-aware mobile devices, continuous monitoring of such queries has gained a lot of attention, and many methods have been proposed for continuously monitoring kNNs in highly dynamic environments. Multiple continuous queries require real-time results, and both the objects and the queries issue frequent location updates. The most popular spatial index, the R-tree, is not suitable for continuous monitoring of these queries because it handles frequent updates inefficiently. Recently, the interest of the database community has been shifting toward grid-based indexes for continuous queries because of their simplicity and efficient update handling. For kNN queries, the order in which the cells of the grid are accessed is very important. In this research, we present two efficient and effective grid access methods, CircularTrip and ArcTrip, that ensure that the number of cells visited for any continuous kNN query is minimal. Our extensive experimental study demonstrates that the CircularTrip-based continuous kNN algorithm outperforms existing approaches in terms of both efficiency and space requirements. Moreover, we show that CircularTrip and ArcTrip can be used for many other variants of nearest-neighbor queries, such as constrained nearest-neighbor queries, farthest-neighbor queries, and (k + m)-NN queries. All the algorithms presented for these queries preserve the properties that they visit the minimum number of cells for each query and that their space requirement is low. Our proposed techniques are flexible and efficient and can be used to answer any query that is a hybrid of the queries mentioned above. For example, our algorithms can easily be used to efficiently monitor a (k + m) farthest-neighbor query in a constrained region, with the flexibility that the spatial conditions constraining the region can be changed by the user at any time.
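The abstract does not detail CircularTrip's bookkeeping, but the invariant such grid access methods rely on can be sketched: cells are visited in nondecreasing minimum distance from the query point, so a kNN search can stop as soon as k retrieved objects are closer than the next unvisited cell. The heap-based sketch below (assumed cell size and grid extent) shows the ordering only:

```python
# Sketch of the access-order invariant such grid methods rely on: visit cells
# in nondecreasing minimum distance from the query point q, so a kNN search
# can stop once k found objects are closer than the next unvisited cell.
# This heap version only illustrates the ordering; it is not CircularTrip.
import heapq
import math

CELL = 10.0  # grid cell side length

def cell_mindist(q, cell):
    """Minimum distance from point q to the rectangle of grid cell (i, j)."""
    (qx, qy), (i, j) = q, cell
    dx = max(i * CELL - qx, 0.0, qx - (i + 1) * CELL)
    dy = max(j * CELL - qy, 0.0, qy - (j + 1) * CELL)
    return math.hypot(dx, dy)

def cells_by_distance(q, extent=5):
    """Yield all cells of a small extent-by-extent grid, nearest first."""
    heap = [(cell_mindist(q, (i, j)), (i, j))
            for i in range(extent) for j in range(extent)]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

for dist, cell in list(cells_by_distance((12.0, 17.0)))[:5]:
    print(f"cell {cell} at mindist {dist:.2f}")
```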
|
286 |
Numerical Evaluation of Classification Techniques for Flaw Detection. Vallamsundar, Suriyapriya, January 2007 (has links)
Nondestructive testing is used extensively throughout industry for quality assessment and detection of defects in engineering materials. The range and variety of anomalies is enormous, and critical assessment of their location and size is often complicated. Depending upon final operational considerations, some of these anomalies may be critical, and their detection and classification is therefore important. Despite the several advantages of nondestructive testing for flaw detection, conventional NDT techniques based on heuristic, experience-based pattern identification have many drawbacks: they are costly and time-consuming, and they produce erratic analyses that lead to discrepancies in results.
The use of several statistical and soft-computing techniques in the evaluation and classification operations results in the development of an automatic decision-support system for defect characterization that offers the possibility of impartial, standardized performance. The present work evaluates the application of both supervised and unsupervised classification techniques for flaw detection and classification in a semi-infinite half space. Finite element models simulating the MASW test in the presence and absence of voids were developed using the commercial package LS-DYNA. To simulate anomalies, voids of different sizes were inserted into the elastic medium. Features for discriminating the received responses were extracted in the time and frequency domains by applying suitable transformations. The compact feature vector was then classified by different techniques: supervised classification (backpropagation neural network, adaptive neuro-fuzzy inference system, k-nearest-neighbor classifier, and linear discriminant classifier) and unsupervised classification (fuzzy c-means clustering). The classification results show that the k-nearest-neighbor classifier proved superior to the other techniques, with an overall accuracy of 94% in detecting the presence of voids and an accuracy of 81% in determining the size of the void in the medium. The assessment of the various classifiers' performance proved valuable for comparing the different techniques and establishing the applicability of simplified classification methods such as k-NN in defect characterization.
The obtained classification accuracies for the detection and classification of voids are very encouraging, showing the suitability of the proposed approach to the development of a decision support system for non-destructive testing of materials for defect characterization.
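For a concrete picture of such time- and frequency-domain features, here is a small sketch computing a compact feature vector from a simulated receiver trace; the specific features and the synthetic signal are assumptions, not the thesis's choices:

```python
# Sketch of the kind of compact time- and frequency-domain feature vector
# described above, computed from one simulated receiver trace. The specific
# features and the synthetic signal are assumptions, not the thesis's choices.
import numpy as np

def extract_features(trace, fs):
    """Compact feature vector from one received time series."""
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    return np.array([
        trace.max() - trace.min(),                  # peak-to-peak amplitude
        np.sqrt(np.mean(trace ** 2)),               # RMS energy
        freqs[spectrum.argmax()],                   # dominant frequency
        (freqs * spectrum).sum() / spectrum.sum(),  # spectral centroid
    ])

fs = 1000.0                                         # sampling rate, Hz
t = np.arange(0.0, 1.0, 1.0 / fs)
trace = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(t.size)
print(extract_features(trace, fs))
```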
|
287 |
TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE. Samara, Rafat, January 2012 (has links)
Top-k and skyline queries are long-studied topics in the database and information-retrieval communities, and they are two popular operations for preference retrieval. A top-k query returns a subset of the most relevant answers instead of all answers; efficient top-k processing retrieves the k objects that have the highest overall score. In this paper, algorithms used for efficient top-k processing in different scenarios are presented, together with a framework based on existing algorithms, with cost-based optimization, that works for these scenarios. This framework applies when the user can determine the ranking function, and a real-life scenario is worked through it step by step. A skyline query returns the set of points that are not dominated by other points in the given dataset, where a record x dominates another record y if x is as good as y in all attributes and strictly better in at least one attribute. Algorithms used for evaluating skyline queries are then introduced, along with one of the skyline query's known problems, the curse of dimensionality. A new strategy based on existing skyline algorithms, skyline frequency, and a binary-tree strategy is presented as a good solution for this problem; it applies when the user cannot determine the ranking function, and a real-life scenario applies this strategy step by step. Finally, the advantages of the top-k query are applied to the skyline query in order to retrieve results quickly and efficiently.
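As a concrete reading of the dominance definition above, a minimal block-nested-loop-style skyline can be sketched as follows (assuming smaller values are better in every attribute; the data is illustrative):

```python
# A minimal block-nested-loop-style skyline sketch matching the dominance
# definition above, assuming smaller values are better in every attribute.
# The data is illustrative only.
def dominates(x, y):
    """x dominates y: at least as good everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(points):
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue                                   # p is dominated
        result = [q for q in result if not dominates(p, q)] + [p]
    return result

# Hotels as (price, distance-to-beach), both minimized:
hotels = [(50, 8), (45, 9), (60, 2), (65, 5), (70, 1), (55, 9)]
print(skyline(hotels))   # [(50, 8), (45, 9), (60, 2), (70, 1)]
```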
|
289 |
Doppler Radar Data Processing and Classification. Aygar, Alper, 01 September 2008 (has links) (PDF)
In this thesis, improving the performance of automatic recognition of Doppler radar targets is studied. The radar used in this study is a ground-surveillance Doppler radar. The target types are car, truck, bus, tank, helicopter, moving man, and running man. The input to this thesis is real Doppler radar signals that were normalized and preprocessed into Target Recognition Pattern (TRP) vectors in the doctoral thesis by Erdogan (2002). TRP vectors are Doppler radar target signals normalized and homogenized with respect to target speed, target aspect angle, and target range. Some target classes have repetitions in time in their TRPs, and these repetitions are used to study improving target-type classification performance. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for Doppler radar target classification, and the results are evaluated. Before classification, PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), NMF (Nonnegative Matrix Factorization), and ICA (Independent Component Analysis) are implemented and applied to the normalized Doppler radar signals for efficient feature extraction and dimension reduction; these techniques transform the input vectors to another space. The effects of these feature extraction algorithms, and of using the repetitions in the Doppler radar target signals, on classification performance are studied.
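A sketch of this evaluation loop might look as follows; PCA and ICA stand in for the four transforms, and randomly generated vectors replace the TRP data, which is not available here:

```python
# Sketch of the evaluation loop described above: reduce dimensionality, then
# classify with KNN and SVM. PCA and ICA stand in for the four transforms,
# and the random vectors are placeholders for the (unavailable) TRP data.
import numpy as np
from sklearn.decomposition import FastICA, PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((140, 64))              # placeholder TRP vectors
y = np.repeat(np.arange(7), 20)        # 7 target classes (car, truck, ...)

for reducer in (PCA(n_components=10), FastICA(n_components=10, random_state=0)):
    for clf in (KNeighborsClassifier(n_neighbors=5), SVC()):
        pipe = make_pipeline(reducer, clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(type(reducer).__name__, type(clf).__name__, f"{acc:.2f}")
```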
|
290 |
Kombination von terrestrischen Aufnahmen und Fernerkundungsdaten mit Hilfe der kNN-Methode zur Klassifizierung und Kartierung von Wäldern / Combination of field data and remote sensing data with the kNN method (k-nearest neighbors method) for classification and mapping of forests. Stümer, Wolfgang, 30 August 2004 (links) (PDF)
Mapping forest variables and associated characteristics is fundamental for forest planning and management. The following work describes the k-nearest-neighbors (kNN) method for improving estimates and producing maps of the attributes basal area (metric data) and deadwood (categorical data). Several variations within the kNN method were tested, covering the distance metric, the weighting function, and the number of neighbors. As remote sensing sources, Landsat TM satellite images and hyperspectral data were used, which differ both in their spectral and in their spatial resolution. Two Landsat scenes of the same area, acquired in September 1999 and 2000, support a multitemporal approach. The field data for the kNN method comprise tree measurements collected at the test site Tharandter Wald (Germany) in three collections with three different designs; an important design criterion was an even distribution of attribute values (e.g., basal-area values) across the attribute space. For the kNN calculations, a program integrating all kNN functions was developed in Visual Basic, and the pixel-wise output of the results yielded detailed maps. The relative root mean square error (RMSE) and the bootstrap method were evaluated in order to find optimal parameters. 
The estimation accuracy for the attribute basal area is between 35% and 67% (Landsat) and between 65% and 67% (HyMapTM). For the attribute deadwood, the agreement between the kNN estimates and the reference values is between 60.0% and 73.3% (Landsat) and between 60.0% and 63.3% (HyMapTM). With these accuracies, the kNN method lends itself to stand classification and to integration into classification procedures. Recommendations for applying the kNN method for mapping and regional estimation are provided.
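The core kNN estimator can be sketched in a few lines: each target pixel receives a weighted mean of the basal areas of its k spectrally nearest field plots, here with inverse-distance weighting, one of the weighting functions tested. All data below is randomly generated stand-in data:

```python
# Minimal sketch of the kNN estimator at the core of the method: a pixel's
# basal area is a weighted mean over its k spectrally nearest field plots.
# Inverse-distance weighting is one of the weighting functions tested above;
# all data here is randomly generated stand-in data.
import numpy as np

def knn_estimate(pixel, plot_spectra, plot_values, k=5, eps=1e-9):
    """Estimate a metric attribute (e.g. basal area) for one pixel."""
    d = np.linalg.norm(plot_spectra - pixel, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]                       # k nearest field plots
    w = 1.0 / (d[nearest] + eps)                      # inverse-distance weights
    return float(np.sum(w * plot_values[nearest]) / np.sum(w))

rng = np.random.default_rng(1)
plot_spectra = rng.random((100, 6))        # stand-in spectral band values
plot_values = rng.uniform(10, 60, 100)     # stand-in basal areas (m^2/ha)
print(knn_estimate(rng.random(6), plot_spectra, plot_values))
```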
|