151

On approximate solutions to scalable data mining algorithms for complex data problems using GPGPU

Mamani, Alexander Victor Ocsa 22 September 2011 (has links)
The increasing availability of data in diverse domains has created a need to develop techniques and methods to discover knowledge from huge volumes of complex data, motivating many research works in the database, data mining, and information retrieval communities. Recent studies have suggested that searching in complex data is an interesting research field because many data mining tasks, such as classification, clustering, and motif discovery, depend on nearest neighbor search algorithms. Thus, many deterministic approaches have been proposed to solve the nearest neighbor search problem in complex domains, aiming to reduce the effects of the well-known curse of dimensionality. On the other hand, probabilistic algorithms have been only slightly explored. Recently, new techniques aim to reduce the computational cost by relaxing the quality of the query results. Moreover, in large-scale problems, an approximate solution with a solid theoretical analysis seems more appropriate than an exact solution with a weak theoretical model. Even though several exact and approximate solutions have been proposed, single-CPU architectures impose limits on the performance of these kinds of solutions. An approach to improve the runtime of data mining and information retrieval techniques by an order of magnitude is to employ emerging many-core architectures such as CUDA-enabled GPUs.
In this work we present a massively parallel kNN query algorithm based on hashing and a CUDA implementation. Our method, based on the LSH scheme, is an approximate method that queries high-dimensional datasets in sub-linear computational time. Using the massively parallel implementation we improve data mining tasks; specifically, we create solutions for (soft) real-time time series motif discovery. Experimental studies on large real and synthetic datasets were carried out thanks to the highly parallel CUDA implementation. Our performance evaluation on a GeForce GTX 470 GPU resulted in average runtime speedups of up to 7x over the state of the art in similarity search and motif discovery solutions.
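The core LSH idea used above — hashing points with random hyperplane projections so that nearby points tend to land in the same bucket, then ranking only the bucket's candidates — can be illustrated in a few lines. This is a minimal single-table CPU sketch of the general technique, not the thesis's CUDA implementation; all data sizes and parameters are illustrative.

```python
import numpy as np

def lsh_hash(points, planes):
    """Binary hash codes: the sign pattern of random hyperplane projections."""
    return (points @ planes.T > 0).astype(np.uint8)

def approx_knn(query, data, planes, k):
    """Approximate kNN: rank only the points whose hash code matches the
    query's code, falling back to a full scan if the bucket is empty."""
    codes = lsh_hash(data, planes)
    qcode = lsh_hash(query[None, :], planes)[0]
    candidates = np.where((codes == qcode).all(axis=1))[0]
    if len(candidates) == 0:
        candidates = np.arange(len(data))
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 32))
planes = rng.normal(size=(8, 32))  # 8 random hyperplanes -> 8-bit codes
print(approx_knn(data[42], data, planes, k=3))
```

A production scheme would use several hash tables and multi-probe lookups to trade recall against cost; the sub-linear behavior comes from inspecting only a small candidate bucket instead of all points.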
152

Empirical RF Propagation Modeling of Human Body Motions for Activity Classification

Fu, Ruijun 19 December 2012 (has links)
Many current and future medical devices are wearable, using the human body as a conduit for wireless communication, which implies that the human body serves as a crucial part of the transmission medium in body area networks (BANs). Implantable medical devices such as pacemakers and cardiac defibrillators are designed to provide patients with timely monitoring and treatment. Endoscopy capsules, pH monitors, and blood pressure sensors are used as clinical diagnostic tools to detect physiological abnormalities and replace traditional wired medical devices. Body-mounted sensors need to be investigated for use in providing a ubiquitous monitoring environment. In order to better design these medical devices, it is important to understand the propagation characteristics of channels for in-body and on-body wireless communication in BANs. The IEEE 802.15.6 Task Group 6 is officially working on the standardization of body area networks, including channel modeling and communication protocol design. This thesis focuses on the propagation characteristics of human body movements. Specifically, standing, walking, and jogging motions are measured, evaluated, and analyzed using an empirical approach. Using a network analyzer, probabilistic models are derived for the communication links in the medical implant communication service (MICS) band, the industrial, scientific, and medical (ISM) band, and the ultra-wideband (UWB) band. Statistical distributions of the received signal strength and second-order statistics are presented to evaluate the link quality and outage performance for on-body to on-body communications at different antenna separations. The normal, gamma, Rayleigh, Weibull, Nakagami-m, and lognormal distributions are considered as potential models to describe the observed variation of received signal strength.
Doppler spread in the frequency domain and coherence time in the time domain, obtained from temporal variations, are analyzed to characterize the stability of the channels as affected by human body movements. The shape of the Doppler spectrum is also investigated to describe the relationship between power and frequency in the frequency domain. All these channel characteristics can be used in the design of communication protocols in BANs, as well as to provide features for classifying different human body activities. Realistic data extracted from built-in sensors in smart devices were used, along with the RF sensors, to assist in modeling and classifying human body movements. Variance, energy, and frequency-domain entropy of the data collected from accelerometer and orientation sensors are pre-processed as features for machine learning algorithms. Activity classifiers based on the backpropagation network, probabilistic neural network, k-nearest neighbor algorithm, and support vector machine are discussed and evaluated as means to discriminate human body motions. The detection accuracy can be improved by combining RF and inertial sensors.
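The classification pipeline described above — sensor windows reduced to features such as variance and energy, then labeled by majority vote among the nearest training examples — can be sketched with a minimal kNN classifier. The synthetic accelerometer-like windows and feature choices below are invented for illustration and are not the thesis's data.

```python
import numpy as np

def extract_features(window):
    """Per-window features: variance and mean energy of a sensor trace."""
    return np.array([np.var(window), np.mean(window ** 2)])

def knn_predict(x, train_X, train_y, k=3):
    """Majority vote among the k nearest training feature vectors."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

rng = np.random.default_rng(1)
# Hypothetical windows: low-variance "standing" (0) vs high-variance "jogging" (1).
standing = [rng.normal(0, 0.1, 64) for _ in range(20)]
jogging = [rng.normal(0, 2.0, 64) for _ in range(20)]
X = np.array([extract_features(w) for w in standing + jogging])
y = np.array([0] * 20 + [1] * 20)
test_window = rng.normal(0, 2.0, 64)  # drawn from the "jogging" distribution
print(knn_predict(extract_features(test_window), X, y))
```

In practice the feature vector would also include the frequency-domain entropy mentioned above, and features should be normalized so that no single feature dominates the Euclidean distance.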
153

A Study of Several Statistical Methods for Classification with Application to Microbial Source Tracking

Zhong, Xiao 30 April 2004 (has links)
With the advent of computers and the information age, the vast amounts of data generated in many fields of science and industry call for further exploration by statisticians. In particular, statistical and computational problems in biology and medicine have created the new field of bioinformatics, which is attracting more and more statisticians, computer scientists, and biologists. Several procedures have been developed for tracing the source of fecal pollution in water resources based on certain characteristics of certain microorganisms; this collection of techniques has been termed microbial source tracking (MST). Most current methods for MST are based on patterns of either phenotypic or genotypic variation in indicator organisms. Studies also suggest that patterns of genotypic variation may be more reliable, because they are less associated with environmental factors than patterns of phenotypic variation. Among the genotypic methods for source tracking, fingerprinting via rep-PCR is the most common. Thus, identifying the specific pollution sources in contaminated waters based on rep-PCR fingerprinting techniques, viewed as a classification problem, has become an increasingly popular research topic in bioinformatics. In this project, several statistical methods for classification were studied, including linear discriminant analysis, quadratic discriminant analysis, logistic regression, k-nearest-neighbor rules, neural networks, and support vector machines. This project report summarizes each of these methods and the relevant statistical theory. In addition, an application of these methods to a particular set of MST data is presented and comparisons are made.
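Of the classifiers compared in the report, linear discriminant analysis is perhaps the simplest to sketch. The following is a generic two-class LDA on synthetic 2-D data, not the report's actual MST experiment; the data, dimensions, and class separation are invented for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA: w = Sw^-1 (m1 - m0), with the threshold at the
    midpoint of the projected class means (equal priors assumed)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    c = w @ (m0 + m1) / 2
    return w, c

def predict_lda(X, w, c):
    """Assign class 1 when the projection exceeds the midpoint threshold."""
    return (X @ w > c).astype(int)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, c = fit_lda(X, y)
acc = (predict_lda(X, w, c) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

Quadratic discriminant analysis differs only in fitting a separate covariance matrix per class, which yields a quadratic rather than linear decision boundary.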
154

Investigation of whether wheel motor currents can be used as an alternative method for collision detection in robotic lawn mowers: Classification of wheel motor currents with KNN and MLP

Bertilsson, Tobias, Johansson, Romario January 2019 (has links)
Purpose – The purpose of the study is to expand the knowledge of how wheel motor currents can be combined with machine learning in a collision detection system for autonomous robots, in order to decrease the number of external sensors, open up new design opportunities, and lower production costs. Method – The study is conducted with design science research, where two artefacts are developed in cooperation with Globe Tools Group. The artefacts are evaluated on how they categorize data from an autonomous robot into the two categories collision and non-collision. The artefacts are then tested on generated data to analyse their ability to categorize. Findings – Both artefacts showed 100% accuracy in detecting the collisions in the data from the autonomous robot. In the second part of the experiment the artefacts showed that they have different decision boundaries in how they categorize the data, which makes them useful in different applications. Implications – The study contributes to expanding knowledge of how machine learning and wheel motor currents can be used in a collision detection system. The results can lead to lower production costs and new design opportunities. Limitations – The data used in the study was gathered by an autonomous robot that only made frontal collisions on an artificial lawn. Keywords – Machine learning, K-Nearest Neighbour, Multilayer Perceptron, collision detection, autonomous robots, collision detection based on current.
155

CircularTrip and ArcTrip: effective grid access methods for continuous spatial queries.

Cheema, Muhammad Aamir, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
A k nearest neighbor (kNN) query q retrieves the k objects that lie closest to the query point q among a given set of objects P. With the availability of inexpensive location-aware mobile devices, the continuous monitoring of such queries has gained a lot of attention, and many methods have been proposed for continuously monitoring kNNs in highly dynamic environments. Multiple continuous queries require real-time results, and both the objects and the queries issue frequent location updates. The most popular spatial index, the R-tree, is not suitable for continuous monitoring of these queries because it handles frequent updates inefficiently. Recently, the interest of the database community has been shifting towards grid-based indexes for continuous queries due to their simplicity and efficient update handling. For kNN queries, the order in which the cells of the grid are accessed is very important. In this research, we present two efficient and effective grid access methods, CircularTrip and ArcTrip, that ensure that the number of cells visited for any continuous kNN query is minimal. Our extensive experimental study demonstrates that the CircularTrip-based continuous kNN algorithm outperforms existing approaches in terms of both efficiency and space requirements. Moreover, we show that CircularTrip and ArcTrip can be used for many other variants of nearest neighbor queries, such as constrained nearest neighbor queries, farthest neighbor queries, and (k + m)-NN queries. All the algorithms presented for these queries preserve the properties that they visit the minimum number of cells for each query and that their space requirement is low. Our proposed techniques are flexible and efficient and can be used to answer any query that is a hybrid of the above-mentioned queries. For example, our algorithms can easily be used to efficiently monitor a (k + m) farthest neighbor query in a constrained region, with the flexibility that the spatial conditions constraining the region can be changed by the user at any time.
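The underlying idea of grid access for kNN — visiting cells in expanding "rings" around the query cell so that no more cells are inspected than necessary — can be sketched as follows. This is a simplified square-ring traversal for illustration only, not the paper's CircularTrip/ArcTrip algorithms, which visit cells in exact minimum-distance order and prune more aggressively.

```python
def ring_cells(qx, qy, r):
    """Cells at Chebyshev distance exactly r from the query cell (qx, qy):
    the square 'ring' visited at step r of an outward traversal."""
    if r == 0:
        return [(qx, qy)]
    return [(qx + dx, qy + dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if max(abs(dx), abs(dy)) == r]

def knn_grid(query_cell, objects_by_cell, k, max_r=100):
    """Expand rings outward, collecting objects until k are found. A real
    method would order cells by exact minimum distance to the query and
    verify candidates; this sketch only shows the ring-by-ring expansion."""
    found, r = [], 0
    while len(found) < k and r <= max_r:
        for cell in ring_cells(*query_cell, r):
            found.extend(objects_by_cell.get(cell, []))
        r += 1
    return found[:k]

objects = {(0, 0): ["o1"], (1, 0): ["o2"], (2, 2): ["o3"], (5, 5): ["o4"]}
print(knn_grid((0, 0), objects, k=2))
```

The efficiency argument in the abstract is about exactly this access order: once k candidates are found, only the cells whose minimum distance to the query does not exceed the current kth-candidate distance still need to be visited.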
156

Numerical Evaluation of Classification Techniques for Flaw Detection

Vallamsundar, Suriyapriya January 2007 (has links)
Nondestructive testing (NDT) is used extensively throughout industry for quality assessment and detection of defects in engineering materials. The range and variety of anomalies is enormous, and critical assessment of their location and size is often complicated. Depending upon final operational considerations, some of these anomalies may be critical, and their detection and classification is therefore important. Despite the many advantages of using nondestructive testing for flaw detection, conventional NDT techniques based on heuristic, experience-based pattern identification have drawbacks in terms of cost and duration, and can produce erratic analyses that lead to discrepancies in results. The use of statistical and soft computing techniques in the evaluation and classification operations enables the development of an automatic decision support system for defect characterization that offers the possibility of impartial, standardized performance. The present work evaluates the application of both supervised and unsupervised classification techniques for flaw detection and classification in a semi-infinite half space. Finite element models to simulate the MASW test in the presence and absence of voids were developed using the commercial package LS-DYNA. To simulate anomalies, voids of different sizes were inserted into the elastic medium. Features for discriminating the received responses were extracted in the time and frequency domains by applying suitable transformations. The compact feature vector is then classified by different techniques: supervised classification (backpropagation neural network, adaptive neuro-fuzzy inference system, k-nearest neighbor classifier, linear discriminant classifier) and unsupervised classification (fuzzy c-means clustering).
The classification results show that the k-nearest neighbor classifier performed best among the techniques compared, with an overall accuracy of 94% in detecting the presence of voids and an accuracy of 81% in determining the size of the void in the medium. The assessment of the various classifiers' performance proved valuable in comparing the different techniques and establishing the applicability of simple classification methods such as k-NN to defect characterization. The classification accuracies obtained for the detection and classification of voids are very encouraging, showing the suitability of the proposed approach for developing a decision support system for nondestructive testing of materials for defect characterization.
157

TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE

Samara, Rafat January 2012 (has links)
Top-k and skyline queries are long-studied topics in the database and information retrieval communities, and they are two popular operations for preference retrieval. A top-k query returns a subset of the most relevant answers instead of all answers; efficient top-k processing retrieves the k objects that have the highest overall score. In this paper, some algorithms used for efficient top-k processing in different scenarios are presented, along with a framework based on existing algorithms, with cost-based optimization, that works for these scenarios. This framework is used when the user can determine the ranking function, and a real-life scenario is applied to it step by step. A skyline query returns the set of points that are not dominated by other points in the given dataset (a record x dominates another record y if x is as good as y in all attributes and strictly better in at least one attribute). In this paper, some algorithms used for evaluating skyline queries are introduced, and one of the problems of the skyline query, the curse of dimensionality, is presented. A new strategy based on existing skyline algorithms, skyline frequency, and a binary tree strategy, which gives a good solution to this problem, is presented. This new strategy is used when the user cannot determine the ranking function, and a real-life scenario applying it step by step is presented. Finally, the advantages of the top-k query are applied to the skyline query in order to retrieve results quickly and efficiently.
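The dominance definition quoted above translates directly into a naive skyline computation: keep every point that no other point dominates. A minimal sketch (the hotel data is invented; practical skyline algorithms avoid the quadratic all-pairs comparison):

```python
def dominates(x, y):
    """x dominates y if x is at least as good in every attribute and
    strictly better in at least one (here: smaller is better)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(points):
    """Naive O(n^2) skyline: keep points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels: (price, distance-to-beach); lower is better in both.
hotels = [(50, 8), (60, 4), (40, 9), (55, 3), (45, 10)]
print(skyline(hotels))
```

Here (60, 4) is dominated by (55, 3) and (45, 10) by (40, 9), so neither survives; the remaining three points are mutually incomparable trade-offs, which is exactly why the skyline is useful when no single ranking function is given.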
159

Doppler Radar Data Processing And Classification

Aygar, Alper 01 September 2008 (has links) (PDF)
In this thesis, improving the performance of automatic recognition of Doppler radar targets is studied. The radar used in this study is a ground-surveillance Doppler radar. The target types are car, truck, bus, tank, helicopter, moving man, and running man. The input to this thesis is real Doppler radar signals that were normalized and preprocessed into TRP (Target Recognition Pattern) vectors in the doctoral thesis by Erdogan (2002). TRP vectors are Doppler radar target signals normalized and homogenized with respect to target speed, target aspect angle, and target range. Some target classes exhibit repetitions in time in their TRPs, and using these repetitions to improve target type classification performance is studied. K-nearest neighbor (KNN) and support vector machine (SVM) algorithms are used for Doppler radar target classification, and the results are evaluated. Before classification, PCA (principal component analysis), LDA (linear discriminant analysis), NMF (nonnegative matrix factorization), and ICA (independent component analysis) are implemented and applied to the normalized Doppler radar signals for efficient feature extraction and dimension reduction. These techniques transform the input vectors, which are the normalized Doppler radar signals, into another space. The effects of these feature extraction algorithms, and of using the repetitions in the Doppler radar target signals, on Doppler radar target classification performance are studied.
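The reduce-then-classify pipeline described above can be sketched with PCA followed by a 1-NN classifier. The 64-dimensional synthetic vectors below merely stand in for the TRP vectors, which are not reproduced here; the class structure and dimensions are invented for illustration.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the rows of X onto the top-d principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(3)
# Synthetic 64-dimensional vectors standing in for TRPs of two target classes.
class_a = rng.normal(0, 1, (30, 64)) + np.linspace(0, 3, 64)
class_b = rng.normal(0, 1, (30, 64)) - np.linspace(0, 3, 64)
X = np.vstack([class_a, class_b])
y = np.array([0] * 30 + [1] * 30)
Z = pca_reduce(X, d=2)  # 64 -> 2 dimensions
# Leave-one-out 1-NN accuracy in the reduced space
correct = 0
for i in range(len(Z)):
    dist = np.linalg.norm(Z - Z[i], axis=1)
    dist[i] = np.inf  # exclude the point itself
    correct += int(y[np.argmin(dist)] == y[i])
acc = correct / len(Z)
print(acc)
```

LDA, NMF, and ICA would slot into the same place as `pca_reduce`; the thesis's comparison is essentially over which projection preserves class-discriminative structure best before the KNN/SVM stage.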
160

Exploring the Influences of Lexical Sources and Term Weights on the Classification of Chinese Judgment Documents

鄭人豪, Cheng, Jen-Hao Unknown Date (has links)
Legal information systems for non-Chinese languages have been studied intensively in the past many years, attempting to use technology to improve the efficiency of judicial proceedings. Important topics include judgment assistance, legal document classification, and similar case search. This thesis studies the classification of Chinese judgment documents. I use phrases as the indices for documents, and I compare the influences of different lexical sources for segmenting Chinese text. One of the lexical sources is a general machine-readable dictionary, HowNet; the other is the set of terms algorithmically extracted from legal documents. Based on the concept of tf-idf (term frequency – inverse document frequency), I design two kinds of phrase weights: tpf-idf (term pair frequency – inverse document frequency) and tpf-icf (term pair frequency – inverse category frequency). In the experiments, I use the k-nearest neighbor method to classify Chinese judgment documents into seven categories based on their prosecution reasons: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with low rejection rates, I observe and discuss the distribution of similarity among the training documents to select appropriate parameters. In addition, I conduct a set of analogous experiments for classifying documents based on the cited legal articles in gambling cases.
To improve classification effectiveness, I apply introspective learning to adjust the weights of phrases. I observe the intra-cluster and inter-cluster similarity to evaluate the effects of weight adjustment in the experiments for classifying documents based on their prosecution reasons and cited articles.
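As an illustration of category-frequency weighting in the spirit of tpf-icf — a term pair's frequency within a category, scaled down when the pair appears across many categories — here is a sketch under my reading of the abstract; the thesis's exact formula is not given here, and the phrase pairs and categories below are invented examples.

```python
import math
from collections import Counter

def tpf_icf(docs_by_category):
    """Weight each term pair per category: pair frequency in the category
    times log(N / cf), where cf is the number of categories containing
    the pair (an assumed icf form analogous to idf)."""
    n_cat = len(docs_by_category)
    # Term-pair frequency per category
    cat_tf = {c: Counter(p for doc in docs for p in doc)
              for c, docs in docs_by_category.items()}
    # Category frequency: in how many categories does each pair occur?
    cf = Counter()
    for counts in cat_tf.values():
        cf.update(set(counts))
    return {c: {p: tf * math.log(n_cat / cf[p]) for p, tf in counts.items()}
            for c, counts in cat_tf.items()}

docs = {
    "larceny": [["break_in", "steal"], ["steal", "night"]],
    "gambling": [["bet", "money"], ["bet", "steal"]],
}
w = tpf_icf(docs)
print(w["larceny"]["steal"], w["gambling"]["bet"])
```

A pair occurring in every category (like "steal" here) gets weight zero, while a pair confined to one category is boosted — the property that makes icf-style weights discriminative for category-based classification.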
