Global ETD Search

21	Automated data classification using feature weighted self-organising map (FWSOM) Ahamd Usman, Aliyu January 2018 (has links) The enormous increase in the production of electronic data in today's information era has led to more challenges in analysing and understanding the data. The rise in innovations in technology devices, computers and the Internet has made it much easier to collect and store different kinds of data, ranging from personal, medical and financial to scientific data. The growth in the amount of generated data has introduced the term “Big Data” to describe this extremely high-dimensional and yet complex data. Making sense of the generated data sets is of great importance for the discovery of meaningful information that can be used to support decision making. Data mining techniques have been designed as a process for exploring these data sets to extract meaning for decision making. An essential phase of the data mining procedure is the data transformation that involves the selection of input parameters. Selecting the right input parameters has a great impact on the performance of machine learning algorithms. Currently, there are existing manual statistical methods that are used for this task, but these are difficult to use, time consuming and require an expert. Automated data analysis is the initial step to relieve this burden from humans, through the provision of a systematic procedure of inspecting, transforming and modelling data for knowledge discovery. This project presents a novel method that exploits the power of self-organization for a systematic procedure of conducting and inspecting data classification, with the identification of input parameters that are important for the process. 
The developed method can be used on different classification problems with practical application in various areas such as health condition monitoring in health care, machinery fault detection and analysis, and financial instrument analysis among others. 620
22	Advanced techniques for automatic classification of digitally modulated communication signals / Hong, Liang, January 2002 (has links) Thesis (Ph. D.)--University of Missouri-Columbia, 2002. / Typescript. Vita. Includes bibliographical references (leaves 158-174). Also available on the Internet.
23	Advanced techniques for automatic classification of digitally modulated communication signals Hong, Liang, January 2002 (has links) Thesis (Ph. D.)--University of Missouri-Columbia, 2002. / Typescript. Vita. Includes bibliographical references (leaves 158-174). Also available on the Internet.
24	A homogeneous hierarchical scripted vector classification network with optimisation by genetic algorithm : a thesis submitted in partial fulfilment of the requirements for the degree of Master of Engineering in Electrical and Computer Engineering at the University of Canterbury, Christchurch, New Zealand / Wright, Hamish M. January 1900 (has links) Thesis (M.E.)--University of Canterbury, 2007. / Typescript (photocopy). "August 2007." Includes bibliographical references (leaves [105]-109). Also available via the World Wide Web.
25	A workflow for the modeling and analysis of biomedical data Marsolo, Keith Allen, January 2007 (has links) Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 229-239).
26	Development of a Workflow for Automatic Classification and Digitization of Road Objects Gathered with Mobile Mapping Ekblad, Jacob, Lips, Jacob January 2015 (has links) Mobile Mapping Systems gather large amounts of spatial data that can be used for inventory and analyses regarding road safety. The main purpose of this thesis is to propose a workflow for automatic classification and digitisation of objects in a point cloud gathered by a Mobile Mapping System. Point clouds are currently processed manually, which is cost-inefficient and time-consuming because of the vast amount of data a point cloud contains. Before the workflow was defined, different software packages were reviewed to find which ones to use for the classification. The review showed that a combination of Terrasolid and FME is suitable for performing the steps suggested in the classification and digitisation method. The proposed workflow is based on six steps: identify characteristics of the objects of interest, filter the point cloud, reduce noise, identify objects, digitise, and control the results. The method has been carried out on two examples, road signs and painted road lines. Attributes used for classifying the objects include intensity, colour value and spatial relations. For road signs, the method found 15 out of 16 signs (94%). For painted road lines, the results produced by the automatic function had an average lateral misalignment of 3.8 centimetres in comparison to the initial point cloud. The thesis demonstrates that the implemented functions are less time-demanding for the user than the manual method used today. Mobile Mapping Automatic classification Laser scanning Road Other Civil Engineering
27	Etude de la paraphrase sous-phrastique en traitement automatique des langues / A study of sub-sentential paraphrases in Natural Language Processing Bouamor, Houda 11 June 2012 (has links) Variability in language is a major source of difficulty in most natural language processing applications. It shows in the fact that the same idea or event can be expressed with different words or groups of words that have the same meaning in their respective contexts. Automatically capturing semantic equivalences between units of text is a complex task, but one that proves indispensable in many contexts. Acquiring lists of equivalences in advance provides resources that are useful, for example, for improving the detection of an answer to a question, allowing different formulations in machine translation evaluation, or helping authors find better-suited formulations. In this thesis, we propose a detailed study of the task of acquiring sub-sentential paraphrases from pairs of semantically related utterances. We show empirically that monolingual parallel corpora, although extremely rare, are the most suitable type of resource for this kind of study. Our experiments involve five acquisition techniques, representative of different approaches and knowledge sources, in English and in French. To improve acquisition performance, we combine the paraphrases produced by these techniques through a validation step based on a two-class maximum entropy classifier. An important result of our study is the identification of paraphrases that currently defy the techniques studied, which are classified and quantified in English and French. We also examine the impact of the language, the type of corpus and the comparability of the utterance pairs used on the sub-sentential paraphrase acquisition task. We present the result of an analysis of the performance of the different methods tested as a function of the alignment difficulty of the paraphrase pairs. We then give a descriptive and quantitative account of the characteristics of the paraphrases found in the different types of corpora studied, as well as of those that defy current approaches to automatic identification. / Language variation, or the fact that messages can be conveyed in a great variety of ways by means of linguistic expressions, is one of the most challenging and certainly fascinating features of language for Natural Language Processing, with wide applications in language analysis and generation. The term paraphrase is now commonly used to refer to textual units of equivalent meaning, down to the level of sub-sentential fragments. Although one can envisage manually building high-coverage lists of synonyms, enumerating meaning equivalences at the level of phrases is too daunting a task for humans. Consequently, acquiring this type of knowledge by automatic means has attracted a lot of attention, and significant research efforts have been devoted to this objective. In this thesis we use parallel monolingual corpora for a detailed study of the task of sub-sentential paraphrase acquisition. We argue that the scarcity of this type of resource is compensated by the fact that it is the best-suited corpus type for studies on paraphrasing. We propose a large exploration of this task with experiments on two languages with five different acquisition techniques, selected for their complementarity, their combinations, as well as four monolingual corpus types of varying comparability. We report, under all conditions, a significant improvement over all techniques by validating candidate paraphrases using a maximum entropy classifier. An important result of our study is the identification of difficult-to-acquire paraphrase pairs, which are classified and quantified in a bilingual typology. Monolingual corpora Paraphrase acquisition Paraphrase automatic classification Paraphrase typology
28	Neural Networks for the Web Services Classification Silva, Jesús, Senior Naveda, Alexa, Solórzano Movilla, José, Niebles Núñez, William, Hernández Palma, Hugo 07 January 2020 (has links) This article introduces an n-gram-based approach to automatic classification of Web services using a multilayer perceptron-type artificial neural network. Web services contain information that is useful for achieving a classification based on their functionality. The approach relies on word n-grams extracted from the web service description to determine its membership in a category. The experimentation carried out shows promising results, achieving a classification with an F-measure of 0.995 using unigrams (1-grams) of words (features composed of a single lexical unit) and a TF-IDF weight. Classification (of information) Websites Automatic classification Lexical unit N-grams Web service description Word n-grams Web services
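The feature-extraction step described in the entry above (word n-grams weighted by TF-IDF) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the sample service descriptions, and the particular TF-IDF variant (relative term frequency multiplied by the log of the inverse document frequency) are all assumptions, since the abstract does not specify them. The multilayer perceptron that would consume these vectors is left out.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Split text on whitespace and return its word n-grams as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents, n=1):
    """Return one {n-gram: weight} dict per document.

    Assumed weighting: (count / n-grams in document) * log(N / document frequency).
    """
    grams_per_doc = [Counter(word_ngrams(doc, n)) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(grams.keys())

    total_docs = len(documents)
    vectors = []
    for grams in grams_per_doc:
        length = sum(grams.values())
        vectors.append({
            gram: (count / length) * math.log(total_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return vectors


# Hypothetical web service descriptions, invented for illustration only.
descriptions = [
    "returns weather forecast for a city",
    "returns currency exchange rate",
    "sends sms message to a phone",
]
vectors = tfidf_vectors(descriptions, n=1)
```

With these sample descriptions, a word that occurs in only one description (such as "weather") receives a higher weight than one shared by several descriptions (such as "returns"), which is the property that makes such weights useful as classifier input.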
29	Improving the Accessibility of Arabic Electronic Theses and Dissertations (ETDs) with Metadata and Classification Abdelrahman, Eman January 2021 (has links) Much research work has been done to extract data from scientific papers, journals, and articles. However, Electronic Theses and Dissertations (ETDs) remain an unexplored genre of data in the research fields of natural language processing and machine learning. Moreover, much of the related research involved data that is in the English language. Arabic data such as news and tweets have begun to receive some attention in the past decade. However, Arabic ETDs remain an untapped source of data despite the vast number of benefits to students and future generations of scholars. Some ways of improving the browsability and accessibility of data include data annotation, indexing, parsing, translation, and classification. Classification is essential for the searchability and management of data, which can be manual or automated. The latter is beneficial when handling growing volumes of data. There are two main roadblocks to performing automatic subject classification on Arabic ETDs. The first is the unavailability of a public corpus of Arabic ETDs. The second is the Arabic language’s linguistic complexity, especially in academic documents. This research presents the Otrouha project, which aims at building a corpus of key metadata of Arabic ETDs as well as providing a methodology for their automatic subject classification. The first goal is aided by collecting data from the AskZad Digital Library. The second goal is achieved by exploring different machine learning and deep learning techniques. The experiments’ results show that deep learning using pretrained language models gave the highest classification performance, indicating that language models significantly contribute to natural language understanding. / M.S. 
/ An Electronic Thesis or Dissertation (ETD) is an openly-accessible electronic version of a graduate student’s research thesis or dissertation. It documents their main research effort that has taken place and becomes available in the University Library instead of a paper copy. Over time, collections of ETDs have been gathered and made available online through different digital libraries. ETDs are a valuable source of information for scholars and researchers, as well as librarians. With the digitalization move in most Middle Eastern Universities, the need to make Arabic ETDs more accessible significantly increases as their numbers increase. One of the ways to improve their accessibility and searchability is through providing automatic classification instead of manual classification. This thesis project focuses on building a corpus of metadata of Arabic ETDs and building a framework for their automatic subject classification. This is expected to pave the way for more exploratory research on this valuable genre of data. Machine learning NLP Automatic Classification Deep learning (Machine learning) Pretrained Language Models Digital Libraries
30	ACTION: automatic classification for Chinese documents. January 1994 (has links) by Jacqueline, Wai-ting Wong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (p. 107-109). / Abstract / Acknowledgement / List of Tables / List of Figures / Chapter 1 --- Introduction / Chapter 2 --- Chinese Information Processing / Chapter 2.1 --- Chinese Word Segmentation / Chapter 2.1.1 --- Statistical Method / Chapter 2.1.2 --- Probabilistic Method / Chapter 2.1.3 --- Linguistic Method / Chapter 2.2 --- Automatic Indexing / Chapter 2.2.1 --- Title Indexing / Chapter 2.2.2 --- Free-Text Searching / Chapter 2.2.3 --- Citation Indexing / Chapter 2.3 --- Information Retrieval Systems / Chapter 2.3.1 --- Users' Assessment of IRS / Chapter 2.4 --- Concluding Remarks / Chapter 3 --- Survey on Classification / Chapter 3.1 --- Text Classification / Chapter 3.2 --- Survey on Classification Schemes / Chapter 3.2.1 --- Commonly Used Classification Systems / Chapter 3.2.2 --- Classification of Newspapers / Chapter 3.3 --- Concluding Remarks / Chapter 4 --- System Models and the ACTION Algorithm / Chapter 4.1 --- Factors Affecting Systems Performance / Chapter 4.1.1 --- Specificity / Chapter 4.1.2 --- Exhaustivity / Chapter 4.2 --- Assumptions and Scope / Chapter 4.2.1 --- Assumptions / Chapter 4.2.2 --- System Scope - Data Flow Diagrams / Chapter 4.3 --- System Models / Chapter 4.3.1 --- Article / Chapter 4.3.2 --- Matching Table / Chapter 4.3.3 --- Forest / Chapter 4.3.4 --- Matching / Chapter 4.4 --- Classification Rules / Chapter 4.5 --- The ACTION Algorithm / Chapter 4.5.1 --- Algorithm Design Objectives / Chapter 4.5.2 --- Measuring Node Significance / Chapter 4.5.3 --- Pseudocodes / Chapter 4.6 --- Concluding Remarks / Chapter 5 --- Analysis of Results and Validation / Chapter 5.1 --- Seeking for Exhaustivity Rather Than Specificity / Chapter 5.1.1 --- The News Article / Chapter 5.1.2 --- The Matching Results / Chapter 5.1.3 --- The Keyword Values / Chapter 5.1.4 --- Analysis of Classification Results / Chapter 5.2 --- Catering for Hierarchical Relationships Between Classes and Subclasses / Chapter 5.2.1 --- The News Article / Chapter 5.2.2 --- The Matching Results / Chapter 5.2.3 --- The Keyword Values / Chapter 5.2.4 --- Analysis of Classification Results / Chapter 5.3 --- A Representative With Zero Occurrence / Chapter 5.3.1 --- The News Article / Chapter 5.3.2 --- The Matching Results / Chapter 5.3.3 --- The Keyword Values / Chapter 5.3.4 --- Analysis of Classification Results / Chapter 5.4 --- Statistical Analysis / Chapter 5.4.1 --- Classification Results with Highest Occurrence Frequency / Chapter 5.4.2 --- Classification Results with Zero Occurrence Frequency / Chapter 5.4.3 --- Distribution of Classification Results on Level Numbers / Chapter 5.5 --- Concluding Remarks / Chapter 5.5.1 --- Advantageous Characteristics of ACTION / Chapter 6 --- Conclusion / Chapter 6.1 --- Perspectives in Document Representation / Chapter 6.2 --- Classification Schemes / Chapter 6.3 --- Classification System Model / Chapter 6.4 --- The ACTION Algorithm / Chapter 6.5 --- Advantageous Characteristics of the ACTION Algorithm / Chapter 6.6 --- Testing and Validating the ACTION algorithm / Chapter 6.7 --- Future Work / Chapter 6.8 --- A Final Remark / Chapter A --- System Models / Chapter B --- Classification Rules / Chapter C --- Node Significance Definitions / References Automatic classification Chinese language--Data processing
