Global ETD Search

851	Discretização e geração de gráficos de dados em aprendizado de máquina / Attribute discretization and graphics generation in machine learning Voltolini, Richardson Floriani 17 November 2006 (has links) A elevada quantidade e variedade de informações adquirida e armazenada em meio eletrônico e a incapacidade humana de analizá-las, têm motivado o desenvolvimento da área de Mineracão de Dados - MD - que busca, de maneira semi-automática, extrair conhecimento novo e útil de grandes bases de dados. Uma das fases do processo de MD é o pré-processamento dessas bases de dados. O pré-processamento de dados tem como alguns de seus principais objetivos possibilitar que o usuário do processo obtenha maior compreensão dos dados utilizados, bem como tornar os dados mais adequados para as próximas fases do processo de MD. Uma técnica que busca auxiliar no primeiro objetivo citado é a geracão de gráficos de dados, a qual consiste na representação gráfica dos registros (exemplos) de uma base de dados. Existem diversos métodos de geracão de gráficos, cada qual com suas características e objetivos. Ainda na fase de pré-processamento, de modo a tornar os dados brutos mais adequados para as demais fases do processo de MD, diversas técnicas podem ser aplicadas, promovendo transformações nesses dados. Uma delas é a discretização de dados, que transforma um atributo contínuo da base de dados em um atributo discreto. Neste trabalho são abordados alguns métodos de geração de gráficos e de discretização de dados freqüentemente utilizados pela comunidade. Com relação aos métodos de geração de gráficos, foi projetado e implementado o sistema DISCOVERGRAPHICS que provê interfaces para a geração de gráficos de dados. As diferentes interfaces criadas permitem a utilização do sistema por usuários avançados, leigos e por outros sistemas computacionais. Com relação ao segundo assunto abordado neste trabalho, discretização de dados, foram considerados diversos métodos de discretização supervisionados e não-supervisionados, freqüentemente utilizados pela comunidade, e foi proposto um novo método não-supervisionado denominado K-MeansR. Esses métodos foram comparados entre sí por meio da realização de experimentos e analise estatística dos resultados, considerando-se diversas medidas para realizar a avaliação. Os resultados obtidos indicam que o método proposto supera vários dos métodos de discretização considerados / The great quantity and variety of information acquired and stored electronically and the lack of human capacity to analyze it, have motivated the development of Data Mining - DM - a process that attempts to extract new and useful knowledge from databases. One of the steps of the DM process is data preprocessing. The main goals of the data preprocessing step are to enable the user to have a better understanding of the data being used and to transform the data so it is appropriate for the next step of the DM process related to pattern extraction. A technique concerning the first goal consists of the graphic representation of records (examples) of databases. There are various methods to generate these graphic representations, each one with its own characteristics and objectives. Furthermore, still in the preprocessing step, and in order to transform the raw data into a more suitable form for the next step of the DM process, various data discretization technique methods which transform continuous database attribute values into discrete ones can be applied. This work presents some frequently used methods of graph generation and data discretization. Related to the graph generation methods, we have developed a system called DISCOVERGRAPHICS, which offers different interfaces for graph generation. These interfaces allow both advanced and beginner users, as well as other systems, to access the DISCOVERGRAPHICS system facilities. Regarding the second subject of this work, data discretization, we considered various supervised and unsupervised methods and proposed a new unsupervised data discretization method called K-MeansR. Using different evaluation measures and databases, all these methods were experimentally compared to each other and statistical tests were run to analyze the experimental results. These results showed that the proposed method performed better than many of the other data discretization methods considered in this work Aprendizado de máquina Discretização Discretization Geração de gráficos Graphics generation Machine learning
852	Propensity of Endogenous Alternative Splicing to Mediate Mutative Damage Lutz, Ashley 28 April 2019 (has links) The identification of alternative splicing in the human genome elucidated the potential to several enduring genomic questions. Not only could this phenomenon explain why organism complexity was not at all correlated with the genome size, or explain how an organisms could be affected by experience and environment at the molecular level, but it was perhaps the most flexible and adaptive regulatory mechanism identified to date. While the pathogenic aberrations of this mechanism have generally been readily investigated and identified as potential therapeutic targets, its meditative or advantageous instances have largely not been considered. Initiated exon skipping has been shown to have therapeutic effects in Muscular Dystrophy animal models and even in vitro human muscle cells (Aartsma-Rus, Annemieke, et al, Human Molecular Genetics 2003, McClorey, G., et al, Neuromuscular disorders, 2006). However, the consideration that this process may be occurring endogenously in human cells and contributing to other complex diseases has remained largely ignored. In this work, we have undertaken the first large-scale statistical examination of alternatively spliced variants between the tissues of diseased and normal patients. We hypothesize that there are endogenous alternative splicing events occurring in these tissues that purposefully mediate mutative damage and contribute to the differentiation between diseased and healthy phenotypes. By integrating data from several different sources and employing statistic and machine learning models, we have identified significant differences in gene characteristics between canonical and spliced variants correlated with changes in clinical outcomes. We conclude that this evidence supports our hypothesis that alternative splicing can be positively driven to mediate genetic damage. Expression of these genetically damaged and canonically spliced variants is clearly implicated in diseased tissue and poor clinical outcomes. Alternative Splicing Cancer Machine Learning mRNAseq Mutation Statistics
853	Robustness of Neural Networks for Discrete Input: An Adversarial Perspective Ebrahimi, Javid 30 April 2019 (has links) In the past few years, evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Literature on adversarial examples for neural nets has largely focused on image data, which are represented as points in continuous space. However, a vast proportion of machine learning models operate on discrete input, and thus demand a similar rigor in understanding their vulnerabilities and robustness. We study robustness of neural network architectures for textual and graph inputs, through the lens of adversarial input perturbations. We will cover methods for both attacks and defense; we will focus on 1) addressing challenges in optimization for creating adversarial perturbations for discrete data; 2) evaluating and contrasting white-box and black-box adversarial examples; and 3) proposing efficient methods to make the models robust against adversarial attacks. Adversarial machine learning Graph neural networks Machine translation
854	Automatic Chinese calligraphic font generation with machine learning technology Wang, Wei January 2018 (has links) University of Macau / Faculty of Science and Technology. / Department of Computer and Information Science Machine learning Calligraphy, Chinese -- Data processing Image processing
855	Exploring Node Attributes for Data Mining in Attributed Graphs Jihwan Lee (6639122) 10 June 2019 (has links) Graphs have attracted researchers in various fields in that many different kinds of real-world entities and relationships between them can be represented and analyzed effectively and efficiently using graphs. In particular, researchers in data mining and machine learning areas have developed algorithms and models to understand the complex graph data better and perform various data mining tasks. While a large body of work exists on graph mining, most existing work does not fully exploit attributes attached to graph nodes or edges.<div><br></div><div>In this dissertation, we exploit node attributes to generate better solutions to several graph data mining problems addressed in the literature. First, we introduce the notion of statistically significant attribute associations in attribute graphs and propose an effective and efficient algorithm to discover those associations. The effectiveness analysis on the results shows that our proposed algorithm can reveal insightful attribute associations that cannot be identified using the earlier methods focused solely on frequency. Second, we build a probabilistic generative model for observed attributed graphs. Under the assumption that there exist hidden communities behind nodes in a graph, we adopt the idea of latent topic distributions to model a generative process of node attribute values and link structure more precisely. This model can be used to detect hidden communities and profile missing attribute values. Lastly, we investigate how to employ node attributes to learn latent representations of nodes in lower dimensional embedding spaces and use the learned representations to improve the performance of data mining tasks over attributed graphs.<br></div> Pattern Recognition and Data Mining Attributed Graphs Data Mining Machine Learning
856	Mathematical Methods for Maritime Signal Curation in Noisy Environments Jimenez Blazquez, Lara January 2019 (has links) QTAGG has designed a real-time autonomous system that continuously calculates an optimum propulsion plan controlling the engines and propellers of a vessel. In this way, the precision of the signals that are used is very important, as any little error in the signal can produce incorrect control effects and cause critical damages to the equipment or passengers. This thesis describes the mathematics and implementation of a system to detect and correct disturbances in the data signals of a vessel. The system applies a signal curation based on mathematical modelling and statistics leading to clean data to use in QTAGG’s control system. Signal curation Maritime field Machine learning Engineering and Technology Teknik och teknologier
857	Avaliação automática da qualidade de escrita de resumos científicos em inglês / Automatic evaluation of the quality of English abstracts Genoves Junior, Luiz Carlos 01 June 2007 (has links) Problemas com a escrita podem afetar o desempenho de profissionais de maneira marcante, principalmente no caso de cientistas e acadêmicos que precisam escrever com proficiência e desembaraço não somente na língua materna, mas principalmente em inglês. Durante os últimos anos, ferramentas de suporte à escrita, algumas com enfoque em textos científicos, como o AMADEUS e o SciPo foram desenvolvidas e têm auxiliado pesquisadores na divulgação de suas pesquisas. Entretanto, a criação dessas ferramentas é baseada em córpus, sendo muito custosa, pois implica em selecionar textos bem escritos, além de segmentá-los de acordo com sua estrutura esquemática. Nesse mestrado estudamos, avaliamos e implementamos métodos de detecção automática da estrutura esquemática e de avaliação automática da qualidade de escrita de resumos científicos em inglês. Investigamos o uso de tais métodos para possibilitar o desenvolvimento de dois tipos de ferramentas: de detecção de bons resumos e de crítica. Nossa abordagem é baseada em córpus e em aprendizado de máquina supervisionado. Desenvolvemos um detector automático da estrutura esquemática, que chamamos de AZEA, com taxa de acerto de 80,4% eKappa de 0,73, superiores ao estado da arte (acerto de 73%, Kappa de 0,65). Experimentamos várias combinações de algoritmos, atributos e diferentes seções de um artigo científicos. Utilizamos o AZEA na implementação de duas dimensões de uma rubrica para o gênero científico, composta de 7 dimensões, e construímos e disponibilizamos uma ferramenta de crítica da estrutura de um resumo. Um detector de erros de uso de artigo também foi desenvolvido, com precisão é de 83,7% (Kappa de 0,63) para a tarefa de decidir entre omitir ou não um artigo, com enfoque no feedback ao usuário e como parte da implementação da dimensão de erros gramaticais da rubrica. Na tarefa de detectar bons resumos, utilizamos métodos usados com sucesso na avaliação automática da qualidade de escrita de redações com as implementações da rubrica e realizamos experimentos iniciais, ainda com resultados fracos, próximos à baseline. Embora não tenhamos construído um bom avaliador automático da qualidade de escrita, acreditamos que este trabalho indica direções para atingir esta meta, e forneça algumas das ferramentas necessárias / Poor writing may have serious implications for a professional\'s career. This is even more serious in the case of scientists and academics whose job requires fluency and proficiency in their mother tongue as well as in English. This is why a number of writing tools have been developed in order to assist researchers to promote their work. Here, we are particularly interested in tools, such as AMADEUS and SciPo, which focus on scientific writing. AMADEUS and SciPo are corpus-based tools and hence they rely on corpus compilation which is by no means an easy task. In addition to the dificult task of selecting well-written texts, it also requires segmenting these texts according to their schematic structure. The present dissertation aims to investigate, evaluate and implement some methods to automatically detect the schematic structure of English abstracts and to automatically evaluate their quality. These methods have been examined with a view to enabling the development of two types of tools, namely: detection of well-written abstracts and a critique tool. For automatically detecting schematic structures, we have developed a tool, named AZEA, which adopts a corpus-based, supervised machine learning approach. AZEA reaches 80.4% accuracy and Kappa of 0.73, which is above the highest rates reported in the literature so far (73% accuracy and Kappa of 0.65). We have tested a number of different combinations of algorithms, features and different paper sections. AZEA has been used to implement two out of seven dimensions of a rubric for analyzing scientific papers. A critique tool for evaluating the structure of abstracts has also been developed and made available. In addition, our work also includes the development of a classifier for identifying errors related to English article usage. This classifier reaches 83.7% accuracy (Kappa de 0.63) in the task of deciding whether or not a given English noun phrase requires an article. If implemented in the dimension of grammatical errors of the above mentioned rubric, it can be used to give users feedback on their errors. As regards the task of detecting well-written abstracts, we have resorted to methods which have been successfully adopted to evaluate quality of essays and some preliminary tests have been carried out. However, our results are not yet satisfactory since they are not much above the baseline. Despite this drawback, we believe this study proves relevant since in addition to offering some of the necessary tools, it provides some fundamental guidelines towards the automatic evaluation of the quality of texts Aprendizado de máquina Computacional linguistics Lingüística computacional Machine learning NLP PLN
858	Machine Learning Algorithms for the Analysis of Social Media and Detection of Malicious User Generated Content Unknown Date (has links) One of the de ning characteristics of the modern Internet is its massive connectedness, with information and human connection simply a few clicks away. Social media and online retailers have revolutionized how we communicate and purchase goods or services. User generated content on the web, through social media, plays a large role in modern society; Twitter has been in the forefront of political discourse, with politicians choosing it as their platform for disseminating information, while websites like Amazon and Yelp allow users to share their opinions on products via online reviews. The information available through these platforms can provide insight into a host of relevant topics through the process of machine learning. Speci - cally, this process involves text mining for sentiment analysis, which is an application domain of machine learning involving the extraction of emotion from text. Unfortunately, there are still those with malicious intent and with the changes to how we communicate and conduct business, comes changes to their malicious practices. Social bots and fake reviews plague the web, providing incorrect information and swaying the opinion of unaware readers. The detection of these false users or posts from reading the text is di cult, if not impossible, for humans. Fortunately, text mining provides us with methods for the detection of harmful user generated content. This dissertation expands the current research in sentiment analysis, fake online review detection and election prediction. We examine cross-domain sentiment analysis using tweets and reviews. Novel techniques combining ensemble and feature selection methods are proposed for the domain of online spam review detection. We investigate the ability for the Twitter platform to predict the United States 2016 presidential election. In addition, we determine how social bots in uence this prediction. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection Machine learning. Text mining. User-generated content. Social media.
859	Enhancement of Deep Neural Networks and Their Application to Text Mining Unknown Date (has links) Many current application domains of machine learning and arti cial intelligence involve knowledge discovery from text, such as sentiment analysis, document ontology, and spam detection. Humans have years of experience and training with language, enabling them to understand complicated, nuanced text passages with relative ease. A text classi er attempts to emulate or replicate this knowledge so that computers can discriminate between concepts encountered in text; however, learning high-level concepts from text, such as those found in many applications of text classi- cation, is a challenging task due to the many challenges associated with text mining and classi cation. Recently, classi ers trained using arti cial neural networks have been shown to be e ective for a variety of text mining tasks. Convolutional neural networks have been trained to classify text from character-level input, automatically learn high-level abstract representations and avoiding the need for human engineered features. This dissertation proposes two new techniques for character-level learning, log(m) character embedding and convolutional window classi cation. Log(m) embedding is a new character-vector representation for text data that is more compact and memory e cient than previous embedding vectors. Convolutional window classi cation is a technique for classifying long documents, i.e. documents with lengths exceeding the input dimension of the neural network. Additionally, we investigate the performance of convolutional neural networks combined with long short-term memory networks, explore how document length impacts classi cation performance and compare performance of neural networks against non-neural network-based learners in text classi cation tasks. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection Text Mining Neural networks (Computer science) Machine learning
860	Parallel Distributed Deep Learning on Cluster Computers Unknown Date (has links) Deep Learning is an increasingly important subdomain of arti cial intelligence. Deep Learning architectures, arti cial neural networks characterized by having both a large breadth of neurons and a large depth of layers, bene ts from training on Big Data. The size and complexity of the model combined with the size of the training data makes the training procedure very computationally and temporally expensive. Accelerating the training procedure of Deep Learning using cluster computers faces many challenges ranging from distributed optimizers to the large communication overhead speci c to a system with o the shelf networking components. In this thesis, we present a novel synchronous data parallel distributed Deep Learning implementation on HPCC Systems, a cluster computer system. We discuss research that has been conducted on the distribution and parallelization of Deep Learning, as well as the concerns relating to cluster environments. Additionally, we provide case studies that evaluate and validate our implementation. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection Deep learning. Neural networks (Computer science). Artificial intelligence. Machine learning.

Search results