1 |
PREDICTING MELANOMA RISK FROM ELECTRONIC HEALTH RECORDS WITH MACHINE LEARNING TECHNIQUES / Unknown Date
Melanoma is one of the fastest growing cancers in the world, and can affect patients earlier in life than most other cancers. Therefore, it is imperative to be able to identify patients at high risk for melanoma and enroll them in screening programs to detect the cancer early. Electronic health records collect an enormous amount of data about real-world patient encounters, treatments, and outcomes. This data can be mined to increase our understanding of melanoma as well as build personalized models to predict risk of developing the cancer. Cancer risk models built from structured clinical data are limited in current research, with most studies involving just a few variables from institutional databases or registries. This dissertation presents data processing and machine learning approaches to build melanoma risk models from a large database of de-identified electronic health records. The database contains consistently captured structured data, enabling the extraction of hundreds of thousands of data points each from millions of patient records. Several experiments are performed to build effective models, particularly to predict sentinel lymph node metastasis in known melanoma patients and to predict individual risk of developing melanoma. Data for these models suffer from high dimensionality and class imbalance. Thus, classifiers such as logistic regression, support vector machines, random forest, and XGBoost are combined with advanced modeling techniques such as feature selection and data sampling. Risk factors are evaluated using regression model weights and decision trees, while personalized predictions are provided through random forest decomposition and Shapley additive explanations. Random undersampling on the melanoma risk dataset shows that many majority samples can be removed without a decrease in model performance. 
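As a rough illustration of the random undersampling mentioned above, the sketch below keeps every minority-class record and a random subset of majority-class records. The function name `random_undersample`, the `ratio` parameter, and the toy labels are assumptions made for illustration; they are not taken from the dissertation.

```python
import random

def random_undersample(rows, labels, ratio=1.0, minority_label=1, seed=0):
    """Keep all minority rows and a random subset of majority rows.

    `ratio` is the desired number of majority rows per minority row
    (a hypothetical parameter, not one named in the dissertation).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Never ask for more majority rows than exist.
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data: 5 positive cases among 100 records.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
sub_rows, sub_labels = random_undersample(rows, labels, ratio=2.0)
```

With `ratio=2.0` the toy call keeps all 5 positives and 10 randomly chosen negatives, so a model can be retrained on the smaller set and compared against the full one.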
To determine how much data is truly needed, we explore learning curve approximation methods on the melanoma data and three publicly available large-scale biomedical datasets. We apply an inverse power law model and introduce a novel semi-supervised curve creation method that uses a small amount of labeled data. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
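The abstract does not give the exact parameterization of its inverse power law. A commonly used three-parameter form, shown here as an assumption rather than as the dissertation's own equation, models performance as a function of training set size $n$:

```latex
\hat{p}(n) = a - b \, n^{-c}, \qquad b > 0, \; c > 0
```

Here $a$ is the performance the curve approaches as $n$ grows, $b$ sets the initial gap, and $c$ controls how quickly the gap closes. All three symbols are illustrative names.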
|
2 |
Aplicação de técnicas de visão computacional e aprendizado de máquina para a detecção de exsudatos duros em imagens de fundo de olho / Application of techniques of computer vision and machine learning for detection of hard exudates in images of eye fundus / Carvalho, Tiago José de, 1985- / 16 August 2018
Advisors: Siome Klein Goldenstein, Jacques Wainer / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-16T14:41:21Z (GMT). No. of bitstreams: 1
Carvalho_TiagoJosede_M.pdf: 8401323 bytes, checksum: f84374dac5bebf5ea465a7a74ea9b5e4 (MD5)
Previous issue date: 2010 / Abstract: The development of computational methods that help specialists in various fields carry out their tasks is the focus of many studies. In health care, early diagnosis of disease is very important for improving patients' quality of life. For ophthalmologists who treat patients with diabetes, a reliable method for detecting anomalies in eye fundus images is important for early diagnosis, which helps avoid retinal complications. Such complications can even cause blindness. Hard exudates are among the most common anomalies found in the retina, and their detection is the focus of many kinds of approaches in the literature. This master's thesis presents a new and efficient approach for detecting hard exudates in eye fundus images. The approach uses computer vision and artificial intelligence techniques, such as local descriptors, visual dictionaries, clustering, and pattern classification, to detect exudates in the images.
/ Master's / Computer Vision / Master of Computer Science
|
3 |
META-LEARNING AND ENSEMBLE METHODS FOR DEEP NEURAL NETWORKS / Unknown Date
Deep neural networks have been widely applied in many different applications and achieve significant improvements over classical machine learning techniques. However, training a neural network usually requires a large amount of data, which is not guaranteed in some applications such as medical image classification. To address this issue, researchers have proposed meta-learning and ensemble learning techniques to make deep learning trainers more powerful. This thesis focuses on using deep learning equipped with meta-learning and ensemble learning to study specific problems.
We first propose a new deep-learning-based method for suggestion mining. The major challenges of suggestion mining include the cross-domain issue and the issues caused by unstructured and highly imbalanced data. To overcome these challenges, we propose to apply Random Multi-model Deep Learning (RMDL), which combines three different deep learning architectures (DNNs, RNNs and CNNs) and automatically selects the optimal hyperparameters to improve the robustness and flexibility of the model. Our experimental results on the SemEval-2019 competition Task 9 data sets demonstrate that our proposed RMDL outperforms most existing suggestion mining methods. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
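The abstract does not spell out how the individual models are combined. The sketch below assumes plain majority voting over the labels predicted by independently trained models, which is one common ensembling rule; the function name `majority_vote` and the toy predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.

    Ties are resolved in favour of the label seen first among the models.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Toy predictions from three hypothetical models on four sentences
# (1 = suggestion, 0 = non-suggestion).
dnn = [1, 0, 0, 1]
rnn = [1, 1, 0, 0]
cnn = [0, 1, 0, 1]
ensemble = majority_vote([dnn, rnn, cnn])
```

Each model can be wrong on a different sentence while the vote still recovers a consistent label, which is the robustness argument usually made for this kind of ensemble.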
|
4 |
SUSTAINING CHAOS USING DEEP REINFORCEMENT LEARNING / Unknown Date
Numerous examples arise in fields ranging from mechanics to biology where the disappearance of chaos can be detrimental. Preventing chaos from being merely transient has proven to be quite challenging. This work shows the utility of Reinforcement Learning (RL), a specific class of machine learning techniques, in discovering effective control mechanisms for this purpose. The autonomous control algorithm is able to prevent the disappearance of chaos in the Lorenz system exhibiting meta-stable chaos, without requiring any a priori knowledge about the underlying dynamics. The autonomous decisions taken by the RL algorithm are analyzed to understand how the system's dynamics are impacted. Learning from this analysis, a simple control law capable of restoring chaotic behavior is formulated. The reverse-engineering approach adopted in this work underlines the immense potential of the techniques used here to discover effective control strategies in complex dynamical systems. The autonomous nature of the learning algorithm makes it applicable to a diverse variety of non-linear systems, and highlights the potential of RL-enabled control for regulating other transient-chaos-like catastrophic events. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
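For readers unfamiliar with the Lorenz system, the sketch below integrates its standard equations with a fixed-step fourth-order Runge-Kutta scheme. The abstract gives no parameter values; `rho=20.0` is an assumed value in the range where the system is usually reported to show transient rather than sustained chaos, and all function names are illustrative. No RL controller is included.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt, **params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state, **params)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2), **params)
    k3 = lorenz_rhs(shifted(state, k2, dt / 2), **params)
    k4 = lorenz_rhs(shifted(state, k3, dt), **params)
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(state, steps, dt=0.01, **params):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory

# Assumed parameter choice for illustration only.
trajectory = simulate((1.0, 1.0, 1.0), steps=1000, rho=20.0)
```

A controller of the kind described in the abstract would sit in the loop inside `simulate`, observing the state and applying small perturbations before each step.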
|
5 |
MACHINE LEARNING DEMODULATOR ARCHITECTURES FOR POWER-LIMITED COMMUNICATIONS / Unknown Date
The success of deep learning has renewed interest in applying neural networks and other machine learning techniques to most fields of data and signal processing, including communications. Advances in architecture and training lead us to consider new modem architectures that allow flexibility in design, continued learning in the field, and improved waveform coding. This dissertation examines neural network architectures and training methods suitable for demodulation in power-limited communication systems, such as those found in wireless sensor networks. Such networks will provide greater connection to the world around us and are expected to contain orders of magnitude more devices than cellular networks. A number of standard and proprietary protocols span this space, with modulations such as frequency-shift keying (FSK), Gaussian FSK (GFSK), minimum shift keying (MSK), on-off keying (OOK), and M-ary orthogonal modulation (M-orth). These modulations enable low-cost radio hardware with efficient nonlinear amplification in the transmitter and noncoherent demodulation in the receiver. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
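To make "noncoherent demodulation" concrete, the sketch below shows a classical (non-learned) binary FSK detector that compares correlation energies at the two tone frequencies without knowing the carrier phase. It is a textbook baseline, not the neural demodulator studied in the dissertation; the function names, the tone indices, and the noise-free channel are assumptions.

```python
import cmath
import math
import random

def fsk_modulate(bits, samples_per_symbol=16, tones=(1, 3), seed=0):
    """Map each bit to a complex tone with a random starting phase."""
    rng = random.Random(seed)
    signal = []
    for bit in bits:
        phase = rng.uniform(0.0, 2.0 * math.pi)  # unknown to the receiver
        k = tones[bit]  # tone frequency in cycles per symbol
        for n in range(samples_per_symbol):
            angle = 2.0 * math.pi * k * n / samples_per_symbol + phase
            signal.append(cmath.exp(1j * angle))
    return signal

def fsk_demodulate(signal, samples_per_symbol=16, tones=(1, 3)):
    """Noncoherent detection: choose the tone with the larger energy."""
    bits = []
    for start in range(0, len(signal), samples_per_symbol):
        symbol = signal[start:start + samples_per_symbol]
        energies = []
        for k in tones:
            corr = sum(
                s * cmath.exp(-2j * math.pi * k * n / samples_per_symbol)
                for n, s in enumerate(symbol)
            )
            # Squared magnitude discards the unknown phase.
            energies.append(abs(corr) ** 2)
        bits.append(0 if energies[0] >= energies[1] else 1)
    return bits

sent = [0, 1, 1, 0, 1]
received = fsk_demodulate(fsk_modulate(sent))
```

Because the two tones complete a whole number of cycles per symbol, they are orthogonal over the symbol window, so the mismatched correlation is close to zero regardless of phase.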
|
6 |
A New Framework and Novel Techniques to Multimodal Concept Representation and Fusion / Lin, Xudong / January 2024
To solve real-world problems, machines are required to perceive multiple modalities and fuse the information from them. This thesis studies learning to understand and fuse multimodal information. Existing approaches follow a three-stage learning paradigm. The first stage is to train models for each modality. This process for video understanding models is usually based on supervised training, which is not scalable. Moreover, these modality-specific models are updated rather frequently nowadays with improving single-modality perception abilities.
The second stage is crossmodal pretraining, which trains a model to align and fuse multiple modalities based on paired multimodal data, such as video-caption pairs. This process is resource-consuming and expensive. The third stage is to further fine-tune or prompt the resulting model from the second stage towards certain downstream tasks. The key bottleneck of conventional methods lies in the continuous feature representation used for non-textual modalities, which is usually costly to align and fuse with text.
In this thesis, we investigate the representation and the fusion based on textual concepts. We propose to map non-textual modalities to textual concepts and then fuse these textual concepts using text models. We systematically study various specific methods of mapping and different architectures for fusion. The proposed methods include an end-to-end video-based text generation model with differentiable tokenization for video and audio concepts, a contrastive-model-based architecture with zero-shot concept extractor, a deep concept injection algorithm enabling language models to solve multimodal tasks without any training, and a distant supervision framework learning concepts in a long temporal span.
With our concept representation, we empirically demonstrate that our models achieve competitive or even superior performance on downstream tasks such as video question answering, video captioning, text-video retrieval, and audio-video dialogue, without incurring the several-orders-of-magnitude higher cost of the crossmodal pretraining stage. We also examine the possible limitations of concept representations, such as when the text quality of a dataset is poor. We believe we show a potential path towards upgradable multimodal intelligence, whose components can be easily updated towards new models or new modalities of data.
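The idea of mapping a non-textual input to textual concepts can be sketched as a nearest-concept lookup in a shared embedding space, followed by handing the concept words to a text model. In the sketch below the embeddings are hand-made toy vectors standing in for the output of a contrastive model, and the names `top_concepts`, `concepts`, and `frame` are hypothetical; the thesis's actual extractors and prompts are not described in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_concepts(frame_embedding, concept_embeddings, k=2):
    """Return the k concept words most similar to the frame embedding."""
    ranked = sorted(
        concept_embeddings,
        key=lambda name: cosine(frame_embedding, concept_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy vocabulary of concept words with made-up embeddings.
concepts = {
    "dog": (1.0, 0.0, 0.0),
    "ball": (0.0, 1.0, 0.0),
    "car": (0.0, 0.0, 1.0),
}
frame = (0.8, 0.6, 0.1)  # stand-in for a video-frame embedding

# The selected words become plain text that a language model can consume.
prompt = (
    "Video concepts: " + ", ".join(top_concepts(frame, concepts))
    + ". Question: what is happening?"
)
```

Since the fused representation is ordinary text, either the concept extractor or the language model can be swapped for a newer one without retraining the other, which is the upgradability the abstract points to.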
|
7 |
Learning fuzzy logic from examples / Aranibar, Luis Alfonso Quiroga / January 1994
No description available.
|
8 |
Arcabouço genérico baseado em técnicas de agrupamento para sistemas de recomendação / Cluster-based generic framework for recommender systems / Panaggio, Ricardo Luís Zanetti / 10 January 2010
Advisor: Ricardo da Silva Torres / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-17T10:19:12Z (GMT). No. of bitstreams: 1
Panaggio_RicardoLuisZanetti_M.pdf: 1050987 bytes, checksum: f88ede3a681c880be4489f30662ec451 (MD5)
Previous issue date: 2010 / Abstract: The gap between the data available and the data of interest to a given user is enormous and, in general, grows daily as the volume of data produced increases. Identifying the full set of data of interest to a user with traditional mechanisms is very difficult, perhaps impossible. In this scenario, tools that help users identify items of interest, such as recommender systems, are of great value. This dissertation presents a generic model that can be used to create recommender systems, and its instantiation using clustering techniques. It also presents the validation of this model through its implementation and experiments with data from the Movielens and Jester datasets. The main contributions are: a graph-based generic model for recommender systems that is, as far as is known, richer and more generic than those found in the literature; the specification and implementation of a modular architecture for recommender systems based on that model, focused on data clustering techniques; and the validation of both the model and the architecture, comparing the effectiveness and efficiency of clustering techniques in recommender systems. / Master's / Information Retrieval Systems / Master of Computer Science
|
9 |
Machine learning for epigenetics : algorithms for next generation sequencing data / Mayo, Thomas Richard / January 2018
The advent of Next Generation Sequencing (NGS), a little over a decade ago, has led to a vast and rapid increase in the generation of genomic data. The drastically reduced cost has in turn enabled powerful modifications that can be used to investigate not just genetic, but epigenetic, phenomena. Epigenetics refers to the study of mechanisms affecting gene expression other than the genetic code itself and thus, at the transcription level, incorporates DNA methylation, transcription factor binding and histone modifications amongst others. This thesis outlines and tackles two major challenges in the computational analysis of such data using techniques from machine learning. Firstly, I address the problem of testing for differential methylation between groups of bisulfite sequencing data sets. DNA methylation plays an important role in genomic imprinting, X-chromosome inactivation and the repression of repetitive elements, as well as being implicated in numerous diseases, such as cancer. Bisulfite sequencing provides single nucleotide resolution methylation data at the whole genome scale, but a sensitive analysis of such data is difficult. I propose a solution that uses a powerful kernel-based machine learning technique, the Maximum Mean Discrepancy, to leverage well-characterised spatial correlations in DNA methylation, and adapt the method for this particular use. I use this tailored method to analyse a novel data set from a study of ageing in three different tissues in the mouse. This study motivates further modifications to the method and highlights the utility of the underlying measure as an exploratory tool for methylation analysis. Secondly, I address the problem of predictive and explanatory modelling of chromatin immunoprecipitation sequencing data (ChIP-Seq). ChIP-Seq is typically used to assay the binding of a protein of interest, such as a transcription factor or histone, to the DNA, and as such is one of the most widely used sequencing assays.
While peak callers are a powerful tool in identifying binding sites of sparse and clean ChIP-Seq profiles, broader signals defy analysis in this framework. Instead, generative models that explain the data in terms of the underlying sequence can help uncover mechanisms that predict binding or the lack thereof. I explore current problems with ChIP-Seq analysis, such as zero-inflation and the use of the control experiment, known as the input. I then devise a method for representing k-mers that enables the use of longer DNA sub-sequences within a flexible model development framework, such as generalised linear models, without heavy programming requirements. Finally, I use these insights to develop an appropriate Bayesian generative model that predicts ChIP-Seq count data in terms of the underlying DNA sequence, incorporating DNA methylation information where available, fitting the model with the Expectation-Maximization algorithm. The model is tested on simulated data and real data pertaining to the histone mark H3K27me3. This thesis therefore straddles the fields of bioinformatics and machine learning. Bioinformatics is both plagued and blessed by the plethora of different techniques available for gathering data and their continual innovations. Each technique presents a unique challenge, and hence out-of-the-box machine learning techniques have had little success in solving biological problems. While I have focused on NGS data, the methods developed in this thesis are likely to be applicable to future technologies, such as Third Generation Sequencing methods, and the lessons learned in their adaptation will be informative for the next wave of computational challenges.
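The Maximum Mean Discrepancy named in this abstract compares two samples through kernel averages. The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel on toy methylation-like values; the bandwidth, the data, and the function names are assumptions, and none of the thesis's adaptations for spatial correlation are included.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Toy methylation fractions at one site for two hypothetical groups.
group_a = [(0.10,), (0.20,), (0.15,)]
group_b = [(0.80,), (0.90,), (0.85,)]
same = mmd_squared(group_a, group_a)
different = mmd_squared(group_a, group_b)
```

The estimate is zero when a sample is compared with itself and grows as the two groups separate; in practice a permutation test over group labels is a common way to turn the value into a significance statement.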
|
10 |
Classificadores e aprendizado em processamento de imagens e visão computacional / Classifiers and machine learning techniques for image processing and computer vision / Rocha, Anderson de Rezende, 1980- / 03 March 2009
Advisor: Siome Klein Goldenstein / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto da Computação / Made available in DSpace on 2018-08-12T17:37:15Z (GMT). No. of bitstreams: 1
Rocha_AndersondeRezende_D.pdf: 10303487 bytes, checksum: 243dccfe5255c828ce7ead27c27eb1cd (MD5)
Previous issue date: 2009 / Abstract: In this doctoral work, we propose the use of classifiers and machine learning techniques to extract relevant information from data sets (e.g., images) to solve problems in Image Processing and Computer Vision. We are particularly interested in: categorization of images into two or more classes, hidden message detection, discrimination between digitally tampered and natural images, authentication, and multi-classification, among others. To start with, we present a comparative and critical survey of the state of the art in digital image forensics and in the detection of hidden messages in images. Our objective is to show the potential of the existing techniques and, more importantly, to point out their limitations. With this study, we show that most problems in this area point to two common issues: feature selection and the learning techniques to be used. We also discuss legal and ethical aspects of image forensics, such as the use of digital photographs by criminals. Next, we introduce a technique for image forensics, tested in the context of hidden message detection and of general image classification into categories such as indoors, outdoors, computer generated, and art works. Studying this multi-class classification problem raises some questions: how can we solve a multi-class problem so as to combine, for instance, image classification features based on color, texture, shape, and silhouette without worrying too much about how to normalize the combined feature vector? How can we use several different classifiers, each one specialized and best configured for a set of features or of classes in confusion? To address these questions, we present a technique for fusing classifiers and features in the multi-class scenario through the combination of binary classifiers. We validate our approach in a real application for automatic classification of fruits and vegetables. Finally, we address one more interesting problem: how can powerful binary classifiers be used more efficiently and effectively in the multi-class context? To this end, we introduce a technique for combining binary classifiers (called base classifiers) to solve problems in the general multi-classification setting. / Doctorate / Computer Engineering / Doctor of Computer Science
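One standard way to build a multi-class decision from binary classifiers is one-vs-one voting, sketched below. It is offered only as a generic illustration of "combining binary classifiers"; the thesis's own fusion technique is not detailed in the abstract, and the nearest-centroid base classifiers, the produce labels, and all names here are hypothetical.

```python
from collections import Counter
from itertools import combinations

def make_pair_classifier(centroid_a, centroid_b, label_a, label_b):
    """Binary base classifier: pick whichever class centroid is closer."""
    def classify(x):
        dist_a = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_a))
        dist_b = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_b))
        return label_a if dist_a <= dist_b else label_b
    return classify

def one_vs_one_predict(x, pair_classifiers):
    """Each binary classifier casts one vote; the most-voted class wins."""
    votes = Counter(clf(x) for clf in pair_classifiers)
    return votes.most_common(1)[0][0]

# Toy two-dimensional feature centroids for three produce classes.
centroids = {
    "apple": (1.0, 0.0),
    "banana": (0.0, 1.0),
    "carrot": (-1.0, 0.0),
}
pair_classifiers = [
    make_pair_classifier(centroids[a], centroids[b], a, b)
    for a, b in combinations(centroids, 2)
]
prediction = one_vs_one_predict((0.1, 0.9), pair_classifiers)
```

Because each base classifier only separates two classes, it can in principle use whichever feature (color, texture, shape) works best for that pair, which is the motivation the abstract gives for fusing features and classifiers at the binary level.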
|