91

A Graph Theoretic Clustering Algorithm based on the Regularity Lemma and Strategies to Exploit Clustering for Prediction

Trivedi, Shubhendu 30 April 2012
The fact that clustering is perhaps the most used technique for exploratory data analysis only underlines its fundamental importance. The general problem statement that broadly describes clustering as the identification and classification of patterns into coherent groups also implicitly indicates its utility in other tasks, such as supervised learning. In the past decade and a half there have been two developments that have altered the landscape of research in clustering: one is the improvement in results brought by the increased use of graph-theoretic techniques such as spectral clustering, and the other is the study of clustering with respect to its relevance in semi-supervised learning, i.e., using unlabeled data to improve prediction accuracy. In this work an attempt is made to contribute to both of these aspects. Our contributions are thus twofold: first, we identify some general issues with the spectral clustering framework and, while working towards a solution, introduce a new algorithm, "Regularity Clustering", which attempts to harness the power of the Szemerédi Regularity Lemma, a remarkable result from extremal graph theory, for the task of clustering. Secondly, we investigate some practical and useful strategies for using the clustering of unlabeled data to boost prediction accuracy. For all of these contributions we evaluate our methods against existing ones and also apply these ideas in a number of settings.
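As a hedged illustration of the two themes above (graph-theoretic clustering and exploiting clusters for prediction), the sketch below runs off-the-shelf spectral clustering, not the thesis's Regularity Clustering, and appends cluster membership as an extra feature for a supervised learner; the dataset and parameter choices are arbitrary assumptions.

```python
# Illustrative sketch only: spectral clustering as the graph-theoretic baseline
# the abstract mentions, plus cluster membership as a feature for prediction.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.08, random_state=0)

# Spectral clustering: build a k-NN affinity graph, then cluster via its Laplacian.
clusters = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                              n_neighbors=10, random_state=0).fit_predict(X)

# Exploit the (unsupervised) clustering for prediction: append the cluster id
# as a feature before training a classifier on the labeled portion.
X_aug = np.hstack([X, clusters.reshape(-1, 1)])
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy with cluster feature:", clf.score(X_te, y_te))
```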
92

The differential geometric structure in supervised learning of classifiers

Bai, Qinxun 12 May 2017
In this thesis, we study the overfitting problem in supervised learning of classifiers from a geometric perspective. As with many inverse problems, learning a classification function from a given set of example-label pairs is an ill-posed problem, i.e., there exist infinitely many classification functions that can correctly predict the class labels for all training examples. Among them, according to Occam's razor, simpler functions are favored since they are less overfitted to the training examples and are therefore expected to perform better on unseen examples. The standard technique to enforce Occam's razor is to introduce a regularization scheme, which penalizes some type of complexity of the learned classification function. Widely used regularization techniques include functional norm-based (Tikhonov) methods, ensemble-based methods, and early stopping. However, there is important geometric information in the learned classification function that is closely related to overfitting and has been overlooked by previous methods. In this thesis, we study the complexity of a classification function from a new geometric perspective. In particular, we investigate the differential geometric structure of the submanifold corresponding to the estimator of the class probability P(y|x), based on the observation that overfitting produces rapid local oscillations and hence large mean curvature of this submanifold. We also show that our geometric perspective on supervised learning is naturally related to an elastic model in physics, where our complexity measure is a high-dimensional extension of surface energy. This study leads to a new geometric regularization approach for supervised learning of classifiers. In our approach, the learning process can be viewed as a submanifold fitting problem that is solved by a mean curvature flow method. In particular, our approach finds the submanifold by iteratively fitting the training examples in a curvature- or volume-decreasing manner. Our technique is unified for both binary and multiclass classification, and can be applied to regularize any classification function that satisfies two requirements: firstly, an estimator of the class probability can be obtained; secondly, first and second derivatives of the class probability estimator can be calculated. In applications, where we apply our regularization technique to standard loss functions for classification, our RBF-based implementation compares favorably with widely used regularization methods for both binary and multiclass classification. We also design a specific algorithm to incorporate our regularization technique into the standard forward-backward training of deep neural networks. For theoretical analysis, we establish Bayes consistency for a specific loss function under some mild initialization assumptions. We also discuss the extension of our approach to situations where the input space is a submanifold, rather than a Euclidean space.
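The observation above, that overfitting shows up as rapid local oscillation of the estimated P(y|x), admits a crude illustration. The sketch below is an assumption-laden toy (synthetic two-moons data, a k-NN probability estimator, a discrete Laplacian in place of the thesis's mean curvature): it only shows that an overfitted estimator accumulates more "surface energy" than a smoother one.

```python
# Crude finite-difference proxy for the curvature-style complexity measure;
# not the thesis's mean-curvature-flow method.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

def oscillation_energy(clf, h=0.05):
    """Mean squared discrete Laplacian of P(y=1|x) over a grid -- a rough
    stand-in for the surface-energy / curvature complexity measure."""
    xs = np.arange(X[:, 0].min(), X[:, 0].max(), h)
    ys = np.arange(X[:, 1].min(), X[:, 1].max(), h)
    gx, gy = np.meshgrid(xs, ys)
    P = clf.predict_proba(np.c_[gx.ravel(), gy.ravel()])[:, 1].reshape(gx.shape)
    lap = (np.roll(P, 1, 0) + np.roll(P, -1, 0) +
           np.roll(P, 1, 1) + np.roll(P, -1, 1) - 4 * P)[1:-1, 1:-1] / h**2
    return float((lap ** 2).mean())

for k in (1, 15):  # k=1 overfits; its probability surface oscillates more
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k:2d}  energy={oscillation_energy(clf):.1f}")
```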
93

Social-Training: Semi-Supervised Learning Using Social Choice Functions

Alves, Matheus January 2017
Given the huge quantity of data currently being generated, only a small portion of it can be manually labeled by human experts. This is a common challenge for machine learning applications. Semi-supervised learning addresses this problem by handling unlabeled data alongside labeled data. However, if only a limited quantity of labeled examples is available, the performance of the machine learning task (e.g., classification) can be very unsatisfactory. Many solutions address this issue by using a classifier ensemble, since this increases classifier diversity. Algorithms such as co-training and tri-training use multiple views of the data or multiple learning algorithms to improve the classification of unlabeled instances through simple majority agreement. Other approaches extend this idea and adopt less trivial voting processes to define the labels, such as weighted majority voting. Nevertheless, these solutions require a certain confidence level on a label before it can be used for training. Hence, not all available information is used: information associated with low confidence levels is disregarded completely. This work proposes an approach called social-training, which uses all the information available in the semi-supervised learning task. To this end, multiple heterogeneous classifiers are trained on the labeled data and generate diverse classifications for the same unlabeled instances. Social-training then aggregates these results into a single label by means of social choice functions that perform rank aggregation over the instances. The solution addresses binary classification. The results show that working with the full ranking, i.e., labeling all unlabeled instances, is able to reduce the classification error on some of the UCI data sets used.
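A minimal sketch of the aggregation step follows; Borda count is used here as one example of a social choice function over the rankings that heterogeneous classifiers induce on the unlabeled pool. The base learners and the thresholding of the full ranking are illustrative assumptions, not the thesis's exact protocol.

```python
# Illustrative rank aggregation over classifier scores (Borda count).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_lab, y_lab, X_unlab = X[:50], y[:50], X[50:]   # small labeled pool

models = [LogisticRegression(max_iter=1000), GaussianNB(),
          DecisionTreeClassifier(max_depth=3)]
scores = np.column_stack([m.fit(X_lab, y_lab).predict_proba(X_unlab)[:, 1]
                          for m in models])

# Borda count: each classifier ranks all unlabeled instances by P(y=1|x);
# an instance's Borda score is the sum of its rank positions across voters.
ranks = scores.argsort(axis=0).argsort(axis=0)   # per-classifier ranks
borda = ranks.sum(axis=1)

# Label the full ranking (binary case): upper half positive, lower half negative.
pseudo_labels = (borda >= np.median(borda)).astype(int)
print(pseudo_labels[:10])
```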
94

Classify-Normalize-Classify: a novel data-driven framework for classifying forest pixels in remote sensing images

Souza, César Salgado Vieira de January 2017
Monitoring natural environments and their changes over time requires the analysis of a large amount of image data, often collected by orbital remote sensing platforms. However, variations in the observed signals due to changing atmospheric conditions often result in a data distribution shift across dates and locations, making it difficult to discriminate between classes in a dataset built from several images. This work introduces a novel supervised classification framework, called Classify-Normalize-Classify (CNC), to alleviate this data shift. The proposed scheme uses two classifiers. The first is trained on non-normalized top-of-atmosphere reflectance samples to discriminate between pixels belonging to a class of interest (COI) and pixels from other categories (e.g., forest vs. non-forest). At test time, the COI's multivariate median signal, estimated from the first classifier's segmentation, is subtracted from the image, anchoring the data distributions of different images to the same reference. A second classifier, pre-trained to minimize the classification error on COI median-centered samples, is then applied to the median-normalized test image to produce the final binary segmentation. The proposed methodology was tested on the detection of deforestation in bitemporal Landsat 8 OLI images over the Amazon rainforest. Experiments using top-of-atmosphere multispectral reflectance images showed that the CNC framework mapped deforestation more accurately than a single classifier run on the surface reflectance images provided by the United States Geological Survey (USGS). Accuracies of the proposed framework also compared favorably with the benchmark deforestation masks of the PRODES program.
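The CNC pipeline lends itself to a short schematic sketch. The toy below substitutes synthetic "pixels" for Landsat 8 bands and random forests for the thesis's classifiers; only the classify / median-normalize / classify flow is the point.

```python
# Schematic Classify-Normalize-Classify flow on synthetic multiband "pixels".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_bands = 6

def make_image(shift):
    """Toy image: COI (e.g. forest) vs other pixels, plus a per-image shift
    standing in for changing atmospheric conditions."""
    coi = rng.normal(0.3, 0.05, (500, n_bands)) + shift
    other = rng.normal(0.6, 0.08, (500, n_bands)) + shift
    return np.vstack([coi, other]), np.r_[np.ones(500), np.zeros(500)]

X_train, y_train = make_image(shift=0.0)
clf1 = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # raw samples
clf2 = RandomForestClassifier(random_state=0).fit(                   # median-centered
    X_train - np.median(X_train[y_train == 1], axis=0), y_train)

# Test image acquired under different conditions (distribution shift).
X_test, y_test = make_image(shift=0.15)
coi_mask = clf1.predict(X_test) == 1              # step 1: rough COI segmentation
median = np.median(X_test[coi_mask], axis=0)      # step 2: COI median signal
final = clf2.predict(X_test - median)             # step 3: classify normalized pixels
print("accuracy:", (final == y_test).mean())
```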
95

Techniques for the problem of imbalanced data in hierarchical classification

Barella, Victor Hugo 24 July 2015
Recent advances in science and technology have enabled data to grow in both quantity and availability. Along with this explosion of generated information comes the need to analyze data to discover new and useful knowledge. Fields that aim to extract knowledge and useful information from large datasets, such as Machine Learning (ML) and Data Mining (DM), have thus become major research opportunities. However, some limitations can hurt the accuracy of traditional algorithms in these fields, for example imbalance among the class samples of a dataset. To mitigate this drawback, several alternatives have been the target of research in recent years, such as techniques for artificially balancing data, modifications to algorithms, and new approaches for imbalanced data. An area little explored from the perspective of data imbalance is hierarchical classification, in which the classes are organized into hierarchies, usually in the form of a tree or a DAG (Directed Acyclic Graph). The goal of this work was to investigate the limitations imposed by imbalanced data on hierarchical classification problems and ways to minimize their effects. The experimental results show that the characteristics of the hierarchical classes must be taken into account when deciding whether to apply techniques for imbalanced data in hierarchical classification.
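As a hedged illustration of one family of techniques mentioned above, the sketch below implements plain random oversampling; applying it per parent node is one simple way such flat-classification remedies can be carried into a hierarchy. The data and the imbalance ratio are assumptions for demonstration only.

```python
# Minimal artificial balancing by random oversampling of minority classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_oversample(X, y, rng):
    """Duplicate minority-class rows until all classes match the majority size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=target, replace=True)
        for c in classes])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(330, 5))
y = np.r_[np.zeros(300), np.ones(30)]          # 10:1 imbalance
X_bal, y_bal = random_oversample(X, y, rng)
print(np.unique(y_bal, return_counts=True))    # balanced classes
clf = LogisticRegression().fit(X_bal, y_bal)
```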
96

Representatives labeling for network-based semi-supervised learning: characterization, highlighting, gain and philosophy

Araújo, Bilzã Marques de 29 April 2015
Semi-supervised learning (SSL) is the machine learning paradigm that considers both labeled and unlabeled data. Although often described as a middle ground between the unsupervised and supervised paradigms, it is usually applied to predictive or descriptive tasks. In the classification task, for example, the goal is to label the unlabeled data according to the labels of the labeled data. In this case, while the unlabeled data describe the data distributions and mediate label propagation, the labeled data seed the label propagation and guide it to stability. Data, however, are typically generated unlabeled, and labeling them requires domain specialists to do so by hand. Difficulties in visualizing huge amounts of data, as well as the cost of the specialists' involvement, are challenges that may constrain the labeling task. Therefore, automatically highlighting good candidates for manual labeling, henceforth called representative individuals, is a task of great value and may yield a good trade-off between the cost of the specialist and the learning performance. Among the SSL approaches in the literature, our study focuses on the network-based approach, in which datasets are represented relationally through a graph abstraction. The present work thus aims to explore the influence of labeled nodes on the performance of network-based SSL: the characterization of representative nodes, how the network structure can highlight them, the SSL performance gain provided by labeling them by hand, and related philosophical aspects. Concerning characterization, criteria for characterizing central nodes were studied on networks with well-defined modular structures. Counterintuitively, highly connected nodes (hubs) are not very representative. Reasonably connected nodes in sparsely connected neighborhoods, on the other hand, are; being strictly local, this criterion is scalable to large volumes of data. In networks with a homogeneous degree distribution, the Girvan-Newman (GN) model, nodes with high clustering coefficient also prove representative. In networks with a heterogeneous degree distribution, the Lancichinetti-Fortunato-Radicchi (LFR) model, nodes with high betweenness stand out instead. Nodes with high clustering coefficient in GN networks typically lie in quasi-clique motifs; nodes with high betweenness in LFR networks are hubs lying on community borders. In both cases, the highlighted nodes are excellent regularizers. Moreover, since different criteria stand out in networks with different characteristics, unified approaches to characterizing representative nodes were also studied. Crucial both for highlighting representative individuals and for good semi-supervised classification performance, the construction of networks from vector-based datasets was studied as well. The proposed method, called AdaRadius, offers advantages such as adaptability to datasets of varying density, low dependency on parameter settings, and reasonable computational cost on both pool-based and incremental data. The resulting networks are sparse yet connected, and allow semi-supervised classification to benefit from the prior labeling of representative individuals. Finally, the validation of network construction methods for SSL was studied, and a measure called graph-labels Katz coherence was proposed. In summary, the results discussed support the validity of selecting representative individuals to seed semi-supervised classification, corroborating the central hypothesis of this thesis. Analogies are found in several problems modeled on networks, such as epidemiology, rumor and information spreading, resilience, lethality, grandmother cells, and network growth and self-organization.
97

USING MACHINE LEARNING TO PREDICT ACUTE KIDNEY INJURIES AMONG PATIENTS TREATED WITH EMPIRIC ANTIBIOTICS

Rutter, Wilbur Cliff, IV 01 January 2018
Acute kidney injury (AKI) is a significant adverse effect of many medications and leads to increased morbidity, cost, and mortality among hospitalized patients. Recent literature supports a strong link between empiric combination antimicrobial therapy and increased AKI risk. As briefly summarized below, the following chapters describe my research in this area. Chapter 1 presents and summarizes the published literature connecting combination antimicrobial therapy with increased AKI incidence, and sets out the specific aims of my dissertation project. Chapter 2 describes a study of patients receiving vancomycin (VAN) in combination with piperacillin-tazobactam (TZP) or cefepime (CFP). I matched over 1,600 patients receiving the two combinations and found a significantly lower incidence of AKI among patients receiving CFP+VAN when controlling for confounders. The conclusion of this study is that VAN+TZP carries a significantly higher risk of AKI than CFP+VAN, confirming the results of previous literature. Chapter 3 presents a study of patients receiving VAN in combination with meropenem (MEM) or TZP. This study included over 10,000 patients and used inverse probability of treatment weighting to conserve data for this population. After controlling for confounders, VAN+TZP was associated with significantly more AKI than VAN+MEM, demonstrating that MEM is a clinically viable alternative to TZP in empiric antimicrobial therapy. Chapter 4 describes a study in which patients receiving TZP or ampicillin-sulbactam (SAM), with or without VAN, were analyzed for AKI incidence. The purpose of this study was to determine whether adding a beta-lactamase inhibitor to a beta-lactam increases the risk of AKI. This study included more than 2,400 patients receiving either agent and found no difference in AKI between patients receiving SAM or TZP; however, AKI was significantly more common in the TZP group when stratified by VAN exposure. This study shows that comparisons of TZP to other beta-lactams without beta-lactamase inhibitors are valid. Chapter 5 presents a study of almost 30,000 patients who received combination antimicrobial therapy over an 8-year period. It demonstrates AKI incidence similar to the previous literature and to the studies presented in the preceding chapters. Additionally, the results of the predictive models suggest that further work in this research area is needed. Together, these studies present a clear message: patients receiving VAN+TZP are at significantly greater risk of AKI than those receiving alternative regimens for empiric coverage of infection.
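The inverse probability of treatment weighting used in Chapter 3 admits a compact sketch. Everything below is synthetic (covariates, treatment assignment, outcome); it only shows the mechanics of weighting each patient by the inverse propensity of the treatment actually received.

```python
# Illustrative IPTW: propensity model, weights, weighted outcome comparison.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                        # baseline covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # confounded assignment

propensity = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
weights = np.where(treat == 1, 1 / propensity, 1 / (1 - propensity))

# Weighted comparison of a binary outcome (e.g., AKI) between arms.
aki = rng.binomial(1, 0.2 + 0.1 * treat)              # synthetic outcome
rate = lambda m: np.average(aki[m], weights=weights[m])
print("weighted AKI rate, treated vs control:", rate(treat == 1), rate(treat == 0))
```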
98

Prediction of Hierarchical Classification of Transposable Elements Using Machine Learning Techniques

Panta, Manisha 05 August 2019
Transposable Elements (TEs), or jumping genes, are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also increase the rate of mutation and can even promote gross genetic rearrangements. Proper classification of identified jumping genes is therefore important for understanding their genetic and evolutionary effects. While computational methods have been developed that perform either binary or multi-label classification of TEs, few studies have focused on their hierarchical classification, and the existing methods have limited accuracy. In this study, we examine the performance of a variety of machine learning (ML) methods and propose a robust augmented stacking-based ML method, ClassifyTE, for the hierarchical classification of TEs with high accuracy.
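A stacking ensemble in the spirit of ClassifyTE can be sketched with scikit-learn; the base learners, meta-learner, and stand-in features below are assumptions, and a hierarchical deployment would train one such model per node of the TE taxonomy.

```python
# Illustrative stacking ensemble (base learners + meta-learner).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for TE feature vectors (e.g., k-mer counts).
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(stack, X, y, cv=3).mean())
```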
99

A Decision Support Model for Personalized Cancer Treatment

Rico-Fontalvo, Florentino Antonio 30 October 2014
This work is motivated by the need to provide patients with a decision support system that facilitates the selection of the most appropriate treatment strategy in cancer care. Treatment options are currently subject to predetermined clinical pathways and medical expertise but generally do not consider individual patient characteristics or preferences. Although genomic patient data are available, this information is rarely used in the clinical setting for real-life patient care. In the area of personalized medicine, advances in the fundamental understanding of cancer biology and clinical oncology can promote the prevention, detection, and treatment of cancer. The objectives of this research are twofold: 1) to develop a patient-centered decision support model that can determine the most appropriate cancer treatment strategy based on subjective medical decision criteria and the patient's characteristics, considering the available treatment options and desired clinical outcomes; and 2) to develop a methodology to organize and analyze gene expression data and validate its accuracy as a predictive model of a patient's response to radiation therapy (tumor radiosensitivity). The complexity and dimensionality of the data generated by gene expression microarrays require advanced computational approaches. The microarray gene expression processing and prediction model is built in four steps: transformation of the response variable to emphasize the lower and upper extremes (related to radiosensitive and radioresistant cell lines); dimensionality reduction to select candidate gene expression probesets; model development using a Random Forest algorithm; and validation of the model in two clinical cohorts of colorectal and esophageal cancer patients. Subjective human decision-making plays a significant role in defining the treatment strategy, so the decision model developed in this research uses language and mechanisms suitable for human interpretation and understanding, through fuzzy sets and degrees of membership. The treatment selection strategy is modeled within a fuzzy logic framework to account for the subjectivity associated with the medical strategy and the patient's characteristics and preferences. The decision model considers criteria associated with survival rate, adverse events, and efficacy (measured by radiosensitivity) for treatment recommendation. Finally, a sensitivity analysis evaluates the impact of introducing radiosensitivity into the decision-making process. The intellectual merit of this research stems from the fact that it advances the science of decision-making by integrating concepts from artificial intelligence, medicine, biology, and biostatistics to develop a decision aid that handles conflicting objectives and has high practical value. The model focuses on criteria relevant to cancer treatment selection, but it can be modified and extended to scenarios beyond the healthcare environment.
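The four-step microarray pipeline described above maps naturally onto a short sketch. All data, the extreme-emphasizing transform, and the probeset cutoff below are synthetic placeholders, not the thesis's cohorts or exact procedure.

```python
# Schematic four-step gene-expression prediction pipeline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2000))                     # probeset expression matrix
radiosensitivity = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=120)

# Step 1: transform the response to emphasize the radiosensitive /
# radioresistant extremes (y * |y| stretches both tails).
y = radiosensitivity * np.abs(radiosensitivity)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 2: dimensionality reduction -- keep the most associated probesets.
selector = SelectKBest(f_regression, k=50).fit(X_tr, y_tr)

# Step 3: Random Forest on the selected probesets.
rf = RandomForestRegressor(random_state=0).fit(selector.transform(X_tr), y_tr)

# Step 4: validate on the held-out cohort.
print("R^2 on held-out samples:", rf.score(selector.transform(X_te), y_te))
```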
100

Application of supervised and unsupervised learning to analysis of the arterial pressure pulse

Walsh, Andrew Michael, Graduate School of Biomedical Engineering, UNSW January 2006
This thesis presents an investigation of statistical analytical methods applied to the shape of the arterial pressure waveform. The arterial pulse is analysed by a selection of both supervised and unsupervised learning methods. Supervised learning methods are generally better known as regression; unsupervised learning methods seek patterns in data without the specification of a target variable. The theoretical relationship between arterial pressure and wave shape is first investigated by studying a transmission line model of the arterial tree. A meta-database of pulse waveforms obtained by the SphygmoCor device is then analysed by the unsupervised learning technique of Self-Organising Maps (SOM). The map patterns indicate that the observed arterial pressures affect the wave shape in a similar way to that predicted by the theoretical model. A database of continuous arterial pressure obtained by catheter line during sleep is used to derive supervised models that enable estimation of arterial pressures based on the measured wave shapes. Independent component analysis (ICA) is also used in a supervised learning methodology to show the theoretical plausibility of separating the pressure signals from unwanted noise components. The accuracy and repeatability of the SphygmoCor device are measured and discussed. Alternative regression models are introduced that improve on existing models in the estimation of central cardiovascular parameters from peripheral arterial wave shapes. The results of this investigation show that, from the information in the wave shape, it is possible in theory to estimate the continuous underlying pressures within the artery to a degree of accuracy acceptable to the Association for the Advancement of Medical Instrumentation. This could facilitate a new role for non-invasive sphygmographic devices: to be used not only for feature estimation but as alternatives to invasive arterial pressure sensors in the measurement of continuous blood pressure.
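The unsupervised SOM step described above can be sketched with the minisom package (an assumption; the thesis does not name its implementation). Synthetic single-peak "pulses" stand in for the SphygmoCor waveform database; map size and training settings are arbitrary.

```python
# Illustrative Self-Organising Map over toy pulse waveforms (requires `minisom`).
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64)
# Toy pressure pulses: a systolic peak whose timing and width vary per record.
pulses = np.array([np.exp(-((t - rng.uniform(0.2, 0.4)) ** 2) /
                          (2 * rng.uniform(0.02, 0.06) ** 2))
                   for _ in range(200)])

som = MiniSom(8, 8, input_len=64, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(pulses, num_iteration=2000)

# Each waveform lands on a map cell; nearby cells hold similar wave shapes.
cells = [som.winner(p) for p in pulses[:5]]
print(cells)
```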
