Global ETD Search

131	The query based learning system for lifetime prediction of metallic components Ge, Esther January 2008 This research project was a step forward in developing an efficient data mining method for estimating the service life of metallic components in Queensland school buildings. The developed method links together the different data sources of service life information and builds a model for the realistic situation in which users have information on only a limited set of inputs. A practical lifetime prediction system was developed for the industry partners of this project, including the Queensland Department of Public Works and the Queensland Department of Main Roads. In practice, the system remains highly accurate even when not all inputs are available for a query.
132	Machine learning for automatic classification of remotely sensed data Milne, Linda, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 As more and more remotely sensed data becomes available, it is becoming increasingly hard to analyse with the traditional labour-intensive manual methods. The commonly used techniques, which involve expert evaluation, are widely acknowledged as providing inconsistent results at best. We need more general techniques that can adapt to a given situation and that incorporate the strengths of the traditional methods, human operators and new technologies. The difficulty in interpreting remotely sensed data is that often only a small amount of data is available for classification, and it can be noisy, incomplete or contain irrelevant information. Given that the training data may be limited, we demonstrate a variety of techniques for highlighting information in the available data and for selecting the most relevant information for a given classification task. We show that more consistent results between the training data and an entire image can be obtained, and how misclassification errors can be reduced. Specifically, a new technique for attribute selection in neural networks is demonstrated. Machine learning techniques, in particular, provide us with a means of automating classification using training data from a variety of data sources, including remotely sensed data and expert knowledge. A classification framework is presented in this thesis that can be used with any classifier and any available data. While this was developed in the context of vegetation mapping from remotely sensed data using machine learning classifiers, it is a general technique that can be applied to any domain. The framework is aimed chiefly at domains that have inadequate training data available.
contribution analysis ensemble classifiers multi-strategy classification attribute selection feature selection
133	Public Health Surveillance in High-Dimensions with Supervised Learning January 2010 (has links) abstract: Public health surveillance is a special case of the general problem where counts (or rates) of events are monitored for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information and often fail to use all the information available in modern data that can be paramount in unveiling these subtle changes. Additional complexities associated with modern health data that are often not accounted for by traditional methods include: covariates of mixed type, missing values, and high-order interactions among covariates. This work proposes a transform of public health surveillance to supervised learning, so that an appropriate learner can inherently address all the complexities described previously. At the same time, quantitative measures from the learner can be used to define signal criteria to detect changes in rates of events. A Feature Selection (FS) method is used to identify covariates that contribute to a model and to generate a signal. A measure of statistical significance is included to control false alarms. An alternative Percentile method identifies the specific cases that lead to changes using class probability estimates from tree-based ensembles. This second method is intended to be less computationally intensive and significantly simpler to implement. Finally, a third method labeled Rule-Based Feature Value Selection (RBFVS) is proposed for identifying the specific regions in high-dimensional space where the changes are occurring. Results on simulated examples are used to compare the FS method and the Percentile method. 
Note this work emphasizes the application of the proposed methods on public health surveillance. Nonetheless, these methods can easily be extended to a variety of applications where counts (or rates) of events are monitored for changes. Such problems commonly occur in domains such as manufacturing, economics, environmental systems, engineering, as well as in public health. / Dissertation/Thesis / Ph.D. Industrial Engineering 2010 Industrial Engineering Public Health Statistics Data Mining Feature Selection Feature Value Selection Public Health Surveillance
134	Um modelo neural de aprimoramento progressivo para redução de dimensionalidade / A Progressive Enhancement Neural Model for dimensionality reduction Camargo, Sandro da Silva January 2010 In recent decades, advances in data generation, collection and storage technologies have increased the size of databases across many areas of human knowledge. This growth concerns not only the number of samples but, above all, dimensionality, i.e. the number of features describing each sample. Adding features adds dimensions to the mathematical space, leading to exponential growth of the data hypervolume, a problem known as “the curse of dimensionality”. The curse of dimensionality has become a routine problem for scientists who, in order to understand and explain certain phenomena, need to find meaningful low-dimensional structures hidden in high-dimensional data. This process is called data dimensionality reduction (DDR).
From a computational viewpoint, the natural consequence of DDR is a smaller hypothesis search space, which improves performance and simplifies the results of knowledge modeling in autonomous learning systems. Among the techniques currently used in autonomous learning systems, artificial neural networks (ANNs) have become particularly attractive for modeling complex systems, especially when modeling is hard or when the system dynamics do not allow on-line control. Although ANNs are a powerful technique, their performance is affected by the curse of dimensionality. When the dimension of the input space is high, ANNs can spend a significant part of their resources representing irrelevant portions of the search space, which makes learning harder. Although ANNs, like other machine learning techniques, can identify the more informative features for a modeling process, DDR techniques often improve learning results. This thesis proposes a wrapper that implements a Progressive Enhancement Neural Model for DDR in supervised autonomous learning systems in order to optimize the modeling process. To validate the proposed approach, experiments were performed with private databases and public repositories from different knowledge domains. The generalization ability of the resulting models is evaluated by means of cross-validation techniques. The results demonstrate that the proposed approach can identify the more informative features, enabling DDR and making it possible to create simpler and more accurate models. The implementation and the experiments were carried out in the Matlab environment, using the ANN toolbox. Neural networks Artificial intelligence Heuristics Modal logic Wrapper Dimensionality reduction Feature selection Neural modeling
135	Abordagens de seleção de variáveis para classificação e regressão em química analítica / Feature selection approaches for classification and regression in analytical chemistry Soares, Felipe January 2017 The use of analytical techniques for product classification or the estimation of chemical properties has been of great interest in both industry and academia. Elemental concentration analysis and spectroscopy techniques provide a large amount of information about the samples being analyzed. However, the large number of available features (e.g. wavelengths or chemical elements) may harm the accuracy of the resulting models, which calls for feature selection techniques that identify the most relevant features and yield more robust models. This dissertation proposes feature selection methods for analytical chemistry, aimed at product classification and at predicting chemical properties via regression. The first proposed method selects non-equidistant wavelength intervals in spectroscopy for fuel classification, based on the distance between the average spectra of two distinct classes; the selected intervals are then used as input for classifiers. When applied to two spectroscopy datasets, the method reduced the number of features to just 23.19% and 4.95% of the original ones, while reducing the misclassification error from 13.90% to 11.63% and from 4.71% to 1.21%.
Next, a method is presented for selecting the most relevant elements for classifying wines from four South American countries, based on the parameters of linear discriminant analysis. The method achieved an average accuracy of 99.9% while retaining 6.82 chemical elements on average; the best average accuracy using all 45 available elements was 91.2%. Finally, the support vector regression – recursive feature elimination (SVR-RFE) algorithm is used to select the most important wavelengths for support vector regression. When applied to 12 datasets alongside other feature selection and regression methods, SVR and SVR-RFE achieved the best results in 8 of them, and SVR-RFE performed significantly better than the other feature selection algorithms. The proposed feature selection methods yielded more robust classifiers and regression models while reducing the number of features retained in the models. Wine: classification Analytical chemistry Spectroscopy Fuels: classification Feature selection Classification Regression Elemental analysis
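The interval selection idea in record 135 (choosing wavelength intervals from the distance between the average spectra of two classes) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function names, the absolute-difference distance, and the fixed `threshold` are all assumptions made for illustration, and the toy spectra are invented.

```python
from statistics import mean


def mean_spectrum(spectra):
    """Column-wise mean of a list of equal-length spectra."""
    return [mean(column) for column in zip(*spectra)]


def select_intervals(class_a, class_b, threshold):
    """Return (start, end) index intervals where the two class means
    differ by more than `threshold` (hypothetical selection rule)."""
    mean_a = mean_spectrum(class_a)
    mean_b = mean_spectrum(class_b)
    distance = [abs(a - b) for a, b in zip(mean_a, mean_b)]

    intervals = []
    start = None
    for i, d in enumerate(distance):
        if d > threshold and start is None:
            start = i                      # open a new interval
        elif d <= threshold and start is not None:
            intervals.append((start, i - 1))  # close the current interval
            start = None
    if start is not None:                  # interval runs to the last wavelength
        intervals.append((start, len(distance) - 1))
    return intervals


# Toy data: two spectra per class, six wavelengths each (invented values).
class_a = [[1.0, 1.0, 5.0, 5.0, 1.0, 9.0],
           [1.0, 1.0, 5.0, 5.0, 1.0, 9.0]]
class_b = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
           [1.0, 1.0, 2.0, 2.0, 1.0, 2.0]]

intervals = select_intervals(class_a, class_b, threshold=1.0)
print(intervals)
```

The resulting intervals need not be equally wide, which matches the "non-equidistant" wording of the abstract; how the thresholds or interval boundaries are actually chosen in the dissertation is not stated in the record.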
137	Simultaneous Variable and Feature Group Selection in Heterogeneous Learning: Optimization and Applications January 2014 (has links) abstract: Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. Then the proposed model is further extended to tackle the block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014 Computer science block-wise missing data feature selection hard-thresholding multi-source optimization
138	Comparison of Feature Selection Methods for Robust Dexterous Decoding of Finger Movements from the Primary Motor Cortex of a Non-human Primate Using Support Vector Machine January 2015 abstract: Robust and stable decoding of neural signals is imperative for implementing a useful neuroprosthesis capable of carrying out dexterous tasks. A nonhuman primate (NHP) was trained to perform combined flexions of the thumb, index and middle fingers in addition to individual flexions and extensions of the same digits. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon action potential firing rates. The effects of four feature selection techniques (Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis, and Mutual Information Maximization) were compared based on SVM classification performance. SVM classification was used to examine the functional parameters of (i) efficacy, (ii) endurance to simulated failure and (iii) longevity of classification. The effect of supplying isolated-neuron versus multi-unit firing rates as the feature vector to the SVM was also compared. The best classification performance was on post-implantation day 36. On that day, when using multi-unit firing rates, the worst classification accuracy resulted from features selected with the Wilcoxon signed-rank test (51.12 ± 0.65%) and the best from Mutual Information Maximization (93.74 ± 0.32%).
On the same day, when using single-unit firing rates, the classification accuracy was 88.85 ± 0.61% with the Wilcoxon signed-rank test and 95.60 ± 0.52% with Mutual Information Maximization (degrees of freedom = 10, level of chance = 10%). / Dissertation/Thesis / Masters Thesis Bioengineering 2015 Biomedical engineering Feature selection Machine learning neural decoding Neural engineering Neuroprosthetics Support vector machine
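Record 138 names Mutual Information Maximization as the best-performing feature selection technique. A minimal sketch of that general idea follows: score each feature independently by its mutual information with the class label and rank the features by that score. It assumes firing rates have already been discretized into bins; the unit names, labels, and values are invented, and nothing here reproduces the thesis's actual pipeline.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, labels):
    """Return feature names sorted by decreasing mutual information
    with the labels (each feature is scored independently)."""
    scores = {name: mutual_information(values, labels)
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented example: binned firing rates of two recorded units over four trials.
labels = ["flex", "flex", "extend", "extend"]
features = {
    "unit_b": [1, 0, 1, 0],   # unrelated to the movement label
    "unit_a": [1, 1, 0, 0],   # perfectly aligned with the movement label
}

ranking = rank_features(features, labels)
print(ranking)
```

In a decoding setting, the top-ranked features would then be passed to the classifier (an SVM in the thesis); the number of features to keep is a separate choice that the record does not specify.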
139	Big Data Analysis of Bacterial Inhibitors in Parallelized Cellomics - A Machine Learning Approach January 2016 abstract: Identifying chemical compounds that inhibit bacterial infection has recently gained considerable attention, given the increasing number of highly resistant bacteria and the serious health threat they pose around the world. With the development of automated microscopy and image analysis systems, the process of identifying novel therapeutic drugs can generate an immense amount of data, easily reaching terabytes of information. Despite the vast and growing amount of data now being generated, traditional analytical methods have not increased the overall success rate of identifying active chemical compounds that eventually become novel therapeutic drugs. Moreover, multispectral imaging has become ubiquitous in drug discovery due to its ability to provide valuable information on cellular and sub-cellular processes using fluorescent reagents. These reagents are often costly and toxic to cells over an extended period of time, which limits experimental design. Thus, there is a significant need to develop a more efficient process of identifying active chemical compounds. This dissertation introduces novel machine learning methods based on parallelized cellomics to analyze interactions between cells, bacteria, and chemical compounds while reducing the use of fluorescent reagents. Machine learning analysis using image-based high-content screening (HCS) data is compartmentalized into three primary components: (1) Image Analytics, (2) Phenotypic Analytics, and (3) Compound Analytics. A novel software analytics tool called the Insights project is also introduced.
The Insights project fully incorporates distributed processing, high performance computing, and database management that can rapidly and effectively utilize and store massive amounts of data generated using HCS biological assessments (bioassays). It is ideally suited for parallelized cellomics in high dimensional space. Results demonstrate that a parallelized cellomics approach increases the quality of a bioassay while vastly decreasing the need for control data. The reduction in control data leads to less fluorescent reagent consumption. Furthermore, a novel proposed method that uses single-cell data points is proven to identify known active chemical compounds with a high degree of accuracy, despite traditional quality control measurements indicating the bioassay to be of poor quality. This, ultimately, decreases the time and resources needed in optimizing bioassays while still accurately identifying active compounds. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016 Computer science Molecular biology Convolution Neural Network Feature Selection High Content Screening Machine Learning
140	Mining Signed Social Networks Using Unsupervised Learning Algorithms January 2017 abstract: Due to the vast resources brought by social media services, social data mining has received increasing attention in recent years. The sheer amount of user-generated data presents data scientists with both opportunities and challenges. Opportunities come with the additional data sources: the abundant link information in social networks could provide another rich source for deriving implicit information for social data mining. However, the vast majority of existing studies focus overwhelmingly on positive links between users, while negative links are also prevalent in real-world social networks, such as distrust relations in Epinions and foe links in Slashdot. Though recent studies show that negative links have some added value over positive links, it is difficult to employ them directly because their characteristics are distinct from those of positive interactions. Another challenge is that label information is rather limited in social media, as the labeling process requires human attention and may be very expensive. Hence, alternative criteria are needed to guide the learning process for many tasks such as feature selection and sentiment analysis. To address the above-mentioned issues, I study two novel problems in signed social network mining: (1) unsupervised feature selection in signed social networks; and (2) unsupervised sentiment analysis with signed social networks. To tackle the first problem, I propose a novel unsupervised feature selection framework, SignedFS. In particular, I model positive and negative links simultaneously for user preference learning, and then embed the user preference learning into feature selection. To study the second problem, I incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model, SignedSenti.
Empirical experiments on real-world datasets corroborate the effectiveness of these two frameworks on the tasks of feature selection and sentiment analysis. / Dissertation/Thesis / Masters Thesis Computer Science 2017 Computer science Feature selection Sentiment analysis Signed social network Unsupervised learning
