281

Seleção de características e aprendizado ativo para classificação de imagens de sensoriamento remoto / Feature selection and active learning for remote sensing image classification

Fábio Rodrigues Jorge 29 April 2015
In remote sensing applications, there are several problems in which knowledge about one target category or class predominates and little is known about the other categories. In such cases, the training of a classifier is hampered by class imbalance. Studying visual features to determine the best feature subspace may therefore be a feasible way to improve classifier performance. Anomaly detection-based approaches can also help by modeling the normal class (usually the majority class) while treating all other classes as anomalies. This study presents a remote sensing image dataset whose application is to classify image regions as vegetation coverage (related to plantation) or non-vegetation coverage. To address the class imbalance problem, several visual features were studied in order to define the set of attributes that best represents the data. A pipeline that deals with the vegetation classification problem and its class imbalance issues is also proposed. This pipeline makes use of feature selection techniques and active learning. The visual feature analysis showed that the subspace using the BIC extractor with the ExG vegetation index best distinguished the data. In addition, the proposed sorting-based feature selection achieved good results with low-dimensional subspaces. Furthermore, active learning helped create a better model, with results comparable to those of the best visual features.
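The ExG (Excess Green) index mentioned in the abstract is commonly defined on normalized chromatic coordinates as ExG = 2g - r - b. The following is a minimal sketch of that common definition, not the thesis's own code; the function names and the 0.1 threshold are illustrative assumptions.

```python
def excess_green(r, g, b):
    """Excess Green index for one RGB pixel (channel values 0..255)."""
    total = r + g + b
    if total == 0:
        # Black pixel: chromatic coordinates are undefined, treat as neutral.
        return 0.0
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def vegetation_mask(pixels, threshold=0.1):
    """Flag pixels whose ExG exceeds a threshold (0.1 is an assumed value)."""
    return [excess_green(r, g, b) > threshold for r, g, b in pixels]


# A saturated green pixel, a gray pixel and a reddish soil-like pixel.
mask = vegetation_mask([(0, 255, 0), (100, 100, 100), (150, 90, 60)])
```

In practice the threshold would be tuned on labeled data, or the raw ExG values would be fed to a feature extractor rather than thresholded directly.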
282

Seleção de características por meio de algoritmos genéticos para aprimoramento de rankings e de modelos de classificação / Feature selection by genetic algorithms to improve ranking and classification models

Sérgio Francisco da Silva 25 April 2011
Content-based image retrieval (CBIR) and classification systems rely on feature vectors extracted from images according to specific visual criteria. Feature vectors commonly have on the order of hundreds of elements. As the size (dimensionality) of the feature vector increases, so do the degrees of redundancy and irrelevance, leading to the "curse of dimensionality" problem. Thus, the selection of relevant features is a key step in a CBIR or classification system. This thesis presents new feature selection methods based on genetic algorithms (GA), aimed at improving similarity queries and classification models. The proposed Fc ("Fitness coach") family of fitness functions takes advantage of single-valued ranking evaluation functions to develop a new GA-based feature selection method tailored to improve the accuracy of CBIR systems. The search ability of the GA under the proposed evaluation criteria (the Fc family) improved the precision of similarity queries by up to 22% in the analyzed databases compared with traditional wrapper feature selection methods based on decision trees (C4.5), naive Bayes, support vector machines, 1-nearest neighbor and association rule mining. Other contributions of this thesis are two filter-based feature selection algorithms for image classification, which use the supervised simplified silhouette statistic as the evaluation function: the silhouette-based greedy search (SiGS) and the silhouette-based genetic algorithm search (SiGAS). The proposed algorithms outperformed competing methods from the literature (CFS, FCBF and ReliefF, among others). It is also important to stress that the gain in accuracy obtained by the Fc family, SiGS and SiGAS comes with a significant decrease in feature vector size, of up to 90%.
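A filter criterion of the kind described above can be sketched as follows. This is an illustrative reading, not the thesis's SiGS implementation: it assumes the simplified silhouette measures each sample's distance to class centroids (own class versus nearest other class), and it pairs that score with a plain greedy forward search. All names and the toy data are hypothetical.

```python
import math


def simplified_silhouette(X, y, features):
    """Mean simplified silhouette of labeled samples in a feature subspace."""
    labels = sorted(set(y))
    # Class centroids restricted to the selected features.
    centroids = {}
    for c in labels:
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    total = 0.0
    for x, lab in zip(X, y):
        point = [x[f] for f in features]
        a = math.dist(point, centroids[lab])  # distance to own centroid
        b = min(math.dist(point, centroids[c]) for c in labels if c != lab)
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(X)


def greedy_search(X, y):
    """Add one feature at a time while the silhouette keeps improving."""
    remaining = set(range(len(X[0])))
    selected, best = [], -1.0
    while remaining:
        score, f = max(
            (simplified_silhouette(X, y, selected + [f]), f)
            for f in sorted(remaining)
        )
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
selected, score = greedy_search(X, y)
```

Because the score only needs centroids, each evaluation is linear in the number of samples, which is what makes a silhouette-style filter cheap enough to call many times inside a greedy or genetic search.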
283

Stabilité de la sélection de variables sur des données haute dimension : une application à l'expression génique / Feature selection stability on high dimensional data : an application to gene expression data

Dernoncourt, David 15 October 2014
High-throughput technologies make it possible to measure very large numbers of variables for each individual: DNA sequence, gene expression, lipid profile… Knowledge can be extracted from such data using, for instance, classification methods. However, these data contain a very large number of variables, measured, in the best cases, on a few hundred patients. This makes feature selection a necessary first step to reduce the risk of overfitting, reduce computation time, and improve model interpretability. When the number of observations is low, feature selection tends to be unstable: selections obtained from two different datasets dealing with the same problem often barely overlap. Yet obtaining a stable selection seems crucial if we want to be confident that the selected variables are really relevant for knowledge discovery. In this work, we first tried to determine which factors have the most influence on feature selection stability. We then proposed a feature selection method, specific to microarray data, that uses functional annotations from Gene Ontology to assist usual feature selection methods by enriching the data with a priori knowledge. We then worked on two aspects of ensemble methods: the choice of the aggregation method, and hybrid ensemble methods. In the final chapter, we applied the methods studied in the thesis to a dataset from our lab, dealing with the prediction of weight regain after a diet, from microarray data, in obese patients.
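One common way to quantify how much two selections overlap is the Jaccard index, averaged over all pairs of selections. The sketch below assumes that measure for illustration; the abstract does not say which stability measure the thesis uses, and the gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def selection_stability(selections):
    """Mean pairwise Jaccard index over a list of selected-feature sets."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three selections, e.g. obtained on three resamples of the same dataset.
runs = [["g1", "g2", "g3"], ["g2", "g3", "g4"], ["g1", "g2", "g3"]]
stability = selection_stability(runs)
```

A value near 1 means the resamples agree on almost the same variables; a value near 0 corresponds to the "barely overlap" situation the abstract describes.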
284

PREDICTION OF PUBLIC BUS TRANSPORTATION PLANNING BASED ON PASSENGER COUNT AND TRAFFIC CONDITIONS

Heidaripak, Samrend January 2021
Artificial intelligence has become a hot topic in the past couple of years because of its potential for solving problems. The most widely used subset of artificial intelligence today is machine learning, which is essentially the way a machine can learn to do tasks without being given explicit instructions. A problem that has historically been solved by common knowledge and experience, and has been prone to mistakes, is the planning of bus transportation. This thesis investigates how to extract the key features of a raw dataset and whether a couple of machine learning algorithms can be applied to predict and plan public bus transportation, while also considering the weather conditions. A pre-processing method is used to extract the features before creating and evaluating a k-nearest neighbors model and an artificial neural network model; predicting the passenger count on a given route could help in planning the bus transportation. The outcome of the thesis was that the feature extraction was successful, and both models could successfully predict the passenger count under normal conditions. However, in extreme conditions such as the pandemic during 2020, the models could not be shown to predict the passenger count successfully or to be usable for planning the bus transportation.
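A k-nearest neighbors prediction of passenger count can be sketched in a few lines. This is only an illustration of the technique: the two features (hour of day, temperature), the data values and the choice k=3 are assumptions, not taken from the thesis.

```python
import math


def knn_predict(train, query, k=3):
    """Average the passenger counts of the k training rows nearest to query.

    train is a list of (feature_vector, passenger_count) pairs.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(count for _, count in nearest) / len(nearest)


# Hypothetical rows: ((hour of day, temperature in degrees C), passengers).
train = [
    ((8, 10), 40),
    ((8, 12), 44),
    ((9, 11), 42),
    ((14, 20), 15),
    ((15, 22), 12),
]
prediction = knn_predict(train, (8, 11))
```

With real data the features would normally be scaled first, since Euclidean distance otherwise lets the feature with the largest numeric range dominate.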
285

Analýza fonace u pacientů s Parkinsonovou nemocí / Analysis of phonation in patients with Parkinson's disease

Kopřiva, Tomáš January 2015
This work deals with the analysis of phonation in patients with Parkinson’s disease (PD). Approximately 90% of patients with Parkinson’s disease suffer from a speech motor dysfunction called hypokinetic dysarthria. A system for Parkinson’s disease analysis from speech signals is proposed and several types of features are examined. The Czech Parkinson’s speech database called PARCZ is used for classification. This dataset consists of 84 PD patients and 49 healthy controls. Results are evaluated in two ways. First, features are analysed individually using Spearman correlation, mutual information and the Mann-Whitney U test. Classification is based on random forests with leave-one-out validation. Second, the SFFS algorithm is employed for feature selection in order to get the best classification result. The proposed system is tested for each gender individually and for both genders together. The best result for both genders together is an accuracy of 89.47%, a sensitivity of 91.67% and a specificity of 85.71%. The results of this work showed that the most important vowel realizations for phonation analysis are sustained vowels pronounced with maximum or minimum intensity (not whispering).
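The reported rates can be checked against the cohort sizes. The confusion-matrix counts below are inferred, not stated in the abstract: they are the counts that reproduce all three percentages for 84 PD patients and 49 healthy controls, assuming PD is the positive class.

```python
def rates(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of PD patients correctly detected
    specificity = tn / (tn + fp)   # share of healthy controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Inferred counts: 77 of 84 PD patients and 42 of 49 controls classified correctly.
acc, sens, spec = rates(tp=77, fn=7, tn=42, fp=7)
percentages = [round(100 * v, 2) for v in (acc, sens, spec)]
```

Under these assumed counts, 119 of the 133 subjects are classified correctly, which matches the reported accuracy.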
286

Visual Analytics for Decision Making in Performance Evaluation

Jieqiong Zhao (8791535) 05 May 2020
Performance analysis often considers numerous factors contributing to performance, and the relative importance of these factors is evolving based on dynamic conditions and requirements. Investigating large numbers of factors and understanding individual factors' predictability within the ultimate performance are challenging tasks. A visual analytics approach that integrates interactive analysis, novel visual representations, and predictive machine learning models can provide new capabilities to examine performance effectively and thoroughly. Currently, only limited research has been done on the possible applications of visual analytics for performance evaluation. In this dissertation, two specific types of performance analysis are presented: (1) organizational employee performance evaluation and (2) performance improvement of machine learning models with interactive feature selection. Both application scenarios leverage the human-in-the-loop approach to assist the identification of influential factors. For organizational employee performance evaluation, a novel visual analytics system, MetricsVis, is developed to support exploratory organizational performance analysis. MetricsVis incorporates hybrid evaluation metrics that integrate quantitative measurements of observed employee achievements and subjective feedback on the relative importance of these achievements to demonstrate employee performance at and between multiple levels regarding the organizational hierarchy. MetricsVis II extends the original system by including actual supervisor ratings and user-guided rankings to capture preferences from users through derived weights. Comparing user preferences with objective employee workload data enables users to relate user evaluation to historical observations and even discover potential bias. 
For interactive feature selection and model evaluation, a visual analytics system, FeatureExplorer, allows users to refine and diagnose a model iteratively by selecting features based on their domain knowledge, interchangeable features, feature importance, and the resulting model performance. FeatureExplorer enables users to identify stable, trustable, and credible predictive features that contribute significantly to a prediction model.
287

Automatic detection of the fuel composition in a Diesel Engine : Identifying fuel composition in the fuel system of a combustion engine and optimising for computational complexity / Automatisk bränsledetektering med beräkningseffektiv variabelval : Idenftifiering av bränslekomposition i en förbränningmotors bränslesystem för optimerat variabelsval

Hultgren, Andree January 2021
The transportation industry is responsible for 26% of all greenhouse gas emissions in the European Union. Many steps are being taken to minimise these emissions, and one of the most effective is transitioning to biofuels. The combustion engines in most vehicles perform below their potential efficiency when running on biofuels because of the reduced energy density. The characteristics of the injection into the combustion chamber can be adjusted if the type of fuel being injected is known. In Diesel engines, Fatty Acid Methyl Esters (FAME) are among the most used biofuels. The higher mass density and lower energy density of FAME compared to Diesel result in lower power output when it is used in a Diesel engine. Detecting the fuel composition in the engine would allow the injection characteristics to be adapted and would bring the engine’s efficiency back to its full potential, independent of the fuel composition. The most significant issue with fuel composition prediction is that no work has been done in this field using machine learning. There are several hundred features inside the control system of a truck. Selecting the features that contribute to the prediction of fuel composition is important and challenging. The prediction should be computationally inexpensive and reasonably accurate to allow timely prediction. A feature selection method based on Shapley additive explanations (SHAP), applied to an expert network, enables feature selection tailored to finding the features that, combined, provide accurate predictions with minimal computational resources. This feature selection method has been tested before, but with limited analysis and adaptation; it was considered unreliable because of the random nature of neural networks, a shortcoming addressed here by training a large number of expert networks and averaging the results over all models. We apply various feature selection methods and propose a new feature selection method coined SHAP-C, which outperforms all other feature selection methods we have tested for this particular scope of application. The results show that with a minimal network of two input features and six hidden nodes, the fuel composition can be predicted with 98.82% accuracy using a total of 75 floating-point operations. The low computational complexity allows for real-time predictions in the control system of a truck, which can be used to modulate the injection characteristics into the engine’s combustion chamber. This adjustment based on fuel composition would allow a truck to run optimally independent of the fuel composition. The network used to identify the fuel composition has been trained with data from a single truck, so the results are not generalised across trucks.
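SHAP attributions are built on Shapley values, which split a model's gain fairly among its input features. The sketch below computes exact Shapley values by enumerating feature subsets for a toy value function; it illustrates the underlying idea only and is not the SHAP-C method or the SHAP library's estimator. The accuracy table is invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical accuracy of a model trained on each subset of three signals.
ACCURACY = {
    frozenset(): 0.50,
    frozenset({0}): 0.80,
    frozenset({1}): 0.70,
    frozenset({2}): 0.50,
    frozenset({0, 1}): 0.95,
    frozenset({0, 2}): 0.80,
    frozenset({1, 2}): 0.70,
    frozenset({0, 1, 2}): 0.95,
}
phi = shapley_values(3, ACCURACY.__getitem__)
# Rank features by attribution and keep the top two as network inputs.
top_two = sorted(range(3), key=lambda i: phi[i], reverse=True)[:2]
```

Exact enumeration is exponential in the number of features, so with several hundred candidate signals an approximation is needed; averaging attributions over many independently trained networks, as the abstract describes, reduces the run-to-run variance of the ranking.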
288

Computer-Aided Diagnosis for Mammographic Microcalcification Clusters

Tembey, Mugdha 07 November 2003
Breast cancer is the second leading cause of cancer deaths among women in the United States, and microcalcification clusters are one of the most important indicators of breast disease. Computer methodologies help in the detection of lesions and the differentiation between benign and malignant lesions, and have the potential to improve radiologists' performance and breast cancer diagnosis significantly. A Computer-Aided Diagnosis (CAD-Dx) algorithm has been previously developed to assist radiologists in the diagnosis of mammographic clusters of calcifications, with the following modules: (a) detection of all calcification-like areas, (b) false-positive reduction and segmentation of the detected calcifications, (c) selection of morphological and distributional features and (d) classification of the clusters. Classification was based on an artificial neural network (ANN) with 14 input features and assigned a likelihood of malignancy to each cluster. The purpose of this work was threefold: (a) optimize the existing algorithm and test it on a large database, (b) rank classification features and select the best feature set, and (c) determine the impact of single-view and two-view feature estimation on classification and feature ranking. Classification performance was evaluated with the NevProp4 artificial neural network trained with the leave-one-out resampling technique. Sequential forward selection was used for feature selection and ranking. Mammograms from 136 patients, containing single or two views of a breast with a calcification cluster, were digitized at 60 microns and 16 bits per pixel. 260 regions of interest (ROIs) centered on a calcification cluster were defined to build the single-view dataset. 100 of the 136 patients had a two-view mammogram, which yielded 202 ROIs that formed the two-view dataset. Classification and feature selection were evaluated with both of these datasets.
To decide on the optimal features for two-view feature estimation, several combinations of CC and MLO view features were attempted. On the single-view dataset, the classifier achieved an Az = 0.8891 with 88% sensitivity and 77% specificity at an operating point of 0.4; 12 features were selected as the most important. With the two-view dataset, the classifier achieved a higher performance, with an Az = 0.9580 and sensitivity and specificity of 98% and 80% respectively at an operating point of 0.4; 10 features were selected as the most important.
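Sequential forward selection, the ranking procedure named in the abstract, can be sketched generically. In the thesis the score of a feature subset would come from the neural network's performance under leave-one-out resampling; here a toy scoring function stands in for it, and the feature names and numbers are hypothetical.

```python
def sequential_forward_selection(candidates, score, max_features):
    """Repeatedly add the feature that most improves score(subset).

    Returns a ranking: (feature, score after adding it), in order of selection.
    """
    selected, ranking = [], []
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        ranking.append((best, score(selected)))
    return ranking


# Toy score: each feature adds some value, but two of them are redundant.
USEFULNESS = {"area": 0.30, "perimeter": 0.28, "cluster_size": 0.20}


def toy_score(subset):
    penalty = 0.25 if {"area", "perimeter"} <= set(subset) else 0.0
    return 0.5 + sum(USEFULNESS[f] for f in subset) - penalty


ranking = sequential_forward_selection(list(USEFULNESS), toy_score, 3)
order = [name for name, _ in ranking]
```

The example shows why the ranking depends on what is already selected: "perimeter" is the second-best feature on its own but is picked last because it is redundant with "area".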
289

Prediction of Protein Function and Functional Sites From Protein Sequences

Hu, Jing 01 May 2009
High-throughput genomics projects have resulted in a rapid accumulation of protein sequences. Therefore, computational methods that can predict protein functions and functional sites efficiently and accurately are in high demand. In addition, prediction methods utilizing only sequence information are of particular interest because for most proteins, 3-dimensional structures are not available. However, there are several key challenges in developing methods for predicting protein function and functional sites. These challenges include the following: the construction of representative datasets to train and evaluate the method, the collection of features related to the protein functions, the selection of the most useful features, and the integration of selected features into suitable computational models. In this proposed study, we tackle these challenges by developing procedures for benchmark dataset construction and protein feature extraction, implementing efficient feature selection strategies, and developing effective machine learning algorithms for protein function and functional site predictions. We investigate these challenges in three bioinformatics tasks: the discovery of transmembrane beta-barrel (TMB) proteins in gram-negative bacterial proteomes, the identification of deleterious non-synonymous single nucleotide polymorphisms (nsSNPs), and the identification of helix-turn-helix (HTH) motifs from protein sequence.
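A simple example of a feature derived from sequence alone is amino acid composition: the fraction of each of the 20 standard residues in a protein. The abstract does not list the features actually used, so this is an assumed, commonly used descriptor shown only to illustrate sequence-based feature extraction.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element vector of residue fractions for a protein sequence.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    return [counts[aa] / valid if valid else 0.0 for aa in AMINO_ACIDS]


features = amino_acid_composition("AAGX")
```

The resulting fixed-length vector can be passed to any standard classifier regardless of the protein's length, which is why descriptors of this kind are a frequent starting point before feature selection.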
290

Autonomous model selection for surface classification via unmanned aerial vehicle

Watts-Willis, Tristan A. 01 January 2017
In the pursuit of research in remote areas, robots may be employed to deploy sensor networks. These robots need a method of classifying a surface to determine if it is a suitable installation site. Developing surface classification models manually requires significant time and detracts from the goal of automating systems. We create a system that automatically collects the data using an Unmanned Aerial Vehicle (UAV), extracts features, trains a large number of classifiers, selects the best classifier, and programs the UAV with that classifier. We design this system with user configurable parameters for choosing a high accuracy, efficient classifier. In support of this system, we also develop an algorithm for evaluating the effectiveness of individual features as indicators of the variable of interest. Motivating our work is a prior project that manually developed a surface classifier using an accelerometer; we replicate those results with our new automated system and improve on those results, providing a four-surface classifier with a 75% classification rate and a hard/soft classifier with a 100% classification rate. We further verify our system through a field experiment that collects and classifies new data, proving its end-to-end functionality. The general form of our system provides a valuable tool for automation of classifier creation and is released as an open-source tool.
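The selection step, choosing a classifier that is both accurate and efficient, can be sketched as a rule with a user-set tolerance. This is one plausible reading of "user configurable parameters", not the released tool's logic; the classifier names, accuracies and cost figures are hypothetical.

```python
def select_classifier(candidates, tolerance=0.02):
    """Pick the cheapest classifier whose accuracy is near the best one.

    candidates is a list of (name, accuracy, cost) tuples; `tolerance` is the
    accuracy the user is willing to give up in exchange for lower cost.
    """
    best_accuracy = max(accuracy for _, accuracy, _ in candidates)
    eligible = [c for c in candidates if c[1] >= best_accuracy - tolerance]
    return min(eligible, key=lambda c: c[2])


# Hypothetical results: (name, accuracy, prediction cost in milliseconds).
results = [("knn", 0.75, 40.0), ("tree", 0.74, 5.0), ("svm", 0.60, 20.0)]
chosen = select_classifier(results)
```

Setting the tolerance to zero reduces the rule to picking the most accurate classifier, while a larger tolerance favors models that are cheaper to run on the UAV's onboard computer.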
