181

Apprentissage semi-supervisé pour la détection multi-objets dans des séquences vidéos : Application à l'analyse de flux urbains / Semi-supervised learning for multi-object detection in video sequences : Application to the analysis of urban flow

Maâmatou, Houda 05 April 2017
Depuis les années 2000, un progrès significatif est enregistré dans les travaux de recherche qui proposent l’apprentissage de détecteurs d’objets sur des grandes bases de données étiquetées manuellement et disponibles publiquement. Cependant, lorsqu’un détecteur générique d’objets est appliqué sur des images issues d’une scène spécifique les performances de détection diminuent considérablement. Cette diminution peut être expliquée par les différences entre les échantillons de test et ceux d’apprentissage au niveau des points de vues prises par la(les) caméra(s), de la résolution, de l’éclairage et du fond des images. De plus, l’évolution de la capacité de stockage des systèmes informatiques, la démocratisation de la "vidéo-surveillance" et le développement d’outils d’analyse automatique des données vidéos encouragent la recherche dans le domaine du trafic routier. Les buts ultimes sont l’évaluation des demandes de gestion du trafic actuelles et futures, le développement des infrastructures routières en se basant sur les besoins réels, l’intervention pour une maintenance à temps et la surveillance des routes en continu. Par ailleurs, l’analyse de trafic est une problématique dans laquelle plusieurs verrous scientifiques restent à lever. Ces derniers sont dus à une grande variété dans la fluidité de trafic, aux différents types d’usagers, ainsi qu’aux multiples conditions météorologiques et lumineuses. Ainsi le développement d’outils automatiques et temps réel pour l’analyse vidéo de trafic routier est devenu indispensable. Ces outils doivent permettre la récupération d’informations riches sur le trafic à partir de la séquence vidéo et doivent être précis et faciles à utiliser. C’est dans ce contexte que s’insèrent nos travaux de thèse qui proposent d’utiliser les connaissances antérieurement acquises et de les combiner avec des informations provenant de la nouvelle scène pour spécialiser un détecteur d’objet aux nouvelles situations de la scène cible. 
Dans cette thèse, nous proposons de spécialiser automatiquement un classifieur/détecteur générique d’objets à une scène de trafic routier surveillée par une caméra fixe. Nous présentons principalement deux contributions. La première est une formalisation originale de transfert d’apprentissage transductif à base d’un filtre séquentiel de type Monte Carlo pour la spécialisation automatique d’un classifieur. Cette formalisation approxime itérativement la distribution cible inconnue au départ, comme étant un ensemble d’échantillons de la base spécialisée à la scène cible. Les échantillons de cette dernière sont sélectionnés à la fois à partir de la base source et de la scène cible moyennant une pondération qui utilise certaines informations a priori sur la scène. La base spécialisée obtenue permet d’entraîner un classifieur spécialisé à la scène cible sans intervention humaine. La deuxième contribution consiste à proposer deux stratégies d’observation pour l’étape mise à jour du filtre SMC. Ces stratégies sont à la base d’un ensemble d’indices spatio-temporels spécifiques à la scène de vidéo-surveillance. Elles sont utilisées pour la pondération des échantillons cibles. Les différentes expérimentations réalisées ont montré que l’approche de spécialisation proposée est performante et générique. Nous avons pu y intégrer de multiples stratégies d’observation. Elle peut être aussi appliquée à tout type de classifieur. De plus, nous avons implémenté dans le logiciel OD SOFT de Logiroad les possibilités de chargement et d’utilisation d’un détecteur fourni par notre approche. Nous avons montré également les avantages des détecteurs spécialisés en comparant leurs résultats avec celui de la méthode Vu-mètre de Logiroad. / Since 2000, a significant progress has been recorded in research work which has proposed to learn object detectors using large manually labeled and publicly available databases. 
However, when a generic object detector is applied to images of a specific scene, detection performance decreases considerably. This decrease may be explained by differences between the test samples and the training samples in camera viewpoint, resolution, illumination and image background. In addition, the growth in storage capacity of computer systems, the democratization of "video surveillance" and the development of automatic video-data analysis tools have encouraged research in the road-traffic domain. The ultimate aims are to evaluate current and future traffic management demands, to develop road infrastructure based on real needs, to carry out maintenance in time and to monitor roads continuously. Moreover, traffic analysis is a problem in which several scientific obstacles remain to be overcome. These are due to the wide variety of traffic fluidity, the various types of users, and the multiple weather and lighting conditions. Thus, developing automatic, real-time tools to analyse road-traffic videos has become indispensable. These tools should retrieve rich traffic information from the video sequence, and they must be precise and easy to use. This is the context of our thesis work, which proposes to use previously acquired knowledge and to combine it with information extracted from the new scene in order to specialize an object detector to the new situations of the target scene. In this thesis, we propose to automatically specialize a generic object classifier/detector to a road traffic scene monitored by a fixed camera. We mainly present two contributions. The first is an original formalization of Transductive Transfer Learning based on a sequential Monte Carlo (SMC) filter for automatic classifier specialization. This formalization iteratively approximates the initially unknown target distribution as a set of samples composing the specialized dataset of the target scene.
The samples of this dataset are selected from both the source dataset and the target scene through a weighting step that uses prior information on the scene. The resulting specialized dataset allows training a classifier specialized to the target scene without human intervention. The second contribution consists in proposing two observation strategies to be used in the SMC filter’s update step. These strategies are based on a set of spatio-temporal cues specific to the video surveillance scene. They are used to weight the target samples. The experiments carried out have shown that the proposed specialization approach is effective and generic. We have been able to integrate multiple observation strategies into it. It can also be applied to any classifier/detector. In addition, we have implemented in Logiroad’s OD SOFT software the ability to load and use a detector provided by our approach. We have also shown the advantages of the specialized detectors by comparing their results with those of Logiroad’s Vu-meter method.
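As a rough illustration of the weighting and selection idea described in this abstract, the sketch below shows a generic importance-resampling step of the kind used in sequential Monte Carlo filters. It is not the thesis's implementation: the sample names, the scores and the `resample` helper are hypothetical, and the observation strategies that would produce the weights are not modelled.

```python
import random


def resample(samples, weights, k, rng):
    """Draw k samples with probability proportional to their weights."""
    if sum(weights) <= 0:
        raise ValueError("at least one weight must be positive")
    # random.choices samples with replacement, proportionally to the weights.
    return rng.choices(samples, weights=weights, k=k)


# Hypothetical candidate samples and observation scores (not from the thesis).
candidates = ["car_patch", "shadow_patch", "road_patch", "truck_patch"]
scores = [0.9, 0.0, 0.1, 0.8]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
specialized = resample(candidates, scores, k=100, rng=rng)
```

Samples with a zero weight are never drawn, so candidates that the scoring step rejects drop out of the resampled set, while highly weighted ones are duplicated.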
182

Vision-based moving pedestrian recognition from imprecise and uncertain data / Reconnaissance de piétons par vision à partir de données imprécises et incertaines

Zhou, Dingfu 05 December 2014
La mise en oeuvre de systèmes avancés d’aide à la conduite (ADAS) basée vision, est une tâche complexe et difficile surtout d’un point de vue robustesse en conditions d’utilisation réelles. Une des fonctionnalités des ADAS vise à percevoir et à comprendre l’environnement de l’ego-véhicule et à fournir l’assistance nécessaire au conducteur pour réagir à des situations d’urgence. Dans cette thèse, nous nous concentrons sur la détection et la reconnaissance des objets mobiles car leur dynamique les rend plus imprévisibles et donc plus dangereux. La détection de ces objets, l’estimation de leurs positions et la reconnaissance de leurs catégories sont importants pour les ADAS et la navigation autonome. Par conséquent, nous proposons de construire un système complet pour la détection des objets en mouvement et la reconnaissance basées uniquement sur les capteurs de vision. L’approche proposée permet de détecter tout type d’objets en mouvement en fonction de deux méthodes complémentaires. L’idée de base est de détecter les objets mobiles par stéréovision en utilisant l’image résiduelle du mouvement apparent (RIMF). La RIMF est définie comme l’image du mouvement apparent causé par le déplacement des objets mobiles lorsque le mouvement de la caméra a été compensé. Afin de détecter tous les mouvements de manière robuste et de supprimer les faux positifs, les incertitudes liées à l’estimation de l’ego-mouvement et au calcul de la disparité doivent être considérées. Les étapes principales de l’algorithme sont les suivantes : premièrement, la pose relative de la caméra est estimée en minimisant la somme des erreurs de reprojection des points d’intérêt appariées et la matrice de covariance est alors calculée en utilisant une stratégie de propagation d’erreurs de premier ordre. Ensuite, une vraisemblance de mouvement est calculée pour chaque pixel en propageant les incertitudes sur l’ego-mouvement et la disparité par rapport à la RIMF. 
Enfin, la probabilité de mouvement et le gradient de profondeur sont utilisés pour minimiser une fonctionnelle d’énergie de manière à obtenir la segmentation des objets en mouvement. Dans le même temps, les boîtes englobantes des objets mobiles sont générées en utilisant la carte des U-disparités. Après avoir obtenu la boîte englobante de l’objet en mouvement, nous cherchons à reconnaître si l’objet en mouvement est un piéton ou pas. Par rapport aux algorithmes de classification supervisée (comme le boosting et les SVM) qui nécessitent un grand nombre d’exemples d’apprentissage étiquetés, notre algorithme de boosting semi-supervisé est entraîné avec seulement quelques exemples étiquetés et de nombreuses instances non étiquetées. Les exemples étiquetés sont d’abord utilisés pour estimer les probabilités d’appartenance aux classes des exemples non étiquetés, et ce à l’aide de modèles de mélange de gaussiennes après une étape de réduction de dimension réalisée par une analyse en composantes principales. Ensuite, nous appliquons une stratégie de boosting sur des arbres de décision entraînés à l’aide des instances étiquetées de manière probabiliste. Les performances de la méthode proposée sont évaluées sur plusieurs jeux de données de classification de référence, ainsi que sur la détection et la reconnaissance des piétons. Enfin, l’algorithme de détection et de reconnaissances des objets en mouvement est testé sur les images du jeu de données KITTI et les résultats expérimentaux montrent que les méthodes proposées obtiennent de bonnes performances dans différents scénarios de conduite en milieu urbain. / Building vision-based Advanced Driver Assistance Systems (ADAS) is a complex and challenging task in real-world traffic scenarios. ADAS aim to perceive and understand the environment surrounding the ego-vehicle and to provide the driver with the assistance needed to react to emergencies.
In this thesis, we focus on detecting and recognizing moving objects because they are more dangerous than static ones. Detecting these objects, estimating their positions and recognizing their categories are important for ADAS and autonomous navigation. Consequently, we propose to build a complete system for moving object detection and recognition based on vision sensors. The proposed approach can detect any kind of moving object from only two adjacent frames. The core idea is to detect the moving pixels by using the Residual Image Motion Flow (RIMF). The RIMF is defined as the residual image changes caused by moving objects once camera motion has been compensated. In order to robustly detect all kinds of motion and remove false positive detections, uncertainties in the ego-motion estimation and disparity computation should also be considered. The main steps of our general algorithm are the following: first, the relative camera pose is estimated by minimizing the sum of the reprojection errors of matched features, and its covariance matrix is calculated using a first-order error propagation strategy. Next, a motion likelihood for each pixel is obtained by propagating the uncertainties of the ego-motion and disparity to the RIMF. Finally, the motion likelihood and the depth gradient are used in a graph-cut-based approach to obtain the segmentation of the moving objects. At the same time, the bounding boxes of the moving objects are generated based on the U-disparity map. After obtaining the bounding box of a moving object, we classify the object as pedestrian or non-pedestrian. Compared to supervised classification algorithms (such as boosting and SVM), which require a large number of labeled training instances, our proposed semi-supervised boosting algorithm is trained with only a few labeled instances and many unlabeled instances.
First, labeled instances are used to estimate the probabilistic class labels of the unlabeled instances using Gaussian Mixture Models after a dimension reduction step performed via Principal Component Analysis. Then, we apply a boosting strategy on decision stumps trained using the resulting soft-labeled instances. The performance of the proposed method is evaluated on several benchmark classification datasets, as well as on a pedestrian detection and recognition problem. Finally, both our moving object detection and recognition algorithms are tested on the public KITTI image dataset, and the experimental results show that the proposed methods achieve good performance in different urban scenarios.
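The soft-labeling step mentioned in this abstract can be sketched in a few lines. The example below is a deliberate simplification and not the thesis's method: it fits a single one-dimensional Gaussian per class (instead of a Gaussian Mixture Model after PCA), omits the boosting stage, and uses made-up feature values and class names.

```python
import math


def fit_gaussian(values):
    """Return the mean and variance of a list of 1-D feature values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def soft_labels(labeled, unlabeled):
    """Estimate class probabilities for unlabeled values from labeled ones.

    labeled maps each class name to a list of 1-D feature values.
    """
    total = sum(len(v) for v in labeled.values())
    models = {c: (fit_gaussian(v), len(v) / total) for c, v in labeled.items()}
    result = []
    for x in unlabeled:
        joint = {c: prior * gaussian_pdf(x, mean, var)
                 for c, ((mean, var), prior) in models.items()}
        norm = sum(joint.values())
        if norm == 0.0:
            # Far from every class: fall back to the class priors.
            result.append({c: prior for c, (_, prior) in models.items()})
        else:
            result.append({c: p / norm for c, p in joint.items()})
    return result


# Hypothetical 1-D features for a few labeled instances (illustration only).
labeled = {"pedestrian": [1.0, 1.2, 0.8], "other": [3.0, 3.3, 2.7]}
probs = soft_labels(labeled, unlabeled=[1.1, 2.9])
```

Each entry of `probs` is a probability distribution over the classes, which is the kind of soft label a downstream weighted learner could consume.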
183

Leitura da web em português em ambiente de aprendizado sem-fim

Duarte, Maísa Cristina 04 January 2016
Submitted by Alison Vanceto (alison-vanceto@hotmail.com) on 2017-01-03T12:49:19Z No. of bitstreams: 1 TeseMCD.pdf: 1564245 bytes, checksum: fbb9eb1099a1b38351371c97e8e49bb4 (MD5) / Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2017-01-16T16:47:27Z (GMT) No. of bitstreams: 1 TeseMCD.pdf: 1564245 bytes, checksum: fbb9eb1099a1b38351371c97e8e49bb4 (MD5) / Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2017-01-16T16:47:38Z (GMT) No. of bitstreams: 1 TeseMCD.pdf: 1564245 bytes, checksum: fbb9eb1099a1b38351371c97e8e49bb4 (MD5) / Made available in DSpace on 2017-01-16T16:47:46Z (GMT). No. of bitstreams: 1 TeseMCD.pdf: 1564245 bytes, checksum: fbb9eb1099a1b38351371c97e8e49bb4 (MD5) Previous issue date: 2016-01-04 / No funding received / NELL is a computer system whose goal is to learn 24 hours a day, continuously, learning more and better each day in order to populate its knowledge base (KB). NELL has been running since January 12, 2010. A further goal of NELL is to achieve high precision so that learning can continue. NELL was developed in the macro-reading context, and for this reason it needs a great deal of redundancy to run. The first step in running NELL is to build a large all-pairs-data base. The all-pairs-data is a base preprocessed using Natural Language Processing (NLP) that holds all the sufficient statistics about a corpus of web pages. The proposal of this project was to create an instance of NELL (currently in English) in Portuguese. For this, the first goal was to develop an all-pairs-data in Portuguese. The second was to create a new, Portuguese version of NELL. Finally, the third goal was to develop a hybrid coreference resolution method based on semantic and morphological features. This method does not depend on a specific language; it can be applied to other languages that use the same alphabet as Portuguese.
The Portuguese NELL was developed, but its all-pairs-data is not large enough. Because of this, the Portuguese NELL does not run forever, as the English version does. Even so, this project presents the steps for developing NELL in another language and some ideas on how to improve the all-pairs-data. It also presents a hybrid coreference resolution method with good results for NELL. / A NELL é um sistema de computador que possui o objetivo de executar 24 horas por dia, 7 dias por semana, sem parar. A versão atual da NELL foi iniciada em 12 de Janeiro de 2010 e continua ativa. Seu objetivo é aprender cada vez mais fatos da web para popular sua base de conhecimento (Knowledge Base - KB). Além de aprender cada vez mais, a NELL também objetiva alcançar alta confiança no aprendizado para garantir a continuidade do aprendizado. A NELL foi desenvolvida e atua no contexto da macroleitura, no qual é necessária uma grande quantidade e redundância de dados. Para que o sistema possa aprender, o primeiro passo é criar uma base preprocessada (all-pairs-data) a partir do uso de técnicas linguísticas. O all-pairs-data deve possuir todas as estatísticas suficientes para a execução da NELL e também deve ser de um tamanho suficientemente grande para que o aprendizado possa ocorrer. Neste projeto, foi proposta a criação de uma nova instância da NELL em português. Inicialmente foi proposta a criação de um all-pairs-data e, em seguida, a criação de uma abordagem híbrida para a resolução de correferências independente de língua por base em características semânticas e morfológicas. A proposta híbrida objetivou aperfeiçoar o processo atual de tratamento de correferências na NELL, melhorando assim a confiabilidade no aprendizado. Todas as propostas foram desenvolvidas e a NELL em português obteve bons resultados. Tais resultados evidenciam que a leitura da web em português poderá se tornar um sistema de aprendizado sem-fim.
Para que isso ocorra são também apresentadas as futuras abordagens e propostas. Além disso, este projeto apresenta a metodologia de criação da instância da NELL em português, uma proposta de resolução de correferência que explora atributos linguísticos, bem como a ontologia da NELL, além de apontar trabalhos futuros, nos quais inclui-se processos de adição de outras línguas na NELL, principalmente para aquelas que possuem poucas páginas web disponíveis para o aprendizado.
184

[en] POROSITY ESTIMATION FROM SEISMIC ATTRIBUTES WITH SIMULTANEOUS CLASSIFICATION OF SPATIALLY STRUCTURED LATENT FACIES / [pt] PREDIÇÃO DE POROSIDADE A PARTIR DE ATRIBUTOS SÍSMICOS COM CLASSIFICAÇÃO SIMULTÂNEA DE FACIES GEOLÓGICAS LATENTES EM ESTRUTURAS ESPACIAIS

LUIZ ALBERTO BARBOSA DE LIMA 26 April 2018
[pt] Predição de porosidade em reservatórios de óleo e gás representa em uma tarefa crucial e desafiadora na indústria de petróleo. Neste trabalho é proposto um novo modelo não-linear para predição de porosidade que trata fácies sedimentares como variáveis ocultas ou latentes. Esse modelo, denominado Transductive Conditional Random Field Regression (TCRFR), combina com sucesso os conceitos de Markov random fields, ridge regression e aprendizado transdutivo. O modelo utiliza volumes de impedância sísmica como informação de entrada condicionada aos valores de porosidade disponíveis nos poços existentes no reservatório e realiza de forma simultânea e automática a classificação das fácies e a estimativa de porosidade em todo o volume. O método é capaz de inferir as fácies latentes através da combinação de amostras precisas de porosidade local presentes nos poços com dados de impedância sísmica ruidosos, porém disponíveis em todo o volume do reservatório. A informação precisa de porosidade é propagada no volume através de modelos probabilísticos baseados em grafos, utilizando conditional random fields. Adicionalmente, duas novas técnicas são introduzidas como etapas de pré-processamento para aplicação do método TCRFR nos casos extremos em que somente um número bastante reduzido de amostras rotuladas de porosidade encontra-se disponível em um pequeno conjunto de poços exploratórios, uma situação típica para geólogos durante a fase exploratória de uma nova área. São realizados experimentos utilizando dados de um reservatório sintético e de um reservatório real. Os resultados comprovam que o método apresenta um desempenho consideravelmente superior a outros métodos automáticos de predição em relação aos dados sintéticos e, em relação aos dados reais, um desempenho comparável ao gerado por técnicas tradicionais de geo estatística que demandam grande esforço manual por parte de especialistas. 
/ [en] Estimating porosity in oil and gas reservoirs is a crucial and challenging task in the oil industry. A novel nonlinear model for porosity estimation is proposed, which handles sedimentary facies as latent variables. It successfully combines the concepts of conditional random fields (CRFs), transductive learning and ridge regression. The proposed Transductive Conditional Random Field Regression (TCRFR) uses seismic impedance volumes as input information, conditioned on the porosity values from the available wells in the reservoir, and simultaneously and automatically provides as output the porosity estimation and facies classification in the whole volume. The method is able to infer the latent facies states by combining the local, labeled and accurate porosity information available at well locations with the plentiful but imprecise impedance information available everywhere in the reservoir volume. That accurate information is propagated in the reservoir based on conditional random field probabilistic graphical models, greatly reducing uncertainty. In addition, two new techniques are introduced as preprocessing steps for the application of TCRFR in the extreme but realistic cases where only a small number of porosity-labeled samples is available in a few exploratory wells, a typical situation for geologists during the evaluation of a reservoir in the exploration phase. Experiments on both synthetic and real-world data are presented to demonstrate the usefulness of the proposed methodology; they show that it outperforms previous automatic estimation methods on synthetic data and provides a result comparable to the traditional, labor-intensive manual geostatistics approach on real-world data.
185

Agrupamento de dados baseado em comportamento coletivo e auto-organização / Data clustering based on collective behavior and self-organization

Roberto Alves Gueleri 18 June 2013
O aprendizado de máquina consiste de conceitos e técnicas que permitem aos computadores melhorar seu desempenho com a experiência, ou, em outras palavras, aprender com dados. Um dos principais tópicos do aprendizado de máquina é o agrupamento de dados que, como o nome sugere, procura agrupar os dados de acordo com sua similaridade. Apesar de sua definição relativamente simples, o agrupamento é uma tarefa computacionalmente complexa, tornando proibitivo o emprego de algoritmos exaustivos, na busca pela solução ótima do problema. A importância do agrupamento de dados, aliada aos seus desafios, faz desse campo um ambiente de intensa pesquisa. Também a classe de fenômenos naturais conhecida como comportamento coletivo tem despertado muito interesse. Isso decorre da observação de um estado organizado e global que surge espontaneamente das interações locais presentes em grandes grupos de indivíduos, caracterizando, pois, o que se chama auto-organização ou emergência, para ser mais preciso. Os desafios intrínsecos e a relevância do tema vêm motivando sua pesquisa em diversos ramos da ciência e da engenharia. Ao mesmo tempo, técnicas baseadas em comportamento coletivo vêm sendo empregadas em tarefas de aprendizado de máquina, mostrando-se promissoras e ganhando bastante atenção. No presente trabalho, objetivou-se o desenvolvimento de técnicas de agrupamento baseadas em comportamento coletivo. Faz-se cada item do conjunto de dados corresponder a um indivíduo, definem-se as leis de interação local, e então os indivíduos são colocados a interagir entre si, de modo que os padrões que surgem reflitam os padrões originalmente presentes no conjunto de dados. Abordagens baseadas em dinâmica de troca de energia foram propostas. Os dados permanecem fixos em seu espaço de atributos, mas carregam certa informação a energia , a qual é progressivamente trocada entre eles. Os grupos são estabelecidos entre dados que tomam estados de energia semelhantes. 
Este trabalho abordou também o aprendizado semissupervisionado, cuja tarefa é rotular dados em bases parcialmente rotuladas. Nesse caso, foi adotada uma abordagem baseada na movimentação dos próprios dados pelo espaço de atributos. Procurou-se, durante todo este trabalho, não apenas propor novas técnicas de aprendizado, mas principalmente, por meio de muitas simulações e ilustrações, mostrar como elas se comportam em diferentes cenários, num esforço em mostrar onde reside a vantagem de se utilizar a dinâmica coletiva na concepção dessas técnicas / Machine learning consists of concepts and techniques that enable computers to improve their performance with experience, i.e., to learn from data. Data clustering (or just clustering) is one of its main topics; it aims to group data according to their similarities. Despite its simple definition, clustering is a complex computational task. Its relevance and challenges make this field an environment of intense research. The class of natural phenomena known as collective behavior has also attracted much interest. This is due to the observation that global patterns may spontaneously arise from local interactions among large groups of individuals, which is known as self-organization (or emergence). The challenges and relevance of the subject are encouraging its research in many branches of science and engineering. At the same time, techniques based on collective behavior are being employed in machine learning tasks and are proving promising. The objective of the present work was to develop clustering techniques based on collective behavior. Each dataset item corresponds to an individual. Once the local interactions are defined, the individuals begin to interact with each other. It is expected that the patterns arising from these interactions match the patterns originally present in the dataset. Approaches based on the dynamics of energy exchange have been proposed.
The data are kept fixed in their feature space, but they carry some sort of information (the energy), which is progressively exchanged among them. The groups are established among data that take similar energy states. This work has also addressed the semi-supervised learning task, which aims to label data in partially labeled datasets. In this case, an approach based on the motion of the data themselves through the feature space has been proposed. More than just providing new machine learning techniques, this research has tried to show how the techniques behave in different scenarios, in an effort to show where the advantage of using collective dynamics in the design of such techniques lies.
186

Abordagem semi-supervisionada para detecção de módulos de software defeituosos

OLIVEIRA, Paulo César de 31 August 2015
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2017-07-24T12:11:04Z No. of bitstreams: 2 license_rdf: 811 bytes, checksum: e39d27027a6cc9cb039ad269a5db8e34 (MD5) Dissertação Mestrado Paulo César de Oliveira.pdf: 2358509 bytes, checksum: 36436ca63e0a8098c05718bbee92d36e (MD5) / Made available in DSpace on 2017-07-24T12:11:04Z (GMT). No. of bitstreams: 2 license_rdf: 811 bytes, checksum: e39d27027a6cc9cb039ad269a5db8e34 (MD5) Dissertação Mestrado Paulo César de Oliveira.pdf: 2358509 bytes, checksum: 36436ca63e0a8098c05718bbee92d36e (MD5) Previous issue date: 2015-08-31 / Com a competitividade cada vez maior do mercado, aplicações de alto nível de qualidade são exigidas para a automação de um serviço. Para garantir qualidade de um software, testá-lo visando encontrar falhas antecipadamente é essencial no ciclo de vida de desenvolvimento. O objetivo do teste de software é encontrar falhas que poderão ser corrigidas e consequentemente, aumentar a qualidade do software em desenvolvimento. À medida que o software cresce, uma quantidade maior de testes é necessária para prevenir ou encontrar defeitos, visando o aumento da qualidade. Porém, quanto mais testes são criados e executados, mais recursos humanos e de infraestrutura são necessários. Além disso, o tempo para realizar as atividades de teste geralmente não é suficiente, fazendo com que os defeitos possam escapar. Cada vez mais as empresas buscam maneiras mais baratas e efetivas para detectar defeitos em software. Muitos pesquisadores têm buscado nos últimos anos, mecanismos para prever automaticamente defeitos em software. Técnicas de aprendizagem de máquina vêm sendo alvo das pesquisas, como uma forma de encontrar defeitos em módulos de software. 
Tem-se utilizado muitas abordagens supervisionadas para este fim, porém, rotular módulos de software como defeituosos ou não para fins de treinamento de um classificador é uma atividade muito custosa e que pode inviabilizar a utilização de aprendizagem de máquina. Neste contexto, este trabalho propõe analisar e comparar abordagens não supervisionadas e semi-supervisionadas para detectar módulos de software defeituosos. Para isto, foram utilizados métodos não supervisionados (de detecção de anomalias) e também métodos semi-supervisionados, tendo como base os classificadores AutoMLP e Naive Bayes. Para avaliar e comparar tais métodos, foram utilizadas bases de dados da NASA disponíveis no PROMISE Software Engineering Repository. / With increasing market competition, high-quality applications are required to automate services. To ensure software quality, testing is essential in the development lifecycle, with the aim of finding defects as early as possible. The purpose of testing is not only to find failures that can be fixed but also to improve software correctness and quality. As software grows more complex, a greater number of tests is necessary to prevent or find defects. However, the more tests are designed and executed, the more human and infrastructure resources are needed. Moreover, the time available for testing activities is usually insufficient, which allows defects to escape. Companies are constantly looking for cheaper and more effective ways to detect software defects at earlier stages. In recent years, many researchers have sought mechanisms to automatically predict software defects. Machine learning techniques have become a research target as a way of finding defects in software modules. Many supervised approaches have been used for this purpose, but labeling software modules as defective or not defective for the training phase is very expensive and can make the use of machine learning unfeasible.
In this context, this work aims to analyze and compare unsupervised and semi-supervised approaches to detecting defective software modules. To do so, unsupervised (anomaly detection) methods and semi-supervised methods based on the AutoMLP and Naive Bayes classifiers were used. To evaluate and compare these approaches, NASA datasets available in the PROMISE Software Engineering Repository were used.
187

Aprendizado semissupervisionado multidescrição em classificação de textos / Multi-view semi-supervised learning in text classification

Ígor Assis Braga 23 April 2010
Algoritmos de aprendizado semissupervisionado aprendem a partir de uma combinação de dados rotulados e não rotulados. Assim, eles podem ser aplicados em domínios em que poucos exemplos rotulados e uma vasta quantidade de exemplos não rotulados estão disponíveis. Além disso, os algoritmos semissupervisionados podem atingir um desempenho superior aos algoritmos supervisionados treinados nos mesmos poucos exemplos rotulados. Uma poderosa abordagem ao aprendizado semissupervisionado, denominada aprendizado multidescrição, pode ser usada sempre que os exemplos de treinamento são descritos por dois ou mais conjuntos de atributos disjuntos. A classificação de textos é um domínio de aplicação no qual algoritmos semissupervisionados vêm obtendo sucesso. No entanto, o aprendizado semissupervisionado multidescrição ainda não foi bem explorado nesse domínio dadas as diversas maneiras possíveis de se descrever bases de textos. O objetivo neste trabalho é analisar o desempenho de algoritmos semissupervisionados multidescrição na classificação de textos, usando unigramas e bigramas para compor duas descrições distintas de documentos textuais. Assim, é considerado inicialmente o difundido algoritmo multidescrição CO-TRAINING, para o qual são propostas modificações a fim de se tratar o problema dos pontos de contenção. É também proposto o algoritmo COAL, o qual pode melhorar ainda mais o algoritmo CO-TRAINING pela incorporação de aprendizado ativo como uma maneira de tratar pontos de contenção. Uma ampla avaliação experimental desses algoritmos foi conduzida em bases de textos reais. Os resultados mostram que o algoritmo COAL, usando unigramas como uma descrição das bases textuais e bigramas como uma outra descrição, atinge um desempenho significativamente melhor que um algoritmo semissupervisionado monodescrição. 
Levando em consideração os bons resultados obtidos por COAL, conclui-se que o uso de unigramas e bigramas como duas descrições distintas de bases de textos pode ser bastante compensador / Semi-supervised learning algorithms learn from a combination of both labeled and unlabeled data. Thus, they can be applied in domains where few labeled examples and a vast amount of unlabeled examples are available. Furthermore, semi-supervised learning algorithms may achieve a better performance than supervised learning algorithms trained on the same few labeled examples. A powerful approach to semi-supervised learning, called multi-view learning, can be used whenever the training examples are described by two or more disjoint sets of attributes. Text classification is a domain in which semi-supervised learning algorithms have shown some success. However, multi-view semi-supervised learning has not yet been well explored in this domain despite the possibility of describing textual documents in a myriad of ways. The aim of this work is to analyze the effectiveness of multi-view semi-supervised learning in text classification using unigrams and bigrams as two distinct descriptions of text documents. To this end, we initially consider the widely adopted CO-TRAINING multi-view algorithm and propose some modifications to it in order to deal with the problem of contention points. We also propose the COAL algorithm, which further improves CO-TRAINING by incorporating active learning as a way of dealing with contention points. A thorough experimental evaluation of these algorithms was conducted on real text data sets. The results show that the COAL algorithm, using unigrams as one description of text documents and bigrams as another description, achieves significantly better performance than a single-view semi-supervised algorithm. 
Taking into account the good results obtained by COAL, we conclude that the use of unigrams and bigrams as two distinct descriptions of text documents can be very effective.
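The two-view idea in this abstract can be illustrated with a minimal co-training sketch in Python. This is not the thesis's implementation: the function and class names (`unigrams`, `bigrams`, `NaiveBayes`, `co_train`), the choice of a small Naive Bayes base learner, and the toy documents are all assumptions made for illustration. In particular, the way disagreements between views are resolved here is a naive placeholder, not the contention-point handling or the COAL algorithm proposed in the thesis.

```python
import math
from collections import Counter


def unigrams(text):
    """View 1: individual lowercase tokens."""
    return text.lower().split()


def bigrams(text):
    """View 2: adjacent token pairs."""
    toks = text.lower().split()
    return [a + "_" + b for a, b in zip(toks, toks[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical base learner)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for feats, y in zip(docs, labels):
            self.counts[y].update(feats)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, feats):
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # +1 leaves probability mass for unseen features
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            logp[c] = math.log(self.prior[c] / n) + sum(
                math.log((self.counts[c][f] + 1) / (total + v)) for f in feats
            )
        top = max(logp.values())
        z = sum(math.exp(lp - top) for lp in logp.values())
        return {c: math.exp(lp - top) / z for c, lp in logp.items()}


def co_train(labeled, unlabeled, views=(unigrams, bigrams), rounds=5, per_round=1):
    """Each round, every view labels its most confident pool examples for all views."""
    labeled, pool = list(labeled), list(unlabeled)

    def train():
        ys = [y for _, y in labeled]
        return [NaiveBayes().fit([v(t) for t, _ in labeled], ys) for v in views]

    for _ in range(rounds):
        if not pool:
            break
        picks = {}
        for view, model in zip(views, train()):
            scored = []
            for text in pool:
                proba = model.predict_proba(view(text))
                label = max(proba, key=proba.get)
                scored.append((proba[label], text, label))
            scored.sort(key=lambda s: s[0], reverse=True)
            for conf, text, label in scored[:per_round]:
                # Placeholder rule: if the views disagree, the first view wins.
                picks.setdefault(text, label)
        for text, label in picks.items():
            labeled.append((text, label))
            pool.remove(text)
    return train(), labeled


# Toy data (invented for this sketch): two labeled seeds, four unlabeled documents.
seed = [("good great film", "pos"), ("bad awful film", "neg")]
pool = ["great good movie", "awful bad movie", "good great story", "bad awful story"]
models, final = co_train(seed, pool)
```

Each view trains on the same growing labeled set but sees only its own features, so an example that one view labels confidently becomes training data for the other. A more careful treatment would inspect examples on which the two views disagree rather than silently accepting the first view's label.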
188

O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos / The semi-supervised learning algorithm co-training applied to label text documents

Edson Takashi Matsubara 26 May 2004
Em Aprendizado de Máquina, a abordagem supervisionada normalmente necessita de um número significativo de exemplos de treinamento para a indução de classificadores precisos. Entretanto, a rotulação de dados é freqüentemente realizada manualmente, o que torna esse processo demorado e caro. Por outro lado, exemplos não-rotulados são facilmente obtidos se comparados a exemplos rotulados. Isso é particularmente verdade para tarefas de classificação de textos que envolvem fontes de dados on-line tais como páginas de internet, email e artigos científicos. A classificação de textos tem grande importância dado o grande volume de textos disponível on-line. Aprendizado semi-supervisionado, uma área de pesquisa relativamente nova em Aprendizado de Máquina, representa a junção do aprendizado supervisionado e não-supervisionado, e tem o potencial de reduzir a necessidade de dados rotulados quando somente um pequeno conjunto de exemplos rotulados está disponível. Este trabalho descreve o algoritmo de aprendizado semi-supervisionado co-training, que necessita de duas descrições de cada exemplo. Deve ser observado que as duas descrições necessárias para co-training podem ser facilmente obtidas de documentos textuais por meio de pré-processamento. Neste trabalho, várias extensões do algoritmo co-training foram implementadas. Ainda mais, foi implementado um ambiente computacional para o pré-processamento de textos, denominado PreTexT, com o objetivo de utilizar co-training em problemas de classificação de textos. Os resultados experimentais foram obtidos utilizando três conjuntos de dados. Dois conjuntos de dados estão relacionados com classificação de textos e o outro com classificação de páginas de internet. Os resultados, que variam de excelentes a ruins, mostram que co-training, similarmente a outros algoritmos de aprendizado semi-supervisionado, é afetado de maneira bastante complexa pelos diferentes aspectos na indução dos modelos. 
/ In Machine Learning, the supervised approach usually requires a large number of labeled training examples to induce accurate classifiers. However, labeling is often performed manually, making this process costly and time-consuming. By contrast, unlabeled examples are often inexpensive and easier to obtain than labeled examples. This is especially true for text classification tasks involving online data sources, such as web pages, email and scientific papers. Text classification is of great practical importance today given the massive volume of online text available. Semi-supervised learning, a relatively new area in Machine Learning, represents a blend of supervised and unsupervised learning, and has the potential to reduce the need for expensive labeled data whenever only a small set of labeled examples is available. This work describes the semi-supervised learning algorithm co-training, which requires each example to be described by two distinct views. It should be observed that the two views required by co-training can be easily obtained from textual documents through pre-processing. In this work, several extensions of the co-training algorithm have been implemented. Furthermore, we have also implemented a computational environment for text pre-processing, called PreTexT, in order to apply the co-training algorithm to text classification problems. Experimental results using co-training on three data sets are described. Two data sets are related to text classification and the other one to web-page classification. The results, which range from excellent to poor, show that co-training, similarly to other semi-supervised learning algorithms, is affected by modelling assumptions in a rather complicated way.
189

Analyse automatique de l’écriture manuscrite sur tablette pour la détection et le suivi thérapeutique de personnes présentant des pathologies / Automatic handwriting analysis for pathology detection and follow-up on digital tablets

Kahindo Senge Muvingi, Christian 14 November 2019
Nous présentons dans cette thèse un nouveau paradigme pour caractériser la maladie d’Alzheimer à travers l’écriture manuscrite acquise sur tablette graphique. L’état de l’art est dominé par des méthodes qui supposent un comportement unique ou homogène au sein de chaque profil cognitif. Ces travaux exploitent des paramètres cinématiques globaux, sur lesquels ils appliquent des tests statistiques ou des algorithmes de classification pour discriminer les différents profils cognitifs (les patients Alzheimer, les troubles cognitifs légers (« Mild Cognitive impairment » : MCI) et les sujets Contrôle (HC)). Notre travail aborde ces deux limites de la littérature de la façon suivante : premièrement au lieu de considérer un comportement homogène au sein de chaque profil cognitif ou classe (HC, MCI, ES-AD : « Early-Stage Alzheimer Disease »), nous nous sommes affranchis de cette hypothèse (ou contrainte) forte de la littérature. Nous considérons qu’il peut y avoir plusieurs comportements au sein de chaque profil cognitif. Ainsi, nous proposons un apprentissage semi-supervisé pour trouver des groupes homogènes de sujets et analysons l’information contenue dans ces clusters ou groupes sur les profils cognitifs. Deuxièmement, au lieu d’exploiter les paramètres cinématiques globaux (ex : vitesse moyenne, pression moyenne, etc.), nous avons défini deux paramétrisations ou codages : une paramétrisation semi-globale, puis locale en modélisant la dynamique complète de chaque paramètre. L’un de nos résultats importants met en évidence deux clusters majeurs qui sont découverts, l’un dominé par les sujets HC et MCI et l’autre par les MCI et ES-AD, révélant ainsi que les patients atteints de MCI ont une motricité fine qui est proche soit des sujets HC, soit des patients ES-AD. 
Notre travail montre également que la vitesse prise localement regroupe un ensemble riche des caractéristiques telles que la taille, l’inclinaison, la fluidité et la régularité, et révèle comment ces paramètres spatiotemporels peuvent conjointement caractériser les profils cognitifs. / We present in this thesis a novel paradigm for assessing Alzheimer’s disease by analyzing handwriting (HW) impairment on tablets, a challenging problem that is still in its infancy. The state of the art is dominated by methods that assume a unique behavioral trend for each cognitive profile and that extract global kinematic parameters, assessed by standard statistical tests or classification models, to discriminate the neuropathological disorders (Alzheimer’s disease (AD), Mild Cognitive Impairment (MCI)) from Healthy Controls (HC). Our work tackles these two major limitations as follows. First, instead of considering a unique behavioral pattern for each cognitive profile, we relax this strong constraint by allowing multimodal behavioral patterns to emerge. We achieve this by performing semi-supervised learning to uncover homogeneous clusters of subjects, and then we analyze how much information these clusters carry about the cognitive profiles. Second, instead of relying on global kinematic parameters, mostly averages, we refine the encoding either by a semi-global parameterization or by modeling the full dynamics of each parameter, thereby harnessing the rich temporal information inherent in online HW. Thanks to our modeling, we obtain findings that are the first of their kind in this research field. A striking finding is that two major clusters emerge, one dominated by HC and MCI subjects and the other by MCI and early-stage AD (ES-AD) subjects, revealing that MCI patients have fine motor skills leaning towards either those of HC or those of ES-AD subjects. 
This thesis also introduces a new finding from HW trajectories that simultaneously uncovers a rich set of features, such as the full velocity profile, size, slant, fluidity, and shakiness, and reveals, in a naturally explainable way, how these HW features jointly characterize the cognitive profiles in fine detail.
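The contrast between a global kinematic parameter and a local velocity profile can be made concrete with a short Python sketch. The sample format `(x, y, t)` and the function names `local_speeds` and `mean_speed` are assumptions made for illustration; the abstract does not describe the tablet's data format or the thesis's actual parameterization.

```python
import math


def local_speeds(samples):
    """Speed between consecutive pen samples; samples are (x, y, t) tuples."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip repeated or out-of-order timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def mean_speed(samples):
    """Global parameter: a single average that discards the temporal profile."""
    speeds = local_speeds(samples)
    return sum(speeds) / len(speeds) if speeds else 0.0


# Hypothetical trajectory: a fast stroke followed by a pause.
trajectory = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 4.0, 2.0)]
profile = local_speeds(trajectory)
average = mean_speed(trajectory)
```

Two writers with the same average speed can have very different profiles, for example steady motion versus bursts separated by pauses, which is the kind of information a single global value cannot carry.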
190

Balancing signals for semi-supervised sequence learning

Xu, Ge Ya 12 1900
Recurrent Neural Networks (RNNs) are powerful models that have achieved outstanding results in many sequence learning tasks. Despite these accomplishments, RNNs still struggle with long sequences during training. This is because errors propagate backwards from the output to the input layers, carrying gradient signals, and with long input sequences, issues such as vanishing and exploding gradients can arise. This thesis reviews many current studies and existing architectures designed to circumvent the long-term dependency problems in backpropagation through time (BPTT). We focus mainly on the method proposed by Trinh et al. (2018), which uses semi-supervised learning to alleviate the long-term dependency problems in BPTT. Despite the good results achieved by the model of Trinh et al. (2018), we suggest that it can be further improved with a more systematic way of balancing auxiliary signals. In this thesis, we present our paper, RNNs with Private and Shared Representations for Semi-Supervised Learning, which is currently under review for AAAI-2019. We propose a semi-supervised RNN architecture with explicitly designed private and shared representations that regulates the gradient flow from the auxiliary task to the main task. / Les réseaux neuronaux récurrents (RNN) sont des modèles puissants qui ont obtenu des réalisations exceptionnelles dans de nombreuses tâches d’apprentissage séquentiel. Malgré leurs réalisations, les modèles RNN souffrent encore de longues séquences pendant l’entraînement. C’est parce que l’erreur se propage en arrière de la sortie vers les couches d’entrée transportant des signaux de gradient, et avec une longue séquence d’entrée, des problèmes comme la disparition et l’explosion des gradients peuvent survenir. Cette thèse passe en revue de nombreuses études actuelles et architectures existantes conçues pour contourner les problèmes de dépendance à long terme de la rétropropagation dans le temps (BPTT). 
Nous nous concentrons principalement sur la méthode proposée par Trinh et al. (2018) qui utilise une méthode d’apprentissage semi-supervisée pour atténuer les problèmes de dépendance à long terme dans BPTT. Malgré les bons résultats obtenus avec le modèle de Trinh et al. (2018), nous suggérons que le modèle peut être encore amélioré avec une manière plus systématique d’équilibrer les signaux auxiliaires. Dans cette thèse, nous présentons notre article - RNNs with Private and Shared Representations for Semi-Supervised Learning - qui est actuellement en cours de révision pour AAAI-2019. Nous proposons une architecture RNN semi-supervisée avec des représentations privées et partagées explicitement conçues qui régule le flux de gradient de la tâche auxiliaire à la tâche principale.
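The vanishing and exploding gradient issue mentioned in this abstract can be seen in a scalar toy recurrence. The sketch below is an illustration only and is unrelated to the architecture proposed in the thesis; the function name `gradient_through_time`, the weights, and the inputs are assumptions. It computes the derivative of the final hidden state with respect to the initial one for h_t = f(w * h_{t-1} + x_t), which by the chain rule is a product of one factor per time step.

```python
import math


def gradient_through_time(w, inputs, linear=False):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1} + x_t).

    f is tanh by default, or the identity when linear=True.
    """
    h = 0.0
    grad = 1.0
    for x in inputs:
        pre = w * h + x
        if linear:
            h = pre
            local = 1.0          # derivative of the identity
        else:
            h = math.tanh(pre)
            local = 1.0 - h * h  # derivative of tanh at pre
        grad *= w * local        # chain rule: one factor per time step
    return grad


steps = [0.1] * 50
vanishing = gradient_through_time(0.5, steps)               # |w| < 1 shrinks the product
exploding = gradient_through_time(1.5, steps, linear=True)  # |w| > 1 grows it
```

With tanh, each factor has magnitude at most |w|, so a weight below 1 drives the product toward zero over 50 steps, while in the linear case a weight above 1 makes it grow geometrically. This is why gradient signals from distant time steps are hard to use when training on long sequences.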
