Global ETD Search

71	Image classification for a large number of object categories Bosch Rué, Anna 25 September 2007 L'increment de bases de dades que cada vegada contenen imatges més difícils i amb un nombre més elevat de categories està forçant el desenvolupament de tècniques de representació d'imatges que siguin discriminatives quan es vol treballar amb múltiples classes i d'algorismes que siguin eficients en l'aprenentatge i la classificació. Aquesta tesi explora el problema de classificar les imatges segons l'objecte que contenen quan es disposa d'un gran nombre de categories. Primerament s'investiga com un sistema híbrid format per un model generatiu i un model discriminatiu pot beneficiar la tasca de classificació d'imatges on el nivell d'anotació humà sigui mínim. Per a aquesta tasca introduïm un nou vocabulari utilitzant una representació densa de descriptors color-SIFT, i després s'investiga com els diferents paràmetres afecten la classificació final. Tot seguit es proposa un mètode per tal d'incorporar informació espacial amb el sistema híbrid, mostrant que la informació de context és de gran ajuda per a la classificació d'imatges. Després introduïm un nou descriptor de forma que representa la imatge segons la seva forma local i la seva forma espacial, juntament amb un kernel que incorpora aquesta informació espacial en forma piramidal. La forma és representada per un vector compacte, obtenint un descriptor molt adequat per ésser utilitzat amb algorismes d'aprenentatge amb kernels. Els experiments realitzats mostren que aquesta informació de forma té uns resultats semblants (i a vegades millors) als descriptors basats en aparença. També s'investiga com diferents característiques es poden combinar per ésser utilitzades en la classificació d'imatges i es mostra com el descriptor de forma proposat juntament amb un descriptor d'aparença millora substancialment la classificació.
Finalment es descriu un algoritme que detecta les regions d'interès automàticament durant l'entrenament i la classificació. Això proporciona un mètode per inhibir el fons de la imatge i afegeix invariància a la posició dels objectes dins les imatges. Es mostra que utilitzar la forma i l'aparença sobre aquesta regió d'interès amb classificadors random forests millora la classificació i el temps computacional. Es comparen els nostres resultats amb resultats de la literatura utilitzant les mateixes bases de dades que els autors, així com els mateixos protocols d'aprenentatge i classificació. Es veu com totes les innovacions introduïdes incrementen la classificació final de les imatges. / The release of challenging data sets with ever-increasing numbers of object categories is forcing the development of image representations that can cope with multiple classes and of algorithms that are efficient in training and testing. This thesis explores the problem of classifying images by the object they contain in the case of a large number of categories. We first investigate whether the hybrid combination of a latent generative model with a discriminative classifier is beneficial for the task of weakly supervised image classification. We introduce a novel vocabulary using dense color SIFT descriptors, and then investigate how different parameters affect classification performance. A new way to incorporate spatial information within the hybrid system is also proposed, showing that contextual information provides strong support for image classification. We then introduce a new shape descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel. Shape is represented as a compact vector descriptor suitable for use in standard learning algorithms with kernels. Experimental results show that shape information achieves classification performance similar to, and sometimes better than, methods using only appearance information.
We also investigate how different cues of image information can be used together. We show that shape and appearance kernels may be combined and that additional information cues increase classification performance. Finally, we provide an algorithm to automatically select the regions of interest in training. This provides a method of inhibiting background clutter and adding invariance to the object instance's position. We show that shape and appearance representations over the regions of interest, together with a random forest classifier that automatically selects the best cues, increase both performance and speed. We compare our classification performance to that of previous methods using the authors' own datasets and testing protocols. The set of innovations introduced here leads to an impressive increase in performance. Categorias de objetos Object categories Modelo discriminativo Model discriminatiu Discriminative model Random forest Modelo generativo Model generatiu Generative model Regiones de interés Regions d'interès Region of interest Clasificación de imágenes Classificació d'imatges Image classification Categories d'objectes pLSA Probabilistic Latent Semantic Analysis 004 68
72	Modèles statistiques non linéaires pour l'analyse de formes : application à l'imagerie cérébrale / Nonlinear statistical models for shape analysis: application to brain imaging Sfikas, Giorgos 07 September 2012 This thesis addresses statistical shape analysis in the context of medical imaging. In medical imaging, shape analysis is used to describe the morphological variability of various organs and tissues. We focus on building a generative and discriminative model, compact and nonlinear, suited to the representation of shapes. The model is evaluated in a study of a population of patients with Alzheimer's disease and a population of healthy control subjects. Our main interest here is the use of the discriminative model to discover the most discriminative morphological differences between a given class of shapes and shapes that do not belong to that class. The theoretical novelty of our model lies in two main points: first, we propose a tool for extracting the discriminative difference within the Support Vector Data Description (SVDD) framework; second, all generated reconstructions are anatomically correct. This latter point follows from the nonlinear and compact nature of the model, which is tied to the assumption that the data (the shapes) lie on a low-dimensional nonlinear manifold. An application of our model to real medical data yields results consistent with medical knowledge. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Statistical shape analysis Manifold learning Support vector data description Discriminative difference Alzheimer's disease Hippocampus
73	Efeitos do tratamento com lítio na memória aversiva, comportamentos relacionados à ansiedade e depressão e na expressão de BDNF em ratos Pontes, Isabella Maria de Oliveira 09 May 2014 Conselho Nacional de Desenvolvimento Científico e Tecnológico / Lithium (Li) is the first choice to treat bipolar disorder, a psychiatric illness characterized by mood oscillations between mania and depression. However, studies have demonstrated that this drug might influence mnemonic processes due to its neuroprotective, antiapoptotic and neurogenic effects. The use of Li in the treatment of cognitive deficits caused by brain injury or neurodegenerative disorders has been widely studied, and the drug has been shown to be effective in preventing or even alleviating memory impairment. The effects of Li on anxiety and depression are controversial, and the relationship between the effects of lithium on memory, anxiety and depression remains unknown. In this context, this study aims to: evaluate the effects of acute and chronic administration of lithium carbonate on aversive memory and anxiety, simultaneously, using the plus maze discriminative avoidance task (PMDAT); test the antidepressant effect of the drug through the forced swimming test (FS); and analyze brain-derived neurotrophic factor (BDNF) expression in structures related to memory and emotion. To evaluate the acute effects, male Wistar rats received i.p. administration of lithium carbonate (50, 100 or 200 mg/kg) one hour before the training session (PMDAT) or lithium carbonate (50 or 100 mg/kg) one hour before the test session (FS). To evaluate the chronic effects, the doses administered were 50 or 100 mg/kg or vehicle once a day for 21 days before the beginning of the behavioral tasks (PMDAT and FS).
Afterwards, the animals were euthanized and their brains removed and submitted to an immunohistochemistry procedure to quantify BDNF. The animals that received acute treatment with 100 and 200 mg/kg of Li did not discriminate between the enclosed arms (aversive and non-aversive) in the training session of the PMDAT, showing that these animals did not learn the task. This lack of discrimination was also observed in the test session, showing that the animals did not recall the aversive task. We also observed increased exploration of the open arms in these same groups, indicating an anxiolytic effect. The same groups showed a reduction in locomotor activity; however, this effect does not seem to be related to the anxiolytic effect of the drug. Chronic treatment with Li did not alter learning or memory processes. Nevertheless, we observed a reduction in open-arm exploration by animals treated with 50 mg/kg compared to the other groups, showing an anxiogenic effect caused by this dose. This effect is not related to locomotor alterations, since there were no alterations in these parameters. Both acute and chronic treatments were ineffective in the FS. Chronic treatment with lithium did not modify BDNF expression in the hippocampus, amygdala and prefrontal cortex. These results suggest that acute administration of lithium impairs learning in an aversive task, blocking memory consolidation and retrieval. The reduction of anxiety following acute treatment may have prevented the learning of the aversive task, as it has been found that optimum levels of anxiety are necessary for learning with emotional context. With continued treatment, the animals recover the ability to learn and recall the task. Indeed, they do not show differences in relation to the control group, and the lack of alterations in BDNF expression corroborates this result.
Possibly, the regimen of treatment used was not able to promote cognitive improvement. Li showed an acute anxiolytic effect; however, chronic administration promoted the opposite effect. More studies are necessary to clarify the potential beneficial effect of Li on aversive memory. / Lítio (Li) é o fármaco de escolha para o tratamento do transtorno bipolar, doença psiquiátrica caracterizada por oscilações de humor entre mania e depressão. Entretanto, estudos mostram que essa droga pode ter influência sobre os processos mnemônicos devido a seu caráter neuroprotetor, antiapoptótico e neurogênico. O emprego do lítio para o tratamento de déficits cognitivos provocados por lesões cerebrais ou doenças neurodegenerativas vem sendo amplamente estudado, visto que esse fármaco mostra-se capaz de prevenir ou até mesmo aliviar prejuízos na memória. Os efeitos do Li na ansiedade e depressão são controversos e a relação entre os efeitos do Li na memória, ansiedade e depressão são ainda desconhecidos. Neste contexto, os objetivos deste estudo foram: avaliar os efeitos da administração aguda e crônica de carbonato de lítio na memória aversiva e ansiedade, simultaneamente, utilizando a esquiva discriminativa no labirinto em cruz elevado (ED); testar o efeito antidepressivo do fármaco através do teste do nado forçado (NF); avaliar a expressão de fator neurotrófico derivado do encéfalo (BDNF) em estruturas relacionadas com memória e emoção. Para a avaliação do efeito agudo, ratos Wistar machos foram submetidos à administração intraperitoneal de carbonato de lítio 50, 100 ou 200 mg/kg uma hora antes do treino (ED) ou carbonato de lítio 50 ou 100 mg/kg uma hora antes do teste (NF). Para a avaliação crônica, foram administradas as doses de 50 ou 100 mg/kg ou veículo por 21 dias antes do início das tarefas comportamentais (ED e NF). Após o término dessas tarefas, os animais foram eutanasiados e seus encéfalos removidos para realização de imunohistoquímica para quantificar BDNF.
Os animais que receberam tratamento agudo com Li nas doses de 100 e 200 mg/kg não demonstraram discriminação entre os braços fechados (aversivo e não-aversivo) na sessão treino da ED, mostrando que esses animais não aprenderam a tarefa. Essa ausência na discriminação foi observada também na sessão teste, mostrando que não houve evocação da memória aversiva. Foi ainda observado um aumento da exploração dos braços abertos para essas mesmas doses, apontando um efeito ansiolítico do fármaco. Os mesmos grupos apresentaram ainda uma redução na atividade locomotora, no entanto, esse efeito parece não estar relacionado com o efeito ansiolítico do fármaco. O tratamento crônico com lítio não promoveu alterações nos processos de aprendizado e memória. No entanto, foi observado uma redução da exploração dos braços abertos pelos animais tratados com a dose de 50 mg/kg em relação aos outros grupos, mostrando um efeito ansiogênico causado pelo tratamento crônico. Esse efeito não está relacionado a alterações locomotoras, visto que não foi detectado alterações nesses parâmetros. Ambos os tratamentos (agudo e crônico) foram ineficazes em demonstrar o efeito antidepressivo do lítio na tarefa do NF. O tratamento crônico com lítio também não foi capaz de alterar a expressão de BDNF no hipocampo, amígdala e córtex pré-frontal. Esses resultados sugerem que a administração aguda de lítio promove prejuízos no aprendizado em uma tarefa aversiva, impedindo a ocorrência de consolidação e evocação da memória. A redução da ansiedade no tratamento agudo pode ter impedido o aprendizado da tarefa aversiva, visto que já foi verificado que níveis adequados de ansiedade são necessários para que ocorra aprendizado com contexto emocional. Com a continuidade do tratamento os animais recuperam a capacidade de aprender e evocar a tarefa, mas não apresentam alterações em relação ao grupo controle e a ausência de alteração na expressão de BDNF corrobora esse resultado.
Possivelmente, o regime de tratamento utilizado não foi capaz de promover melhora cognitiva nos animais. O lítio demonstrou efeito ansiolítico agudo, todavia a administração crônica promoveu efeito oposto. Mais estudos são necessários para esclarecer o potencial efeito benéfico do lítio sobre a memória
74	Rozšíření pro pravděpodobnostní lineární diskriminační analýzu v rozpoznávání mluvčího / Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition Plchot, Oldřich Unknown Date This thesis deals with probabilistic models for automatic speaker recognition. In particular, it analyzes in detail probabilistic linear discriminant analysis (PLDA), which models low-dimensional representations of utterances in the form of i-vectors. The thesis proposes two extensions to the currently used PLDA model. The newly proposed full-posterior-distribution PLDA models the uncertainty in i-vector generation. The thesis also proposes a new discriminative approach to training a PLDA-based speaker verification system. When the original PLDA is compared with the model extended by i-vector uncertainty modeling, the extended model achieves up to 20% relative improvement in tests with short recordings. For longer test segments (more than one minute), the gain in accuracy is smaller, but the accuracy of the new model is never lower than that of the baseline system. Training data, however, are usually available as sufficiently long segments, so in these cases the new model offers no advantage in training. The original PLDA model can be used for training, and its extended version can be used to obtain scores when testing on short speech segments. The discriminative model is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining the score of each pair is derived from PLDA, and training is based on logistic regression, which minimizes the cross-entropy between the correct labeling of all trials and the probabilistic labeling of the trials proposed by the system.
The results obtained with the discriminatively trained classifier are similar to those of the generative PLDA, but the discriminative system shows the ability to produce better-calibrated scores. This ability leads to better actual accuracy on an unseen evaluation set, which is an important property for real-world use.
75	Machines à noyaux pour le filtrage d'alarmes : application à la discrimination multiclasse en environnement maritime / Kernels machines for alarm-filtering : application to multiclass discrimination in the naval context Labbé, Benjamin 03 May 2011 Les systèmes infrarouges sont essentiels pour fournir aux forces armées une capacité de reconnaissance des menaces. En contexte opérationnel, ces systèmes sont contraints au temps-réel et à l’accès à des taux de fausses alarmes faibles. Ceci implique la détection des menaces parmi de nombreux objets non-pertinents. Dans ce document, nous combinons des OneClass-SVM pour une décision multiclasse avec rejet (préservant la fausse-alarme). En apprentissage, nous sélectionnons les variables pour contrôler la parcimonie du moteur de décision. Nous présentons également un classifieur original, le Discriminative OneClass-SVM, combinant les propriétés du C-SVM et du OneClass-SVM dans le contexte multiclasse. Ce détecteur de nouveauté n’a pas de dépendance au nombre de classes. Ceci permet une utilisation sur des données à grande échelle. Nos expériences sur des données réelles démontrent l’intérêt des propositions pour les systèmes fortement contraints, face aux méthodes de référence. / Infrared systems are key to providing automatic control of threats to military forces. Such operational systems are constrained to real-time processing and high efficiency (low false-alarm rate), implying the recognition of threats among numerous irrelevant objects. In this document, we combine OneClass Support Vector Machines (SVM) to discriminate in the multiclass framework and to reject unknown objects (preserving the false-alarm rate). While learning, we perform variable selection to control the sparsity of the decision functions. We also introduce a new classifier, the Discriminative OneClass-SVM. It combines properties of both the biclass-SVM and the OneClass-SVM in a multiclass framework.
This classifier detects novelty and does not depend on the number of categories, which allows it to tackle large-scale problems. Numerical experiments on real-world infrared datasets demonstrate the relevance of our proposals for highly constrained systems, when compared to standard methods. Apprentissage automatique Classification de pistes Séparateur à Vaste Marge Décision avec rejet Surveillance infrarouge Machine learning Track classification Support vecteur machines One-class SVM Discriminative One-Class SVM Decision with reject options Infrared monitoring
76	Apprentissage discriminant des modèles continus en traduction automatique / Discriminative Training Procedure for Continuous-Space Translation Models Do, Quoc khanh 31 March 2016 Durant ces dernières années, les architectures de réseaux de neurones (RN) ont été appliquées avec succès à de nombreuses applications en Traitement Automatique de Langues (TAL), comme par exemple en Reconnaissance Automatique de la Parole (RAP) ainsi qu'en Traduction Automatique (TA). Pour la tâche de modélisation statistique de la langue, ces modèles considèrent les unités linguistiques (c'est-à-dire des mots et des segments) à travers leurs projections dans un espace continu (multi-dimensionnel), et la distribution de probabilité à estimer est une fonction de ces projections. Ainsi connus sous le nom de "modèles continus" (MC), la particularité de ces derniers se trouve dans l'exploitation de la représentation continue qui peut être considérée comme une solution au problème de données creuses rencontré lors de l'utilisation des modèles discrets conventionnels. Dans le cadre de la TA, ces techniques ont été appliquées dans les modèles de langue neuronaux (MLN) utilisés dans les systèmes de TA, et dans les modèles continus de traduction (MCT). L'utilisation de ces modèles s'est traduite par d'importantes et significatives améliorations des performances des systèmes de TA.
Ils sont néanmoins très coûteux lors des phases d'apprentissage et d'inférence, notamment pour les systèmes ayant un grand vocabulaire. Afin de surmonter ce problème, l'architecture SOUL (pour "Structured Output Layer" en anglais) et l'algorithme NCE (pour "Noise Contrastive Estimation", ou l'estimation contrastive bruitée) ont été proposés : le premier modifie la structure standard de la couche de sortie, alors que le second cherche à approximer l'estimation du maximum de vraisemblance (MV) par une méthode d’échantillonnage. Toutes ces approches partagent le même critère d'estimation qui est la log-vraisemblance ; pourtant son utilisation mène à une incohérence entre la fonction objectif définie pour l'estimation des modèles, et la manière dont ces modèles seront utilisés dans les systèmes de TA. Cette dissertation vise à concevoir de nouvelles procédures d'entraînement des MC, afin de surmonter ces problèmes. Les contributions principales se trouvent dans l'investigation et l'évaluation des méthodes d'entraînement efficaces pour MC qui visent à : (i) réduire le temps total de l'entraînement, et (ii) améliorer l'efficacité de ces modèles lors de leur utilisation dans les systèmes de TA. D'un côté, le coût d'entraînement et d'inférence peut être réduit (en utilisant l'architecture SOUL ou l'algorithme NCE), ou la convergence peut être accélérée. La dissertation présente une analyse empirique de ces approches pour des tâches de traduction automatique à grande échelle. D'un autre côté, nous proposons un cadre d'apprentissage discriminant qui optimise la performance du système entier ayant incorporé un modèle continu. Les résultats expérimentaux montrent que ce cadre d'entraînement est efficace pour l'apprentissage ainsi que pour l'adaptation des MC au sein des systèmes de TA, ce qui ouvre de nouvelles perspectives prometteuses.
/ Over the past few years, neural network (NN) architectures have been successfully applied to many Natural Language Processing (NLP) applications, such as Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT). For the language modeling task, these models consider linguistic units (i.e. words and phrases) through their projections into a continuous (multi-dimensional) space, and the estimated distribution is a function of these projections. Also known as continuous-space models (CSMs), their distinguishing feature lies in this exploitation of a continuous representation, which can be seen as an attempt to address the sparsity issue of conventional discrete models. In the context of SMT, these techniques have been applied to neural network-based language models (NNLMs) included in SMT systems, and to continuous-space translation models (CSTMs). These models have led to significant and consistent gains in SMT performance, but are also considered very expensive in training and inference, especially for systems involving large vocabularies. To overcome this issue, Structured Output Layer (SOUL) and Noise Contrastive Estimation (NCE) have been proposed; the former modifies the standard structure of the output layer over vocabulary words, while the latter approximates maximum-likelihood estimation (MLE) by a sampling method. All these approaches share the same estimation criterion, which is the MLE; however, using this procedure results in an inconsistency between the objective function defined for parameter estimation and the way models are used in the SMT application. The work presented in this dissertation aims to design new performance-oriented and global training procedures for CSMs to overcome these issues. The main contributions lie in the investigation and evaluation of efficient training methods for (large-vocabulary) CSMs which aim: (a) to reduce the total training cost, and (b) to improve the efficiency of these models when used within the SMT application.
On the one hand, the training and inference cost can be reduced (using the SOUL structure or the NCE algorithm), or the number of iterations can be reduced via faster convergence. This thesis provides an empirical analysis of these solutions on different large-scale SMT tasks. On the other hand, we propose a discriminative training framework which optimizes the performance of the whole system containing the CSM as a component model. The experimental results show that this framework is effective for both training and adapting CSMs within SMT systems, opening promising research perspectives. Traduction Automatique Statistique Réseau de neurones Modèles Continus de Traduction Apprentissage Discriminant Méthodes à Larges Marges Estimation Contrastive Bruitée Statistical Machine Translation Neural Network Continuous-Space Models Discriminative Training Large-Margin Methods Noise Contrastive Estimation
77	Diskriminační čití u adolescentních pacientek hospitalizovaných s mentální anorexií / Two-point discrimination in adolescent patients hospitalized with anorexia nervosa Kočí, Gabriela January 2019 Anorexia nervosa (AN) is a mental illness that manifests itself, among other signs, in an impaired body schema and rejection of food. The principal focus of the thesis was to assess the discrimination threshold, the ability to evaluate sensory perception, and body self-concept in adolescent female patients hospitalised with anorexia nervosa. Our goal was to clarify and better understand the still inadequately described neurophysiological aspects of anorexia nervosa. The results were compared to a control group; both groups comprised 18 girls, with an average age of 14.7 ± 0.71 years in the observed group and 15.3 ± 0.71 years in the control group. Two-point discrimination was examined in three areas (arm, between the shoulder blades, and belly) with a modified caliper. The Petrie test was used for sensory perception testing, while body self-concept was measured with the BAT questionnaire. The examinations were performed under standardised conditions at similar times of day. We found a significant difference in two-point discrimination in the area between the shoulder blades, with significance level α = 5 % and p-value p = 0.0001. A statistically significant difference was also observed in body self-concept, with significance level α = 5 % and p-value p = 0.017. Thus we conclude that patients suffering from anorexia nervosa...
78	Zvyšování robustnosti systémů pro rozpoznávání mluvčích pomocí diskriminativních technik / Improving Robustness of Speaker Recognition using Discriminative Techniques Novotný, Ondřej January 2021 This thesis deals with the use of discriminative techniques in speaker recognition to make these systems more robust to factors that degrade their performance. These factors include noise, reverberation and the transmission channel. The thesis is divided into two main parts. The first part gives a theoretical introduction to speaker recognition. It describes the individual steps of a recognition system, from the extraction of acoustic features, through the extraction of vector representations of recordings, to the computation of the final recognition score. Particular emphasis is placed on techniques for extracting the vector representation of a recording, where we describe two different paradigms: i-vectors and x-vectors. The second part of the thesis focuses on discriminative techniques for increasing robustness. The techniques are organized to follow the path of a recording through the recognition system. First, attention is given to signal preprocessing with a neural network for denoising and enhancing the speech signal, a universal technique that is independent of the recognition system used afterwards. We then focus on using the discriminative approach in feature extraction and in the extraction of vector representations of recordings. The thesis also covers the transition from the generative paradigm to a fully discriminative approach in speaker recognition systems. All techniques are verified experimentally and their contribution is evaluated. The thesis proposes several approaches that proved effective both with the generative approach in the form of i-vectors and with discriminative x-vectors, and that led to significant improvements.
For completeness, other robustness-related techniques are also included, such as score normalization and multi-condition training of the systems. Finally, the thesis addresses the robustness of discriminative systems from the perspective of the data used to train them.
79	Visual Tracking with Deep Learning : Automatic tracking of farm animals Zhu, Biwen January 2018 Automatic tracking and video surveillance on a farm could help to support farm management. In this project, an automated detection system is used to detect sows in surveillance videos. This system is based upon deep learning and computer vision methods. In order to minimize disk storage and to meet the network requirements necessary to achieve real-time performance, tracking in compressed video streams is essential. The proposed system uses a Discriminative Correlation Filter (DCF) as a classifier to detect targets. The tracking model is updated by training the classifier with online learning methods. Compression technology encodes the video data, reducing the bit rates at which video signals are transmitted and helping the video transmission adapt better to the limited network bandwidth. However, compression may reduce the image quality of the videos, and the precision of our tracking may decrease. Hence, we conducted a performance evaluation of existing visual tracking algorithms on video sequences with quality degradation due to various compression parameters (encoders, target bitrate, rate control model, and Group of Pictures (GOP) size). The ultimate goal of video compression is to realize a tracking system with equal performance, but requiring fewer network resources. The proposed tracking algorithm successfully tracks each sow in consecutive frames in most cases. The performance of our tracker was benchmarked against two state-of-the-art tracking algorithms: Siamese Fully-Convolutional (FC) and Efficient Convolution Operators (ECO). The performance evaluation shows that our proposed tracker has similar performance to both Siamese FC and ECO.
In comparison with the original tracker, the proposed tracker achieved similar tracking performance while requiring much less storage and generating a lower bitrate when the video was compressed with appropriate parameters. However, the system is far slower than needed for real-time tracking because of its high computational complexity; more efficient methods for updating the tracking model will therefore be needed to achieve real-time tracking. Computer vision Video tracking Machine learning Discriminative correlation filter Compressed video Bandwidth balancing Network traffic Computer Sciences
80	Automatic Detection of Brain Functional Disorder Using Imaging Data Dey, Soumyabrata 01 January 2014 (has links) Attention Deficit Hyperactivity Disorder (ADHD) has recently received a lot of attention, mainly for two reasons. First, it is one of the most common childhood behavioral disorders: around 5-10% of children worldwide are diagnosed with ADHD. Second, the root cause of the problem is still unknown, and therefore no biological measure exists to diagnose ADHD. Instead, doctors must diagnose it from clinical symptoms such as inattention, impulsivity and hyperactivity, which are all subjective. Functional Magnetic Resonance Imaging (fMRI) has become a popular tool for understanding the functioning of the brain, for example by identifying the brain regions responsible for different cognitive tasks or by analyzing the statistical differences in brain function between diseased and control subjects. ADHD is also being studied using fMRI data. In this dissertation we aim to solve the problem of automatically diagnosing ADHD subjects from their resting-state fMRI (rs-fMRI) data. As a core step of our approach, we model the functions of a brain as a connectivity network, which is expected to capture how synchronous different brain regions are in terms of their functional activities. The network is constructed by representing different brain regions as nodes, where any two nodes are connected by an edge if the correlation of their activity patterns is higher than some threshold. The brain regions, represented as the nodes of the network, can be selected at different granularities, e.g. single voxels or clusters of functionally homogeneous voxels. The topological differences between the networks constructed for the ADHD and control groups are then exploited in the classification approach. 
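The network construction described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and uses made-up time series and a made-up threshold of 0.8; the function names (`pearson`, `build_network`, `node_degrees`) are hypothetical and are not taken from the dissertation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def build_network(series, threshold):
    """Adjacency sets: nodes i and j share an edge if correlation > threshold."""
    n = len(series)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(series[i], series[j]) > threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def node_degrees(adjacency):
    """Degree of each node, i.e. the number of edges it takes part in."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Toy data (assumed): regions 0 and 1 rise together, region 2 moves the opposite way.
toy_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.2, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
network = build_network(toy_series, threshold=0.8)
degrees = node_degrees(network)
```

In this toy case only regions 0 and 1 end up connected. The pairwise loop is quadratic in the number of nodes, so a voxel-level network with thousands of nodes would be expensive to build this way.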
We have developed a simple method that employs the Bag-of-Words (BoW) framework to classify ADHD subjects. We represent each node in the network by a 4-D feature vector: node degree and 3-D location. The 4-D vectors of all the network nodes in the training data are then grouped into a number of clusters using K-means, where each cluster is termed a word. Finally, each subject is represented by a histogram (bag) of such words. A Support Vector Machine (SVM) classifier is used to detect ADHD subjects from their histogram representation. The method achieves 64% classification accuracy. This simple approach has several shortcomings. First, spatial information is lost when constructing the histogram, because it only counts the occurrences of words and ignores their spatial positions. Second, features from the whole brain are used for classification, but some brain regions may not contain any useful information and may only increase the feature dimensionality and the noise in the system. Third, we used only one network feature, the degree of a node, which measures the connectivity of the node, while other, more complex network features may be useful for solving the problem. To address these shortcomings, we hypothesize that only a subset of the network nodes carries information important for classifying ADHD subjects. To identify these important nodes we have developed a novel algorithm. The algorithm repeatedly generates a random subset of nodes, each time extracting the features from that subset to compute the feature vector and perform classification. The subsets are then ranked by classification accuracy, and the occurrences of each node in the top-ranked subsets are counted. Our algorithm selects the most frequently occurring nodes for the final classification. 
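The node-selection procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the real algorithm scores each subset by classification accuracy, whereas here a hypothetical `toy_score` function stands in for the classifier, and the parameter names (`subset_size`, `n_trials`, `top_k`, `n_keep`) and their values are invented for the example.

```python
import random
from collections import Counter


def select_frequent_nodes(nodes, score_subset, subset_size, n_trials, top_k, n_keep, seed=0):
    """Pick the nodes that occur most often in the best-scoring random subsets."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        subset = tuple(sorted(rng.sample(nodes, subset_size)))
        trials.append((score_subset(subset), subset))
    # Rank the subsets by score, best first.
    trials.sort(key=lambda trial: trial[0], reverse=True)
    # Count how often each node appears among the top-ranked subsets.
    counts = Counter()
    for _, subset in trials[:top_k]:
        counts.update(subset)
    return sorted(node for node, _ in counts.most_common(n_keep))


# Stand-in for classification accuracy (assumption): the score grows with the
# number of "informative" nodes a subset happens to contain.
INFORMATIVE = {2, 5, 7}


def toy_score(subset):
    return len(INFORMATIVE.intersection(subset)) / len(INFORMATIVE)


selected = select_frequent_nodes(
    list(range(10)), toy_score, subset_size=4, n_trials=500, top_k=40, n_keep=3
)
```

In a real pipeline, `score_subset` would train and evaluate a classifier on the features of the given nodes, which is where nearly all of the running time would go.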
Furthermore, along with the node degree, we employ three more node features: network cycles, the varying distance degree and the edge weight sum. We concatenate the features of the selected nodes in a fixed order to preserve the relative spatial information. Experimental validation suggests that using the features from the nodes selected by our algorithm indeed helps to improve the classification accuracy. Our findings also agree with the existing literature, as the brain regions identified by our algorithm have been independently reported by many other studies on ADHD. We achieved a classification accuracy of 69.59% using this approach. However, this method represents each voxel as a node of the network, which brings the number of nodes to several thousand. As a result, the network construction step becomes computationally very expensive. Another limitation of the approach is that the network features, which are computed for each node of the network, capture only the local structure and ignore the global structure of the network. Next, to capture the global structure of the networks, we use the Multi-Dimensional Scaling (MDS) technique to project all the subjects from an unknown network space to a low-dimensional space based on their inter-network distance measures. To compute the distance between two networks, we represent each node by a set of attributes such as the node degree, the average power, the physical location, the neighbor node degrees, and the average powers of the neighbor nodes. The nodes of the two networks are then mapped so that the sum of the attribute distances over all mapped pairs of nodes, which is the inter-network distance, is minimized. To reduce the network computation cost, we require that the maximum relevant information is preserved with minimum redundancy. 
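The inter-network distance described above (the minimum, over node mappings, of the summed attribute distances) can be sketched for very small networks. The sketch assumes equally sized networks, a one-to-one mapping, Euclidean distance between attribute vectors, and two made-up attributes per node (degree and average power); the abstract does not specify these details, and the brute-force search is a stand-in for whatever optimization the dissertation actually uses.

```python
from itertools import permutations
from math import dist


def inter_network_distance(nodes_a, nodes_b):
    """Minimum total attribute distance over all one-to-one node mappings."""
    if len(nodes_a) != len(nodes_b):
        raise ValueError("this sketch assumes equally sized networks")
    best_cost, best_mapping = float("inf"), None
    # Try every mapping: node i of network A is paired with node mapping[i] of B.
    for mapping in permutations(range(len(nodes_b))):
        cost = sum(dist(nodes_a[i], nodes_b[j]) for i, j in enumerate(mapping))
        if cost < best_cost:
            best_cost, best_mapping = cost, mapping
    return best_cost, best_mapping


# Toy attribute vectors (assumed): (node degree, average power).
network_a = [(3.0, 0.5), (1.0, 0.2), (2.0, 0.9)]
network_b = [(2.0, 0.9), (3.0, 0.5), (1.0, 0.2)]  # same nodes, listed in another order

cost, mapping = inter_network_distance(network_a, network_b)
```

Enumerating all permutations is only feasible for a handful of nodes. For larger networks the same objective is an assignment problem, for which a solver such as the Hungarian algorithm would be the usual replacement.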
To keep network construction tractable, the nodes of the network are constructed from clusters of highly active voxels, where the activity level of a voxel is measured by the average power of its fMRI time series. Our method shows promise, achieving a classification accuracy of 73.55% on the ADHD-200 data set. Our results also reveal that detection rates are higher when classification is performed separately on the male and female groups of subjects. Up to this point, we had used only fMRI data for the ADHD diagnosis problem. Finally, we investigated the following questions. Do structural brain images contain useful information for the ADHD diagnosis problem? Can the classification accuracy of the automatic diagnosis system be improved by combining the information in the structural and functional brain data? To that end, we developed a new method that combines the information of structural and functional brain images in a late fusion framework. For the structural data, we feed the gray matter (GM) brain images to a Convolutional Neural Network (CNN). The output of the CNN is one feature vector per subject, which is used to train the SVM classifier. For the functional data, we compute the average power of each voxel from its fMRI time series; this average power measures the activity level of the voxel. We found significant differences in the voxel power distribution patterns of the ADHD and control groups. The Local Binary Pattern (LBP) texture feature is applied to the voxel power map to capture these differences. We achieved 74.23% accuracy using GM features, 77.30% using LBP features and 79.14% using the combined information. In summary, this dissertation demonstrates that structural and functional brain imaging data are useful for the automatic detection of ADHD subjects, as shown by the classification accuracies achieved on the ADHD-200 data set. 
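The voxel power map mentioned above can be illustrated with a short sketch. The abstract does not give the formula, so this assumes that average power means the mean squared deviation of a time series from its own mean; the function names and the toy voxel coordinates and values are hypothetical.

```python
def average_power(time_series):
    """Mean squared deviation of a signal from its mean (assumed definition)."""
    n = len(time_series)
    mean = sum(time_series) / n
    return sum((value - mean) ** 2 for value in time_series) / n


def power_map(voxel_series):
    """Map each voxel coordinate to the average power of its time series."""
    return {voxel: average_power(series) for voxel, series in voxel_series.items()}


# Toy data (assumed): three voxels keyed by their 3-D coordinates.
toy_voxels = {
    (0, 0, 0): [1.0, -1.0, 1.0, -1.0],
    (0, 0, 1): [3.0, 3.0, 3.0, 3.0],   # constant signal, no activity
    (0, 1, 0): [2.0, -2.0, 2.0, -2.0],
}
powers = power_map(toy_voxels)
most_active = max(powers, key=powers.get)
```

Under this definition a constant signal has zero power regardless of its baseline, so the map reflects how strongly each voxel fluctuates, not how bright it is.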
Our study also helps to identify the brain regions that are useful for classifying ADHD subjects. These findings can help in understanding the pathophysiology of the disorder. Finally, we expect that our approaches will contribute to the development of a biological measure for diagnosing ADHD. Computer Sciences Engineering
