431 |
Sledování objektů ve videosekvencích / Image Tracking in Video Sequences. Pavlík, Vít. January 2016.
This Master's thesis addresses long-term image tracking in video sequences. The project demonstrates the techniques needed to handle long-term tracking. It primarily describes techniques whose application leads to an adaptive tracking system that can appropriately cope with changes in the object's appearance and with the unstable character of the surrounding environment.
|
432 |
All Negative on the Western Front: Analyzing the Sentiment of the Russian News Coverage of Sweden with Generic and Domain-Specific Multinomial Naive Bayes and Support Vector Machines Classifiers / På västfronten intet gott: attitydanalys av den ryska nyhetsrapporteringen om Sverige med generiska och domänspecifika Multinomial Naive Bayes- och Support Vector Machines-klassificerare. Michel, David. January 2021.
This thesis explores to what extent Multinomial Naive Bayes (MNB) and Support Vector Machines (SVM) classifiers can be used to determine the polarity of news, specifically the news coverage of Sweden by the Russian state-funded news outlets RT and Sputnik. Three experiments are conducted. In the first experiment, an MNB and an SVM classifier are trained with the Large Movie Review Dataset (Maas et al., 2011) with a varying number of samples to determine how training data size affects classifier performance. In the second experiment, the classifiers are trained with 300 positive, negative, and neutral news articles (Agarwal et al., 2019) and tested on 95 RT and Sputnik news articles about Sweden (Bengtsson, 2019) to determine if the domain specificity of the training data outweighs its limited size. In the third experiment, the movie-trained classifiers are put up against the domain-specific classifiers to determine if well-trained classifiers from another domain perform better than relatively untrained, domain-specific classifiers. Four different types of feature sets (unigrams, unigrams without stop words removal, bigrams, trigrams) were used in the experiments. Some of the model parameters (TF-IDF vs. feature count and SVM’s C parameter) were optimized with 10-fold cross-validation. Other than the superior performance of SVM, the results highlight the need for comprehensive and domain-specific training data when conducting machine learning tasks, as well as the benefits of feature engineering, and to a limited extent, the removal of stop words. Interestingly, the classifiers performed the best on the negative news articles, which made up most of the test set (and possibly of Russian news coverage of Sweden in general).
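By way of illustration, the following is a minimal scikit-learn sketch of the kind of setup described above, comparing MNB and SVM with the thesis's feature variants (n-gram ranges, TF-IDF vs. raw counts, stop-word handling, cross-validated C). The grid values and helper structure are illustrative assumptions, not the thesis's exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def build_classifier(texts, labels, use_svm=True):
    # Pipeline whose vectorizer and classifier are tuned by grid search
    pipe = Pipeline([
        ("vec", TfidfVectorizer()),
        ("clf", LinearSVC() if use_svm else MultinomialNB()),
    ])
    grid = {
        "vec": [TfidfVectorizer(), CountVectorizer()],   # TF-IDF vs. counts
        "vec__ngram_range": [(1, 1), (2, 2), (3, 3)],    # uni-/bi-/trigrams
        "vec__stop_words": [None, "english"],            # with/without stop words
    }
    if use_svm:
        grid["clf__C"] = [0.01, 0.1, 1, 10]              # illustrative C values
    search = GridSearchCV(pipe, grid, cv=10)             # 10-fold CV, as in the thesis
    search.fit(texts, labels)
    return search.best_estimator_
```

A domain-specific classifier would simply be this pipeline fit on the news-article training set instead of the movie reviews.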
|
433 |
Silent speech recognition in EEG-based brain computer interface. Ghane, Parisa. January 2015.
Indiana University-Purdue University Indianapolis (IUPUI) / A Brain Computer Interface (BCI) is a hardware and software system that establishes direct communication between the human brain and the environment. In a BCI system, brain messages pass through wires and external computers instead of the normal pathway of nerves and muscles. The general workflow of all BCIs is to measure brain activity, process it, and then convert it into an output readable by a computer.
The measurement of electrical activity in different parts of the brain is called electroencephalography (EEG). Many sensor technologies, with varying numbers of electrodes, can record brain activity along the scalp. Each electrode captures a weighted sum of the activity of all neurons in the area around it.
Establishing a BCI system requires a set of electrodes placed on the scalp and a tool to send the signals to a computer, where a system is trained to find the important information, extract it from the raw signal, and use it to recognize the user's intention. Finally, a control signal is generated according to the application.
This thesis describes the step-by-step training and testing of a BCI system intended for a person who has lost the ability to speak through an accident or surgery but still has healthy brain tissue. The goal is to establish an algorithm that recognizes different vowels from EEG signals. It uses a bandpass filter to remove noise and artifacts from the signals, the periodogram for feature extraction, and a Support Vector Machine (SVM) for classification.
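As a rough illustration of that pipeline (bandpass filtering, periodogram features, SVM classification), the sketch below assumes epoched EEG arrays; the sampling rate, band edges, and SVM settings are assumptions, not the thesis's actual values.

```python
import numpy as np
from scipy.signal import butter, filtfilt, periodogram
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

FS = 256  # assumed sampling rate in Hz

def preprocess(epoch, low=1.0, high=40.0):
    # Bandpass filter to suppress slow drift and high-frequency artifacts
    b, a = butter(4, [low, high], btype="bandpass", fs=FS)
    return filtfilt(b, a, epoch)

def features(epoch):
    # Power spectral density via the periodogram, one PSD per channel
    _, pxx = periodogram(preprocess(epoch), fs=FS)
    return np.log(pxx + 1e-12).ravel()  # log power, flattened over channels

def evaluate(epochs, labels):
    # epochs: (n_trials, n_channels, n_samples); labels: vowel per trial
    X = np.array([features(e) for e in epochs])
    return cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean()
```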
|
434 |
Fenchel duality-based algorithms for convex optimization problems with applications in machine learning and image restoration. Heinrich, André. 21 March 2013.
The main contribution of this thesis is the concept of Fenchel duality with a focus on its application in the field of machine learning problems and image restoration tasks. We formulate a general optimization problem for modeling support vector machine tasks and assign a Fenchel dual problem to it, prove weak and strong duality statements as well as necessary and sufficient optimality conditions for that primal-dual pair. In addition, several special instances of the general optimization problem are derived for different choices of loss functions for both the regression and the classification task. The convenience of these approaches is demonstrated by numerically solving several problems. We formulate a general nonsmooth optimization problem and assign a Fenchel dual problem to it. It is shown that the optimal objective values of the primal and the dual one coincide and that the primal problem has an optimal solution under certain assumptions. The dual problem turns out to be nonsmooth in general and therefore a regularization is performed twice to obtain an approximate dual problem that can be solved efficiently via a fast gradient algorithm. We show how an approximate optimal and feasible primal solution can be constructed by means of some sequences of proximal points closely related to the dual iterates. Furthermore, we show that the solution will indeed converge to the optimal solution of the primal for arbitrarily small accuracy. Finally, the support vector regression task is obtained to arise as a particular case of the general optimization problem and the theory is specialized to this problem. We calculate several proximal points occurring when using different loss functions as well as for some regularization problems applied in image restoration tasks. Numerical experiments illustrate the applicability of our approach for these types of problems.
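For reference, the generic Fenchel-Rockafellar primal-dual pair underlying such constructions can be written as follows; this is the textbook formulation, not necessarily the thesis's exact problem statement.

```latex
% Primal: f, g proper, convex, lower semicontinuous; A a linear map
(P) \qquad p^\ast = \inf_{x \in \mathcal{X}} \, \bigl\{ f(x) + g(Ax) \bigr\}

% Fenchel dual, with f^\ast, g^\ast the convex conjugates of f and g
(D) \qquad d^\ast = \sup_{y \in \mathcal{Y}} \, \bigl\{ -f^\ast(A^{\top} y) - g^\ast(-y) \bigr\}

% Weak duality d^\ast \le p^\ast always holds; strong duality
% d^\ast = p^\ast holds under a qualification condition,
% e.g. g continuous at some point of A(\operatorname{dom} f).
```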
|
435 |
THEORY OF AUTOMATICITY IN CONSTRUCTION. Ikechukwu Sylvester Onuchukwu (17469117). 30 November 2023.
Automaticity, an essential attribute of skill, is developed when a task is executed repeatedly with minimal attention, and it can have both good (e.g., productivity, skill acquisition) and bad (e.g., accident involvement) implications for workers’ performance. However, the implications of automaticity in construction are unknown despite their significance. To address this knowledge gap, this research aimed to examine methods that are indicative of the development of automaticity on construction sites and its implications for construction safety and productivity. The objectives of the dissertation include: 1) examining the development of automaticity during the repetitive execution of a primary task of roofing construction and a concurrent secondary task (a computer-generated audio-spatial processing task) to measure attentional resources; 2) using eye-tracking metrics to distinguish between automatic and nonautomatic subjects and determine the significant factors contributing to the odds of automatic behavior; 3) determining which personal characteristics (such as personality traits and mindfulness dimensions) better explain the variability in workers' attention while developing automaticity. To achieve these objectives, 28 subjects were recruited to take part in a longitudinal study involving a total of 22 repetitive sessions of a simulated roofing task. The task involved the installation of 17 pieces of 25 ft² shingles on a low-sloped roof model that was 8 ft wide, 8 ft long, and 4 ft high, over one month in a laboratory. The collected data were analyzed using multiple statistical and data mining techniques such as repeated measures analysis of variance (RM-ANOVA), pairwise comparisons, principal component analysis (PCA), support vector machine (SVM), binary logistic regression (BLR), relative weight analyses (RWA), and advanced bootstrapping techniques to address the research questions. First, the findings showed that as the experiment progressed, there were significant improvements in the mean automatic performance measures, such as the mean primary task duration, mean primary task accuracy, and mean secondary task score, over the repeated measurements (p-value < 0.05). These findings were used to demonstrate that automaticity develops during repetitive construction activities, because these automatic performance measures provide an index for assessing feature-based changes that are synonymous with automaticity development. Second, this study successfully used supervised machine learning methods, including SVM, to classify subjects (with an accuracy of 76.8%) based on their eye-tracking data into automatic and nonautomatic states. Also, BLR was used to estimate the probability of exhibiting automaticity based on eye-tracking metrics and to ascertain the variables significantly contributing to it. Eye-tracking variables collected for the safety harness and anchor, hammer, and work-area AOIs were found to be significant predictors (p < 0.05) of the probability of exhibiting automatic behavior. Third, the results revealed that higher levels of agreeableness were significantly associated with increased change in attention to productivity-related cues during automatic behavior. Additionally, higher levels of nonreactivity to inner experience significantly reduce the changes in attention to safety-related AOIs while developing automaticity. The findings of this study provide metrics to assess training effectiveness.
The findings of this study can be used by practitioners to better understand the positive and negative consequences of developing automaticity, measure workers’ performance more accurately, assess training effectiveness, and personalize learning for workers. In the long term, the findings will also aid in improving human-AI teaming, since the AI will be better able to understand the cognitive state of its human counterpart and can more precisely adapt to him or her.
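As a sketch of the second objective's classification step, the code below assumes per-session eye-tracking feature vectors (e.g., dwell times and fixation counts on the harness/anchor, hammer, and work-area AOIs) and binary automaticity labels. The feature layout, RBF kernel, and cross-validation setup are assumptions for illustration, not the dissertation's reported configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def classify_automaticity(X, y):
    # X: (n_sessions, n_features) eye-tracking metrics per AOI;
    # y: 1 = automatic, 0 = nonautomatic
    svm = make_pipeline(StandardScaler(), SVC())
    acc = cross_val_score(svm, X, y, cv=5, scoring="accuracy").mean()

    # Binary logistic regression: probability of automatic behavior;
    # exponentiated coefficients show which AOI metrics drive the odds
    blr = make_pipeline(StandardScaler(), LogisticRegression())
    blr.fit(X, y)
    odds_ratios = np.exp(blr[-1].coef_.ravel())
    return acc, odds_ratios
```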
|
436 |
Clearing the Way in Capsule Endoscopy with Deep Learning and Computer Vision. Noorda, Reinier Alexander. 01 July 2022.
Capsule endoscopy (CE) is a widely used, minimally invasive alternative to traditional endoscopy that allows visualisation of the entire small intestine, whereas more invasive procedures cannot easily do this. However, those traditional methods are still commonly the first choice of treatment for gastroenterologists, as there are still important challenges surrounding the field of CE. Among others, these include the time-consuming video diagnosis following the procedure, the fact that the capsule cannot be actively controlled, the lack of consensus on good patient preparation, and the high cost. In this doctoral thesis, we aim to extract more information from capsule endoscopy procedures to aid in alleviating these issues from a perspective that appears to be under-represented in current research.
First, and as the main objective in this thesis, we aim to develop an objective, automatic cleanliness evaluation method in CE procedures to aid medical research in patient preparation methods. Namely, even though adequate patient preparation can help to obtain a cleaner intestine and thus better visibility in the resulting videos, studies on the most effective preparation method are conflicting due to the absence of such a method. Therefore, we aim to provide such a method, capable of presenting results on an intuitive scale, with a novel, relatively lightweight convolutional neural network architecture at its core. We trained this model on an extensive data set of over 50,000 image patches, collected from 35 different CE procedures, and compared it with state-of-the-art classification methods. From the patch classification results, we developed a method to automatically estimate pixel-level probabilities and deduce cleanliness evaluation scores through automatically learnt thresholds. We then validated our method in a clinical setting on 30 newly collected CE videos, comparing the resulting scores to those independently assigned by human specialists. We obtained the highest classification accuracy for the proposed method (95.23%), with significantly lower average prediction times than for the second-best method. In the validation of our method, we found acceptable agreement with two human specialists compared to interhuman agreement, showing its validity as an objective evaluation method.
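As a schematic of how patch-level predictions might be turned into a score on an intuitive scale, the sketch below assumes an already-trained patch classifier returning per-patch "clean" probabilities; the aggregation and the scale boundaries are illustrative placeholders, not the thesis's automatically learnt thresholds.

```python
import numpy as np

def cleanliness_score(patch_probs, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Map per-patch 'clean' probabilities to an assumed 1-5 scale.

    patch_probs: (n_frames, n_patches) probabilities from a trained
    CNN patch classifier; thresholds: assumed learnt scale boundaries.
    """
    frame_scores = patch_probs.mean(axis=1)   # per-frame cleanliness
    video_level = frame_scores.mean()         # whole-procedure summary
    return int(np.digitize(video_level, thresholds)) + 1
```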
Additionally, we aim to automatically detect and localise the tunnel in each frame, in order to help determine the capsule orientation at any given time. For this purpose, we trained a single-stage deep object detection model, namely the lightweight YOLOv3 detector, on a total of 1385 frames, extracted from CE procedures of 10 different patients, achieving a precision of 86.55% combined with a recall of 88.79% on our test set. Extending this, we additionally aim to visualise intestinal motility in a manner analogous to a traditional intestinal manometry, solely based on the minimally invasive technique of CE, by aligning the frames with similar orientation and using the bounding-box parameters to derive adequate parameters for our tunnel segmentation method. Finally, we calculate the relative tunnel size to construct an equivalent of an intestinal manometry from visual information.
Since we concluded our work, our method for automatic cleanliness evaluation has been used in a still ongoing, large-scale study in which we actively participate. While much research focuses on the automatic detection of pathologies, such as tumors, polyps and bleeding, we hope our work can make a significant contribution to extracting more information from CE in other, often overlooked areas as well. / Noorda, RA. (2022). Clearing the Way in Capsule Endoscopy with Deep Learning and Computer Vision [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/183752
|
437 |
Μέθοδοι διάγνωσης με βάση προηγμένες τεχνικές επεξεργασίας και ταξινόμησης δεδομένων. Εφαρμογές στη μαιευτική / Advanced data processing and classification techniques for diagnosis methods. Application in obstetrics. Γεωργούλας, Γεώργιος Κ. 13 February 2009.
This Dissertation dealt with the development of computational methods for the diagnosis and estimation of fetal condition. The proposed methods analyzed and extracted information from the Fetal Heart Rate (FHR) signal, since this is one of the few available tools for the estimation of fetal oxygenation and the assessment of fetal condition during labor. For the evaluation of the proposed methods, the correlation of the FHR signal with reliable short-term indices was employed; more specifically, its correlation with the pH value of fetal blood, which is an indirect sign of the development of fetal hypoxia during labor.
In the context of this Dissertation, Independent Component Analysis (ICA) was used for the first time for feature extraction from the FHR signal. Moreover, we used Hidden Markov Models in an attempt to capture the evolution of the fetal condition over time. Furthermore, new features based on the Discrete Wavelet Transform were proposed and used. Using a new hybrid method based on grammatical evolution, new features were constructed from features already extracted by conventional methods. Moreover, for the first (and so far only) time, Support Vector Machine (SVM) classifiers were employed in the field of FHR processing, and the Particle Swarm Optimization (PSO) method was proposed for tuning their parameters. Finally, a new family of neural networks, Wavelet Neural Networks (WNN), was proposed and used, trained with the PSO method.
By conducting a number of experiments, we managed to show that the FHR signal conveys valuable information which, with the use of advanced data processing and classification techniques, can be associated with fetal pH, something that was not regarded as feasible during the 1990s.
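To make two of the components above concrete, here is a minimal sketch combining DWT feature extraction from an FHR segment with a bare-bones particle swarm that tunes an SVM's (C, gamma). The wavelet choice, sub-band statistics, search ranges, swarm constants, and cross-validation setup are all illustrative assumptions rather than the dissertation's actual choices.

```python
import numpy as np
import pywt
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def dwt_features(signal, wavelet="db4", level=4):
    # Log energy and standard deviation of each DWT sub-band
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:
        feats += [np.log(np.sum(c**2) + 1e-12), np.std(c)]
    return np.array(feats)

def pso_tune_svm(X, y, n_particles=10, n_iter=20, seed=0):
    # Particle swarm over (log2 C, log2 gamma); fitness = CV accuracy
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-5.0, -15.0]), np.array([15.0, 3.0])
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_score = pos.copy(), np.full(n_particles, -np.inf)
    gbest, gbest_score = pos[0].copy(), -np.inf
    for _ in range(n_iter):
        for i, (c_exp, g_exp) in enumerate(pos):
            clf = SVC(C=2.0**c_exp, gamma=2.0**g_exp)
            score = cross_val_score(clf, X, y, cv=5).mean()
            if score > pbest_score[i]:
                pbest_score[i], pbest[i] = score, pos[i].copy()
            if score > gbest_score:
                gbest_score, gbest = score, pos[i].copy()
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
    return 2.0**gbest[0], 2.0**gbest[1], gbest_score
```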
|
438 |
Exploring variabilities through factor analysis in automatic acoustic language recognition / Exploration par l'analyse factorielle des variabilités de la reconnaissance acoustique automatique de la langue / Erforschung durch Faktor-Analysis der Variabilitäten der automatischen akustischen Sprachen-Erkennung. Verdet, Florian. 05 September 2011.
Language Recognition is the problem of discovering the language of a spoken utterance.
This thesis achieves this goal by using short-term acoustic information within a GMM-UBM approach. The main problem of many pattern recognition applications is the variability of the observed data. In the context of Language Recognition (LR), this troublesome variability is due to speaker characteristics, speech evolution, and acquisition and transmission channels. In the context of Speaker Recognition, the variability problem is solved by the Joint Factor Analysis (JFA) technique. Here, we introduce this paradigm to Language Recognition. The success of JFA relies on several assumptions: the global JFA assumption is that the observed information can be decomposed into a universal part, a language-dependent part and a language-independent variability part. The second, more technical assumption is that the unwanted variability part lives in a low-dimensional, globally defined subspace. In this work, we analyze how JFA behaves in the context of a GMM-UBM LR framework. We also introduce and analyze its combination with Support Vector Machines (SVMs). The first JFA publications put all unwanted information (hence the variability) into one and the same component, which is thought to follow a Gaussian distribution. This handles diverse kinds of variability in a unique manner. But in practice, we observe that this hypothesis is not always verified. We have, for example, the case where the data can be divided into two clearly separate subsets, namely data from telephony and from broadcast sources. In this case, our detailed investigations show that there is some benefit in handling the two kinds of data with two separate systems and then electing the output score of the system that corresponds to the source of the testing utterance. For selecting the score of one or the other system, we need a channel source detector. We propose here different novel designs for such automatic detectors. In this framework, we show that JFA's variability factors (of the subspace) can be used with success for detecting the source. This opens the interesting perspective of partitioning the data into automatically determined channel source categories, avoiding the need for source-labeled training data, which is not always available. The JFA approach results in up to 72% relative cost reduction compared to the GMM-UBM baseline system. Using source-specific systems followed by a score selector, we achieve an 81% relative improvement.
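The decomposition assumption described above is commonly written as follows for the GMM mean supervector M of an utterance; this is the generic JFA formulation, shown as an illustration rather than the thesis's exact model.

```latex
% JFA decomposition of an utterance's GMM mean supervector
M = m + V\,y + U\,x

% m  : universal, language-independent part (from the UBM)
% V y: language-dependent part (low-rank V, language factors y)
% U x: unwanted session/channel variability, confined to the
%      low-dimensional, globally defined subspace spanned by U
```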
|
439 |
Rozpoznání hudebního slohu z orchestrální nahrávky za pomoci technik Music Information Retrieval / Recognition of music style from orchestral recording using Music Information Retrieval techniques. Jelínková, Jana. January 2020.
Like other broad genres of music, classical music consists of many different subgenres. The aim of this work is to recognize those subgenres from orchestral recordings. It focuses on the time period from the very end of the 16th century to the beginning of the 20th century, covering the Baroque, Classical, and Romantic eras. Music Information Retrieval (MIR) techniques were used to classify the chosen subgenres. In the first phase of the MIR process, parameters were extracted from the musical recordings and evaluated. Only the best parameters were used as input data for the machine learning classifiers, specifically: k-NN (k-Nearest Neighbors), LDA (Linear Discriminant Analysis), GMM (Gaussian Mixture Models), and SVM (Support Vector Machines). The final chapter summarizes the best results. According to the results, there is a significant difference between the Baroque era and the other researched eras, which led to better identification of Baroque-era recordings. By contrast, the Classical era turned out to be relatively similar to the Romantic era, and all classifiers were therefore less successful in identifying recordings from that era. The results are in line with music theory and the characteristics of the chosen musical eras.
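A minimal sketch of such an MIR pipeline follows: timbral features summarized per recording, then a comparison of the four classifiers. The librosa feature choice, feature statistics, and classifier settings are assumptions for illustration, not the thesis's actual parameters.

```python
import librosa
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def recording_features(path):
    # Summarize each recording by MFCC statistics (mean and std)
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

class GMMClassifier(BaseEstimator, ClassifierMixin):
    # One Gaussian mixture per era; predict by maximum likelihood
    def __init__(self, n_components=4):
        self.n_components = n_components
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = [GaussianMixture(self.n_components).fit(X[y == c])
                        for c in self.classes_]
        return self
    def predict(self, X):
        ll = np.stack([m.score_samples(X) for m in self.models_], axis=1)
        return self.classes_[ll.argmax(axis=1)]

def compare_classifiers(paths, labels):
    X = np.array([recording_features(p) for p in paths])
    y = np.array(labels)
    models = {
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "LDA": LinearDiscriminantAnalysis(),
        "GMM": GMMClassifier(),
        "SVM": SVC(kernel="rbf"),
    }
    return {name: cross_val_score(m, X, y, cv=5).mean()
            for name, m in models.items()}
```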
|
440 |
Detekce logopedických vad v řeči / Detection of Logopaedic Defects in Speech. Pešek, Milan. January 2009.
The thesis deals with the design and implementation of software for the detection of logopaedic speech defects. Because such defects need to be detected early, the software is aimed at child speakers. The introductory part describes the theory of speech production, the modelling of speech production for numerical processing, phonetics, logopaedics, and basic logopaedic speech defects. It also describes the methods used for feature extraction, for segmenting words into speech sounds, and for classifying features into either the correct or the incorrect pronunciation class. The next part of the thesis presents the test results of the selected methods. For the recognition of logopaedic speech defects, algorithms are used to extract MFCC and PLP features. Words are segmented into speech sounds using the differential function method. The extracted features of a sound are classified into the correct or incorrect pronunciation class with one of the tested pattern recognition methods: k-NN, SVM, ANN, and GMM are evaluated.
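As an illustration of the classification stage, the sketch below scores a segmented speech sound as correct or incorrect pronunciation using two frame-level GMMs and a log-likelihood ratio. The MFCC settings, mixture sizes, and zero decision threshold are assumptions, not the thesis's actual configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr):
    # Frame-level MFCC features for one segmented speech sound
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_models(correct_feats, incorrect_feats, n_components=8):
    # One GMM per pronunciation class, trained on pooled MFCC frames
    gmm_ok = GaussianMixture(n_components).fit(np.vstack(correct_feats))
    gmm_bad = GaussianMixture(n_components).fit(np.vstack(incorrect_feats))
    return gmm_ok, gmm_bad

def is_correct(y, sr, gmm_ok, gmm_bad, threshold=0.0):
    feats = mfcc_frames(y, sr)
    # Average per-frame log-likelihood ratio between the two classes
    llr = gmm_ok.score(feats) - gmm_bad.score(feats)
    return llr > threshold
```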
|