151

Who Spoke What And Where? A Latent Variable Framework For Acoustic Scene Analysis

Sundar, Harshavardhan 26 March 2016 (has links) (PDF)
Speech is by far the most natural form of communication between human beings. It is intuitive, expressive, and contains information at several cognitive levels. We, as humans, are perceptive to several of these levels: in addition to the content of what is being spoken, we can gather information pertaining to the speaker's identity, gender, emotion, location, and language, and so on. This makes speech-based human-machine interaction (HMI) both desirable and challenging, for the same set of reasons. For HMI to be natural for humans, it is imperative that a machine understands the information present in speech, at least at the level of speaker identity, language, location in space, and a summary of what is being spoken. Although one can draw parallels between human-human interaction and HMI, the two differ in their purpose. We interact with a machine mostly to get a task done more efficiently than is possible without the machine; thus, in HMI, controlling the machine in a specific manner is typically the primary goal. In this context, it can be argued that HMI with a limited vocabulary of specific commands would suffice for a more efficient use of the machine. In this thesis, we address the problem of "Who spoke what and where?", in the sense of a machine understanding the identities of the speakers, their locations in space, and the keywords they spoke, thus considering three levels of information: speaker identity (who), location (where), and keywords (what). This could be addressed with multiple sensors, such as microphones, video cameras, proximity sensors, and motion detectors, by combining all these modalities; however, we explore the use of microphones alone. In practical scenarios, multiple people often talk at the same time, so the goal of this thesis is to detect all the speakers, their keywords, and their locations in mixture signals containing speech from simultaneous speakers. Addressing the problem of "who spoke what and where" using only microphone signals forms a part of acoustic scene analysis (ASA) of speech-based acoustic events.

We divide the problem into two sub-problems: "who spoke what?" and "who spoke where?". Each is cast in a generic latent variable (LV) framework to capture information in speech at different levels. We associate an LV with each level and model the relationship between levels using conditional dependency. The sub-problem of "who spoke what" is addressed using a single-channel microphone signal, by modeling the mixture signal in terms of LV mass functions of speaker identity, the conditional mass function of the keyword spoken given the speaker identity, and a speaker-specific keyword model. The LV mass functions are estimated in a maximum likelihood (ML) framework with the expectation-maximization (EM) algorithm, using Student's-t mixture models (tMMs) as the speaker-specific keyword models. Motivated by HMI in a home environment, we have created our own database. On mixture signals containing two speakers uttering keywords simultaneously, the proposed framework achieves an accuracy of 82% for detecting both the speakers and their respective keywords.

The other sub-problem, "who spoke where?", is addressed in two stages. In the first stage, the enclosure is discretized into sectors. The speakers and the sectors in which they are located are detected with an approach similar to the one employed for "who spoke what", using signals collected from a uniform circular array (UCA). However, in place of speaker-specific keyword models, we use tMM-based speaker models trained on clean speech, along with a simple delay-and-sum beamformer (DSB). In the second stage, the speakers are localized within the active sectors using a novel region-constrained localization technique based on time difference of arrival (TDOA). Since the problem is a multi-label classification task, we use the average Hamming score (accuracy) as the performance metric. Although the proposed approach yields an accuracy of 100% in an anechoic setting for detecting both the speakers and their corresponding sectors in two-speaker mixture signals, the performance degrades to 67% in a reverberant setting with a 60 dB reverberation time (RT60) of 300 ms. To improve the performance under reverberation, prior knowledge of the locations of multiple sources is derived using a novel technique based on geometrical insights into TDOA estimation. With this prior knowledge, the accuracy of the proposed approach improves to 91%. It is worth noting that these accuracies are computed on mixture signals with more than 90% overlap between the competing speakers.

The proposed LV framework offers a convenient methodology for representing information at broad levels; in this thesis we have shown its use with three such levels, and it can be extended to more for a generic analysis of an acoustic scene consisting of broad levels of events. Not all levels are mutually dependent, so the LV dependencies can be reduced through independence assumptions, which leads to several smaller sub-problems, as shown above. The LV framework is also attractive for incorporating prior knowledge about the acoustic setting, which is combined with the evidence from the data to infer the presence of an acoustic event. The performance of the framework depends on the choice of stochastic models for the likelihood function of the data given the presence of acoustic events; however, it provides a means to compare and contrast different stochastic models for representing that likelihood function.
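As a rough illustration of the TDOA-based localization stage, the following is a minimal sketch of delay estimation between a pair of microphone signals using GCC-PHAT, a standard building block for such localizers. It is not the region-constrained technique proposed in the thesis; the function name and parameters are illustrative.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the time difference of arrival between two microphone signals
    with the GCC-PHAT method (a common building block for TDOA localization)."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase information only
    cc = np.fft.irfft(cross, n=n)             # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift  # lag of the correlation peak, in samples
    return shift / fs                          # delay in seconds
```

In a circular-array setting, such pairwise delays would be combined over several microphone pairs to constrain the source position within the active sector.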
152

Sketch-based intuitive 3D model deformations

Bao, Xin January 2014 (has links)
In 3D modelling software, deformations are used to add, remove, or modify geometric features of existing 3D models in order to create new models with similar but slightly different details. Traditional techniques for deforming virtual 3D models require users to explicitly define control points and regions of interest (ROIs), and to specify precisely how the ROIs are deformed by the control points. The awkwardness of defining these factors in traditional 3D modelling software makes it difficult for people with limited 3D modelling experience to deform existing models as they intend. As applications that require virtual 3D model processing become more widespread, it becomes increasingly desirable to lower the "difficulty of use" threshold of 3D model deformation for users. This thesis argues that the user experience, in terms of intuitiveness and ease of use, of a user interface for deforming virtual 3D models can be greatly enhanced by employing sketch-based deformation techniques that require a minimal amount of interaction, while preserving the plausibility of the deformation results and the responsiveness of the algorithms on modern consumer-grade computing devices. A prototype system for sketch-based 3D model deformation is developed and implemented to support this hypothesis; it allows the user to perform a deformation with a single deforming stroke, eliminating the need to explicitly select control points, the ROI, and the deforming operation. GPU-based acceleration is employed to optimise the runtime performance of the system, so that it is responsive enough for real-time interaction. Studies of the runtime performance and usability of the prototype system are conducted to provide evidence supporting the hypothesis.
153

Dose savings in digital breast tomosynthesis through image processing / Redução da dose de radiação em tomossíntese mamária através de processamento de imagens

Lucas Rodrigues Borges 14 June 2017 (has links)
In x-ray imaging, the radiation dose must be the minimum necessary to achieve the required diagnostic objective, to ensure the patient's safety. However, low-dose acquisitions yield images of lower quality, which affects the radiologist's image interpretation. There is therefore a trade-off between image quality and radiation dose. This work proposes an image restoration framework capable of restoring low-dose acquisitions to the quality of full-dose acquisitions. The contributions of the new method include the capability of restoring images with quantum and electronic noise, pixel offset, and variable detector gain. To validate the image processing chain, a simulation algorithm was proposed; the simulation generates low-dose DBT projections starting from full-dose images. To investigate the feasibility of reducing the radiation dose in breast cancer screening programs, a simulated pre-clinical trial was conducted using the simulation and the image processing pipeline proposed in this work. Digital breast tomosynthesis (DBT) images from 72 patients were selected, and five human observers were invited for the experiment. The results suggested that a reduction of up to 30% in radiation dose could not be perceived by the human reader after the proposed image processing pipeline was applied. Thus, the image processing algorithm has the potential to decrease radiation levels in DBT, also decreasing the cancer induction risks associated with the exam. / In breast cancer screening programs, the radiation dose must be kept to the minimum necessary to reach a diagnosis, to ensure patient safety. However, images acquired with a reduced radiation dose have lower quality. Thus, there is a balance between radiation dose and image quality. This work proposes an image restoration algorithm capable of recovering the quality of digital breast tomosynthesis images acquired with reduced radiation doses, so as to reach the quality of images acquired at the reference dose. The contributions of the work include an improved noise model and the inclusion of detector characteristics, such as the variable gain of the quantum noise. To validate the processing chain, a method for simulating radiation dose reduction was proposed. To investigate the possibility of reducing the radiation dose used in tomosynthesis, a pre-clinical study was conducted using the proposed simulation method and the processing chain. Clinical breast tomosynthesis images from 72 patients were selected, and five observers were invited to take part in the study. The results suggested that, after applying the proposed processing, a 30% reduction in radiation dose could be achieved without the observers perceiving differences in noise and blur levels. Thus, the processing algorithm has the potential to reduce radiation levels in breast tomosynthesis, also reducing the risks of breast cancer induction.
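A rough sketch of the kind of dose-reduction simulation described above: the full-dose signal is scaled and quantum (Poisson) plus electronic (Gaussian) noise are injected under a simple affine detector model. The parameter names and values are placeholders, and this simplified version ignores the noise already present in the full-dose image, which the thesis's method accounts for.

```python
import numpy as np

def simulate_low_dose(full_dose, dose_fraction, gain=1.0, offset=0.0, sigma_e=2.0, seed=0):
    """Simulate a reduced-dose projection from a full-dose one (illustrative only).

    Assumes a simple affine detector model: pixel = gain * quanta + offset + electronic noise.
    gain, offset and sigma_e are placeholder values, not calibrated detector parameters.
    """
    rng = np.random.default_rng(seed)
    quanta = np.clip((full_dose - offset) / gain, 0, None)     # back to the quanta domain
    expected = quanta * dose_fraction                          # fewer photons at lower dose
    noisy_quanta = rng.poisson(expected).astype(float)         # quantum (Poisson) noise
    low_dose = gain * noisy_quanta + offset                    # back to detector signal units
    low_dose += rng.normal(0.0, sigma_e, size=low_dose.shape)  # electronic (Gaussian) noise
    return low_dose

# Example: simulate a 50% dose acquisition from a synthetic full-dose projection.
full = np.full((4, 4), 400.0)
half_dose = simulate_low_dose(full, dose_fraction=0.5)
```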
154

Data-driven Uncertainty Analysis in Neural Networks with Applications to Manufacturing Process Monitoring

Bin Zhang (11073474) 12 August 2021 (has links)
Artificial neural networks, including deep neural networks, play a central role in data-driven science due to their superior learning capacity and adaptability to different tasks and data structures. However, although quantitative uncertainty analysis is essential for training and deploying reliable data-driven models, the uncertainties in neural networks are often overlooked or underestimated in many studies, mainly due to the lack of a high-fidelity and computationally efficient uncertainty quantification approach. In this work, a novel uncertainty analysis scheme is developed. The Gaussian mixture model is used to characterize probability distributions of uncertainties in arbitrary forms, which yields higher fidelity than presumed distribution forms, such as the Gaussian, when the underlying uncertainty is multimodal, and is more compact and efficient than large-scale Monte Carlo sampling. The fidelity of the Gaussian mixture is refined through adaptive scheduling of the width of each Gaussian component, based on active assessment of the factors that could deteriorate the quality of the uncertainty representation, such as the nonlinearity of the activation functions in the neural network.

Following this idea, an adaptive Gaussian mixture scheme for nonlinear uncertainty propagation is proposed to effectively propagate the probability distributions of uncertainties through layers in deep neural networks or through time in recurrent neural networks. An adaptive Gaussian mixture filter (AGMF) is then designed based on this uncertainty propagation scheme. By approximating the dynamics of a highly nonlinear system with a feedforward neural network, the adaptive Gaussian mixture refinement is applied at both the state prediction and Bayesian update steps to closely track the distribution of unmeasurable states. As a result, this new AGMF exhibits state-of-the-art accuracy with a reasonable computational cost on highly nonlinear state estimation problems subject to high magnitudes of uncertainty. Next, a probabilistic neural network with Gaussian-mixture-distributed parameters (GM-PNN) is developed. The adaptive Gaussian mixture scheme is extended to refine intermediate layer states and to ensure the fidelity of both linear and nonlinear transformations within the network, so that the predictive distribution of the output target can be inferred directly, without sampling or approximate integration. The derivatives of the loss function with respect to all the probabilistic parameters in this network are derived explicitly, and therefore the GM-PNN can be trained with any backpropagation method to address practical data-driven problems subject to uncertainties.

The GM-PNN is applied to two data-driven condition monitoring schemes for manufacturing processes. For tool wear monitoring in the turning process, a systematic feature normalization and selection scheme is proposed to engineer optimal feature sets extracted from sensor signals. Predictive tool wear models are established using two methods: a type-2 fuzzy network for interval-type uncertainty quantification, and the GM-PNN for probabilistic uncertainty quantification. For porosity monitoring in laser additive manufacturing processes, a convolutional neural network (CNN) is used to learn directly from melt-pool patterns to predict porosity. Classical CNN models, which do not consider uncertainty, are compared with CNN models in which the GM-PNN is embedded as an uncertainty quantification module. For both monitoring schemes, experimental results show that the GM-PNN not only achieves higher prediction accuracy for process conditions than the classical models, but also provides more effective uncertainty quantification to facilitate process-level decision-making in the manufacturing environment.

Based on the developed uncertainty analysis methods and their demonstrated successes in practical applications, several directions for future study are suggested. Closed-loop control systems may be synthesized by combining the AGMF with data-driven controller design. The AGMF can also be extended from a state estimator to parameter estimation problems in data-driven models. In addition, the GM-PNN scheme may be expanded to directly build more complicated models, such as convolutional or recurrent neural networks.
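To make the core idea concrete, here is a minimal sketch (not the author's implementation) of first-order propagation of a one-dimensional Gaussian mixture through a nonlinearity: each component N(m, v) is mapped to N(f(m), f'(m)² v). The adaptive scheme described above additionally splits components where this linearization becomes inaccurate.

```python
import numpy as np

def propagate_gmm_linearized(weights, means, variances, f, f_prime):
    """First-order propagation of a 1-D Gaussian mixture through a nonlinearity f.

    Each component N(m, v) is mapped to N(f(m), f'(m)^2 * v). This is only the basic
    moment-matching step; an adaptive scheme would also split wide components.
    """
    new_means = np.array([f(m) for m in means])
    new_vars = np.array([f_prime(m) ** 2 * v for m, v in zip(means, variances)])
    return np.array(weights), new_means, new_vars

# Example: propagate a two-component mixture through the tanh activation.
w, m, v = propagate_gmm_linearized(
    [0.4, 0.6], [-1.0, 2.0], [0.3, 0.5],
    f=np.tanh, f_prime=lambda x: 1.0 - np.tanh(x) ** 2)
```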
155

Autonomní jednokanálový deinterleaving / Autonomous Single-Channel Deinterleaving

Tomešová, Tereza January 2021 (has links)
This thesis deals with autonomous single-channel deinterleaving, i.e. the separation of a received pulse sequence originating from more than one emitter into per-emitter pulse sequences without human assistance. Deinterleaving methods can be divided into single-parameter and multiple-parameter methods according to the number of parameters used for the separation; this thesis primarily deals with multi-parameter methods. DBSCAN and variational Bayes methods were chosen as suitable for autonomous single-channel deinterleaving. The selected methods were adapted for deinterleaving and implemented in the Python programming language, and their performance is examined on simulated and real data.
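As a toy illustration of the multi-parameter clustering approach (not the thesis's implementation), the sketch below groups hypothetical pulse descriptors with scikit-learn's DBSCAN; the feature choice and the eps/min_samples values are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical pulse descriptor words: one row per received pulse,
# columns = carrier frequency [MHz], pulse width [us], direction of arrival [deg].
pulses = np.array([
    [9400.0,  1.0,  45.2],
    [9401.0,  1.1,  44.8],
    [2800.0, 10.0, 120.5],
    [2799.0, 10.2, 121.0],
    [9399.5,  0.9,  45.0],
])

# Scale the features so no single parameter dominates the distance metric,
# then group pulses that lie close together in the joint parameter space.
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(StandardScaler().fit_transform(pulses))
print(labels)  # pulses sharing a label are attributed to the same emitter; -1 marks noise
```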
156

Diagnóza Parkinsonovy choroby z řečového signálu / Parkinson disease diagnosis using speech signal analysis

Karásek, Michal January 2011 (has links)
The thesis deals with the recognition of Parkinson's disease from the speech signal. The first part covers the principles of speech signals and the characteristics of speech produced by patients suffering from Parkinson's disease. It then describes speech signal processing, the basic features used for the diagnosis of Parkinson's disease (e.g. VAI, VSA, FCR, VOT) and the reduction of these features. The next part presents a block diagram of the program for the diagnosis of Parkinson's disease. The main objective of this thesis is a comparison of two feature selection methods (mRMR and SFFS). Two different classification methods were used: the first is k-nearest neighbours (kNN) classification and the second is the Gaussian mixture model (GMM).
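A minimal sketch of the two classification approaches mentioned above, using synthetic features; the feature values, class structure, and hyperparameters are made up for illustration and are not taken from the thesis.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.mixture import GaussianMixture

# Hypothetical feature matrix (rows = recordings, columns = speech features such as
# VAI, VSA, FCR, VOT) and binary labels (1 = Parkinson's disease, 0 = healthy control).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = np.array([0] * 20 + [1] * 20)
X[y == 1] += 0.8  # crude class separation for the toy example

# k-NN: classify by majority vote among the k closest training samples.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# GMM classifier: fit one mixture per class, assign each sample to the class
# whose mixture gives the higher log-likelihood.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c]) for c in (0, 1)}
def gmm_predict(samples):
    scores = np.column_stack([gmms[c].score_samples(samples) for c in (0, 1)])
    return scores.argmax(axis=1)

print(knn.predict(X[:3]), gmm_predict(X[:3]))
```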
157

Dynamické rozpoznávání scény pro navigaci mobilního robotu / Dynamic Scene Understanding for Mobile Robot Navigation

Mikšík, Ondřej January 2012 (has links)
This master's thesis deals with dynamic scene understanding for mobile robot navigation. In the first part, we present a new approach to "self-supervised" models: a fusion of road vanishing-point estimation based on frequency-domain processing with probabilistic colour models used for segmentation. The road vanishing point is detected by estimating the dominant orientations of the texture flow, obtained with a bank of Gabor wavelets, followed by voting. The vanishing point then defines a training region, which is used for self-supervised learning of the colour models. Finally, road regions are selected by measuring the Mahalanobis distance. A few rules handle situations such as strong shadows, over-exposure, and the speed of adaptation. In addition, the whole vanishing-point estimation is reworked: the wavelets are replaced by approximations using binary box functions, which enables efficient filtering with integral images. The bottleneck of the whole algorithm was the voting itself, so we propose a scheme that first makes a coarse estimate of the vanishing point and then refines it, achieving a considerably higher speed (up to 40x) while the accuracy degrades by only 3-5%. In the second part of the thesis, we present a smoothing filter for the spatio-temporal consistency of predictions, which is important for advanced systems. The key part of the filter is a new metric measuring similarity between classes, which discriminates much better than the standard Euclidean distance; this metric can be used for a wide variety of computer vision tasks. The smoothing filter first estimates the optical flow to define a local neighbourhood, which is then used for recursive filtering based on the similarity metric. The overall accuracy of the proposed method, measured on pixels whose predictions differ between the original data and the filtered output, is almost 18% higher than that of the original predictions. Although we use SHIM as the source of the original predictions, the algorithm can be combined with any other system (MRF, CRF, ...) that provides predictions in the form of probabilities. The proposed filter represents a first step towards full scene reasoning.
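As a rough sketch of the colour-model segmentation step (not the thesis's code), the function below learns a colour model from a training region below the estimated vanishing point and labels as road those pixels whose Mahalanobis distance to the model is small; the colour space and threshold are assumptions.

```python
import numpy as np

def mahalanobis_road_mask(image_lab, train_region, threshold=3.0):
    """Toy colour-model road segmentation.

    image_lab    : (H, W, 3) image in a perceptual colour space (e.g. Lab).
    train_region : (H, W) boolean mask marking the training area defined by the
                   estimated vanishing point.
    Returns a boolean road mask based on the Mahalanobis distance to the model.
    """
    samples = image_lab[train_region].reshape(-1, 3)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(3)   # regularize the covariance
    inv_cov = np.linalg.inv(cov)
    diff = image_lab.reshape(-1, 3) - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)        # squared Mahalanobis distance
    return (np.sqrt(d2) < threshold).reshape(image_lab.shape[:2])
```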
158

Kdy kdo mluví? / Speaker Diarization

Tomášek, Pavel January 2011 (has links)
This work addresses the task of speaker diarization; the goal is to implement a system that is able to decide "who spoke when". The particular components of the implementation are described: the main parts are feature extraction, voice activity detection, speaker segmentation and clustering, and finally post-processing. The work also reports results of the implemented system on test data, including a description of the evaluation. The test data come from the NIST RT evaluations 2005-2007, and the lowest error rate achieved on this dataset is 18.52% DER. The results are compared with the diarization system implemented by Marijn Huijbregts from the Netherlands, who worked on the same data in 2009 and reached 12.91% DER.
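For reference, the diarization error rate (DER) quoted above is conventionally computed as the fraction of scored speech time that is missed, falsely detected, or attributed to the wrong speaker; the sketch below uses hypothetical time totals.

```python
def diarization_error_rate(missed, false_alarm, speaker_confusion, total_speech):
    """DER: the fraction of scored speech time that is missed, falsely
    detected as speech, or attributed to the wrong speaker."""
    return (missed + false_alarm + speaker_confusion) / total_speech

# Hypothetical totals (in seconds) for one test recording:
print(diarization_error_rate(missed=30.0, false_alarm=12.0,
                             speaker_confusion=25.0, total_speech=360.0))
```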
159

Získávání znalostí z obrazových databází / Knowledge Discovery in Image Databases

Jaroš, Ondřej January 2010 (has links)
This thesis is focused on knowledge discovery from databases, especially on methods of classification and prediction, which are described in detail. Furthermore, the work deals with multimedia databases and the way these databases store data; in particular, the processing of low-level image and video data is described. The practical part of the thesis focuses on the implementation of a GMM-based method for extracting low-level features from video data and images. Further sections describe the input data and the tools against which the implemented method was compared. The last section focuses on experiments comparing the efficiency of extracting high-level attributes from the low-level data with the methods implemented in the selected classification tool, LibSVM.
160

Rozšíření pro pravděpodobnostní lineární diskriminační analýzu v rozpoznávání mluvčího / Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition

Plchot, Oldřich Unknown Date (has links)
This thesis deals with probabilistic models for automatic speaker recognition. In particular, it analyzes in detail probabilistic linear discriminant analysis (PLDA), which models low-dimensional representations of utterances in the form of i-vectors. The thesis proposes two extensions of the currently used PLDA model. The newly proposed PLDA model with a full posterior distribution models the uncertainty of i-vector generation. The thesis also proposes a new discriminative approach to training a PLDA-based speaker verification system. When comparing the original PLDA with the model extended by i-vector uncertainty modelling, the extended model achieves up to 20% relative improvement in tests with short recordings. For longer test segments (more than one minute) the gain in accuracy is smaller, but the accuracy of the new model is never lower than that of the baseline system. Training data, however, are usually available in the form of sufficiently long segments, so in these cases the new model offers no advantage during training. The original PLDA model can therefore be used for training, and its extended version can be used for scoring when testing is performed on short speech segments. The discriminative model is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining a score for each pair is derived from PLDA, and the training is based on logistic regression, which minimizes the cross-entropy between the correct labelling of all trials and the probabilistic labelling proposed by the system. The results achieved with the discriminatively trained classifier are similar to those of the generative PLDA, but the discriminative system proves able to produce better-calibrated scores. This ability leads to better actual accuracy on an unseen evaluation set, which is an important property for real-world use.
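A minimal sketch of discriminative trial scoring in the spirit described above: pairs of i-vectors are labelled as target or non-target trials and a logistic regression model is trained to score them, which corresponds to minimizing the cross-entropy between trial labels and system scores. The pair representation used here is a deliberate simplification, not the PLDA-derived functional form from the thesis, and all data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(x, y):
    """Simplified symmetric pair representation for a trial (enrolment, test)."""
    return np.concatenate([x * y, (x + y) ** 2, (x - y) ** 2])

# Hypothetical i-vectors: 100 utterances from 10 synthetic speakers.
rng = np.random.default_rng(0)
speaker_means = rng.normal(size=(10, 20))
ivectors = speaker_means[np.arange(100) % 10] + 0.3 * rng.normal(size=(100, 20))

# Build all trial pairs with labels: 1 = same speaker (target), 0 = different speakers.
trials = [(i, j, int(i % 10 == j % 10)) for i in range(100) for j in range(i + 1, 100)]
X = np.array([pair_features(ivectors[i], ivectors[j]) for i, j, _ in trials])
y = np.array([label for _, _, label in trials])

# Logistic regression minimizes the cross-entropy between the trial labels and the
# scores produced by the model, which tends to yield well-calibrated outputs.
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.decision_function(X[:5])  # verification scores for the first few trials
```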
