  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Subfamily classification of the Defensin gene superfamily

Shikhagaie, Medya January 2004
Defensins are small, cysteine-rich, cationic peptides that play an essential role in the innate immune system of virtually all life forms, from insects and plants to amphibians and mammals. They are mainly an element of innate immunity, exhibiting antibacterial activity by disrupting the cell membrane of a wide range of organisms (Cole et al. 2002). Defensins also affect certain adaptive immune responses, including enhancing phagocytosis, promoting neutrophil recruitment, and enhancing the production of proinflammatory cytokines. The aim of this thesis is to make a comprehensive and accurate subfamily classification of the defensin gene family, based primarily on a constructed library of Hidden Markov Models (HMMs). Results: Sets of known defensins were organized into 84 clusters using the clustering and alignment tool FlowerPower. The clusters were further classified as mammalian alpha- or beta-defensins, plant defensin, insect defensin and defensin MGD. This classification was based on significant cluster hits against the Structural Classification of Proteins (SCOP) database and on species distribution. Based on the relative positions of disulfide bonds and on constructed Multiple Sequence Alignments (MSAs), some sequences were classified as belonging to the sperm- and theta-defensin subfamilies. Compared to PFAM's classification of defensins, the subfamily classification presented here is more informative. The library of HMMs has been made public via a web server that automatically scores and analyzes input sequences against the created database of HMMs. This database and web server are expected to be useful to researchers working on various aspects of defensin action.
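The idea of scoring an input sequence against a library of subfamily models and reporting the best match can be sketched as follows. This is a minimal illustration, not the thesis's web server: it uses small discrete HMMs and the forward algorithm rather than real profile HMMs, and the two toy models, their state names and their probabilities are all invented for the example.

```python
import math

NEG_INF = float("-inf")

def _log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, start, trans, emit):
    """Log P(seq | HMM) computed with the forward algorithm in log space."""
    states = list(start)
    alpha = {s: _log(start[s]) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: log_sum_exp([alpha[s] + _log(trans[s][t]) for s in states])
            + _log(emit[t].get(symbol, 0.0))
            for t in states
        }
    return log_sum_exp(list(alpha.values()))

def classify(seq, library):
    """Score seq against every model and return (best_name, all_scores)."""
    scores = {name: forward_log_likelihood(seq, *hmm) for name, hmm in library.items()}
    return max(scores, key=scores.get), scores

def toy_hmm(core_c, loop_c):
    """Hypothetical two-state model; only the probability of emitting 'C' varies."""
    def emissions(p_c):
        rest = (1.0 - p_c) / 3
        return {"C": p_c, "K": rest, "G": rest, "A": rest}
    start = {"core": 0.5, "loop": 0.5}
    trans = {"core": {"core": 0.7, "loop": 0.3}, "loop": {"core": 0.3, "loop": 0.7}}
    return start, trans, {"core": emissions(core_c), "loop": emissions(loop_c)}

# Invented library of two "subfamilies" over a four-letter toy alphabet.
LIBRARY = {"cys_rich": toy_hmm(0.7, 0.4), "cys_poor": toy_hmm(0.05, 0.1)}

if __name__ == "__main__":
    for query in ("CCCCCC", "AGKAGA"):
        best, scores = classify(query, LIBRARY)
        print(query, best, {k: round(v, 2) for k, v in scores.items()})
```

A real pipeline would build the models from the cluster alignments with a profile HMM package and would usually apply a significance cutoff before accepting the top-scoring subfamily.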
3

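Reconhecimento e predição de promotores procarióticos: investigação de uma metodologia in silico baseada em HMMs / Recognition and prediction of prokaryotic promoters: investigation of an in silico methodology based on HMMs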

Reis, Adriana Neves dos 03 March 2005
Universidade do Vale do Rio dos Sinos / Gene expression in prokaryotes is triggered when the RNA polymerase enzyme interacts with a region adjacent to the gene, called the promoter, which contains the main regulatory elements of the transcription process. Despite steady advances in experimental techniques in molecular biology, characterizing and identifying a significant number of the promoters present in a given genome remains a slow and expensive task. In silico approaches are widely used to recognize these regions in prokaryotes. However, besides yielding a high number of false positives, they are hampered by the lack of a sufficiently large set of known promoters from which to identify patterns conserved across species. A rigorous and reliable method for predicting promoters in any prokaryotic organism therefore remains a challenge. This dissertation proposes a protocol for using hidden Markov models (HMMs) that applies Decision Threshold Estimation and Discrimination Analysis to this problem. Four prokaryotic species are investigated (Escherichia coli, Bacillus subtilis, Helicobacter pylori and Helicobacter hepaticus). The influence of different aspects on recognition and prediction is examined.
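The abstract does not say how the decision threshold is estimated. One common choice, used here purely as an assumption, is to take the HMM score cutoff that maximizes Youden's J (true-positive rate minus false-positive rate) over scores from known promoters and from non-promoter sequences. The function name and the example scores below are hypothetical.

```python
def estimate_threshold(pos_scores, neg_scores):
    """Return (threshold, J) maximizing J = TPR - FPR.

    A sequence is called a promoter when its score is >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

if __name__ == "__main__":
    promoters = [5.0, 6.5, 7.0, 8.2]       # invented HMM scores of known promoters
    background = [1.0, 2.5, 5.5, 3.0]      # invented scores of non-promoter regions
    print(estimate_threshold(promoters, background))
```

Choosing the cutoff this way weighs missed promoters and false positives equally; a protocol that is more concerned with false positives would pick a stricter criterion.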
4

Uma abordagem integrada para a construção e utilização de HMMs de perfil para análises genômicas e metagenômicas / An integrated approach for the construction and application of profile HMMs for genomic and metagenomic analyses.

Kashiwabara, Liliane Santana Oliveira 02 August 2019
Profile HMMs are a powerful way of modeling sequence diversity and constitute a very sensitive approach for detecting remote orthologs. A potential application of such models is the detection of emerging viruses and novel mobile genetic elements. Our group has recently developed GenSeed-HMM, a tool that employs profile HMMs as seeds for gene-targeted progressive assembly using either genomic or metagenomic data. In this work we developed TABAJARA, a program for the rational design of profile HMMs. Starting from a multiple sequence alignment, TABAJARA is able to find blocks that are either (1) conserved across all sequences or (2) discriminative for two or more specific groups of sequences. The program uses different metrics to assign position-specific scores along the whole alignment and then uses a sliding window to find the top-scoring regions. Selected alignment blocks are then extracted and used to build profile HMMs. To validate the method, we employed TABAJARA to construct models for viruses of the Flavivirus genus and phages of the Microviridae family. In both viral groups we were able to obtain wide-range models, able to detect all members of the respective taxonomic group, and narrower models that are specific to particular Flavivirus species (e.g. DENV, ZIKV or YFV) or Microviridae subfamilies (e.g. Alpavirinae, Gokushovirinae and Pichovirinae). In another validation, we used sequences of the endonuclease Cas1 to obtain models capable of differentiating CRISPRs from casposons, the latter representing a superfamily of self-synthesizing DNA transposons that gave rise to the prokaryotic CRISPR-Cas immune system. TABAJARA succeeded in generating models specific to casposon-derived Cas1, enabling their differentiation from CRISPR orthologs.
We also developed HMM-Prospector, a tool that uses a batch of profile HMMs to screen genomic or metagenomic sequencing data. The program reports which models are most often recognized by the reads under user-defined score cutoff values, and how many reads are detected by each model. With this information, the most relevant models can be used as seeds in progressive assemblies with the GenSeed-HMM program, providing an integrated approach for model construction and application.
Finally, we developed e-Finder, a generic application for detecting and extracting multigene elements from assembled genomes or metagenomes using profile HMMs. e-Finder runs similarity searches of profile HMMs against translated sequences of the assembled data and then checks whether pre-defined synteny criteria have been fulfilled, including the minimum number of genes, the gene order and the intergenic distances. Element sequences are then extracted, and their ORFs are identified and conceptually translated into full-length protein sequences. To validate the tool, we employed two case studies, prophages of the Microviridae family and casposons, using specific profile HMMs constructed with TABAJARA. In both cases, we ran e-Finder on the PATRIC database, a repository with over 135,000 bacterial and archaeal genomes. We identified a total of 91 casposon-positive contigs from 79 distinct genomes. In the case of Microviridae, we found a total of 104 prophage candidates, extending the known range of bacterial hosts. In both cases, phylogenetic analyses confirmed the correct taxonomic assignment of the positive sequences.
The programs developed in this work can be used alone or in combination to detect and discriminate known or distantly related sequences. Together with GenSeed-HMM, these programs provide an integrated toolbox with potential application in the search for novel viruses and mobile genetic elements, as well as in any other task related to the detection and/or discrimination of subgroups of nucleotide or protein sequence families.
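The block-finding step described for TABAJARA (position-specific scores followed by a sliding window) can be sketched as below. The abstract does not name the metrics, so the conservation score used here, one minus the normalized Shannon entropy of each column, is an assumption, as are the function names and the toy alignment. Only the "conserved" mode is shown; the discriminative mode is not.

```python
import math
from collections import Counter

def column_conservation(column, alphabet_size=20):
    """1 - normalized Shannon entropy of one alignment column (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log2(alphabet_size)

def best_block(alignment, window):
    """Return (start, mean_score) of the highest-scoring window of columns."""
    length = len(alignment[0])
    scores = [column_conservation([row[i] for row in alignment]) for i in range(length)]
    best_start, best_mean = 0, float("-inf")
    for start in range(length - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean

def extract_block(alignment, start, window):
    """Cut the selected columns out of every aligned sequence."""
    return [row[start:start + window] for row in alignment]

# Invented toy alignment: columns 2-4 are fully conserved.
ALIGNMENT = [
    "AKCWCGT",
    "GRCWCAS",
    "TECWCLV",
]

if __name__ == "__main__":
    start, mean = best_block(ALIGNMENT, window=3)
    print(start, round(mean, 3), extract_block(ALIGNMENT, start, 3))
```

The extracted block would then be handed to a profile HMM builder; that step is outside this sketch.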
5

Chereme-Based Recognition of Isolated, Dynamic Gestures from South African Sign Language with Hidden Markov Models

Rajah, Christopher January 2006
Masters of Science / Much work has been done in building systems that can recognise gestures, e.g. as a component of sign language recognition systems. These systems typically use whole gestures as the smallest unit for recognition. Although high recognition rates have been reported, these systems do not scale well and are computationally intensive. They scale poorly because they recognise gestures by building an individual model for each separate gesture; as the number of gestures grows, so does the required number of models. Beyond a certain threshold number of gestures to be recognised, this approach becomes infeasible. This work proposes that similarly good recognition rates can be achieved by building models for subcomponents of whole gestures, so-called cheremes. Instead of building models for entire gestures, we build models for cheremes and recognise gestures as sequences of such cheremes. The assumption is that many gestures share cheremes and that the number of cheremes necessary to describe gestures is much smaller than the number of gestures. This small number of cheremes then makes it possible to recognise a large number of gestures with a small number of chereme models. This approach is akin to phoneme-based speech recognition systems, where utterances are recognised as phonemes which in turn are combined into words. We attempt to recognise and classify cheremes found in South African Sign Language (SASL). We introduce a method for the automatic discovery of cheremes in dynamic signs. We design, train and use hidden Markov models (HMMs) for chereme recognition. Our results show that this approach is feasible in that it not only scales well, but also generalises well. We are able to recognise cheremes in signs that were not used for training HMMs; this generalisation ability is a basic necessity for chereme-based gesture recognition. Our approach can thus lay the foundation for building a SASL dynamic gesture recognition system.
6

Reconnaissance de caractères par méthodes markoviennes et réseaux bayésiens / Character recognition using Markovian methods and Bayesian networks

Hallouli, Khalid 05 1900
This thesis addresses the recognition of printed and handwritten characters using Markovian methods and Bayesian networks. The first part builds a stochastic Markovian model using classical HMMs in two settings: semi-continuous and discrete. A first HMM is obtained from observations consisting of pixel columns (HMM-vertical), and a second from observations consisting of pixel rows (HMM-horizontal). We then propose two types of fusion models: a score-fusion model, which combines the two likelihoods produced by the two HMMs, and a data-fusion model, which groups the row and column observations together. The results show the importance of the semi-continuous case and the good performance of the fusion models. In the second part we develop static and dynamic Bayesian networks, with the Jensen-Lauritzen-Olesen (JLO) algorithm serving as the exact inference engine, as well as parameter learning from complete and incomplete data. We propose an approach to character recognition (printed and handwritten) based on the formalism of dynamic Bayesian networks. We build several types of models: an HMM expressed as a dynamic Bayesian network, a trajectory model, and coupled models. The results obtained highlight the good performance of the coupled models. Overall, our applications lead us to conclude that Bayesian networks are effective and very promising, because they model the dependencies between different observations in character images.
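The score-fusion model can be pictured with a small sketch. The abstract only says that the two likelihoods are combined; the weighted sum of per-class log-likelihoods used here, the `weight` parameter and the example scores are assumptions made for illustration.

```python
def fuse_scores(loglik_vertical, loglik_horizontal, weight=0.5):
    """Weighted sum of per-class log-likelihoods from the two HMMs.

    weight = 1.0 trusts only the column-based (vertical) model,
    weight = 0.0 only the row-based (horizontal) model.
    """
    return {
        label: weight * loglik_vertical[label]
        + (1.0 - weight) * loglik_horizontal[label]
        for label in loglik_vertical
    }

def recognize(loglik_vertical, loglik_horizontal, weight=0.5):
    """Return the character class with the highest fused score."""
    fused = fuse_scores(loglik_vertical, loglik_horizontal, weight)
    return max(fused, key=fused.get)

if __name__ == "__main__":
    # Invented log-likelihoods for one character image under three class models.
    vertical = {"a": -10.0, "o": -9.5, "e": -14.0}
    horizontal = {"a": -8.0, "o": -11.0, "e": -13.0}
    print(recognize(vertical, horizontal))        # fused decision
    print(recognize(vertical, horizontal, 1.0))   # vertical model alone
```

With equal weights the sum of log-likelihoods corresponds to multiplying the two likelihoods, which treats the row and column views as independent; in practice the weight would be tuned on held-out data.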
7

From protein sequence to structural instability and disease

Wang, Lixiao January 2010
A great challenge in bioinformatics is to accurately predict protein structure and function from amino acid sequence, including the annotation of protein domains, the identification of disordered regions and the detection of protein stability changes resulting from amino acid mutations. The combination of bioinformatics, genomics and proteomics has become essential for investigating the biological, cellular and molecular aspects of disease, and can therefore contribute greatly to the understanding of protein structures and to drug discovery. In this thesis, a PREDICTOR is presented, consisting of three machine learning methods applied to three different but related structural bioinformatics tasks: using profile Hidden Markov Models (HMMs) to identify remote sequence homologues on the basis of protein domains; predicting order and disorder in proteins using Conditional Random Fields (CRFs); and applying Support Vector Machines (SVMs) to detect protein stability changes due to single mutations. To facilitate studies of structural instability and disease, these methods are implemented in three web servers: FISH, OnD-CRF and ProSMS, respectively. For FISH, most of the work presented in the thesis focuses on the design and construction of the web server. The server is based on a collection of structure-anchored hidden Markov models (saHMMs), which are used to identify structural similarity at the protein domain level. For the order and disorder prediction server, OnD-CRF, I implemented two schemes to alleviate the imbalance between ordered and disordered amino acids in the training dataset. One prunes the protein sequence in order to obtain a balanced training dataset. The other tries to find the optimal p-value cut-off for discriminating between ordered and disordered amino acids. Both schemes enhance the sensitivity of detecting disordered amino acids in proteins.
In addition, the output from the OnD-CRF web server can be used to identify flexible regions, as well as to predict the effect of mutations on protein stability. For ProSMS, we propose, after careful evaluation with different methods, a homology-clustered model and a non-clustered model for a three-state classification of protein stability changes due to single amino acid mutations. Results for the non-clustered model reveal that the prediction accuracy based on sequence alone is comparable to the accuracy based on protein 3D structure information. In the case of the clustered model, however, the prediction accuracy is significantly improved when protein tertiary structure information, in the form of local environmental conditions, is included. Comparing the prediction accuracies of the two models indicates that predicting the stability effect of mutations in proteins that are not homologous is still a challenging task. Benchmarking results show that, as stand-alone programs, these predictors can be comparable or superior to previously established predictors. Combined into a program package, these mutually complementary predictors will facilitate the understanding of structural instability and disease from protein sequence.
8

Construção e aplicação de HMMs de perfil para a detecção e classificação de vírus / Construction and application of profile HMMs for the specific detection and classification of viruses

Guimarães, Miriã Nunes 22 February 2019
Viruses are the most abundant biological entities found in nature. The classical way of studying viruses requires their isolation and propagation in vitro. However, most existing viruses are still unknown because the conditions required for their successful cultivation in cells have not been identified. Metagenomic analyses offer an alternative for the detection and characterization of novel viruses, since prior cultivation is not required and the samples may contain genetic material from multiple organisms. Once assembled sequences are obtained from the metagenomic reads, the most widely used method for identifying and classifying organisms is a BLAST similarity search against databases of known sequences. However, pairwise alignment methods are only able to identify sequences with identity greater than 20-30%. Profile-based methods may increase the sensitivity of detection of remote homologues. Profile HMMs are probabilistic models capable of representing the diversity of residues at specific positions of a multiple sequence alignment. Our group developed TABAJARA, the tool used in this project, for the identification of alignment blocks that are conserved across all sequences of the alignment or discriminative between groups of sequences. These blocks are used to generate profile HMMs, which can in turn be used, in the context of virology, to identify broad taxonomic groups, such as viral families, or narrower taxa such as genera or viral species. The present project aimed to apply and standardize the use of TABAJARA in different taxonomic groups of viruses, to build specific models for each of these groups and to validate these models on metagenomic data.
We used three viral models for this study. The first was the order Bunyavirales, composed of mostly enveloped and spherical ssRNA(-) viruses with a segmented genome, belonging to group 5 of the Baltimore classification. This group includes viruses that cause several important diseases in humans, animals and plants. The second was the family Togaviridae, composed of enveloped and spherical ssRNA(+) viruses with a genome coding for a polyprotein, belonging to group 4 of the Baltimore classification. This group includes the Chikungunya virus and other viral species that cause relevant pathologies in humans and animals. Finally, we used the subfamily Spounavirinae, comprising bacteriophages that infect a variety of bacterial hosts and that, in some cases, have proven therapeutic potential against human bacterial infections. These phages have non-enveloped virions with a head-tail structure and a dsDNA genome, and belong to group 1 of the Baltimore classification.
All constructed profile HMMs were evaluated with regard to their sensitivity and specificity of detection, and were then tested in viral surveys using metagenomic data from the NCBI SRA database. The profile HMMs showed excellent performance, demonstrating the viability of the methodology proposed in this project. The results presented in this work open the prospect of the wide use of profile HMMs as universal tools for the detection and classification of viruses in metagenomic data.
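Evaluating a model's sensitivity and specificity of detection amounts to comparing its hits with the known membership of a test set. The sketch below assumes a simple setup that the abstract does not spell out: every test sequence has one score against the model, a sequence counts as a hit when its score reaches a chosen cutoff, and the true members of the taxon are known. The sequence identifiers, scores and cutoff are invented.

```python
def evaluate(scores, true_members, cutoff):
    """Return (sensitivity, specificity) of a model at a given score cutoff.

    scores:       dict mapping sequence id -> model score
    true_members: set of ids that really belong to the target group
    """
    tp = fp = tn = fn = 0
    for seq_id, score in scores.items():
        hit = score >= cutoff
        member = seq_id in true_members
        if hit and member:
            tp += 1
        elif hit:
            fp += 1
        elif member:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

if __name__ == "__main__":
    scores = {"s1": 120.0, "s2": 85.0, "s3": 30.0, "s4": 95.0, "s5": 10.0, "s6": 15.0}
    members = {"s1", "s2", "s3"}
    print(evaluate(scores, members, cutoff=50.0))
```

Sweeping the cutoff and repeating the evaluation shows the trade-off between missing true members and accepting unrelated sequences.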
9

Speech Signal Classification Using Support Vector Machines

Sood, Gaurav 07 1900
Hidden Markov Models (HMMs) are undoubtedly the most widely employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, however, the dependence on Hidden Markov Models remains a fact. During the last decade, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. In this work, a novel approach based on probabilistic kernels in support vector machines has been attempted for speech data classification. The accuracy of support vector classification depends on the kernel function used, which in turn depends on the data set at hand, but as of now there is no way to know a priori which kernel will give the best results. The kernel used in this work fits a probability distribution over the individual data points of each utterance, which normalizes the time dimension inherent to speech signals and facilitates the use of support vector machines, since they act on static data only. The divergence between these probability distributions fitted over individual speech utterances is used to form the kernel matrix. Vowel classification and isolated word recognition (digit recognition) have been attempted, and the results are compared with state-of-the-art systems.
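A kernel built from the divergence between per-utterance distributions can be sketched as follows. The abstract does not name the distribution family or the divergence, so everything specific here is an assumption: each utterance is summarized by a diagonal Gaussian over its feature frames, the divergence is the symmetric Kullback-Leibler divergence, and the kernel value is `exp(-scale * divergence)`. The function names, the `scale` parameter and the toy feature frames are hypothetical.

```python
import math

def fit_diag_gaussian(frames, var_floor=1e-6):
    """Per-dimension mean and variance of a variable-length list of frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dims)
    ]
    return means, variances

def kl_diag(p, q):
    """KL(p || q) between two diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def divergence_kernel(utt_a, utt_b, scale=1.0):
    """exp(-scale * symmetric KL) between the Gaussians fitted to two utterances."""
    p, q = fit_diag_gaussian(utt_a), fit_diag_gaussian(utt_b)
    return math.exp(-scale * (kl_diag(p, q) + kl_diag(q, p)))

def kernel_matrix(utterances, scale=1.0):
    """Full kernel matrix over a list of utterances of possibly different lengths."""
    return [[divergence_kernel(a, b, scale) for b in utterances] for a in utterances]

if __name__ == "__main__":
    # Invented 2-dimensional feature frames; the utterances differ in length.
    utt_a = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
    utt_b = [[1.0, 2.0], [3.0, 2.5], [2.0, 4.0], [2.0, 3.5]]
    for row in kernel_matrix([utt_a, utt_b]):
        print([round(v, 4) for v in row])
```

Because each utterance is reduced to fixed-size distribution parameters, utterances of different durations become comparable, which is the time normalization the abstract refers to. A kernel of this form may not be positive semidefinite for every data set, so the resulting matrix should be checked before it is passed to an SVM solver as a precomputed kernel.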
10

Discovery Of Application Workloads From Network File Traces

Yadwadkar, Neeraja 12 1900
An understanding of the Input/Output data access patterns of applications is useful in several situations. First, gaining insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps to create benchmarks that closely mimic realistic application behavior. Third, it enables autonomic systems, as the information obtained can be used to adapt the system in a closed loop. All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. It is well known that network file system traces are an important source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are a sequence of primitive file system operations and their parameters. Simple counting, statistical analysis or deterministic search techniques are inadequate for discovering application-level semantics in the general case, because of the inherent variation and noise in realistic traces. In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations representing higher-level application actions. It is robust enough to work around discrepancies between training and target traces, such as differences in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sampling of the trace, enabling faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting. We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications, such as compilations, as well as industrial-strength benchmarks such as TPC-C and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.
