91

Pattern Matching for Financial Time Series Data

Liu, Ching-An 29 July 2008 (has links)
In security markets, stock price movements are closely linked to market information. For example, the subprime mortgage crisis triggered a global financial crisis in 2007, and drops occurred in virtually every stock market in the world; after the Federal Reserve took several steps to address the crisis, stock markets gradually stabilized. Traders' reactions to arriving information produce different patterns of stock price movement, so pattern matching is an important subject in future-movement prediction, rule discovery, and computer-aided diagnosis. In this research, we propose a pattern matching procedure to capture similar intraday stock price movements of two listed companies. First, a longest-common-subsequence algorithm is introduced to sieve out the time intervals in which the two companies have the same integrated volatility levels and price rise/drop trends. Next, we transform the raw price data in the matched time periods into Bollinger Band Percent (%B) data, then use the power spectrum to extract low-frequency components. Adjusted Pearson chi-squared tests are performed to analyze the similarity of the price movement patterns in these periods. We first validate the procedure by simulation, then apply it to high-frequency transaction data from the NYSE.
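The Bollinger Band Percent transform mentioned above maps each price onto its position within rolling volatility bands. A minimal sketch of the idea in Python (the 20-period window and 2-standard-deviation band width are conventional defaults, not parameters taken from the abstract):

```python
import numpy as np

def bollinger_percent_b(prices, window=20, k=2.0):
    """Bollinger Band Percent: %B = (price - lower) / (upper - lower),
    with bands at rolling mean +/- k rolling standard deviations."""
    prices = np.asarray(prices, dtype=float)
    pb = np.full(prices.shape, np.nan)           # undefined until a full window
    for t in range(window - 1, len(prices)):
        w = prices[t - window + 1 : t + 1]
        mid, sd = w.mean(), w.std()
        upper, lower = mid + k * sd, mid - k * sd
        if upper > lower:                        # flat windows have no bands
            pb[t] = (prices[t] - lower) / (upper - lower)
    return pb
```

Values near 1 indicate prices pressing against the upper band and values near 0 the lower band; the low-frequency analysis described in the abstract would then operate on this %B series.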
92

Evaluation of biometric security systems against artificial fingers

Blommé, Johan January 2003 (has links)
Verification of users' identities is normally carried out via PIN codes or ID cards. Biometric identification, identification by unique body features, offers an alternative to these methods.

Fingerprint scanning is the most common biometric identification method in use today. It is simple and quick, and has therefore been favored over other biometric methods such as retina scanning or signature verification.

In this report, biometric security systems based on fingerprint scanners are evaluated. The evaluation focuses on copies of real fingers (artificial fingers) as the intrusion method, but it also covers the identification algorithms currently in use and the strengths and weaknesses of the hardware solutions employed.

The artificial fingers used in the evaluation were made of gelatin, as it resembles the surface of human skin in moisture, electrical resistance, and texture. Artificial fingers were cast from ten subjects, whose real fingers and artificial counterparts were tested on three different fingerprint scanners. All scanners tested accepted artificial fingers as substitutes for real fingers; results varied between users and scanners, but the artificial fingers were accepted between roughly one quarter and half of the time.

Techniques used in image enhancement, minutiae analysis, and pattern matching are analyzed. Within image enhancement, normalization, binarization, quality markup, and low-pass filtering are described. In minutiae analysis, connectivity numbers, point identification, and skeletonization (thinning algorithms) are analyzed. Within pattern matching, direction field analysis and principal component analysis are described. Finally, hybrid models combining minutiae analysis and pattern matching are mentioned.

Based on the experiments made and the analysis of the techniques used, a recommendation for the future use and development of fingerprint scanners is given.
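The connectivity (crossing) number mentioned in the minutiae-analysis paragraph has a compact operational definition: on a thinned, one-pixel-wide ridge skeleton, count the binary transitions around a pixel's eight neighbours; half that count is 1 at a ridge ending and 3 at a bifurcation. A rough sketch of that classification, not the report's own implementation:

```python
import numpy as np

def minutiae_from_skeleton(skel):
    """Classify skeleton pixels by crossing number:
    CN = (transitions around the 8-neighbourhood) / 2.
    CN == 1 marks a ridge ending, CN == 3 a bifurcation."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]    # cyclic neighbour order
    endings, bifurcations = [], []
    rows, cols = skel.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if not skel[r, c]:
                continue
            nb = [int(skel[r + dr, c + dc]) for dr, dc in offs]
            cn = sum(abs(nb[i] - nb[(i + 1) % 8]) for i in range(8)) // 2
            if cn == 1:
                endings.append((r, c))
            elif cn == 3:
                bifurcations.append((r, c))
    return endings, bifurcations
```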
93

Algorithms for the management and analysis of biological sequence data with applications to bioinformatics problems

Περδικούρη, Αικατερίνη 26 February 2009 (has links)
The object of this doctoral thesis is the study and design of efficient algorithms for the management and analysis of biological sequence data. The algorithms we describe apply to bioinformatics problems such as the recognition of known or unknown patterns in DNA and RNA that are involved in various biological activities, as well as the discovery of periodicities. More specifically, the algorithms we present are used for the analysis of biological sequences with "don't care" characters and of weighted biological sequences. Biological sequences with "don't care" characters usually represent protein families, while weighted biological sequences represent assembled sequences of recently sequenced genomes. For biological sequences with "don't care" characters, we present two efficient linear-time algorithms for computing the period and the cover; the second algorithm also applies to circular DNA. For weighted biological sequences, we present two algorithms for computing the basic periodicities, the period and the cover, and we also solve the pattern matching problem. The need for efficient management of weighted biological sequences prompted us to introduce a new data structure, the Weighted Suffix Tree, which solves the two preceding problems efficiently. Using the Weighted Suffix Tree, we solve various instances of the motif discovery problem in weighted biological sequences. Finally, we study the use of genetic algorithms and evolutionary programming in the analysis of biological sequences; the result of this study is the description of a genetic algorithm that computes the repetitions in a biological sequence.
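A weighted sequence assigns each position a probability for every character, and a pattern is typically said to occur at a position when the product of the matched characters' probabilities meets a threshold. A small illustration of that occurrence test (the dict-per-position encoding and the threshold parameter are illustrative choices, not the thesis's Weighted Suffix Tree):

```python
def weighted_occurrences(wseq, pattern, threshold):
    """Positions where `pattern` occurs in a weighted sequence.
    wseq: list of dicts mapping character -> probability at that position.
    An occurrence requires the product of matched probabilities >= threshold."""
    hits = []
    n, m = len(wseq), len(pattern)
    for i in range(n - m + 1):
        p = 1.0
        for j, ch in enumerate(pattern):
            p *= wseq[i + j].get(ch, 0.0)
            if p < threshold:                    # probabilities only shrink
                break
        if p >= threshold:
            hits.append(i)
    return hits

# e.g. an ambiguous first base: "AA" occurs at position 0 with probability 0.5
print(weighted_occurrences([{"A": 0.5, "C": 0.5}, {"A": 1.0}], "AA", 0.25))
```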
94

A Compiler for the dependently typed language Beluga

Ferreira Ruiz, Francisco 05 1900 (has links)
In computer science, structures with variable binders are very common; programming languages and logical frameworks are examples. Writing programs that deal with these kinds of data therefore benefits from explicit support for data binding, and the Beluga programming environment is an example of such a system. In this work we develop and present a compiler for the system. Many of the programs that Beluga is especially well suited for writing can benefit from a compiler: type-checkers, compilers, and interpreters take advantage of the binder support and dependent types present in the language, and also require a reasonably fast run-time. Our goal in this work is to present a compiler for the Beluga system that uses a versatile internal representation, which aids the development of the system and allows code to be shared between several back-ends. Furthermore, we present a way of compiling the uniquely powerful pattern language supported by Beluga.
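Beluga's pattern language covers binders and dependent types, far more than fits in a few lines, but the first-match case analysis at the core of any such compiler can be sketched over a toy constructor language (every name below is illustrative, not Beluga syntax):

```python
def eval_match(value, branches):
    """First-match-wins case analysis on constructor-encoded values.
    A value is (constructor, args); a branch is (constructor, params, body)."""
    ctor, args = value
    for name, params, body in branches:
        if name == ctor and len(params) == len(args):
            return body(dict(zip(params, args)))
    raise ValueError(f"match failure on constructor {ctor!r}")

# length of a cons-list encoded as ("nil", ()) / ("cons", (head, tail))
def length(lst):
    return eval_match(lst, [
        ("nil",  (),         lambda env: 0),
        ("cons", ("h", "t"), lambda env: 1 + length(env["t"])),
    ])

print(length(("cons", (7, ("cons", (8, ("nil", ())))))))  # 2
```

A compiler turns this kind of dispatch into decision trees or jump tables rather than a linear scan, which is part of what makes compiled pattern matching attractive for the type-checkers and interpreters mentioned above.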
95

Simplifying the Analysis of C++ Programs

Solodkyy, Yuriy 16 December 2013 (has links)
Based on our experience of working with different C++ front ends, this thesis identifies numerous problems that complicate the analysis of C++ programs along the entire spectrum of analysis applications. We utilize library, language, and tool extensions to address these problems and offer solutions to many of them. In particular, we present efficient, expressive, and non-intrusive means of dealing with the abstract syntax trees of a program, which together render the visitor design pattern obsolete. We further extend C++ with open multi-methods to deal with the broader expression problem. Finally, we offer two techniques, one based on refining the type system of a language and the other on abstract interpretation, both of which allow developers to statically ensure or verify various run-time properties of their programs without having to deal with the full language semantics or even the abstract syntax tree of a program. Together, the solutions presented in this thesis put the verification of properties of interest in C++ programs within the reach of average language users.
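The point about pattern matching rendering the visitor design pattern obsolete is easy to see in any language with built-in matching: a traversal needs no accept/visit plumbing. A sketch in Python (used here only for brevity; the thesis itself works in and on C++):

```python
from dataclasses import dataclass

@dataclass
class Lit:
    val: int

@dataclass
class Add:
    lhs: object
    rhs: object

@dataclass
class Mul:
    lhs: object
    rhs: object

def evaluate(e):
    """One function, no visitor hierarchy: each AST case is handled in place."""
    match e:
        case Lit(v):
            return v
        case Add(l, r):
            return evaluate(l) + evaluate(r)
        case Mul(l, r):
            return evaluate(l) * evaluate(r)
        case _:
            raise TypeError(f"not an expression: {e!r}")

print(evaluate(Add(Lit(2), Mul(Lit(3), Lit(4)))))  # 14
```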
96

Coloring, packing and embedding of graphs

Tahraoui, Mohammed Amin 04 December 2012 (has links) (PDF)
In this thesis, we investigate some problems in graph theory, namely the graph coloring problem, the graph packing problem, and tree pattern matching for XML query processing. The common point between these problems is that they use labeled graphs.

In the first part, we study a new coloring parameter of graphs called the gap vertex-distinguishing edge coloring. It consists of an edge coloring of a graph G which induces a vertex-distinguishing labeling of G, such that the label of each vertex is given by the difference between the highest and the lowest colors of its incident edges. The minimum number of colors required for a gap vertex-distinguishing edge coloring of G is called the gap chromatic number of G, denoted gap(G). We compute this parameter for a large set of graphs G of order n and prove that gap(G) ∈ {n-1, n, n+1}.

In the second part, we focus on graph packing problems, an area of graph theory that has grown significantly over the past several years. The majority of existing work, however, focuses on unlabeled graphs. In this thesis, we introduce for the first time the packing problem for vertex-labeled graphs: roughly speaking, a graph packing that preserves the labels of the vertices. We study the corresponding optimization parameter on several classes of graphs, and give general bounds and characterizations.

The last part deals with the query processing of a core subset of XML query languages: XML twig queries. An XML twig query, represented as a small query tree, is essentially a complex selection on the structure of an XML document. Matching a twig query means finding all the occurrences of the query tree embedded in the XML data tree. Many holistic twig join algorithms have been proposed to match XML twig patterns. Most of these algorithms find twig pattern matches in two steps: first, the query tree is decomposed into smaller pieces and solutions for these pieces are found; second, all of these partial solutions are joined together to generate the final solutions. In this part, we propose a novel holistic twig join algorithm, called TwigStack++, which features two main improvements in the decomposition and matching phases. The proposed solutions are shown to be efficient and scalable, and should be helpful for future research on efficient query processing in large XML databases.
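The labeling rule from the first part is simple to state operationally: each vertex is labeled by the gap, max minus min, among the colors of its incident edges. A small checker for candidate colorings (assuming, as is common for this parameter, that a degree-one vertex is labeled by the color of its single edge):

```python
def is_gap_vertex_distinguishing(n_vertices, colored_edges):
    """colored_edges: iterable of (u, v, color) triples.
    Returns True when the induced vertex labels are pairwise distinct."""
    incident = {v: [] for v in range(n_vertices)}
    for u, v, color in colored_edges:
        incident[u].append(color)
        incident[v].append(color)
    labels = []
    for cols in incident.values():
        if not cols:
            return False                 # an isolated vertex gets no label
        labels.append(cols[0] if len(cols) == 1 else max(cols) - min(cols))
    return len(labels) == len(set(labels))

# a path on 3 vertices with edge colors 1 and 3: labels are 1, 2, 3 -> valid
print(is_gap_vertex_distinguishing(3, [(0, 1, 1), (1, 2, 3)]))  # True
```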
97

Privacy label generation through keywords and pattern matching

Pontes, Diego Roberto Gonçalves de 13 July 2016 (has links)
Users do not usually read the privacy policies of online services. Among the main reasons is the fact that such policies are long and often hard to understand, so users lose interest in reading them carefully. In this scenario, users tend to agree to a policy's terms without knowing what kind of data is being collected and why. This dissertation discusses how a policy's content may be presented in a friendlier way, showing information about data collection and usage in a table herein called the Privacy Label. The Privacy Label is a table whose rows are named after data collection terms and whose columns are named after expressions that describe how the data is used by the service; each cell states whether the policy collects a particular kind of data for a particular use. To generate the Privacy Label, a set of privacy policies was studied to identify which terms repeat most often across the texts. Keyword extraction techniques were used, and from these keywords we created privacy categories. The categories define which kinds of data are collected and why, and they are represented by cells in the Privacy Label. Using word comparison techniques, a privacy policy can be analyzed and important information extracted by comparing its terms with the terms of the privacy categories; each category found is shown in the Privacy Label. To assess the proposed approach we developed an application prototype, herein called PPMark, that analyzes a particular privacy policy, extracts its keywords, and generates the Privacy Label automatically. The quality of the extracted information was evaluated using three standard classifier metrics: precision, recall, and F-measure. The results show that the approach is a viable, functional alternative for generating the Privacy Label and presenting privacy policies in a friendly manner. There is evidence of time savings from using the approach, which facilitates decision making.
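The core mechanism, matching a policy's words against keyword lists attached to (data, purpose) categories, can be sketched in a few lines. The categories and keywords below are invented placeholders; the dissertation derives its categories from a keyword study over real policies:

```python
# hypothetical (data item, purpose) cells and their trigger keywords
CATEGORIES = {
    ("email address", "marketing"): ["email", "newsletter", "promotional"],
    ("location", "analytics"):      ["location", "gps", "geolocation"],
    ("cookies", "advertising"):     ["cookie", "ad network", "advertising"],
}

def privacy_label(policy_text):
    """Cells of the Privacy Label to tick: a cell is ticked when any of
    its keywords appears in the policy text."""
    text = policy_text.lower()
    return {cell for cell, kws in CATEGORIES.items()
            if any(kw in text for kw in kws)}

print(privacy_label("We use cookies and share your email for newsletters."))
# {('cookies', 'advertising'), ('email address', 'marketing')}
```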
98

On the lambda calculus with constructors

Petit, Barbara 13 July 2011 (has links)
The lambda calculus with constructors was introduced by Arbiser, Miquel and Rios in the early 2000s as an extension of the lambda calculus with pattern matching features. It decomposes ML-style pattern matching into a case analysis on constant constructors (in the spirit of the case instruction in Pascal) and a commutation rule between the case construct and application. This commutation rule between two different kinds of constructions yields a surprising computational behaviour, a priori not compatible with the usual typing intuitions. Nevertheless, the whole calculus was proved confluent, and it enjoys the separation property (a version of Böhm's lemma).

In this thesis we propose a polymorphic type system for this calculus, and we develop a realisability model based on Girard's reducibility candidates. This leads to a strong normalisation result for the typed calculus and guarantees that the type system prevents match failure. Next we focus on semantics for the untyped calculus. We first define a generic notion of model for the lambda calculus with constructors in Cartesian closed categories; we then construct the syntactic model in the category of PERs and deduce a completeness result from it.

Finally, we consider a translation of the lambda calculus with constructors into the simply typed lambda calculus with pairs, relying on continuation-passing-style techniques. This enables the simulation of the lambda calculus with constructors by a well-known calculus, and it transforms every continuation model into a model of the lambda calculus with constructors. A categorical equation characteristic of these models then appears, which enables the construction of non-syntactic models in Scott domains.
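The commutation rule mentioned above pushes an application inside the branches of a case construct. As a rough sketch of one reading of that rule over a bare-bones AST: ({|ci -> ti|} . t) u rewrites to {|ci -> ti u|} . t.

```python
from dataclasses import dataclass

@dataclass
class App:
    fun: object
    arg: object

@dataclass
class Case:              # {| c1 -> t1; ... |} . scrutinee
    branches: dict       # constructor name -> branch term
    scrutinee: object

def commute_app_case(term):
    """One commutation step: when a case construct is applied to an
    argument, push that application into every branch."""
    if isinstance(term, App) and isinstance(term.fun, Case):
        case = term.fun
        pushed = {c: App(t, term.arg) for c, t in case.branches.items()}
        return Case(pushed, case.scrutinee)
    return term          # no redex at the root
```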
99

Cache-efficient representations for Wavelet tree based indexes

SILVA, Israel Batista Freitas da 12 December 2016 (has links)
Today, there is exponential growth in the volume of information in the world. This increase creates demand for more efficient indexing and querying techniques, since, to be useful, that data needs to be manageable. Pattern matching means searching for a string (the pattern) in a much bigger string (the text), reporting the number of occurrences and/or their locations. To do that, we build a data structure known as an index, which preprocesses the text to allow efficient queries. The adoption of an index depends heavily on its practical efficiency, and this is directly related to how well it performs on current machine architectures. The main objective of this work is to analyze the Wavelet Tree data structure as an index, assessing the impact of its internal data organization on spatial locality, and to propose layouts that reduce the number of cache misses incurred by its operations. We performed an empirical analysis using both simulated data and real textual data from two public repositories to compare the running time and cache behavior of Wavelet Trees under five different internal data layouts; a theoretical analysis of the cache complexity of a query operation is also presented for the most efficient layout. Two experiments suggest good asymptotic behavior for two of the analyzed layouts. A third experiment shows that, for four of the five layouts, there was a systematic reduction in the number of cache misses in the lowest-level cache. This reduction, however, was not fully reflected in the running time, nor in the behavior of the highest-level cache, where both positive and negative variations occurred. The results allow us to conclude that the choice of a suitable layout can lead to a significant improvement in cache usage. Unlike in the theoretical model, the cost of memory access accounts for only a fraction of the computation time of operations on Wavelet Trees, so the decrease in the number of cache misses did not translate fully into gains in execution time; this factor can still be critical, however, in more extreme memory-utilization situations.
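For readers unfamiliar with the structure: a wavelet tree splits the alphabet at each node and stores a bitmap saying to which half each symbol goes, so a rank query (count of a symbol in a prefix) descends one root-to-leaf path. A minimal pointer-based sketch, with none of the cache-conscious layouts the dissertation studies:

```python
class WaveletTree:
    """Rank queries over an integer sequence: rank(c, i) counts
    occurrences of symbol c among the first i positions."""

    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        if self.lo == self.hi:                  # leaf: a single symbol
            self.n = len(seq)
            return
        mid = (self.lo + self.hi) // 2
        # prefix counts of "goes right" bits stand in for a rank-enabled bitmap
        self.ones = [0]
        for x in seq:
            self.ones.append(self.ones[-1] + (x > mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def rank(self, c, i):
        if self.lo == self.hi:
            return min(i, self.n) if c == self.lo else 0
        mid = (self.lo + self.hi) // 2
        ones = self.ones[i]                     # right-going symbols before i
        if c <= mid:
            return self.left.rank(c, i - ones)
        return self.right.rank(c, ones)

wt = WaveletTree([1, 0, 2, 1, 1, 3, 0])
print(wt.rank(1, 5))  # 3: symbol 1 appears three times in the first five items
```

The layouts compared in the dissertation change how these per-node bitmaps are arranged in memory (for example pointer-based nodes versus concatenated bitmaps), which is exactly what determines the cache behavior of the root-to-leaf descent.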
100

Phonotactic Structures in Swedish : A Data-Driven Approach

Hultin, Felix January 2017 (has links)
Ever since Bengt Sigurd laid out the first comprehensive description of Swedish phonotactics in 1965, it has been the main point of reference within the field. This thesis attempts a new approach by presenting a computational and statistical model of Swedish phonotactics which can be built from any corpus of IPA phonetic script. The model is a weighted trie, represented as a finite-state automaton, where states are phonemes linked by transitions in valid phoneme sequences; this adds the benefits of being probabilistic and expressible as a regular language. It was implemented using the Nordisk Språkteknologi (NST) pronunciation lexicon and tested against a couple of rule sets defined by Sigurd concerning initial two-consonant clusters of phonemes and phoneme classes. The results largely agree with Sigurd's rules and illustrate the benefits of the model: it can effectively be used to pattern-match phonotactic information using a regular-expression-like syntax.
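A weighted trie over phoneme sequences can be held as a transition table whose counts double as probabilities, which is all the model needs to answer questions such as which phonemes may follow word-initial /s/. A minimal sketch (toy data; the thesis builds its model from the NST pronunciation lexicon):

```python
from collections import defaultdict

def build_weighted_trie(words):
    """state (phoneme prefix) -> next phoneme -> transition count."""
    trie = defaultdict(lambda: defaultdict(int))
    for word in words:
        state = ""
        for ph in word:
            trie[state][ph] += 1
            state += ph
    return trie

def transition_probs(trie, state):
    """Relative frequency of each phoneme continuing the given prefix."""
    total = sum(trie[state].values())
    return {ph: n / total for ph, n in trie[state].items()}

trie = build_weighted_trie([["s", "t", "r"], ["s", "t", "a"], ["s", "k", "a"]])
print(transition_probs(trie, "s"))  # {'t': 0.667, 'k': 0.333} approximately
```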
