1 |
Disfluency in Swedish human–human and human–machine travel booking dialogues
Eklund, Robert. January 2004
This thesis studies disfluency in spontaneous Swedish speech, i.e., the occurrence of hesitation phenomena such as eh and öh, truncated words, repetitions and repairs, mispronunciations and so on. The thesis is divided into three parts.
PART I provides the background, covering scientific, personal and industrial–academic aspects, in the Tuning in quotes, the Preamble and the Introduction (chapter 1).
PART II consists of a single chapter, chapter 2, which examines the etiology of disfluency. It describes previous research on disfluencies, including areas that are not the main focus of the present thesis, such as stuttering, psychotherapy, philosophy, neurology, discourse perspectives, speech production, application-driven perspectives, cognitive aspects, and so on. A discussion of terminology and definitions is also provided. The goal of this chapter is to give as broad a picture as possible of the phenomenon of disfluency, and of how these different perspectives relate to each other.
PART III describes the linguistic data studied and analyzed in this thesis, with the following structure: Chapter 3 describes how the speech data were collected, and for what reason. Sum totals of the data and the post-processing method are also described. Chapter 4 describes how the data were transcribed, annotated and analyzed. The labeling method is described in detail, as is the method employed to do frequency counts. Chapter 5 presents the analysis and results for all categories of disfluencies. Besides the general frequency and distribution of the different types of disfluencies, both inter- and intra-corpus results are presented, as are co-occurrences of different types of disfluencies. Inter- and intra-speaker differences are also discussed. Chapter 6 discusses the results, mainly in light of previous research. Reasons for the observed frequencies and distribution are proposed, as is their relation to language typology, along with syntactic, morphological and phonetic reasons for the observed phenomena. Future work is also envisaged: work that is possible on the present data set, work that is possible on the present data set given extended labeling, and work that I think should be carried out but for which the present data set fails, in one way or another, to meet the requirements.
Appendices 1–4 list the sum total of all data analyzed in this thesis (apart from Tok Pisin data). Appendix 5 provides an example of a full human–computer dialogue. / The electronic version of the printed dissertation is a corrected version in which typos and phrasing have been corrected. A list of the corrections is presented in the errata list above.
|
2 |
Reëlgebaseerde klemtoontoekenning in 'n grafeem-na-foneemstelsel vir Afrikaans [Rule-based stress assignment in a grapheme-to-phoneme system for Afrikaans] / E.W. Mouton
Mouton, Elsie Wilhelmina. January 2010
Text-to-speech systems are currently of great importance in the community. One core technology in this human language technology resource is stress assignment, which plays an important role in any text-to-speech system. At present no automatic stress assigner for Afrikaans exists. For these reasons, the two most important aims of this project are: a) to develop a complete and accurate set of stress rules for Afrikaans that can be implemented in an automatic stress assigner, and b) to develop an effective and highly accurate stress assigner in order to assign Afrikaans stress to words quickly and effectively. A set of stress rules for Afrikaans was developed in order to reach the first goal. It consists of 18 rules, divided into groups for words that contain a schwa, derivations, and disyllabic, trisyllabic and polysyllabic simplex words.
Next, different approaches that can be used to develop a stress assigner were examined, and the rule-based approach was used to implement the developed stress rules within the stress assigner. The programming language Perl was chosen for the implementation of the rules. The chosen algorithm was used to generate a stress assigner for Afrikaans by implementing the stress rules developed. The hyphenator Calomo and the compound analyser CKarma were used to hyphenate all the test data and to detect word boundaries within compounds. A dataset of 10 000 correctly annotated tokens was developed during the testing process. The evaluation of the stress assigner consists of four phases. During the first phase, the stress assigner was evaluated with the 10 000 tokens and achieved an accuracy of 92.09%. The grapheme-to-phoneme converter was evaluated with the same data and scored 91.9%. The influence of various factors on stress assignment was determined, and it was established that stress assignment is an essential component of rule-based grapheme-to-phoneme conversion.
In conclusion, it can be said that the stress assigner achieved satisfactory results, and that the stress assigner can be successfully utilized in future projects to develop training data for further experiments with stress assignment and grapheme-to-phoneme conversion for Afrikaans. Experiments can be conducted in future with data-driven approaches that possibly may lead to better results in Afrikaans stress assignment and grapheme-to-phoneme conversion. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2010.
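As a rough illustration of the token-level evaluation described in this abstract, the sketch below scores predicted stress positions against gold annotations. It is written in Python for brevity, not the Perl used in the thesis, and the function name, the data representation (one stressed-syllable index per token) and the toy values are all assumptions made for the example. On this reading, an accuracy of 92.09% on 10 000 tokens corresponds to 9 209 correctly stressed tokens.

```python
def accuracy(predicted, gold):
    """Return the share of tokens whose predicted stress matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty dataset")
    # Count the positions where prediction and annotation agree.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy data: index of the stressed syllable for each token.
gold = [0, 1, 0, 2, 1]
predicted = [0, 1, 1, 2, 1]

score = accuracy(predicted, gold)
print(f"accuracy: {score:.2%}")
```

The same function could score any per-token output, provided both lists are aligned token by token.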
|
4 |
Public Sentiment on Twitter and Stock Performance : A Study in Natural Language Processing / Allmänna sentimentet på Twitter och aktiemarknaden : En studie i språkteknologi
Henriksson, Jimmy; Hultberg, Carl. January 2019
In recent years, hedge funds have increasingly used non-traditional data sources to support investment decisions. One of the data sources whose use has increased most is social media, and it has become popular to analyze public opinion with the help of sentiment analysis in order to predict the performance of a company. Evaluating public opinion requires large sets of Twitter data. The Twitter data was collected by streaming the Twitter feed, and the stock data was collected from a Bloomberg Terminal. The aim of this study was to examine whether there is a correlation between the public opinion of a stock and the stock price, and what affects this relationship. While such a relationship cannot be established in general, we are able to show that if the data quality is good, there is a high correlation between public opinion and stock price, and that significant events surrounding the company result in a higher correlation during that period.
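The abstract does not say which correlation measure was used. As a minimal sketch, assuming a Pearson correlation between a daily sentiment score and a daily price change, the computation could look as follows. The function name and the two toy series are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy series: one value per trading day.
sentiment = [0.10, 0.35, -0.20, 0.05, 0.40]   # mean sentiment score of the day
price_change = [0.3, 0.6, -0.5, 0.0, 0.9]     # daily price change in percent

r = pearson(sentiment, price_change)
print(f"r = {r:.3f}")
```

Computing the coefficient over a window around a company event and comparing it with the full-period value would be one way to look at the event effect the abstract mentions.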
|