1 |
A Probabilistic Tagging Module Based on Surface Pattern Matching
Eklund, Robert. January 1993
A problem with automatic tagging and lexical analysis is that it is never 100% accurate. To arrive at better figures, one needs to study the character of what automatic taggers leave untagged. This paper studies the untagged residue output by the automatic analyser SWETWOL (Karlsson 1992) at Helsinki. SWETWOL assigns tags to words in Swedish texts mainly through dictionary lookup. The contents of the untagged residue files are described and discussed, and possible ways of solving the different problems are proposed. One method of tagging residual output is proposed and implemented: the left-stripping method, in which an untagged word is stripped of its left-most letter and the remainder is looked up in a dictionary; if it is found, the word is tagged according to the information in that dictionary. If the stripped word is not found in the dictionary, a match is sought in ending lexica, which contain statistical information about the word classes associated with a particular word ending (i.e., a final letter cluster, whether or not it is a grammatical suffix) and the relative frequency of each word class. If a match is found, the word is given graduated tagging according to the statistical information in the ending lexicon. If no match is found, the word is stripped of what is now its left-most letter and is searched for again, recursively, in the dictionary and the ending lexica (in that order). The ending lexica employed in this paper are derived from a reversed version of Nusvensk Frekvensordbok (Allén 1970) and contain endings of between one and seven letters. The contents of the ending lexica are described and discussed to some extent. The programs implementing these principles are run on files of untagged residual output. Appendices include, among other things, LISP source code, untagged and tagged files, the ending lexica containing one- and two-letter endings, and excerpts from the ending lexica containing three to seven letters.
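The left-stripping procedure described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the LISP code from the thesis appendices: the names `DICTIONARY`, `ENDING_LEXICON` and `tag_by_left_stripping` are hypothetical, the toy entries and frequencies are invented for the example, and the recursion is written as a loop.

```python
from typing import Dict, Optional

# Hypothetical toy resources. The thesis uses a real dictionary and ending
# lexica derived from Nusvensk Frekvensordbok; these entries are invented.
DICTIONARY: Dict[str, str] = {"bil": "NOUN", "springa": "VERB"}
ENDING_LEXICON: Dict[str, Dict[str, float]] = {
    "ning": {"NOUN": 0.95, "ADJ": 0.05},
    "a": {"VERB": 0.5, "NOUN": 0.3, "ADJ": 0.2},
}
MAX_ENDING_LENGTH = 7  # the abstract mentions endings of one to seven letters


def tag_by_left_stripping(word: str) -> Optional[Dict[str, float]]:
    """Return a word-class distribution for an untagged word, or None.

    The word is assumed to have failed ordinary dictionary lookup already,
    so the first step is to strip its left-most letter.
    """
    candidate = word[1:]
    while candidate:
        # 1. Dictionary lookup on the stripped form: a categorical tag.
        if candidate in DICTIONARY:
            return {DICTIONARY[candidate]: 1.0}
        # 2. Ending lexicon lookup: graduated (probabilistic) tagging.
        if len(candidate) <= MAX_ENDING_LENGTH and candidate in ENDING_LEXICON:
            return dict(ENDING_LEXICON[candidate])
        # 3. No match: strip what is now the left-most letter and retry.
        candidate = candidate[1:]
    return None


if __name__ == "__main__":
    for w in ("sportbil", "bokning", "q"):
        print(w, tag_by_left_stripping(w))
```

Checking the dictionary before the ending lexicon at every step mirrors the order given in the abstract; a real implementation would also need to decide how to treat very short remainders, which the abstract does not specify.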
|
2 |
Um estudo da mudança de classe gramatical em unidades lexicais neológicas / A study of word class change in neological lexical units
Maroneze, Bruno Oliveira. 22 March 2011
Word class change consists in the creation of a lexical unit in a word class different from that of its base. To do this, speakers have several mechanisms at their disposal, such as suffixal derivation (with many different suffixes), parasynthetic derivation, regressive derivation and conversion. Our goal in this study is to describe these mechanisms and to understand why speakers create new lexical units in different word classes. Drawing on the theoretical foundations of Cognitive Linguistics, we divide our analysis into two perspectives: the onomasiological perspective, in which we analyse the mechanisms of lexical creation, and the semasiological perspective, in which we analyse the mechanisms by which a new lexical unit is interpreted. Following Cognitive Linguistics, we take word classes to be semantic categories and word class change to be a basically semantic process. Considering only the lexical word classes, the six possible types of word class change in Portuguese are: adjective to noun, verb to noun, noun to adjective, verb to adjective, noun to verb and adjective to verb.
We therefore collected 1,209 neologisms resulting from word class change from the Base de neologismos do português brasileiro contemporâneo (the database of contemporary Brazilian Portuguese neologisms, part of Projeto TermNeo, Observatório de Neologismos do Português Brasileiro Contemporâneo) and classified them into the six types of word class change. For each type, we analysed the creation mechanisms onomasiologically and the interpretation mechanisms semasiologically. Suffixal derivation is the most widely used mechanism, with many suffixes that are productive in contemporary Portuguese, many of them polysemous; however, parasynthetic derivation in the formation of verbs and regressive derivation in the formation of abstract nouns are also productive mechanisms. There are some important cases of competition between suffixes, such as -ice and -(i)dade in the change from adjective to noun and -ção and -mento in the change from verb to noun. In the analysis of how the neologisms are interpreted, metonymy proved to be an important process in almost all types of word class change. Finally, the analyses seem to indicate that speakers change word class in order to express new concepts, and not merely for morphosyntactic reasons.
|
3 |
O estatuto conceitual e funcional das proformas. Pronome: protótipo das proformas. / The conceptual and functional status of proforms. Pronoun: the prototype of proforms.
Kilpatrick Müller Bernardo Campelo. 04 October 2007
Coordenação de Aperfeiçoamento de Nível Superior / ABSTRACT
This thesis, predominantly theoretical, postulates that natural linguistic systems tend toward a regularization that can be demonstrated through prototypical forms, categories, classes and functions. The forms that represent the prototypicality of categorial properties or of word classes are built through the choices made by the users of particular language communities. These choices settle, or sediment, on the basis of frequency of use. The more often a form is used, the greater the possibility of marked grammaticalization, with loss of phonic mass and morphologization, and with repercussions for its categorial status in relation to the paradigms of the language. The grammatical coding of each and every categorial property (number, person, size, gender, tense, mood, voice, and so on), as well as of the word classes, is ultimately built on the basis of use. The fundamental proposal of this thesis is the claim for a new category, proformality, with a view to reconfiguring the word classes so that the reordering covers four word macroclasses, namely: nouns (substantives, adjectives, numerals); verbs; adverbs; and relational elements (prepositional and conjunctional connectors). This category equally affects the subclasses of these macroclasses and the intralexical morphemes that encode the categorial properties mentioned, admitting interclass and intraclass movement that results from grammaticalization processes.
The theoretical framework consists of a confrontation of epistemological models, opting for an amalgam of Aristotelian and prototype-based theses; an exposition of the nature of grammaticalization processes, on the assumption that lexicon and grammar are sections differentiated by their grammaticality status; and the adoption of the evolutionary hypothesis to explain the movement of grammaticalization from codings of greater transparency (or referential concreteness) and exophoric functions to strictly intralinguistic functions. Throughout this theoretical exposition, which confronts traditional theses without disregarding their relative merit, illustrative analyses of samples of actual use of Portuguese (collected from www.corpusdoportugues.org and other internet sites) are undertaken to give the fundamental thesis a minimal empirical grounding. This classification proposal thus grades the word macroclasses into two macrogroups, called pleriforms and proforms, which are distinguished by how strongly the category of proformality is manifested. This category brings together pragmatic, cognitive and linguistic concepts to explain the prototypicality of forms of classes, subclasses and intralexical morphemes as exemplary items of their respective paradigms. Their exemplarity comes from the conservation of minimal semantic features within each class, subclass or intralexical morphemic paradigm, so that a proform can perform a suppletive function with majority, though not absolute, frequency, or can prototypically represent all the members of its class, subclass or intralexical morphemic paradigm.
The understanding that processes of linguistic variation and change, especially grammaticalization, account for categorial fluidity led us to build scales of continua within the various proformal macroclasses, in order to exemplify interclass and intraclass movement with different degrees of grammaticality (morphological, syntactic and semantic factors are considered in assessing grammaticality status). In other words, the scales are meant to illustrate that the grammaticality status of the pleriforms and proforms available for linguistic coding of any kind can vary among classes, among the subclasses of a single class, among intralexical morphemes and among syntactic-semantic functions. Thus, within each pleriformal and proformal macroclass, its subclasses and its constitutive categorial properties, forms show varied grammaticality statuses, depending on their greater, lesser or multiple membership of macroclasses and subclasses, or on the greater or lesser morphologized expression of a categorial property. The disputes between lexicon and grammar, conditioned by cognitive and pragmatic factors, therefore take place among the classes, the subclasses and the intralexical morphemes that encode categorial properties. Finally, the thesis pays tribute to the tradition for having, in one way or another, noticed or intuited the propensity of linguistic systems to present, in a periodically recast form, a more generic counterpart of each macroclass, subclass and intralexical morpheme. That understanding, however, was reflected strictly or mainly in the pronominal class. This justifies regarding pronouns as the prototypes of proforms, that is, as their typical exemplars or best representatives.
|
4 |
O papel da nominalização no continuum categorial
Camacho, Roberto Gomes. January 2009
Abstract: Category continuity is an undisputed property of language for the functionalist tradition, which treats it as a true linguistic universal. Besides seeking systematic evidence to confirm this axiom, the main objective of this study is to analyze the argument structure of nominalization, in an effort to demonstrate that this same universal principle is methodologically useful and theoretically valid for postulating intralinguistic relations of category continuity even between apparently discrete word classes such as nouns and verbs. Supporting the category continuity hypothesis necessarily involves confirming a secondary hypothesis, that of valence preservation, postulated by Dik (1985; 1997), according to which argument structure is a constitutive part of nominalization. That search would not succeed if it did not take a necessary shortcut, represented by the prototype theory of categorization. Indeed, postulating the existence of intermediate categories, such as nominalization, necessarily implies the existence of more prototypical members of a category. The presence of argument structure, which signals the representation of higher-order entities, brings nominalization closer to non-prototypical members of the verb category, such as non-finite forms, while the absence of argument structure, which signals the representation of a first-order entity, brings it closer to prototypical members of the noun category.
|
5 |
Učivo o slovních druzích a učivo slovotvorné na 1. stupni základní školy / Kinds of words and vocabulary at elementary school
MATOUŠKOVÁ, Lucie. January 2007
The thesis deals with word classes and word formation as subject matter in Czech language teaching at the first stage of basic school (primary level). It combines an analysis of course books from three different publishers, examined from this thematic point of view, with insight into current classroom practice. The theoretical part treats these questions from a scholarly point of view; the formal part addresses the existing subject matter at primary level and analyses the Czech language course books used there. The core of the thesis is a study of pupils' knowledge of word-class classification in the 5th and 6th grades of basic school.
|
7 |
Automatic Induction of Word Classes in Swedish Sign Language
Sjons, Johan. January 2013
Identifying word classes is an important part of describing a language. Research on sign languages often lacks distinctions that are crucial for identifying word classes, e.g. the difference between sign and gesture. Additionally, sign languages typically lack a written form, which often constrains quantitative research on sign language to the use of glosses translated into the spoken language of the area. In this thesis, such glosses were extracted from the Swedish Sign Language Corpus. The glosses were mapped to utterances based on the Swedish translations in the corpus, and these utterances served as input data to a word space model, producing a co-occurrence matrix. This matrix was clustered with the K-means algorithm. The extracted utterances were also clustered with the Brown algorithm. Using V-measure, the clusters were compared to a gold standard manually annotated with word classes. The Brown algorithm performs significantly better at inducing word classes than a random baseline. This work shows that unsupervised learning is a feasible approach to research on word classes in Swedish Sign Language. However, future studies of this kind should employ a deeper linguistic analysis of the language as part of choosing the algorithms.
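To make the evaluation step concrete, the sketch below computes V-measure, the harmonic mean of homogeneity and completeness, for a clustering against gold word-class labels. It is a standard-library illustration only: the function names and the toy labels are assumptions, and the abstract does not say which implementation the thesis used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(labels), using natural logarithms."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(targets: Sequence[Hashable],
                         given: Sequence[Hashable]) -> float:
    """Conditional entropy H(targets | given)."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(gold: Sequence[Hashable], predicted: Sequence[Hashable],
              beta: float = 1.0) -> float:
    """V-measure of predicted cluster ids against gold class labels."""
    h_gold = _entropy(gold)
    h_pred = _entropy(predicted)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = 1.0 if h_gold == 0 else (
        1.0 - _conditional_entropy(gold, predicted) / h_gold)
    # Completeness: each class should end up in a single cluster.
    completeness = 1.0 if h_pred == 0 else (
        1.0 - _conditional_entropy(predicted, gold) / h_pred)
    if homogeneity + completeness == 0:
        return 0.0
    return ((1 + beta) * homogeneity * completeness
            / (beta * homogeneity + completeness))


if __name__ == "__main__":
    gold = ["NOUN", "NOUN", "VERB", "VERB"]  # invented gold word classes
    print(v_measure(gold, [0, 0, 1, 1]))     # clusters match the classes
    print(v_measure(gold, [0, 0, 0, 0]))     # one cluster for everything
```

Because the score depends only on how labels co-occur, cluster ids never need to be mapped onto word-class names, which suits induced classes that have no names of their own.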
|
8 |
Swedish Learners of English and Their Use of Linguistic Metaphor
Ahlin, Angelica. January 2022
This study investigates linguistic metaphors used by Swedish learners of English in upper secondary school. The aim is to provide a measure of the amount and distribution of metaphor in learner English, with the secondary aim of evaluating the method. Twenty-four essays at two different proficiency levels were analyzed using the Metaphor Identification Procedure Vrije Universiteit (MIPVU), a method developed by Steen and his colleagues in 2010, which has since become a popular method for identifying metaphor. The findings are in accordance with previous research and indicate that metaphor density increases with proficiency level. The results also show that metaphor is not evenly distributed among word classes: prepositions and verbs were found to exhibit the highest proportions of metaphor, whereas adverbs, for example, exhibited very few metaphor-related words. MIPVU was found to be a reliable and useful method even for learner English, despite not having been created for this purpose.
|
9 |
Word Classes in Language Modelling
Erikson, Emrik; Åström, Marcus. January 2024
This thesis concerns itself with word classes and their application to language modelling. Considering a purely statistical Markov model trained on sequences of word classes in the Swedish language, different problems in language engineering are examined. The problems considered are part-of-speech tagging, evaluating text modifiers such as translators with the help of probability measurements and matrix norms, and lastly detecting different types of text using the Fourier transform of cross entropy sequences of word classes. The results show that the word class language model is quite weak by itself but that it is able to improve part-of-speech tagging for 1- and 2-letter models. There are indications that a stronger word class model could aid 3-letter and potentially even stronger models. For evaluating modifiers, the model is often able to distinguish shuffled and sometimes translated text, as well as to assign a score for how much a text has been modified. Future work on this should, however, take better care to ensure large enough test data. The results from the Fourier approach indicate that a Fourier analysis of the cross entropy sequence between word classes may allow the model to distinguish AI-generated text, as well as translated text, from human-written text. Future work on machine learning word class models could be carried out to gain further insight into the role of word class models in modern applications. The results could also give interesting insights for linguistic research on word classes.
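As a rough illustration of the kind of model the abstract describes, the sketch below trains a first-order Markov (bigram) model on word-class sequences and scores a sequence by its per-transition surprisal and average cross entropy. The function names, the add-alpha smoothing and the toy tag sequences are assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Model = Dict[str, Dict[str, float]]


def train_bigram_model(sequences: Sequence[Sequence[str]],
                       tagset: Sequence[str], alpha: float = 1.0) -> Model:
    """Estimate P(current class | previous class) with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    model: Model = {}
    for prev in tagset:
        total = sum(counts[prev].values()) + alpha * len(tagset)
        model[prev] = {cur: (counts[prev][cur] + alpha) / total
                       for cur in tagset}
    return model


def surprisal_sequence(model: Model, seq: Sequence[str]) -> List[float]:
    """Per-transition surprisal in bits, one value per adjacent pair."""
    return [-math.log2(model[prev][cur]) for prev, cur in zip(seq, seq[1:])]


def cross_entropy(model: Model, seq: Sequence[str]) -> float:
    """Average surprisal of a sequence; needs at least two word classes."""
    values = surprisal_sequence(model, seq)
    if not values:
        raise ValueError("sequence must contain at least two word classes")
    return sum(values) / len(values)


if __name__ == "__main__":
    tags = ["DET", "NOUN", "VERB"]                 # invented toy tagset
    training = [["DET", "NOUN", "VERB"]] * 10      # invented training data
    model = train_bigram_model(training, tags)
    print(cross_entropy(model, ["DET", "NOUN", "VERB"]))  # familiar order
    print(cross_entropy(model, ["VERB", "NOUN", "DET"]))  # shuffled order
```

A shuffled sequence scores a higher cross entropy than one in the trained order, which is the intuition behind using such a model to flag modified text; the list returned by `surprisal_sequence` is the sort of signal on which a Fourier analysis could then be run.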
|
10 |
Finns inte på kartan: Att nå fram till ord som inte finns med på bliss standardkarta med hjälp av enbart bliss standardkarta / Out of Reach: To Reach Words that are not on the Bliss Standard Chart, by Using Only Bliss Standard Chart
Wimnell, Rebecca; Ölmestig, Carin. January 2010
The aim of this study was to investigate which types of words are easy and which are difficult to reach with the Swedish Bliss standard chart, and which strategies are more or less efficient. The participants, 24 female students with no functional limitations, were grouped in pairs. One person in each pair was given the task of explaining 12 target words that are not present on the Bliss standard chart, using only the Bliss standard chart. The other person in each pair was given the task of guessing the target words. Since earlier studies have indicated that word class, level of frequency and level of abstraction can affect the difficulty of words, the target words in this study were chosen on the basis of those variables. The results showed that the word class of the target words did not affect their degree of difficulty. The frequency of the target words affected their difficulty to some extent. The level of abstraction was the variable that affected difficulty the most. Since the abstract and infrequent target words were the most difficult, it may be wise to include those types of words on the Bliss chart. Some strategies that benefited communication were the blisser's use of syntax and syntactic prompting, and the guesser giving the blisser time to finish her phrases. These strategies may be appropriate to recommend to Bliss users and their communication partners.
|