  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Outomatiese genreklassifikasie vir hulpbronskaars tale / Dirk Snyman

Snyman, Dirk Petrus January 2012
When working in the terrain of text processing, metadata about a particular text plays an important role. Metadata is often generated using automatic text classification systems, which classify a text into one or more predefined classes or categories based on its contents. One of the dimensions by which a text can be classified is its genre. In this study the development of an automatic genre classification system in a resource scarce environment is postulated. This study aims to: i) investigate the techniques and approaches that are generally used for automatic genre classification systems, and identify the best approach for Afrikaans (a resource scarce language), ii) transfer this approach to other indigenous South African resource scarce languages, and iii) investigate the effectiveness of technology recycling for closely related languages in a resource scarce environment. To achieve the first goal, five machine learning approaches that are generally used for text classification were identified from the literature, together with five common approaches to feature extraction. Two different approaches to the identification of genre classes are presented. The machine learning, feature extraction and genre class identification approaches were used in a series of experiments to identify the best approach for genre classification for a resource scarce language. The best combination is identified as the multinomial naïve Bayes algorithm, using a bag of words approach as features to classify texts into three abstract classes. This results in an f-score (performance measure) of 0.929 and it was subsequently shown that this approach can be successfully applied to other indigenous South African languages. To investigate the viability of technology recycling for genre classification systems for closely related languages, Dutch test data was classified using an Afrikaans genre classification system and it is shown that this approach works well.
A pre-processing step was implemented by using a machine translation system to increase the compatibility between Afrikaans and Dutch by translating the Dutch texts before classification. This results in an f-score of 0.577, indicating that technology recycling between closely related languages has merit. This approach can be used to promote and fast-track the development of genre classification systems in a resource scarce environment. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2013
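The combination named in the abstract above (multinomial naïve Bayes over bag-of-words features) can be illustrated with a minimal sketch. This is not the thesis's system: the three class names, the toy English training texts, the whitespace tokenizer and the add-one smoothing value `alpha` are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and smoothed per-class word log-likelihoods."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # label -> bag-of-words counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()    # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, n_docs in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model

def classify(model, text):
    """Return the class with the highest posterior log-score."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {
        label: prior + sum(model["log_like"][label][t] for t in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data; the class names are placeholders, not the thesis's classes.
train = [
    ("parliament passed the budget vote", "informative"),
    ("the minister announced new budget", "informative"),
    ("once upon a time a princess", "narrative"),
    ("the princess met a dragon once", "narrative"),
    ("mix the flour and add sugar", "instructive"),
    ("add the eggs and mix well", "instructive"),
]
model = train_multinomial_nb([t for t, _ in train], [l for _, l in train])
print(classify(model, "add sugar and mix"))
print(classify(model, "the princess and a dragon"))
```

Because the classifier only needs word counts per class, it requires no language-specific resources beyond labelled texts, which is one plausible reason such an approach suits a resource-scarce setting.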
12

Detección de información engañosa mediante Tecnologías del Lenguaje Humano e Inteligencia Artificial

Sepúlveda-Torres, Robiert 18 March 2022
In recent years, news consumption through print media has largely been replaced by access to news in a variety of formats through digital media and social networks. The low cost of access to information and the profusion of communication platforms and mobile devices have changed information consumption habits: information is now received from multiple sources and replicated immediately in a global environment. In this context, disinformation, a problem that originated at the dawn of the traditional press, has increased. Over the last decade, disinformation has reached an unmanageable scale because of the large volume of information to which an ordinary citizen is exposed every day. In addition, most of these digital media are not moderated, and they allow any type of information to be published and shared. This environment makes the proliferation of misleading information very likely; in most cases such information seeks to influence public opinion in pursuit of an underlying economic, social or political goal. This can harm organisations, brands and individuals, among others, and often leads the users who consume it to hasty conclusions. In this context the term post-truth has emerged, denoting a tendency to prioritise the subjectivity of an interpretation over the verification of real facts. The headline of a news item is designed to summarise its content succinctly, giving the reader a clear understanding of it. Unfortunately, in the post-truth era, headlines are more focused on attracting the reader's attention than on accurately presenting the content of the news item. This opens up an enormous opportunity to spread disinformation by constructing false or distorted headlines.
Traditional fact-checking techniques carried out by humans are clearly impracticable and obsolete given the quantity of news texts generated even every hour. This work addresses novel solutions using Human Language Technologies (HLT) and Artificial Intelligence (AI) techniques. The research was carried out in an area where different concepts, tools and approaches intersect confusingly. It starts from a survey of the state of the art on the main solutions related to misleading headline detection, stance detection, contradiction detection, the interrelation between these elements, and automatic fact verification. Starting from the stated problem and its concepts, different solution strategies are explored in depth, with the aim of proposing an approach that, with a sufficiently practical focus, contributes to the detection of misleading information in digital media and can become an alerting tool in the complex environment described above. Among the elements considered, the use of machine learning (ML) and deep learning (DL) as the traditional working techniques in this solution space is assessed, together with their scope and limitations. In addition, the idea is introduced of replacing the content of a news item with an automatically obtained summary that captures its essentials. The dissertation presents in a logical manner the course of the research, which starts from the conceptual and uses deductive and experimental thinking to reach generalisations and apply them deductively to the solution of specific problems. In doing so, certain tasks that can partially contribute to solving part of the stated problem are addressed, experiments are designed, and the solution is specified for the Spanish language, for which no similar contributions have been reported.
A flexible architecture for misleading headline detection is proposed, on which two prototypes have been implemented; their documented experimental results represent a step forward towards automating this task. This architecture achieves notable results when applied to two datasets, in English and Spanish. Following the principles and experience gained, an application of a similar architecture to fake news detection is presented, which suggests that the architecture may be general. / This thesis was funded by the Generalitat Valenciana through the project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” (PROMETEU/2018/089); and by FEDER/Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación through the project “LIVINGLANG: Modelado del comportamiento de entidades digitales mediante tecnologías del lenguaje humano” (RTI2018-094653-B-C21 / C22).
13

Concept of Operations (CONOPS) for foreign language and speech translation technologies in a coalition military environment

Marshall, Susan LaVonne 03 1900
Approved for public release, distribution is unlimited / This thesis presents Concepts of Operations (CONOPS) for two specific automated language translation (ALT) devices, the P2 Phraselator and the Voice Response Translator (VRT). The CONOPS for each device are written as Appendix A and Appendix B respectively. The body of the thesis presents a broad introduction to the present state of ALT technology for the reader who is new to the general subject. It pursues this goal by introducing the human language translation problem, followed by nine characteristic descriptors of ALT technology devices that provide a basic framework for comparing existing technologies. The premise is that ALT technology is presently in a state where it is tackled incrementally with various approaches. Two tables are provided that illustrate six commercially available devices using the descriptors. A scenario is then described in which the author observed the two subject ALT devices (depicted in the CONOPS in the Appendices) being employed within an international military exercise. Some unique human observations associated with the use of these devices in the exercise are discussed. A summary is provided of the Department of Defense (DOD) process that is exploring ALT technology devices, specifically the Language and Speech Exploitation Resources (LASER) Advanced Concept Technology Demonstration (ACTD). / Lieutenant Commander, United States Navy
14

Enkele tegnieke vir die ontwikkeling en benutting van etiketteringhulpbronne vir hulpbronskaars tale / A.C. Griebenow

Griebenow, Annick January 2015
Because the development of resources in any language is an expensive process, many languages, including the indigenous languages of South Africa, can be classified as being resource scarce, or lacking in tagging resources. This study investigates and applies techniques and methodologies for optimising the use of available resources and improving the accuracy of a tagger, using Afrikaans as a resource-scarce language. It aims to i) determine whether combination techniques can be effectively applied to improve the accuracy of a tagger for Afrikaans, and ii) determine whether structural semi-supervised learning can be effectively applied to improve the accuracy of a supervised learning tagger for Afrikaans. In order to realise the first aim, existing methodologies for combining classification algorithms are investigated. Four taggers, trained using MBT, SVMlight, MXPOST and TnT respectively, are then combined into a combination tagger using weighted voting. Weights are calculated by means of total precision, tag precision and a combination of precision and recall. Although the combination of taggers does not consistently lead to an error rate reduction with regard to the baseline, it manages to achieve an error rate reduction of up to 18.48% in some cases. In order to realise the second aim, existing semi-supervised learning algorithms, with specific focus on structural semi-supervised learning, are investigated. Structural semi-supervised learning is implemented by means of the SVD-ASO algorithm, which attempts to extract the shared structure of untagged data using auxiliary problems before training a tagger. The use of untagged data during the training of a tagger leads to an error rate reduction with regard to the baseline of 1.67%. Even though the error rate reduction does not prove to be statistically significant in all cases, the results show that it is possible to improve the accuracy in some cases.
/ MSc (Computer Science), North-West University, Potchefstroom Campus, 2015
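The weighted-voting combination described in the abstract above can be sketched as follows. This is an illustrative sketch only: the tagger names come from the abstract, but the weights, the tags and the per-token outputs are hypothetical, and the weighting shown corresponds to a single weight per tagger (as with total precision) rather than per-tag weights. The error rate reduction helper assumes the common definition of relative reduction in error with respect to a baseline; the abstract does not spell out its formula.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-token tags from several taggers by weighted voting.

    predictions: dict mapping tagger name -> list of tags (same length for all).
    weights:     dict mapping tagger name -> vote weight for that tagger.
    """
    n_tokens = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for tagger, tags in predictions.items():
            scores[tags[i]] += weights[tagger]   # each tagger votes with its weight
        # On an exact tie, the tag that received a vote first wins.
        combined.append(max(scores, key=scores.get))
    return combined

def error_rate_reduction(baseline_accuracy, new_accuracy):
    """Relative error rate reduction with respect to the baseline (assumed definition)."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error

# Hypothetical outputs for a three-token sentence and hypothetical weights.
predictions = {
    "MBT":      ["N", "V", "ADJ"],
    "SVMlight": ["N", "N", "ADJ"],
    "MXPOST":   ["N", "V", "ADV"],
    "TnT":      ["V", "N", "ADV"],
}
weights = {"MBT": 0.93, "SVMlight": 0.95, "MXPOST": 0.94, "TnT": 0.96}

print(weighted_vote(predictions, weights))
print(round(error_rate_reduction(0.90, 0.92), 4))
```

With per-tag weights (tag precision), the lookup `weights[tagger]` would instead depend on both the tagger and the tag it proposes; the voting loop itself stays the same.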
16

Spoken language identification in resource-scarce environments

Peche, Marius 24 August 2010
South Africa has eleven official languages, ten of which are considered “resource-scarce”. For these languages, even basic linguistic resources required for the development of speech technology systems can be difficult or impossible to obtain. In this thesis, the process of developing Spoken Language Identification (S-LID) systems in resource-scarce environments is investigated. A Parallel Phoneme Recognition followed by Language Modeling (PPR-LM) architecture is utilized and three specific scenarios are investigated: (1) incomplete resources, including the lack of audio transcriptions and/or pronunciation dictionaries; (2) inconsistent resources, including the use of speech corpora that are unmatched with regard to domain or channel characteristics; and (3) poor quality resources, such as wrongly labeled or poorly transcribed data. Each situation is analysed, techniques are defined to mitigate the effect of limited or poor quality resources, and the effectiveness of these techniques is evaluated experimentally. Techniques evaluated include the development of orthographic tokenizers, bootstrapping of transcriptions, filtering of low quality audio, diarization and channel normalization techniques, and the human verification of misclassified utterances. The knowledge gained from this research is used to develop the first S-LID system able to distinguish between all South African languages. The system performs well: it differentiates among the eleven languages with an accuracy above 67%, and among the six primary South African language families with an accuracy above 80%, on speech segments between 2 s and 10 s in length.
Copyright / Dissertation (MEng)--University of Pretoria, 2010. / Electrical, Electronic and Computer Engineering / unrestricted
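The language modelling stage of a PPR-LM style system can be illustrated with a minimal sketch. It covers only the back end: it assumes that a phoneme recognizer has already turned each utterance into a phoneme sequence, trains one add-one-smoothed bigram model per language on such sequences, and picks the language whose model scores a new sequence highest. The language names, phoneme symbols and training sequences are hypothetical, and a full PPR-LM system would run several recognizers in parallel and combine their scores, which is not shown here.

```python
import math
from collections import Counter

def train_bigram_lm(sequences):
    """Count phoneme bigrams and their left contexts for one language."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return {"bigrams": bigrams, "contexts": contexts}

def log_prob(lm, seq, vocab_size, alpha=1.0):
    """Smoothed log-probability of a phoneme sequence under one model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = lm["bigrams"][(prev, cur)] + alpha
        denominator = lm["contexts"][prev] + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total

def identify(lms, seq, vocab_size):
    """Return the language whose model assigns the highest score."""
    return max(lms, key=lambda lang: log_prob(lms[lang], seq, vocab_size))

# Hypothetical phoneme sequences, as if produced by a phoneme recognizer.
training = {
    "lang_a": [["k", "a", "t"], ["k", "a", "p"]],
    "lang_b": [["t", "a", "k"], ["p", "a", "k"]],
}
symbols = {p for seqs in training.values() for s in seqs for p in s} | {"<s>", "</s>"}
lms = {lang: train_bigram_lm(seqs) for lang, seqs in training.items()}

print(identify(lms, ["k", "a", "t"], len(symbols)))
print(identify(lms, ["p", "a", "k"], len(symbols)))
```

Because the models are trained on recognizer output rather than on orthographic transcriptions, a design of this kind can in principle be built without transcribed audio for the target languages, which is relevant to the incomplete-resources scenario mentioned in the abstract.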
17

Challenges in teaching IsiXhosa home language in rural Eastern Cape secondary schools

Kafu, Hazel Bukiwe 30 September 2020
The purpose of this study was to investigate the challenges in teaching IsiXhosa home language in rural secondary schools. Learners from Grades 8 to 12 perform poorly in IsiXhosa grammar, essay writing, literature and oral work. The researcher sampled 40 learners from each of two senior secondary schools, eight parents and eight IsiXhosa subject specialists (two district based and six school based) to take part in the research. Data for this study were collected during cluster moderations in one of the secondary schools by using document analysis, interviews and questionnaires. Qualitative and quantitative methods were used by the researcher to analyse IsiXhosa results from Grade 8 to Grade 12. Analysis of documents such as mark schedules and marks for formal and informal tasks gave evidence that learners perform poorly in grammar, literature, oral work and essay writing. The scarcity or absence of distinctions (levels 6 and 7) in Grade 12 final exams as well as in Grades 8 to 11 indicates that the teaching and learning of the language in the secondary classroom demands special attention; conclusions were drawn and recommendations made accordingly. / Curriculum and Instructional Studies / D. Ed. (Curriculum Design and Development)
