About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

21

Effective automatic speech recognition data collection for under-resourced languages / de Vries N.J.

De Vries, Nicolaas Johannes January 2011
Because building transcribed speech corpora for under-resourced languages plays a pivotal role in developing automatic speech recognition (ASR) technologies for such languages, a key step in developing these technologies is the effective collection of ASR data, consisting of transcribed audio and associated metadata. The problem is that no suitable tool currently exists for effectively collecting ASR data for such languages. The specific context and requirements for effectively collecting ASR data for under-resourced languages render all currently known solutions unsuitable for the task. These requirements include portability, Internet independence, and an open-source code base. This work documents the development of such a tool, called Woefzela, from the determination of the requirements for effective data collection in this context to the verification and validation of its functionality. The study demonstrates the effectiveness of using smartphones without any Internet connectivity to collect ASR data for under-resourced languages. It introduces a semi-real-time quality control philosophy that increases the amount of usable ASR data collected from speakers. Woefzela was developed for the Android operating system and is freely available for use on Android smartphones; its source code has also been made available. More than 790 hours of ASR data for the eleven official languages of South Africa have been successfully collected with Woefzela. As part of this study, a benchmark was established for the performance of a new National Centre for Human Language Technology (NCHLT) English corpus. / Thesis (M.Ing. (Electrical Engineering))--North-West University, Potchefstroom Campus, 2012.
22

Access and use of information and communication technology for teaching and learning amongst schools in under resourced communities in the Western Cape, South Africa

Koranteng, Kesewaa January 2012
Thesis (MTech(Information Technology)) --Cape Peninsula University of Technology, 2012 / Owing to the legacy of apartheid, South Africa faces developmental disparities, with inequalities between the advantaged few in the more urban areas and the disadvantaged majority in the rural areas. Because quality education is key not only to individual success but also to a country’s development, efforts have been made to ensure equal access for all, and ICT is seen as a key enabler to this end. The study investigated the status of ICT deployment and its integration into school curricula. The objective was to understand the factors affecting efforts to implement ICT integration successfully in schools in underdeveloped areas, to understand the challenges that exist, and ultimately to inform solutions. A qualitative case study was conducted. Purposive sampling was used to select the population elements: educators and school coordinators of ICT programs in four Western Cape schools (Kulani Secondary, Sithembele Matiso Secondary, Macassar Secondary and Marvin Park Primary). To understand the status quo, the literature was reviewed and semi-structured interviews were conducted with ICT coordinators and educators in the four sampled schools. Activity theory provided the analytical framework for the study; through this framework, the aims and objectives of the study were conceptualized and summarized in a graphical representation of the phenomena under study. Despite efforts to ensure universal access to ICT, the findings indicate that the status of ICT deployment and its integration into school curricula in underdeveloped schools is far from favourable.
23

Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée / Extraction of a parallel corpus for machine translation from and to under-resourced languages

Do, Thi Ngoc Diep 20 December 2011
Les systèmes de traduction automatique obtiennent aujourd'hui de bons résultats sur certains couples de langues comme anglais – français, anglais – chinois, anglais – espagnol, etc. Les approches de traduction empiriques, particulièrement l'approche de traduction automatique probabiliste, nous permettent de construire rapidement un système de traduction si des corpus de données adéquats sont disponibles. En effet, la traduction automatique probabiliste est fondée sur l'apprentissage de modèles à partir de grands corpus parallèles bilingues pour les langues source et cible. Toutefois, la recherche sur la traduction automatique pour des paires de langues dites «peu dotées» doit faire face au défi du manque de données. Nous avons ainsi abordé le problème d'acquisition d'un grand corpus de textes bilingues parallèles pour construire le système de traduction automatique probabiliste. L'originalité de notre travail réside dans le fait que nous nous concentrons sur les langues peu dotées, où des corpus de textes bilingues parallèles sont inexistants dans la plupart des cas. Ce manuscrit présente notre méthodologie d'extraction d'un corpus d'apprentissage parallèle à partir d'un corpus comparable, une ressource de données plus riche et diversifiée sur l'Internet. Nous proposons trois méthodes d'extraction. La première méthode suit l'approche de recherche classique qui utilise des caractéristiques générales des documents ainsi que des informations lexicales du document pour extraire à la fois les documents comparables et les phrases parallèles. Cependant, cette méthode requiert des données supplémentaires sur la paire de langues. La deuxième méthode est une méthode entièrement non supervisée qui ne requiert aucune donnée supplémentaire à l'entrée, et peut être appliquée pour n'importe quelle paire de langues, même des paires de langues peu dotées. 
La dernière méthode est une extension de la deuxième méthode qui utilise une troisième langue, pour améliorer les processus d'extraction de deux paires de langues. Les méthodes proposées sont validées par des expériences appliquées sur la langue peu dotée vietnamienne et les langues française et anglaise. / Machine translation now achieves good results for several language pairs, such as English–French, English–Chinese and English–Spanish. Empirical approaches, particularly statistical machine translation, make it possible to build a translation system quickly when adequate data are available, because statistical machine translation relies on models trained on large parallel bilingual corpora in the source and target languages. Research on machine translation for under-resourced language pairs, however, consistently faces a lack of training data. We therefore address the problem of acquiring a large parallel bilingual text corpus to build a statistical machine translation system. The originality of our work lies in its focus on under-resourced languages, for which parallel bilingual corpora do not exist in most cases. This manuscript presents our methodology for extracting a parallel corpus from a comparable corpus, a richer and more diverse data resource available on the Web. We propose three extraction methods. The first follows the classical approach, using general characteristics of documents together with lexical information from each document to retrieve both parallel documents and parallel sentence pairs; however, it requires additional data for the language pair. The second is a fully unsupervised method that requires no additional data and can be applied to any language pair, including under-resourced ones. The third extends the second by using a third language to improve the extraction process (triangulation).
The proposed methods are validated through experiments on the under-resourced Vietnamese language together with English and French.
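The abstract does not say how the lexical information is scored. As a rough illustration only, the Python sketch below picks, for each source sentence, the target sentence with the highest dictionary-based translation overlap, after discarding pairs with very different lengths. The toy French–English lexicon, the `max_ratio` of 2.0 and the `min_score` of 0.5 are hypothetical choices, whitespace tokenization is a simplification (Vietnamese, for instance, would need proper word segmentation), and this is not the thesis's extraction method; document-level features, the unsupervised variant and triangulation are not shown.

```python
"""Toy filter for candidate parallel sentence pairs from a comparable corpus.

Hypothetical illustration: the lexicon, thresholds and whitespace
tokenization are assumptions, not the method described in the thesis.
"""


def tokenize(sentence: str) -> list[str]:
    # Lower-cased whitespace tokenization; a real system would need a
    # language-aware tokenizer.
    return sentence.lower().split()


def translation_overlap(src: list[str], tgt: list[str], lexicon: dict[str, set[str]]) -> float:
    """Fraction of source tokens with at least one dictionary translation in the target."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for word in src if lexicon.get(word, set()) & tgt_set)
    return hits / len(src)


def length_ratio_ok(src: list[str], tgt: list[str], max_ratio: float = 2.0) -> bool:
    """Reject pairs whose token counts differ by more than max_ratio."""
    if not src or not tgt:
        return False
    return max(len(src), len(tgt)) / min(len(src), len(tgt)) <= max_ratio


def best_pairs(src_sents, tgt_sents, lexicon, min_score=0.5):
    """For each source sentence, keep the best-scoring target sentence if it reaches min_score."""
    pairs = []
    for s in src_sents:
        s_tok = tokenize(s)
        candidates = [
            (translation_overlap(s_tok, tokenize(t), lexicon), t)
            for t in tgt_sents
            if length_ratio_ok(s_tok, tokenize(t))
        ]
        if candidates:
            score, t = max(candidates, key=lambda c: c[0])
            if score >= min_score:
                pairs.append((s, t, score))
    return pairs


# Hypothetical bilingual lexicon and comparable-corpus sentences.
lexicon = {"le": {"the"}, "chat": {"cat"}, "dort": {"sleeps"},
           "chien": {"dog"}, "mange": {"eats"}}
french = ["le chat dort", "le chien mange"]
english = ["the dog eats", "the cat sleeps", "stock prices fell sharply today"]
print(best_pairs(french, english, lexicon))
# [('le chat dort', 'the cat sleeps', 1.0), ('le chien mange', 'the dog eats', 1.0)]
```

The length-ratio check is a cheap pre-filter so that the more expensive lexical scoring only runs on plausible candidates; with a real comparable corpus, candidate target sentences would usually come from documents already judged comparable.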
24

Exploring Science Identity: The Lived Experiences of Underserved Students in a University Supplemental Science Program

Perrault, Lynette D 20 December 2017
Underserved students attending under-resourced schools have limited opportunities to engage in advanced science. To help explain science identity formation, this study explored how a supplemental science program influences underserved students’ acquisition of science knowledge and skills and, in turn, their pursuit of science. Supplemental science programs have proliferated in response to underserved students’ limited exposure to science and science resources, prompting further investigation into how such programs influence these students’ interest and motivation in science, attainment of science knowledge and skills, and confidence in science, and thereby promote science identities. Using a phenomenological qualitative approach, this study examined science identity formation in high school students participating in a university supplemental environmental health science program. It explored the students’ perceptions of their lived experiences in supplemental science activities, research, and field experiences, and how these experiences influenced their science identity development. The university program was an eight-week summer program in which students interacted with a diverse group of peers from various high schools through environmental health science rotations, field experiences, and research with faculty advisors and graduate student mentors. Data sources included existing program evaluation data, such as weekly journals and exit interviews, as well as follow-up interviews conducted several months after the program concluded. 
A three-step coding process applied to the follow-up interview transcripts yielded six emerging themes: (1) promoting interest and motivation to pursue new areas of science, (2) mechanisms in the acquisition of science knowledge and skills in scientific practice, (3) confidence in science knowledge and abilities, (4) understanding and applying science in the world, (5) emerging relationships with peers and mentors in science, and (6) aspirations to be a science person in the scientific community. The study informs other supplemental science programs, has implications for improving science curricula and instruction in K-12 schools, and explains how exposure to science experiences can help students develop science identities.
25

Training parsers for low-resourced languages: improving cross-lingual transfer with monolingual knowledge / Apprentissage d'analyseurs syntaxiques pour les langues peu dotées : amélioration du transfert cross-lingue grâce à des connaissances monolingues

Aufrant, Lauriane 06 April 2018
Le récent essor des algorithmes d'apprentissage automatique a rendu les méthodes de Traitement Automatique des Langues d'autant plus sensibles à leur facteur le plus limitant : la qualité des systèmes repose entièrement sur la disponibilité de grandes quantités de données, ce qui n'est pourtant le cas que d'une minorité parmi les 7.000 langues existant au monde. La stratégie dite du transfert cross-lingue permet de contourner cette limitation : une langue peu dotée en ressources (la cible) peut être traitée en exploitant les ressources disponibles dans une autre langue (la source). Les progrès accomplis sur ce plan se limitent néanmoins à des scénarios idéalisés, avec des ressources cross-lingues prédéfinies et de bonne qualité, de sorte que le transfert reste inapplicable aux cas réels de langues peu dotées, qui n'ont pas ces garanties. Cette thèse vise donc à tirer parti d'une multitude de sources et ressources cross-lingues, en opérant une combinaison sélective : il s'agit d'évaluer, pour chaque aspect du traitement cible, la pertinence de chaque ressource. L'étude est menée en utilisant l'analyse en dépendance par transition comme cadre applicatif. Le cœur de ce travail est l'élaboration d'un nouveau méta-algorithme de transfert, dont l'architecture en cascade permet la combinaison fine des diverses ressources, en ciblant leur exploitation à l'échelle du mot. L'approche cross-lingue pure n'étant en l'état pas compétitive avec la simple annotation de quelques phrases cibles, c'est avant tout la complémentarité de ces méthodes que souligne l'analyse empirique. Une série de nouvelles métriques permet une caractérisation fine des similarités cross-lingues et des spécificités syntaxiques de chaque langue, de même que de la valeur ajoutée de l'information cross-lingue par rapport au cadre monolingue. L'exploitation d'informations typologiques s'avère également particulièrement fructueuse. 
Ces contributions reposent largement sur des innovations techniques en analyse syntaxique, concrétisées par la publication en open source du logiciel PanParser, qui exploite et généralise la méthode dite des oracles dynamiques. Cette thèse contribue sur le plan monolingue à plusieurs autres égards, comme le concept de cascades monolingues, pouvant traiter par exemple d'abord toutes les dépendances faciles, puis seulement les difficiles. / As a result of the recent blossoming of machine learning techniques, the field of natural language processing faces an increasingly thorny bottleneck: the most efficient algorithms rely entirely on the availability of large training data. These technological advances therefore remain unavailable for most of the world's roughly 7,000 languages, since most of them are low-resourced. One way to bypass this limitation is cross-lingual transfer, whereby resources available in another (source) language are leveraged to help build accurate systems in the desired (target) language. However, despite promising results in research settings, standard transfer techniques lack the flexibility with respect to cross-lingual resources that real-world scenarios require, such as exploiting very sparse resources or assorted arrays of resources. This limitation strongly diminishes the applicability of the approach. This thesis therefore proposes to combine multiple sources and resources for transfer, with an emphasis on selectivity: can we estimate which resource of which language is useful for which input? This strategy is put into practice in the framework of transition-based dependency parsing. To this end, a new transfer framework with a cascading architecture is designed: it enables the desired combination while ensuring better-targeted exploitation of each resource, down to the level of the word. 
Empirical evaluation does temper enthusiasm for the purely cross-lingual approach (it generally remains preferable to annotate just a few target sentences), but it also highlights the approach's complementarity with other methods. Several metrics are developed to characterize precisely cross-lingual similarities, syntactic idiosyncrasies, and the added value of cross-lingual information compared with monolingual training. The substantial benefits of typological knowledge are also explored. The whole study relies on a series of technical improvements to the parsing framework, including the release of new open-source software, PanParser, which revisits so-called dynamic oracles to extend their use cases. Several purely monolingual contributions complete this work, including an exploration of monolingual cascading, which offers promising prospects with easy-then-hard strategies.
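PanParser's internals are not described in this record. To make the transition-based setting concrete, the Python sketch below shows a static oracle for the arc-standard transition system, a common textbook formulation: given a projective gold tree, it derives the SHIFT / LEFT-ARC / RIGHT-ARC sequence that rebuilds it. Dynamic oracles, which the thesis builds on, generalize this idea by identifying which transitions remain optimal from any configuration, including ones already off the gold path; that extension is not shown. The function names and the `heads` encoding are illustrative assumptions, and arc-standard is not necessarily the system PanParser uses.

```python
"""Static oracle for the arc-standard transition system (illustrative sketch).

heads[i] is the gold head of token i for i = 1..n; index 0 is the
artificial root and heads[0] is unused. Only projective trees are supported.
"""

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def static_oracle(heads: list) -> list[str]:
    """Return the transition sequence that rebuilds the gold tree."""
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    attached = set()  # tokens whose head has already been assigned
    transitions = []

    def has_pending_dependents(tok: int) -> bool:
        return any(heads[d] == tok and d not in attached for d in range(1, n + 1))

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            # LEFT-ARC: the second-from-top token takes the top as its head
            # (the root, token 0, never becomes a dependent).
            if s1 != 0 and heads[s1] == s0:
                transitions.append(LEFT_ARC)
                attached.add(s1)
                del stack[-2]
                continue
            # RIGHT-ARC: the top takes the second-from-top as its head, but only
            # after it has collected all of its own dependents.
            if heads[s0] == s1 and not has_pending_dependents(s0):
                transitions.append(RIGHT_ARC)
                attached.add(s0)
                stack.pop()
                continue
        if not buffer:
            raise ValueError("gold tree is non-projective or malformed")
        transitions.append(SHIFT)
        stack.append(buffer.pop(0))
    return transitions


def apply_transitions(transitions: list[str], n: int) -> list:
    """Replay transitions and return the resulting head list (index 0 unused)."""
    stack, buffer = [0], list(range(1, n + 1))
    heads = [None] * (n + 1)
    for t in transitions:
        if t == SHIFT:
            stack.append(buffer.pop(0))
        elif t == LEFT_ARC:
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        else:  # RIGHT_ARC
            dependent = stack.pop()
            heads[dependent] = stack[-1]
    return heads


# "She eats fish": "She" and "fish" depend on "eats", which attaches to the root.
gold = [None, 2, 0, 2]
seq = static_oracle(gold)
print(seq)  # ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC', 'RIGHT-ARC']
```

Replaying the oracle's output with `apply_transitions(seq, 3)` reproduces the gold heads, which is a quick sanity check for any oracle implementation.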
26

Automatic Annotation of Speech: Exploring Boundaries within Forced Alignment for Swedish and Norwegian / Automatisk Anteckning av Tal: Utforskning av Gränser inom Forced Alignment för Svenska och Norska

Biczysko, Klaudia January 2022
Automatic speech recognition requires large amounts of time-aligned data. Manual speech segmentation has been shown to be more laborious than manual transcription, especially when dealing with tens of hours of speech. Forced alignment is a technique for aligning a speech signal with its orthographic transcription by determining the time boundaries of linguistic units. Most forced aligners, however, are language-dependent and trained on English data, whereas under-resourced languages lack both the resources to develop the acoustic model an aligner requires and manually aligned data. An alternative to training new models is cross-language forced alignment, in which an aligner trained on one language is used to align data in another language. This thesis aimed to evaluate state-of-the-art forced alignment algorithms available for Swedish and to test whether a Swedish model could be used to align Norwegian. Three approaches were employed: (1) Aeneas, a forced aligner based on dynamic time warping and text-to-speech synthesis; (2) two forced aligners based on hidden Markov models (HMMs), the Munich AUtomatic Segmentation System (WebMAUS) and the Montreal Forced Aligner (MFA); and (3) the Connectionist Temporal Classification (CTC) segmentation algorithm with two pre-trained and fine-tuned Swedish Wav2Vec2 models. First, small Norwegian and Swedish speech test sets covering different degrees of speech spontaneity were created and manually aligned to produce gold-standard alignments. Second, the aligners' performance on the Swedish dataset was evaluated against the gold standard. Finally, it was tested whether Swedish forced aligners could be applied to Norwegian data. Aligner performance was assessed by measuring the difference between the boundaries in the gold standard and those in the automatic alignment. 
Accuracy was estimated as the proportion of alignments whose boundary deviation fell below thresholds proposed in the literature. The CTC segmentation algorithm with the Wav2Vec2 model VoxRex outperformed the other forced alignment systems. The differences between the alignments produced by the two Wav2Vec2 models suggest that training data may influence alignments more than the algorithm's architecture does. At lower thresholds, however, the traditional HMM approach outperformed the deep learning models. Finally, the thesis shows promising results for cross-language forced alignment, using Swedish models to align related languages such as Norwegian.
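The boundary-based evaluation described in this abstract can be made concrete with a short Python sketch. It assumes the gold and automatic alignments segment the same word sequence, so boundaries can be paired one to one, and it stores times as integer milliseconds to avoid floating-point ties at a threshold. The boundary values and the 20, 50 and 100 ms thresholds are made up for illustration; the thesis's own thresholds and data are not reproduced here.

```python
"""Boundary-deviation metrics for comparing a forced alignment with a gold standard.

Assumes both alignments cover the same sequence of segments, so the
i-th boundary in one corresponds to the i-th boundary in the other.
Times are integer milliseconds.
"""
from statistics import mean


def boundary_deviations(gold_ms: list[int], predicted_ms: list[int]) -> list[int]:
    """Absolute difference between each gold boundary and its automatic counterpart."""
    if len(gold_ms) != len(predicted_ms):
        raise ValueError("alignments must have the same number of boundaries")
    return [abs(g - p) for g, p in zip(gold_ms, predicted_ms)]


def accuracy_below(deviations: list[int], threshold_ms: int) -> float:
    """Proportion of boundaries whose deviation is strictly below the threshold."""
    if not deviations:
        return 0.0
    return sum(d < threshold_ms for d in deviations) / len(deviations)


# Hypothetical word boundaries (ms) for one utterance.
gold = [0, 310, 580, 920, 1400]
aligner = [12, 345, 575, 1000, 1415]

devs = boundary_deviations(gold, aligner)      # [12, 35, 5, 80, 15]
print("mean deviation (ms):", mean(devs))      # 29.4
for t in (20, 50, 100):
    print(f"< {t} ms: {accuracy_below(devs, t):.0%}")  # 60%, 80%, 100%
```

Reporting the proportion below several thresholds, rather than only the mean deviation, is what allows statements such as one system winning at tight thresholds while another wins overall.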
27

A Method for the Assisted Translation of QA Datasets Using Multilingual Sentence Embeddings / En metod för att assistera översättning av fråga-svarskorpusar med hjälp av språkagnostiska meningsvektorer

Vakili, Thomas January 2020
This thesis presents a method that reduces the labour required to translate the English question-answering dataset SQuAD into Swedish. The study aims to help narrow the gap between natural language processing research in English and research in lesser-resourced languages by providing a method for creating datasets in these languages that are counterparts to those used in English. This would allow results from English studies to be evaluated in more languages. The proposed method uses multilingual sentence embeddings to search for and rank answers to English SQuAD questions in the Swedish Wikipedia articles associated with each question. The search results are then used to pair SQuAD questions with sentences that contain their answers. We also estimate the extent to which SQuAD questions have answers in the Swedish edition of Wikipedia and conclude that this proportion is small but still large enough to be useful. Further, evaluation shows that the method clearly reduces the labour required to translate SQuAD into Swedish, while reducing the number of data points retained in the resulting translation only to a degree that is acceptable for many use cases. Manual labour is still required to translate the SQuAD questions and to locate the answers within the Swedish sentences that contain them. Automating these steps would further increase the utility of the approach but is outside the scope of this thesis. / I detta examensarbete presenteras en metod som syftar till att minska mängden arbete som krävs för att översätta fråga-svarskorpuset SQuAD från engelska till svenska. Syftet med studien är att bidra till att minska glappet mellan språkteknologisk forskning på engelska och forskningen på språk med mindre resurser. 
Detta åstadkoms genom att beskriva en metod för att skapa korpusar liknande dem som används inom forskning på engelska och som kan användas för att utvärdera i vilken utsträckning resultat från den forskningen generaliserar till andra språk. Metoden använder språkagnostiska meningsvektorer för att söka efter svar på engelska SQuAD-frågor i svenska Wikipedia-artiklar, och sedan ranka dessa. Sökresultaten används sedan för att para samman SQuAD-frågor med de svenska meningar som innehåller deras svar. Även utsträckningen i vilken svar på engelska SQuAD-frågor står att finna i den svenska upplagan av Wikipedia undersöktes. Andelen SQuAD-frågor där ett svar fanns i den svenska Wikipedia-artikel som var associerad med frågan var liten men ändå användbar. Vidare visar utvärderingen av metoden att den innebär en tydlig minskning av mängden arbete som krävs för att översätta SQuAD till svenska. Denna minskning åstadkoms samtidigt som mängden fråga-svarspar som missas som en konsekvens av detta är acceptabel för många användningsområden. Manuellt arbete krävs fortfarande för att översätta SQuAD-frågorna från engelska och för att hitta var i de svenska meningarna som svaren finns. Vidare studier kring dessa frågor skulle bidra till att göra metoden än mer användbar, men ligger utanför avgränsningen för denna uppsats.
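The ranking step in this method can be sketched in Python as cosine similarity between a question embedding and embeddings of candidate sentences from the associated Swedish article. In practice the vectors would come from a multilingual sentence encoder that maps English and Swedish text into a shared space; here they are hand-made three-dimensional toy vectors, no particular encoder is implied, and `rank_candidates` with its `top_k` parameter is a hypothetical helper rather than the thesis's code.

```python
"""Rank candidate answer sentences by cosine similarity to a question embedding.

Toy vectors stand in for the output of a multilingual sentence encoder
(an assumption; no particular encoder is implied).
"""
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(question_vec, sentence_vecs, top_k=3):
    """Return the top_k sentences, most similar first."""
    scored = [(cosine(question_vec, vec), sent) for sent, vec in sentence_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:top_k]]


# Hypothetical embeddings: an English question and sentences from a Swedish article.
question = [1.0, 0.0, 1.0]  # stands in for "What is the capital of Sweden?"
candidates = {
    "Stockholm är Sveriges huvudstad.": [0.9, 0.1, 0.8],
    "Sverige gränsar till Norge.": [0.1, 1.0, 0.0],
    "Stockholm ligger vid Mälaren.": [0.5, 0.5, 0.6],
}
print(rank_candidates(question, candidates, top_k=2))
# ['Stockholm är Sveriges huvudstad.', 'Stockholm ligger vid Mälaren.']
```

The top-ranked sentences would then be paired with the SQuAD question, leaving question translation and answer-span location as the manual steps the abstract mentions.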
28

Teachers' experiences of curriculum change in two under-resourced primary schools in the Durban area

Pillay, Inbam 11 1900
The purpose of this study was to explore teachers’ experiences of curriculum change in two under-resourced primary schools in the Durban area. By using a qualitative approach to examine educators’ experiences, the researcher was able to identify problems that prevent a smooth transition from one curriculum to another. The introduction of the Curriculum Assessment Policy Statements in January 2012 required teachers to make numerous adjustments. Changes were made to the number of subjects to be taught and to the notional time for each subject, and textbooks received renewed emphasis as a vital classroom teaching resource. Data collected in both schools show that, despite a lack of essential resources such as textbooks, teachers still manage to implement change and follow policy while ensuring that their learners benefit from the curriculum. The study also highlights challenges faced by teachers in under-resourced schools that must be addressed for effective curriculum implementation. The researcher makes recommendations to address these challenges and offers suggestions for future research. / Curriculum and Instructional Studies / M. Ed. (Curriculum Studies)
29

A model to facilitate research uptake in health care practice and policy development

Sigudla, Jerry 05 1900
Despite the availability of numerous models for translating knowledge into practice and policy, research uptake remains low in resource-limited countries. This study aimed to develop a model to facilitate research uptake in healthcare practice and policy development. The study used a two-phase exploratory sequential approach (QUAL→QUAN). Qualitative data were collected through semi-structured interviews with 21 participants, categorised as researchers (6), frontline workers/practitioners (7), programme/policy managers (4), and directors/senior managers (4) from government, the private sector and academic institutions of higher learning (universities and colleges). Quantitative data were collected through an online cross-sectional survey administered to 212 respondents who conducted research studies in Mpumalanga Province between 2014 and 2019. The most significant findings appear to be a lack of awareness of research findings and a lack of champions to lead engagement among research stakeholders on research uptake. In addition, the research established that researchers fail to align public health research projects with existing local contexts and available resources. There is also a growing propensity to use informal research without considering data quality. It was further observed that establishing and sustaining beneficial collaboration among all research stakeholders is required to promote effective research uptake for practice and policy development. The survey results identified 13 components: four individual factors (support, experience, motivation and time factor); four organisational factors (research agenda, funding, resources and partnerships); and five research-characteristic factors (gatekeeping, local research committees, accessibility of evidence, quality of evidence and critical appraisal skills). 
However, Spearman’s correlation coefficient revealed that only six of the 13 factors were significantly and positively correlated with research uptake: support, experience, motivation, time factor, resources and critical appraisal skills. Consequently, a model for institutionalising research uptake is proposed. The roles of local research committees are clarified, and a logical framework with pathways and channels of engagement is incorporated to enable successful implementation of the research uptake model. / Health Studies / Ph. D. (Public Health)
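Spearman's rank correlation, which this study used to screen the 13 factors, is Pearson's correlation computed on ranks. A minimal Python sketch follows, assuming tied values receive the average of their ranks; the respondent scores are invented for illustration. It does not compute significance; in practice a library routine such as `scipy.stats.spearmanr`, which also returns a p-value, would normally be used.

```python
"""Spearman's rank correlation computed as Pearson's r on (average) ranks."""
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r applied to the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two observations")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Likert-style scores for five respondents.
support_scores = [3, 1, 4, 2, 5]
uptake_scores = [2, 1, 5, 3, 4]
print(spearman(support_scores, uptake_scores))  # 0.8
```

Working on ranks makes the coefficient suitable for ordinal survey responses, since it only assumes a monotonic relationship rather than a linear one.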
