  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Ανάπτυξη μεθόδου με σκοπό την αναγνώριση και εξαγωγή θεματικών λέξεων κλειδιών από διευθύνσεις ιστοσελίδων του ελληνικού Διαδικτύου / Keyword identification within Greek URLs

Βονιτσάνου, Μαρία-Αλεξάνδρα 16 January 2012
Η αύξηση της διαθέσιμης Πληροφορίας στον Παγκόσμιο Ιστό είναι ραγδαία. Η παρατήρηση αυτή παρότρυνε πολλούς ερευνητές να επικεντρώσουν το έργο τους στην εξαγωγή χρήσιμων γνωρισμάτων από διαδικτυακά έγγραφα, όπως ιστοσελίδες, εικόνες, βίντεο, με σκοπό τη ενίσχυση της διαδικασίας κατηγοριοποίησης ιστοσελίδων. Ένας πόρος που περιέχει πληροφορία και δεν έχει διερευνηθεί διεξοδικά για γλώσσες εκτός της αγγλικής, είναι η διεύθυνση ιστοσελίδας (URL- Uniform Recourse Locator). Το κίνητρο της διπλωματικής αυτής εργασίας είναι το γεγονός ότι ένα σημαντικό υποσύνολο των χρηστών του διαδικτύου δείχνει ενδιαφέρον για δικτυακούς πόρους, των οποίων οι διευθύνσεις URL περιλαμβάνουν όρους προερχόμενους από τη μητρική τους γλώσσα (η οποία δεν είναι η αγγλική), γραμμένους με λατινικούς χαρακτήρες. Προτείνεται μέθοδος η οποία θα αναγνωρίζει και θα εξάγει τις λέξεις-κλειδιά από διευθύνσεις ιστοσελίδων (URLs), εστιάζοντας στο ελληνικό Διαδίκτυο και συγκεκριμένα σε URLs που περιέχουν ελληνικούς όρους. Το κύριο ζήτημα της προτεινόμενης μεθόδου είναι ότι οι ελληνικές λέξεις μπορούν να μεταγλωττίζονται με λατινικούς χαρακτήρες σύμφωνα με πολλούς διαφορετικούς τρόπους, καθώς και το γεγονός ότι τα URLs μπορούν να περιέχουν περισσότερες της μιας λέξεις χωρίς κάποιο διαχωριστικό. Παρόλη την ύπαρξη προηγούμενων προσεγγίσεων για την επεξεργασία ελληνικού διαδικτυακού περιεχομένου, όπως αναζητήσεις στο ελληνικό διαδίκτυο και αναγνώριση οντότητας σε ελληνικές ιστοσελίδες, καμία από τις παραπάνω δεν βασίζεται σε διευθύνσεις URL. Επιπλέον, έχουν αναπτυχθεί πολλές τεχνικές για την κατηγοριοποίηση ιστοσελίδων με βάση κυρίως τις διευθύνσεις URL, αλλά καμία δεν διερευνά την περίπτωση του ελληνικού διαδικτύου. Η προτεινόμενη μέθοδος περιέχει δύο βασικά στοιχεία: το μεταγλωττιστή και τον κατακερματιστή. 
Ο μεταγλωττιστής, βασισμένος σε ένα ελληνικό λεξικό και ένα σύνολο κανόνων, μετατρέπει τις λέξεις που είναι γραμμένες με λατινικούς χαρακτήρες σε ελληνικούς όρους ενώ παράλληλα ο κατακερματιστής τμηματοποιεί τη διεύθυνση URL σε λέξεις με νόημα, εξάγοντας, έτσι τελικά ελληνικούς όρους που αποτελούν λέξεις κλειδιά. Η πειραματική αξιολόγηση της προτεινόμενης μεθόδου σε δείγμα ελληνικών URLs αποδεικνύει ότι μπορεί να αξιοποιηθεί εποικοδομητικά στην αυτόματη αναγνώριση λέξεων-κλειδιών σε ελληνικά URLs. / The available information on the WWW is increasing rapidly. This observation has prompted many researchers to focus their work on extracting useful features from web documents that would enhance the task of web classification. A quite informative resource that has not been thoroughly explored for languages other than English is the uniform resource locator (URL). Motivated by the fact that a significant share of Web users is interested in web resources whose URLs contain terms from their non-English native languages, written using Latin characters, we propose a method that successfully identifies and extracts keywords within URLs, focusing on the Greek Web and especially on URLs containing Greek terms. The main issue of this approach is that Greek words can be transliterated to Latin characters in many different ways, based on how the words are pronounced rather than on how they are written. Although there are previous attempts on similar issues, such as Greek web searches and entity recognition in Greek web pages, none of them is based on URLs. In addition, there are many techniques for web page categorization based mainly on URLs, but none explores the case of Greek terms. The proposed method uses a three-step approach. First, a normalized URL is divided into its basic components according to the URI syntax (scheme :// host / path-elements / document . extension). The domain part is split wherever punctuation marks or numbers appear. 
Second, domain tokens are segmented into meaningful tokens using a set of transliteration rules and a Greek dictionary. Finally, in order to identify useful keywords, a score is assigned to each extracted keyword based on its length and on whether the word is nested in another word. The algorithm is evaluated on a random sample of 1,000 manually collected URLs. We perform a human-based evaluation, comparing the automatically extracted keywords with keywords extracted manually when no information other than the URL is available. The results look promising.
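
The three steps described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy `TRANSLIT` table stands in for the transliteration rules and Greek dictionary, the function names are hypothetical, and the scoring weights (length, halved when a word is nested in a longer match) are assumptions chosen only to show the idea.

```python
import re
from urllib.parse import urlsplit

# Hypothetical toy table: Latin-script spellings mapped to Greek words.
# A real system would generate variants from transliteration rules
# and look them up in a full Greek dictionary.
TRANSLIT = {
    "nea": "νέα",
    "ellada": "ελλάδα",
    "kai": "και",
    "kairos": "καιρός",
}


def domain_tokens(url):
    """Step 1: take the host part and split it on punctuation and digits."""
    host = urlsplit(url).hostname or ""
    return [t for t in re.split(r"[^a-z]+", host.lower()) if t]


def candidates(token):
    """Step 2: find every dictionary word occurring inside the token."""
    found = []
    for start in range(len(token)):
        for end in range(start + 1, len(token) + 1):
            piece = token[start:end]
            if piece in TRANSLIT:
                found.append((start, end, piece))
    return found


def score(cands):
    """Step 3: score by length; penalise words nested in a longer match.

    The 0.5 penalty is an assumed weight, not a value from the thesis.
    """
    scored = {}
    for s, e, word in cands:
        nested = any(
            s2 <= s and e <= e2 and (s2, e2) != (s, e) for s2, e2, _ in cands
        )
        scored[TRANSLIT[word]] = len(word) * (0.5 if nested else 1.0)
    return scored


def extract_keywords(url):
    """Run all three steps and return {Greek keyword: score}."""
    result = {}
    for token in domain_tokens(url):
        result.update(score(candidates(token)))
    return result


if __name__ == "__main__":
    print(extract_keywords("http://www.neaellada2012.gr/index.html"))
    print(extract_keywords("http://kairos.gr"))
```

In this sketch the unseparated token "neaellada" yields two keywords, while "kai" inside "kairos" receives a lower score because it is nested in the longer word.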
22

Estudo longitudinal de hipossegmentações em textos do Ensino Fundamental II / Longitudinal study of hyposegmentations in texts of Junior High School

Fiel, Roberta Pereira 22 March 2018
Submitted by Roberta Pereira Fiel (roh_fiel@hotmail.com) on 2018-05-02T12:47:59Z No. of bitstreams: 1 FIEL_RP_2018_Estudo longitudinal de hipossegmentações no Ensino Fundamental II.pdf: 4063414 bytes, checksum: 7f6aacb0e843d3a08ad7154b8bdb3dcb (MD5) / Approved for entry into archive by Elza Mitiko Sato null (elzasato@ibilce.unesp.br) on 2018-05-02T23:42:08Z (GMT) No. of bitstreams: 1 fiel_rp_me_sjrp_int.pdf: 4063414 bytes, checksum: 7f6aacb0e843d3a08ad7154b8bdb3dcb (MD5) / Made available in DSpace on 2018-05-02T23:42:08Z (GMT). No. of bitstreams: 1 fiel_rp_me_sjrp_int.pdf: 4063414 bytes, checksum: 7f6aacb0e843d3a08ad7154b8bdb3dcb (MD5) Previous issue date: 2018-03-22 / Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) / Esta dissertação trata da caracterização longitudinal da escrita de alunos do EF II no que diz respeito às chamadas hipossegmentações de palavras escritas – como “puraqui” (“por aqui”), nas quais há a ausência não-convencional de fronteira gráfica. Nossos objetivos são: (i) identificar, por meio de análise quantitativa, se há correlação (ou não) entre número de hipossegmentações e tempo de escolarização; e (ii) descrever qualitativamente as hipossegmentações, quanto a aspectos prosódicos dos enunciados falados e aspectos gráficos relativos às informações da própria convenção ortográfica. Para alcançar esses objetivos, nos baseamos, por um lado, em aparato teórico da fonologia prosódica, modelo relation-based, que concebe a existência de sete constituintes prosódicos que estruturam os enunciados das línguas do mundo; por outro lado, em abordagem da escrita como constituída de modo heterogêneo. Dos resultados obtidos na análise quantitativa, destacamos que há correlação entre aumento dos anos de escolarização e diminuição de ocorrência de hipossegmentação. 
No que se refere aos resultados quantitativos das estruturas envolvidas, destacamos: (i) a junção entre clítico e palavra prosódica é a característica do maior conjunto de dados; (ii) a junção entre dois clíticos é a segunda estrutura mais recorrente, predominando a hipossegmentação “oque”; (iii) a junção entre duas palavras prosódicas, a terceira mais recorrente no material analisado, decorre da mobilização de várias características linguísticas, como a hipossegmentação de estruturas perifrásticas que constituem exemplos de mudança linguística em curso; (iv) a junção de palavra prosódica e clítico é a estrutura menos recorrente, sendo a maioria dos dados decorrente da combinação de palavras com a ausência do hífen, que levou à formação de possíveis palavras prosódicas; (v) a junção envolvendo mais de uma palavra prosódica e/ou clítico ocorreu apenas em três dados, que abrangem estruturas como a frase entoacional e o enunciado fonológico. No que se refere aos resultados qualitativos, a partir de análise de cunho linguístico-textual, os casos em que há a flutuação entre convencional e não-convencional: (i) se distinguem entre si pela configuração prosódica, gramatical e linguística-textual; (ii) são indícios mais explícitos da inserção dos alunos em práticas orais/faladas e letradas/escritas; (ii) são marcas do complexo processo que envolve o Outro como instância representativa da linguagem (e da escrita em particular), a escrita na complexidade de seu funcionamento (heterogeneamente constituída) e o aluno enquanto sujeito escrevente. A principal contribuição desta dissertação está em: (i) fazer análise quantitativa e qualitativa de hipossegmentações no EF II; e (ii) evidenciar a complexidade que subjaz às relações entre prosódia e escrita por meio da segmentação não-convencional de palavras. 
/ This work deals with the longitudinal characterization of writing by Junior High School students (EF II in Brazil) with respect to the so-called hyposegmentations of written words, in which a graphic boundary is unconventionally absent (e.g. "puraqui" for "por aqui" in Portuguese, "around here" in English). Our objectives are: (i) to identify, through quantitative analysis, whether or not there is a correlation between the number of hyposegmentations and years of schooling; and (ii) to describe the hyposegmentations qualitatively with respect to prosodic aspects of spoken utterances and graphic aspects of the orthographic convention itself. To reach these objectives, we draw, on the one hand, on the theoretical apparatus of prosodic phonology, a relation-based model that posits seven prosodic constituents structuring the utterances of the world's languages; and, on the other hand, on an approach to writing as heterogeneously constituted. From the results of the quantitative analysis, we highlight that there is a correlation between an increase in years of schooling and a decrease in the occurrence of hyposegmentations. Regarding the quantitative results for the structures involved, we highlight: (i) the junction between clitic and prosodic word characterizes the largest set of data; (ii) the junction between two clitics is the second most recurrent structure, with the hyposegmentation "oque" predominating; (iii) the junction between two prosodic words, the third most recurrent in the material analyzed, derives from the mobilization of several linguistic characteristics, such as the hyposegmentation of periphrastic structures that are examples of linguistic change in progress; (iv) the junction of prosodic word and clitic is the least recurrent structure, with most of the data resulting from the combination of words without the hyphen, which led to the formation of possible prosodic words; (v) the junction involving more than one prosodic word and/or clitic occurred in only three cases, covering structures such as the intonational phrase and the phonological utterance. 
Regarding the qualitative results, based on a linguistic-textual analysis, we highlight that the cases in which there is fluctuation between conventional and unconventional segmentation: (i) are distinguished from one another by their prosodic, grammatical and linguistic-textual configuration; (ii) are more explicit indications of the students' insertion into oral/spoken and literate/written practices; (iii) are marks of the complex process involving the Other as a representative instance of language (and of writing in particular), writing in the complexity of its functioning (heterogeneously constituted) and the student as a writing subject. The main contributions of this work are: (i) to provide a quantitative and qualitative analysis of hyposegmentations in Junior High School (EF II); and (ii) to show the complexity that underlies the relations between prosody and writing through the unconventional segmentation of words. / FAPESP (Processo Nº 2015/26763-6)
23

支援數位人文研究之文本自動標註系統發展與使用評估研究 / Development and evaluation of an automatic text annotation system for supporting digital humanities research

劉鎮宇, Liu, Chen Yu Unknown Date
在傳統的人文研究中,人文學者大多以如古籍珍善本、歷史文獻等紙本出版形式之文本為主要研究文本型式,但是隨著資訊社會的來臨,許多研究機構陸續將這些紙本資料進行數位化並建置數位典藏資料庫,對人文研究環境與知識取得管道帶來巨大的改變,基於數位閱讀之文本研究型式也成為必然的發展趨勢。 因此,本研究發展支援數位人文研究之「文本自動標註系統」,藉由Linked Data的概念匯集來自不同資料庫的資源,並加以整合後,替文本進行自動註解,讓使用者在解讀文本時能夠即時參照其他資料庫的資源,並提供友善的具文本標註之閱讀介面,以利於人文學者透過閱讀進行資料的解讀。本研究以實驗研究法比較本研究所發展之「文本自動標註系統」與「MARKUS文本半自動標註系統」在支援人文學者進行文本資料解讀之閱讀成效與科技接受度是否具有顯著差異,並輔以半結構式深度訪談了解人文學者對於本研究發展之「文本自動標註系統」的看法及感受,也進一步分析「文本自動標註系統」閱讀成效、科技接受度及使用者行為歷程之間是否具有關聯性。 實驗結果發現,採用本研究發展之文本自動標註系統的閱讀成效高於MARKUS文本半自動標註系統,但未達顯著差異;而科技接受度分析結果則顯示文本自動標註系統之科技接受度顯著優於MARKUS文本半自動標註系統。另外,從訪談結果歸納得知,文本自動標註系統閱讀介面簡潔明瞭,比MARKUS文本半自動標註系統更適合閱讀,而閱讀介面是否易於使用與是否有用,是影響人文學者能否接受採用系統輔助數位人文研究的重要因素。此外,在兩個系統類似功能比較分析後也發現,文本自動標註系統在查詢詞彙功能、連結到來源網站功能及新增標註功能都比MARKUS文本半自動標註系統更為直覺易用。另外人文學者普遍認為斷句功能比自動斷詞功能更重要,鏈結來源資料庫則以萌典最有幫助。最後,採用文本自動標註系統之閱讀成效與使用者行為歷程之間無顯著關聯性。 / In traditional humanities research, most humanities scholars have worked mainly with texts published on paper, such as rare ancient books and historical documents. With the arrival of the information society, however, many research institutes have gradually digitized these paper materials and built digital archive databases, greatly changing the humanities research environment and the channels for acquiring knowledge. Text research based on digital reading has therefore become an inevitable trend. For this reason, this study develops an “automatic text annotation system” to support digital humanities research. Using the concept of Linked Data, the system gathers and integrates resources from different databases and annotates texts automatically, so that users can consult resources from other databases in real time while interpreting a text. It also provides a user-friendly reading interface with text annotations to help humanities scholars interpret material through reading. 
Using an experimental design, the “automatic text annotation system” developed in this study is compared with the “MARKUS semi-automatic text annotation system” to determine whether the two differ significantly in reading achievement and technology acceptance when supporting humanities scholars in interpreting text data. Semi-structured in-depth interviews were also conducted to understand humanities scholars’ opinions and perceptions of the “automatic text annotation system” developed in this study, and the correlations among reading achievement, technology acceptance, and user behavior processes with the “automatic text annotation system” were analyzed. The experimental findings show that reading achievement with the automatic text annotation system developed in this study is higher than with the MARKUS semi-automatic text annotation system, but the difference is not statistically significant. The technology acceptance analysis shows significantly better technology acceptance for the automatic text annotation system than for the MARKUS semi-automatic text annotation system. According to the interviews, the reading interface of the automatic text annotation system is simple and clear, making it more suitable for reading than the MARKUS semi-automatic text annotation system. Whether the reading interface is easy to use and useful is a key factor in whether humanities scholars accept the system as an aid to digital humanities research. A comparison of similar functions in the two systems also shows that the vocabulary lookup, link-to-source-website, and add-annotation functions of the automatic text annotation system are more intuitive and easier to use than those of the MARKUS semi-automatic text annotation system. In addition, humanities scholars generally consider the sentence segmentation function more important than the automatic word segmentation function, and among the linked source databases they find Moedict the most helpful. 
Finally, there is no significant correlation between reading achievement and user behavior processes with the automatic text annotation system.
24

Acquisition de relations phonologiques non-adjacentes : de la perception de la parole à l’acquisition lexicale / Acquisition of non-adjacent phonological dependencies : From speech perception to lexical acquisition

González Gómez, Nayeli 01 August 2012
Les langues ont de nombreux types de dépendances, certaines concernant des éléments adjacents et d'autres concernant des éléments non adjacents. Au cours des dernières décennies, de nombreuses études ont montré comment les capacités précoces générales des enfants pour traiter le langage se transforment en capacités spécialisées pour la langue qu'ils acquièrent. Ces études ont montré que pendant la deuxième moitié de leur première année de vie, les enfants deviennent sensibles aux propriétés prosodiques, phonétiques et phonotactiques de leur langue maternelle concernant les éléments adjacents. Cependant, aucune étude n'avait mis en évidence la sensibilité des enfants à des dépendances phonologiques non-adjacentes, qui sont un élément clé dans les langues humaines. Par conséquent, la présente thèse a examiné si les enfants sont capables de détecter, d'apprendre et d’utiliser des dépendances phonotactiques non-adjacentes. Le biais Labial-Coronal, correspondant à la prévalence des structures commençant par une consonne labiale suivie d'une consonne coronale (LC, comme bateau), par rapport au pattern inverse Coronal-Labial (CL, comme tabac), a été utilisé pour explorer la sensibilité des nourrissons aux dépendances phonologiques non-adjacentes. Nos résultats établissent qu’à 10 mois les enfants de familles francophones sont sensibles aux dépendances phonologiques non-adjacentes (partie expérimentale 1.1). De plus, nous avons exploré le niveau auquel s’effectuent ces acquisitions. En effet, des analyses de fréquence sur le lexique du français ont montré que le biais LC est clairement présent pour les séquences de plosives et de nasales, mais pas pour les fricatives. Les résultats d'une série d'expériences suggèrent que le pattern de préférences des enfants n’est pas guidé par l'ensemble des fréquences cumulées dans le lexique, ou des fréquences de paires individuelles, mais par des classes de consonnes définies par le mode d'articulation (partie expérimentale 1.2). 
En outre, nous avons cherché à savoir si l’émergence du biais LC était liés à des contraintes de type maturationnel ou bien par l'exposition à l’input linguistique. Pour cela, nous avons tout d’abord testé l'émergence du biais LC dans une population présentant des différences de maturation, à savoir des enfants nés prématurément (± 3 mois avant terme), puis comparé leurs performances à un groupe d‘enfants nés à terme appariés en âge de maturation, et à un groupe de nourrissons nés à terme appariés en âge chronologique. Nos résultats indiquent qu’à 10 mois les enfants prématurés ont un pattern qui ressemble plus au pattern des enfants nés à terme âgés de 10 mois (même âge d'écoute) qu’à celui des enfants nés à terme âgés de 7 mois (même âge de maturation ; partie expérimentale 1.3). Deuxièmement, nous avons testé une population apprenant une langue où le biais LC n’est pas aussi clairement présent dans le lexique : le japonais. Les résultats de cette série d'expériences n’a montré aucune préférence pour les structures LC ou CL chez les enfants japonais (partie expérimentale 1.4). Pris ensemble, ces résultats suggèrent que le biais LC peut être attribué à l'exposition à l'input linguistique et pas seulement à des contraintes maturationnelles. Enfin, nous avons exploré si, et quand, les acquisitions phonologiques apprises au cours de la première année de la vie influencent le début du développement lexical au niveau de la segmentation et de l’apprentissage des mots. Nos résultats montrent que les mots avec la structure phonotactique LC, plus fréquente, sont segmentés (partie expérimentale 2.1) et appris (partie expérimentale 2.2) à un âge plus précoce que les mots avec la structure phonotactique CL moins fréquente. Ces résultats suggèrent que les connaissances phonotactiques préalablement acquises peuvent influencer l'acquisition lexicale, même quand il s'agit d'une dépendance non-adjacente. 
/ Languages instantiate many different kinds of dependencies, some holding between adjacent elements and others holding between non-adjacent elements. During the past decades, many studies have shown how infants' initial language-general abilities change into abilities that are attuned to the language they are acquiring. These studies have shown that during the second half of their first year of life, infants become sensitive to the prosodic, phonetic and phonotactic properties of their mother tongue holding between adjacent elements. However, until now, no study had established sensitivity to non-adjacent phonological dependencies, which are a key feature of human languages. Therefore, the present dissertation investigates whether infants are able to detect, learn and use non-adjacent phonotactic dependencies. The Labial-Coronal bias, corresponding to the prevalence of structures starting with a labial consonant followed by a coronal consonant (LC, e.g. bat) over the opposite pattern (CL, e.g. tab), was used to explore infants' sensitivity to non-adjacent phonological dependencies. Our results establish that by 10 months of age French-learning infants are sensitive to non-adjacent phonological dependencies (experimental part 1.1). In addition, we explored the level of generalization of these acquisitions. Frequency analyses on the French lexicon showed that the LC bias is clearly present for plosive and nasal sequences but not for fricatives. The results of a series of experiments suggest that infants' preference patterns are not guided by overall cumulative frequencies in the lexicon, or by frequencies of individual pairs, but by consonant classes defined by manner of articulation (experimental part 1.2). Furthermore, we explored whether the LC bias is triggered by maturational constraints or by exposure to the input. 
To do so, we first tested the emergence of the LC bias in a population with maturational differences, namely infants born prematurely (± 3 months before term), and compared their performance to a group of full-term infants matched in maturational age and a group of full-term infants matched in chronological age. Our results indicate that the pattern of the preterm 10-month-olds resembles that of the full-term 10-month-olds (same listening age) much more than that of the full-term 7-month-olds (same maturational age; experimental part 1.3). Second, we tested a population learning a language in which the LC bias is not as clearly present in the lexicon, namely Japanese-learning infants. The results of this set of experiments failed to show any preference for either LC or CL structures in Japanese-learning infants (experimental part 1.4). Taken together, these results suggest that the LC bias can be attributed to exposure to the linguistic input and not only to maturational constraints. Finally, we explored whether, and if so when, phonological acquisitions during the first year of life constrain early lexical development at the level of word segmentation and word learning. Our results show that words with the more frequent LC phonotactic structure are segmented (experimental part 2.1) and learned (experimental part 2.2) at an earlier age than words with the less frequent CL phonotactic structure. These results suggest that prior phonotactic knowledge can constrain later lexical acquisition even when it involves a non-adjacent dependency.
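
The LC and CL patterns defined in this abstract can be made concrete with a small Python sketch. It is only an illustration under strong assumptions: letters stand in for phonemes, the consonant sets are a simplified, hypothetical inventory, and the function name `place_pattern` is not taken from the thesis.

```python
# Simplified, assumed letter inventories (orthography used as a stand-in
# for phonemes; a real analysis would work on phonetic transcriptions).
LABIALS = set("pbmfv")
CORONALS = set("tdnszl")
VOWELS = set("aeiouy")


def place_pattern(word):
    """Classify a word by the place of its first two consonants.

    Returns "LC" for labial then coronal, "CL" for coronal then labial,
    and None when the word fits neither pattern.
    """
    consonants = [c for c in word.lower() if c.isalpha() and c not in VOWELS]
    if len(consonants) < 2:
        return None
    first, second = consonants[0], consonants[1]
    if first in LABIALS and second in CORONALS:
        return "LC"
    if first in CORONALS and second in LABIALS:
        return "CL"
    return None


if __name__ == "__main__":
    for w in ["bat", "tab", "bateau", "tabac"]:
        print(w, place_pattern(w))
```

The two consonants are separated by a vowel in each example word, which is what makes the dependency non-adjacent.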
25

Segmentation Strategies for Scene Word Images

Anil Prasad, M N January 2014
No description available.
