131

Knowledge dissemination from and knowledge production of a public review digital information source : A snapshot of visitors and active users’ activity in two language versions of Wikipedia

HADJIGEORGIOU, ELLI January 2022 (has links)
Over the last two decades, Wikipedia's multiple language versions have become publicly available, editable knowledge sources that support both non-typical knowledge dissemination and knowledge production, thanks to visitors' viewership and to the voluntary contributions of active users (Wikipedians). This dissertation investigates recent viewership and editing in the Greek (WiG) and Bulgarian (WiB) language versions of Wikipedia, by visitors and by active users/Wikipedians (G-Wiks and B-Wiks) respectively, with respect to knowledge production (including reproduction) in the broader spectrum of epistemology. The data and metadata analysed in the dissertation were released as digital footprints in Wikimedia's common area and by a third-party source (WikiShark). A recent snapshot of each language version's activity, examined both individually and comparatively, showed that visitors follow a distinct temporal pattern in both language versions and are relatively numerous compared to the active users/Wikipedians of the same language. G-Wiks and B-Wiks, functioning as digitally enabled social networks (DESN), appear to engage more in editing than in content creation. Further content analysis of G-Wiks' discussion comments suggests that the editing process is not free of tension or toxicity as contributors attempt to arrive, after discussion, at meaningful Neutral Point of View (NPOV) content for WiG. Moreover, through contemporary activities such as webinars, Editathons, and contests, the G-Wiks DESN appears to open the floor to young people and to educate them to become new members of WiG, aiming at further intentional, purposeful, and useful collaborative knowledge production in the digital information space of this so-called Information Era.
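The dissertation's snapshot rests on publicly released viewership footprints; as a minimal sketch of how such per-project pageview counts can be retrieved today, the snippet below calls the public Wikimedia Pageviews REST API. The date range, access/agent filters, and the project codes el.wikipedia and bg.wikipedia are illustrative assumptions; the dissertation's own data came from Wikimedia's common area and WikiShark.

```python
# Minimal sketch of pulling daily pageview totals for the Greek (el) and
# Bulgarian (bg) editions of Wikipedia from the public Wikimedia REST API.
# The endpoint layout follows the documented Pageviews API; the date range
# and the "all-access/user" filters are illustrative choices.

import requests

API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/"
       "{project}/all-access/user/daily/{start}/{end}")

def daily_views(project: str, start: str, end: str):
    """Return a list of (date, views) tuples for one Wikipedia project."""
    url = API.format(project=project, start=start, end=end)
    headers = {"User-Agent": "wiki-viewership-snapshot-sketch/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return [(item["timestamp"][:8], item["views"])
            for item in response.json()["items"]]

if __name__ == "__main__":
    for project in ("el.wikipedia", "bg.wikipedia"):
        series = daily_views(project, "2022010100", "2022013100")
        total = sum(views for _, views in series)
        print(f"{project}: {total:,} user pageviews over {len(series)} days")
```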
132

[en] NAMED ENTITY RECOGNITION FOR PORTUGUESE / [pt] RECONHECIMENTO DE ENTIDADES MENCIONADAS PARA O PORTUGUÊS

DANIEL SPECHT SILVA MENEZES 13 December 2018 (has links)
[en] The production of and access to huge amounts of data is a pervasive feature of the Information Age. The volume of available data is unprecedented in human history and is constantly expanding. One opportunity that emerges in this context is the development of applications capable of structuring the knowledge contained in these data. Natural Language Processing (NLP) fits in this context, aiming to extract structured information efficiently from textual sources. A fundamental step towards this goal is the task of Named Entity Recognition (NER), which delimits and categorizes mentions of entities in a text. The development of systems for NLP tasks must be accompanied by datasets that express human judgement about the grammatical structures of interest, so that system output can be compared with human discernment. Such datasets are a scarce resource whose construction is costly in terms of human supervision. Recently, the NER task has been approached successfully with artificial neural network models, which need annotated datasets for both training and evaluation. This work proposes the automated construction of a large dataset for Portuguese NER, minimizing the need for human intervention, using public data sources structured according to Semantic Web principles, namely DBpedia and Wikipédia. A methodology for building the corpus was developed, and experiments were carried out on it using the neural network architectures with the best currently reported results. Many experimental setups were evaluated, exploring several hyperparameter values and proposing architectures specifically focused on incorporating different data sources for training.
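The automated annotation described above projects entity information from structured sources onto text. As a rough, self-contained illustration of that idea (not the thesis's actual pipeline), the sketch below tags a sentence with IOB labels from a toy surface-form-to-type gazetteer standing in for DBpedia types attached to Wikipedia anchor texts; the gazetteer entries and example sentence are invented.

```python
# Sketch of gazetteer-based NER annotation: project entity types (a toy mapping
# standing in for DBpedia types of Wikipedia anchors) onto tokens as IOB tags.
# The mapping and sentence are illustrative assumptions, not data from the thesis.

from typing import Dict, List, Tuple

GAZETTEER: Dict[str, str] = {
    "Rio de Janeiro": "LOC",
    "PUC-Rio": "ORG",
    "Machado de Assis": "PER",
}

def iob_tag(tokens: List[str], gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Greedy longest-match projection of gazetteer entries onto a token list."""
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, match_type = 0, None
        # Try the longest span starting at position i that matches a gazetteer entry.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in gazetteer:
                match_len, match_type = j - i, gazetteer[span]
                break
        if match_type is None:
            tagged.append((tokens[i], "O"))
            i += 1
        else:
            tagged.append((tokens[i], f"B-{match_type}"))
            tagged.extend((tok, f"I-{match_type}") for tok in tokens[i + 1:i + match_len])
            i += match_len
    return tagged

if __name__ == "__main__":
    sentence = "Machado de Assis nasceu no Rio de Janeiro".split()
    for token, tag in iob_tag(sentence, GAZETTEER):
        print(f"{token}\t{tag}")
```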
133

Perspectivas e metodologias de pesquisa da Comunicação Social no contexto da internet com o Big Data e da especialização Data Scientist / Perspectives and research methodologies of Social Communication in the context of the internet, with Big Data and the Data Scientist specialization

Gonçalves, Leandro Tavares 09 September 2014 (has links)
The work analyzes Social Communication in the context of the internet and outlines new research methodologies for the field, aimed at filtering meaning from the information flows of social networks, news media, or any other device that allows the storage and retrieval of structured and unstructured information. Reflecting on the paths along which these information flows develop, and especially on the volume produced, the project maps the fields of meaning that this relationship takes on in research theory and practice. The overall aim is to situate Social Communication within the changing and dynamic reality of the internet environment and to draw parallels with applications already successful in other fields. Using the case study method, three cases were analyzed under two conceptual keys, Web Sphere Analysis and Web Science, contrasting information systems in their discursive and structural aspects. The study thus observes what Social Communication gains, from these perspectives, in the way it views its objects of study in the environment of internet networks. The results show that seeking new kinds of learning is a challenge for the Social Communication researcher, but that the feedback of information in the collaborative environment of the internet is fertile ground for research: data modelling gains an analytical corpus when the set of tools promoted and driven by technology makes it possible to isolate content and deepen the analysis of meanings and their relationships.
134

Development of a semantic data collection tool: The Wikidata Project as a step towards the semantic web.

Ubah, Ifeanyichukwu January 2013 (has links)
The World Wide Web contains a vast amount of information. This makes it a very useful part of our everyday activities, but the information it contains forms an exponentially growing repository of semantically unstructured data. The semantic web movement involves evolving the existing World Wide Web so that computers can make sense of and understand the data they process, and consequently increase their processing capabilities. Over the past decade a number of new projects implementing semantic web technology have been developed, albeit still in their infancy. These projects are based on semantic data models, and one such project is the Wikidata project. The Wikidata project aims to provide a more semantic platform for editing and sharing data throughout the Wikipedia and Wikimedia communities. This project studies how the Wikidata project facilitates such a semantic platform for the Wikimedia communities and includes the development of an application utilizing the semantic capabilities of Wikidata. The objective is to develop an application capable of retrieving and presenting statistical data and of making missing or invalid data on Wikidata detectable. The result is an application currently aimed at researchers and students who require a convenient tool for statistical data collection and data mining projects. Usability and performance tests of the application were also conducted, with the results presented in the report. Keywords: Semantic web, World Wide Web, Semantic data model, Wikidata, data mining.
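The thesis does not reproduce its tool's code here; as an illustration of the kind of semantic retrieval such an application can build on, the following sketch queries the public Wikidata SPARQL endpoint (the Wikidata Query Service, which postdates this 2013 project) for a small statistical dataset. The identifiers P463 ("member of"), Q458 (European Union), and P1082 ("population") are standard Wikidata IDs, but the query itself is an assumption about what such a tool might request, not taken from the thesis.

```python
# Minimal sketch (not the thesis's application) of retrieving statistical data
# from Wikidata via its public SPARQL endpoint. The specific query is illustrative.

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?countryLabel ?population WHERE {
  ?country wdt:P463 wd:Q458 ;        # member of the European Union
           wdt:P1082 ?population .   # population statement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
"""

def fetch_populations():
    """Return (country, population) pairs from the Wikidata Query Service."""
    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "semantic-data-collection-sketch/0.1"},
        timeout=30,
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return [(b["countryLabel"]["value"], int(float(b["population"]["value"])))
            for b in bindings]

if __name__ == "__main__":
    for country, population in fetch_populations():
        print(f"{country}: {population:,}")
```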
135

Ověřený zdroj poznání a Wikipedie / A Verified Knowledge Source and Wikipedia

Rožek, Štěpán January 2020 (has links)
The main subject and aim of this master's thesis is the development and practical testing of a collaboration scheme between the Czech Encyclopedia of Sociology (Sociologická encyklopedie) and the Czech Wikipedia, specifically its sociology-related content. The collaboration between these two information resources is based on the idea that the Encyclopedia of Sociology can serve as a quality source for writing and editing Wikipedia and that, vice versa, Wikipedia can potentially increase the Encyclopedia of Sociology's visibility and findability. The theoretical part of the thesis presents information about academic knowledge resources and open-collaboration knowledge resources and, last but not least, about current practices of collaboration between academia and Wikipedia. The research part presents the development of the collaboration scheme and its testing, using semi-structured interviews and a field experiment. In the first part, interviews are conducted with Czech sociology experts and with Wikipedians who are active in the field of sociology or related social sciences. The second part of the research then uses a field experiment to verify, to some extent, the information from the interviews and to elicit reactions from the Wikipedian...
136

Aus Ideen werden Projekte, werden Ergebnisse, werden Ideen

Heller, Lambert, Hoffmann, Tracy 11 April 2017 (has links)
How can librarians help shape the digital commons in the modern information society? What opportunities arise from cooperation with online communities such as Wikipedia? Who actually creates knowledge in a library? In search of answers to these questions, librarians and active Wikipedians met in December of last year for the first WikiLibrary Barcamp in Dresden. / Contains the contributions: "Bibliotheken im Netz – eine Insel im Ozean des freien Wissens?" and "Das Barcamp-Feeling – ein Erlebnisbericht"
137

Μελέτη και ανάλυση συμπεριφορών σε ιστοτόπους κοινωνικής δικτύωσης

Κλούβας, Δημήτριος 16 May 2014 (has links)
The subject of this thesis is the study of the behaviour of Wikipedia users when they edit the content of an article, in relation to their country of origin. The study begins with an overview of social networking websites, with an emphasis on Wikipedia, and a presentation of the research of the Dutch sociologist Geert Hofstede and his theory that five social dimensions can describe each country and its residents reasonably well. We then develop an application that retrieves and collects data from the article histories of eight Wikipedia articles in five different language editions of Wikipedia and classifies the edits according to their type. Finally, we attempt to draw some conclusions about the behaviour of users from the same country by relating the data collected for each language to the dimensions Geert Hofstede measured for the corresponding country.
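The thesis does not include its data-collection code; as a rough illustration of how such edit data can be pulled from different language editions, the sketch below queries the public MediaWiki API for an article's revision history. The chosen languages, article title, and the naive bot-versus-human classification rule are illustrative assumptions, not details from the thesis.

```python
# Sketch of collecting edit metadata from several Wikipedia language editions
# through the public MediaWiki API (action=query & prop=revisions).

import requests

LANGS = ["en", "el", "de", "fr", "es"]   # five example language editions
ARTICLE = "Wikipedia"                    # one example article title

def fetch_revisions(lang: str, title: str, limit: int = 50):
    """Return a list of revision metadata dicts for `title` on `lang`.wikipedia.org."""
    url = f"https://{lang}.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment|size",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    headers = {"User-Agent": "edit-behaviour-study-sketch/0.1"}
    data = requests.get(url, params=params, headers=headers, timeout=30).json()
    pages = data["query"]["pages"]
    return pages[0].get("revisions", []) if pages else []

def classify(revision: dict) -> str:
    """Very rough edit-type heuristic: distinguish bot edits from human ones."""
    return "bot" if "bot" in revision.get("user", "").lower() else "human"

if __name__ == "__main__":
    for lang in LANGS:
        revisions = fetch_revisions(lang, ARTICLE)
        counts = {"bot": 0, "human": 0}
        for rev in revisions:
            counts[classify(rev)] += 1
        print(f"{lang}: {len(revisions)} recent revisions, {counts}")
```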
138

Encyclopaedic question answering

Dornescu, Iustin January 2012 (has links)
Open-domain question answering (QA) is an established NLP task which enables users to search for specific pieces of information in large collections of texts. Instead of using keyword-based queries and a standard information retrieval engine, QA systems allow the use of natural language questions and return the exact answer (or a list of plausible answers) with supporting snippets of text. In the past decade, open-domain QA research has been dominated by evaluation fora such as TREC and CLEF, where shallow techniques relying on information redundancy have achieved very good performance. However, this performance is generally limited to simple factoid and definition questions because the answer is usually explicitly present in the document collection. Current approaches are much less successful in finding implicit answers and are difficult to adapt to more complex question types which are likely to be posed by users. In order to advance the field of QA, this thesis proposes a shift in focus from simple factoid questions to encyclopaedic questions: list questions composed of several constraints. These questions have more than one correct answer which usually cannot be extracted from one small snippet of text. To correctly interpret the question, systems need to combine classic knowledge-based approaches with advanced NLP techniques. To find and extract answers, systems need to aggregate atomic facts from heterogeneous sources as opposed to simply relying on keyword-based similarity. Encyclopaedic questions promote QA systems which use basic reasoning, making them more robust and easier to extend with new types of constraints and new types of questions. A novel semantic architecture is proposed which represents a paradigm shift in open-domain QA system design, using semantic concepts and knowledge representation instead of words and information retrieval. The architecture consists of two phases, analysis – responsible for interpreting questions and finding answers, and feedback – responsible for interacting with the user. This architecture provides the basis for EQUAL, a semantic QA system developed as part of the thesis, which uses Wikipedia as a source of world knowledge and employs simple forms of open-domain inference to answer encyclopaedic questions. EQUAL combines the output of a syntactic parser with semantic information from Wikipedia to analyse questions. To address natural language ambiguity, the system builds several formal interpretations containing the constraints specified by the user and addresses each interpretation in parallel. To find answers, the system then tests these constraints individually for each candidate answer, considering information from different documents and/or sources. The correctness of an answer is not proved using a logical formalism, instead a confidence-based measure is employed. This measure reflects the validation of constraints from raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge from Wikipedia and the Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia link graph to find relevant information. This method achieves good precision and allows only pages of a certain type to be considered, but is affected by the incompleteness of the existing markup targeted towards human readers. In order to address this, a semantic analysis module which disambiguates entities is developed to enrich Wikipedia articles with additional links to other pages.
The module increases recall, enabling the system to rely more on the link structure of Wikipedia than on word-based similarity between pages. It also allows authoritative information from different sources to be linked to the encyclopaedia, further enhancing the coverage of the system. The viability of the proposed approach was evaluated in an independent setting by participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL outperformed standard textual QA systems as well as semi-automatic approaches. Having established a feasible way forward for the design of open-domain QA systems, future work will attempt to further improve performance to take advantage of recent advances in information extraction and knowledge representation, as well as by experimenting with formal reasoning and inferencing capabilities.
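As a rough sketch of the constraint-testing idea described above (not EQUAL's actual implementation), the snippet below represents candidate answers as nodes in a tiny, hand-made link graph and keeps only those that satisfy every constraint. The toy graph, the constraint predicates, and the example question are invented for illustration.

```python
# Toy illustration of constraint-based answer validation over a link graph,
# in the spirit of (but not reproducing) the EQUAL system described above.

from typing import Callable, Dict, List, Set

# Hypothetical snapshot of outgoing links for a few Wikipedia-like pages.
LINK_GRAPH: Dict[str, Set[str]] = {
    "Danube": {"Germany", "Austria", "Hungary", "Black Sea"},
    "Rhine": {"Switzerland", "Germany", "Netherlands", "North Sea"},
    "Thames": {"England", "London", "North Sea"},
}

Constraint = Callable[[str, Set[str]], bool]

def links_to(target: str) -> Constraint:
    """Constraint: the candidate's page must link to `target`."""
    return lambda page, links: target in links

def answer(candidates: List[str], constraints: List[Constraint]) -> List[str]:
    """Keep candidates whose link sets satisfy every constraint individually."""
    results = []
    for page in candidates:
        links = LINK_GRAPH.get(page, set())
        if all(constraint(page, links) for constraint in constraints):
            results.append(page)
    return results

if __name__ == "__main__":
    # Example list question: "Which of these rivers flow through Germany into the North Sea?"
    question_constraints = [links_to("Germany"), links_to("North Sea")]
    print(answer(list(LINK_GRAPH), question_constraints))  # ['Rhine']
```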
139

Coreference resolution with and for Wikipedia

Ghaddar, Abbas 06 1900 (has links)
Wikipedia is a resource of choice exploited in many NLP applications, yet we are not aware of recent attempts to adapt coreference resolution to this resource, a preliminary step in understanding Wikipedia texts. The first part of this master's thesis consists of building an English coreference corpus in which all documents come from the English version of Wikipedia. We annotated each markable with its coreference type, mention type and the equivalent Freebase topic. The corpus places no restriction on the topics of the annotated documents, and documents of various sizes were considered, the aim being a balanced corpus; our annotation scheme follows the one of OntoNotes with a few disparities. In the second part, we propose a testbed for evaluating coreference systems on the simple task of identifying the mentions of the concept described in a Wikipedia page (e.g., the mentions of President Obama in the Wikipedia page dedicated to that person). We show that by exploiting the Wikipedia markup of a document (categories, redirects, infoboxes, etc.), as well as links to external knowledge bases such as Freebase (gender, number, types of relations with other entities, etc.), we can acquire useful information on entities that helps to classify mentions as coreferent or not.
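The abstract above describes classifying mention pairs as coreferent or not using attributes drawn from Wikipedia markup and Freebase. The sketch below shows one plausible shape for such features feeding a standard classifier; the feature names, the tiny training set, and the use of scikit-learn are assumptions for illustration, not the thesis's actual pipeline.

```python
# Sketch of mention-pair classification with knowledge-base-style features
# (head match, gender/number agreement, shared entity type). The features and
# the toy training examples are illustrative assumptions.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def pair_features(m1: dict, m2: dict) -> dict:
    """Features describing whether two mentions could corefer."""
    return {
        "same_head": m1["head"].lower() == m2["head"].lower(),
        "gender_match": m1["gender"] == m2["gender"],
        "number_match": m1["number"] == m2["number"],
        "same_kb_type": m1["kb_type"] == m2["kb_type"],
    }

# Toy mentions with attributes that could come from Wikipedia/Freebase markup.
obama = {"head": "Obama", "gender": "male", "number": "sg", "kb_type": "person"}
he = {"head": "he", "gender": "male", "number": "sg", "kb_type": "person"}
senate = {"head": "Senate", "gender": "neuter", "number": "sg", "kb_type": "organization"}
they = {"head": "they", "gender": "neuter", "number": "pl", "kb_type": "person"}

train_pairs = [(obama, he, 1), (obama, senate, 0), (he, they, 0), (senate, they, 0)]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform([pair_features(a, b) for a, b, _ in train_pairs])
y = [label for _, _, label in train_pairs]

model = LogisticRegression().fit(X, y)

# Score a new pair such as ("Obama", "they"); the number mismatch should push
# the classifier toward the non-coreferent class.
test = vectorizer.transform([pair_features(obama, they)])
print(model.predict(test), model.predict_proba(test))
```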
140

Digital Communication and Interactive Storytelling in Wikipedia : A Study of Greek Users’ Interaction and Experience

Mavridis, George January 2021 (has links)
Wikipedia is an online encyclopedia created by users worldwide who collaborate to distribute knowledge and edit information in real time. Although Wikipedia's accuracy has been a disputed and much-debated issue in many recent studies, little academic research has systematically addressed how users interact with the platform's storytelling tools or how they perceive and use Wikipedia's infrastructure, such as its interactive tools. This exploratory study fills this gap and sheds light on users' perceptions of Wikipedia's interactivity. Moreover, Wikipedia is approached as an online community in which collaboration, co-creation, and knowledge distribution play an important role, and it can therefore also be studied within the scope of the Digital Humanities. The theoretical framework of interactive storytelling and digital communication suggests that hyperlinks, page-preview buttons, and interactive catalogues in Wikipedia's environment help users absorb information and construct their own narratives. The findings of this thesis offer practical insights into how Wikipedia's interactive storytelling tools empower users to develop their stories and become editors/authors, and they provide a foundation for further academic research on user experience and on improving interactivity and digital communication in Wikipedia.
