81

Evaluation and development of conceptual document similarity metrics with content-based recommender applications

Gouws, Stephan December 2010
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: The World Wide Web brought with it an unprecedented level of information overload. Computers are very effective at processing and clustering numerical and binary data; the conceptual clustering of natural-language data, however, is considerably harder to automate. Most past approaches rely on simple keyword matching or probabilistic methods to measure semantic relatedness, but these do not always accurately capture conceptual relatedness as judged by humans. In this thesis we propose and evaluate novel Spreading Activation (SA) techniques for computing semantic relatedness, modelling the article hyperlink structure of Wikipedia as an associative network for knowledge representation. The SA technique is adapted, and several problems are addressed, so that it can operate over the Wikipedia hyperlink structure. Inter-concept and inter-document similarity metrics are developed which use SA to compute the conceptual similarity between two concepts and between two natural-language documents. We evaluate these approaches on two document similarity datasets and achieve results which compare favourably with the state of the art. Furthermore, document preprocessing techniques are evaluated in terms of the performance gain they can bring to the well-known cosine document similarity metric and to the Normalised Compression Distance (NCD) metric. Results indicate that a near two-fold increase in accuracy can be achieved for NCD by applying simple preprocessing techniques; nonetheless, the cosine similarity metric still significantly outperforms NCD. Finally, we show that using our Wikipedia-based method to augment the cosine vector space model gives better results than either method in isolation. Combining the two methods leads to a Pearson correlation of 0.72 on the Lee (2005) document similarity dataset, which matches the reported result for the state-of-the-art Explicit Semantic Analysis (ESA) technique while requiring less than 10% of the Wikipedia database needed by ESA. As a use case for document similarity techniques, a purely content-based news-article recommender system is designed and implemented for a large online media company. This system is used to gather additional human-generated relevance ratings, which we use to evaluate the performance of three state-of-the-art document similarity metrics for providing content-based document recommendations.
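A minimal illustrative sketch (not the thesis's implementation) of the two baseline metrics this abstract compares: the cosine document similarity metric over bag-of-words vectors and the Normalised Compression Distance, here using zlib as the compressor. The tokenisation and all names are assumptions made purely for illustration.

import math
import zlib
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    # Bag-of-words cosine similarity over lower-cased whitespace tokens.
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ncd(doc_a, doc_b):
    # Normalised Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    # where C(s) is the compressed length of s.
    c = lambda s: len(zlib.compress(s.encode("utf-8")))
    cx, cy, cxy = c(doc_a), c(doc_b), c(doc_a + doc_b)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Higher cosine similarity and lower NCD both indicate more closely related texts.
print(cosine_similarity("wikipedia is a free encyclopedia", "wikipedia is an online encyclopedia"))
print(ncd("wikipedia is a free encyclopedia", "wikipedia is an online encyclopedia"))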
82

Wikipedia som källa? : Är det accepterat vid studier i ämnet medie- och kommunikationsvetenskap vid Uppsala universitet? / Wikipedia as a source? Is it accepted in the studies of Media and Communications at Uppsala University?

Salomon, Susanna January 2008
Abstract
Title: Wikipedia as a source? Is it accepted in the studies of Media and Communications at Uppsala University? (Wikipedia som källa? Är det accepterat vid studier i ämnet medie- och kommunikationsvetenskap vid Uppsala universitet?)
Number of pages: 38 (39 including enclosures)
Author: Susanna Salomon
Tutor: Else Nygren
Course: Media and Communication Studies C
Period: Autumn 2007
University: Division of Media and Communication, Department of Information Science, Uppsala University
Purpose/Aim: The purpose is to study whether or not the Internet encyclopedia Wikipedia is an accepted source when a student writes a paper in Media and Communications at Uppsala University.
Material/Method: Qualitative research method based on interviews with teachers and on literature.
Main results: The study shows that there is no common view within the faculty on whether or not Wikipedia can be used as a source when writing a paper in Media and Communications. Some accept it, others do not. The results show that the teachers of this subject at Uppsala University have not yet decided how to adjust to the large new information bank which is Wikipedia.
Keywords: Wikipedia, Uppsala University, sources, reliability, objectivity, collective intelligence, the Internet
83

"Statistiskt sett har ju NE också fel" : en kvalitativ studie rörande gymnasiebibliotekariers uppfattningar och undervisning kring Wikipedia

Jensen, Malene, Törnqvist-Andersson, Caroline January 2010
The main purpose of this bachelor's thesis is to examine how Swedish upper secondary school librarians and related staff relate to the online encyclopaedia Wikipedia. The research is placed in the all-embracing context of information literacy and source criticism on the Internet. The study was carried out in the form of qualitative interviews and was based on two theoretical bases: firstly, the concept of cognitive authority as stipulated by Patrick Wilson, and secondly the idea that Wikipedia has a draw towards late modern epistemological assumptions. The latter theory was also related to the perceptions of knowledge among the library staff interviewed. According to the results, there seems to be a connection between a) the library staff's perceptions of knowledge, b) the library staff's actual knowledge of Wikipedia, c) the library staff's attitudes toward Wikipedia and d) the teaching about Wikipedia performed by the library staff. The results also suggest that the cognitive authority of the traditional encyclopaedia is strong, especially among those members of the library staff who are not very familiar with Wikipedia. Those positive toward Wikipedia possibly represent perceptions of knowledge with a draw towards the late modern. Finally, possible ways to improve information literacy and Wikipedia use among upper secondary school pupils are discussed.
84

What is the influence of genre during the perception of structured text for retrieval and search?

Clark, Malcolm John January 2014
This thesis presents an investigation into the high value of structured text (or form) in the context of genre within Information Retrieval. In particular, how are these structured texts perceived, and why are they not more heavily used within the Information Retrieval & Search communities? The main motivation is to show the features through which people can exploit genre within Information Search & Retrieval, in particular in categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) which reported pilot studies consisting of genre categorisation and information searching. Both studies, and other findings within the literature review, inspired the work contained in this thesis. Genre is notoriously hard to define, but the very useful framework of ‘Purpose and Form’ developed by Yates & Orlikowski (1992) was used to design two user studies for the research reported in the thesis. The two studies consisted of, first, a categorisation task (e-mails) and, second, a set of six ‘simulated situations’ in Wikipedia, both of which collected quantitative data from eye-tracking experiments as well as qualitative user data. The results of both studies showed the extent to which participants utilised the form features of the stimuli presented: how these features were used, which ocular behaviours (skimming or scanning) and which actual features were used, and which were the most important. The main contributions of this thesis are, first, that the task-based user evaluations employing simulated search scenarios revealed ‘how’ and ‘why’ users make decisions while interacting with the textual features of structure and layout within a discourse community; and, secondly, that an extensive evaluation of the quantitative data revealed the features used by the participants in the user studies, the effects of the interpretation of genre in the search and categorisation process, and the perceptual processes used in the various communities. This will be of benefit for the redevelopment of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, the value of form, the perception of features, and the layout of genre using eye tracking in online communities such as Wikipedia.
85

Creating a Bilingual Dictionary using Wikipedia

Ivanova, Angelina January 2011
Title: Creating a Bilingual Dictionary using Wikipedia Author: Angelina Ivanova Department/Institute: Institute of Formal and Applied Linguistics (32-ÚFAL) Supervisor of the master thesis: RNDr. Daniel Zeman Ph.D. Abstract: Machine-readable dictionaries play an important role in computational linguistics research. They have gained popularity in fields such as machine translation and cross-language information extraction. In this thesis we investigate the quality and content of bilingual English-Russian dictionaries generated from the Wikipedia link structure. Wiki-dictionaries differ dramatically from traditional dictionaries: the recall of basic terminology against Mueller's dictionary was 7.42%. Machine translation experiments with the Wiki-dictionary incorporated into the training set resulted in a rather small but statistically significant drop in translation quality compared to the experiment without the Wiki-dictionary. We supposed that the main reason was the domain difference between the dictionary and the corpus, and obtained some evidence that, on a test set collected from Wikipedia articles, the model with the incorporated dictionary performed better. In this work we show how big the difference is between the dictionaries developed from the Wikipedia link structure and the traditional...
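The wiki-dictionary described in this abstract is extracted from Wikipedia's link structure; the rough sketch below only illustrates the underlying idea of harvesting English-Russian translation pairs from interlanguage links. It queries the live MediaWiki API rather than working from dumps as the thesis does, and the endpoint, parameter names and example titles are assumptions for illustration only.

import requests

API = "https://en.wikipedia.org/w/api.php"

def russian_title(english_title, target_lang="ru"):
    # Ask the MediaWiki API for the interlanguage link of one English article.
    params = {
        "action": "query",
        "titles": english_title,
        "prop": "langlinks",
        "lllang": target_lang,
        "format": "json",
        "formatversion": "2",
    }
    page = requests.get(API, params=params, timeout=10).json()["query"]["pages"][0]
    links = page.get("langlinks", [])
    return links[0]["title"] if links else None

# A tiny wiki-dictionary: English headwords mapped to Russian article titles.
wiki_dictionary = {t: russian_title(t) for t in ["Dictionary", "Machine translation"]}
print(wiki_dictionary)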
86

L'utilisation de ressources partagées dans l'apprentissage : un changement du rapport au savoir ? : Le cas de Wikipedia / The use of shared resources in learning: a change in the relationship to knowledge? The case of Wikipedia

Maneval, Béatrice 26 March 2013
This thesis begins against the backdrop of the meteoric rise of computing for everyone. The use of new technologies is spreading into every field, and education is of course affected by this trend. We therefore go back to the early stages of education, retracing the developments that have shaped it; this allows us to better understand how today's upheavals fit into our current system. We study in particular the Wikipedia encyclopedia, a genuine social phenomenon through its innovative approach. How is it approached in teaching? Our study attempts to shed light on this area. Is the encyclopedia used by teachers in their professional practice? Does this collaborative space occupy its rightful place in the world of education? These questions allow us to develop our plan of study, which leads, in conclusion, to the ways in which these new shared-resource spaces can be integrated into teaching.
87

Wikipédia, encyclopédie et site d'actualités : qualité de l'information et normes collaboratives d'un média en ligne / Wikipedia, encyclopaedia and news site: quality of information and collaborative standards for an online media outlet

Doutreix, Marie-Noëlle 07 December 2018
Approximations, false information, ideological bias: Wikipedia has had to answer many criticisms since its creation in 2001. Yet its contributors have never stopped developing tools, labels and indicators to help users find reliable information. This thesis examines the means deployed to strengthen the reliability of a collaborative online medium. The synthesis Wikipedia achieves between an encyclopaedic ambition and the coverage of current events invites us to view it as a media encyclopaedia: news is treated in real time, and the sources used for such topics are very often journalistic, so that Wikipedia takes part in the "circular circulation" of media information. The encyclopaedic genre of Wikipedia is questioned by looking for elements of filiation, such as hypertextuality, and by underlining the main points of epistemological divergence, such as the relationship to sources and the principle of a neutral point of view. The two corpora studied, of nearly three hundred articles each, make clear the place of current events in the uses of Wikipedia and in the practices of its contributors.
88

A wikification prediction model based on the combination of latent, dyadic and monadic features / Um modelo de previsão para Wikification baseado na combinação de atributos latentes, diádicos e monádicos

Ferreira, Raoni Simões 25 April 2016
Most reference information nowadays is found in repositories of semantically linked documents, created collaboratively and freely available on the web. Among the many problems faced by content providers in these repositories, one of the most important is Wikification, that is, the placement of links in articles. These links have to support user navigation and should provide a deeper semantic interpretation of the content. Wikification is a hard task, since the continuous growth of such repositories makes it increasingly demanding for editors; as a consequence, their focus is shifted away from content creation, which should be their main objective. This has motivated the design of automatic Wikification tools which, traditionally, address two distinct problems: (a) how to identify which words (or phrases) in an article should be selected as anchors, and (b) how to determine to which article the link associated with each anchor should point. Most methods in the literature that address these problems are based on machine learning approaches which attempt to capture, through statistical features, characteristics of the concepts and their associations. Although these strategies treat the repository as a graph of concepts, they normally take limited advantage of the topological structure of this graph, since they describe it only by means of human-engineered link statistics. Despite the effectiveness of these machine learning methods, better models could take fuller advantage of the topology by describing it with data-oriented approaches such as matrix factorization, as has been done successfully in other domains, such as movie recommendation. In this work we fill this gap, proposing a wikification prediction model that combines the strengths of traditional predictors based on statistical features with a latent component which models the concept graph topology by means of matrix factorization. Comparing our model with a state-of-the-art wikification method on a sample of Wikipedia articles, we obtained a gain of up to 13% in the F1 metric. We also provide a comprehensive analysis of the model's performance, showing the importance of the latent predictor component and of the attributes derived from the associations between concepts. The study also analyses the impact of ambiguous concepts, which allows us to conclude that the model is resilient to ambiguity even though it does not include any explicit disambiguation phase. Finally, we study the impact of selecting training samples from specific content-quality classes, information that is available in some repositories such as Wikipedia. We show empirically that the quality of the training samples affects precision and overlinking when training on random-quality samples is compared with training on high-quality samples.
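The model in this abstract combines hand-engineered (monadic and dyadic) statistical features with a latent matrix-factorization component; the toy sketch below only illustrates that combination idea. The rank, the weight alpha, the common-neighbour feature and the use of a truncated SVD as a stand-in for the factorization are all assumptions, not the thesis's trained model.

import numpy as np

# Toy concept graph: A[i, j] = 1 if article i links to article j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

# Latent component: a rank-2 reconstruction of the link matrix via truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
latent = (U[:, :k] * s[:k]) @ Vt[:k, :]

# One simple dyadic feature: number of common out-neighbours of two concepts.
common = A @ A.T

def link_score(i, j, alpha=0.5):
    # Weighted combination of the latent score and the statistical feature.
    return alpha * latent[i, j] + (1 - alpha) * common[i, j]

# Rank candidate link targets for concept 0 by descending score.
print(sorted(range(A.shape[0]), key=lambda j: -link_score(0, j)))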
89

Can web indicators be used to estimate the citation impact of conference papers in engineering?

Aduku, Kuku J. January 2019
Although citation counts are widely used to support research evaluation, they can only reflect academic impacts, whereas research can also be useful outside academia. There is therefore a need for alternative indicators and for empirical studies to evaluate them. Whilst many previous studies have investigated alternative indicators for journal articles and books, this thesis explores the importance and suitability of four web indicators for conference papers. These are readership counts from the online reference manager Mendeley and citation counts from Google Patents, Wikipedia and Google Books. To help evaluate these indicators for conference papers, correlations with Scopus citations were evaluated for each alternative indicator and compared with the corresponding correlations between alternative indicators and citation counts for journal articles. Four subject areas that value conferences were chosen for the analysis: Computer Science Applications; Computer Software Engineering; Building & Construction Engineering; and Industrial & Manufacturing Engineering. There were moderate correlations between Mendeley readership counts and Scopus citation counts for both journal articles and conference papers in Computer Science Applications and Computer Software Engineering. For conference papers in Building & Construction Engineering and Industrial & Manufacturing Engineering, the correlations between Mendeley readers and citation counts were much lower than for journal articles. Thus, in fields where conferences are important, Mendeley readership counts are reasonable impact indicators for conference papers, although they are better impact indicators for journal articles. Google Patent citations had low positive correlations with citation counts for both conference papers and journal articles in Software Engineering and Computer Science Applications. There were negative correlations for both conference papers and journal articles in Industrial & Manufacturing Engineering. However, conference papers in Building & Construction Engineering attracted no Google Patent citations. This suggests that there are disciplinary differences but little overall value for Google Patent citations as impact indicators in engineering fields valuing conferences. Wikipedia citations had correlations with Scopus citations that were statistically significantly positive only in Computer Science Applications, whereas the correlations were not statistically significantly different from zero in Building & Construction Engineering, Industrial & Manufacturing Engineering and Software Engineering. Conference papers were less likely to be cited in Wikipedia than journal articles in all fields, although the difference was minor in Software Engineering. Thus, Wikipedia citations seem to have little value in engineering fields valuing conferences. Google Books citations had statistically significant positive correlations with Scopus-indexed citations for conference papers in all fields except Building & Construction Engineering, where the correlations were not statistically significantly different from zero. Google Books citations seemed to be more valuable impact indicators in Computer Science Applications and Software Engineering, where the correlations were moderate, than in Industrial & Manufacturing Engineering, where the correlations were low. This means that Google Books citations are valuable indicators for conference papers in engineering fields valuing conferences.
Although evidence from correlation tests alone is insufficient to judge the value of alternative indicators, the results suggest that Mendeley readers and Google Books citations may be useful for both journal articles and conference papers in engineering fields that value conferences, but not Wikipedia citations or Google Patent citations.
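A small hypothetical sketch of the kind of correlation test this abstract relies on, comparing one alternative indicator against Scopus citation counts. The data are invented, and the choice of Spearman's rank correlation is an assumption (count data of this kind are typically skewed); the thesis's exact statistical procedure is not restated here.

from scipy.stats import spearmanr

# Invented example data: one value per paper.
scopus_citations = [0, 3, 12, 7, 1, 25, 4, 0, 9, 2]
mendeley_readers = [1, 5, 20, 9, 0, 31, 6, 2, 14, 3]

rho, p_value = spearmanr(scopus_citations, mendeley_readers)
print("Spearman rho = %.2f, p = %.3f" % (rho, p_value))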
90

Media om Wikipedia : en diskursanalys av nationella facktidskrifter / Media on Wikipedia : a discourse analysis of national professional journals

Almroth, Bodil, Tenglin, Sofia January 2011
The aim of this Master's thesis is to examine how librarians, information specialists and teachers discuss Wikipedia within national (Swedish) professional journals. Questions asked in the study are: How is Wikipedia perceived in the professional journals? What different positions do writers and commentators take in relation to Wikipedia? 133 articles from 31 different professional journals, published between 2001 and the middle of 2010, were analysed. The theory and method used is Laclau and Mouffe's discourse theory, from which we created our own model with six steps. The results show that there are several recurring discussions, for example about the credibility of user-generated content. Another example is the discussion about the use of Wikipedia within education and school contexts and whether the encyclopedia should be seen as sufficiently credible to use in these contexts. From these discussions we have identified three discourses that we have chosen to call: the knowledge-liberal discourse, the knowledge-conservative discourse and the pedagogical discourse. / Program: Bibliotekarie
