Global ETD Search

561	Topic and link detection from multilingual news. January 2003 Huang Ruizhang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 110-114). / Abstracts in English and Chinese. / Abstract / Acknowledgement /
Chapter 1 --- Introduction / Chapter 1.1 --- The Definition of Topic and Event / Chapter 1.2 --- Event and Topic Discovery / Chapter 1.2.1 --- Problem Definition / Chapter 1.2.2 --- Characteristics of the Discovery Problems / Chapter 1.2.3 --- Our Contributions / Chapter 1.3 --- Story Link Detection / Chapter 1.3.1 --- Problem Definition / Chapter 1.3.2 --- Our Contributions / Chapter 1.4 --- Thesis Organization /
Chapter 2 --- Literature Review / Chapter 2.1 --- University of Massachusetts (UMass) / Chapter 2.1.1 --- Topic Detection Approach / Chapter 2.1.2 --- Story Link Detection Approach / Chapter 2.2 --- BBN Technologies / Chapter 2.3 --- IBM Research Center / Chapter 2.4 --- Carnegie Mellon University (CMU) / Chapter 2.4.1 --- Topic Detection Approach / Chapter 2.4.2 --- Story Link Detection Approach / Chapter 2.5 --- National Taiwan University (NTU) / Chapter 2.5.1 --- Topic Detection Approach / Chapter 2.5.2 --- Story Link Detection Approach /
Chapter 3 --- System Overview / Chapter 3.1 --- News Sources / Chapter 3.2 --- Story Preprocessing / Chapter 3.3 --- Information Extraction / Chapter 3.4 --- Gloss Translation / Chapter 3.5 --- Term Weight Calculation / Chapter 3.6 --- Event And Topic Discovery / Chapter 3.7 --- Story Link Detection /
Chapter 4 --- Event And Topic Discovery / Chapter 4.1 --- Overview of Event and Topic discovery / Chapter 4.2 --- Event Discovery Component / Chapter 4.2.1 --- Overview of Event Discovery Algorithm / Chapter 4.2.2 --- Similarity Calculation / Chapter 4.2.3 --- Story and Event Combination / Chapter 4.2.4 --- Event Discovery Output / Chapter 4.3 --- Topic Discovery Component / Chapter 4.3.1 --- Overview of Topic Discovery Algorithm / Chapter 4.3.2 --- Relevance Model / Chapter 4.3.3 --- Event and Topic Combination / Chapter 4.3.4 --- Topic Discovery Output /
Chapter 5 --- Event And Topic Discovery Experimental Results / Chapter 5.1 --- Testing Corpus / Chapter 5.2 --- Evaluation Methodology / Chapter 5.3 --- Experimental Results on Event Discovery / Chapter 5.3.1 --- Parameter Tuning / Chapter 5.3.2 --- Event Discovery Result / Chapter 5.4 --- Experimental Results on Topic Discovery / Chapter 5.4.1 --- Parameter Tuning / Chapter 5.4.2 --- Topic Discovery Results /
Chapter 6 --- Story Link Detection / Chapter 6.1 --- Topic Types / Chapter 6.2 --- Overview of Link Detection Component / Chapter 6.3 --- Automatic Topic Type Categorization / Chapter 6.3.1 --- Training Data Preparation / Chapter 6.3.2 --- Feature Selection / Chapter 6.3.3 --- Training and Tuning Categorization Model / Chapter 6.4 --- Link Detection Algorithm / Chapter 6.4.1 --- Story Component Weight / Chapter 6.4.2 --- Story Link Similarity Calculation / Chapter 6.5 --- Story Link Detection Output /
Chapter 7 --- Link Detection Experimental Results / Chapter 7.1 --- Testing Corpus / Chapter 7.2 --- Topic Type Categorization Result / Chapter 7.3 --- Link Detection Evaluation Methodology / Chapter 7.4 --- Experimental Results on Link Detection / Chapter 7.4.1 --- Language Normalization Factor Tuning / Chapter 7.4.2 --- Link Detection Performance / Chapter 7.4.3 --- Link Detection Performance Breakdown /
Chapter 8 --- Conclusions and Future Work / Chapter 8.1 --- Conclusions / Chapter 8.2 --- Future Work /
Chapter A --- List of Topic Title Annotated for TDT3 corpus by LDC / Chapter B --- List of Manually Annotated Events for TDT3 Corpus / Bibliography
Journalism--Data processing Broadcast journalism--Data processing Information retrieval Cross-language information retrieval Computational linguistics English language--Data processing Chinese language--Data processing
562	Effective Techniques for Indonesian Text Retrieval Asian, Jelita, jelitayang@gmail.com January 2007 The Web is a vast repository of data, and information on almost any subject can be found with the aid of search engines. Although the Web is international, most research on finding information focuses on languages such as English and Chinese. In this thesis, we investigate information retrieval techniques for Indonesian. Although Indonesia is the fourth most populous country in the world, little attention has been given to searching Indonesian documents. Stemming is the process of reducing morphological variants of a word to a common stem form. Previous research has shown that stemming is language-dependent. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which gives better performance. We empirically explore these algorithms, showing that even the best algorithm still has scope for improvement. We propose novel extensions to this algorithm and develop a new Indonesian stemmer, and show that these can improve stemming correctness by up to three percentage points; our approach makes fewer than one error in thirty-eight words. We propose a range of techniques to enhance the performance of Indonesian information retrieval. These techniques include stopping, sub-word tokenisation, identification of proper nouns, and modifications to existing similarity functions. Our experiments show that many of these techniques can increase retrieval performance, with the highest increase achieved when we use n-grams of size five to tokenise words. We also present an effective method for identifying the language of a document; this allows various information retrieval techniques to be applied selectively depending on the language of target documents. 
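The sub-word tokenisation mentioned in this abstract can be illustrated with a minimal sketch. It assumes overlapping character n-grams and uses size five, the setting the abstract reports as giving the highest increase. The function names and the choice to keep words of five characters or fewer whole are illustrative assumptions, not details taken from the thesis.

```python
def char_ngrams(word, n=5):
    """Split a word into overlapping character n-grams.

    Words no longer than n are returned whole (an assumption of this sketch).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def tokenise(text, n=5):
    """Lower-case the text, split on whitespace, and n-gram every word."""
    grams = []
    for word in text.lower().split():
        grams.extend(char_ngrams(word, n))
    return grams


print(tokenise("Membaca buku"))  # ['memba', 'embac', 'mbaca', 'buku']
```

Indexing these grams instead of whole words lets morphological variants that share a long substring match each other without a stemmer, which is one plausible reason such tokenisation helps.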
We also address the problem of automatic creation of parallel corpora --- collections of documents that are the direct translations of each other --- which are essential for cross-lingual information retrieval tasks. Well-curated parallel corpora are rare, and for many languages, such as Indonesian, do not exist at all. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. Unlike most current approaches, which consider only the context and structure of the documents, our approach is based on the document content itself. Our algorithms do not make any prior assumptions about the documents, and are based on the Needleman-Wunsch algorithm for global alignment of protein sequences. Our approach works well in identifying Indonesian-English parallel documents, especially when no translation is performed. It can increase the separation value, a measure to discriminate good matches of parallel documents from bad matches, by approximately ten percentage points. We also investigate the applicability of our identification algorithms for other languages that use the Latin alphabet. Our experiments show that, with minor modifications, our alignment methods are effective for English-French, English-German, and French-German corpora, especially when the documents are not translated. Our technique can increase the separation value for the European corpus by up to twenty-eight percentage points. Together, these results provide a substantial advance in understanding techniques that can be applied for effective Indonesian text retrieval. Information retrieval text retrieval cross lingual information retrieval Indonesian stemming stopping tokenisation proper noun identification language identification compound identification parallel corpora identification
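As a rough illustration of the global alignment idea behind this work, the following is a minimal sketch of Needleman-Wunsch scoring. The abstract does not state what units are aligned or which scores are used, so aligning word tokens with match = +1, mismatch = -1 and gap = -1 is an assumption made here purely for demonstration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


# Hypothetical untranslated documents: only the shared tokens can match.
indonesian = "presiden mengunjungi jakarta pada 2005".split()
english = "president visited jakarta in 2005".split()
print(needleman_wunsch(indonesian, english))
```

Tokens that survive translation unchanged, such as names and numbers, are what raise the score for a true parallel pair relative to an unrelated pair.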
563	Attribute Exploration on the Web Jäschke, Robert, Rudolph, Sebastian 28 May 2013 We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowd sourcing systems, and the linked open data cloud. We discuss underlying general assumptions for this to work and the degree to which these can be taken for granted. Formale Begriffsanalyse Information Retrieval Informationsrückgewinnung Internet Formal Concept Analysis Attribute Exploration Web Information Retrieval Linked Open Data ddc:004 rvk:ST 304 rvk:ST 125
564	Entwurf und Implementierung eines Frameworks zur Analyse und Evaluation von Verfahren im Information Retrieval / Design and implementation of a framework for the analysis and evaluation of information retrieval methods Wilhelm, Thomas 13 August 2008 This Diplom thesis gives a brief introduction to information retrieval, with an emphasis on evaluation and evaluation campaigns. Starting from the drawbacks of an existing retrieval system, it then designs and implements a new retrieval framework for the experimental evaluation of information retrieval approaches. The components of the framework are kept abstract enough that various existing retrieval systems, such as Apache Lucene or Terrier, can be integrated. A reference implementation for the ImageCLEF Photographic Retrieval Task of the ImageCLEF track of the Cross Language Evaluation Forum is used to test and confirm that the framework works. Content-Based Image Retrieval (CBIR) ddc:004 ddc:020 ddc:000 Evaluation Framework <Informatik> Information-Retrieval-System
565	Recommendation in Enterprise 2.0 Social Media Streams Lunze, Torsten 15 October 2014 A social media stream allows users to share user-generated content and to aggregate different external sources into a single stream. In Enterprise 2.0, such social media streams empower co-workers to share their information and to work together efficiently and effectively while replacing email communication. As more users share information, it becomes impossible to read the complete stream, which leads to information overload. It is therefore crucial to provide users with a personalized stream that suggests important and unread messages. The main characteristic of an Enterprise 2.0 social media stream is that co-workers work together on projects represented by topics: the stream is topic-centered and not user-centered as in public streams such as Facebook or Twitter. Much work has been done on recommendation in streams and on news recommendation. However, none of the current research approaches deals with the characteristics of an Enterprise 2.0 social media stream when recommending messages. The existing systems described in the literature mainly address news recommendation for public streams and are not applicable to Enterprise 2.0 social media streams. In this thesis, a recommender concept is developed that allows the recommendation of messages in an Enterprise 2.0 social media stream. The basic idea is to extract features from a new message and use those features to compute a relevance score for a user. Additionally, those features are used to learn a user model, which is then used to score new messages. This idea works without explicit user feedback and ensures high user acceptance because no intensive rating of messages is necessary. Based on this idea, a content-based and a collaborative-based approach are developed. 
To reflect the topic-centered streams a topic-specific user model is introduced which learns a user model independently for each topic. There are constantly new terms that occur in the stream of messages. For improving the quality of the recommendation (by finding more relevant messages) the recommender should be able to handle the new terms. Therefore, an approach is developed which adapts a user model if unknown terms occur by using terms of similar users or topics. Also, a short- and long-term approach is developed which tries to detect short-term interests of users. Only if the interest of a user occurs repeatedly over a certain time span are terms transferred to the long-term user model. The approaches are evaluated against a dataset obtained through an Enterprise 2.0 social media stream application. The evaluation shows the overall applicability of the concept. Specifically the evaluation shows that a topic-specific user model outperforms a global user model and also that adapting the user model according to similar users leads to an increase in the quality of the recommendation. Interestingly, the collaborative-based approach cannot reach the quality of the content-based approach. Sozialer Nachrichtenstrom Inhaltbasierte Empfehlungssysteme Enterprise 2.0 information retrieval social media streams content-based recommendation enterprise 2.0 ddc:004 rvk:ST 515 Information Retrieval Empfehlungssystem Soziale Software
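The idea of a topic-specific user model learned from implicit feedback can be sketched as follows. This is only an illustration under stated assumptions: the class name `TopicUserModel`, the whitespace tokeniser used as feature extraction, and the mean-weight scoring rule are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class TopicUserModel:
    """Per-topic term weights for one user, learned from implicit feedback."""

    def __init__(self):
        # topic -> term -> accumulated weight
        self.weights = defaultdict(lambda: defaultdict(float))

    @staticmethod
    def features(message):
        """Hypothetical feature extraction: lower-cased word tokens."""
        return message.lower().split()

    def observe(self, topic, message):
        """Update the topic's model after the user read a message in it."""
        for term in self.features(message):
            self.weights[topic][term] += 1.0

    def score(self, topic, message):
        """Relevance score: mean learned weight of the message's terms."""
        terms = self.features(message)
        if not terms:
            return 0.0
        topic_weights = self.weights[topic]
        return sum(topic_weights.get(t, 0.0) for t in terms) / len(terms)


model = TopicUserModel()
model.observe("release", "deploy the release build")
print(model.score("release", "release build ready"))
print(model.score("marketing", "release build ready"))
```

Because the weights are kept per topic, reading messages in one topic does not raise scores in another, which mirrors the contrast the abstract draws between a topic-specific and a global user model.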
566	Peer to peer English/Chinese cross-language information retrieval Lu, Chengye January 2008 Peer to peer systems are widely used on the internet. However, most peer to peer information systems still lack some important features, for example cross-language IR (Information Retrieval) and collection selection / fusion. Cross-language IR is a state-of-the-art research area in the IR research community. It has not been used in any real world IR systems yet. Cross-language IR makes it possible to issue a query in one language and receive documents in other languages. In a typical peer to peer environment, users come from multiple countries, so their collections are in multiple languages. Cross-language IR can help users to find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English. With cross-language IR, they can issue one query in Chinese and get documents in both languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to such problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine. 
This thesis presents research on building a peer to peer IR system with cross-language IR and an advanced collection profiling technique for fusion. In particular, it first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora. An approach to selecting MLUs more accurately is also presented. The thesis then proposes a collection profiling strategy that can discover not only collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is one in which each peer in the system is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer to peer IR environment. Finally, all these approaches are combined to build a secure peer to peer multilingual IR system that cooperates through X.509 and an email system.
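Normalized-score merging, and the idea of also taking each remote engine's retrieval performance into account, can be sketched as below. This is a minimal illustration, not the fusion method of the thesis: min-max normalisation is one common choice, and `peer_weights` is a hypothetical stand-in for whatever performance estimate a collection profile might provide.

```python
def min_max_normalise(results):
    """Scale one peer's (doc_id, score) list into the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores equal: treat every document as equally good.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def merge(result_lists, peer_weights=None):
    """Merge per-peer result lists into a single ranking, best first."""
    merged = []
    for peer, results in result_lists.items():
        weight = 1.0 if peer_weights is None else peer_weights.get(peer, 1.0)
        for doc, score in min_max_normalise(results):
            merged.append((doc, weight * score))
    return sorted(merged, key=lambda item: item[1], reverse=True)


lists = {
    "peer_a": [("a1", 8.0), ("a2", 4.0), ("a3", 0.0)],
    "peer_b": [("b1", 1.0), ("b2", 0.75), ("b3", 0.5)],
}
# Hypothetical profile: peer_b's engine is judged less reliable.
for doc, score in merge(lists, peer_weights={"peer_b": 0.6}):
    print(doc, score)
```

Without the weights, the top document of every peer ties at 1.0 regardless of how good that peer's engine is, which is the limitation of content-only merging that the abstract points out.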
567	Tagging, rating, posting : studying forms of user contribution for web-based information management and information retrieval / Heckner, Markus January 2008 Also presented as: Regensburg, Univ., Diss., 2008
568	Semantic construction with provenance for model configurations in scientific workflows Thakur, Amritanshu January 2008 Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
569	Uma proposta de interface de consulta para recuperação de informação em documentos semi-estruturados / A proposed query interface for information retrieval in semi-structured documents Junqueira, Mirella Silva 19 February 2009 Conselho Nacional de Desenvolvimento Científico e Tecnológico / Semi-structured information retrieval is an intermediate form of retrieval between textual retrieval and structured retrieval (typical of relational database systems). In structured retrieval systems, users generally know the data structure and the available query languages, so they can formulate queries that produce more accurate results. In textual retrieval, users do not know the data structure and formulate queries with keywords only, which produces less accurate results. In semi-structured retrieval, users generally do not know the data structure and formulate queries that mix textual search and structured retrieval mechanisms. In this context, the problem arises of how to improve the accuracy of results by using the structure inside semi-structured documents. Semi-structured data is usually stored as XML documents, which can be seen as trees. The internal nodes of these trees hold the structure of the document, while the leaf nodes contain text. The design of user interfaces in this context is one of the biggest challenges in semi-structured information retrieval, especially because users do not know the document structure and have difficulty formulating structured queries. This dissertation presents a proposal and a prototype interface developed to help users formulate structured queries. The aim is to increase the precision of query results. The proposal is validated by experiments involving volunteer users and by comparing the results of textual queries with those of structured queries formulated with the help of the interface. The improvement reaches 440% for well structured queries issued by a user who knows the interface well, and 179.75% for reasonably structured queries issued by users with no experience of the interface. / Master in Computer Science Information retrieval Semi-structured information retrieval Human-machine interfaces XML Database
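The contrast between a textual query and a structured query over an XML tree can be shown with a small sketch. The sample document, its element names and both search functions are hypothetical and serve only to illustrate why restricting a keyword to one part of the structure can raise precision. The code uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured collection: internal nodes carry structure,
# leaf nodes carry text.
DOC = """
<library>
  <book><title>XML retrieval</title><author>Silva</author></book>
  <book><title>Databases</title><author>Retrieval Jones</author></book>
</library>
"""

root = ET.fromstring(DOC)


def keyword_search(root, word):
    """Textual query: match the word in the text of any element of a book."""
    word = word.lower()
    return [book.findtext("title") for book in root.iter("book")
            if any(word in (el.text or "").lower() for el in book.iter())]


def structured_search(root, path, word):
    """Structured query: match the word only inside the given child path."""
    word = word.lower()
    return [book.findtext("title") for book in root.iter("book")
            if word in (book.findtext(path) or "").lower()]


print(keyword_search(root, "retrieval"))              # both books match
print(structured_search(root, "title", "retrieval"))  # only the first book
```

The keyword query also returns the book whose author happens to be named "Retrieval Jones"; the structured query avoids that false hit, but only if the user knows that a `title` element exists, which is the gap a query-formulation interface is meant to close.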
570	Approches non supervisées pour la recommandation de lectures et la mise en relation automatique de contenus au sein d'une bibliothèque numérique / Unsupervised approaches to recommending reads and automatically linking content within a digital library Benkoussas, Chahinez 14 December 2016 This thesis belongs to the field of information retrieval (IR) and reading recommendation. Its objectives are: (1) to create new document retrieval approaches using techniques for combining results, aggregating social data and reformulating queries; (2) to create a recommendation approach using IR methods and graphs between documents. Two document collections were used: one provided by the CLEF evaluation campaign (Social Book Search task, SBS) and one from the humanities and social sciences (OpenEdition, mainly Revues.org). The modelling of the documents in each collection relies on two types of relations: in the first collection (CLEF SBS), documents are linked by similarities computed by Amazon, which are based on several factors (user purchases, comments, votes, products bought together, etc.); in the second collection (OpenEdition), documents are linked by citation relations extracted from bibliographical references. We show that the proposed approaches improve retrieval and recommendation performance in most cases. The manuscript is structured in two parts. The first part, "state of the art", comprises a general introduction and a review of the state of the art in IR and in recommender systems. The second part, "contributions", comprises a chapter on detecting book reviews in the OpenEdition collection (Revues.org), a chapter on the IR methods used for complex queries written in natural language, and a final chapter on the proposed graph-based recommendation approach. Information retrieval Recommendation Information retrieval models Graphs Digital library Citation network Automatic classification.
