311 |
Cross-Language Information Retrieval : A study of linguistic problems and of translation methods developed as solutions for information retrieval across language boundaries. Boström, Anna January 2004 (has links)
This essay deals with information retrieval across languages by examining literature in the research areas of linguistics and multilingual information retrieval. The essay argues that the many different languages that co-exist around the globe must be recognised as an essential obstacle for information science: the language barrier remains a major impediment to the expansion of international information retrieval otherwise made technically and theoretically possible in recent years by new technical developments, the Internet, digital libraries, globalisation, and major political changes in several countries around the world. The first part of the essay explores linguistic differences and difficulties related to general translation from one language to another, using examples mainly from European languages. It is suggested that these problems and differences must also be acknowledged as highly important when it comes to information retrieval across languages.
The essay continues by reporting on Cross-Language Information Retrieval (CLIR), a relatively new research area in which methods for multilingual information retrieval are studied and developed. The goal of CLIR is that users should eventually be able to search for information in their native tongue and still find relevant information in more than one language. A further goal is the ability to translate the retrieved documents in full into the searcher's language of preference. The essay reports on four CLIR methods currently established for automatically translating queries, subject headings, or, in some cases, complete documents, thus aiding users with little or no knowledge of the language in which they are searching. The four methods – identified as machine translation, translation using a multilingual thesaurus or a manually produced machine-readable dictionary, corpus-based translation, and no translation – are discussed in relation to the linguistic translation difficulties raised in the essay's first part. The conclusion drawn is that language is exceedingly complex, and that while the CLIR methods developed so far can often solve one or a few of the acknowledged linguistic difficulties, no method can yet overcome them all. The essay also shows, however, that CLIR researchers are well aware of the limitations of the individual translation methods, and that many are trying to come to terms with this by combining several sources of translation in a single CLIR system. The essay concludes by looking at CLIR researchers' expectations and hopes for the future.
|
313 |
Querying databases privately : a new approach to private information retrieval / Asonov, Dmitri. January 2004 (has links)
Humboldt-Univ., Diss.--Berlin, 2003.
|
314 |
ValidAX - Validation of the Frameworks AMOPA and XTRIEVAL. Berger, Arne, Eibl, Maximilian, Heinich, Stephan, Herms, Robert, Kahl, Stefan, Kürsten, Jens, Kurze, Albrecht, Manthey, Robert, Rickert, Markus, Ritter, Marc 03 February 2015 (has links) (PDF)
The project "ValidAX - Validation of the Frameworks AMOPA and XTRIEVAL" examines how the software frameworks AMOPA (Automated Moving Picture Annotator) and Xtrieval (Extensible Information Retrieval Framework), created by the Chair of Media Informatics at TU Chemnitz, can be developed towards commercial use and integrated into practical workflows. AMOPA can analyse arbitrary audiovisual media and generate metadata such as shot boundaries, scenes, persons, and audio transcriptions. Xtrieval is a highly flexible tool that enables search in arbitrary media. For the project, three possible deployment scenarios were defined, each exposing the frameworks to different requirements:
- Archiving
- Interactive and automated television
- Medical video analysis
The frameworks were optimised for these scenarios, and technical workflows were designed and implemented accordingly. Demonstrators are used to attract further commercialisation partners.
|
315 |
Word embeddings for monolingual and cross-language domain-specific information retrieval. Wigder, Chaya January 2018 (has links)
Various studies have shown the usefulness of word embedding models for a wide variety of natural language processing tasks. This thesis examines how word embeddings can be incorporated into domain-specific search engines for both monolingual and cross-language search. This is done by testing various embedding-model hyperparameters, as well as methods for weighting the relative importance of words to a document or query. In addition, methods for generating domain-specific bilingual embeddings are examined and tested. The system was compared to a baseline that used cosine similarity without word embeddings; for both the monolingual and bilingual search engines, the use of monolingual embedding models improved performance over the baseline. Bilingual embeddings, however, especially for domain-specific terms, tended to be of too poor quality to be used directly in the search engines.
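The core ranking idea in the abstract above can be sketched compactly: represent query and document as the average of their word vectors, then rank by cosine similarity. This is a toy illustration only; the 2-dimensional vectors below are invented, whereas the thesis trains real embedding models and adds term weighting.

```python
# Toy sketch of embedding-based ranking: average word vectors, then rank
# documents by cosine similarity to the query. The 2-d vectors are invented.
import math

EMB = {
    "retrieval": (1.0, 0.2), "search": (0.9, 0.3),
    "embedding": (0.2, 1.0), "vector":  (0.3, 0.9),
}

def avg_vector(words):
    """Mean of the known word vectors; zero vector if none are known."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Return docs sorted by embedding similarity to the query."""
    q = avg_vector(query.split())
    return sorted(docs, key=lambda d: cosine(q, avg_vector(d.split())),
                  reverse=True)

print(rank("retrieval search", ["embedding vector", "search retrieval"]))
```

The baseline mentioned in the abstract would instead compare raw term-overlap vectors; embeddings let the query match documents that use related but non-identical vocabulary.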
|
316 |
A Model-Based Methodology for Managing T&E Metadata. Hamilton, John, Fernandes, Ronald, Darr, Timothy, Graul, Michael, Jones, Charles, Weisenseel, Annette 10 1900 (has links)
ITC/USA 2009 Conference Proceedings / The Forty-Fifth Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2009 / Riviera Hotel & Convention Center, Las Vegas, Nevada / In this paper, we present a methodology for managing diverse sources of T&E metadata. Central to this methodology is the development of a T&E Metadata Reference Model, which serves as the standard model for T&E metadata types, their proper names, and their relationships to each other. We describe how this reference model can be mapped to a range's own T&E data and process models to provide a standardized view into each organization's custom metadata sources and procedures. Finally, we present an architecture that uses these models and mappings to support cross-system metadata management tasks and makes these capabilities accessible across the network through a single portal interface.
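The mapping idea in this abstract, a shared reference model onto which each range maps its local metadata fields, can be sketched as follows. All field names here are hypothetical placeholders, not the paper's actual T&E Metadata Reference Model.

```python
# Hedged sketch of reference-model mapping: a standard model supplies
# canonical metadata names, and each range provides a mapping from its
# local field names onto them. All names below are invented examples.

REFERENCE_MODEL = {"MeasurementName", "SampleRate", "Units"}

RANGE_A_MAPPING = {  # local field -> canonical reference-model name
    "meas_id": "MeasurementName",
    "rate_hz": "SampleRate",
    "units":   "Units",
}

def to_canonical(local_record, mapping):
    """Translate one range-specific record into reference-model terms.

    Fields with no mapping to the reference model are dropped, giving a
    standardized view over otherwise heterogeneous metadata sources.
    """
    out = {}
    for field, value in local_record.items():
        canonical = mapping.get(field)
        if canonical in REFERENCE_MODEL:
            out[canonical] = value
    return out

rec = to_canonical({"meas_id": "EGT1", "rate_hz": 100, "units": "degC"},
                   RANGE_A_MAPPING)
print(rec)
```

A cross-system management tool, like the portal the paper describes, can then query every range through the canonical names alone.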
|
317 |
Semantically-enhanced image tagging system. Rahuma, Awatef January 2013 (has links)
In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in the Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which enable users to add tags to Internet resources such as images, video, and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine-processable format and enable users to search with free-text keywords (so-called tags). This research references tagging systems such as Flickr, Delicious, and MyWeb2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural, and conjugated words), and misspellings or alternate spellings. The work presented in this thesis introduces a semantic characterisation of web resources that describes the structure and organisation of tagging, aiming to extend existing multimedia query techniques with similarity measures that cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems and suggest improvements to their accuracy. The scope of our work is classified as follows: (i) increase the accuracy and confidence of multimedia tagging systems; (ii) increase the similarity measures of images by integrating a variety of measures. To address the first shortcoming, we use WordNet as a semantic lingual ontology resource within a tagging system for social sharing and retrieval of images.
For the second shortcoming we use similarity measures in different ways to recognise the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, where each dimension contains valuable information that helps to increase the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description, followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings show the linkage between similarity measures and VoI: together they improve searches and help guide a tagger in choosing the most adequate tags.
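The synonymy and lexical-form problems listed above are typically attacked by normalising tags to a canonical form before matching. The sketch below uses a small hand-written synonym ring purely for illustration; the thesis draws such relations from WordNet instead.

```python
# Minimal sketch of normalising free-text tags to handle synonymy,
# plural forms, and misspellings. The synonym ring is invented for
# illustration; a real system would derive it from WordNet.

SYNONYM_RING = {  # variant -> canonical tag
    "automobile": "car", "auto": "car", "cars": "car",
    "woods": "forest", "forrest": "forest",  # common misspelling
}

def normalise_tags(tags):
    """Map each tag to its canonical form, dropping duplicates in order."""
    seen, out = set(), []
    for tag in tags:
        key = tag.lower().strip()
        canonical = SYNONYM_RING.get(key, key)
        if canonical not in seen:
            seen.add(canonical)
            out.append(canonical)
    return out

print(normalise_tags(["Car", "automobile", "forrest", "Woods"]))
# four raw tags collapse to two canonical ones
```

Polysemy runs the other way (one canonical tag, several meanings) and needs sense disambiguation rather than a lookup table, which is where the thesis's similarity measures come in.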
|
318 |
Truth discovery under resource constraints. Etuk, Anthony Anietie January 2015 (has links)
Social computing initiatives that mark a shift from personal computing towards computations involving collective action are driving a dramatic evolution in modern decision-making. Decision-makers and stakeholders can now tap into tremendous numbers and varieties of information sources (crowds) capable of providing information for decisions that could affect individual or collective well-being. More information sources do not necessarily translate to better information quality, however. Social influence in online environments, for example, may bias collective opinions. In addition, querying information sources may be costly in real-world applications, in terms of energy, bandwidth, delay overheads, etc. In this research, we propose a general approach for truth discovery in resource-constrained environments where there is uncertainty regarding the trustworthiness of sources. First, we present a model of diversity, which allows a decision-maker to form groups made up of sources likely to provide similar reports. We demonstrate that this mechanism is able to identify different forms of dependency among information sources, and hence has the potential to mitigate the risk of double-counting evidence due to correlated biases among them. Secondly, we present a sampling decision-making model, which combines source diversification and reinforcement learning to drive the sampling strategy. We demonstrate that this mechanism is effective in guiding sampling decisions under different task constraints and information needs. We evaluate our model by comparing it with algorithms representing classes of existing approaches reported in the literature.
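The budget-constrained sampling idea can be illustrated with an epsilon-greedy rule, one simple form of the reinforcement learning the abstract mentions. This is a sketch under invented assumptions, not the thesis's model: the source accuracies and update rule below are placeholders, and the thesis additionally uses source diversification.

```python
# Hedged sketch of budget-constrained source sampling: an epsilon-greedy
# learner spends a fixed query budget, preferring sources whose past
# reports proved accurate. Source accuracies are invented for illustration.
import random

def sample_sources(accuracy, budget, epsilon=0.1, seed=42):
    """Pick `budget` sources, learning which ones tend to report truth."""
    rng = random.Random(seed)
    score = {s: 0.5 for s in accuracy}   # prior belief in each source
    counts = {s: 1 for s in accuracy}
    picks = []
    for _ in range(budget):
        if rng.random() < epsilon:       # explore a random source
            s = rng.choice(list(accuracy))
        else:                            # exploit current best estimate
            s = max(score, key=score.get)
        picks.append(s)
        # simulated outcome: did the source report accurately this time?
        reward = 1.0 if rng.random() < accuracy[s] else 0.0
        counts[s] += 1
        score[s] += (reward - score[s]) / counts[s]  # running average
    return picks

picks = sample_sources({"A": 0.9, "B": 0.4, "C": 0.2}, budget=50)
print(max(set(picks), key=picks.count))  # most frequently sampled source
```

The budget caps total querying cost while the learned scores steer most of it towards trustworthy sources, which is the trade-off the thesis formalises.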
|
319 |
Diffusion of network information retrieval in academia. Ashley, Nancy Winniford January 1995 (has links)
NIR, network information retrieval, is the act of finding and retrieving information on interconnected computer networks. The research investigated the extent to which NIR awareness and use has diffused through a broad research population, and why and how academics become aware of and use NIR. Everett Rogers' diffusion of innovations theory was adapted to guide the investigation. A survey of 888 faculty members at the University of Arizona with Internet-accessible computer accounts yielded a 32% return rate. Respondents from the various colleges at the university use between 20% and 39% of available NIR technologies, suggesting that NIR is in an early stage of diffusion in all colleges. Twenty-one one-hour, open-ended interviews were conducted with faculty from a variety of disciplines. Analysis of coded interview comments was used to test the usefulness of Rogers' theory in describing the diffusion of NIR. Predictions that mass-media communication channels reaching outside the local community would be more likely to result in awareness and use of NIR were not supported. Predictions that use of NIR would be associated with the perception that NIR (1) is compatible with needs and social norms, and (2) has relative advantage over previous practice, were supported. The predictions that use would be associated with perceptions of (1) compatibility with previous conditions, (2) low NIR complexity, and (3) trialability of NIR, were not supported. The explanatory power of the diffusion of innovation theory improves for NIR if NIR technologies are not studied in a vacuum. Rather, NIR technologies need to be studied in association with particular types of information resources (i.e. general-interest and research-related resources) and particular types of communities (i.e. research communities).
The study suggests that before NIR will diffuse in research communities, academics will need to agree that NIR dissemination of information will be rewarded in the promotion and tenure process. Such redefinition of social norms will help to create within research areas a critical mass of NIR users, and thus contribute to the diffusion of NIR.
|
320 |
Authorship Attribution of Source Code. Tennyson, Matthew Francis 01 January 2013 (has links)
Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been presented in the past. A review of those existing methods is presented, focusing on the two state-of-the-art methods: SCAP and Burrows.
The primary goal was to develop a new method for authorship attribution of source code that is even more effective than the current state-of-the-art methods. Toward that end, a comparative study of the methods was performed in order to determine their relative effectiveness and establish a baseline. A suitable set of test data was also established in a manner intended to support the vision of a universal data set suitable for standard use in authorship attribution experiments. A data set was chosen consisting of 7,231 open-source and textbook programs written in C++ and Java by thirty unique authors.
The baseline study showed both the Burrows and SCAP methods were indeed state-of-the-art. The Burrows method correctly attributed 89% of all documents, while the SCAP method correctly attributed 95%. The Burrows method inherently anonymizes the data by stripping all comments and string literals, while the SCAP method does not. So the methods were also compared using anonymized data. The SCAP method correctly attributed 91% of the anonymized documents, compared to 89% by Burrows.
The Burrows method was improved in two ways: the set of features used to represent programs was updated and the similarity metric was updated. As a result, the improved method successfully attributed nearly 94% of all documents, compared to 89% attributed in the baseline.
The SCAP method was also improved in two ways: the technique used to anonymize documents was changed and the amount of information retained in the source code author profiles was determined differently. As a result, the improved method successfully attributed 97% of anonymized documents and 98% of non-anonymized documents, compared to 91% and 95% that were attributed in the baseline, respectively.
The two improved methods were used to create an ensemble method based on the Bayes optimal classifier. The ensemble method successfully attributed nearly 99% of all documents in the data set.
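The profile-based idea behind SCAP can be sketched in miniature: build each author a profile of their most frequent character n-grams, then attribute a disputed program to the author whose profile overlaps it most. This is a simplified sketch, not the thesis's tuned implementation; the training snippets are invented toy examples, and real experiments use thousands of programs per the data set described above.

```python
# Simplified sketch of SCAP-style attribution: authors are profiled by
# their most frequent character n-grams, and a disputed program goes to
# the author with the largest profile overlap. Training data is invented.
from collections import Counter

def profile(text, n=3, size=100):
    """The top-`size` character n-grams of the text, as a set."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {g for g, _ in grams.most_common(size)}

def attribute(sample, training):
    """Return the author whose profile shares the most n-grams with sample."""
    sp = profile(sample)
    return max(training,
               key=lambda author: len(sp & profile(training[author])))

training = {
    "alice": "for (int i = 0; i < n; ++i) { sum += a[i]; }",
    "bob":   "while(x>0){x=x-1;total=total+x;}",
}
print(attribute("for (int j = 0; j < m; ++j) { s += b[j]; }", training))
# the disputed loop matches alice's spacing and loop habits
```

Character n-grams capture low-level habits such as whitespace, brace placement, and identifier style, which is why the method survives the anonymisation (comment and string stripping) discussed above better than token-level features.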
|