Global ETD Search

1	Combining outputs from machine translation systems Salim, Fahim January 2011 Combining Outputs from Machine Translation Systems. By Fahim A. Salim. Supervised by: Ing. Zdenek Zabokrtsky, Ph.D. Institute of Formal and Applied Linguistics, Charles University in Prague, 2010. Abstract: Owing to extensive ongoing research, there are many paradigms of Machine Translation (MT) systems with diverse characteristics. Even systems built on the same paradigm may perform differently in different scenarios, depending on their training data and other design decisions. All MT systems have strengths and weaknesses, and often the weakness of one system is the strength of another. No single approach or system seems to always perform best; combining different approaches or systems, i.e. creating hybrid systems that capitalize on their strengths and minimize their weaknesses, is therefore an ongoing trend in MT research. But even hybrid systems have limitations, and they too tend to perform differently in different scenarios. Thanks to the World Wide Web and open source, one can nowadays access many diverse MT systems, so it is practical to have techniques that combine the translations of different MT systems and produce a translation better than that of any individual system....
2	Re-ranking de busca visual de produtos usando informação multimodal (Re-ranking of product visual search using multimodal information) Santos, Joyce Miranda dos 12 March 2013 With the fast development of the Internet and the popularization of mobile devices and e-commerce Web sites, searching for a specific product using a query image has become a very promising area of research. In this context, CBIR (Content-Based Image Retrieval) techniques have been exploited to support and improve the shopping experience of consumers. In this dissertation, we address the problem of product visual search using an image as the query, instead of the more popular keyword-based search. We propose a re-ranking strategy based on multimedia information usually available in product databases. Our strategy makes use of the category information and textual descriptions associated with the top-k images of an initial ranking generated by CBIR techniques alone. Experiments were performed using user judgments on two collections of images gathered from popular e-commerce Web sites. Our results show that our strategy achieves significant gains compared with an approach based only on CBIR techniques. Products visual search Image re-ranking E-commerce
3	Réordonnancement de candidats reponses pour un système de questions-réponses / Re-ranking of candidates answers of a question-answering system. Bernard, Guillaume 06 June 2011 The objective of this work is to introduce a new robust approach to the problem of finding the correct answer to a question. Our first contribution is the design and implementation of a robust representation model for information. The aim is to represent the structural information of document sentences and of questions. This representation is composed of typed groups of words (typed segments) and relations between these groups. The model has been evaluated on several corpora (written, oral, web) and achieved good results, which demonstrates its robustness. Our second contribution is the design of a method for re-ranking the set of candidate answers output by a question-answering system. This re-ranking method was also designed with robustness in mind and builds on the structural information representation. The general idea is to compare a question with the passage from which a candidate answer was extracted, and to compute a similarity score using a modified edit distance that we propose. Our re-ranking method has been evaluated on the data of several evaluation campaigns. The results are particularly good on long and complex questions. These results show the value of our method: our approach is well suited to handling long questions, whatever the type of data. The re-ranker has been officially evaluated in the 2010 edition of the Quaero evaluation campaign, with positive results. Question-Answering Oral Re-ranking Open domain
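The re-ranking idea in record 3 (compare the question with the passage a candidate answer came from, score the pair with an edit distance, and re-order the candidates) can be sketched as follows. This is only an illustration under stated assumptions: the thesis uses a modified edit distance over typed segments, whereas this sketch swaps in a plain word-level Levenshtein distance, and the names `edit_distance`, `similarity` and `rerank` are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Plain word-level Levenshtein distance (not the thesis's modified variant)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(question: str, passage: str) -> float:
    """Map the distance to a score in [0, 1]; 1.0 means identical token sequences."""
    q, p = question.lower().split(), passage.lower().split()
    longest = max(len(q), len(p))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, p) / longest


def rerank(question: str, candidates: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Re-order (answer, source_passage) pairs by question/passage similarity, best first."""
    return sorted(candidates, key=lambda c: similarity(question, c[1]), reverse=True)
```

For example, `rerank("who wrote hamlet", [("Marlowe", "marlowe wrote plays in london"), ("Shakespeare", "shakespeare wrote hamlet")])` places the Shakespeare candidate first, because its passage is closer to the question in edit distance.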
4	Bradfordizing als alternativer Sacheinstieg: Evaluation thematischer Kernzonenbildung (Bradfordizing as an alternative subject access: evaluation of thematic core-zone formation) Mayr, Philipp 24 August 2009 The order and structure of the listed results (ranking) in particular now play, alongside direct full-text access to the documents, a decisive role in the design of search systems. The aim of Philipp Mayr's doctoral research is to examine whether the presented alternative re-ranking method "Bradfordizing" is operable in the application domain of bibliographic databases and can be expected to be used profitably in information systems and offered to users. The evaluation of Bradfordizing shows that, for most test series, documents in the core zone (core journals) yield significantly higher precision than documents in zone 2 and zone 3 (periphery journals). For both journals and monographs, a relevance advantage after Bradfordizing can be demonstrated empirically across a very broad range of topics and questions on two independent document corpora. Bradfordizing Re-Ranking Relevance ddc:020 Bibliometrics Information Retrieval Informetrics Ranking
5	Bradfordizing als alternativer Sacheinstieg: Evaluation thematischer Kernzonenbildung (Bradfordizing as an alternative subject access: evaluation of thematic core-zone formation) Mayr, Philipp 24 August 2009 The order and structure of the listed results (ranking) in particular now play, alongside direct full-text access to the documents, a decisive role in the design of search systems. The aim of Philipp Mayr's doctoral research is to examine whether the presented alternative re-ranking method "Bradfordizing" is operable in the application domain of bibliographic databases and can be expected to be used profitably in information systems and offered to users. The evaluation of Bradfordizing shows that, for most test series, documents in the core zone (core journals) yield significantly higher precision than documents in zone 2 and zone 3 (periphery journals). For both journals and monographs, a relevance advantage after Bradfordizing can be demonstrated empirically across a very broad range of topics and questions on two independent document corpora. info:eu-repo/classification/ddc/020 ddc:020 Bibliometrics Information Retrieval Informetrics Ranking Bradfordizing Re-Ranking Relevance
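The zone-based re-ranking mentioned in the two Bradfordizing records above can be illustrated with a short sketch. It assumes the commonly described form of Bradfordizing: journals are ordered by how many documents they contribute to the result set, and documents from the most productive (core) journals are listed first. The split into three zones by thirds of the cumulative document count is a simplifying assumption made here, and the function name `bradfordize` is hypothetical; the abstract does not specify these details.

```python
from collections import Counter
from typing import List, Tuple


def bradfordize(results: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Re-rank (doc_id, journal) pairs so that documents from the most
    productive journals come first; returns (doc_id, journal, zone) triples."""
    counts = Counter(journal for _, journal in results)
    # Journals by descending document count; ties broken alphabetically
    # so the output is deterministic.
    journals = sorted(counts, key=lambda j: (-counts[j], j))
    total = len(results)
    zone_of = {}
    cumulative = 0
    for journal in journals:
        # Assumption: a journal's zone (1 = core, 2, 3 = periphery) depends on
        # which third of the cumulative document count it starts in.
        zone_of[journal] = min(3, 1 + (3 * cumulative) // total)
        cumulative += counts[journal]
    rank = {journal: i for i, journal in enumerate(journals)}
    # sorted() is stable, so the original order is kept within each journal.
    ordered = sorted(results, key=lambda r: rank[r[1]])
    return [(doc, journal, zone_of[journal]) for doc, journal in ordered]
```

With six hits spread over journals A (three documents), B (two) and C (one), the three documents from A are ranked first as zone 1, followed by B in zone 2 and C in zone 3.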
6	Leveraging supplementary transcriptions and transliterations via re-ranking Bhargava, Aditya Unknown Date No description available. Natural language processing Computational linguistics Grapheme-to-phoneme conversion Machine transliteration Transliteration SVM re-ranking Supplemental data
7	Word Confidence Estimation and Its Applications in Statistical Machine Translation / Les mesures de confiance au niveau des mots et leurs applications pour la traduction automatique statistique Luong, Ngoc Quang 12 November 2014 Machine Translation (MT) systems, which automatically generate a target-language translation for each source sentence, have achieved impressive gains in recent decades and are becoming effective language aids for the entire community in a globalized world. Nonetheless, owing to various factors, MT quality is still far from perfect in general, and end users therefore want to know how much they should trust a specific translation. A method capable of pointing out the correct parts, detecting the translation errors and assessing the overall quality of each MT hypothesis is beneficial not only for end users but also for translators, post-editors, and MT systems themselves. Such a method is widely known as Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building such automatic estimation methods stems from the drawbacks of assessing MT quality manually: the task is time-consuming, costly in effort, and sometimes impossible when readers have little or no knowledge of the source language. This thesis focuses mainly on CE methods at the word level (WCE). The WCE classifier tags each word in the MT output with a quality label. The WCE working mechanism is straightforward: a classifier, trained beforehand on a number of features using Machine Learning (ML) methods, computes the confidence score of each label for each MT output word, then tags the word with the highest-scoring label. WCE is of increasing importance in many aspects of MT. Firstly, it helps post-editors quickly identify translation errors and hence improve their productivity. Secondly, it informs readers of the portions of a sentence that are not reliable, to avoid misunderstanding of the sentence's content. Thirdly, it selects the best translation among the outputs of multiple MT systems. Last but not least, WCE scores can help improve MT quality via scenarios such as N-best list re-ranking and search graph re-decoding. In this thesis, we aim to build and optimize our baseline WCE system, then exploit it to improve MT and Sentence Confidence Estimation (SCE). Compared with previous approaches, our novel contributions cover the following main points. Firstly, we integrate various types of prediction indicators: system-based features extracted from the MT system, together with lexical, syntactic and semantic features, to build the baseline WCE system. We also apply multiple ML models to the entire feature set and compare their performance to select the one best suited to our data (Conditional Random Fields). Secondly, the usefulness of all features is investigated in more depth using a greedy feature selection algorithm. Thirdly, we propose a solution that uses the Boosting algorithm as a learning method in order to strengthen the contribution of dominant feature subsets to the system, and thus improve its prediction capability. Lastly, we explore the contributions of WCE to improving MT quality via several scenarios. In N-best list re-ranking, we synthesize scores from the WCE outputs and integrate them with the decoder scores to recompute the objective function value, then re-order the N-best list to choose a better candidate. In search graph re-decoding, we apply WCE scores directly to the nodes containing each word to update their costs according to word quality; once the update is complete, searching for the best path in the new graph yields the new MT hypothesis. Furthermore, WCE scores are used to build useful features that can enhance the performance of the SCE system. Overall, our work gives an insightful and multidimensional picture of word quality prediction and its positive impact on various areas of MT. The promising results open up a broad avenue where WCE can play a role, such as WCE for Automatic Speech Recognition (ASR) systems, WCE for selection among multiple MT systems, and WCE for re-trainable and self-learning MT systems. Statistical machine translation Confidence Estimation Conditional Random Fields N-best list re-ranking Boosting Feature Selection Quality Estimation 004
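The N-best list re-ranking scenario in record 7 (combine word-level confidence scores with the decoder score, then re-order the list) can be sketched roughly as below. Everything specific here is an assumption for illustration: the abstract does not say how the WCE scores are synthesized, so this sketch simply averages a hypothetical per-word probability of being correct and adds it to the decoder score with a hypothetical weight `wce_weight`.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    decoder_score: float           # score from the MT decoder (higher is better)
    word_good_probs: List[float]   # hypothetical WCE output: P(word is correct) per word


def wce_sentence_score(hyp: Hypothesis) -> float:
    """Assumed synthesis of word-level scores: the mean per-word confidence."""
    if not hyp.word_good_probs:
        return 0.0
    return sum(hyp.word_good_probs) / len(hyp.word_good_probs)


def rerank_nbest(nbest: List[Hypothesis], wce_weight: float = 1.0) -> List[Hypothesis]:
    """Recompute each hypothesis score as decoder score plus weighted WCE score
    and return the N-best list re-ordered, best candidate first."""
    return sorted(
        nbest,
        key=lambda h: h.decoder_score + wce_weight * wce_sentence_score(h),
        reverse=True,
    )
```

With `wce_weight=0.0` the decoder's original order is reproduced; a positive weight lets a hypothesis with a slightly lower decoder score move up when its words are judged more reliable. In practice such a weight would be tuned on held-out data.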
8	Re-Ranking auf Basis von Bradfordizing für die verteilte Suche in digitalen Bibliotheken (Re-ranking based on Bradfordizing for distributed search in digital libraries) Mayr, Philipp 06 March 2009 Despite the huge document sets involved in cross-database literature searches, academic users expect a high proportion of relevant, high-quality documents in their result sets. The order and structure of the listed results (ranking) in particular now play, alongside direct full-text access to documents, a decisive role in the design of search systems. Users also expect flexible information systems that allow them, among other things, to influence the ranking of documents or to use alternative ranking techniques. This thesis presents two value-added approaches for search systems that address typical problems in searching scientific literature and can measurably improve the search situation. The two value-added services, semantic treatment of heterogeneity (using cross-concordances as an example) and re-ranking based on Bradfordizing, which are applied in different search phases, are described in detail, and their effectiveness for typical subject-specific searches is evaluated in the empirical part of the thesis. The primary goal of the thesis is to study whether the proposed alternative re-ranking approach, Bradfordizing, is operable in the domain of bibliographic databases and whether it can be expected to be used profitably in information systems and offered to users. Topics and data from two evaluation projects (CLEF and KoMoHe) were used for the tests. The intellectually assessed documents come from a total of seven academic subject databases covering social science, political science, economics, psychology and medicine. The evaluation of the cross-concordances (82 topics in total) shows that the retrieval results improve significantly for all cross-concordances; it also shows that interdisciplinary cross-concordances have the strongest (positive) effect on the search results. The evaluation of Bradfordizing re-ranking (164 topics in total) shows that, for most test series, core-zone documents (core journals) achieve significantly higher precision than documents in zone 2 and zone 3 (periphery journals). This relevance advantage after Bradfordizing can be demonstrated empirically, for journals as well as monographs, across a very broad range of topics on two independent document corpora. evaluation Bradfordizing cross-concordances heterogeneous indexed databases Information Retrieval re-ranking core journals relevance digital libraries bibliographic databases journal articles monographs ST 252 AN 9300 AK 28100 AN 96300 ST 205 ddc:020
