Spelling suggestions: "subject:"fulltext""
1 |
Fehlertolerante Volltextsuche in elektronischen Enzyklopädien und Heuristiken zur Fehlerratenverbesserung. Esser, Wolfram, Unknown Date
Würzburg, University, dissertation, 2005
|
2 |
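Interdisziplinäre Aufsatzdatenbanken und ihre Verlinkung mit den Volltexten: eine vergleichende Analyse ausgewählter Beispiele [Interdisciplinary article databases and their linking to full texts: a comparative analysis of selected examples]. Schulte-Derne, Heike, January 2004
Stuttgart, University of Applied Sciences (FH), diploma thesis, 2003.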
|
3 |
TopX: efficient and versatile top-k query processing for text, structured, and semistructured data. Theobald, Martin, January 2006
Also: Saarbrücken, University, dissertation, 2006 / Produced on demand
|
4 |
Fehlertolerante Volltextsuche in elektronischen Enzyklopädien und Heuristiken zur Fehlerratenverbesserung / Fault-tolerant Fulltext-Search in Electronic Encyclopedias and Heuristics for Error Rate Improvement. Eßer, Wolfram, January 2005
This thesis presents the concept and practical implementation of a fault-tolerant full-text search that enables fuzzy searching for patterns in extensive digital encyclopedic works. The new method applied here, which uses weights to alter the shape of the user's original search pattern (Weighted Pattern Morphing, WPM) and then runs a downstream exact full-text search, has proven its practical suitability in numerous commercial applications. Among these, its use for fuzzy searching in a medieval handwritten chronicle is particularly interesting, because the chronicle is written in Early New High German and there was no standardized orthography at the time. WPM is not limited to end-user search, however: in an editorial setting, the method also uncovered several hundred further typos in a digital lexicon that had already been proofread several times. The method is considerably more selective than the edit distance otherwise used for fuzzy search (and thus for error search). Finally, the thesis presents a method for automatically creating a virtual book, which can be paged through on a PC, from a 3D wireframe model and the facsimile scans of a medieval manuscript. / In the work reported here, we present a new way of performing fault-tolerant full-text retrieval on large text corpora, such as scientific encyclopedias. The weighted pattern morphing (WPM) technique introduced here overcomes disadvantages of both the popular edit distance measure and the Soundex code approaches while retaining their flexibility. The algorithm handles phonetic similarities, common typing errors such as omission or transposition of letters, and inconsistent usage of abbreviations and hyphenation.
After showing how WPM can be implemented efficiently, we present a novel method for automatically adjusting the weights of the internal penalty matrix to obtain even better results. Although the technique can be applied without prior knowledge of actual user patterns, re-examination with a large number of patterns entered by online users demonstrates the portability of this fine-tuning approach. We further show how the penalty matrix can be transferred from one language to another. The WPM technique is integrated into a large commercial pharmaceutical encyclopedia on CD-ROM, an online dermatological encyclopedia, and an online reference encyclopedia of parasitology research, which also demonstrates its “road capability”. The thesis further shows that WPM can be used during the development of a digital encyclopedia to spot and correct typos and errors: a few hundred errors were corrected in a text corpus that had already been reviewed several times. Finally, the work presents an automatic approach to building a virtual book from a 3D wireframe model and facsimile scans of a medieval manuscript. The user can flip pages back and forth in this virtual book, whereas the original book is not accessible to the general public.
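The idea of morphing a pattern under weighted penalties and then running an exact search can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the rewrite rules, their penalties, and the names `RULES`, `morph` and `fuzzy_search` are hypothetical stand-ins for the penalty matrix mentioned in the abstract, and only simple substring substitutions are covered (no omission or transposition handling).

```python
import heapq

# Hypothetical rewrite rules (source, target, penalty).
# These are illustrative assumptions, NOT the penalty matrix from the thesis.
RULES = [
    ("ph", "f", 1),
    ("f", "ph", 1),
    ("y", "i", 1),
    ("i", "y", 1),
    ("th", "t", 2),
    ("ae", "e", 2),
]


def morph(pattern, max_penalty):
    """Return {variant: lowest total penalty} for all variants within the budget."""
    best = {pattern: 0}
    queue = [(0, pattern)]
    while queue:
        cost, current = heapq.heappop(queue)
        if cost > best.get(current, max_penalty + 1):
            continue  # a cheaper route to this variant was already expanded
        for src, dst, penalty in RULES:
            new_cost = cost + penalty
            if new_cost > max_penalty:
                continue
            # Apply the rule at every position where the source substring occurs.
            start = current.find(src)
            while start != -1:
                variant = current[:start] + dst + current[start + len(src):]
                if new_cost < best.get(variant, max_penalty + 1):
                    best[variant] = new_cost
                    heapq.heappush(queue, (new_cost, variant))
                start = current.find(src, start + 1)
    return best


def fuzzy_search(pattern, text, max_penalty=2):
    """Exact-search every morphed variant; return sorted (penalty, variant, position)."""
    hits = []
    lowered = text.lower()
    for variant, penalty in morph(pattern.lower(), max_penalty).items():
        pos = lowered.find(variant)
        while pos != -1:
            hits.append((penalty, variant, pos))
            pos = lowered.find(variant, pos + 1)
    return sorted(hits)
```

Under these assumed rules, searching for "foto" with a budget of 1 also tries the variant "photo", so a text containing "Photo" is found with penalty 1. Sorting by penalty puts the least-altered matches first.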
|
5 |
Improving the quality of the text, a pilot project to assess and correct the OCR in a multilingual environment. Maurer, Yves, 16 October 2017
Users expect that a digitized collection supports full-text search and that the search retrieves all relevant results. In reality, errors introduced during Optical Character Recognition (OCR) degrade the results significantly, and users do not get what they expect. The National Library of Luxembourg started its digitization program in 2000 and began performing OCR on the scanned images in 2005. The OCR was always performed by the scanning suppliers, so many different OCR programs in different versions have been used over the years. The manual parts of the digitization chain (handling, scanning, zoning, …) are difficult and costly, and their cost can hardly be reduced, so the library decided that the supplier should focus on high quality for these parts. Because OCR is an automated process and OCR software improves over the years, the library believed that the recognized text could be improved automatically later. This is why the library has never asked the supplier for a minimum recognition rate.
The author proposes to test this assumption by first evaluating the base quality of the text extracted by the original supplier, then running a contemporary OCR program, and finally comparing its quality with that of the first extraction. The corpus is the collection of digitized Luxembourg newspapers published from the 18th to the 20th century. A complicating factor is that the corpus contains three main languages, German, French and Luxembourgish, which often appear together on a single newspaper page. A preliminary step is therefore added to detect the language of each block of text so that the correct dictionaries and OCR engines can be used.
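The abstract does not say how language is detected or how text quality is measured. As a rough illustration only, the following Python sketch assumes both are done by dictionary lookup: the language of a block is the one whose word list matches the most tokens, and quality is the share of tokens found in that word list. The tiny word lists and the names `DICTIONARIES`, `detect_language` and `compare_ocr` are hypothetical.

```python
import re

# Tiny hypothetical word lists; a real run would load full dictionaries.
DICTIONARIES = {
    "de": {"die", "der", "und", "zeitung", "ist", "nicht"},
    "fr": {"le", "la", "et", "journal", "est", "pas"},
    "lb": {"an", "ass", "net", "mir", "zeitung"},
}


def tokens(text):
    """Split text into lowercase runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def hit_rate(words, dictionary):
    """Share of words found in the dictionary; 0.0 for an empty block."""
    if not words:
        return 0.0
    return sum(word in dictionary for word in words) / len(words)


def detect_language(block):
    """Pick the language whose dictionary matches the most tokens of the block."""
    words = tokens(block)
    return max(DICTIONARIES, key=lambda lang: hit_rate(words, DICTIONARIES[lang]))


def compare_ocr(old_text, new_text):
    """Return (language, old hit rate, new hit rate) for one block of text."""
    # Assumption: both OCR outputs are pooled so that detection sees more tokens.
    lang = detect_language(old_text + " " + new_text)
    dictionary = DICTIONARIES[lang]
    return (lang,
            hit_rate(tokens(old_text), dictionary),
            hit_rate(tokens(new_text), dictionary))
```

A dictionary hit rate is only a proxy: it needs no ground-truth transcription, but it penalizes proper names and historical spellings, which are common in old newspapers.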
|
6 |
Jahresbericht 2003 / Universitätsbibliothek Chemnitz. Thümer, Ingrid, 14 August 2007
Annual report of the University Library of Chemnitz for the reporting year 2003
|
7 |
Jahresbericht 2003 / Universitätsbibliothek Chemnitz. Thümer, Ingrid, 14 August 2007
Annual report of the University Library of Chemnitz for the reporting year 2003
|