1

Utvärdering av sökmotorer i en svensk kontext / Evaluating search engines in a Swedish context

Adolfsson, Alexander; Ovesson, Christoffer January 2023
The focus of this study was to evaluate different search engines on Swedish text. Information retrieval is widely used by both people and organizations, and it is important to be able to retrieve needed information efficiently and at the right time. The study determined that relevance and speed are the most important factors for search engines. The evaluation measures relevance, using precision and recall, and speed, using response time, for two search engines: Elasticsearch and MarkLogic. It found no significant difference between the engines in the relevance of the retrieved results, but a statistically significant difference in speed, with Elasticsearch outperforming MarkLogic. Both search engines performed very well in terms of successful searches, where a search counts as successful if a relevant document is returned among the first 20 results. Both engines fulfilled the information need 96% of the time.
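The relevance measures and the success criterion in this abstract can be made concrete with a short sketch. It assumes that each query yields a ranked list of unique document IDs and a set of documents judged relevant. The helper names are hypothetical, and the thesis's relevance judgments and any rank cutoff used when computing precision are not described here. A search counts as successful when at least one relevant document appears in the top 20 results, which matches the definition given in the abstract.

```python
from typing import Sequence, Set, Tuple


def precision_recall(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query (hypothetical helper).

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    Assumes `retrieved` contains no duplicate IDs.
    """
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def is_successful(retrieved: Sequence[str], relevant: Set[str], k: int = 20) -> bool:
    """A search is successful if any relevant document is in the top k results."""
    return any(doc in relevant for doc in retrieved[:k])


def success_rate(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 20) -> float:
    """Fraction of queries whose top-k results contain a relevant document."""
    if not runs:
        return 0.0
    return sum(is_successful(r, rel, k) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical ranked results for two queries (not data from the thesis).
    runs = [
        (["d3", "d7", "d1", "d9"], {"d1", "d3"}),
        (["d4", "d5"], {"d8"}),
    ]
    for retrieved, relevant in runs:
        p, r = precision_recall(retrieved, relevant)
        print(f"precision={p:.2f} recall={r:.2f}")
    print(f"success@20 = {success_rate(runs):.0%}")
```

The 96% figure in the abstract corresponds to `success_rate` evaluated over the study's query set with `k=20`.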
2

Improvement of Optical Character Recognition on Scanned Historical Documents Using Image Processing

Aula, Lara January 2021
To improve access to historical documents, many institutions have been digitizing their historical archives since the advent of Optical Character Recognition (OCR). Old scanned documents can contain deterioration acquired over time or caused by old printing methods. Common visual attributes of such documents include variations in style and font, broken characters, uneven ink intensity, noise, and damage from folding or tearing. Many of these attributes are unfavorable for modern OCR tools and can cause character recognition to fail. This study addresses the problem by using image processing methods to improve the character recognition results. It also analyzes the image quality characteristics common to scanned historical documents whose text cannot be recognized. The OCR tool used in this research was the open-source Tesseract software. Image processing methods such as Gaussian lowpass filtering, Otsu’s optimum thresholding method, and morphological operations were used to prepare the historical documents for Tesseract. The OCR output was evaluated using precision and recall. Recall improved by 63 percentage points and precision by 18 percentage points. This shows that image preprocessing is an effective way to make historical documents more readable for OCR tools. The study also found that the characteristics especially disadvantageous for Tesseract are font deviations, the presence of extraneous objects, character fading, broken characters, and Poisson noise.
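The abstract names Otsu's optimum thresholding as one of the preprocessing steps. The pure-Python sketch below illustrates what that step computes. It chooses the gray level that maximizes the between-class variance of an 8-bit histogram and then binarizes the image. The names `otsu_threshold` and `binarize` are hypothetical, and the thesis's actual code, filter sizes, and structuring elements are not given here. In practice a library routine would usually be used instead, such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag. That routine typically runs after Gaussian smoothing and before the morphological clean-up and the Tesseract call.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return Otsu's threshold t for 8-bit grayscale values (0-255).

    Pixels <= t form one class, pixels > t the other. t maximizes the
    between-class variance w0 * w1 * (mu0 - mu1)**2, computed here with
    raw counts (scaling by the total pixel count does not change the argmax).
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break  # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map pixels at or below the Otsu threshold (ink) to 0, the rest (paper) to 255."""
    t = otsu_threshold([v for row in image for v in row])
    return [[0 if v <= t else 255 for v in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 3x4 grayscale patch: dark ink strokes on light paper.
    patch = [
        [230, 228, 40, 235],
        [225, 35, 30, 232],
        [229, 233, 45, 226],
    ]
    for row in binarize(patch):
        print(row)
```

A global threshold like this assumes roughly bimodal ink and paper intensities. Strong fading or uneven illumination, both mentioned in the abstract, are cases where it can struggle.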
