About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1. Implementing an enterprise search platform using Lucene.NET

Pettersson, Fredrik, Pettersson, Niklas January 2012 (has links)
This master’s thesis, conducted at Sectra Medical Systems AB, investigates the feasibility of integrating a search platform, built on modern search technology, into the complex architecture of existing products. This was done through the implementation and integration of a search platform prototype, called Sectra Enterprise Search, built upon the search engine library Lucene.NET, written in C# for the Microsoft .NET Framework. Lucene.NET originates from the Java library Lucene, which is highly regarded and widely used for similar purposes. During the development process, a number of requirements for the search platform were identified, including high availability, scalability and maintainability. Besides full-text search over information in a variety of data sources, desirable features include autocompletion and highlighting. Sectra Enterprise Search was successfully integrated within the architecture of existing products. The architecture of the prototype consists of multiple layers, with the search engine functionality at the very bottom and a web service handling all incoming requests at the top. To sum up, integrating a search platform based on modern search technology into the architecture of existing products provides full control of deployment, lets users search in a more intuitive manner, and yields reasonable search response times.
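
The abstract gives no implementation details, but Lucene.NET mirrors the Java Lucene API closely, so a short Java sketch can illustrate the kind of indexing, querying, and hit highlighting such a platform builds on. This is a minimal illustration against a recent Lucene release, not code from the thesis; the field name and sample text are invented.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.highlight.*;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class EnterpriseSearchSketch {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            Directory dir = FSDirectory.open(Paths.get("search-index"));

            // Index one document with a stored full-text field.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("body", "Radiology report: no acute findings.", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search the index and highlight the matching fragment.
            try (IndexReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("body", analyzer).parse("radiology");
                Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    String text = searcher.doc(hit.doc).get("body");
                    System.out.println(highlighter.getBestFragment(analyzer, "body", text));
                }
            }
        }
    }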
2. Scalability of Stepping Stones and Pathways

Venkatachalam, Logambigai 30 May 2008 (has links)
Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective method to support the task of finding document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool to find connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications have to deal with millions of documents. Hence, addressing this scalability limitation in the current SSP implementation is essential for handling large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used due to its scalability, performance, and extensibility; many web-based and desktop applications have used it to great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work for larger datasets and can also be deployed commercially. This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments testing factors such as runtime and storage showed that the system can be scaled to handle millions of documents. / Master of Science
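
The thesis itself covers the scalable re-implementation; purely as an illustration of the kind of bulk-indexing setup such Lucene work typically relies on (one long-lived IndexWriter, a generous RAM buffer, a single commit after the batch), here is a hedged Java sketch. The field names and buffer size are assumptions, not values from the SSP system.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;
    import java.util.List;

    public class BulkIndexer {
        // records: [0] = document id, [1] = full text
        public static void index(List<String[]> records) throws Exception {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            config.setRAMBufferSizeMB(256.0);                        // fewer segment flushes while bulk loading
            config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);   // rebuild the index from scratch

            try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("ssp-index")), config)) {
                for (String[] record : records) {
                    Document doc = new Document();
                    doc.add(new StringField("id", record[0], Field.Store.YES));
                    doc.add(new TextField("contents", record[1], Field.Store.NO));
                    writer.addDocument(doc);                         // no per-document commit
                }
                writer.commit();                                     // single commit after the batch
            }
        }
    }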
3. Enhancing a Web Crawler with Arabic Search

Nguyen, Qui V. 25 July 2012
Many advantages of the Internet (ease of access, limited regulation, vast potential audience, and fast flow of information) have turned it into the most popular way to communicate and exchange ideas. Criminal and terrorist groups also exploit these advantages, turning the Internet into a new battleground for conducting their illegal and terrorist activities. There are millions of Web sites in different languages on the Internet, but the lack of foreign-language search engines makes it impossible to analyze foreign-language Web sites efficiently. This thesis enhances an open source Web crawler with Arabic search capability, thus improving an existing social networking tool to perform page correlation and analysis of Arabic Web sites. A social networking tool with Arabic search capabilities could become a valuable tool for the intelligence community. Its page correlation and analysis results could be used to collect open source intelligence and build a network of Web sites that are related to terrorist or criminal activities.
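
The thesis does not reproduce its code, but Lucene ships an Arabic analyzer (normalization, stop words, and light stemming) that a crawler's indexing pipeline can plug in. A minimal Java sketch of the idea, with invented field names and placeholder content:

    import org.apache.lucene.analysis.ar.ArabicAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.ByteBuffersDirectory;

    public class ArabicSearchSketch {
        public static void main(String[] args) throws Exception {
            ArabicAnalyzer analyzer = new ArabicAnalyzer();        // Arabic stop words and light stemming
            ByteBuffersDirectory dir = new ByteBuffersDirectory(); // in-memory index for the sketch

            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document page = new Document();
                page.add(new StringField("url", "http://example.org/page.html", Field.Store.YES));
                page.add(new TextField("content", "...crawled Arabic page text goes here...", Field.Store.NO));
                writer.addDocument(page);
            }

            try (IndexReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("content", analyzer).parse("placeholder query term");
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("url"));
                }
            }
        }
    }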
4. Beach Museum Web Application

Kakkireni, Nithin Kumar January 1900 (has links)
Master of Science / Department of Computer Science / Daniel Andresen / This project involves developing a responsive web application for the Beach Museum in Manhattan, Kansas. The application is built on development boxes using Amazon Web Services. The project follows an MVC architecture that lets users search images and create their own collections from the images, and it includes an admin module. The existing SQL database is migrated to CouchDB for better performance over the available data. Apache Lucene is integrated to support text search in the CouchDB database, with different indexes written to retrieve results. Core functionalities such as basic search, advanced search, and filtering objects by artist, decade, object type, and relevance are implemented using different indexes and Mango queries in CouchDB. Search results are further chunked and displayed to the user. Web Storage APIs are used to let a user create their own collection (a set of images). An admin module was built to perform CRUD operations on the database; it covers creating exhibitions and adding and editing works and artists in CouchDB.
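
For illustration only, a Mango query posted to CouchDB's _find endpoint from Java might look like the sketch below; the database name, field names, values, and the unauthenticated local instance are hypothetical assumptions, not taken from the project.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class MangoQuerySketch {
        public static void main(String[] args) throws Exception {
            // Selector: objects by a given artist from a given decade, returning a few fields.
            String mango = "{ \"selector\": { \"artist\": \"Example Artist\", \"decade\": \"1960s\" },"
                         + "  \"fields\": [\"_id\", \"title\", \"object_type\"], \"limit\": 25 }";

            HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:5984/museum/_find"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(mango))
                    .build();

            HttpResponse<String> response =
                    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());   // JSON list of matching documents
        }
    }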
5. Information Retrieval Using Lucene and WordNet

Whissel, Jhon F. 23 December 2009 (has links)
No description available.
6. Modelovanje i pretraživanje nad nestruktuiranim podacima i dokumentima u e-Upravi Republike Srbije / Modeling and searching over unstructured data and documents in e-Government of the Republic of Serbia

Nikolić Vojkan 27 September 2016 (has links)
Nowadays, the concept of Question Answering Systems (QAS) has been used by e-government services in various fields in an attempt to understand text and help citizens get answers to their questions promptly and at any time. Automatic mapping of relevant documents stands out as an important application for an automatic classification strategy: query-document. This doctoral thesis aims to contribute to the identification of unstructured documents and represents an important step towards clarifying the role of explicit concepts within Information Retrieval in general. The most common scheme in text categorization is the bag-of-words (BoW) approach, especially when a large body of knowledge is available as a basis. This thesis introduces a new approach to concept-based text representation and applies text categorization to create defined classes in the case of condensed text documents. It also presents a classification-based algorithm modeled for queries that match a topic. What complicates matters is that this concept is based on similarities in previously defined document classes and on terms that appear with high frequency in queries. The results of the experiment in the field of the Criminal Code of the Republic of Serbia show that the concept-based text representation gives satisfactory results even where no vocabulary exists for the given field.
7. Improve and optimize search engine: To provide better and relevant content for the customer

Ramsell, Daniel January 2019 (has links)
This report compares a number of open source search engines. The research consists of two evaluation processes. The first evaluates open source search engines found on today’s market: each engine is given between one and five points depending on how well it meets the requirements, and the engine with the highest score is chosen for implementation. This first evaluation resulted in Elasticsearch being selected and carried forward to the implementation phase. The second evaluation measures system performance and the relevance of the SERP (Search Engine Results Pages). System performance is evaluated by timing how long the search engines take to deliver the SERP. The relevance of the search results is judged by a group of CSN employees, who give between one and five points depending on the relevance of the SERP. This evaluation compares Elasticsearch with the search engine CSN uses today on their website (www.csn.se). Elasticsearch proved better in the performance measurements but not in the relevance of the SERP. The discussion concluded that most points were lost because of the first search result Elasticsearch delivered; if this result were removed, Elasticsearch could deliver results as good as the old search engine. The survey concluded that Elasticsearch is recommended for CSN, provided certain problem areas are corrected before implementation into their systems.
8. Intelligent Retrieval and Clustering of Inventions

Andrabi, Liaqat Hussain January 2015 (has links)
Ericsson’s Region IPR & Licensing (RIPL) receives about 3,000 Invention Disclosures (IvDs) every year, submitted by researchers as a result of their R&D activities. To decide whether an IvD has good business value and a patent application should be filed, a rigorous evaluation process is carried out by a selected Patent Attorney (PA). One of the most important elements of the evaluation process is to find similar prior art, including similar IvDs that have been evaluated before. These documents are not public and therefore cannot be searched using available search tools. For now, the process of finding prior art is done manually (without the help of any search tools) and takes up a significant amount of time. The aim of this Master’s thesis is to develop and test an information retrieval search engine as a proof of concept to find similar Invention Disclosure documents and related patent applications. For this purpose, a SOLR database server is set up with up to seven thousand five hundred (7,500) IvDs indexed. A similarity algorithm, customized to weight different fields, is implemented. Lucene is then used to query the server, and the relevant documents are displayed in a web application.
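
The abstract describes a field-weighted similarity search over a Solr index. One common way to express per-field weights is the eDisMax query parser's qf parameter; the SolrJ sketch below is a hypothetical illustration, with the collection name, field names, and boost values invented rather than taken from the thesis.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class IvdSimilaritySketch {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/ivd").build()) {
                SolrQuery query = new SolrQuery("beamforming antenna array");   // text drawn from the new IvD
                query.set("defType", "edismax");
                query.set("qf", "title^3 claims^2 description");                // weight title and claims higher
                query.setRows(10);

                QueryResponse response = solr.query(query);
                for (SolrDocument doc : response.getResults()) {
                    System.out.println(doc.getFieldValue("id") + "  " + doc.getFieldValue("title"));
                }
            }
        }
    }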
9. Simplifying Q&A Systems with Topic Modelling

Kozee, Troy January 2017 (has links)
No description available.
10. Similarity Search in Document Collections

Jordanov, Dimitar Dimitrov January 2009 (has links)
The main goal of this thesis is to assess the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The work compares these two approaches and introduces methods that may lead to improved search quality.
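
For context, MoreLikeThis builds a similarity query from the most significant terms of a chosen document. A minimal Java sketch against a recent Lucene release, with an assumed index location and field name:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queries.mlt.MoreLikeThis;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class MoreLikeThisSketch {
        public static void main(String[] args) throws Exception {
            try (IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index")))) {
                MoreLikeThis mlt = new MoreLikeThis(reader);
                mlt.setAnalyzer(new StandardAnalyzer());
                mlt.setFieldNames(new String[] { "contents" });   // field must be stored or have term vectors
                mlt.setMinTermFreq(2);                            // ignore terms rare within the source document
                mlt.setMinDocFreq(2);                             // ignore terms rare across the collection

                int sourceDoc = 0;                                // internal id of the query document
                Query query = mlt.like(sourceDoc);                // similarity query built from its top terms
                IndexSearcher searcher = new IndexSearcher(reader);
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    if (hit.doc != sourceDoc) {
                        System.out.println(hit.doc + "  score=" + hit.score);
                    }
                }
            }
        }
    }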
