1 |
Using Rabin-Karp fingerprints and LevelDB for faster searches
Deighton, Richard A. 01 December 2012
This thesis presents the results of a study into using fingerprints generated according to the Rabin-Karp Algorithm, together with the LevelDB database, to achieve Text Search times below those of GREP, a standard command-line UNIX text search tool.
Text Search is a set of algorithms that find a string of characters called a Search Pattern in a much larger string of characters in a document we call a text file.
The Rabin-Karp Algorithm iterates through a text file, converting character strings into fingerprints at each location. A fingerprint numerically represents a window-length string of characters to the left of its location. The algorithm compares the calculated fingerprint to the Search Pattern’s fingerprint. When the fingerprints are not equal, the corresponding strings are guaranteed not to match. When the fingerprints are equal, the strings probably match. A verification process confirms matches by comparing the respective characters.
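The fingerprint comparison and verification described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `rabin_karp`, the base 256 and the modulus 1,000,000,007 are assumptions chosen for illustration, and the window is indexed by its start position rather than by the location to its right.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return every start index where pattern occurs in text.

    base and modulus are illustrative choices, not values from the thesis.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the character that leaves the window on each shift.
    high = pow(base, m - 1, modulus)

    # Fingerprints of the pattern and of the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    matches = []
    for i in range(n - m + 1):
        # Unequal fingerprints rule out a match; equal ones must be verified.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % modulus
    return matches
```

Calling `rabin_karp("abracadabra", "abra")` scans the text once and verifies only the windows whose fingerprint equals that of the pattern.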
Our application emerges after making the following major changes to the Rabin-Karp Algorithm. First, we employ a two-step technique rather than a single step. During step 1, the preprocessing step, we calculate fingerprints and store them in a LevelDB database called an Index Database. This is our first major change. Step 2, the matching step, is our second major change: we use the Index Database to look up the Search Pattern’s fingerprint and gather its set of locations. Finally, we allow the pattern to be any length relative to the window length. We also derived an equation to check whether the difference in length is too large for the fingerprint’s number system base.
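The two-step idea can be sketched as follows, with loud caveats. A plain in-memory dictionary stands in for the LevelDB Index Database, the names `fingerprint`, `build_index` and `search` are hypothetical, and the sketch only handles patterns at least one window long by fingerprinting the pattern's first window. The thesis's own treatment of arbitrary pattern lengths is not reproduced here.

```python
from collections import defaultdict

BASE = 256                 # illustrative number system base (assumption)
MODULUS = 1_000_000_007    # illustrative modulus (assumption)


def fingerprint(s):
    """Polynomial fingerprint of the string s."""
    value = 0
    for ch in s:
        value = (value * BASE + ord(ch)) % MODULUS
    return value


def build_index(text, window):
    """Step 1 (preprocessing): map each window fingerprint to its start positions.

    The returned dict is a stand-in for the on-disk Index Database.
    """
    index = defaultdict(list)
    if window <= 0 or window > len(text):
        return index
    high = pow(BASE, window - 1, MODULUS)
    fp = fingerprint(text[:window])
    index[fp].append(0)
    for start in range(1, len(text) - window + 1):
        # Roll the fingerprint one character to the right.
        fp = ((fp - ord(text[start - 1]) * high) * BASE
              + ord(text[start + window - 1])) % MODULUS
        index[fp].append(start)
    return index


def search(text, index, pattern, window):
    """Step 2 (matching): look up the fingerprint, then verify each candidate."""
    if len(pattern) < window:
        raise ValueError("this sketch needs a pattern at least one window long")
    candidates = index.get(fingerprint(pattern[:window]), [])
    return [pos for pos in candidates if text.startswith(pattern, pos)]
```

The design trade-off is that the scan over the text happens once, at indexing time, so each later search costs one lookup plus verification of the candidate locations.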
We carried out our performance experiments by building our application and testing it against GREP over a wide range of parameters. We conclude that although we currently outperform GREP in only about half the cases, there are promising opportunities to modify parts of our application so that it can outperform GREP in all instances. / UOIT
|
2 |
An investigation into children's use of the lookback strategy
Cataldo, Maria Guilia January 2000
No description available.
|
3 |
An Investigation into User Text Query and Text Descriptor Construction
Pfitzner, Darius Mark, pfit0022@flinders.edu.au January 2009
Cognitive limitations such as those described in Miller's (1956) work on channel capacity and Cowan's (2001) on short-term memory are factors in determining user cognitive load and, in turn, task performance. Inappropriate user cognitive load can reduce user efficiency in goal realization. For instance, if the user's attentional capacity is not appropriately applied to the task, distractor processing can tend to appropriate capacity from it. Conversely, if a task drives users beyond their short-term memory envelope, information may be lost in its translation to long-term memory and subsequent retrieval for task-based processing.
To manage user cognitive capacity in the task of text search, the interface should allow users to draw on their powerful and innate pattern recognition abilities. This harmonizes with Johnson-Laird's (1983) proposal that propositional representation is tied to mental models. Combined with the theory that knowledge is highly organized when stored in memory, an appropriate approach for cognitive load optimization would be to graphically present single documents, or clusters thereof, with an appropriate number and type of descriptors. These descriptors are commonly words and/or phrases.
Information theory research suggests that words have different levels of importance in document topic differentiation. Although key word identification is well researched, there is a lack of basic research into human preference regarding query formation and the heuristics users employ in search. This lack extends to features as elementary as the number of words preferred to describe and/or search for a document. Understanding these preferences will help balance the processing overheads of tasks like clustering against user cognitive load to realize a more efficient document retrieval process. Common approaches such as search engine log analysis cannot provide this degree of understanding and do not allow clear identification of the intended set of target documents.
This research endeavours to improve the manner in which text search returns are presented so that user performance in real-world situations is enhanced. To this end we explore both how to present search information and results graphically to facilitate optimal cognitive and perceptual load/utilization, and how people use textual information in describing documents or constructing queries.
|
4 |
Harnessing Data Parallel Hardware for Server Workloads
Agrawal, Sandeep R. January 2015
Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on using general purpose processors; however, these might not be the most efficient platforms for server workloads. Data parallel hardware achieves high energy efficiency by amortizing instruction costs across multiple data streams, and high throughput by enabling massive parallelism across independent threads. These benefits are traditionally considered applicable to scientific workloads, and common server tasks like page serving or search are considered unsuitable for a data parallel execution model.
Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. For a high enough arrival rate, a server has the opportunity to launch cohorts of similar requests on data parallel hardware, improving server performance and power/energy efficiency. We present a framework, called Rhythm, for high throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluation.
Similarity search is another ubiquitous server workload that involves identifying the nearest neighbors to a given query across a large number of points. We explore the performance, power and dollar benefits of using accelerators to perform similarity search for query cohorts in very high dimensions under tight deadlines, and demonstrate an implementation on GPUs that searches across a corpus of billions of documents and is significantly cheaper than commercial deployments. We show that with software and system modifications, data parallel designs can greatly outperform common task parallel implementations. / Dissertation
|
5 |
Elasticity of Elasticsearch
Tsaousi, Kleivi Dimitris January 2021
Elasticsearch has evolved from an experimental, open-source, NoSQL database for full-text documents to an easily scalable search engine that can handle a large number of documents. This evolution has enabled companies to deploy Elasticsearch as an internal search engine for information retrieval (logs, documents, etc.). Later on, it was transformed into a cloud service, and the latest development allows a containerized, serverless deployment of the application using Docker and Kubernetes. This research examines the behaviour of the system by comparing the length and appearance of single-term and multiple-term queries, the scaling behaviour and the security of the service. The application is deployed on Google Cloud Platform as a Kubernetes cluster hosting containerized Elasticsearch images that work as database nodes of a bigger database cluster. As input data, a collection of JSON-formatted documents containing the title and abstract of published papers in the field of computer science was used inside a single index. All the plots were extracted using the Kibana visualization software. The results showed that multiple-term queries put more stress on the system than single-term queries. The number of simultaneous users querying the system is also a big factor affecting its behaviour. Scaling up the number of Elasticsearch nodes inside the cluster indicated that more simultaneous requests could be served by the system.
|
6 |
Návrh vyhledávacího systému pro moderní potřeby / Design of search engine for modern needs
Maršálek, Tomáš January 2016
In this work I argue that the field of text search has focused mostly on long text documents, but there is a growing need for efficient short text search, which comes with different user expectations. Because short text reduces data set size requirements, different algorithmic techniques become more computationally affordable. The focus of this work is on approximate and prefix search and on purely text-based ranking methods, which are needed because text statistics are less precise on short text. A basic prototype search engine has been created using the researched techniques. Its capabilities were demonstrated on example search scenarios, and the implementation was compared to two other open source systems representing currently recommended approaches to the short text search problem. The results show the feasibility of the implemented prototype regarding both user expectations and performance. Several options for the future direction of the system are proposed.
|
7 |
Matematický vyhledávač / Mathematical Search Engine
Mišutka, Jozef January 2013
Mathematics has been used to describe phenomena and problems in many research fields for centuries. The basic elements used in the description are formulae, which express information symbolically. However, searching for mathematical knowledge in digital form using available tools is still cumbersome. We address this issue by presenting the mathematical search engine EgoMath, based on full-text search, which can search for mathematical formulae and text. We perform an evaluation over a large collection of documents showing that our solution is usable. Our approach can be used with huge document collections by applying one specialised technique. In order to provide a valuable evaluation of the quality, we built an alternative mathematical search engine using the feature extraction technique proposed by Ma et al. We propose important improvements to this solution, achieving interesting results. We perform the first ever cross-evaluation of mathematical search engines based on different algorithms. A comprehensive survey of the existing techniques available, presented in this thesis, completes the picture of mathematical searching.
|
8 |
Webová aplikace pro fulltextové vyhledávání nad PDF dokumenty / Web Application for Fulltext Search in PDF Documents
Svoboda, Ondřej January 2012
This master's thesis describes the principles of full-text search engines and the design and implementation of a web application for referencing and full-text searching in PDF documents. It also contains an overview of, and comparison with, currently available reference management software. The possibilities for exporting bibliographic information in various citation styles and formats are discussed. The final application is written in the PHP scripting language and uses a MySQL database.
|
9 |
Use Case Driven Evaluation of Database Systems for ILDA
Thapa, Shova 18 November 2022
No description available.
|