Jacobson, Bryan L.
19 February 1992
Index size savings from three techniques are measured. The three techniques are: 1) eliminating common, low-information words found in a "stop list" (such as: of, the, at, etc.), 2) truncating terms by removing suffixes (such as: -s, -ed, -ing, etc.), and 3) simple data compression. Savings are measured on two moderately large collections of text. The index size savings that result from using the techniques individually and in combination are reported. The impact on query performance in terms of speed, recall, and precision is estimated. / Graduation date: 1992
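The first two techniques can be sketched in a few lines. The stop list, suffix list, and length guard below are illustrative assumptions, not the thesis's actual word lists.

```python
# A minimal sketch of techniques 1 and 2: stop-word removal and suffix
# truncation. The word lists here are tiny assumptions for illustration.

STOP_WORDS = {"of", "the", "at", "a", "in", "and"}
SUFFIXES = ("ing", "ed", "s")          # longest first, so "-ing" wins over "-s"

def index_terms(text):
    """Index terms left after stop-word removal and crude suffix stripping."""
    terms = set()
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue                    # technique 1: drop low-information words
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[:-len(suffix)]   # technique 2: truncate the term
                break
        terms.add(word)
    return terms

print(sorted(index_terms("the cats sat at the matting")))  # → ['cat', 'matt', 'sat']
```

Both techniques shrink the indexed vocabulary; technique 3 (general-purpose compression) would then be applied to the stored index itself.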
Itakura, Kalista Yuki
Traditional information retrieval applications, such as Web search, return atomic units of retrieval, generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users better pinpoint their exact information needs by returning results at the sub-document level. These results may consist of predefined document components, such as pages, sections, and paragraphs, or they may consist of arbitrary passages, comprising any substring of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.
17 December 2010
One of the greatest challenges in information retrieval is to develop an intelligent system for user and machine interaction that supports users in their quest for relevant information. The dramatic increase in the amount of Web content gives rise to the need for a large-scale distributed information retrieval system, targeted to support millions of users and terabytes of data. To retrieve information from such a large amount of data efficiently, the index is split among the servers in a distributed information retrieval system. Thus, partitioning the index among these collaborating nodes plays an important role in enhancing the performance of a distributed search engine. The two widely known inverted index partitioning schemes for a distributed information retrieval system are document partitioning and term partitioning. In a document-partitioned system, each server hosts a subset of the documents in the collection and executes every query against its local sub-collection. In a term-partitioned index, each node is responsible for a subset of the terms in the collection and serves them to a central node as they are required for query evaluation. In this thesis, we introduce the Document over Term inverted index distribution scheme, which splits a set of nodes into several groups (sub-clusters) and then performs document partitioning between the groups and term partitioning within each group. As this approach combines the term and document index partitioning approaches, we also refer to it as a Hybrid Inverted Index. It retains the disk-access benefits of term partitioning and the document-partitioning benefits of shared computational load, scalability, maintainability, and availability. We also introduce the Document over Document index partitioning scheme, based on the document partitioning approach.
In this approach, a set of nodes is split into groups, and the documents in the collection are partitioned between groups and also within each group. This strategy retains all the benefits of the document partitioning approach, but reduces the computational load more effectively and uses resources more efficiently. We compare the distributed index approaches experimentally and show that, in terms of efficiency and scalability, document-partition-based approaches perform significantly better than the others. Document over Term partitioning offers efficient utilization of search servers and lowers disk access, but suffers from load imbalance. Document over Document partitioning emerged as the preferred method under high workload.
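As a rough illustration of the Document over Term (hybrid) scheme described above, routing a posting to a sub-cluster and to a node within it might look as follows; the group count, nodes per group, and CRC-based term hash are assumptions for the sketch, not details taken from the thesis.

```python
# A rough sketch of hybrid routing: documents are partitioned between
# sub-clusters, and terms are partitioned across the nodes inside one.
import zlib

NUM_GROUPS = 3        # sub-clusters: documents are partitioned between these
NODES_PER_GROUP = 4   # within a sub-cluster, terms are spread across nodes

def route_posting(doc_id, term):
    """Return (group, node) responsible for term's posting from doc_id."""
    group = doc_id % NUM_GROUPS                          # document partitioning
    node = zlib.crc32(term.encode()) % NODES_PER_GROUP   # term partitioning
    return group, node

group, node = route_posting(10, "retrieval")   # doc 10 lives in group 10 % 3 = 1
```

A query term thus touches only one node per sub-cluster, which is the source of the reduced disk access claimed for term partitioning, while the document split across groups shares the computational load.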
In memory, encoding and retrieval are often conceived of as two separate processes. However, there is substantial evidence to suggest that this view is wrong—that they are instead highly interdependent processes. One recent example is from Jacoby, Shimizu, Daniels, and Rhodes (2005a), who showed that new words presented as foils among a list of old words that had been deeply encoded were themselves subsequently better recognized than were new words presented as foils among a list of old words that had been shallowly encoded. This paradigm, referred to as memory-for-foils, not only demonstrates a link between encoding and retrieval, but also has led to a proposal about what form this interaction is taking in this task. Jacoby et al. (2005a) proposed that people put in place a retrieval mode that leads to a reprocessing of the original encoding state, which is incidentally applied across both old and new items within the context of a recognition memory test. Such a constrained-retrieval account suggests an intimate relation between encoding and retrieval processes that allows for memories to be highly integrated. The goal of this thesis is to provide a better understanding of the generalizability and limitations of this memory-for-foils phenomenon and, ultimately, to provide more direct evidence for the interaction of these processes. Experiments 1 and 2 began by replicating the memory-for-foils phenomenon as well as an experiment by Marsh et al. (2009b) which confirmed that the phenomenon does not result simply from strength of encoding differences. Experiment 3 then substituted a deep vs shallow imagery manipulation for the levels-of-processing manipulation, demonstrating that the effect is robust and that it generalizes, also occurring with a different type of encoding. Experiment 4 extended the generalizability of the task to factual phrases. 
Experiment 5 then moved on to testing the encoding/retrieval interactions by once again employing the imagery encoding manipulation with an additional quality judgment in the final recognition memory test. Using the remember/know paradigm (Gardiner, 1988; Tulving, 1985) demonstrated that more highly detailed memories were associated with foils from the test of deep items than with foils from the test of shallow items. From there, response time was used to infer processing speed in Experiment 6a, in a test of whether foils tested among deep items incur an advantage independent of the manipulation undergone by those items. When a lexical decision test replaced the final recognition test, there was no evidence of a memory advantage for “deep” foils over “shallow” foils. Finally, Experiment 6b provided compelling evidence for context-related encoding during tests of deeply encoded words, showing enhanced priming for foils presented among deeply encoded targets when participants made the same deep encoding judgments on those items as were made on the targets during study. Taken together, these findings provide support for the source-constrained retrieval hypothesis and for the idea of a retrieval mode. New information, information that we may not even be intending to remember, is influenced by how surrounding items are encoded and retrieved, as long as the surrounding items recruit a coherent mode of processing. This demonstrates a clear need to consider encoding and retrieval as highly interactive processes and to avoid conceptualizing them as entirely separate entities. This is a crucial part of increasing our understanding of the fundamental processes in memory.
Spatial Relationship Image Retrieval employing Multiple-Instance Learning and Orthogonal Fractal Bases
Lai, Chin-Ning
01 July 2006
The objective of the present work is to propose a novel method to extract a stable feature set representative of image content. Each image is represented by a linear combination of fractal orthonormal basis vectors, and the mapping coefficients of an image projected onto each orthonormal basis constitute its feature vector. The set of orthonormal basis vectors is generated by applying fractal iterated functions through the mapping of target and domain blocks. The distance measure remains consistent between any pair of images before and after projection onto the orthonormal axes; that is, the embedding is isometric. Not only do similar images generate points close to each other in the feature space, but dissimilar ones also produce feature points far apart. Equivalently, distant feature points are guaranteed to correspond to images with dissimilar contents, while close feature points correspond to similar images. Therefore, using the coefficients derived from the proposed linear combination of fractal orthonormal bases as keys to search an image database will retrieve similar images while excluding dissimilar ones. The coefficients associated with each image can later be used to reconstruct the original, and the content-based query is performed in the compressed domain, which makes the approach efficient. Scaling, rotation, translation, mirroring, and horizontal/vertical flipping variations of a query image are also supported. A symbolic image database system is a system in which a large amount of image data and their related information are represented by both symbolic images and physical images. How to perceive spatial relationships among the components in a symbolic image is an important criterion for matching the symbolic image of a scene object against the one stored as a model in the symbolic image database.
Spatial reasoning techniques have been applied to pictorial databases; in particular, those using 2D strings as an index representation have been successful. To simplify matters, most previous approaches to iconic indexing use the MBR (minimum bounding rectangle) of two objects to define the spatial relationship between them. Multiple-instance learning algorithms provide ways for computer programs to improve automatically with experience. Most images are inherently ambiguous disseminators of information; unfortunately, interfaces to image databases normally involve the user giving the system ambiguous queries. By treating each query as a multiple-instance example, we make the ambiguity in each image explicit. In addition, by receiving several positive and negative examples, the system can learn what the user desires. Using the learned concept, the system returns images from the database that are close to that concept. In this project, we propose to apply the multiple-instance learning model by deriving the projection vectors of fractal orthonormal bases for a small number of training images, in order to learn which images in the database are of interest to the user.
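The isometric-embedding claim behind the feature vectors above can be illustrated with a toy example; the 4-pixel "images" and the hand-picked orthonormal basis are assumptions, not the fractal bases constructed in the thesis.

```python
# For images in the span of an orthonormal basis, distances between their
# coefficient (feature) vectors equal distances between the images themselves.
import math

BASIS = [
    [0.5, 0.5, 0.5, 0.5],     # two orthonormal vectors over 4-pixel images
    [0.5, -0.5, 0.5, -0.5],
]

def features(image):
    """Project the image onto each basis vector; coefficients = feature vector."""
    return [sum(b * x for b, x in zip(basis, image)) for basis in BASIS]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

img_a = [2.0, 1.0, 2.0, 1.0]   # = 3*BASIS[0] + 1*BASIS[1]
img_b = [1.0, 0.0, 1.0, 0.0]   # = 1*BASIS[0] + 1*BASIS[1]
print(dist(img_a, img_b), dist(features(img_a), features(img_b)))  # → 2.0 2.0
```

Because distances survive the projection, searching on the low-dimensional coefficients retrieves the same neighbors as searching on the images themselves.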
24 July 2001
With the advent of multimedia computers, voice and images can be stored in databases, but retrieving the information a user wants is a hard problem: querying the large number of digital images for the ones a person desires is not a simple task. Traditional studies of image database retrieval use color, shape, and content to analyze a digital image and create the index file, but they cannot guarantee that similar index files will find similar images, or that similar images will yield similar index files. In this thesis, we propose a new method to analyze a digital image by its fractal code. Fractal coding is an effective method for compressing digital images. In fractal coding, the image is partitioned into a set of non-overlapping range blocks, and a set of overlapping domain blocks is chosen from the same image. For each range block, we need to find one domain block and one iterated function such that the mapping from the domain block is similar to the range block. Two similar images have similar iterated functions, and two similar iterated functions have similar attractors. For these two reasons, we use the iterated function to create the index file. We prove in Chapter 3 that the fractal code can serve as a good index file. In Chapter 4, we implement the fractal-based image database: the system uses the fractal code to create the index file, and uses the Fisher discriminant function, color, complexity, and illumination to decide the output order.
16 May 2003
With the advent of multimedia computers, voice and images can be stored in databases, but retrieving the information a user wants is a hard problem: querying the large number of digital images for the ones a person desires is not a simple task. Traditional studies of image database retrieval use color, shape, and content to analyze a digital image and create the index file, but they cannot guarantee that similar index files will find similar images, or that similar images will yield similar index files. In this thesis, we propose a new method to analyze a digital image by its fractal code. Fractal coding is an effective method for compressing digital images. In fractal coding, the image is partitioned into a set of non-overlapping range blocks, and a set of overlapping domain blocks is chosen from the same image. For each range block, we need to find one domain block and one iterated function such that the mapping from the domain block is similar to the range block. Two similar images have similar iterated functions, and two similar iterated functions have similar attractors. For these two reasons, we use the iterated function to create the index file. We prove in Chapter 2 that the fractal code can serve as a good index file. In Chapter 3, we implement the fractal-based image database: the system uses the fractal code to create the index file, and uses the Fisher discriminant function, color, complexity, and illumination to decide the output order.
Flesch, Marie H.
15 November 2004
Electrodermal activity (EDA), an indicator of arousal of the sympathetic nervous system, was investigated as a potential correlate of feeling-of-knowing (FOK) and tip-of-the-tongue (TOT) states. In Experiment 1, skin conductance was measured while participants answered general knowledge questions and made binary FOK and TOT judgments. Significant correlations were found between frequency of skin conductance responses (SCRs) and presence of both FOK and TOT states. In Experiment 2, warmth ratings were used and a follow-up clue session was added to offer participants the opportunity to resolve initially unanswered questions. SCR frequency during TOT states was significantly predictive of resolution during the clue period, although not as predictive as participants' warmth ratings. The potential of EDA as an on-line, non-intrusive measure of metamemory and memory retrieval is discussed.
29 July 2008
We have proposed and demonstrated a technique for measuring the wavefront of a diode laser beam over a large dynamic range. Our technique is a modified version of the Hartmann and Shack-Hartmann wavefront sensors, capable of providing a large dynamic range (180 degrees). The wavefront measurement exhibits a precision of 0.02 degrees, subject to a standard deviation governed by the diffraction limit (~λ/d). Using the physical measurement of the wavefront, we are able to reconstruct the electric fields of a diode laser beam at any location, including the far field and near field. The reconstructed electric fields were computed from the intensity and phase distribution data by means of the Fourier transform. This information about the electric field can be very useful in the design of microlenses for efficiently coupling a light source into optical components. The results indicate that a wavefront sensor with a large dynamic range provides a reliable method for measuring the wavefront distributions of diode lasers with large divergence angles. However, the numerical near-field intensity deviates by 150% from the measured near-field intensity because of the inherent inaccuracy in the wavefront measurement. In this study, we therefore measured the near-field intensity distribution directly with an objective and a CCD camera. We found that the distribution of the mode field was symmetric at a distance of 8 μm from the diode laser and that the mode field diameter was 4.75 μm. Using phase retrieval algorithms, the radii of the near-field wavefront in the vertical and horizontal axes were found to be 8 μm and 41 μm, respectively. Through geometrical optics, the optimum curvatures of an elliptic-cone-shaped lensed fiber for efficient coupling in the vertical and horizontal axes were 4 μm and 20.5 μm, respectively. Once the optimum curvatures of the elliptic-cone-shaped lensed fiber are known, it can be fabricated by grinding and fusing.
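The reconstruction step described above, forming the complex field from measured intensity and phase and then Fourier-transforming it, can be sketched as follows; the 1-D discrete transform and the sample data are simplifying assumptions (an actual beam calls for 2-D fields and an FFT).

```python
# Form the complex near field E[k] = sqrt(I[k]) * exp(i*phi[k]) from the
# measured intensity and phase, then take its discrete Fourier transform
# to obtain the far field.
import cmath

def far_field(intensity, phase):
    """Return the discrete Fourier transform of sqrt(I)*exp(i*phi)."""
    n = len(intensity)
    e_near = [cmath.sqrt(i_k) * cmath.exp(1j * p_k)
              for i_k, p_k in zip(intensity, phase)]
    return [sum(e_near[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

# A flat wavefront of uniform intensity concentrates into the central component.
ff = far_field([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

With the measured phase substituted for the flat wavefront here, the same transform propagates the field to any plane of interest, which is what makes the measurement useful for microlens design.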
Simultaneously searching with multiple algorithm settings: an alternative to parameter tuning for suboptimal single-agent search
Valenzano, Richard
January 2009
Thesis (M. Sc.)--University of Alberta, 2009. / Title from PDF file main screen (viewed on Nov. 27, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta." Includes bibliographical references.