
Evaluating Text Segmentation

Fournier, Christopher 24 April 2013 (has links)
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmenters have been created to perform this task, and the question that this thesis answers is how to select the best automatic segmenter for such a task. This requires choosing an appropriate segmentation evaluation metric, confirming the reliability of a manual solution, and then finally employing an evaluation methodology that can select the automatic segmenter that best approximates human performance. A variety of comparison methods and metrics exist for comparing segmentations (e.g., WindowDiff, Pk), and all save a few are able to award partial credit for nearly missing a boundary. Those comparison methods that can award partial credit unfortunately lack consistency, symmetry, intuition, and a host of other desirable qualities. This work proposes a new comparison method named boundary similarity (B) which is based upon a new minimal boundary edit distance to compare two segmentations. Near misses are frequent, even among manual segmenters (as is exemplified by the low inter-coder agreement reported by many segmentation studies). This work adapts some inter-coder agreement coefficients to award partial credit for near misses using the new metric proposed herein, B. The methodologies employed by many works introducing automatic segmenters evaluate them simply in terms of a comparison of their output to one manual segmentation of a text, and often only by presenting nothing other than a series of mean performance values (with no standard deviation or standard error, and little if any statistical hypothesis testing).
This work asserts that one segmentation of a text cannot constitute a “true” segmentation; specifically, one manual segmentation is simply one sample of the population of all possible segmentations of a text and of that subset of desirable segmentations. This work further asserts that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and set of manual codings, and then statistical hypothesis testing using the specific comparison methods and methodologies demonstrated herein should be used to select the best automatic segmenter. This work proposes new segmentation evaluation metrics, adapted inter-coder agreement coefficients, and methodologies. Most importantly, this work experimentally compares the state-of-the-art comparison methods to those proposed herein upon artificial data that simulates a variety of scenarios and chooses the best one (B). The ability of adapted inter-coder agreement coefficients, based upon B, to discern between various levels of agreement in artificial and natural data sets is then demonstrated. Finally, a contextual evaluation of three automatic segmenters is performed using the state-of-the-art comparison methods and B using the methodology proposed herein to demonstrate the benefits and versatility of B as opposed to its counterparts.
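The window-based comparison methods this abstract contrasts with B can be illustrated compactly. The sketch below is a minimal illustration of the WindowDiff idea only, not the thesis's proposed B metric or its boundary edit distance; the boundary-indicator representation and the window-size heuristic are conventional choices, not taken from this thesis.

```python
def window_diff(ref, hyp, k=None):
    """Minimal sketch of WindowDiff.

    ref, hyp: equal-length lists of 0/1 boundary indicators, one per
    gap between adjacent text units (sentences, paragraphs, ...).
    Returns the fraction of sliding windows in which the two
    segmentations disagree on the number of boundaries they contain.
    """
    assert len(ref) == len(hyp)
    n = len(ref)
    if k is None:
        # conventional heuristic: about half the mean reference segment size
        k = max(1, round(n / (2 * (sum(ref) + 1))))
    windows = n - k + 1
    errors = sum(1 for i in range(windows)
                 if sum(ref[i:i + k]) != sum(hyp[i:i + k]))
    return errors / windows
```

Identical segmentations score 0.0, and with a window larger than the displacement, a boundary that is off by one unit is penalized in fewer windows than one that is missing entirely; that is the partial-credit behavior the abstract refers to.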

Near Images: A Tolerance Based Approach to Image Similarity and its Robustness to Noise and Lightening

Shahfar, Shabnam 27 September 2011 (has links)
This thesis presents a tolerance near set approach to detect similarity between digital images. Two images are considered as sets of perceptual objects and a tolerance relation defines the nearness between objects. Two perceptual objects resemble each other if the difference between their descriptions is smaller than a tolerable level of error. Existing tolerance near set approaches to image similarity consider both images in a single tolerance space and compare the size of tolerance classes. This approach is shown to be sensitive to noise and distortions. In this thesis, a new tolerance-based method is proposed that considers each image in a separate tolerance space and defines the similarity based on differences between histograms of the size of tolerance classes. The main advantage of the proposed method is its lower sensitivity to distortions such as adding noise, darkening or brightening. This advantage has been shown here through a set of experiments.
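The histogram-of-class-sizes idea can be sketched in a few lines. This is a simplified illustration, not the thesis's method: it uses tolerance neighborhoods of one-dimensional feature descriptions rather than true tolerance classes (which are maximal sets of objects pairwise within the tolerance), and the L1 histogram comparison is an assumed stand-in for whatever distance the thesis employs.

```python
from collections import Counter

def neighborhood_sizes(descriptions, eps):
    """Size of each object's tolerance neighborhood: the number of objects
    whose feature descriptions differ from it by at most eps.
    Simplification: true tolerance classes are maximal pairwise-near sets."""
    return [sum(1 for y in descriptions if abs(x - y) <= eps)
            for x in descriptions]

def size_histogram(descriptions, eps):
    """Histogram of neighborhood sizes for one image's objects."""
    return Counter(neighborhood_sizes(descriptions, eps))

def histogram_distance(h1, h2):
    """L1 distance between two size histograms (an assumed comparison)."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)
```

Because each image's histogram is computed in its own tolerance space, a uniform brightening shifts all descriptions together and leaves the neighborhood structure, and hence the histogram, largely unchanged.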

Similarity and Diversity in Information Retrieval

Akinyemi, John 25 April 2012 (has links)
Inter-document similarity is used for clustering, classification, and other purposes within information retrieval. In this thesis, we investigate several aspects of document similarity. In particular, we investigate the quality of several measures of inter-document similarity, providing a framework suitable for measuring and comparing the effectiveness of inter-document similarity measures. We also explore areas of research related to novelty and diversity in information retrieval. The goal of diversity and novelty is to be able to satisfy as many users as possible while simultaneously minimizing or eliminating duplicate and redundant information from search results. In order to evaluate the effectiveness of diversity-aware retrieval functions, user query logs and other information captured from user interactions with commercial search engines are mined and analyzed in order to uncover various informational aspects underlying queries, which are known as subtopics. We investigate the suitability of implicit associations between document content as an alternative to subtopic mining. We also explore subtopic mining from document anchor text and anchor links. In addition, we investigate the suitability of inter-document similarity as a measure for diversity-aware retrieval models, with the aim of using measured inter-document similarity as a replacement for diversity-aware evaluation models that rely on subtopic mining. Finally, we investigate the suitability and application of document similarity for requirements traceability. We present a fast algorithm that uncovers associations between various versions of frequently edited documents, even in the face of substantial changes.

Complex-Wavelet Structural Similarity Based Image Classification

Gao, Yang January 2012 (has links)
The complex wavelet structural similarity (CW-SSIM) index has been recognized as a novel image similarity measure of broad potential applications due to its robustness to small geometric distortions such as translation, scaling and rotation of images. Nevertheless, how to make the best use of it in image classification problems has not been deeply investigated. In this study, we introduce a series of novel image classification algorithms based on CW-SSIM and use handwritten digit and face image recognition as examples for demonstration, including a CW-SSIM based nearest neighbor method, a CW-SSIM based k-means method, a CW-SSIM based support vector machine (SVM) method and a CW-SSIM based SVM using affinity propagation. Among the proposed approaches, the best compromise between accuracy and complexity is obtained by the CW-SSIM support vector machine algorithm, which combines an unsupervised clustering method to divide the training images into clusters with representative images and a supervised learning method based on support vector machines to maximize the classification accuracy. Our experiments show that such a conceptually simple image classification method, which does not involve any registration, intensity normalization or sophisticated feature extraction processes, and does not rely on any modeling of the image patterns or distortion processes, achieves competitive performance with reduced computational cost.
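The simplest of the listed algorithms, the similarity-based nearest neighbor method, can be sketched generically. CW-SSIM itself involves complex wavelet transforms and is not reproduced here; the `similarity` callable below is a placeholder where a CW-SSIM implementation would slot in, and the scalar toy similarity in the test is purely illustrative.

```python
def nearest_neighbor_classify(query, training_set, similarity):
    """Assign the label of the most similar training exemplar.

    training_set: iterable of (image, label) pairs.
    similarity:   callable returning higher values for more similar
                  images; a CW-SSIM implementation would be used here.
    """
    _, label = max(training_set, key=lambda pair: similarity(query, pair[0]))
    return label
```

The appeal the abstract notes is visible here: no registration, normalization, or feature extraction appears in the classifier; all of the robustness to small geometric distortions is delegated to the similarity measure.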

Texture Descriptors For Content-based Image Retrieval

Carkacioglu, Abdurrahman 01 January 2003 (has links) (PDF)
Content Based Image Retrieval (CBIR) systems represent images in the database by color, texture, and shape information. In this thesis, we concentrate on texture features and introduce a new generic texture descriptor, namely, Statistical Analysis of Structural Information (SASI). Moreover, in order to increase the retrieval rates of a CBIR system, we propose a new method that can also adapt an image retrieval system into a configurable one without changing the underlying feature extraction mechanism and the similarity function. SASI is based on statistics of clique autocorrelation coefficients, calculated over structuring windows. SASI defines a set of clique windows to extract and measure various structural properties of texture by using a spatial multi-resolution method. Experimental results, performed on various image databases, indicate that SASI is more successful than the Gabor Filter descriptors in capturing small granularities and discontinuities such as sharp corners and abrupt changes. Due to the flexibility in designing the clique windows, SASI reaches higher average retrieval rates compared to Gabor Filter descriptors. However, the price of this performance is increased computational complexity. Since retrieving similar images of a given query image is a subjective task, it is desirable that the retrieval mechanism be configurable by the user. In the proposed method, the original feature space of a content-based retrieval system is nonlinearly transformed into a new space, where the distance between feature vectors is adjusted by learning. The transformation is realized by an Artificial Neural Network architecture. A cost function is defined for learning and optimized by the simulated annealing method. Experiments are done on the texture image retrieval system, which uses SASI and Gabor Filter features.
The results indicate that the configured image retrieval system is significantly better than the original system.

On the rules-to-episodes transition in classification: generalization of similarity and rules with practice

Wood, Timothy J. January 1998 (has links)
Thesis (Ph.D.)--McMaster University, 1998. / Includes bibliographical references (leaves 131-139). Also available via World Wide Web.

I know how you feel: the effect of similarity and empathy on neural mirroring

Quandt, Lorna. Carp, Joshua. Halenar, Michael. Sklar, Alfredo. January 2007 (has links)
Thesis (B.A.)--Haverford College, Dept. of Psychology, 2007. / Includes bibliographical references.

The effect of facial resemblance on alibi credibility and final verdicts

Ochoa, Claudia, January 2009 (has links)
Thesis (M.A.)--University of Texas at El Paso, 2009. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.

Learning Commonsense Categorical Knowledge in a Thread Memory System

Stamatoiu, Oana L. 18 May 2004 (has links)
If we are to understand how we can build machines capable of broad-purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven-dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner, that sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, that can learn from sentences such as "A blue bird flew to the tree" and "The small bird flew to the cage" that birds can fly. One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader-purpose commonsense artificial learners and reasoners.

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

Nahnsen, Thade, Uzuner, Ozlem, Katz, Boris 19 May 2005 (has links)
We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains), as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify four-grams of unordered lexical chains as a particularly useful representation for text similarity evaluation.
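One of the representations this abstract combines, the cosine similarity of tf*idf-weighted terms, can be sketched as below. The lexical-chain extraction itself is not shown, the input is assumed already tokenized, and the smoothed idf variant is an assumption rather than the paper's exact weighting.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict
    per document, using raw term frequency and a smoothed idf
    (an assumed variant of the weighting)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    return [{t: tf * math.log((1 + n) / (1 + df[t]))
             for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

The same cosine function applies unchanged when the "terms" are n-grams of lexical chains rather than keywords, which is how the abstract's several representations share one similarity backbone.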
