671 |
Image Annotation With Semi-supervised Clustering
Sayar, Ahmet. 01 December 2009
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words.
Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low-level visual features. These codebooks are then matched, in various ways, with the words of the text document related to the image.
In this thesis, we propose a new image annotation technique that improves the representation and quantization of the visual information by employing available but unused information, called side information, which is hidden in the system. This side information is used to semi-supervise the clustering process that creates the visterms. The selection of side information depends on the visual image content, the annotation words and the relationship between them. Although there may be many different ways of defining and selecting side information, three types of side information are proposed in this thesis. The first is the hidden topic probability information obtained automatically from the text document associated with the image. The second is the orientation information and the third is the color information around interest points that correspond to critical locations in the image. The side information provides a set of constraints in a semi-supervised K-means region clustering algorithm. Consequently, in generating the visual terms, called visterms, from the regions, not only are low-level features clustered, but side information is also used to complement the visual information. This complementary information is expected to close the semantic gap between the low-level features extracted from each region and the high-level textual information. Therefore, a better match between the visual codebook and the annotation words is obtained. Moreover, a speedup is obtained in the modified K-means algorithm because of the constraints brought by the side information. The proposed algorithm is implemented in a high-performance parallel computation environment.
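As a rough illustration of how side information can constrain K-means, the sketch below reserves a separate block of centroids for each side-information label, so the assignment step only searches the centroids allowed for a region. This constraint scheme, the function names and the parameters are assumptions made for illustration; the abstract does not specify how the thesis encodes its constraints.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, side_labels, clusters_per_label, iterations=20, seed=0):
    """K-means in which a point may only join clusters reserved for its side label.

    Hypothetical constraint scheme (not taken from the thesis): every
    side-information label owns its own block of centroids.
    """
    rng = random.Random(seed)
    centroids = {}
    # Seed each label's centroids from points that carry that label.
    for label in sorted(set(side_labels)):
        members = [p for p, s in zip(points, side_labels) if s == label]
        k = min(clusters_per_label, len(members))
        for j, p in enumerate(rng.sample(members, k)):
            centroids[(label, j)] = list(p)

    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: scan only the centroids allowed by the side label.
        new_assignment = []
        for p, s in zip(points, side_labels):
            allowed = [key for key in centroids if key[0] == s]
            new_assignment.append(
                min(allowed, key=lambda key: squared_distance(p, centroids[key]))
            )
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its current members.
        for key in centroids:
            members = [p for p, a in zip(points, assignment) if a == key]
            if members:
                centroids[key] = [sum(c) / len(members) for c in zip(*members)]
    return assignment, centroids
```

Because each point is compared only with the centroids of its own label, the assignment step does less work than unconstrained K-means, which is one plausible reading of the speedup mentioned in the abstract.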
|
672 |
A Framework For Ranking And Categorizing Medical Documents
Al Zamil, Mohammed Gh. I. 01 June 2010
In this dissertation, we present a framework to enhance the retrieval, ranking, and categorization of text documents in the medical domain. The contributions of this study are the introduction of a similarity model to retrieve and rank medical text documents and the introduction of a rule-based categorization method based on lexical syntactic pattern features. We formulate the similarity model by combining three features to model the relationships among documents and construct a document network. We aim to rank retrieved documents according to their topics, placing highly relevant documents at the top of the hit list. We have applied this model to the OHSUMED collection (TREC-9) in order to demonstrate its effectiveness in terms of topical ranking, recall, and precision metrics.
In addition, we introduce ROLEX-SP (Rules Of LEXical Syntactic Patterns), a method for the automatic induction of rule-based text classifiers that relies on lexical syntactic patterns as a set of features to categorize text documents. The proposed method is dedicated to solving the problems of multi-class classification and feature imbalance in domain-specific text documents. Furthermore, our proposed method is able to categorize documents according to a predefined set of characteristics, such as user-specific, domain-specific, and query-based categorization, which facilitates browsing documents in search engines and increases users' ability to choose among relevant documents. To demonstrate the applicability of ROLEX-SP, we have performed experiments on OHSUMED (categorization collection). The results indicate that ROLEX-SP outperforms state-of-the-art methods in categorizing short-text medical documents.
|
673 |
An Automated Tool For Quality Manual Generation From Business Process Models
Aydin, Elif. 01 September 2010
The majority of organizations make their business processes explicit in order to improve them. Defining business processes manually and modeling them are two alternatives utilized for this purpose. Meanwhile, organizations have quality management systems, which are frequently shaped by frameworks. The most commonly used process improvement frameworks in the IT sector are ITIL, COBIT, CMMI and ISO 9001. These frameworks indicate the necessity of process documentation, and ISO 9001 uses the name "Quality Manual" for this purpose.
In this thesis, an automated tool is developed for quality manual generation from predetermined business process models. In addition, a case study is performed by means of a systematic approach, and its results are discussed together with the findings of structured interviews. The aim of the study is to reduce the effort and time required for quality manual preparation and to merge quality management activities with process modeling by means of process documentation.
|
674 |
Improving Search Result Clustering By Integrating Semantic Information From Wikipedia
Calli, Cagatay. 01 September 2010
Suffix Tree Clustering (STC) is a search result clustering (SRC) algorithm focused on generating overlapping clusters with meaningful labels in linear time. It showed the feasibility of SRC, but subsequent studies introduced description-first algorithms that generate better labels and achieve higher precision. Still, STC remained the fastest SRC algorithm, and further studies addressed different problems of STC. In this thesis, semantic relations between cluster labels and documents are exploited to filter out noisy labels and improve the merging phase of STC. Wikipedia is used to identify these relations, and methods for integrating semantic information into STC are suggested. Semantic features are shown to be effective for the SRC task when used together with term frequency vectors. Furthermore, there have been no SRC studies on Turkish until now. In this thesis, a dataset for Turkish is introduced and a number of methods are tested on Turkish.
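For readers unfamiliar with the merging phase mentioned above, the following minimal sketch shows the overlap-based merge rule commonly described for the original STC algorithm: two base clusters are linked when their shared documents make up more than half of each, and linked groups form the final clusters. It does not include the Wikipedia-based semantic features that the thesis adds; the function name, data layout and the 0.5 default threshold are assumptions for illustration.

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Group base clusters whose document sets overlap strongly.

    base_clusters maps a label phrase to the set of document ids containing it.
    Two base clusters are linked when the shared documents exceed `threshold`
    of each cluster; connected groups of linked clusters become final clusters.
    """
    labels = list(base_clusters)
    parent = {label: label for label in labels}

    def find(label):
        # Union-find root lookup with path halving.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            docs_a, docs_b = base_clusters[a], base_clusters[b]
            shared = len(docs_a & docs_b)
            if shared > threshold * len(docs_a) and shared > threshold * len(docs_b):
                parent[find(a)] = find(b)

    merged = {}
    for label in labels:
        group = merged.setdefault(find(label), {"labels": [], "docs": set()})
        group["labels"].append(label)
        group["docs"] |= base_clusters[label]
    return list(merged.values())
```

A semantic variant along the lines of the abstract could replace or supplement the overlap test with a relatedness score between labels, but how the thesis does this is not stated here.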
|
675 |
Discovery of Evolution Patterns from Sequences of Documents
Chang, Yu-Hsiu. 06 August 2001
Due to the ever-increasing volume of textual documents, text mining is a rapidly growing application of knowledge discovery in databases. Past text mining techniques predominantly concentrated on discovering intra-document patterns from textual documents, such as text categorization, document clustering, query expansion, and event tracking. Mining inter-document patterns from textual documents has been largely ignored in the literature. This research focuses on discovering inter-document patterns, called evolution patterns, from document sequences and proposes the evolution pattern discovery (EPD) technique for mining evolution patterns from a set of ordered sequences of documents. The discovery of evolution patterns can be applied in such domains as environmental scanning and knowledge management, and can be used to facilitate existing document management and retrieval techniques (e.g., event tracking).
|
676 |
Investigations of Term Expansion on Text Mining Techniques
Yang, Chin-Sheng. 02 August 2002
Recent advances in computer and network technologies have contributed significantly to global connectivity and have caused the number of online textual documents to grow extremely rapidly. The rapid accumulation of textual documents on the Web or within an organization requires effective document management techniques, ranging from information retrieval and information filtering to text mining. The word mismatch problem is a challenging issue for document management research. Word mismatch has been extensively investigated in information retrieval (IR) research through the use of term expansion (or, more specifically, query expansion). However, a review of the text mining literature suggests that the word mismatch problem has seldom been addressed by text mining techniques. Thus, this thesis investigates the use of term expansion in several text mining techniques, specifically text categorization, document clustering and event detection. Accordingly, we developed term expansion extensions to these three text mining techniques. The empirical evaluation results showed that term expansion increased categorization effectiveness when correlation coefficient feature selection was employed. With respect to document clustering, techniques extended with term expansion achieved clustering effectiveness comparable to that of existing techniques and showed superiority in improving the clustering specificity measure. Finally, the use of term expansion for supporting event detection degraded detection effectiveness compared to the traditional event detection technique.
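To make the idea of term expansion concrete, here is a minimal sketch that adds to a document the terms that most often co-occur with its own terms in a corpus, which is one simple way to reduce word mismatch. The co-occurrence statistic, the function names and the `top_n` and `min_count` parameters are illustrative assumptions; the abstract does not say which expansion method the thesis uses.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often each pair of distinct terms appears in the same document."""
    cooc = defaultdict(Counter)
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_terms(doc, cooc, top_n=1, min_count=2):
    """Return the document's terms plus, for each term, its strongest co-occurring terms."""
    expanded = list(doc)
    seen = set(doc)
    for term in doc:
        # Terms absent from the corpus statistics contribute no expansions.
        for related, count in cooc.get(term, Counter()).most_common(top_n):
            if count >= min_count and related not in seen:
                expanded.append(related)
                seen.add(related)
    return expanded
```

For example, if "car" and "automobile" co-occur in at least two corpus documents, a document containing only "automobile" is expanded with "car", so it can match documents or categories described with either word.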
|
677 |
Evaluation of Event Episode Analysis System
Lee, Ming-yu. 26 July 2008
Knowledge-based assets play a very important role in the Information Age, and their increasing influence on organizational competition makes knowledge management a hot issue in business research. Content analysis of documents is a core function of knowledge management. In previous research, many techniques have been developed to generate textual summaries and/or ontology-based episodic knowledge from multiple documents. However, not much research has been done to compare different ways of organizing and presenting knowledge.
Since different knowledge presentations may have different effects on the user, the purpose of this thesis is to develop a method for investigating different document summary and presentation systems. In this research, we have developed an effect measurement method based on the extended Bloom's Taxonomy of Educational Objectives. More specifically, we propose evaluation criteria based on the memory and cognition of the user.
A field experiment was conducted to compare graphical and textual systems. Results indicate that the ontology-based system has significantly superior performance in concept memorizing and procedural memorizing. On the other hand, the textual summary-based system performed better in remembering facts.
|
678 |
Managing XML data in a relational warehouse: on query translation, warehouse maintenance, and data staleness
Kanna, Rajesh. January 2001
Thesis (M.S.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains x, 75 p.; also contains graphics. Vita. Includes bibliographical references (p. 71-74).
|
679 |
Materialized view matching and compensation for SQL/XML and XQuery
Hoppe, Andrzej. January 2008
Thesis (M.Sc.)--York University, 2008. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 147-152). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR38782
|
680 |
A comparative citation analysis study of web-based and print journal-based scholarly communication in the XML research field
Zhao, Dangzhi. Burnett, Gary. January 2003
Thesis (Ph. D.)--Florida State University, 2003. / Advisor: Dr. Gary Burnett, Florida State University, School of Information Studies. Title and description from dissertation home page (viewed Apr. 06, 2004). Includes bibliographical references.
|