Global ETD Search

191	Personalized and Context-aware Document Clustering Yang, Chin-Sheng 15 July 2007 To manage the ever-increasing volume of documents, organizations and individuals typically organize documents into categories (or category hierarchies) to facilitate their document management and support subsequent document retrieval and access. Document clustering is an intentional act that should reflect individuals' preferences with regard to the semantic coherency or relevant categorization of documents and should conform to the context of a target task under investigation. Thus, effective document clustering techniques need to take into account a user's categorization context defined by or relevant to the target task under consideration. However, existing document clustering techniques are generally anchored in purely content-based analysis and therefore cannot facilitate personalized or context-aware document clustering. In response, we design, implement and empirically evaluate three document clustering techniques capable of facilitating personalized or contextual document clustering. First, we extend an existing document clustering technique (specifically, the partial-clustering-based personalized document-clustering (PEC) approach) and propose the Collaborative Filtering-based personalized document-Clustering (CFC) technique to overcome the problem of small-sized partial clustering encountered by the PEC technique. In particular, the CFC technique expands the size of a user's partial clustering based on the partial clusterings of other users with similar categorization preferences. Second, to support contextual document clustering, we design and implement a Context-Aware document-Clustering (CAC) technique by taking into consideration a user's categorization preference (i.e., a set of anchoring terms) relevant to the context of a target task and a statistical-based thesaurus constructed from the World Wide Web (WWW) via a search engine. 
Third, in response to the problem of a small set of anchoring terms, which can greatly degrade the effectiveness of the CAC technique, we extend CAC and propose a Collaborative Filtering-based Context-Aware document Clustering (CF-CAC) technique. Our empirical evaluation results suggest that our proposed CFC, CAC, and CF-CAC techniques better support the need for personalized and contextual document clustering than do their benchmark techniques. Context-aware document clustering; Personalized document clustering; Text mining; Document clustering; Knowledge management
192	Text Mining: A Burgeoning Quality Improvement Tool J. Mohammad, Mohammad Alkin Cihad 01 November 2007 While the amount of textual data available to us is constantly increasing, managing the texts by human effort is clearly inadequate for the volume and complexity of the information involved. Consequently, the need for automated extraction of useful knowledge from huge amounts of textual data to assist human analysis is apparent. Text mining (TM) is a mostly automated technique that aims to discover knowledge from textual data. In this thesis, the notion of text mining, its techniques, and its applications are presented. In particular, the study provides the definition and an overview of concepts in text categorization. This includes document representation models, weighting schemes, feature selection methods, feature extraction, performance measures and machine learning techniques. The thesis details the functionality of text mining as a quality improvement tool. It carries out an extensive survey of text mining applications within the service sector and the manufacturing industry. It presents two broad experimental studies tackling the potential use of text mining for the hotel industry (comment card analysis) and for an automobile manufacturer (miles per gallon analysis). Keywords: Text Mining, Text Categorization, Quality Improvement, Service Sector, Manufacturing Industry. QA Mathematics 1-939
193	Sentiment Analysis In Turkish Erogul, Umut 01 June 2009 Sentiment analysis is the automatic classification of a text, trying to determine the attitude of the writer with respect to a specific topic. The attitude may be either their judgment or evaluation, their feelings or the intended emotional communication. The recent increase in the use of review sites and blogs has made a great amount of subjective data available. Nowadays, it is nearly impossible to manually process all the relevant data available, and as a consequence, the importance given to the automatic classification of unformatted data has increased. To date, all of the research carried out on sentiment analysis has focused on the English language. In this thesis, two Turkish datasets tagged with sentiment information are introduced and existing methods for English are applied to these datasets. This thesis also suggests new methods for Turkish sentiment analysis. QA General 15707
194	Improving Search Result Clustering By Integrating Semantic Information From Wikipedia Calli, Cagatay 01 September 2010 Suffix Tree Clustering (STC) is a search result clustering (SRC) algorithm focused on generating overlapping clusters with meaningful labels in linear time. It showed the feasibility of SRC, but over time, subsequent studies introduced description-first algorithms that generate better labels and achieve higher precision. Still, STC remained the fastest SRC algorithm, and studies addressing different problems of STC appeared. In this thesis, semantic relations between cluster labels and documents are exploited to filter out noisy labels and improve the merging phase of STC. Wikipedia is used to identify these relations, and methods for integrating semantic information into STC are suggested. Semantic features are shown to be effective for the SRC task when used together with term frequency vectors. Furthermore, there have been no SRC studies on Turkish until now. In this thesis, a dataset for Turkish is introduced and a number of methods are tested on Turkish. QA Computer Software 76.75-76.765
195	Acquisition Of Liver Specific Parasites-bacteria-drugs-diseases-genes Knowledge From Medline Yildirim, Pinar 01 April 2011 Biomedical literature such as MEDLINE articles is a rich resource for discovering and tracking disease and drug knowledge. For example, information regarding the drugs that are used with a particular disease or the changes in drug usage over time is valuable. However, this information is buried in thousands of MEDLINE articles. Acquiring knowledge from these articles requires complex processes that depend on biomedical text mining techniques. Today, parasitic and bacterial diseases affect hundreds of millions of people worldwide. They result in significant mortality and devastating social and economic consequences. Many control and eradication programs are conducted around the world. Also, many drugs are developed for diseases caused by parasites and bacteria. In this study, research was conducted on parasites and bacteria affecting the liver, and treatment drugs were tested. Also, relationships between these diseases and genes, along with parasites and bacteria, were explored using data mining and biomedical text mining techniques. This study reveals that the treatment of parasites and bacteria seems to have been stable over the last four decades. The methodology introduced in this study also presents a reference model for acquiring medical knowledge from the literature. QA Computer Software 76.75-76.765
196	Ontology Based Text Mining In Turkish Radiology Reports Deniz, Onur 01 January 2012 A vast number of radiology reports are produced in hospitals. Because the reports are in free-text format and contain errors due to rapid production, it is becoming increasingly difficult for radiologists and physicians to reach meaningful information. Though the application of ontologies to biomedical text mining has gained increasing interest in recent years, little work has been done on ontology-based retrieval tasks in the Turkish language. In this work, an information extraction and retrieval system based on the SNOMED-CT ontology is proposed for Turkish radiology reports. The main purpose of this work is to utilize semantic relations in the ontology to improve the precision and recall of search results in the domain. Practical problems encountered, such as spelling errors, segmentation and tokenization of unstructured medical reports, have also been addressed during the work. QA General 15707
197	Emotion Analysis Of Turkish Texts By Using Machine Learning Methods Boynukalin, Zeynep 01 July 2012 Automatically analysing the emotion in texts is of increasing interest in today's research. The aim is to develop a machine that can detect the type of a user's emotion from his/her text. Emotion classification of English texts has been studied by several researchers and promising results have been achieved. In this thesis, an emotion classification study on Turkish texts is introduced. To the best of our knowledge, this is the first study on emotion analysis of Turkish texts. In English there exist some well-defined datasets for the purpose of emotion classification, but we could not find datasets in Turkish suitable for this study. Therefore, another important contribution is the generation of a new dataset in Turkish for emotion analysis. The dataset is generated by combining two types of sources. Several classification algorithms are applied to the dataset and the results are compared. Due to the nature of the Turkish language, new features are added to the existing methods to improve the success of the proposed method. QA Computer Software 76.75-76.765
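198	The Structure of University Students' Images of "Not Getting a Job" and Career Indecision: An Examination Using Text Mining [大学生における「就職しないこと」イメージの構造と進路未決定 : テキストマイニングを用いた検討] SUGIMOTO, Hideharu, 杉本, 英晴 31 March 2009 No description available. University Students Text Mining Techniques Career Indecision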
199	Discovery of Evolution Patterns from Sequences of Documents Chang, Yu-Hsiu 06 August 2001 Due to the ever-increasing volume of textual documents, text mining is a rapidly growing application of knowledge discovery in databases. Past text mining techniques predominantly concentrated on discovering intra-document patterns from textual documents, such as text categorization, document clustering, query expansion, and event tracking. Mining inter-document patterns from textual documents has been largely ignored in the literature. This research focuses on discovering inter-document patterns, called evolution patterns, from document sequences and proposes the evolution pattern discovery (EPD) technique for mining evolution patterns from a set of ordered sequences of documents. The discovery of evolution patterns can be applied in such domains as environmental scanning and knowledge management, and can be used to facilitate existing document management and retrieval techniques (e.g., event tracking). Feature Extraction; Feature Selection; Document Clustering; Frequent Temporal Patterns; Feature-Based Evolution Patterns; Text Mining
200	Investigations of Term Expansion on Text Mining Techniques Yang, Chin-Sheng 02 August 2002 Recent advances in computer and network technologies have contributed significantly to global connectivity and caused the number of online textual documents to grow extremely rapidly. The rapid accumulation of textual documents on the Web or within an organization requires effective document management techniques, ranging from information retrieval and information filtering to text mining. The word mismatch problem represents a challenging issue to be addressed by document management research. Word mismatch has been extensively investigated in information retrieval (IR) research through the use of term expansion (or, specifically, query expansion). However, a review of the text mining literature suggests that the word mismatch problem has seldom been addressed by text mining techniques. Thus, this thesis aims to investigate the use of term expansion in some text mining techniques, specifically text categorization, document clustering and event detection. Accordingly, we developed term expansion extensions to these three text mining techniques. The empirical evaluation results showed that term expansion increased categorization effectiveness when correlation coefficient feature selection was employed. With respect to document clustering, techniques extended with term expansion achieved clustering effectiveness comparable to that of existing techniques and were superior in improving the clustering specificity measure. Finally, the use of term expansion for supporting event detection degraded detection effectiveness compared to the traditional event detection technique. Term Association; Word Mismatch; Text Mining; Event Detection; Term Expansion; Document Clustering; Text Categorization
