61

Initial Results in the Development of SCAN: a Swedish Clinical Abbreviation Normalizer

Isenius, Niklas, Velupillai, Sumithra, Kvist, Maria January 2012 (has links)
Abbreviations are common in clinical documentation, as this type of text is written under time pressure and serves mostly for internal communication. This study attempts to apply and extend existing rule-based algorithms that have been developed for English and Swedish abbreviation detection, in order to create an abbreviation detection algorithm for Swedish clinical texts that can identify and suggest definitions for abbreviations and acronyms. This can be used as a pre-processing step for further information extraction and text mining models, as well as for readability solutions. Through a literature review, a number of heuristics were defined for automatic abbreviation detection. These were used in the construction of the Swedish Clinical Abbreviation Normalizer (SCAN). The heuristics were: a) freely available external resources: a dictionary of general Swedish, a dictionary of medical terms and a dictionary of known Swedish medical abbreviations, b) maximum word lengths (from three to eight characters), and c) heuristics for handling common patterns such as hyphenation. For each token in the text, the algorithm checks whether it is a known word in one of the lexicons, and whether it fulfills the criteria for word length and the created heuristics. The final algorithm was evaluated on a set of 300 Swedish clinical notes from an emergency department at the Karolinska University Hospital, Stockholm. These notes were annotated for abbreviations (a total of 2,050 tokens) by a physician accustomed to reading and writing medical records. The algorithm was tested in different variants, where the word lists were modified, heuristics were adapted to characteristics found in the texts, and different maximum word lengths were combined. The best-performing version of the algorithm achieved an F-measure of 79%, with 76% recall and 81% precision, a considerable improvement over the baseline where each token was only matched against the word lists (51% F-measure, 87% recall, 36% precision). Not surprisingly, precision is higher when the maximum word length is set to the lowest value (three), and recall is higher when it is set to the highest (eight). Algorithms for rule-based systems, mainly developed for English, can be successfully adapted for abbreviation detection in Swedish medical records. System performance relies heavily on the quality of the external resources, as well as on the created heuristics. In order to improve results, part-of-speech information and/or local context is needed for disambiguation. In the case of Swedish, compounding also needs to be handled.
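The per-token check described above can be sketched roughly as follows. This is a minimal illustration, assuming tiny inline stand-in word lists and a maximum length of six characters; SCAN's actual dictionaries, length settings and full heuristic set are described in the thesis itself.

```python
# Rough sketch of a SCAN-style rule-based abbreviation check.
# The word lists and the maximum length below are illustrative stand-ins.
general_swedish = {"inkommer", "med", "akuta", "smärtor"}    # stand-in general dictionary
medical_terms = {"appendicit"}                               # stand-in medical dictionary
known_abbrevs = {"pat", "enl", "rtg"}                        # stand-in abbreviation dictionary

MAX_LEN = 6  # the thesis evaluates maximum word lengths from three to eight

def is_abbreviation_candidate(token: str) -> bool:
    """Return True if the token should be flagged as a possible abbreviation."""
    word = token.lower().strip(".,:;()")
    # heuristic for hyphenated forms such as "rtg-remiss": judge each part separately
    parts = [p for p in word.split("-") if p]

    def part_is_candidate(p: str) -> bool:
        if len(p) > MAX_LEN:
            return False                       # too long to be an abbreviation
        if p in known_abbrevs:
            return True                        # a documented abbreviation
        # unknown in both the general and the medical dictionary -> candidate
        return p not in general_swedish and p not in medical_terms

    return any(part_is_candidate(p) for p in parts)

tokens = "Pat inkommer med akuta smärtor enl rtg-remiss".split()
print([t for t in tokens if is_abbreviation_candidate(t)])   # ['Pat', 'enl', 'rtg-remiss']
```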
62

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven to help address such challenges. This work proposes automatic, reliable solutions for solving tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to prove that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems. From the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to four, as follows: 1. Automatic identification of obesity-related diseases by using only textual clinical data; 2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews; 3. Automatic identification of gene functions based on textual data of published medical abstracts; 4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data. This thesis's investigation of automatic solutions for achieving personalized medicine through information identification and extraction focuses on individual problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies that I propose are supported by in-depth research experiments and thorough discussions and conclusions.
63

A Lexicon for Gene Normalization / Ett lexicon för gennormalisering

Lingemark, Maria January 2009 (has links)
Researchers tend to use their own or favourite gene names in scientific literature, even though there are official names. Some names may even be used for more than one gene. This leads to problems with ambiguity when automatically mining biological literature. To disambiguate the gene names, gene normalization is used. In this thesis, we look into an existing gene normalization system and develop a new method to find gene candidates for ambiguous genes. For the new method, a lexicon is created using information about gene names, symbols and synonyms from three different databases. A gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method gives a better overall result for the system, with an increase in precision and a small decrease in recall.
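As a rough illustration of the lexicon lookup described above, the sketch below merges gene names, symbols and synonyms into one index and returns every matching gene for a mention. The three records and the normalization rule are made-up stand-ins, not the actual databases or matching rules used in the thesis.

```python
# Rough sketch of lexicon-based gene candidate lookup; the records are invented.
from collections import defaultdict

gene_records = [
    {"id": "GENE:0001", "symbol": "TP53", "names": ["tumor protein p53"], "synonyms": ["p53"]},
    {"id": "GENE:0002", "symbol": "CDKN2A", "names": ["cyclin dependent kinase inhibitor 2A"], "synonyms": ["p16"]},
    {"id": "GENE:0003", "symbol": "TP63", "names": ["tumor protein p63"], "synonyms": ["p53-like"]},
]

def normalize(term: str) -> str:
    """Case-fold and strip punctuation so near-identical spellings collide."""
    return "".join(ch for ch in term.lower() if ch.isalnum())

# Build the lexicon: one normalized term may point to several gene ids.
lexicon = defaultdict(set)
for rec in gene_records:
    for term in [rec["symbol"], *rec["names"], *rec["synonyms"]]:
        lexicon[normalize(term)].add(rec["id"])

def candidates(mention: str):
    """Return all gene ids whose name, symbol or synonym matches the mention."""
    return sorted(lexicon.get(normalize(mention), set()))

print(candidates("p53"))   # ['GENE:0001'] -- fed into the disambiguation step
print(candidates("P16"))   # ['GENE:0002']
```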
64

Corpus construction based on Ontological domain knowledge

Benis, Nirupama, Kaliyaperumal, Rajaram January 2011 (has links)
The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of amount of text and predicates. Besides that, these corpora are developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. By doing this, we believe we can provide a rough set of rules for creating corpora from ontologies. Besides that, we also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the piece of the Gene Ontology pertaining to transport, the term transport (GO:0006810) and all of its child concepts, which could be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet, and the output is annotated text in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI created using Java.
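A rough sketch of how the transport sub-ontology can be carved out is given below: starting from GO:0006810 and following is-a links to collect every descendant. The small parent-to-children map is a made-up stand-in for the real Gene Ontology, and the thesis does not prescribe this particular traversal code.

```python
# Rough sketch: collect GO:0006810 ("transport") and all of its child concepts.
from collections import deque

# Stand-in fragment of the Gene Ontology is-a hierarchy (parent -> children).
children_of = {
    "GO:0006810": ["GO:0006811", "GO:0015031"],   # transport -> ion transport, protein transport
    "GO:0015031": ["GO:0006886"],                 # protein transport -> intracellular protein transport
}

def sub_ontology(root: str) -> set:
    """Return the root term plus all of its transitive child terms."""
    seen, queue = {root}, deque([root])
    while queue:
        term = queue.popleft()
        for child in children_of.get(term, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(sub_ontology("GO:0006810")))
```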
65

Cross-Lingual Text Categorization

Lin, Yen-Ting 29 July 2004 (has links)
With the emergence and proliferation of Internet services and e-commerce applications, a tremendous amount of information is accessible online, typically as textual documents. To facilitate subsequent access to and leverage of this information, the efficient and effective management of the ever-increasing volume of textual documents, specifically through text categorization, is essential to organizations and individuals. Existing text categorization techniques focus mainly on categorizing monolingual documents. However, with the globalization of business environments and advances in Internet technology, an organization or person often retrieves and archives documents in different languages, thus creating the need for cross-lingual text categorization. Motivated by the significance of and need for such a cross-lingual text categorization technique, this thesis designs a technique with two different category assignment methods, namely individual-based and cluster-based. The empirical evaluation results show that the cross-lingual text categorization technique performs well and that the cluster-based method outperforms the individual-based method.
66

Semantic-Based Approach to Supporting Opinion Summarization

Chen, Yen-Ming 20 July 2006 (has links)
With the rapid expansion of e-commerce, the Web has become an excellent source for gathering customer opinions (or so-called customer reviews). Customer reviews are essential for merchants or product manufacturers to understand the general responses of customers to their products, for product or marketing campaign improvement. In addition, customer reviews can enable merchants to better understand the specific preferences of individual customers and facilitate effective marketing decisions. Prior data mining research mainly concentrates on analyzing customer demographic, attitudinal, psychographic, transactional, and behavioral data for supporting customer relationship management and marketing decision making, and has paid little attention to the use of customer reviews as an additional source of marketing intelligence. Thus, the purpose of this research is to develop an efficient and effective opinion summarization technique. Specifically, we propose a semantic-based product feature extraction technique (SPE) that aims at improving existing product feature extraction techniques and is expected to enhance the overall effectiveness of opinion summarization.
67

Poly-Lingual Text Categorization

Shih, Hui-Hua 09 August 2006 (has links)
With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Efficiently and effectively managing these textual documents written in different languages is essential to organizations and individuals. Although poly-lingual text categorization (PLTC) can be approached as a set of independent monolingual classifiers, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to utilize the opportunity offered by poly-lingual training documents. Motivated by the significance of and need for such a poly-lingual text categorization technique, we propose a PLTC technique that takes into account all training documents of all languages when constructing the monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as our performance benchmark, our empirical evaluation results show that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique in both English and Chinese corpora. In addition, our empirical results suggest the robustness of the proposed PLTC technique with respect to the range of training sizes investigated.
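The contrast between the naive monolingual setup and the poly-lingual setup can be sketched as below. The toy documents and the bag-of-words/Naive Bayes pipeline are illustrative assumptions only; the thesis does not prescribe this particular learning algorithm.

```python
# Rough sketch: a monolingual classifier trained only on same-language documents
# versus a PLTC-style classifier built from training documents of all languages.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# (language, text, category) -- made-up poly-lingual training data
train = [
    ("en", "stock market prices fell sharply", "finance"),
    ("en", "the team won the championship game", "sports"),
    ("zh", "股市 价格 大幅 下跌", "finance"),
    ("zh", "球队 赢得 冠军 比赛", "sports"),
]

def build_classifier(docs):
    texts = [text for _, text, _ in docs]
    labels = [cat for _, _, cat in docs]
    return make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

# Naive MnTC baseline: the English classifier sees only English documents.
mono_en = build_classifier([d for d in train if d[0] == "en"])

# PLTC-style variant: one classifier is built from the documents of all languages.
poly = build_classifier(train)

print(mono_en.predict(["championship game tonight"]))   # ['sports']
print(poly.predict(["股市 大幅 下跌"]))                  # ['finance']
```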
68

Using Text Mining Techniques for Automatically Classifying Public Opinion Documents

Chen, Kuan-hsien 19 January 2009 (has links)
In a democratic society, the number of public opinion documents increases by the day, and there is a pressing need for automatically classifying these documents. The traditional approach to classifying documents involves word segmentation and the use of stop words, corpora, and grammar analysis for retrieving the key terms of documents. However, with the emergence of new terms, traditional methods that leverage a dictionary or thesaurus may incur lower accuracy. Therefore, this study proposes a new method that does not require the prior establishment of a dictionary or thesaurus, and is applicable to documents written in any language and documents containing unstructured text. Specifically, the classification method employs a genetic algorithm to achieve this goal. In this method, each training document is represented by several chromosomes, and based on the gene values of these chromosomes, the characteristic terms of the document are determined. The fitness function, which the genetic algorithm requires for evaluating an evolved chromosome, considers the similarity to the chromosomes of documents of other types. This study evaluated the proposed method on FAQ data from the Taipei City Mayor's e-mail box, varying the length of the documents. The results show that the proposed method achieves an average accuracy of 89%, an average precision of 47%, and an average recall of 45%. In addition, the F-measure can reach up to 0.7. The results confirm that the number of training documents, the content of the training documents, the similarity between document types, and the length of the documents all contribute to the effectiveness of the proposed method.
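The chromosome-and-fitness idea described above can be sketched roughly as follows: a chromosome is a binary mask over a document's terms, and fitness rewards term selections that overlap little with documents of other categories. The population size, rates and Jaccard-based fitness are illustrative assumptions, not the study's exact design.

```python
# Rough sketch of GA-based selection of a document's characteristic terms.
import random

random.seed(0)

doc_terms = ["bus", "route", "delay", "complaint", "schedule"]   # terms of one training document
other_class_terms = [{"garbage", "collection", "complaint"},     # documents of other categories
                     {"noise", "construction", "schedule"}]

def fitness(chromosome):
    """Higher is better: selected terms should not resemble other categories."""
    selected = {t for t, bit in zip(doc_terms, chromosome) if bit}
    if not selected:
        return 0.0
    overlap = sum(len(selected & other) / len(selected | other)
                  for other in other_class_terms)
    return 1.0 - overlap / len(other_class_terms)

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    population = [[random.randint(0, 1) for _ in doc_terms] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]                  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(doc_terms))          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate) for bit in child]  # mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [t for t, bit in zip(doc_terms, best) if bit]

print(evolve())   # characteristic terms kept for this document
```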
69

Automated Analysis Techniques for Online Conversations with Application in Deception Detection

Twitchell, Douglas P. January 2005 (has links)
Email, chat, instant messaging, blogs, and newsgroups are now common ways for people to interact. Along with these new ways of sending, receiving, and storing messages comes the challenge of organizing, filtering, and understanding them, for which text mining has been shown to be useful, using both content-dependent and content-independent methods. Unfortunately, computer-mediated communication has also provided criminals, terrorists, spies, and other threats to security a means of efficient communication. However, the often textual encoding of these communications may also provide the possibility of detecting and tracking those who are deceptive. Two methods for organizing, filtering, understanding, and detecting deception in text-based computer-mediated communication are presented. First, message feature mining uses message features or cues in CMC messages combined with machine learning techniques to classify messages according to the sender's intent. The method utilizes common classification methods coupled with linguistic analysis of messages for extraction of a number of content-independent input features. A study using message feature mining to classify deceptive and non-deceptive email messages attained classification accuracy between 60% and 80%. Second, speech act profiling is a method for evaluating and visualizing synchronous CMC by creating profiles of conversations and their participants using speech act theory and probabilistic classification methods. Transcripts from a large corpus of speech-act-annotated conversations are used to train language models and a modified hidden Markov model (HMM) to obtain probable speech acts for sentences, which are aggregated for each conversation participant, creating a set of speech act profiles. Three studies for validating the profiles are detailed, as well as two studies showing speech act profiling's ability to uncover uncertainty related to deception. The methods introduced here are two content-independent methods that represent a possible new direction in text analysis. Both have possible applications outside the context of deception. In addition to aiding deception detection, these methods may also be applicable in information retrieval, technical support training, GSS facilitation support, transportation security, and information assurance.
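The message feature mining idea can be sketched roughly as follows: a few content-independent cues are computed for each message and passed to an off-the-shelf classifier. The specific cues, the toy labelled messages and the decision-tree learner are illustrative assumptions, not the exact feature set or model used in the dissertation.

```python
# Rough sketch of content-independent "message feature mining" for deception cues.
from sklearn.tree import DecisionTreeClassifier

FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}
MODIFIERS = {"very", "really", "quite", "extremely"}

def cue_features(message: str):
    """Content-independent cues: length, word length, self-reference and modifier rates."""
    words = message.lower().split()
    n = max(len(words), 1)
    return [
        len(words),                                  # quantity
        sum(len(w) for w in words) / n,              # average word length
        sum(w in FIRST_PERSON for w in words) / n,   # self-reference rate
        sum(w in MODIFIERS for w in words) / n,      # modifier rate
    ]

# Made-up training messages labelled 1 = deceptive, 0 = truthful.
messages = [
    ("we really did send the very latest report yesterday honestly", 1),
    ("i attached the report this morning", 0),
    ("that was quite definitely not our extremely small mistake", 1),
    ("my team finished the review on friday", 0),
]

X = [cue_features(text) for text, _ in messages]
y = [label for _, label in messages]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

print(model.predict([cue_features("we really truly sent it already")]))
```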
70

MINING CONSUMER TRENDS FROM ONLINE REVIEWS: AN APPROACH FOR MARKET RESEARCH

Tsubiks, Olga 10 August 2012 (has links)
We present a novel marketing method for detecting consumer trends from online user-generated content, motivated by a gap identified in the market research literature. Existing approaches for trend analysis are generally based on the rating of trends by industry experts through survey questionnaires, interviews, or similar instruments. These methods have proven to be inherently costly and often suffer from bias. Our approach is based on the use of information extraction techniques to identify trends in large aggregations of social media data. It is a cost-effective method that reduces the possibility of errors associated with the design of the sample and the research instrument. The effectiveness of the approach is demonstrated in an experiment performed on restaurant review data. The accuracy of the results is at the level of current approaches for both information extraction and market research.
