71

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing such challenges. The current work proposes automatic, reliable solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background, and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to prove that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems. From the numerous research problems that must be addressed when implementing personalized medicine, the scope of this thesis is restricted to the following four:
1. Automatic identification of obesity-related diseases using only textual clinical data;
2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews;
3. Automatic identification of gene functions based on textual data of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.
This thesis's investigation into automatic solutions for achieving personalized medicine through information identification and extraction focuses on individual, specific problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies that I propose are supported by in-depth research experiments and thorough discussions and conclusions.
72

A Lexicon for Gene Normalization / Ett lexicon för gennormalisering

Lingemark, Maria January 2009 (has links)
Researchers tend to use their own or favourite gene names in scientific literature, even though official names exist, and some names may even be used for more than one gene. This leads to problems with ambiguity when automatically mining biological literature. To disambiguate the gene names, gene normalization is used. In this thesis, we look into an existing gene normalization system and develop a new method to find gene candidates for the ambiguous genes. For the new method, a lexicon is created using information about gene names, symbols, and synonyms from three different databases. The gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method gives better overall results from the system, with an increase in precision and a small decrease in recall.
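The lexicon-based candidate lookup described above can be sketched as follows. The gene records here are invented illustrations, not entries from the three databases the thesis actually merges:

```python
def build_lexicon(records):
    """Map every known name, symbol, or synonym to the gene IDs that use it."""
    lexicon = {}
    for gene_id, names in records:
        for name in names:
            lexicon.setdefault(name.lower(), set()).add(gene_id)
    return lexicon

def candidates(lexicon, mention):
    """Return all gene IDs whose names match the mention (case-insensitive)."""
    return sorted(lexicon.get(mention.lower(), set()))

# Hypothetical records: (gene ID, [official name, symbol, synonyms...]).
records = [
    ("GENE:1", ["TP53", "p53", "tumor protein p53"]),
    ("GENE:2", ["TP63", "p53-related protein"]),
    ("GENE:3", ["p53"]),  # an ambiguous synonym shared with GENE:1
]
lexicon = build_lexicon(records)
print(candidates(lexicon, "p53"))  # ['GENE:1', 'GENE:3'] -> ambiguous mention
```

An ambiguous mention yields several candidates, which is exactly what the system's subsequent disambiguation step consumes.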
73

Cross-Lingual Text Categorization

Lin, Yen-Ting 29 July 2004 (has links)
With the emergence and proliferation of Internet services and e-commerce applications, a tremendous amount of information is accessible online, typically as textual documents. To facilitate subsequent access to and leverage of this information, the efficient and effective management, specifically text categorization, of the ever-increasing volume of textual documents is essential to organizations and individuals. Existing text categorization techniques focus mainly on categorizing monolingual documents. However, with the globalization of business environments and advances in Internet technology, an organization or person often retrieves and archives documents in different languages, thus creating the need for cross-lingual text categorization. Motivated by the significance of and need for such a cross-lingual text categorization technique, this thesis designs a technique with two different category assignment methods, namely individual-based and cluster-based. The empirical evaluation results show that the cross-lingual text categorization technique performs well and that the cluster-based method outperforms the individual-based method.
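The contrast between the two category assignment methods can be sketched with a minimal bag-of-words model and cosine similarity; the toy corpus and the exact similarity measure are assumptions, not the thesis's actual feature weighting:

```python
import math
from collections import Counter

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented training documents (already in the target language).
train = [
    ("sports", "football match goal score"),
    ("sports", "tennis match player score"),
    ("finance", "stock market price index"),
    ("finance", "bank interest rate market"),
]

def assign_individual(doc):
    """Individual-based: category of the single most similar training document."""
    best = max(train, key=lambda ex: cosine(vec(doc), vec(ex[1])))
    return best[0]

def assign_cluster(doc):
    """Cluster-based: category whose merged term vector (centroid) is most similar."""
    centroids = {}
    for cat, text in train:
        centroids.setdefault(cat, Counter()).update(vec(text))
    return max(centroids, key=lambda c: cosine(vec(doc), centroids[c]))

print(assign_individual("goal score in the match"))  # sports
print(assign_cluster("market price of the stock"))   # finance
```

The cluster-based method pools evidence from all documents of a category before comparing, which is one plausible reason it can outperform nearest-document assignment.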
74

Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach

Hsu, Kai-hsiang 21 July 2005 (has links)
Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents, and the assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during text categorization model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates/acquires and subsequently archives documents in different languages, thus creating the need for cross-lingual text categorization (CLTC). Existing studies on CLTC focus on the prediction-corpus translation-based approach, which lacks a systematic mechanism for reducing translation noise, thus limiting its cross-lingual categorization effectiveness. Motivated by the need to provide more effective CLTC support, we design a training-corpus translation-based CLTC approach. Using the prediction-corpus translation-based approach as the performance benchmark, our empirical evaluation results show that our proposed CLTC approach achieves significantly better classification effectiveness than the benchmark approach does in both Chinese
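The core step of a training-corpus translation-based approach can be sketched as translating the training documents once, up front, so a standard monolingual classifier can then be trained in the target language. The bilingual dictionary, documents, and first-sense selection policy below are invented illustrations, not the thesis's actual noise-reduction mechanism:

```python
# Toy bilingual dictionary; ambiguous entries are the source of translation noise.
bilingual = {
    "mercado": ["market"],
    "precio": ["price", "prize"],   # ambiguous: wrong sense would add noise
    "banco": ["bank", "bench"],
}

def translate(doc, pick=lambda options: options[0]):
    """Word-by-word translation; `pick` is a pluggable noise-reduction policy
    (here simply the first listed sense). Unknown words pass through."""
    return " ".join(pick(bilingual.get(w, [w])) for w in doc.lower().split())

# Translate the whole (invented) training corpus before any model learning.
train_src = [("finance", "precio del mercado"), ("finance", "banco mercado")]
train_translated = [(cat, translate(text)) for cat, text in train_src]
print(train_translated)
```

Because translation happens on the training side, a noise-reduction policy can be applied systematically once, rather than per prediction request.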
75

Cross-Lingual Question Answering for Corpora with Question-Answer Pairs

Huang, Shiuan-Lung 02 August 2005 (has links)
Question answering from a corpus of question-answer (QA) pairs accepts a user question in a natural language and retrieves relevant QA pairs from the corpus. Most existing question answering techniques are monolingual in nature; that is, the language used for expressing a user question is identical to that of the QA pairs in the corpus. However, with the globalization of business environments and advances in Internet technology, more and more online information and knowledge are stored in the question-answer pair format on the Internet or an intranet in different languages. To facilitate users' access to these QA-pair documents using natural language queries in such a multilingual environment, there is a pressing need for the support of cross-lingual question answering (CLQA). In response, this study designs a thesaurus-based CLQA technique. We empirically evaluate our proposed CLQA technique using a monolingual question answering technique and a machine translation based CLQA technique as performance benchmarks. Our empirical evaluation results show that our proposed CLQA technique achieves satisfactory effectiveness when using the monolingual question answering technique as a performance reference. Moreover, our results suggest that our proposed thesaurus-based CLQA technique significantly outperforms the benchmark machine translation based CLQA technique.
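A thesaurus-based CLQA step can be sketched as expanding the query's terms into the corpus language and ranking QA pairs by term overlap; the thesaurus entries, QA pairs, and overlap scoring below are all invented illustrations:

```python
# Hypothetical bilingual thesaurus mapping query-language terms to corpus-language terms.
thesaurus = {"contraseña": {"password"}, "restablecer": {"reset", "restore"}}

# Invented QA-pair corpus in the target language.
qa_pairs = [
    ("How do I reset my password?", "Use the account settings page."),
    ("How do I close my account?", "Contact support."),
]

def cross_lingual_retrieve(query):
    # Expand each query term to its thesaurus entries (unknown terms pass through).
    terms = set()
    for w in query.lower().split():
        terms |= thesaurus.get(w, {w})
    # Rank QA pairs by overlap between expanded terms and the question's words.
    def score(pair):
        return len(terms & set(pair[0].lower().rstrip("?").split()))
    return max(qa_pairs, key=score)

q, a = cross_lingual_retrieve("restablecer contraseña")
print(a)  # Use the account settings page.
```

Unlike full machine translation of the question, only content terms are mapped, which sidesteps translating grammatical structure entirely.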
76

Semantic-Based Approach to Supporting Opinion Summarization

Chen, Yen-Ming 20 July 2006 (has links)
With the rapid expansion of e-commerce, the Web has become an excellent source for gathering customer opinions (or so-called customer reviews). Customer reviews are essential for merchants or product manufacturers to understand customers' general responses to their products for product or marketing campaign improvement. In addition, customer reviews can enable merchants to better understand the specific preferences of individual customers and facilitate effective marketing decisions. Prior data mining research has mainly concentrated on analyzing customer demographic, attitudinal, psychographic, transactional, and behavioral data to support customer relationship management and marketing decision making, and has paid little attention to the use of customer reviews as an additional source of marketing intelligence. Thus, the purpose of this research is to develop an efficient and effective opinion summarization technique. Specifically, we propose a semantic-based product feature extraction technique (SPE) that aims to improve the existing product feature extraction technique and thereby enhance overall opinion summarization effectiveness.
77

Poly-Lingual Text Categorization

Shih, Hui-Hua 09 August 2006 (has links)
With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Efficiently and effectively managing these textual documents written in different languages is essential to organizations and individuals. Although poly-lingual text categorization (PLTC) can be approached as a set of independent monolingual classifiers, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to exploit the opportunity offered by poly-lingual training documents. Motivated by the significance of and need for such a poly-lingual text categorization technique, we propose a PLTC technique that takes into account all training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as our performance benchmark, our empirical evaluation results show that our proposed PLTC technique achieves higher classification accuracy than the benchmark technique does in both English and Chinese corpora. In addition, our empirical results also suggest the robustness of the proposed PLTC technique with respect to the range of training sizes investigated.
78

Using Text Mining Techniques for Automatically Classifying Public Opinion Documents

Chen, Kuan-hsien 19 January 2009 (has links)
In a democratic society, the number of public opinion documents increases daily, and there is a pressing need for automatically classifying these documents. The traditional approach to classifying documents involves word segmentation and the use of stop words, corpora, and grammar analysis for retrieving the key terms of documents. However, with the emergence of new terms, traditional methods that rely on a dictionary or thesaurus may suffer lower accuracy. Therefore, this study proposes a new method that does not require the prior establishment of a dictionary or thesaurus and is applicable to documents written in any language and to documents containing unstructured text. Specifically, the classification method employs a genetic algorithm to achieve this goal. In this method, each training document is represented by several chromosomes, and based on the gene values of these chromosomes, the characteristic terms of the document are determined. The fitness function, which the genetic algorithm requires for evaluating an evolved chromosome, considers the similarity to the chromosomes of documents of other types. This study used FAQ data from the Taipei City Mayor's e-mail box to evaluate the proposed method while varying the length of documents. The results show that the proposed method achieves an average accuracy rate of 89%, an average precision rate of 47%, and an average recall rate of 45%. In addition, the F-measure can reach up to 0.7. The results confirm that the number of training documents, the content of training documents, the similarity between document types, and the length of the documents all contribute to the effectiveness of the proposed method.
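The chromosome-based term selection can be sketched with a minimal genetic algorithm: a bitmask over a document's words is evolved so that selected words are shared with same-class documents but not with other classes. The corpus, single-chromosome encoding, and fitness function are invented simplifications of the study's design:

```python
import random

random.seed(0)  # deterministic toy run

# Invented toy corpus: two classes of short "public opinion" documents.
docs = [
    ("traffic", "bus late traffic jam road"),
    ("traffic", "road traffic bus delay"),
    ("park",    "park tree bench clean"),
    ("park",    "clean park playground tree"),
]

def evolve_terms(doc_index, generations=30, pop_size=12, mutation=0.1):
    """Evolve a bitmask over the document's words; set bits mark characteristic
    terms. Fitness rewards words unique to the document's class and penalizes
    words that also appear in other classes."""
    label, text = docs[doc_index]
    words = text.split()
    same = {w for l, t in docs if l == label for w in t.split()}
    other = {w for l, t in docs if l != label for w in t.split()}

    def fitness(mask):
        chosen = [w for w, bit in zip(words, mask) if bit]
        good = sum(1 for w in chosen if w in same and w not in other)
        bad = sum(1 for w in chosen if w in other)
        return good - bad

    pop = [[random.randint(0, 1) for _ in words] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]       # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(words))
            child = a[:cut] + b[cut:]          # one-point crossover
            if random.random() < mutation:
                i = random.randrange(len(words))
                child[i] ^= 1                  # flip one gene
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return {w for w, bit in zip(words, best) if bit}

print(evolve_terms(0))
```

The study's actual method uses several chromosomes per document and a similarity-based fitness over other document types; this sketch keeps only the evolutionary skeleton.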
79

Deriving Genetic Networks Using Text Mining

Olsson, Elin January 2002 (has links)
On the Internet, an enormous amount of information is available in an unstructured form. The purpose of a text mining tool is to collect this information and present it in a more structured form. In this report, text mining is used to create an algorithm that searches abstracts available from PubMed and finds specific relationships between genes that can be used to create a network. The algorithm can also be used to find information about a specific gene. The network created by Mendoza et al. (1999) was verified in all the connections but one using the algorithm. This connection contained implicit information. The results suggest that the algorithm is better at extracting information about specific genes than at finding connections between genes. One advantage of the algorithm is that it can also find connections between genes and proteins, and between genes and other chemical substances.
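A minimal sketch of extracting gene-relationship edges from abstract text, assuming a simple verb-pattern approach; the sentences, gene-token convention, and verb list are invented, and a real run would fetch abstracts from PubMed:

```python
import re

# Invented abstract sentences standing in for fetched PubMed text.
sentences = [
    "GeneA activates GeneB in yeast cells.",
    "Our results show that GeneB inhibits GeneC.",
    "GeneD was sequenced for this study.",
]

# Two gene-like tokens joined by an interaction verb form one network edge.
PATTERN = re.compile(r"(Gene\w+)\s+(activates|inhibits|represses|binds)\s+(Gene\w+)")

def extract_edges(texts):
    """Return (source, relation, target) triples for a gene network."""
    edges = []
    for s in texts:
        for src, rel, dst in PATTERN.findall(s):
            edges.append((src, rel, dst))
    return edges

print(extract_edges(sentences))
# [('GeneA', 'activates', 'GeneB'), ('GeneB', 'inhibits', 'GeneC')]
```

An implicit connection, like the one the abstract mentions, would not match any surface pattern, which illustrates the method's limitation on indirect statements.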
80

Automated Analysis Techniques for Online Conversations with Application in Deception Detection

Twitchell, Douglas P. January 2005 (has links)
Email, chat, instant messaging, blogs, and newsgroups are now common ways for people to interact. Along with these new ways of sending, receiving, and storing messages comes the challenge of organizing, filtering, and understanding them, for which text mining has been shown to be useful, using both content-dependent and content-independent methods. Unfortunately, computer-mediated communication has also provided criminals, terrorists, spies, and other threats to security a means of efficient communication. However, the often textual encoding of these communications may also provide for the possibility of detecting and tracking those who are deceptive. Two methods for organizing, filtering, understanding, and detecting deception in text-based computer-mediated communication are presented. First, message feature mining uses message features or cues in CMC messages combined with machine learning techniques to classify messages according to the sender's intent. The method utilizes common classification methods coupled with linguistic analysis of messages for extraction of a number of content-independent input features. A study using message feature mining to classify deceptive and non-deceptive email messages attained classification accuracy between 60% and 80%. Second, speech act profiling is a method for evaluating and visualizing synchronous CMC by creating profiles of conversations and their participants using speech act theory and probabilistic classification methods. Transcripts from a large corpus of speech act annotated conversations are used to train language models and a modified hidden Markov model (HMM) to obtain probable speech acts for sentences, which are aggregated for each conversation participant, creating a set of speech act profiles.
Three studies validating the profiles are detailed, as well as two studies showing speech act profiling's ability to uncover uncertainty related to deception. The methods introduced here are two content-independent methods that represent a possible new direction in text analysis. Both have possible applications outside the context of deception. In addition to aiding deception detection, these methods may also be applicable in information retrieval, technical support training, GSS facilitation support, transportation security, and information assurance.
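Content-independent cue extraction, the first step of message feature mining, can be sketched as below. The cue set (word count, first-person-pronoun ratio, hedging-modifier ratio) and word lists are invented illustrations and do not reproduce the study's actual features:

```python
# Illustrative word lists; a real system would use a validated linguistic lexicon.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
MODIFIERS = {"very", "really", "quite", "possibly", "maybe", "perhaps"}

def message_features(text):
    """Extract content-independent cues from one message: these describe how
    something is said, not what it is about."""
    words = text.lower().split()
    n = len(words)
    return {
        "word_count": n,
        "first_person_ratio": sum(w in FIRST_PERSON for w in words) / n if n else 0.0,
        "modifier_ratio": sum(w in MODIFIERS for w in words) / n if n else 0.0,
    }

f = message_features("Maybe I was possibly out of town that day")
print(f["word_count"])  # 9
```

The resulting feature vectors, rather than raw word content, are what feed the classifier, which is why the approach transfers across topics.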
