81

Automated Analysis Techniques for Online Conversations with Application in Deception Detection

Twitchell, Douglas P. January 2005
Email, chat, instant messaging, blogs, and newsgroups are now common ways for people to interact. Along with these new ways of sending, receiving, and storing messages comes the challenge of organizing, filtering, and understanding them, for which text mining has been shown to be useful, using both content-dependent and content-independent methods.

Unfortunately, computer-mediated communication (CMC) has also provided criminals, terrorists, spies, and other threats to security with a means of efficient communication. However, the often textual encoding of these communications may also make it possible to detect and track those who are deceptive. Two methods for organizing, filtering, understanding, and detecting deception in text-based computer-mediated communication are presented.

First, message feature mining uses message features or cues in CMC messages, combined with machine learning techniques, to classify messages according to the sender's intent. The method couples common classification methods with linguistic analysis of messages to extract a number of content-independent input features. A study using message feature mining to classify deceptive and non-deceptive email messages attained classification accuracy between 60% and 80%.

Second, speech act profiling is a method for evaluating and visualizing synchronous CMC by creating profiles of conversations and their participants using speech act theory and probabilistic classification methods. Transcripts from a large corpus of speech-act-annotated conversations are used to train language models and a modified hidden Markov model (HMM) that assign probable speech acts to sentences; these are aggregated for each conversation participant, creating a set of speech act profiles. Three studies validating the profiles are detailed, as well as two studies showing speech act profiling's ability to uncover uncertainty related to deception.

The two content-independent methods introduced here represent a possible new direction in text analysis, and both have possible applications outside the context of deception. In addition to aiding deception detection, they may also be applicable in information retrieval, technical support training, GSS facilitation support, transportation security, and information assurance.
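As a rough illustration of the message feature mining idea only (not the dissertation's implementation), the sketch below computes a few content-independent cues per message and feeds them to a standard classifier; the cue set, toy messages, and labels are all hypothetical.

```python
# A minimal sketch of message feature mining: content-independent cues
# plus an off-the-shelf classifier. Cues, messages, and labels are
# hypothetical, for illustration only.
import re

from sklearn.linear_model import LogisticRegression

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}

def cue_vector(message: str) -> list[float]:
    """Map a message to content-independent cues: quantity, word length,
    self-reference rate, and punctuation density."""
    words = re.findall(r"[a-zA-Z']+", message.lower())
    n = max(len(words), 1)
    return [
        float(len(words)),                          # quantity
        sum(len(w) for w in words) / n,             # average word length
        sum(w in FIRST_PERSON for w in words) / n,  # self-reference rate
        sum(message.count(c) for c in ".,!?") / n,  # punctuation density
    ]

# Toy training data (1 = deceptive, 0 = truthful -- labels are invented).
messages = [
    "I was definitely at the office all day, trust me!",
    "The report is attached; the numbers are in section two.",
    "Honestly, we never received any of those invoices!",
    "Meeting moved to 3pm; agenda unchanged.",
]
labels = [1, 0, 1, 0]

clf = LogisticRegression().fit([cue_vector(m) for m in messages], labels)
print(clf.predict([cue_vector("Believe me, I had nothing to do with it!")]))
```

In practice the cues would come from the deception literature and the classifier would be trained on a labeled corpus of real messages, as in the email study the abstract describes.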
82

The textcat Package for n-Gram Based Text Categorization in R

Feinerer, Ingo, Buchta, Christian, Geiger, Wilhelm, Rauch, Johannes, Mair, Patrick, Hornik, Kurt 02 1900
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram frequencies have been particularly successful. This paper presents the R extension package textcat for n-gram based text categorization, which implements both the Cavnar and Trenkle approach and a reduced n-gram approach designed to remove redundancies of the original approach. A multi-lingual corpus obtained from the Wikipedia pages available on a selection of topics is used to illustrate the functionality of the package and the performance of the provided language identification methods. (authors' abstract)
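Since textcat itself is an R package, the following Python sketch only illustrates the underlying Cavnar and Trenkle method: build ranked character n-gram frequency profiles and classify by the "out-of-place" rank distance. The profile size, n-gram range, and tiny reference texts are placeholders; real profiles are trained on much larger corpora, such as the Wikipedia pages used in the paper.

```python
# A Python sketch of the Cavnar and Trenkle (1994) method underlying
# textcat: ranked character n-gram profiles compared by the
# "out-of-place" distance. Reference texts here are toy placeholders.
from collections import Counter

def profile(text: str, max_n: int = 5, size: int = 300) -> dict[str, int]:
    """Return {ngram: rank} for the `size` most frequent character n-grams."""
    text = "_" + "_".join(text.lower().split()) + "_"  # mark word boundaries
    counts = Counter(
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    )
    return {g: rank for rank, (g, _) in enumerate(counts.most_common(size))}

def out_of_place(doc: dict[str, int], ref: dict[str, int]) -> int:
    """Sum of rank displacements; n-grams absent from the reference
    profile incur the maximum penalty."""
    return sum(abs(rank - ref.get(g, len(ref))) for g, rank in doc.items())

# Reference profiles from tiny samples; real ones come from large corpora.
refs = {
    "english": profile("the quick brown fox jumps over the lazy dog"),
    "german": profile("der schnelle braune fuchs springt ueber den faulen hund"),
}
doc = profile("the dog sleeps under the tree")
print(min(refs, key=lambda lang: out_of_place(doc, refs[lang])))
```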
83

MINING CONSUMER TRENDS FROM ONLINE REVIEWS: AN APPROACH FOR MARKET RESEARCH

Tsubiks, Olga 10 August 2012
We present a novel marketing method for consumer trend detection from online user-generated content, motivated by a gap identified in the market research literature. Existing approaches to trend analysis are generally based on ratings of trends by industry experts, gathered through survey questionnaires, interviews, or similar instruments. These methods are inherently costly and often suffer from bias. Our approach instead uses information extraction techniques to identify trends in large aggregations of social media data. It is a cost-effective method that reduces the possibility of errors associated with the design of the sample and the research instrument. The effectiveness of the approach is demonstrated in an experiment performed on restaurant review data. The accuracy of the results is at the level of current approaches in both information extraction and market research.
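The abstract does not specify the extraction pipeline; as a loose illustration of trend detection from time-stamped reviews, the sketch below counts occurrences of a candidate term per month and flags monotone growth. The term, the reviews, and the "rising" criterion are all invented for illustration.

```python
# An invented illustration of review-based trend detection: count a
# candidate term per month and flag monotone growth. The term, data,
# and "rising" criterion are placeholders, not the thesis pipeline.
from collections import Counter

reviews = [  # hypothetical (month, review text) pairs
    ("2012-01", "loved the gluten free pasta"),
    ("2012-02", "the gluten free menu is great"),
    ("2012-02", "try the gluten free pizza"),
    ("2012-03", "gluten free options everywhere, gluten free desserts too"),
]

def monthly_counts(reviews: list[tuple[str, str]], term: str) -> Counter:
    counts: Counter = Counter()
    for month, text in reviews:
        counts[month] += text.lower().count(term)
    return counts

counts = monthly_counts(reviews, "gluten free")
months = sorted(counts)
rising = all(counts[a] <= counts[b] for a, b in zip(months, months[1:]))
print(dict(counts), "-> rising trend" if rising else "-> flat")
```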
84

Automatic Identification of Protein Characterization Articles in support of Database Curation

Denroche, Robert 01 February 2010
Experimentally determining the biological function of a protein is a process known as protein characterization. Establishing the role a specific protein plays is a vital step toward fully understanding the biochemical processes that drive life in all its forms. In order for researchers to efficiently locate and benefit from the results of protein characterization experiments, the relevant information is compiled into public databases. To populate such databases, curators, who are experts in the biomedical domain, must search the literature to obtain the relevant information, as the experiment results are typically published in scientific journals. The database curators identify relevant journal articles, read them, and then extract the required information into the database. In recent years the rate of biomedical research has greatly increased, and database curators are unable to keep pace with the number of articles being published. Consequently, maintaining an up-to-date database of characterized proteins, let alone populating a new database, has become a daunting task. In this thesis, we report our work to reduce the effort required from database curators in order to create and maintain a database of characterized proteins. We describe a system we have designed for automatically identifying relevant articles that discuss the results of protein characterization experiments. Classifiers are trained and tested using a large dataset of abstracts, which we collected from articles referenced in public databases, as well as small datasets of manually labeled abstracts. We evaluate both a standard and a modified naïve Bayes classifier and examine several different feature sets for representing articles. Our findings indicate that the resulting classifier performs well enough to be considered useful by the curators of a characterized protein database. / Thesis (Master, Computing) -- Queen's University, 2010-01-28.
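As a minimal sketch of this abstract-triage task, assuming nothing beyond the abstract's description, the following trains a standard multinomial naive Bayes on bag-of-words features; the toy abstracts and labels are hypothetical, and the thesis additionally evaluates a modified naive Bayes and several richer feature sets.

```python
# A minimal sketch of triaging abstracts with multinomial naive Bayes
# over bag-of-words features. Abstracts and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

abstracts = [  # 1 = reports a protein characterization experiment
    "We determined the catalytic function of the XyzA protein in vitro.",
    "A review of sequencing platforms and their cost trends.",
    "Knockout assays show the enzyme hydrolyzes chitin.",
    "We survey database curation workflows and staffing models.",
]
labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(abstracts, labels)
print(model.predict(["Assays confirm the protein binds and cleaves cellulose."]))
```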
85

Extracting Structured Knowledge from Textual Data in Software Repositories

Hasan, Maryam Unknown Date
No description available.
86

A Proposal for a Hospital Search Support System Based on Disease-Name Retrieval Support from Symptoms

SUGIURA, Shin-ichi, FURUHASHI, Takeshi, YOSHIKAWA, Tomohiro, HAO, Bo 01 1900
No description available.
87

Text Mining Supported Skill Monitoring - A Framework for Analyzing Job Announcements with Special Focus on Curriculum Planning and Spatial Applications

Ledermüller, Karl 18 August 2011
In our fast-changing global village, the wealth of nations and of individuals is to some extent determined by a production factor called human capital. Nations are seen as more competitive, and therefore as creating a higher level of wealth, if they have a better-educated workforce. On an individual basis, human capital, understood as one's skills and competencies, likewise determines success on the labor market, and this success generates individual wealth. The probability of an individual obtaining a suitable job is assumed to be higher if the skills, competencies, and signals of the employee match those required on the job market. This dissertation explores the required skills, competencies, and signals by screening job announcements and analyzing them with text mining techniques. (author's abstract)

Part I, Chapter 1 gives an overview of relevant literature dealing with the economic dimension of knowledge. Starting from the idea of the knowledge-based economy, the question "What is useful knowledge?" is raised and discussed in light of the ideas of Mokyr (2005). These ideas form the framework of the heuristic model for job-announcement-based competence classification (see Chapter 2.5), which is the foundation of the first application, the curricular investigation in Chapter 8. To fill the framework with content, the historical development of the role of skills, competencies, and signals is briefly discussed: starting with the competence and skill dimension in Smith's (1868) "Wealth of Nations", the dissertation moves to the 1960s, when Schultz (1961) (re-)invented the idea of human capital and the importance of investing in this factor, work for which Theodore W. Schultz received a Nobel Prize. Additionally, disparities and similarities between the approaches of the sociologist Bourdieu (2005) and Nobel laureate Spence (1973) are discussed.

Chapter 2 examines personal competence from an educational perspective. After discussing what educational quality is and who is interested in high-quality education, it is argued that employability is important for all stakeholder groups. Basic concepts of employability skills and competencies are defined. The theory comparison in Chapter 2.5 leads to the heuristic model for job-announcement-based competence classification; this model could, however, be applied to other problems as well. Chapter 3 defines the role of job announcements (and the skills and competencies they contain) and the critical assumptions behind the analysis of job announcements.

Part II explains the methodology: how the data were harvested from the internet (Chapter 4), how they were pre- and post-processed (Chapter 5), and how job announcements were connected with their regional origin (Chapter 7).

Part III shows two possible applications. The first is a text-mining-based context analysis of finance-related job announcements to help find strategies for employability-focused curriculum planning (see Chapter 8). The second shows (regional) credential inflation effects, based on the core/periphery model of Krugman (1991), which are seen as an "adverse reaction" to the knowledge-based economy idea (see Chapter 9). (author's abstract)
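As a loose sketch of the skill-monitoring idea (not the dissertation's framework), the snippet below counts which terms from a small skill dictionary occur in a batch of job announcements; the skill list and the announcements are invented for illustration.

```python
# An invented illustration of skill monitoring: count which dictionary
# skills appear in job announcements. Skill list and ads are placeholders.
import re
from collections import Counter

SKILLS = ["sql", "excel", "accounting", "risk management"]  # hypothetical

ads = [
    "Analyst needed: Excel, SQL and risk management experience required.",
    "Junior accountant: accounting degree and strong Excel skills.",
]

counts: Counter = Counter()
for ad in ads:
    for skill in SKILLS:
        if re.search(rf"\b{re.escape(skill)}\b", ad, flags=re.IGNORECASE):
            counts[skill] += 1

print(counts.most_common())  # which skills dominate the announcements
```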
88

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distill and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing such challenges. The present work proposes automatic, reliable solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case; genetic, lifestyle, background, and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to show that natural language processing and machine learning techniques represent reliable solutions to important medical-related problems. From the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to four:

1. Automatic identification of obesity-related diseases using only textual clinical data;
2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews;
3. Automatic identification of gene functions based on textual data of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.

This investigation into automatic solutions for achieving personalized medicine through information identification and extraction focused on individual, specific problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable way of building automatic models that solve the above-mentioned tasks. The proposed methodologies are supported by in-depth research experiments and thorough discussions and conclusions.
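As an illustrative sketch of task 4 only, the snippet below classifies the relation between two masked medical concepts from the sentence context, using a linear classifier over word n-grams; the examples, labels, and masking scheme are hypothetical, and the thesis's actual representations are far richer.

```python
# An illustrative sketch of relation classification between two masked
# medical concepts; examples, labels, and masking are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

examples = [  # sentences with the concept pair masked out
    "CONCEPT1 was prescribed to manage CONCEPT2 .",
    "CONCEPT1 is a known side effect of CONCEPT2 .",
    "CONCEPT1 was administered for CONCEPT2 control.",
    "CONCEPT1 developed after the patient started CONCEPT2 .",
]
labels = ["treats", "caused-by", "treats", "caused-by"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(examples, labels)
print(model.predict(["CONCEPT1 was given to treat CONCEPT2 ."]))
```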
89

An Exploratory Study of Disaster Imagination: Comparing University Students' Imagination with the Case of the Great Hanshin-Awaji Earthquake

MOTOYOSHI, Tadahiro 20 April 2006
Uses content digitized by the National Institute of Informatics.
90

A coprocessor for fast searching in large databases: Associative Computing Engine

Layer, Christophe, January 2007
Dissertation, University of Ulm, 2007.
