81

The textcat Package for n-Gram Based Text Categorization in R

Feinerer, Ingo; Buchta, Christian; Geiger, Wilhelm; Rauch, Johannes; Mair, Patrick; Hornik, Kurt (PDF)
Identifying the language used is typically the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, those employing the Cavnar and Trenkle (1994) approach to text categorization, based on character n-gram frequencies, have been particularly successful. This paper presents the R extension package textcat for n-gram based text categorization, which implements both the Cavnar and Trenkle approach and a reduced n-gram approach designed to remove redundancies of the original. A multilingual corpus obtained from Wikipedia pages on a selection of topics is used to illustrate the functionality of the package and the performance of the provided language identification methods. (authors' abstract)
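To make the underlying idea concrete, here is a minimal Python sketch of the Cavnar and Trenkle "out-of-place" measure that textcat builds on. This illustrates the approach only, not the package's implementation (which is in R), and the toy training strings stand in for the larger corpora real profiles are built from.

```python
# A minimal sketch of Cavnar & Trenkle (1994) n-gram profile matching.
# Toy training strings below are stand-ins for real language corpora.
from collections import Counter

def ngram_profile(text, n_max=5, top_k=300):
    """Rank the most frequent character n-grams (n = 1..n_max) of a text."""
    text = "_" + "_".join(text.lower().split()) + "_"  # mark word boundaries
    counts = Counter(
        text[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(text) - n + 1)
    )
    return [g for g, _ in counts.most_common(top_k)]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank displacements; unseen n-grams rank at the bottom."""
    rank = {g: r for r, g in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(abs(r - rank.get(g, penalty)) for r, g in enumerate(doc_profile))

profiles = {
    "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
    "german": ngram_profile("der schnelle braune fuchs springt über den faulen hund"),
}
doc = ngram_profile("the dog sleeps")
print(min(profiles, key=lambda lang: out_of_place(doc, profiles[lang])))
```

Ranking candidate languages by this distance between a document's profile and each stored language profile is the categorization step the package exposes.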
82

MINING CONSUMER TRENDS FROM ONLINE REVIEWS: AN APPROACH FOR MARKET RESEARCH

Tsubiks, Olga 10 August 2012
We present a novel marketing method for consumer trend detection from online user-generated content, motivated by a gap identified in the market research literature. Existing approaches to trend analysis generally rely on ratings of trends by industry experts, gathered through survey questionnaires, interviews, or similar instruments. These methods are inherently costly and often suffer from bias. Our approach instead uses information extraction techniques to identify trends in large aggregations of social media data. It is a cost-effective method that reduces the possibility of errors associated with the design of the sample and the research instrument. The effectiveness of the approach is demonstrated in an experiment on restaurant review data, with accuracy comparable to current approaches in both information extraction and market research.
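As a rough illustration of trend detection over user-generated content, the hedged Python sketch below flags terms whose relative frequency rises between two time buckets of reviews. The thesis's extraction pipeline is considerably richer; the sample reviews and thresholds here are invented for the example.

```python
# A hedged sketch: surface terms whose relative frequency grows between
# two time periods of reviews. Sample data are invented for illustration.
from collections import Counter

def term_freqs(reviews):
    counts = Counter(w for r in reviews for w in r.lower().split())
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}

def rising_terms(old_reviews, new_reviews, min_new_freq=0.01):
    """Terms with the largest relative-frequency growth between periods."""
    old, new = term_freqs(old_reviews), term_freqs(new_reviews)
    growth = {
        w: f / old.get(w, 1e-6)  # unseen-before terms get a large ratio
        for w, f in new.items()
        if f >= min_new_freq
    }
    return sorted(growth, key=growth.get, reverse=True)

last_year = ["great pasta and friendly staff", "the pasta was fine"]
this_year = ["amazing vegan burger", "vegan options and great burger",
             "loved the burger"]
print(rising_terms(last_year, this_year)[:3])
```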
83

Automatic Identification of Protein Characterization Articles in support of Database Curation

Denroche, Robert 01 February 2010
Experimentally determining the biological function of a protein is a process known as protein characterization. Establishing the role a specific protein plays is a vital step toward fully understanding the biochemical processes that drive life in all its forms. In order for researchers to efficiently locate and benefit from the results of protein characterization experiments, the relevant information is compiled into public databases. To populate such databases, curators, who are experts in the biomedical domain, must search the literature to obtain the relevant information, as the experiment results are typically published in scientific journals. The database curators identify relevant journal articles, read them, and then extract the required information into the database. In recent years the rate of biomedical research has greatly increased, and database curators are unable to keep pace with the number of articles being published. Consequently, maintaining an up-to-date database of characterized proteins, let alone populating a new database, has become a daunting task. In this thesis, we report our work to reduce the effort required from database curators in order to create and maintain a database of characterized proteins. We describe a system we have designed for automatically identifying relevant articles that discuss the results of protein characterization experiments. Classifiers are trained and tested using a large dataset of abstracts, which we collected from articles referenced in public databases, as well as small datasets of manually labeled abstracts. We evaluate both a standard and a modified naïve Bayes classifier and examine several different feature sets for representing articles. Our findings indicate that the resulting classifier performs well enough to be considered useful by the curators of a characterized protein database. / Thesis (Master, Computing) -- Queen's University, 2010.
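A standard naïve Bayes text classifier of the kind the thesis evaluates can be sketched with scikit-learn (assumed available); the labeled abstracts below are placeholders, not the thesis's collected dataset, and the modified classifier and richer feature sets it examines are not shown.

```python
# A minimal sketch of a standard multinomial naive Bayes abstract classifier.
# Placeholder abstracts and labels; not the thesis's data or modified model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

abstracts = [
    "we characterize the enzymatic function of protein X in vitro",
    "kinetic assays establish the catalytic role of the purified enzyme",
    "we present a survey of sequence alignment algorithms",
    "a new database schema for genomic annotations is described",
]
labels = [1, 1, 0, 0]  # 1 = reports a protein characterization experiment

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(abstracts, labels)
print(model.predict(["we measure the binding activity of protein Y"]))
```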
84

Extracting Structured Knowledge from Textual Data in Software Repositories

Hasan, Maryam Unknown Date
No description available.
85

A Proposal for a Hospital Search Support System Based on Disease-Name Retrieval from Symptoms

SUGIURA, Shin-ichi; FURUHASHI, Takeshi; YOSHIKAWA, Tomohiro; HAO, Bo
No description available.
86

Text Mining Supported Skill Monitoring - A Framework for Analyzing Job Announcements with Special Focus on Curriculum Planning and Spatial Applications

Ledermüller, Karl 18 August 2011 (PDF)
In our fast-changing global village, the wealth of nations and of individuals is to some extent determined by a production factor called human capital. Nations are considered more competitive, and therefore create a higher level of wealth, if they have a better-educated workforce. On an individual level, human capital, understood as one's skills and competencies, likewise determines success on the labor market, and this success generates individual wealth. The probability of an individual obtaining a suitable job is assumed to be higher if the skills, competencies and signals of the employee match those required on the job market. This dissertation explores the required skills, competencies and signals by screening job announcements and analyzing them with text mining techniques. (author's abstract)

Part I, chapter 1 gives an overview of relevant literature on the economic dimension of knowledge. Starting from the idea of the knowledge-based economy, the question "What is useful knowledge?" is raised and discussed following the ideas of Mokyr (2005). These ideas form the framework of the heuristic model for job-announcement-based competence classification (see chapter 2.5), which in turn is the foundation of the first application, the curricular investigation (see chapter 8). To fill the framework with content, the historical development of the role of skills, competencies and signals is briefly discussed: starting with the competence and skill dimension in the "Wealth of Nations" of Smith (1868), the dissertation moves to the 1960s, when Schultz (1961) (re-)invented the idea of human capital and the importance of investing in this factor; Theodore W. Schultz received a Nobel Prize for these ideas. Additionally, disparities and similarities between the approaches of Bourdieu (2005), a famous sociologist, and Nobel laureate Spence (1973) are discussed.

Chapter 2 treats personal competence from an educational perspective. After discussing "What is educational quality?" and "Who is interested in high-quality education?", it is argued that employability is important for all stakeholder groups, and basic concepts of employability skills and competencies are defined. The theory comparison in chapter 2.5 leads to the heuristic model for job-announcement-based competence classification; this model could, however, be applied to other problems as well. Chapter 3 defines the role of job announcements (and the skills and competencies they contain) and the critical assumptions behind their analysis.

Part II explains the methodology: how the data were harvested from the internet (chapter 4), how they were pre- and post-processed (chapter 5), and how job announcements were connected with their regional origin (chapter 7).

Part III presents two applications. The first is a text-mining-based context analysis of finance-related job announcements intended to support curriculum planning focused on employability (see chapter 8). The second shows (regional) credential-inflation effects based on the core/periphery model of Krugman (1991), which are seen as an "adverse reaction" to the knowledge-based economy idea (see chapter 9). (author's abstract)
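As a hedged sketch of what a job-announcement-based skill monitor might start from, the Python fragment below counts mentions of a fixed skill vocabulary across harvested job ads. Both the vocabulary and the ads are invented for illustration; the dissertation's framework goes well beyond such keyword counting.

```python
# A hedged sketch: frequency of vocabulary skills across job announcements.
# SKILLS and the sample ads are assumptions made for this illustration.
from collections import Counter

SKILLS = {"sql", "accounting", "ifrs", "python", "negotiation", "controlling"}

def skill_counts(announcements):
    """Count how often each vocabulary skill appears across job ads."""
    counts = Counter()
    for ad in announcements:
        tokens = set(ad.lower().replace(",", " ").split())
        counts.update(tokens & SKILLS)  # each skill counted once per ad
    return counts

ads = [
    "Financial analyst: IFRS reporting, controlling, SQL",
    "Junior accountant with accounting and IFRS background",
]
print(skill_counts(ads).most_common())
```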
87

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012
The wealth of medical information available today constitutes a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, and similar sources all represent important medical information that can assist in the medical decision-making process. The challenge in accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information, and computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing it. This work proposes automatic, reliable solutions for tasks that can help achieve personalized medicine: a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic observations and test results alone are not enough when assessing and treating a medical case; genetic, lifestyle, background and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to show that natural language processing and machine learning techniques offer reliable solutions to important medical problems. From the numerous research problems that arise when implementing personalized medicine, the scope of this thesis is restricted to four:

1. Automatic identification of obesity-related diseases using only textual clinical data;
2. Automatic identification of relevant abstracts of published research for building systematic reviews;
3. Automatic identification of gene functions based on the text of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.

The investigation focused on individual problems that can later be linked in a puzzle-building manner. A diverse representation technique following a divide-and-conquer methodological approach proves to be the most reliable way to build automatic models for the above tasks. The proposed methodologies are supported by in-depth research experiments and thorough discussions and conclusions.
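To illustrate the "diverse representation" idea on one of the four tasks, the sketch below combines word-level and character-level features for classifying clinical notes with scikit-learn. The feature choices, labels and miniature dataset are assumptions made for the example, not the thesis's actual setup.

```python
# A hedged sketch: combining two text representations for clinical notes.
# Features, labels, and the tiny dataset are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, make_pipeline

features = FeatureUnion([
    ("words", TfidfVectorizer(ngram_range=(1, 2))),
    ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
])

notes = [
    "patient presents with type 2 diabetes and hypertension",
    "obese patient, BMI 41, with obstructive sleep apnea",
    "routine follow-up, no acute complaints",
    "annual physical, labs within normal limits",
]
labels = [1, 1, 0, 0]  # 1 = mentions an obesity-related comorbidity

model = make_pipeline(features, LogisticRegression())
model.fit(notes, labels)
print(model.predict(["follow-up for diabetes management"]))
```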
88

An Exploratory Study of the Ability to Imagine Disasters: Comparing University Students' Imagination with Cases from the Great Hanshin-Awaji Earthquake

MOTOYOSHI, Tadahiro 20 April 2006
This entry uses content digitized by the National Institute of Informatics.
89

A coprocessor for fast searching in large databases: Associative Computing Engine

Layer, Christophe January 2007
Doctoral dissertation, University of Ulm, 2007.
90

Web-based named entity recognition and data integration to accelerate molecular biology research

Pafilis, Evangelos January 2008
Doctoral dissertation, University of Heidelberg, 2008. / Published online: 2009.
