71 |
MINING CONSUMER TRENDS FROM ONLINE REVIEWS: AN APPROACH FOR MARKET RESEARCH. Tsubiks, Olga, 10 August 2012 (has links)
We present a novel marketing method for consumer trend detection from online user-generated content, motivated by a gap identified in the market research literature. Existing approaches to trend analysis are generally based on the rating of trends by industry experts through survey questionnaires, interviews, or similar instruments. These methods have proved inherently costly and often suffer from bias. Our approach instead uses information extraction techniques to identify trends in large aggregations of social media data. It is a cost-effective method that reduces the possibility of errors associated with the design of the sample and the research instrument. The effectiveness of the approach is demonstrated in an experiment performed on restaurant review data. The accuracy of the results is at the level of current approaches for both information extraction and market research.
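The core idea of frequency-based trend detection from aggregated review text can be sketched in a few lines of Python. The review texts, months, and growth threshold below are invented for illustration and are not data or parameters from the thesis:

```python
from collections import Counter

# Hypothetical mini-corpus: (month, review text) pairs standing in for
# a large aggregation of restaurant reviews.
reviews = [
    ("2012-01", "loved the gluten free pasta and the quick service"),
    ("2012-01", "great wine list, friendly staff"),
    ("2012-06", "the gluten free menu keeps growing, gluten free bread too"),
    ("2012-06", "tried the gluten free pizza, amazing"),
]

def term_counts(period):
    """Count term occurrences across all reviews from one period."""
    c = Counter()
    for month, text in reviews:
        if month == period:
            c.update(text.split())
    return c

def rising_terms(early, late, min_growth=2):
    """Terms whose mention count grew by at least `min_growth` between periods."""
    a, b = term_counts(early), term_counts(late)
    return {t: b[t] - a[t] for t in b if b[t] - a[t] >= min_growth}

print(rising_terms("2012-01", "2012-06"))
```

On real data the raw counts would be normalized by the number of reviews per period, and candidate terms would be restricted to content words or phrases rather than all tokens.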
|
72 |
Automatic Identification of Protein Characterization Articles in Support of Database Curation. Denroche, Robert, 01 February 2010 (has links)
Experimentally determining the biological function of a protein is a process known as protein characterization. Establishing the role a specific protein plays is a vital step toward fully understanding the biochemical processes that drive life in all its forms. In order for researchers to efficiently locate and benefit from the results of protein characterization experiments, the relevant information is compiled into public databases. To populate such databases, curators, who are experts in the biomedical domain, must search the literature to obtain the relevant information, as the experiment results are typically published in scientific journals. The database curators identify relevant journal articles, read them, and then extract the required information into the database. In recent years the rate of biomedical research has greatly increased, and database curators are unable to keep pace with the number of articles being published. Consequently, maintaining an up-to-date database of characterized proteins, let alone populating a new database, has become a daunting task.
In this thesis, we report our work to reduce the effort required from database curators in order to create and maintain a database of characterized proteins. We describe a system we have designed for automatically identifying relevant articles that discuss the results of protein characterization experiments. Classifiers are trained and tested using a large dataset of abstracts, which we collected from articles referenced in public databases, as well as small datasets of manually labeled abstracts. We evaluate both a standard and a modified naïve Bayes classifier and examine several different feature sets for representing articles. Our findings indicate that the resulting classifier performs well enough to be considered useful by the curators of a characterized protein database. / Thesis (Master, Computing) -- Queen's University, 2010-01-28 18:45:17.249
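A minimal multinomial naive Bayes classifier of the standard kind evaluated in the thesis can be sketched as follows. The training abstracts and labels are invented stand-ins; a real system would train on the large abstract datasets described above:

```python
import math
from collections import Counter, defaultdict

# Toy stand-in for the task: label abstracts as discussing protein
# characterization ("relevant") or not. Texts are invented examples.
train = [
    ("we characterized the enzymatic function of this protein", "relevant"),
    ("site directed mutagenesis revealed the protein active site", "relevant"),
    ("a survey of database curation workflows", "irrelevant"),
    ("we present a new text indexing data structure", "irrelevant"),
]

def fit(docs):
    """Train multinomial naive Bayes counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Pick the label maximizing the log posterior with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict(model, "the protein active site was characterized"))
```

In practice the feature sets the thesis compares (word n-grams, domain terms, and so on) would replace the simple whitespace tokenization used here.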
|
73 |
Extracting Structured Knowledge from Textual Data in Software Repositories. Hasan, Maryam, Unknown Date
No description available.
|
74 |
Text Mining Supported Skill Monitoring - A Framework for Analyzing Job Announcements with Special Focus on Curriculum Planning and Spatial Applications. Ledermüller, Karl, 18 August 2011 (has links) (PDF)
In our fast-changing global village, the wealth of nations and the wealth of individuals are to some extent determined by a production factor called human capital.
Nations are seen as more competitive, and therefore as creating a higher level of wealth, if they have a better-educated workforce. On an individual basis, human capital, understood as one's skills and competencies, also determines success on the labor market, and this success generates individual wealth. The probability of an individual obtaining a suitable job is assumed to be higher if the skills, competencies and signals of the employee reflect those required on the job market. This dissertation explores the required skills, competencies and signals by screening job announcements and analyzing them with text mining techniques. (author's abstract)
Part I, chapter 1 gives an overview of relevant literature dealing with the economic dimension of knowledge. Starting from the idea of the knowledge-based economy, the question "What is useful knowledge?" is raised and discussed along the ideas of Mokyr (2005). These ideas form the framework of the heuristic model for job-announcement-based competence classification (see chapter 2.5). This classification is the foundation of the first application, the curricular investigation (see chapter 8). To fill the framework with content, the historical development of the role of skills, competencies and signals is briefly discussed.
Starting with the competence and skill dimension in Smith's (1868) famous "Wealth of Nations", the dissertation focuses on the 1960s, when Schultz (1961) (re-)invented the idea of human capital and the importance of investing in this factor; Theodore W. Schultz received a Nobel Prize for these ideas. Additionally, disparities and similarities between the approaches of Bourdieu (2005), a famous sociologist, and Nobel laureate Spence (1973) are discussed.
Chapter 2 debates personal competence from an educational perspective. After discussing "What is educational quality?" and "Who is interested in high-quality education?", it is argued that employability seems to be important for all stakeholder groups. Basic concepts of employability skills and competencies are defined. The theory comparison in chapter 2.5 leads to a heuristic model for job-announcement-based competence classification; this model could, however, also be applied to other problems. Chapter 3 defines the role of job announcements (and the skills and competencies they contain) and the critical assumptions that underlie the analysis of job announcements.
Part II explains the methodology: how the data were harvested from the internet (chapter 4), how they were pre- and post-processed (chapter 5), and how job announcements were linked to their regional origin (chapter 7).
Part III shows two possible applications. The first is a text-mining-based context analysis of finance-related job announcements that helps find strategies to support curriculum planning focused on employability (see chapter 8). The second shows (regional) credential inflation effects based on the core/periphery model of Krugman (1991), which are seen as an "adverse reaction" to the knowledge-based economy idea (see chapter 9). (author's abstract)
|
75 |
Personalized Medicine through Automatic Extraction of Information from Medical Texts. Frunza, Oana Magdalena, 17 April 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distill and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven to help address such challenges. The current work proposes automatic, reliable solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to prove that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems.
From the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to the following four:
1. Automatic identification of obesity-related diseases by using only textual clinical data;
2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews;
3. Automatic identification of gene functions based on textual data of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.
This thesis's investigation into automatic solutions for achieving personalized medicine through information identification and extraction focused on individual, specific problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies that I propose are supported by in-depth research experiments and thorough discussions and conclusions.
|
76 |
An Exploratory Study on Disaster Imagination: Comparing University Students' Imagination with Cases from the Great Hanshin-Awaji Earthquake. 元吉, 忠寛 (MOTOYOSHI, Tadahiro), 20 April 2006 (has links)
Uses content digitized by the National Institute of Informatics.
|
77 |
A coprocessor for fast searching in large databases: Associative Computing Engine. Layer, Christophe, January 2007 (has links)
Ulm, Univ., Diss., 2007.
|
78 |
Web-based named entity recognition and data integration to accelerate molecular biology research. Pafilis, Evangelos, January 2008 (has links)
Heidelberg, Univ., Diss., 2008. / Published online: 2009.
|
79 |
Text mining techniques for the comparative analysis of text meaning. Πλώτα, Δέσποινα, 27 December 2010 (links)
In recent decades, enormous amounts of data have been produced by the various processes organized with the use of computing systems.
Most of these data, of course, exist in the form of text, and this type of unstructured data usually lacks "data about the data". The need for automated extraction of useful knowledge from huge amounts of textual data, in order to support human analysis, is therefore obvious.
Text mining is a new research area that attempts to solve the problem of information overload by using techniques from data mining, machine learning, natural language processing, information retrieval, information extraction and knowledge management.
Building on text mining, we present in this thesis a methodology for extracting knowledge from text, with the ultimate goal of attributing the authorship of two works to a specific author.
The main question of interest is the following: are the Iliad and the Odyssey works of the same poet?
Our methodology is based on analyzing the "signified" rather than the "signifier" in the Iliad and the Odyssey.
In a first phase we transform the data: only the nouns, verbs, adjectives and adverbs were retained, organized into groups of synonyms, where each group represents a concept. We chose to analyze the relations between these concepts, so we converted every sentence in the text into a sentence consisting only of these concepts, naturally eliminating duplicates.
We then transformed the text into a structured form so that it could be stored as "records" of a database; specifically, we treated contiguous segments of text as such "records". We experimented with defining either one sentence or two consecutive sentences as a "record", and used the Apriori algorithm to extract "association rules" of the form "90% of the records that contain concept x also contain concept y". We extracted a large number of strong associations between the same concepts in both poems (e.g. "earth"-"man"). There are also associations between different concepts (e.g. "battle"-"man" only in the Iliad) and different associations for the same concept (e.g. "hero"-"battle" in the Iliad and "hero"-"dwelling" in the Odyssey). However, we found no contradiction. These results may lead to the conclusion that Homer wrote both epics. / What is generally called "the Homeric question" is by far the oldest author-attribution problem. The Homeric question really encompasses several issues, e.g. are the Iliad and Odyssey each the work of a single poet? In this paper we try to answer the question using a data mining technique. Data mining is an emerging research area that develops techniques for knowledge discovery in huge volumes of data. Data mining methods have been applied to a wide variety of domains, from market basket analysis to the analysis of satellite pictures and human genomes.
More specifically, in this paper we present an application of data mining in discovering whether a document can be ascribed to a writer. Our methodology is based on analyzing the content rather than the syntax. In particular, we propose a technique for mining association rules, in order to analyze associations amongst concepts. We also demonstrate the results of the analyses which we have undertaken using this algorithm.
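The association-rule step described above, where each record is the set of concepts in one or two sentences and rules of the form "x% of the records with concept x also contain concept y" are extracted, can be sketched as follows. The concept sets and thresholds below are invented for illustration, not data from the epics:

```python
from itertools import combinations
from collections import Counter

# Toy "records": each is the set of concepts in a short text segment,
# mimicking the one-or-two-sentence records used in the thesis.
records = [
    {"hero", "battle", "man"},
    {"hero", "battle", "earth"},
    {"earth", "man"},
    {"hero", "battle"},
    {"earth", "man", "battle"},
]

def association_rules(records, min_support=2, min_conf=0.6):
    """Pairwise rules x -> y with support and confidence thresholds."""
    item_count = Counter()
    pair_count = Counter()
    for r in records:
        item_count.update(r)
        pair_count.update(combinations(sorted(r), 2))
    rules = []
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        # Each frequent pair yields up to two directed rules.
        for x, y in ((a, b), (b, a)):
            conf = n / item_count[x]
            if conf >= min_conf:
                rules.append((x, y, round(conf, 2)))
    return sorted(rules)

print(association_rules(records))
```

The full Apriori algorithm also grows frequent itemsets beyond pairs; this sketch keeps only the pairwise case, which already captures rules like "every record with 'hero' also contains 'battle'".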
|
80 |
Evaluation of Dynamic Capabilities through Business Analytics Techniques. Scherer, Jonatas Ost, January 2017
The development of dynamic capabilities enables a company to innovate more efficiently and therefore to improve its performance. This thesis presents a framework for measuring the degree of development of a firm's dynamic capabilities. Using text mining techniques, a bag of words specific to dynamic capabilities is proposed and, based on the literature, a set of routines is proposed for evaluating the operationalization and development of dynamic capabilities. To evaluate the dynamic capabilities, text mining techniques were applied using the annual reports of fourteen airlines as the data source; through this pilot application it was possible to carry out a diagnosis of the airlines and of the sector as a whole.
The thesis addresses a gap in the dynamic capabilities literature by proposing a quantitative method for their measurement, together with a bag of words specific to dynamic capabilities. In practical terms, the proposition can contribute to data-driven strategic decision making, allowing firms to innovate more efficiently and to improve performance.
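The bag-of-words scoring of annual reports can be sketched as follows. The dimension names, term lists, and report excerpt below are illustrative assumptions, not the vocabulary or data derived in the thesis:

```python
import re
from collections import Counter

# Illustrative bag of words grouped by dynamic-capability dimension;
# the thesis derives its own vocabulary, which is not reproduced here.
DC_TERMS = {
    "sensing": ["monitor", "scan", "identify"],
    "seizing": ["invest", "launch", "acquire"],
    "reconfiguring": ["restructure", "integrate", "transform"],
}

def dc_profile(report_text):
    """Score a report on each dimension by summed term frequency."""
    tokens = Counter(re.findall(r"[a-z]+", report_text.lower()))
    return {dim: sum(tokens[t] for t in terms)
            for dim, terms in DC_TERMS.items()}

report = ("This year we continued to monitor market shifts, "
          "launch two new routes and restructure our fleet, "
          "while we invest in systems that integrate operations.")
print(dc_profile(report))
```

On real annual reports the scores would typically be normalized by document length so that firms with longer reports are not scored higher by default.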
|