11 |
Development of strategies for assessing reporting in biomedical research: moving toward enhancing reproducibility. Florez Vargas, Oscar, January 2016
The idea that the same experimental findings can be reproduced by a variety of independent approaches is one of the cornerstones of science's claim to objective truth. However, in recent years it has become clear that science is plagued by findings that cannot be reproduced, which invalidates research studies and undermines public trust in the research enterprise. The observed lack of reproducibility may result, among other things, from a lack of transparency or completeness in reporting. In particular, omissions in reporting the technical details of the experimental method make it difficult to verify the findings of experimental research in biomedicine. In this context, the assessment of scientific reports could help to overcome, at least in part, the ongoing reproducibility crisis. In addressing this issue, this thesis undertakes the challenge of developing strategies for evaluating the reporting of biomedical experimental methods in scientific manuscripts. Considering the complexity of experimental design, which often involves different technologies and models, we characterise the problem of methods reporting through domain-specific checklists. Then, using checklists as a decision-making tool supported by miniRECH, a spreadsheet-based approach that can be used by authors, editors and peer reviewers, a reasonable level of consensus on reporting assessments was achieved regardless of the domain-specific expertise of referees. In addition, using a text-mining system as a screening tool, we created a framework to guide automated assessment of the reporting of bio-experiments. The usefulness of these strategies was demonstrated in several domain-specific scientific areas as well as in mouse models across biomedical research.
In conclusion, we suggest that the strategies developed in this work could be implemented throughout the publication process, both as barriers that prevent incomplete reporting from entering the scientific literature and as promoters of completeness in reporting that improve the general value of scientific evidence.
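The checklist-based screening described above can be illustrated with a minimal sketch: each checklist item is a set of indicator keywords, and an item counts as "reported" if any keyword appears in the methods text. The item names and keywords below are invented for illustration; they are not taken from miniRECH or the thesis's actual checklists.

```python
# Illustrative checklist-driven screening of a methods section.
# Item names and keyword sets are hypothetical stand-ins.
CHECKLIST = {
    "animal_strain":  {"c57bl/6", "balb/c", "strain"},
    "sample_size":    {"n =", "sample size", "animals per group"},
    "randomisation":  {"randomised", "randomized", "randomly assigned"},
    "blinding":       {"blinded", "blinding"},
}

def assess_reporting(methods_text):
    """Return per-item reported/not-reported flags and a completeness score in [0, 1]."""
    text = methods_text.lower()
    flags = {item: any(kw in text for kw in kws)
             for item, kws in CHECKLIST.items()}
    score = sum(flags.values()) / len(flags)
    return flags, score
```

A real screening tool would of course use curated, domain-specific item definitions and more robust text matching than substring lookup.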
|
12 |
Aspect Based Sentiment Analysis on Review Data. Xue, Wei, 04 December 2017
With the proliferation of user-generated reviews, new opportunities and challenges arise. Advances in Web technologies allow people to access a large number of reviews of products and services online. Knowing what others like and dislike becomes increasingly important for consumers' decision making in online shopping. Retailers also care more than ever about online reviews, because a vast pool of reviews enables them to monitor reputations and collect feedback efficiently. However, people often find it difficult to identify and summarize fine-grained sentiments buried in these opinion-rich resources. Traditional sentiment analysis, which focuses on overall sentiment, fails to uncover sentiments with regard to particular aspects of the reviewed entities.
This dissertation studied the research problem of Aspect Based Sentiment Analysis (ABSA), which aims to reveal the aspect-dependent sentiment information of review text. ABSA consists of several subtasks: 1) aspect extraction, 2) aspect term extraction, 3) aspect category classification, and 4) sentiment polarity classification at the aspect level. We focused on topic models and neural networks for ABSA. First, to extract the aspects from a collection of reviews and to detect the sentiment polarity regarding the aspects in each review, we proposed several probabilistic graphical models that model word distributions in reviews and aspect ratings at the same time. Second, we presented a multi-task learning model based on long short-term memory and convolutional neural networks for aspect category classification and aspect term extraction. Third, for aspect-level sentiment polarity classification, we developed a gated convolutional neural network, which can be applied to aspect category sentiment analysis as well as aspect target sentiment analysis.
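The gating idea behind gated convolutional networks for aspect sentiment can be sketched in a few lines: one filter produces a candidate feature, and a sigmoid gate, informed by the aspect, decides how much of it passes through. The shapes, the scalar aspect contribution, and the single-filter setup below are simplifications for illustration, not the dissertation's actual architecture.

```python
import numpy as np

def gated_conv_unit(x, Wf, Wg, va):
    """One gated convolution step over a window of word vectors.

    x  : (window, dim) matrix of word vectors
    Wf : feature filter, shape (window * dim,)
    Wg : gate filter,    shape (window * dim,)
    va : aspect embedding folded into the gate (a scalar here, for simplicity)
    """
    flat = x.reshape(-1)
    feature = np.tanh(flat @ Wf)                     # candidate sentiment feature
    gate = 1.0 / (1.0 + np.exp(-(flat @ Wg + va)))   # aspect-aware sigmoid gate
    return feature * gate                            # gate suppresses off-aspect features
```

When the aspect term pushes the gate toward zero, the unit's output is suppressed regardless of the candidate feature, which is the mechanism that makes the network selective per aspect.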
|
13 |
Supply chain design: a conceptual model and tactical simulations. Brann, Jeremy Matthew, 15 May 2009
In the current research literature, supply chain management (SCM) is a hot topic that crosses the boundaries of many academic disciplines, and SCM-related work can be found in the literature of each of them. Supply chain management can be defined as effectively and efficiently managing the flows (information, financial and physical) at all stages of the supply chain to add value for end customers and generate profit for all firms in the chain. Supply chains involve multiple partners with the common goal of satisfying customer demand at a profit.
While supply chains are not new, the way academics and practitioners view the need for, and the means to manage, these chains is relatively new. Very little literature can be found on designing supply chains from the ground up or on which dimensions of supply chain management should be considered when designing a supply chain. Additionally, we have found that very few tools exist to help during the design phase of a supply chain, and fewer still allow supply chain designs to be compared.
We contribute to the current literature by determining which supply chain management dimensions should be considered during the design process. We employ text mining to create a supply chain design conceptual model and compare this model to existing supply chain models and reference frameworks. We further contribute to the SCM literature by creatively applying concepts and results from the field of stochastic processes to build a custom simulator capable of comparing different supply chain designs and providing insights into how the different designs affect the supply chain's total inventory cost. The simulator provides a mechanism for testing when real-time demand information is more beneficial than first-come, first-served (FCFS) order processing when the distributional form of lead-time demand is derived from the supply chain's operating characteristics rather than assumed known. We find that in many instances FCFS outperforms the use of real-time information in providing the lowest total inventory cost.
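The total-inventory-cost comparison described above can be illustrated with a minimal single-stage sketch: an order-up-to (base-stock) policy fills a demand stream, and holding plus backorder costs are tallied. The cost rates, lead time, and demand stream below are invented for illustration; the thesis's simulator models far richer supply chain structure and order-processing rules.

```python
def simulate_inventory(demands, order_up_to, lead_time=1, h=1.0, b=5.0):
    """Total holding + backorder cost for a single stage under a base-stock policy.

    h : holding cost per unit per period (assumed value)
    b : backorder cost per unit per period (assumed value)
    """
    on_hand = order_up_to
    pipeline = [0] * lead_time              # replenishment orders in transit
    cost = 0.0
    for d in demands:
        on_hand += pipeline.pop(0)          # receive the oldest outstanding order
        on_hand -= d                        # satisfy (or backorder) this period's demand
        inventory_position = on_hand + sum(pipeline)
        pipeline.append(order_up_to - inventory_position)  # order up to base stock
        cost += h * max(on_hand, 0) + b * max(-on_hand, 0)
    return cost
```

Comparing designs then amounts to running the same demand stream through differently parameterised simulators and comparing the accumulated costs.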
|
14 |
Incident Data Analysis Using Data Mining Techniques. Veltman, Lisa M., 16 January 2010
Several databases collect information on various types of incidents, and most analyses performed on these databases do not go beyond basic trend analysis or counting occurrences. This research uses the more robust methods of data mining and text mining to analyze the Hazardous Substances Emergency Events Surveillance (HSEES) system data by identifying relationships among variables, predicting the occurrence of injuries, and assessing the value added by the text data. The benefits of a thorough analysis of past incidents include a better understanding of safety performance, of how to focus efforts to reduce incidents, and of how people are affected by these incidents.
The results of this research showed that visually exploring the data via bar graphs did not yield any noticeable patterns. Clustering the data identified groupings of categories across the variable inputs, such as manufacturing events resulting from intentional acts like system startup and shutdown, performing maintenance, and improper dumping. Text mining the data allowed for clustering the events and further describing the data; however, these clusters were not noticeably distinct, and drawing conclusions from them was limited. Including the text comments in the overall analysis of HSEES data greatly improved the predictive power of the models. Interpretation of the textual data's contribution was limited; however, the qualitative conclusions drawn were similar to those of the model without textual input. Although HSEES data is collected to describe the effects hazardous substance releases and threatened releases have on people, a fairly good predictive model was still obtained from the few variables identified as cause-related.
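The step of adding free-text comments to a structured incident record, as in the analysis above, can be sketched as a feature-merging function: structured fields become categorical indicator features and comment tokens become bag-of-words counts in the same feature dictionary. The field names and record shape are hypothetical, not the actual HSEES schema.

```python
def build_features(record):
    """Merge structured incident fields with bag-of-words tokens from the
    free-text comment into one feature dict (field names are illustrative)."""
    features = {
        f"industry={record['industry']}": 1,   # categorical indicator features
        f"cause={record['cause']}": 1,
    }
    for token in record.get("comment", "").lower().split():
        key = f"text:{token}"                  # prefixed so text features are traceable
        features[key] = features.get(key, 0) + 1
    return features
```

Feeding such merged dictionaries to a predictive model is one simple way to measure the value added by the text data: train once with and once without the `text:` features and compare accuracy.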
|
15 |
Feature Translation-based Multilingual Document Clustering Technique. Liao, Shan-Yu, 08 August 2006
Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by its significance, this study designs a translation-based MLDC technique. Our empirical evaluation shows that the proposed technique achieves satisfactory clustering effectiveness as measured by both cluster recall and cluster precision.
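The core feature-translation step of such a technique can be sketched simply: terms of a non-anchor-language document are mapped into the anchor language through a bilingual dictionary, so that all documents share one feature space before clustering. The toy dictionary below is invented; a real MLDC system would also weight terms (e.g. with TF-IDF) and resolve ambiguous translations.

```python
# Hypothetical French-to-English dictionary fragment, for illustration only.
BILINGUAL_DICT = {"ordinateur": "computer", "réseau": "network", "données": "data"}

def translate_features(term_counts, dictionary=BILINGUAL_DICT):
    """Fold source-language term counts into anchor-language features;
    terms without a dictionary entry are kept as-is."""
    translated = {}
    for term, count in term_counts.items():
        anchor = dictionary.get(term, term)
        translated[anchor] = translated.get(anchor, 0) + count
    return translated
```

After this step, any standard monolingual clustering algorithm can be applied to the merged feature vectors.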
|
16 |
Construction of a Gene Relation Network Using Text Mining and Bayesian Networks. Chen, Shu-fen, 11 September 2007
In an organism, genes do not work independently. Interactions among genes reveal how functional tasks are carried out, and observing these interactions helps us understand the relations between genes and how diseases arise. Several methods have been adopted to observe such interactions and construct gene relation networks. Existing algorithms for constructing gene relation networks can be classified into two types: one uses the literature to extract relations between genes; the other constructs the network but does not describe the relations between genes. In this thesis, we propose a hybrid method that combines the two. A Bayesian network is applied to microarray gene expression data to construct the gene network, while text mining is used to extract gene relations from a document database. The proposed algorithm integrates the gene network and the extracted relations into gene relation networks. Experimental results show that related genes are connected in the network; in addition, the relations are marked on the links between related genes.
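The integration step, attaching text-mined relation labels to the edges of a network learned from expression data, can be sketched as a small merge over undirected gene pairs. The gene names and relation label below are illustrative examples, not results from the thesis.

```python
def integrate_relations(network_edges, mined_relations):
    """Attach text-mined relation labels to edges of a learned gene network.

    network_edges   : list of (geneA, geneB) tuples from the Bayesian network
    mined_relations : list of ((geneA, geneB), label) pairs from text mining
    Edges are treated as undirected; edges with no mined label map to None.
    """
    labels = {frozenset(pair): rel for pair, rel in mined_relations}
    return {edge: labels.get(frozenset(edge)) for edge in network_edges}
```

Using `frozenset` pairs makes the match direction-insensitive, so a relation mined as (MDM2, TP53) still labels the network edge stored as (TP53, MDM2).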
|
17 |
NewsFerret: supporting identity risk identification and analysis through text mining of news stories. Golden, Ryan Christian, 18 December 2013
Individuals, organizations, and devices are now interconnected to an unprecedented degree. This has forced identity risk analysts to redefine what "identity" means in such a context, and to explore new techniques for analyzing an ever-expanding threat context. Major hurdles to modeling in this field include the inherent lack of publicly available data due to privacy and safety concerns, as well as the unstructured nature of incident reports. To address this, this report develops a system for strengthening an identity risk model using the text mining of news stories. The system, called NewsFerret, collects and analyzes news stories on the topic of identity theft, establishes semantic relatedness measures between identity concept pairs, and supports analysis of those measures through reports, visualizations, and relevant news stories. Evaluation of the resulting analytical models shows where the system is effective in helping the risk analyst expand and validate identity risk models.
|
18 |
Document Clustering with Dual Supervision. Hu, Yeming, 19 June 2012
Nowadays, academic researchers maintain a personal library of papers, which they would like
to organize based on their needs, e.g., research, projects, or courseware. Clustering techniques
are often employed to achieve this goal by grouping the document collection into different
topics. Unsupervised clustering does not require any user effort but only produces one universal
output with which users may not be satisfied. Therefore, document clustering needs user input
for guidance to generate personalized clusters for different users. Semi-supervised clustering
incorporates prior information and has the potential to produce customized clusters. Traditional
semi-supervised clustering is based on user supervision in the form of labeled instances or
pairwise instance constraints. However, alternative forms of user supervision exist such as
labeling features. For document clustering, document supervision involves labeling documents
while feature supervision involves labeling features. Their joint use has been called dual
supervision. In this thesis, we first explore and propose a framework that uses feature supervision
for interactive feature selection by indicating whether a feature is useful for clustering.
Second, we enhance the semi-supervised clustering with feature supervision using feature
reweighting. Third, we propose a unified framework to combine document supervision and
feature supervision through seeding. The newly proposed algorithms are evaluated using oracles
and demonstrated to be more helpful in producing better clusters matching a single user's point
of view than document clustering without any supervision and with only document supervision.
Finally, we conduct a user study to confirm that different users have different understandings of
the same document collection and prefer personalized clusters. At the same time, we demonstrate
that document clustering with dual supervision is able to produce good personalized clusters
even with noisy user input. Dual supervision is also demonstrated to be more effective in
personalized clustering than no supervision or any single supervision. We also analyze users'
behaviors during the user study and present suggestions for the design of document management
software.
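The combination of seeding (document supervision) with feature reweighting (feature supervision) described above can be sketched as follows: user-labeled seed documents determine initial cluster centroids, and each feature dimension is scaled by a user-supplied weight before averaging. The data layout and the simple multiplicative weighting are illustrative assumptions, not the thesis's exact algorithm.

```python
def seeded_centroids(docs, seed_labels, feature_weights):
    """Initial cluster centroids from user-labeled seed documents, with each
    dimension scaled by a feature-supervision weight.

    docs            : {doc_id: feature vector (list of floats)}
    seed_labels     : {doc_id: cluster id} for the user-labeled seeds
    feature_weights : per-dimension weights from feature supervision
    """
    sums, counts = {}, {}
    for doc_id, cluster in seed_labels.items():
        vec = docs[doc_id]
        acc = sums.setdefault(cluster, [0.0] * len(vec))
        for i, (v, w) in enumerate(zip(vec, feature_weights)):
            acc[i] += v * w                      # weighted contribution per dimension
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
```

A standard k-means loop started from these centroids then inherits both forms of supervision: the seeds anchor cluster identity, and the weights emphasise the features the user marked as useful.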
|
19 |
Modelling Deception Detection in Text. Gupta, Smita, 29 November 2007
As organizations and government agencies work diligently to detect financial irregularities, malfeasance, fraud and criminal activity in intercepted communication, there is increasing interest in devising an automated model or tool for deception detection. We build on Pennebaker's empirical model, which suggests that deception in text leaves a linguistic signature characterised by changes in the frequency of four categories of words: first-person pronouns, exclusive words, negative emotion words, and action words. By applying the model to the Enron email dataset and using an unsupervised matrix-decomposition technique, we explore the differential use of these cue words and categories in deception detection. Instead of focusing on the predictive power of the individual cue words, we construct a descriptive model which helps us understand the multivariate profile of deception along several linguistic dimensions and highlights the qualitative differences between deceptive and truthful communication. This descriptive model can not only help detect unusual and deceptive communication, but may also rank messages along a scale of relative deceptiveness (for instance, from strategic negotiation and spin to deception and blatant lying). The model is unintrusive, requires minimal human intervention and, by following the defined pre-processing steps, may be applied to new datasets from different domains. / Thesis (Master, Computing) -- Queen's University, 2007-11-28
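The first step of such a model, measuring the frequency of the four cue categories in a message, can be sketched directly. The word lists below are tiny illustrative stand-ins for the full LIWC-style category lexicons the approach relies on.

```python
# Toy cue-word lexicons; real category lists contain hundreds of entries.
CUES = {
    "first_person": {"i", "me", "my", "mine"},
    "exclusive":    {"but", "except", "without"},
    "neg_emotion":  {"hate", "worthless", "enemy"},
    "action":       {"go", "take", "make"},
}

def cue_profile(message):
    """Per-category frequency of cue words, normalised by message length."""
    words = message.lower().split()
    n = len(words) or 1                      # guard against empty messages
    return {cat: sum(w in wordset for w in words) / n
            for cat, wordset in CUES.items()}
```

Stacking these four-dimensional profiles across a corpus yields the message-by-category matrix to which a decomposition technique can then be applied.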
|
20 |
Text Analysis Support with HK Graphs Using Concepts (概念を用いたHK Graphによるテキスト解析支援). FURUHASHI, Takeshi; YOSHIKAWA, Tomohiro; KOBAYASHI, Daisuke, 29 March 2012
No description available.
|