11 |
Development of strategies for assessing reporting in biomedical research: moving toward enhancing reproducibility. Florez Vargas, Oscar, January 2016
The idea that the same experimental findings can be reproduced by a variety of independent approaches is one of the cornerstones of science's claim to objective truth. However, in recent years it has become clear that science is plagued by findings that cannot be reproduced, which invalidates research studies and undermines public trust in the research enterprise. The observed lack of reproducibility may result, among other things, from a lack of transparency or completeness in reporting. In particular, omissions in reporting the technical details of the experimental method make it difficult to verify the findings of experimental research in biomedicine. In this context, the assessment of scientific reports could help to overcome, at least in part, the ongoing reproducibility crisis. In addressing this issue, this thesis undertakes the challenge of developing strategies for evaluating the reporting of biomedical experimental methods in scientific manuscripts. Considering the complexity of experimental design, which often involves different technologies and models, we characterise the problem of methods reporting through domain-specific checklists. Then, using checklists as a decision-making tool supported by miniRECH, a spreadsheet-based approach that can be used by authors, editors and peer reviewers, a reasonable level of consensus on reporting assessments was achieved regardless of the domain-specific expertise of referees. In addition, using a text-mining system as a screening tool, we created a framework to guide automated assessment of the reporting of bio-experiments. The usefulness of these strategies was demonstrated in several domain-specific scientific areas as well as in mouse models across biomedical research.
In conclusion, we suggest that the strategies developed in this work could be implemented throughout the publication process, both as barriers that prevent incomplete reporting from entering the scientific literature and as promoters of completeness in reporting that improve the general value of scientific evidence.
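The checklist-based screening described above can be illustrated with a minimal sketch: each checklist item is a set of indicator keywords, and an item counts as "reported" if any keyword appears in the methods text. The item names and keywords below are invented for illustration; they are not taken from miniRECH or the thesis's actual checklists.

```python
# Illustrative checklist-driven screening of a methods section.
# Item names and keyword sets are hypothetical stand-ins.
CHECKLIST = {
    "animal_strain":  {"c57bl/6", "balb/c", "strain"},
    "sample_size":    {"n =", "sample size", "animals per group"},
    "randomisation":  {"randomised", "randomized", "randomly assigned"},
    "blinding":       {"blinded", "blinding"},
}

def assess_reporting(methods_text):
    """Return per-item reported/not-reported flags and a completeness score in [0, 1]."""
    text = methods_text.lower()
    flags = {item: any(kw in text for kw in kws)
             for item, kws in CHECKLIST.items()}
    score = sum(flags.values()) / len(flags)
    return flags, score
```

A real screening tool would of course use curated, domain-specific item definitions and more robust text matching than substring lookup.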
|
12 |
Aspect Based Sentiment Analysis on Review Data. Xue, Wei, 04 December 2017
With the proliferation of user-generated reviews, new opportunities and challenges arise. Advances in Web technologies allow people to access a large number of reviews of products and services online. Knowing what others like and dislike becomes increasingly important for consumers' decision making in online shopping. Retailers also care more than ever about online reviews, because a vast pool of reviews enables them to monitor reputations and collect feedback efficiently. However, people often find it difficult to identify and summarize fine-grained sentiments buried in these opinion-rich resources. Traditional sentiment analysis, which focuses on overall sentiment, fails to uncover sentiments with regard to particular aspects of the reviewed entities.
This dissertation studied the research problem of Aspect Based Sentiment Analysis (ABSA), which aims to reveal the aspect-dependent sentiment information of review text. ABSA consists of several subtasks: 1) aspect extraction, 2) aspect term extraction, 3) aspect category classification, and 4) sentiment polarity classification at the aspect level. We focused on topic models and neural networks for ABSA. First, to extract the aspects from a collection of reviews and to detect the sentiment polarity regarding the aspects in each review, we proposed several probabilistic graphical models that model word distributions in reviews and aspect ratings at the same time. Second, we presented a multi-task learning model based on long short-term memory and convolutional neural networks for aspect category classification and aspect term extraction. Third, for aspect-level sentiment polarity classification, we developed a gated convolutional neural network, which can be applied to aspect category sentiment analysis as well as aspect target sentiment analysis.
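The gating idea behind gated convolutional networks for aspect sentiment can be sketched in a few lines: one filter produces a candidate feature, and a sigmoid gate, informed by the aspect, decides how much of it passes through. The shapes, the scalar aspect contribution, and the single-filter setup below are simplifications for illustration, not the dissertation's actual architecture.

```python
import numpy as np

def gated_conv_unit(x, Wf, Wg, va):
    """One gated convolution step over a window of word vectors.

    x  : (window, dim) matrix of word vectors
    Wf : feature filter, shape (window * dim,)
    Wg : gate filter,    shape (window * dim,)
    va : aspect embedding folded into the gate (a scalar here, for simplicity)
    """
    flat = x.reshape(-1)
    feature = np.tanh(flat @ Wf)                     # candidate sentiment feature
    gate = 1.0 / (1.0 + np.exp(-(flat @ Wg + va)))   # aspect-aware sigmoid gate
    return feature * gate                            # gate suppresses off-aspect features
```

When the aspect term pushes the gate toward zero, the unit's output is suppressed regardless of the candidate feature, which is the mechanism that makes the network selective per aspect.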
|
13 |
Supply chain design: a conceptual model and tactical simulations. Brann, Jeremy Matthew, 15 May 2009
In the current research literature, supply chain management (SCM) is a hot topic that crosses the boundaries of many academic disciplines, and SCM-related work can be found in the literature of each of them. Supply chain management can be defined as effectively and efficiently managing the flows (information, financial and physical) at all stages of the supply chain to add value for end customers and generate profit for all firms in the chain. Supply chains involve multiple partners with the common goal of satisfying customer demand at a profit.
While supply chains are not new, the way academics and practitioners view the need for, and the means to manage, these chains is relatively new. Very little literature can be found on designing supply chains from the ground up or on which dimensions of supply chain management should be considered when designing a supply chain. Additionally, we have found that very few tools exist to help during the design phase of a supply chain, and fewer still allow supply chain designs to be compared.
We contribute to the current literature by determining which supply chain management dimensions should be considered during the design process. We employ text mining to create a supply chain design conceptual model and compare this model to existing supply chain models and reference frameworks. We further contribute to the SCM literature by creatively applying concepts and results from the field of stochastic processes to build a custom simulator capable of comparing different supply chain designs and providing insights into how the different designs affect the supply chain's total inventory cost. The simulator provides a mechanism for testing when real-time demand information is more beneficial than first-come, first-served (FCFS) order processing when the distributional form of lead-time demand is derived from the supply chain's operating characteristics rather than assumed known. We find that in many instances FCFS outperforms the use of real-time information in providing the lowest total inventory cost.
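The total-inventory-cost comparison described above can be illustrated with a minimal single-stage sketch: an order-up-to (base-stock) policy fills a demand stream, and holding plus backorder costs are tallied. The cost rates, lead time, and demand stream below are invented for illustration; the thesis's simulator models far richer supply chain structure and order-processing rules.

```python
def simulate_inventory(demands, order_up_to, lead_time=1, h=1.0, b=5.0):
    """Total holding + backorder cost for a single stage under a base-stock policy.

    h : holding cost per unit per period (assumed value)
    b : backorder cost per unit per period (assumed value)
    """
    on_hand = order_up_to
    pipeline = [0] * lead_time              # replenishment orders in transit
    cost = 0.0
    for d in demands:
        on_hand += pipeline.pop(0)          # receive the oldest outstanding order
        on_hand -= d                        # satisfy (or backorder) this period's demand
        inventory_position = on_hand + sum(pipeline)
        pipeline.append(order_up_to - inventory_position)  # order up to base stock
        cost += h * max(on_hand, 0) + b * max(-on_hand, 0)
    return cost
```

Comparing designs then amounts to running the same demand stream through differently parameterised simulators and comparing the accumulated costs.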
|
14 |
Incident Data Analysis Using Data Mining Techniques. Veltman, Lisa M., 16 January 2010
Several databases collect information on various types of incidents, and most analyses performed on these databases do not go beyond basic trend analysis or counting occurrences. This research uses the more robust methods of data mining and text mining to analyze the Hazardous Substances Emergency Events Surveillance (HSEES) system data by identifying relationships among variables, predicting the occurrence of injuries, and assessing the value added by the text data. The benefits of a thorough analysis of past incidents include a better understanding of safety performance, of how to focus efforts to reduce incidents, and of how people are affected by these incidents.
The results of this research showed that visually exploring the data via bar graphs did not yield any noticeable patterns. Clustering the data identified groupings of categories across the variable inputs, such as manufacturing events resulting from intentional acts like system startup and shutdown, performing maintenance, and improper dumping. Text mining the data allowed for clustering the events and further describing the data; however, these clusters were not noticeably distinct, and drawing conclusions from them was limited. Including the text comments in the overall analysis of HSEES data greatly improved the predictive power of the models. Interpretation of the textual data's contribution was limited; however, the qualitative conclusions drawn were similar to those of the model without textual input. Although HSEES data is collected to describe the effects hazardous substance releases and threatened releases have on people, a fairly good predictive model was still obtained from the few variables identified as cause-related.
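The step of adding free-text comments to a structured incident record, as in the analysis above, can be sketched as a feature-merging function: structured fields become categorical indicator features and comment tokens become bag-of-words counts in the same feature dictionary. The field names and record shape are hypothetical, not the actual HSEES schema.

```python
def build_features(record):
    """Merge structured incident fields with bag-of-words tokens from the
    free-text comment into one feature dict (field names are illustrative)."""
    features = {
        f"industry={record['industry']}": 1,   # categorical indicator features
        f"cause={record['cause']}": 1,
    }
    for token in record.get("comment", "").lower().split():
        key = f"text:{token}"                  # prefixed so text features are traceable
        features[key] = features.get(key, 0) + 1
    return features
```

Feeding such merged dictionaries to a predictive model is one simple way to measure the value added by the text data: train once with and once without the `text:` features and compare accuracy.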
|
15 |
Feature Translation-based Multilingual Document Clustering Technique. Liao, Shan-Yu, 08 August 2006
Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by its significance, this study designs a translation-based MLDC technique. Our empirical evaluation shows that the proposed technique achieves satisfactory clustering effectiveness as measured by both cluster recall and cluster precision.
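The core feature-translation step of such a technique can be sketched simply: terms of a non-anchor-language document are mapped into the anchor language through a bilingual dictionary, so that all documents share one feature space before clustering. The toy dictionary below is invented; a real MLDC system would also weight terms (e.g. with TF-IDF) and resolve ambiguous translations.

```python
# Hypothetical French-to-English dictionary fragment, for illustration only.
BILINGUAL_DICT = {"ordinateur": "computer", "réseau": "network", "données": "data"}

def translate_features(term_counts, dictionary=BILINGUAL_DICT):
    """Fold source-language term counts into anchor-language features;
    terms without a dictionary entry are kept as-is."""
    translated = {}
    for term, count in term_counts.items():
        anchor = dictionary.get(term, term)
        translated[anchor] = translated.get(anchor, 0) + count
    return translated
```

After this step, any standard monolingual clustering algorithm can be applied to the merged feature vectors.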
|
16 |
Construction of a Gene Relation Network Using Text Mining and Bayesian Networks. Chen, Shu-fen, 11 September 2007
In an organism, genes do not work independently. Interactions among genes reveal how functional tasks are carried out, and observing these interactions helps us understand the relations between genes and how diseases arise. Several methods have been adopted to observe such interactions and construct gene relation networks. Existing algorithms for constructing gene relation networks can be classified into two types: one uses the literature to extract relations between genes; the other constructs the network but does not describe the relations between genes. In this thesis, we propose a hybrid method that combines the two. A Bayesian network is applied to microarray gene expression data to construct the gene network, while text mining is used to extract gene relations from a document database. The proposed algorithm integrates the gene network and the extracted relations into gene relation networks. Experimental results show that related genes are connected in the network; in addition, the relations are marked on the links between related genes.
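The integration step, attaching text-mined relation labels to the edges of a network learned from expression data, can be sketched as a small merge over undirected gene pairs. The gene names and relation label below are illustrative examples, not results from the thesis.

```python
def integrate_relations(network_edges, mined_relations):
    """Attach text-mined relation labels to edges of a learned gene network.

    network_edges   : list of (geneA, geneB) tuples from the Bayesian network
    mined_relations : list of ((geneA, geneB), label) pairs from text mining
    Edges are treated as undirected; edges with no mined label map to None.
    """
    labels = {frozenset(pair): rel for pair, rel in mined_relations}
    return {edge: labels.get(frozenset(edge)) for edge in network_edges}
```

Using `frozenset` pairs makes the match direction-insensitive, so a relation mined as (MDM2, TP53) still labels the network edge stored as (TP53, MDM2).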
|
17 |
NewsFerret: supporting identity risk identification and analysis through text mining of news stories. Golden, Ryan Christian, 18 December 2013
Individuals, organizations, and devices are now interconnected to an unprecedented degree. This has forced identity risk analysts to redefine what "identity" means in such a context, and to explore new techniques for analyzing an ever-expanding threat context. Major hurdles to modeling in this field include the inherent lack of publicly available data due to privacy and safety concerns, as well as the unstructured nature of incident reports. To address this, this report develops a system for strengthening an identity risk model using the text mining of news stories. The system, called NewsFerret, collects and analyzes news stories on the topic of identity theft, establishes semantic relatedness measures between identity concept pairs, and supports analysis of those measures through reports, visualizations, and relevant news stories. Evaluation of the resulting analytical models shows where the system is effective in helping the risk analyst expand and validate identity risk models.
|
18 |
Document Clustering with Dual Supervision. Hu, Yeming, 19 June 2012
Nowadays, academic researchers maintain a personal library of papers, which they would like
to organize based on their needs, e.g., research, projects, or courseware. Clustering techniques
are often employed to achieve this goal by grouping the document collection into different
topics. Unsupervised clustering does not require any user effort but only produces one universal
output with which users may not be satisfied. Therefore, document clustering needs user input
for guidance to generate personalized clusters for different users. Semi-supervised clustering
incorporates prior information and has the potential to produce customized clusters. Traditional
semi-supervised clustering is based on user supervision in the form of labeled instances or
pairwise instance constraints. However, alternative forms of user supervision exist such as
labeling features. For document clustering, document supervision involves labeling documents
while feature supervision involves labeling features. Their joint use has been called dual
supervision. In this thesis, we first explore and propose a framework that uses feature supervision
for interactive feature selection by indicating whether a feature is useful for clustering.
Second, we enhance the semi-supervised clustering with feature supervision using feature
reweighting. Third, we propose a unified framework to combine document supervision and
feature supervision through seeding. The newly proposed algorithms are evaluated using oracles
and demonstrated to be more helpful in producing better clusters matching a single user's point
of view than document clustering without any supervision and with only document supervision.
Finally, we conduct a user study to confirm that different users have different understandings of
the same document collection and prefer personalized clusters. At the same time, we demonstrate
that document clustering with dual supervision is able to produce good personalized clusters
even with noisy user input. Dual supervision is also demonstrated to be more effective in
personalized clustering than no supervision or any single supervision. We also analyze users'
behaviors during the user study and present suggestions for the design of document management
software.
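The combination of seeding (document supervision) with feature reweighting (feature supervision) described above can be sketched as follows: user-labeled seed documents determine initial cluster centroids, and each feature dimension is scaled by a user-supplied weight before averaging. The data layout and the simple multiplicative weighting are illustrative assumptions, not the thesis's exact algorithm.

```python
def seeded_centroids(docs, seed_labels, feature_weights):
    """Initial cluster centroids from user-labeled seed documents, with each
    dimension scaled by a feature-supervision weight.

    docs            : {doc_id: feature vector (list of floats)}
    seed_labels     : {doc_id: cluster id} for the user-labeled seeds
    feature_weights : per-dimension weights from feature supervision
    """
    sums, counts = {}, {}
    for doc_id, cluster in seed_labels.items():
        vec = docs[doc_id]
        acc = sums.setdefault(cluster, [0.0] * len(vec))
        for i, (v, w) in enumerate(zip(vec, feature_weights)):
            acc[i] += v * w                      # weighted contribution per dimension
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
```

A standard k-means loop started from these centroids then inherits both forms of supervision: the seeds anchor cluster identity, and the weights emphasise the features the user marked as useful.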
|
19 |
Modelling Deception Detection in Text. Gupta, Smita, 29 November 2007
As organizations and government agencies work diligently to detect financial irregularities, malfeasance, fraud and criminal activity in intercepted communication, there is increasing interest in devising an automated model or tool for deception detection. We build on Pennebaker's empirical model, which suggests that deception in text leaves a linguistic signature characterised by changes in the frequency of four categories of words: first-person pronouns, exclusive words, negative emotion words, and action words. By applying the model to the Enron email dataset and using an unsupervised matrix-decomposition technique, we explore the differential use of these cue words and categories in deception detection. Instead of focusing on the predictive power of the individual cue words, we construct a descriptive model which helps us understand the multivariate profile of deception along several linguistic dimensions and highlights the qualitative differences between deceptive and truthful communication. This descriptive model can not only help detect unusual and deceptive communication, but may also rank messages along a scale of relative deceptiveness (for instance, from strategic negotiation and spin to deception and blatant lying). The model is unintrusive, requires minimal human intervention and, by following the defined pre-processing steps, may be applied to new datasets from different domains. / Thesis (Master, Computing) -- Queen's University, 2007-11-28
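The first step of such a model, measuring the frequency of the four cue categories in a message, can be sketched directly. The word lists below are tiny illustrative stand-ins for the full LIWC-style category lexicons the approach relies on.

```python
# Toy cue-word lexicons; real category lists contain hundreds of entries.
CUES = {
    "first_person": {"i", "me", "my", "mine"},
    "exclusive":    {"but", "except", "without"},
    "neg_emotion":  {"hate", "worthless", "enemy"},
    "action":       {"go", "take", "make"},
}

def cue_profile(message):
    """Per-category frequency of cue words, normalised by message length."""
    words = message.lower().split()
    n = len(words) or 1                      # guard against empty messages
    return {cat: sum(w in wordset for w in words) / n
            for cat, wordset in CUES.items()}
```

Stacking these four-dimensional profiles across a corpus yields the message-by-category matrix to which a decomposition technique can then be applied.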
|
20 |
Text Analysis Support with HK Graphs Using Concepts (概念を用いたHK Graphによるテキスト解析支援). FURUHASHI, Takeshi; YOSHIKAWA, Tomohiro; KOBAYASHI, Daisuke, 29 March 2012
No description available.
|