11 |
Characterization of influential Twitter profiles according to opinion topics and the generation of interesting content. Vera Cid, Felipe Andrés. January 2015
Industrial Civil Engineer / In recent years, the use of the Internet, smartphones and social networks has grown in Chile. Among social networks, Twitter stands out for its visibility, since it is more open than other networks. In Chile, Twitter is used mainly for two purposes: to stay informed and to express opinions. The volume of opinions recorded on Twitter is of great interest to various actors in the country, including companies that use Twitter as a channel to communicate with their customers, to resolve complaints and questions, and even to run viral marketing campaigns. Given Twitter's widespread adoption and large user base, there is a need to measure users' level of influence, both to prioritize them when addressing their needs and to make marketing campaigns more effective.
Today, several services perform this kind of task, such as Klout or BrandMetric. However, although these models measure user influence in various ways, none of them tries to predict which users will become influential in the near future. This work defines a measure of influence on Twitter and then examines how it would project over time, under the hypothesis that a user's influence can be measured from the interesting content they generate. To that end, influence on Twitter was defined as the ability to generate interesting content that has an impact on the social network. After reviewing existing models, one was chosen and slightly modified to obtain a score for how interesting the content generated by a profile is.
Using this model, rankings of user influence on Twitter were generated, as well as rankings within topic groups related to politics and sports. For various reasons it was not possible to segment into a larger number of topics, so the model was not considered to have met its goal of generating influence rankings for different topic groups. The predictability of the modeled influence was then analyzed, leading to the conclusion that the data period is too short to forecast the time series.
Although the results may seem discouraging, this work opens the way for other approaches and further work, which are described in the final chapter of the thesis. Good segmentation and prioritization of profiles is thus expected to help improve problem resolution, identify profiles that will become influential in particular topics, and focus marketing campaigns on profiles that are not costly to use.
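The abstract says an existing interestingness model was adapted but does not name it, so the following is only a minimal sketch of how per-user and per-topic influence rankings could be produced from a content score. The scoring rule (retweets plus replies per tweet), the record layout and the sample data are all hypothetical assumptions, not the thesis's model.

```python
from collections import defaultdict

# Hypothetical tweet records: (user, topic_group, retweets, replies).
SAMPLE_TWEETS = [
    ("ana", "politics", 10, 2),
    ("ana", "sports", 0, 1),
    ("ben", "politics", 3, 0),
    ("ben", "politics", 5, 1),
    ("carla", "sports", 7, 3),
]


def interestingness(retweets: int, replies: int) -> float:
    """Assumed per-tweet score: total engagement the tweet provoked (not the thesis's model)."""
    return float(retweets + replies)


def rank_users(tweets, topic=None):
    """Rank users by the mean interestingness of their tweets, optionally within one topic group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, group, retweets, replies in tweets:
        if topic is not None and group != topic:
            continue  # skip tweets outside the requested topic group
        totals[user] += interestingness(retweets, replies)
        counts[user] += 1
    scores = {user: totals[user] / counts[user] for user in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_users(SAMPLE_TWEETS))              # overall ranking
print(rank_users(SAMPLE_TWEETS, "politics"))  # ranking within the politics group
```

Averaging per tweet (rather than summing) keeps prolific but low-impact accounts from dominating; a real model would also need to normalize for audience size.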
|
12 |
Distributed Text Mining in R. Theußl, Stefan; Feinerer, Ingo; Hornik, Kurt. 16 March 2011
R has recently gained explicit text mining support with the "tm" package, enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data that can be processed on a single machine is usually limited by the available main memory (i.e., RAM), and (2) an increase in the amount of data to be analyzed leads to an increasing computational workload. Fortunately, adequate parallel programming models like MapReduce and the corresponding open source implementation called Hadoop allow for processing data sets beyond what would fit into memory.
In this paper we present the package "tm.plugin.dc", offering a seamless integration between "tm" and Hadoop. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size. / Series: Research Report Series / Department of Statistics and Mathematics
|
13 |
A tm Plug-In for Distributed Text Mining in R. Theußl, Stefan; Feinerer, Ingo; Hornik, Kurt. 11 1900
R has gained explicit text mining support with the tm package enabling statisticians
to answer many interesting research questions via statistical analysis or modeling of (text)
corpora. However, we typically face two challenges when analyzing large corpora: (1) the
amount of data to be processed in a single machine is usually limited by the available main
memory (i.e., RAM), and (2) the more data to be analyzed the higher the need for efficient
procedures for calculating valuable results. Fortunately, adequate programming models
like MapReduce facilitate parallelization of text mining tasks and allow for processing
data sets beyond what would fit into memory by using a distributed file system possibly
spanning over several machines, e.g., in a cluster of workstations. In this paper we present
a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which
can take advantage of the Hadoop MapReduce library for large scale text mining tasks.
We show on the basis of an application in culturomics that we can efficiently handle data
sets of significant size. (authors' abstract)
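To make the MapReduce idea in these two abstracts concrete, here is a single-process sketch of the map, shuffle and reduce phases for counting terms in a corpus. It only simulates the programming model in plain Python; it is not the tm.plugin.dc API or Hadoop itself, where each phase would run on many nodes over a distributed file system. The function names are illustrative.

```python
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (term, 1) for each lower-cased token of one document.

    doc_id is unused here but mirrors the (key, value) input a mapper receives.
    """
    for token in text.lower().split():
        yield token, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one term."""
    return key, sum(values)


def mapreduce_term_counts(corpus):
    """Run map, shuffle and reduce over a {doc_id: text} corpus."""
    intermediate = chain.from_iterable(map_phase(i, t) for i, t in corpus.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(mapreduce_term_counts({"d1": "the cat sat", "d2": "The cat ran"}))
```

Because each mapper sees only one document and each reducer only one term, neither needs the whole corpus in memory, which is the property that lets the distributed version scale past a single machine's RAM.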
|
14 |
Machine Learning Methods to Understand Textual Data. Unknown Date
The amount of textual data produced every minute on the internet is extremely high. Processing this tremendous volume of mostly unstructured data is not a straightforward task, but the enormous amount of useful information it contains motivates scientists to investigate efficient and effective techniques and algorithms for discovering meaningful patterns. Social network applications, such as chat, comments, and discussion boards, give people around the world opportunities to stay in contact and share valuable knowledge. In everyday conversation, people usually do not care about spelling or accurate grammatical construction, so extracting information from such datasets is more complicated. Text mining, a knowledge discovery process used to extract patterns from natural language, can be a solution to this problem. Applying text mining techniques to social networking websites can reveal a significant amount of information. Text mining in conjunction with social networks can be used to find the general opinion about a particular subject, human thinking patterns, and group identification. In this study, we investigate machine learning methods for textual data in six chapters. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
|
15 |
Development of strategies for assessing reporting in biomedical research : moving toward enhancing reproducibility. Florez Vargas, Oscar. January 2016
The idea that the same experimental findings can be reproduced by a variety of independent approaches is one of the cornerstones of science's claim to objective truth. However, in recent years it has become clear that science is plagued by findings that cannot be reproduced, which invalidates research studies and undermines public trust in the research enterprise. This lack of reproducibility may result, among other things, from a lack of transparency or completeness in reporting. In particular, omissions in reporting the technical details of the experimental method make it difficult to verify the findings of experimental research in biomedicine. In this context, assessing scientific reports could help to overcome, at least in part, the ongoing reproducibility crisis. To address this issue, this Thesis develops strategies for evaluating how biomedical experimental methods are reported in scientific manuscripts. Given the complexity of experimental design, which often involves different technologies and models, we characterise the problem of methods reporting through domain-specific checklists. Then, by using checklists as a decision-making tool, supported by miniRECH (a spreadsheet-based approach that can be used by authors, editors and peer reviewers), a reasonable level of consensus on reporting assessments was achieved regardless of the referees' domain-specific expertise. In addition, by using a text-mining system as a screening tool, we created a framework to guide an automated assessment of the reporting of bio-experiments. The usefulness of these strategies was demonstrated in some domain-specific scientific areas as well as in mouse models across biomedical research.
In conclusion, we suggest that the strategies developed in this work could be implemented in the publication process, both as barriers that prevent incomplete reporting from entering the scientific literature and as promoters of complete reporting that improve the overall value of the scientific evidence.
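As a rough illustration of text mining used as a screening tool for reporting completeness, the sketch below checks a methods section against a small checklist and lists the items that appear to be missing. The checklist items and the regular expressions are hypothetical examples for mouse studies; they are not miniRECH or the thesis's actual checklists, and keyword matching like this would only flag candidates for a human referee.

```python
import re

# Hypothetical checklist: item -> regex suggesting the item is reported.
CHECKLIST = {
    "strain": r"\b(C57BL/6|BALB/c|strain)\b",
    "sex": r"\b(male|female)\b",
    "age": r"\b\d+\s*(weeks?|days?|months?)\s*(old|of age)\b",
    "sample size": r"\bn\s*=\s*\d+\b",
}


def screen(methods_text):
    """Return {checklist item: True if a matching phrase occurs in the text}."""
    return {
        item: bool(re.search(pattern, methods_text, re.IGNORECASE))
        for item, pattern in CHECKLIST.items()
    }


def missing_items(methods_text):
    """List checklist items with no matching phrase, in checklist order."""
    return [item for item, found in screen(methods_text).items() if not found]


print(missing_items("Male C57BL/6 mice (8 weeks old) were used."))
```

Here the sample sentence reports strain, sex and age but no sample size, so only that item is flagged.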
|
16 |
Aspect Based Sentiment Analysis On Review Data. Xue, Wei. 04 December 2017
With the proliferation of user-generated reviews, new opportunities and challenges arise. Advances in Web technologies allow people to access a large number of reviews of products and services online. Knowing what others like and dislike is increasingly important for decision making in online shopping. Retailers also care more than ever about online reviews, because a vast pool of reviews enables them to monitor their reputation and collect feedback efficiently. However, people often have difficulty identifying and summarizing the fine-grained sentiments buried in these opinion-rich resources. Traditional sentiment analysis, which focuses on overall sentiment, fails to uncover the sentiment toward individual aspects of the reviewed entities.
This dissertation studied the research problem of Aspect Based Sentiment Analysis (ABSA), which aims to reveal the aspect-dependent sentiment information in review text. ABSA consists of several subtasks: 1) aspect extraction, 2) aspect term extraction, 3) aspect category classification, and 4) aspect-level sentiment polarity classification. We focused on topic models and neural networks for ABSA. First, to extract aspects from a collection of reviews and to detect the sentiment polarity toward those aspects in each review, we proposed a few probabilistic graphical models that jointly model the word distribution in reviews and the aspect ratings. Second, we presented a multi-task learning model based on long short-term memory (LSTM) and convolutional neural networks for aspect category classification and aspect term extraction. Third, for aspect-level sentiment polarity classification, we developed a gated convolutional neural network that can be applied to aspect category sentiment analysis as well as aspect target sentiment analysis.
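The abstract names a gated convolutional network for aspect-level sentiment but does not give its equations. The sketch below assumes a tanh/ReLU gating form: a tanh unit extracts a sentiment feature from each n-gram window, and a ReLU gate, which also receives the aspect embedding, decides how much of that feature passes through before max-pooling. This gating form, the single filter and all parameter names are illustrative assumptions, and the weights would normally be learned rather than set by hand.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def gated_conv_feature(words, aspect, w_s, b_s, w_a, v_a, b_a, k=2):
    """Max-pooled output of ONE gated convolution filter over a sentence.

    words    -- list of word vectors, each of length d (needs at least k words)
    aspect   -- aspect embedding vector (e.g. for the category "food")
    w_s, w_a -- filter weights over k concatenated word vectors (length k*d)
    v_a      -- weights projecting the aspect embedding into the gate
    """
    outputs = []
    for i in range(len(words) - k + 1):
        window = [x for vec in words[i:i + k] for x in vec]  # concatenate k word vectors
        s = math.tanh(dot(w_s, window) + b_s)  # sentiment feature of this n-gram
        a = max(0.0, dot(w_a, window) + dot(v_a, aspect) + b_a)  # aspect-controlled ReLU gate
        outputs.append(s * a)  # the gate passes or blocks the sentiment feature
    return max(outputs)  # max-pooling over positions


words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(gated_conv_feature(words, [1.0], [1.0, 0.0, 0.0, 0.0], 0.0, [0.0] * 4, [1.0], 0.0))
```

With a strongly negative aspect weight the gate closes and the filter outputs zero for every window, which is how an irrelevant aspect can suppress sentiment features that belong to a different aspect.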
|
17 |
Supply chain design: a conceptual model and tactical simulations. Brann, Jeremy Matthew. 15 May 2009
In current research literature, supply chain management (SCM) is a hot topic
breaching the boundaries of many academic disciplines. SCM-related work can be
found in the relevant literature for many disciplines. Supply chain management can be
defined as effectively and efficiently managing the flows (information, financial and
physical) in all stages of the supply chain to add value to end customers and gain profit
for all firms in the chain. Supply chains involve multiple partners with the common goal
to satisfy customer demand at a profit.
While supply chains are not new, the way academics and practitioners view the
need for and the means to manage these chains is relatively new. Very little literature
can be found on designing supply chains from the ground up or what dimensions of
supply chain management should be considered when designing a supply chain.
Additionally, we have found that very few tools exist to help during the design phase of
a supply chain. Moreover, very few tools exist that allow for comparing supply chain
designs.
We contribute to the current literature by determining which supply chain
management dimensions should be considered during the design process. We employ
text mining to create a supply chain design conceptual model and compare this model to existing supply chain models and reference frameworks. We continue to contribute to
the current SCM literature by creatively applying concepts and results from
the field of Stochastic Processes to build a custom simulator capable of comparing
different supply chain designs and providing insights into how the different designs
affect the supply chain’s total inventory cost. The simulator provides a mechanism for
testing when real-time demand information is more beneficial than first-come,
first-served (FCFS) order processing when the distributional form of lead-time demand is
derived from the supply chain operating characteristics instead of being assumed
known. We find that in many instances FCFS
outperforms the use of real-time information in providing the lowest total inventory
cost.
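The key modeling step described above is deriving lead-time demand from operating characteristics (a lead-time distribution and a per-period demand distribution) instead of assuming its distribution is known. The sketch below estimates an expected holding-plus-shortage cost by Monte Carlo under that idea. It is a simplified single-stage, single-replenishment illustration with hypothetical uniform distributions and cost rates; it does not reproduce the thesis's simulator or its FCFS versus real-time-information comparison.

```python
import random


def sample_lead_time_demand(rng, lead_times, demand_range):
    """Draw one lead-time demand: a random lead time, then the sum of per-period demands over it."""
    lead_time = rng.choice(lead_times)
    low, high = demand_range
    return sum(rng.randint(low, high) for _ in range(lead_time))


def expected_inventory_cost(reorder_point, lead_times, demand_range,
                            holding_cost=1.0, shortage_cost=5.0, trials=10_000, seed=0):
    """Monte Carlo estimate of holding + shortage cost at the end of one replenishment lead time."""
    rng = random.Random(seed)  # seeded so that designs are compared on reproducible draws
    total = 0.0
    for _ in range(trials):
        demand = sample_lead_time_demand(rng, lead_times, demand_range)
        leftover = max(reorder_point - demand, 0)  # units held when the order arrives
        shortfall = max(demand - reorder_point, 0)  # units short before the order arrives
        total += holding_cost * leftover + shortage_cost * shortfall
    return total / trials


# Two hypothetical designs with the same mean lead time but different variability.
design_a = expected_inventory_cost(12, [2], (2, 6))
design_b = expected_inventory_cost(12, [1, 3], (2, 6))
print(design_a, design_b)
```

Because lead-time demand is sampled from its components, changing a design's lead-time variability automatically changes the demand distribution the cost is evaluated against.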
|
18 |
Incident Data Analysis Using Data Mining Techniques. Veltman, Lisa M. 16 January 2010
There are several databases collecting information on various types of incidents, and
most analyses performed on these databases usually do not expand past basic trend
analysis or counting occurrences. This research uses the more robust methods of data
mining and text mining to analyze the Hazardous Substances Emergency Events
Surveillance (HSEES) system data by identifying relationships among variables,
predicting the occurrence of injuries, and assessing the value added by the text data. The
benefits of performing a thorough analysis of past incidents include better understanding
of safety performance, better understanding of how to focus efforts to reduce incidents,
and a better understanding of how people are affected by these incidents.
The results of this research showed that visually exploring the data via bar graphs did not
yield any noticeable patterns. Clustering the data identified groupings of categories
across the variable inputs such as manufacturing events resulting from intentional acts
like system startup and shutdown, performing maintenance, and improper dumping.
Text mining the data allowed for clustering the events and further description of the data;
however, these events were not noticeably distinct, and the conclusions that could be drawn
from these clusters were limited. Including the text comments in the overall analysis of
HSEES data greatly improved the predictive power of the models. Interpretation of the
textual data's contribution was limited; however, the qualitative conclusions drawn were
similar to those of the model without textual data input. Although HSEES data is collected to
describe the effects hazardous substance releases/threatened releases have on people, a
fairly good predictive model was still obtained from the few variables identified as cause
related.
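One common way to add free-text incident comments to a predictive model is to turn each narrative into weighted term features and merge them with the structured variables. The sketch below uses TF-IDF weighting for that step; the abstract does not say which text representation was used, so TF-IDF, the feature-name prefix and the variable names are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per document (whitespace tokens, lower-cased)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})  # empty narrative contributes no text features
            continue
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def incident_features(structured, narrative_vector):
    """Merge structured incident variables with prefixed text features for a downstream model."""
    features = dict(structured)
    features.update({f"text:{term}": w for term, w in narrative_vector.items()})
    return features


vectors = tfidf(["valve leak during maintenance", "leak at startup"])
print(incident_features({"released_substance": "ammonia"}, vectors[0]))
```

Terms that appear in every narrative (here "leak") get zero weight, so the text features emphasize words that distinguish one incident from another.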
|
19 |
Feature Translation-based Multilingual Document Clustering Technique. Liao, Shan-Yu. 08 August 2006
Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by its significance and need, this study designs a translation-based MLDC technique. Our empirical evaluation shows that the proposed multilingual document clustering technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision.
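The core idea of feature translation can be sketched as mapping each document's terms into one pivot language before computing document similarities, so that a standard clustering algorithm can then group documents across languages. The bilingual dictionary and the term lists below are hypothetical, and the sketch covers only the translation and similarity steps, not the thesis's specific translation or clustering method.

```python
import math
from collections import Counter

# Hypothetical bilingual dictionary (Chinese -> English pivot language).
ZH_EN = {"資料": "data", "探勘": "mining", "文件": "document", "分群": "clustering"}


def translate_features(tokens, dictionary):
    """Map each token into the pivot language; tokens without an entry are kept unchanged."""
    return [dictionary.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    num = sum(ca[t] * cb[t] for t in ca)
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0


english_doc = ["document", "clustering"]
chinese_doc = ["文件", "分群"]
print(cosine(english_doc, chinese_doc))                               # no shared terms
print(cosine(english_doc, translate_features(chinese_doc, ZH_EN)))    # same terms after translation
```

Without translation the two documents share no features and look unrelated; after translation they become near-identical, which is what lets similar documents in different languages fall into the same cluster.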
|
20 |
Construction Gene Relation Network Using Text Mining and Bayesian Network. Chen, Shu-fen. 11 September 2007
In an organism, genes don't work independently. Gene interactions reveal how functional tasks are carried out, and observing these interactions helps explain the relations between genes and how diseases are caused. Several methods have been adopted to observe these interactions and construct gene relation networks. Existing algorithms for constructing gene relation networks can be classified into two types. One extracts the relations between genes from the literature. The other constructs the network but does not describe the relations between genes. In this thesis, we propose a hybrid method based on these two approaches. A Bayesian network is applied to microarray gene expression data to construct a gene network, and text mining is used to extract gene relations from a document database. The proposed algorithm integrates the gene network and the gene relations into gene relation networks. Experimental results show that related genes are connected in the network. In addition, the relations are marked on the links between related genes.
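The integration step can be illustrated as labeling each edge of an already learned gene network with a relation mined from text. In the sketch below, the network edges are assumed to be given (for example, learned from expression data by some Bayesian network structure-learning method), and relation extraction is reduced to a single hypothetical "GENE verb GENE" pattern; real literature mining would need gene-name normalization and a far richer extractor.

```python
import re

# Hypothetical pattern: upper-case gene symbol, relation verb, upper-case gene symbol.
RELATION_PATTERN = re.compile(r"\b([A-Z][A-Z0-9]+)\s+(activates|inhibits|binds)\s+([A-Z][A-Z0-9]+)\b")


def extract_relations(sentences):
    """Map each unordered gene pair to a relation phrase found in the sentences."""
    relations = {}
    for sentence in sentences:
        for g1, verb, g2 in RELATION_PATTERN.findall(sentence):
            relations[frozenset((g1, g2))] = f"{g1} {verb} {g2}"  # keep direction in the label
    return relations


def annotate_network(edges, relations):
    """Label each network edge with a text-mined relation, or None if none was found."""
    return {(a, b): relations.get(frozenset((a, b))) for a, b in edges}


sentences = ["TP53 activates MDM2 in response to stress.", "No relation here."]
edges = [("MDM2", "TP53"), ("TP53", "BRCA1")]
print(annotate_network(edges, extract_relations(sentences)))
```

Keying relations by unordered pairs lets a learned edge match a mined relation even when the two disagree on direction, while the label text still records the direction stated in the literature.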
|