1 |
Mining of identity theft stories to model and assess identity threat behaviors / Yang, Yongpeng / 18 September 2014
Identity theft is an ever-present and ever-growing issue in our society. Identity theft, fraud and abuse are present and growing in every market sector. The data available to describe how these identity crimes are conducted and the consequences for victims are often recorded in stories and reports by the news press, fraud examiners and law enforcement. To translate and analyze these stories in this very unstructured format, this thesis first discusses the automatic collection of identity theft data, using text mining techniques, from online news stories and reports on the topic of identity theft. The collected data are used to enrich the ITAP (Identity Threat Assessment and Prediction) Project repository under development at the Center for Identity at The University of Texas. Moreover, this thesis shows the statistics of common behaviors and resources used by identity thieves and fraudsters: the identity attributes used to identify people, the resources employed to conduct the identity crime, and the patterns of identity criminal behavior. Analysis of these results should help researchers to better understand identity threat behaviors, offer people early warning signs and thwart future identity theft crimes. / text
|
2 |
Text Mining in Customer Relationship Management [Text mining im Customer-relationship-Management] / Rentzmann, René / January 2007
Also submitted as a dissertation, University of Eichstätt-Ingolstadt, 2007
|
3 |
Integrating text-mining approaches to identify entities and extract events from the biomedical literature / Gerner, Lars Martin Anders / January 2012
The amount of biomedical literature available is increasing at an exponential rate and is becoming increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem, through the systematic and large-scale extraction of structured information from inherently unstructured biomedical text. This thesis reports the development of four text-mining systems that, by building on each other, have enabled the extraction of information about a large number of published statements in the biomedical literature. The first system, LINNAEUS, enables highly accurate detection ('recognition') and identification ('normalization') of species names in biomedical articles. Building on LINNAEUS, we implemented a range of improvements in the GNAT system, enabling high-throughput gene/protein detection and identification. Using gene/protein identification from GNAT, we developed the Gene Expression Text Miner (GETM), which extracts information about gene expression statements. Finally, building on GETM as a pilot project, we constructed the BioContext integrated event extraction system, which was used to extract information about over 11 million distinct biomolecular processes in 10.9 million abstracts and 230,000 full-text articles. The ability to detect negated statements in the BioContext system enables the preliminary analysis of potential contradictions in the biomedical literature. All tools (LINNAEUS, GNAT, GETM, and BioContext) are available under open-source software licenses, and LINNAEUS and GNAT are available as online web services. All extracted data (36 million BioContext statements, 720,000 GETM statements, 72,000 contradictions, 37 million mentions of species names, 80 million mentions of gene names, and 57 million mentions of anatomical location names) are available for bulk download. In addition, the data extracted by GETM and BioContext are available to biologists through easy-to-use search interfaces.
|
4 |
Identify Opioid Use Problem / Alzeer, Abdullah Hamad / 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The aim of this research is to design a new method to identify opioid use problems (OUP) among long-term opioid therapy patients at Indiana University Health (IUH) using text mining and machine learning approaches. First, a systematic review was conducted to investigate the variables, methods, and opioid problem definitions currently used in the literature. We identified 75 distinct variables in 9 models that mostly used ICD codes to identify OUP. The review concluded that using ICD codes alone may not be enough to determine the real size of the opioid problem and that more effort is needed to adopt other methods to understand the issue. Next, we developed a text mining approach to identify OUP and compared the results with the conventional method of identifying OUP using ICD-9 codes. Following approval from the institutional review board and the Regenstrief Institute, structured and unstructured data on 14,298 IUH patients were collected from the Indiana Network for Patient Care. Our text mining approach identified 127 opioid cases, compared to 45 cases identified by ICD codes. We concluded that the text mining approach may be used successfully to identify OUP from patients' clinical notes. Moreover, we developed a machine learning approach to identify OUP by analyzing patients' clinical notes. Our model was able to classify positive OUP from clinical notes with a sensitivity of 88% on unseen data. We concluded that the machine learning approach may be used successfully to identify OUP from patients' clinical notes. / 2019-06-21
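For readers unfamiliar with the sensitivity figure quoted above, the following is a minimal sketch of how sensitivity (the true-positive rate) is computed from labels and predictions. The function name and the toy label lists are illustrative assumptions and are not taken from the thesis or its data.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return TP / (TP + FN), where 1 marks a positive case and 0 a negative one."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy labels (NOT the thesis data): four real positives, three detected.
toy_truth = [1, 1, 1, 1, 0, 0]
toy_predictions = [1, 1, 1, 0, 0, 1]
toy_sensitivity = sensitivity(toy_truth, toy_predictions)
```

Sensitivity ignores false positives, so it says how many real cases a model finds, not how many of its alerts are correct.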
|
5 |
Detecting Deception in Interrogation Settings / Lamb, Carolyn / 18 December 2012
Bag-of-words deception detection systems outperform humans, but are still not always accurate enough to be useful. In interrogation settings, present models do not take into account potential influence of the words in a question on the words in the answer. According to the theory of verbal mimicry, this ought to exist. We show with our research that it does exist: certain words in a question can "prompt" other words in the answer. However, the effect is receiver-state-dependent. Deceptive and truthful subjects in archival data respond to prompting in different ways. We can improve the accuracy of a bag-of-words deception model by training a machine learning algorithm on both question words and answer words, allowing it to pick up on differences in the relationships between these words. This approach should generalize to other bag-of-words models of psychological states in dialogues. / Thesis (Master, Computing) -- Queen's University, 2012-12-17 14:42:19.707
|
6 |
Extraction of Causal-Association Networks from Unstructured Text Data / Bojduj, Brett N / 01 June 2009
Causality is an expression of the interactions between variables in a system. Humans often explicitly express causal relations through natural language, so extracting these relations can provide insight into how a system functions. This thesis presents a system that uses a grammar parser to extract causes and effects from unstructured text through a simple, pre-defined grammar pattern. By filtering out non-causal sentences before the extraction process begins, the presented methodology achieves a precision of 85.91% and a recall of 73.99%. The polarity of the extracted relations is then classified using a Fisher classifier. The result is a set of directed relations of causes and effects, with polarity as either increasing or decreasing. These relations can then be used to create networks of causes and effects. This “Causal-Association Network” (CAN) can be used to aid decision-making in complex domains, such as economics or medicine, that rely upon dynamic interactions between many variables.
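The filter-then-extract idea can be illustrated with a deliberately simplified sketch. It swaps the thesis's grammar parser for a regular expression, and the cue list, pattern, and function names are all hypothetical stand-ins, so it shows only the overall shape of the pipeline.

```python
import re

# Hypothetical cue phrases; the thesis's actual grammar pattern is not given here.
CAUSAL_CUES = ("causes", "leads to", "results in")

_CUE_ALTERNATION = "|".join(re.escape(cue) for cue in CAUSAL_CUES)
CAUSAL_PATTERN = re.compile(
    rf"^(?P<cause>.+?)\s+(?:{_CUE_ALTERNATION})\s+(?P<effect>.+?)\.?$",
    re.IGNORECASE,
)


def is_causal(sentence: str) -> bool:
    """Cheap pre-filter: keep only sentences that contain a causal cue."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in CAUSAL_CUES)


def extract_relations(sentences):
    """Return directed (cause, effect) pairs from the sentences that pass the filter."""
    relations = []
    for sentence in sentences:
        if not is_causal(sentence):
            continue  # non-causal sentences are dropped before extraction
        match = CAUSAL_PATTERN.match(sentence.strip())
        if match:
            relations.append((match.group("cause"), match.group("effect")))
    return relations


# Hypothetical input sentences.
relations = extract_relations([
    "Inflation leads to higher interest rates.",
    "The market opened at nine.",
])
```

Each extracted pair is a directed edge, so a list of such pairs can be read directly as the edge list of a cause-and-effect network; polarity classification would be a separate step.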
|
7 |
A Study of Visualization Method with HK Graph Using Concept Words / Hirao, Eiji; Furuhashi, Takeshi; Yoshikawa, Tomohiro; Kobayashi, Daisuke / January 2010
Session ID: TH-B1-3 / SCIS & ISIS 2010, Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems. December 8-12, 2010, Okayama Convention Center, Okayama, Japan
|
8 |
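Use of Text Mining to Forecast Short-Term Stock Price Trends after the Publication of Company News [Einsatz von Text Mining zur Prognose kurzfristiger Trends von Aktienkursen nach der Publikation von Unternehmensnachrichten] / Mittermayer, Marc-André / January 2005
Also submitted as a dissertation, University of Bern, 2005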
|
9 |
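Characterization of Influential Profiles on Twitter According to Opinion Topics and the Generation of Interesting Content [Caracterización de perfiles influyentes en Twitter de acuerdo a tópicos de opinión y la generación de contenido interesante] / Vera Cid, Felipe Andrés / January 2015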
Industrial Civil Engineer (Ingeniero Civil Industrial) / In recent years, the use of the Internet, smartphones, and social networks has increased in Chile. Among all social networks, Twitter stands out because of the visibility that comes with being a more open network than others. In Chile, Twitter is used mainly for two purposes: getting informed and expressing opinions. The volume of opinions recorded on Twitter is of great interest to various actors in the country, among them companies that use Twitter as a tool for communicating with their customers, for resolving complaints and questions, and even for running viral marketing campaigns on the network. Given Twitter's widespread adoption and its large number of users, there is a need to know users' level of influence, both to prioritize them when addressing their needs and to make marketing campaigns more effective.
Today, various services perform this kind of task, such as Klout or BrandMetric. However, although these models measure user influence in different ways, none attempts to predict which users will become influential in the near future. This work consists of defining influence on Twitter and then examining how it would project over time, taking as a hypothesis that a user's influence can be measured from their generation of interesting content. To that end, influence in the Twitter network was defined as the capacity to generate interesting content that has repercussions in the social network. After reviewing existing models, one was chosen and slightly modified to obtain a score for how interesting the content generated by a profile is.
Using this model, rankings of user influence on Twitter were generated, along with rankings within topic groupings associated with politics and sports. For various reasons, it was not possible to separate the data into a larger number of topics, so the model was not considered to have met its objective of generating influence rankings for different groups of topics. Analyses of the predictability of the modeled influence were then carried out, concluding that the data period is too short to predict the time series.
Although the results may seem discouraging, the work leaves the way open for other approaches and studies, which are explained in the final chapter of the thesis. It is hoped that good segmentation and prioritization of profiles can help improve problem resolution, find profiles that will be influential in particular topics, and focus marketing campaigns using profiles that are not costly.
|
10 |
Machine Learning Methods to Understand Textual Data / Unknown Date
The amount of textual data produced every minute on the internet is extremely high. Processing this tremendous volume of mostly unstructured data is not a straightforward task. But the enormous amount of useful information that lies within it motivates scientists to investigate efficient and effective techniques and algorithms for discovering meaningful patterns. Social network applications, such as chat, comments, and discussion boards, provide opportunities for people around the world to be in contact and share their valuable knowledge. People usually do not care about spelling and accurate grammatical construction of sentences in everyday conversations. Therefore, extracting information from such datasets is more complicated. Text mining can be a solution to this problem. Text mining is a knowledge discovery process used to extract patterns from natural language. Applying text mining techniques to social networking websites can reveal a significant amount of information. Text mining in conjunction with social networks can be used to find the general opinion about a particular subject, human thinking patterns, and group identification. In this study, we investigate machine learning methods for textual data in six chapters. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
|