1 |
Mining of identity theft stories to model and assess identity threat behaviors / Yang, Yongpeng, 18 September 2014
Identity theft is an ever-present and ever-growing issue in our society. Identity theft, fraud and abuse are present and growing in every market sector. Data describing how these identity crimes are conducted, and their consequences for victims, are often recorded in stories and reports by the news press, fraud examiners and law enforcement. Because these stories are highly unstructured, this thesis first discusses how to collect identity theft data automatically by applying text mining techniques to online news stories and reports on identity theft. The collected data are used to enrich the repository of the ITAP (Identity Threat Assessment and Prediction) Project, under development at the Center for Identity at The University of Texas. The thesis then presents statistics on the common behaviors and resources of identity thieves and fraudsters: the identity attributes used to identify people, the resources employed to conduct identity crimes, and the patterns of identity criminal behavior. Analysis of these results should help researchers better understand identity threat behaviors, offer people early warning signs, and thwart future identity theft crimes.
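The abstract describes mining news stories to tally which identity attributes appear in identity crimes. A minimal sketch of that counting step follows, assuming a hand-written regex lexicon; the attribute names and patterns are hypothetical and are not the ITAP repository's schema or the thesis's actual extraction method.

```python
import re
from collections import Counter

# Hypothetical lexicon: identity attribute -> regex for its surface forms.
# These names and patterns are illustrative assumptions, not the ITAP schema.
ATTRIBUTE_PATTERNS = {
    "ssn": re.compile(r"\bsocial security numbers?\b|\bSSNs?\b", re.IGNORECASE),
    "credit_card": re.compile(r"\bcredit cards?\b", re.IGNORECASE),
    "date_of_birth": re.compile(r"\bdates? of birth\b|\bbirth ?dates?\b", re.IGNORECASE),
    "bank_account": re.compile(r"\bbank accounts?\b", re.IGNORECASE),
}


def attributes_in_story(text: str) -> set[str]:
    """Return the set of identity attributes mentioned at least once in a story."""
    return {name for name, pattern in ATTRIBUTE_PATTERNS.items() if pattern.search(text)}


def attribute_frequencies(stories: list[str]) -> Counter:
    """Count, for each attribute, how many stories mention it."""
    counts = Counter()
    for story in stories:
        counts.update(attributes_in_story(story))
    return counts


if __name__ == "__main__":
    stories = [
        "The suspect used stolen Social Security numbers to open credit cards.",
        "Police said the fraudster obtained victims' dates of birth and bank accounts.",
        "A ring of thieves sold credit card data online.",
    ]
    for attribute, n in attribute_frequencies(stories).most_common():
        print(attribute, n)
```

Counting each attribute once per story (rather than once per mention) keeps a single long article from dominating the statistics.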
|
2 |
Text mining in customer relationship management (Text mining im Customer-relationship-Management) / Rentzmann, René, January 2007
Also published as: Dissertation, Univ. Eichstätt, Ingolstadt, 2007
|
3 |
Integrating text-mining approaches to identify entities and extract events from the biomedical literature / Gerner, Lars Martin Anders, January 2012
The amount of biomedical literature available is increasing at an exponential rate, making it increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem through the systematic, large-scale extraction of structured information from inherently unstructured biomedical text. This thesis reports the development of four text-mining systems that, by building on each other, have enabled the extraction of information about a large number of published statements in the biomedical literature. The first system, LINNAEUS, enables highly accurate detection ('recognition') and identification ('normalization') of species names in biomedical articles. Building on LINNAEUS, we implemented a range of improvements in the GNAT system, enabling high-throughput gene/protein detection and identification. Using gene/protein identification from GNAT, we developed the Gene Expression Text Miner (GETM), which extracts information about gene expression statements. Finally, building on GETM as a pilot project, we constructed the BioContext integrated event extraction system, which was used to extract information about over 11 million distinct biomolecular processes in 10.9 million abstracts and 230,000 full-text articles. The ability of the BioContext system to detect negated statements enables a preliminary analysis of potential contradictions in the biomedical literature. All tools (LINNAEUS, GNAT, GETM, and BioContext) are available under open-source software licenses, and LINNAEUS and GNAT are available as online web services. All extracted data (36 million BioContext statements, 720,000 GETM statements, 72,000 contradictions, 37 million mentions of species names, 80 million mentions of gene names, and 57 million mentions of anatomical location names) are available for bulk download. In addition, the data extracted by GETM and BioContext are available to biologists through easy-to-use search interfaces.
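Recognition and normalization, as described above, are two separate steps: finding a species mention in the text, then mapping it to a stable identifier. A minimal sketch of both, using plain dictionary lookup, is shown below. The tiny synonym dictionary is a hypothetical stand-in (the NCBI Taxonomy IDs 9606 for human and 10090 for mouse are real), and this is one simple way to do the task, not LINNAEUS's implementation.

```python
import re

# Hypothetical synonym dictionary mapping species mentions to NCBI Taxonomy IDs.
SPECIES_DICTIONARY = {
    "homo sapiens": 9606,
    "human": 9606,
    "humans": 9606,
    "mus musculus": 10090,
    "mouse": 10090,
    "mice": 10090,
}

# Try longer synonyms first so the longest dictionary entry wins when entries overlap.
SPECIES_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(SPECIES_DICTIONARY, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_species(text: str) -> list[tuple[int, int, str, int]]:
    """Recognize species mentions and normalize each one to a taxonomy ID.

    Returns (start, end, mention, taxonomy_id) tuples in text order.
    """
    return [
        (m.start(), m.end(), m.group(1), SPECIES_DICTIONARY[m.group(1).lower()])
        for m in SPECIES_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag_species("Expression was higher in mice than in Homo sapiens."):
        print(hit)
```

Normalization is what lets "mice" and "Mus musculus" be counted as the same organism when aggregating millions of mentions.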
|
4 |
Corpus construction based on Ontological domain knowledge / Benis, Nirupama; Kaliyaperumal, Rajaram, January 2011
The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of both the amount of text and the number of predicates, and they have been developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. In doing so, we believe we can provide a rough set of rules for creating corpora from ontologies. We also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the part of the Gene Ontology pertaining to transport: the term transport (GO:0006810) and all of its child concepts, which could be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet, and the output is annotated text in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI written in Java.
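The abstract says annotations are emitted as XML resembling FrameNet's. A rough sketch of serializing one annotated sentence is shown below, using Python's standard `xml.etree.ElementTree`. The tag and attribute names (`annotationSet`, `label`, `frameName`), the frame name `Transport`, and the frame-element names are hypothetical simplifications, not the FrameNet or BioOntoFN schema; offsets are end-exclusive Python slices.

```python
import xml.etree.ElementTree as ET


def span(text: str, phrase: str) -> tuple[int, int]:
    """Return (start, end) character offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


def annotate_sentence(text, frame, target, elements):
    """Build a simplified, FrameNet-like XML annotation for one sentence.

    target is the (start, end) span of the frame-evoking word; elements maps
    frame-element names to (start, end) spans. All tag and attribute names
    here are hypothetical.
    """
    sentence = ET.Element("sentence")
    ET.SubElement(sentence, "text").text = text
    annotation = ET.SubElement(sentence, "annotationSet", frameName=frame)
    labels = {"Target": target, **elements}  # target label first, then frame elements
    for name, (start, end) in labels.items():
        ET.SubElement(annotation, "label", name=name, start=str(start), end=str(end))
    return sentence


if __name__ == "__main__":
    text = "Glucose is transported into the cell by GLUT1."
    node = annotate_sentence(
        text,
        frame="Transport",  # hypothetical frame name
        target=span(text, "transported"),
        elements={
            "Cargo": span(text, "Glucose"),
            "Destination": span(text, "the cell"),
            "Transporter": span(text, "GLUT1"),
        },
    )
    print(ET.tostring(node, encoding="unicode"))
```

Storing character offsets rather than copied substrings keeps each label tied to an exact position in the source abstract.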
|
5 |
Identify Opioid Use Problem / Alzeer, Abdullah Hamad, 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The aim of this research is to design a new method, using text mining and machine learning approaches, to identify opioid use problems (OUP) among long-term opioid therapy patients at Indiana University Health (IUH). First, a systematic review was conducted to investigate the variables, methods, and opioid problem definitions currently used in the literature. We identified 75 distinct variables in 9 models that mostly relied on ICD codes to identify OUP. The review concluded that ICD codes alone may not be enough to determine the real size of the opioid problem, and that more effort is needed to adopt other methods to understand the issue. Next, we developed a text mining approach to identify OUP and compared its results with the current conventional method of identifying OUP using ICD-9 codes. Following approval from the institutional review board and the Regenstrief Institute, structured and unstructured data for 14,298 IUH patients were collected from the Indiana Network for Patient Care. Our text mining approach identified 127 opioid cases, compared to 45 cases identified by ICD codes. We concluded that the text mining approach may be used successfully to identify OUP from patients' clinical notes. Finally, we developed a machine learning approach to identify OUP by analyzing patients' clinical notes. Our model classified positive OUP cases from clinical notes with a sensitivity of 88% on unseen data. We concluded that the machine learning approach may be used successfully to identify OUP from patients' clinical notes. / 2019-06-21
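The 88% figure above is a sensitivity (true positive rate): TP / (TP + FN), the share of truly positive cases that the model flags. A small sketch of the computation is shown below; the labels are hypothetical and are not drawn from the IUH data.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (true positive rate) = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true; sensitivity is undefined")
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical labels for eight clinical notes (1 = OUP, 0 = no OUP).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    print(sensitivity(y_true, y_pred))  # 3 of 4 true OUP notes flagged -> 0.75
```

Because sensitivity ignores false positives (the fifth note above), it is typically read alongside specificity or precision.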
|
6 |
Detecting Deception in Interrogation Settings / Lamb, Carolyn, 18 December 2012
Bag-of-words deception detection systems outperform humans, but are still not always accurate enough to be useful. In interrogation settings, present models do not take into account the potential influence of the words in a question on the words in the answer. According to the theory of verbal mimicry, such an influence ought to exist. Our research shows that it does: certain words in a question can "prompt" other words in the answer. However, the effect is receiver-state-dependent: deceptive and truthful subjects in archival data respond to prompting in different ways. We can improve the accuracy of a bag-of-words deception model by training a machine learning algorithm on both question words and answer words, allowing it to pick up on differences in the relationships between these words. This approach should generalize to other bag-of-words models of psychological states in dialogues. / Thesis (Master, Computing) -- Queen's University, 2012-12-17 14:42:19.707
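Training on "both question words and answer words" means each exchange contributes features from both sides. A minimal sketch of one possible feature extractor follows. The `q=`/`a=` prefixes and the question-word/answer-word pair features are assumptions about how prompting relationships could be exposed to a classifier, not the thesis's feature set.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def qa_features(question: str, answer: str) -> Counter:
    """Bag-of-words features over one question/answer exchange.

    q=<w>        : word w occurs in the question
    a=<w>        : word w occurs in the answer
    q=<u>&a=<v>  : question word u co-occurs with answer word v
                   (a hypothetical way to expose 'prompting' relationships)
    """
    q_tokens, a_tokens = tokenize(question), tokenize(answer)
    features = Counter()
    features.update(f"q={w}" for w in q_tokens)
    features.update(f"a={w}" for w in a_tokens)
    for u in set(q_tokens):
        for v in set(a_tokens):
            features[f"q={u}&a={v}"] += 1
    return features


if __name__ == "__main__":
    for name, count in sorted(qa_features("Where were you?", "I was at home.").items()):
        print(name, count)
```

Each `Counter` is a plain mapping, so a list of them could be vectorized with, for example, scikit-learn's `DictVectorizer` before training any standard classifier.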
|
7 |
Extraction of Causal-Association Networks from Unstructured Text Data / Bojduj, Brett N, 01 June 2009
Causality is an expression of the interactions between variables in a system. Humans often express causal relations explicitly in natural language, so extracting these relations can provide insight into how a system functions. This thesis presents a system that uses a grammar parser to extract causes and effects from unstructured text through a simple, predefined grammar pattern. By filtering out non-causal sentences before the extraction process begins, the presented methodology achieves a precision of 85.91% and a recall of 73.99%. The polarity of each extracted relation is then classified using a Fisher classifier. The result is a set of directed cause-effect relations, each with a polarity of either increasing or decreasing. These relations can then be used to build networks of causes and effects. Such a “Causal-Association Network” (CAN) can aid decision-making in complex domains, such as economics or medicine, that rely on dynamic interactions between many variables.
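The pipeline above (filter, extract cause and effect with a fixed pattern, assign polarity, link into a network) can be sketched in a few lines. This is a simplified stand-in: a single regular expression replaces the grammar parser, and a hypothetical verb lexicon replaces the Fisher classifier for polarity.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class CausalRelation(NamedTuple):
    cause: str
    effect: str
    polarity: str  # "increasing" or "decreasing"


# Hypothetical lexicon of causal verbs and the polarity each one signals.
CAUSAL_VERBS = {
    "increases": "increasing", "raises": "increasing", "boosts": "increasing",
    "decreases": "decreasing", "reduces": "decreasing", "lowers": "decreasing",
}

# One fixed surface pattern: "<cause> <causal verb> <effect>."
CAUSAL_PATTERN = re.compile(
    r"^(?P<cause>.+?)\s+(?P<verb>" + "|".join(CAUSAL_VERBS) + r")\s+(?P<effect>.+?)\.?$",
    re.IGNORECASE,
)


def extract_relation(sentence: str) -> Optional[CausalRelation]:
    """Return a cause/effect/polarity triple, or None if the sentence does not match."""
    match = CAUSAL_PATTERN.match(sentence.strip())
    if match is None:
        return None  # treated as non-causal and filtered out
    polarity = CAUSAL_VERBS[match.group("verb").lower()]
    return CausalRelation(match.group("cause"), match.group("effect"), polarity)


def build_network(sentences):
    """Directed graph: cause -> list of (effect, polarity) edges."""
    graph = defaultdict(list)
    for sentence in sentences:
        relation = extract_relation(sentence)
        if relation is not None:
            graph[relation.cause.lower()].append((relation.effect.lower(), relation.polarity))
    return dict(graph)


if __name__ == "__main__":
    print(build_network([
        "Smoking increases the risk of cancer.",
        "Exercise lowers blood pressure.",
        "The meeting was held on Monday.",
    ]))
```

A fixed pattern like this trades recall for precision, which mirrors the precision/recall balance the abstract reports.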
|
8 |
Comparing Naïve Bayes Classifiers with Support Vector Machines for Predicting Protein Subcellular Location Using Text Features / Lam, Yin, 07 July 2010
Proteins play many roles in the body, and understanding how proteins function is very challenging. Determining a protein's location within the cell (its subcellular location) helps shed light on its function. Protein subcellular location can be inferred through experimental methods or predicted using computational systems. In particular, we focus on two existing computational systems, EpiLoc and HomoLoc, which use features derived from text (abstracts of technical papers) and apply a support vector machine (SVM) classifier to assign proteins to their respective locations. The prediction accuracy of both EpiLoc and HomoLoc is comparable to that of state-of-the-art protein location prediction systems. However, in addition to accuracy, other factors such as training efficiency must be considered when evaluating the quality of a location prediction system. In this thesis, we replace the SVM classifier in EpiLoc and HomoLoc with a naïve Bayes classifier and with a novel classifier, which we call the Mean Weight Text classifier. Both of these classifiers are simple to implement and execute efficiently. In addition, naïve Bayes classifiers have been shown to be effective for protein location prediction and are considered preferable to SVMs because the process used to derive their results is easier to explain. Evaluating these classifiers on existing data sets, we find that SVM classifiers have slightly higher accuracy than the naïve Bayes and Mean Weight Text classifiers. This slight advantage is offset by the simplicity and efficiency of the naïve Bayes and Mean Weight Text classifiers. Moreover, the Mean Weight Text classifier has slightly higher accuracy than the naïve Bayes classifier. / Thesis (Master, Computing) -- Queen's University, 2010-07-06 11:06:47.613
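The simplicity the abstract attributes to naïve Bayes is easy to see in code. Below is a minimal multinomial naïve Bayes text classifier with add-one (Laplace) smoothing, written from scratch. The toy training texts and location labels are hypothetical; EpiLoc and HomoLoc derive their features differently, and the Mean Weight Text classifier is not sketched here because the abstract does not describe it.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def _log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), up to a shared constant
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocabulary)
        for t in tokens:
            if t in self.vocabulary:  # ignore words never seen in training
                score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, document):
        tokens = document.lower().split()
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))


if __name__ == "__main__":
    # Toy, hypothetical training texts and subcellular-location labels.
    docs = [
        "nuclear localization signal chromatin binding",
        "transcription factor in the nucleus",
        "mitochondrial membrane respiration",
        "oxidative phosphorylation in mitochondria",
    ]
    labels = ["nucleus", "nucleus", "mitochondrion", "mitochondrion"]
    model = MultinomialNaiveBayes().fit(docs, labels)
    print(model.predict("chromatin binding transcription"))  # nucleus
```

Training is a single counting pass over the data, and each prediction can be explained by listing the per-word log-probability contributions, which is the interpretability advantage the abstract mentions.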
|
9 |
A Study of Visualization Method with HK Graph Using Concept Words / Hirao, Eiji; Furuhashi, Takeshi; Yoshikawa, Tomohiro; Kobayashi, Daisuke, January 2010
Session ID: TH-B1-3 / SCIS & ISIS 2010, Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems. December 8-12, 2010, Okayama Convention Center, Okayama, Japan
|
10 |
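Use of text mining to forecast short-term stock price trends following the publication of corporate news (Einsatz von Text Mining zur Prognose kurzfristiger Trends von Aktienkursen nach der Publikation von Unternehmensnachrichten) / Mittermayer, Marc-André, January 2005
Also published as: Dissertation, Univ. Bern, 2005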
|