Global ETD Search

181	Exploring Node Attributes for Data Mining in Attributed Graphs Jihwan Lee (6639122) 10 June 2019 Graphs have attracted researchers in various fields in that many different kinds of real-world entities and relationships between them can be represented and analyzed effectively and efficiently using graphs. In particular, researchers in data mining and machine learning areas have developed algorithms and models to understand the complex graph data better and perform various data mining tasks. While a large body of work exists on graph mining, most existing work does not fully exploit attributes attached to graph nodes or edges. In this dissertation, we exploit node attributes to generate better solutions to several graph data mining problems addressed in the literature. First, we introduce the notion of statistically significant attribute associations in attributed graphs and propose an effective and efficient algorithm to discover those associations. The effectiveness analysis on the results shows that our proposed algorithm can reveal insightful attribute associations that cannot be identified using the earlier methods focused solely on frequency. Second, we build a probabilistic generative model for observed attributed graphs. Under the assumption that there exist hidden communities behind nodes in a graph, we adopt the idea of latent topic distributions to model a generative process of node attribute values and link structure more precisely. This model can be used to detect hidden communities and profile missing attribute values. Lastly, we investigate how to employ node attributes to learn latent representations of nodes in lower dimensional embedding spaces and use the learned representations to improve the performance of data mining tasks over attributed graphs. Pattern Recognition and Data Mining Attributed Graphs Data Mining Machine Learning
182	Anwenderspezifische Reduzierung von Mengen interessanter Assoziationsregeln mittels Evolutionärer Algorithmen Wenke, Birgit January 2008 Also: Munich, Univ. der Bundeswehr, dissertation, 2008
183	Accuracy versus cost in distributed data mining / Deutschman, Stephanie. January 1900 Thesis (M.S.)--Oregon State University, 2008. / Printout. Includes bibliographical references (leaves 62-64). Also available on the World Wide Web.
184	Modeling and computational strategies for medical decision making Yuan, Fan 27 May 2016 In this dissertation, we investigate three topics: predictive models for disease diagnosis and patient behavior, optimization for cancer treatment planning, and public health decision making for infectious disease prevention. In the first topic, we propose a multi-stage classification framework that incorporates Particle Swarm Optimization (PSO) for feature selection and discriminant analysis via mixed integer programming (DAMIP) for classification. By utilizing the reserved judgment region, it allows the classifier to delay making decisions on ‘difficult-to-classify’ observations and develop new classification rules in a later stage. We apply the framework to four real-life medical problems: 1) Patient readmissions: identifies the patients in the emergency department who return within 72 hours using patients’ demographic information, complaints, diagnosis, tests, and hospital real-time utility. 2) Flu vaccine responder: predicts high/low responders of flu vaccine on subjects in 5 years using gene signatures. 3) Knee reinjection: predicts whether a patient needs a second surgery within 3 years of his/her first knee injection, and handles missing data. 4) Alzheimer’s disease: distinguishes subjects in normal, mild cognitive impairment (MCI), and Alzheimer’s disease (AD) groups using neuropsychological tests. In the second topic, we first investigate multi-objective optimization approaches to determine the optimal dose configuration and radiation seed locations in brachytherapy treatment planning. Tumor dose escalation and dose-volume constraints on critical organs are incorporated to kill the tumor while preserving the functionality of organs. Based on the optimization framework, we propose a non-linear optimization model that optimizes the tumor control probability (TCP).
The model is solved by a solution strategy that incorporates piecewise linear approximation and local search. In the third topic, we study optimal strategies for public health emergencies under limited resources. First, we investigate the vaccination strategies against a pandemic flu to find the optimal strategy when limited vaccines are available by constructing a mathematical model for the course of the 2009 H1N1 pandemic flu and the process of the vaccination. Second, we analyze the cost-effectiveness of emergency response strategies against a large-scale anthrax attack to protect the entire regional population. Healthcare Operations research Data mining
185	Efficient decision tree building algorithms for uncertain data Tsang, Pui-kwan, Smith., 曾沛坤. January 2008 published_or_final_version / Computer Science / Master / Master of Philosophy Decision trees. Data mining. Algorithms.
186	New results on online job scheduling and data stream algorithms Lee, Lap-kei, 李立基 January 2009 published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy Computer scheduling. Data mining. Algorithms.
187	Cluster analysis on uncertain data Ngai, Wang-kay., 倪宏基. January 2008 published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy Cluster analysis. Data mining. Algorithms.
188	Privacy Preserving Data Mining Operations without Disrupting Data Quality B.Swapna, R.VijayaPrakash 01 December 2012 Data mining operations have become prevalent because they can extract trends or patterns that help in making good business decisions. They often operate on large historical databases or data warehouses to obtain actionable knowledge or business intelligence that supports well-informed decisions. Many tools have emerged in the data mining domain to perform these operations, and they are best used to obtain actionable knowledge from data. Doing this manually is not feasible because the data is very large and the work takes a lot of time. The data mining domain is therefore improving at a rapid pace. While data mining operations are very useful for obtaining business intelligence, they also have a drawback: they can extract sensitive information from the database, and people may misuse this access to obtain sensitive information illegally. Preserving the privacy of data is therefore also important. To this end, many Privacy Preserving Data Mining (PPDM) algorithms have been developed that sanitize data to prevent data mining algorithms from extracting sensitive information from databases. / Data mining operations help discover business intelligence from historical data. The extracted business intelligence or actionable knowledge supports well-informed decisions that lead to profit for the organization that uses it. While performing mining, the privacy of data has to be given utmost importance. To achieve this, PPDM (Privacy Preserving Data Mining) came into existence; it sanitizes the database to prevent the discovery of association rules. However, this modifies the data and thus degrades its quality. This paper proposes a new technique and algorithms that can perform privacy preserving data mining operations while ensuring that data quality is not lost.
The empirical results revealed that the proposed technique is useful and can be used in real world applications. data mining PPDM sanitization algorithms
189	Detecting Deception in Interrogation Settings Lamb, CAROLYN 18 December 2012 Bag-of-words deception detection systems outperform humans, but are still not always accurate enough to be useful. In interrogation settings, present models do not take into account potential influence of the words in a question on the words in the answer. According to the theory of verbal mimicry, this ought to exist. We show with our research that it does exist: certain words in a question can "prompt" other words in the answer. However, the effect is receiver-state-dependent. Deceptive and truthful subjects in archival data respond to prompting in different ways. We can improve the accuracy of a bag-of-words deception model by training a machine learning algorithm on both question words and answer words, allowing it to pick up on differences in the relationships between these words. This approach should generalize to other bag-of-words models of psychological states in dialogues. / Thesis (Master, Computing) -- Queen's University, 2012-12-17 14:42:19.707 data mining deception text mining
190	Customer Churn Predictive Heuristics from Operator and Users' Perspective MOUNIKA REDDY, CHANDIRI January 2016 Telecommunication organizations face increasing customer service pressure as they launch various user-desired services. Delivering poor customer experiences puts customer relationships and revenues at risk. One of the metrics used by telecommunications companies to determine their relationship with customers is “Churn”. After substantial research in the field of churn prediction over many years, Big Data analytics with Data Mining techniques was found to be an efficient way of identifying churn. These techniques are usually applied to predict customer churn by building models, classifying patterns, and learning from historical data. Although some work has already been undertaken with regard to the users’ perspective, it appears to be in its infancy. The aim of this thesis is to validate churn predictive heuristics from the operator perspective and close to the user end. This involves conducting experiments with different groups of people regarding their data usage; designing a model that is close to the user end and fitting it to the data obtained through the survey; and correlating the examined churn indicators and validating them against the traffic volume variation and the users’ feedback collected by accompanying theses. A literature review is done to analyze previous work and identify the difficulties faced in analyzing users’ feelings, and to understand methodologies for working around problems with the accuracy of churn prediction algorithms. Experiments are conducted with different groups of people across the globe. Their experiences with the quality of calls and data, whether they are looking to change in the future, and what their reasons for churning would be are analyzed. Their feedback will be validated using existing heuristics.
The collected data set is analyzed statistically and validated against different datasets obtained from operators’ data. Statistical and Big Data analysis has also been done on an operator’s data covering the monthly data volume usage of active and churned customers. A possible correlation of user churn with users’ feedback will be studied by calculating the percentages and further correlating the results with those of the operators’ data and the data produced by the mobile app. The results show that the monthly volumes have little decision power and that additional attributes such as higher time resolution, age, gender and others are needed. The survey done globally, however, has shown similarities with the operator’s customers’ feedback and issues “around the globe”, such as data plan issues, pricing, and issues with connectivity and speed. Nevertheless, data preprocessing and feature selection have been shown to be the key factors. Churn predictive models have given a better classification of 69.7 % when more attributes were provided. Classification of the telecom operators’ data has given an accuracy of 51.7 % after preprocessing and for the variables we chose. Finally, a close observation of the end user revealed the possibility of a much higher classification precision of 95.2 %. Churn Prediction Data mining Telecommunication
