181 |
Exploring Node Attributes for Data Mining in Attributed Graphs / Jihwan Lee (6639122), 10 June 2019
Graphs have attracted researchers in various fields because many different kinds of real-world entities and the relationships between them can be represented and analyzed effectively and efficiently using graphs. In particular, researchers in data mining and machine learning have developed algorithms and models to understand complex graph data better and to perform various data mining tasks. While a large body of work exists on graph mining, most existing work does not fully exploit attributes attached to graph nodes or edges.
In this dissertation, we exploit node attributes to generate better solutions to several graph data mining problems addressed in the literature. First, we introduce the notion of statistically significant attribute associations in attributed graphs and propose an effective and efficient algorithm to discover those associations. The effectiveness analysis of the results shows that our proposed algorithm can reveal insightful attribute associations that cannot be identified using earlier methods focused solely on frequency. Second, we build a probabilistic generative model for observed attributed graphs. Under the assumption that there exist hidden communities behind nodes in a graph, we adopt the idea of latent topic distributions to model the generative process of node attribute values and link structure more precisely. This model can be used to detect hidden communities and profile missing attribute values. Lastly, we investigate how to employ node attributes to learn latent representations of nodes in lower-dimensional embedding spaces and use the learned representations to improve the performance of data mining tasks over attributed graphs.
|
182 |
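Anwenderspezifische Reduzierung von Mengen interessanter Assoziationsregeln mittels Evolutionärer Algorithmen [User-specific reduction of sets of interesting association rules using evolutionary algorithms] / Wenke, Birgit, January 2008
Also published as: Munich, Univ. der Bundeswehr, dissertation, 2008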
|
183 |
Accuracy versus cost in distributed data mining / Deutschman, Stephanie. January 1900
Thesis (M.S.)--Oregon State University, 2008. / Printout. Includes bibliographical references (leaves 62-64). Also available on the World Wide Web.
|
184 |
Modeling and computational strategies for medical decision making / Yuan, Fan, 27 May 2016
In this dissertation, we investigate three topics: predictive models for disease diagnosis and patient behavior, optimization for cancer treatment planning, and public health decision making for infectious disease prevention. In the first topic, we propose a multi-stage classification framework that incorporates Particle Swarm Optimization (PSO) for feature selection and discriminant analysis via mixed integer programming (DAMIP) for classification. By utilizing the reserved judgment region, it allows the classifier to delay making decisions on ‘difficult-to-classify’ observations and develop new classification rules in later stages. We apply the framework to four real-life medical problems: 1) Patient readmissions: identifies the patients in the emergency department who return within 72 hours using patients’ demographic information, complaints, diagnoses, tests, and hospital real-time utility. 2) Flu vaccine responder: predicts high/low responders of flu vaccine on subjects in 5 years using gene signatures. 3) Knee reinjection: predicts whether a patient needs a second surgery within 3 years of his/her first knee injection, and handles missing data. 4) Alzheimer’s disease: distinguishes subjects in normal, mild cognitive impairment (MCI), and Alzheimer’s disease (AD) groups using neuropsychological tests. In the second topic, we first investigate multi-objective optimization approaches to determine the optimal dose configuration and radiation seed locations in brachytherapy treatment planning. Tumor dose escalation and dose-volume constraints on critical organs are incorporated to kill the tumor while preserving the functionality of organs. Based on the optimization framework, we propose a non-linear optimization model that optimizes the tumor control probability (TCP). The model is solved by a solution strategy that incorporates piecewise linear approximation and local search.
In the third topic, we study optimal strategies for public health emergencies under limited resources. First, we investigate vaccination strategies against a pandemic flu to find the optimal strategy when limited vaccines are available, by constructing a mathematical model for the course of the 2009 H1N1 pandemic flu and the vaccination process. Second, we analyze the cost-effectiveness of emergency response strategies against a large-scale anthrax attack to protect the entire regional population.
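The reserved judgment idea from the first topic above can be illustrated with a minimal sketch. It is not the DAMIP or PSO method of the dissertation: the threshold rule, the function names (`classify_with_reserve`, `multi_stage`), and the toy scoring functions are all hypothetical, chosen only to show how a classifier can defer ‘difficult-to-classify’ observations to a later stage.

```python
from typing import Callable, List, Optional, Sequence, Tuple

RESERVED = None  # marker meaning "decision deferred to a later stage"


def classify_with_reserve(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable class index, or RESERVED when the top
    probability is below the threshold (illustrative rule, not DAMIP)."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else RESERVED


def multi_stage(
    observations: Sequence[float],
    stages: List[Tuple[Callable[[float], Sequence[float]], float]],
) -> List[Optional[int]]:
    """Apply (scoring function, threshold) stages in order. Each stage only
    looks at observations that every earlier stage left reserved."""
    labels: List[Optional[int]] = [RESERVED] * len(observations)
    for score, threshold in stages:
        for i, x in enumerate(observations):
            if labels[i] is RESERVED:
                labels[i] = classify_with_reserve(score(x), threshold)
    return labels


# Hypothetical two-class scoring rules on a single feature x in [0, 1].
stage_one = lambda x: (1 - x, x)            # strict first-stage rule
stage_two = lambda x: (1 - x * x, x * x)    # different rule for deferred cases

labels = multi_stage([0.1, 0.6, 0.95], [(stage_one, 0.8), (stage_two, 0.5)])
```

Here the middle observation is reserved by the first stage, because neither class reaches the 0.8 threshold, and is only labelled by the second stage's rule.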
|
185 |
Efficient decision tree building algorithms for uncertain data / Tsang, Pui-kwan, Smith., 曾沛坤. January 2008
published_or_final_version / Computer Science / Master / Master of Philosophy
|
186 |
New results on online job scheduling and data stream algorithms / Lee, Lap-kei, 李立基, January 2009
published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
|
187 |
Cluster analysis on uncertain data / Ngai, Wang-kay., 倪宏基. January 2008
published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
|
188 |
Privacy Preserving Data Mining Operations without Disrupting Data Quality / B.Swapna, R.VijayaPrakash, 01 December 2012
Data mining operations have become prevalent because they extract trends and patterns that support good business decisions. They often operate on large historical databases or data warehouses to obtain actionable knowledge or business intelligence that helps in making well-informed decisions. Many tools have emerged to perform data mining operations, and they are best used to obtain actionable knowledge from data. Doing this manually is not feasible because the data is very large and the work takes a lot of time. The data mining domain is therefore improving at a rapid pace. While data mining operations are very useful for obtaining business intelligence, they have a drawback: they can extract sensitive information from the database, and people may misuse this freedom by obtaining sensitive information illegally. Preserving the privacy of data is therefore also important. To this end, many Privacy Preserving Data Mining (PPDM) algorithms have been developed that sanitize data to prevent data mining algorithms from extracting sensitive information from databases. / Data mining operations help discover business intelligence from historical data. The extracted business intelligence or actionable knowledge supports well-informed decisions that lead to profit for the organization that uses it. While performing mining, the privacy of data has to be given utmost importance. To achieve this, PPDM (Privacy Preserving Data Mining) sanitizes the database to prevent the discovery of association rules. However, this modifies the data and thus disrupts its quality. This paper proposes a new technique and algorithms that can perform privacy preserving data mining operations while ensuring that data quality is not lost. The empirical results revealed that the proposed technique is useful and can be used in real-world applications.
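The trade-off described in the abstract above, hiding an association rule by sanitizing the database at the cost of modifying data, can be sketched as follows. This is a generic illustration and not the technique the paper proposes: the function names, the item-deletion strategy, and the toy transactions are all assumptions made for the example.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset (a set)."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (both sets of items)."""
    n_antecedent = count(transactions, antecedent)
    n_both = count(transactions, set(antecedent) | set(consequent))
    return n_both / n_antecedent if n_antecedent else 0.0


def hide_rule(transactions, antecedent, consequent, max_conf):
    """Delete the consequent items from supporting transactions until the
    rule's confidence is at most max_conf. Returns a sanitized copy and the
    number of modified transactions (a crude measure of lost data quality)."""
    sanitized = [set(t) for t in transactions]  # leave the input untouched
    full = set(antecedent) | set(consequent)
    changed = 0
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) <= max_conf:
            break
        if full <= t:
            t -= set(consequent)  # in-place removal from this transaction
            changed += 1
    return sanitized, changed


# Hypothetical market-basket data; the sensitive rule is {bread} -> {milk}.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
]
sanitized, changed = hide_rule(transactions, {"bread"}, {"milk"}, max_conf=0.5)
```

The count of modified transactions makes the cost of hiding the rule visible, which is the data-quality concern the paper sets out to address.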
|
189 |
Detecting Deception in Interrogation Settings / Lamb, CAROLYN, 18 December 2012
Bag-of-words deception detection systems outperform humans, but are still not always accurate enough to be useful. In interrogation settings, present models do not take into account potential influence of the words in a question on the words in the answer. According to the theory of verbal mimicry, this ought to exist. We show with our research that it does exist: certain words in a question can "prompt" other words in the answer. However, the effect is receiver-state-dependent. Deceptive and truthful subjects in archival data respond to prompting in different ways. We can improve the accuracy of a bag-of-words deception model by training a machine learning algorithm on both question words and answer words, allowing it to pick up on differences in the relationships between these words. This approach should generalize to other bag-of-words models of psychological states in dialogues. / Thesis (Master, Computing) -- Queen's University, 2012-12-17 14:42:19.707
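One way to give a learner access to both question words and answer words, as the abstract above describes, is to keep the two bags of words in separate feature namespaces. The sketch below is an assumption about how such features could be built, not the thesis's actual pipeline; the `q:`/`a:` prefixes, the tokenizer, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def qa_features(question, answer):
    """Bag-of-words counts with question and answer words kept apart, so a
    downstream classifier can weight 'take' in the question differently
    from 'take' in the answer."""
    feats = Counter(f"q:{w}" for w in tokens(question))
    feats.update(f"a:{w}" for w in tokens(answer))
    return feats


feats = qa_features("Did you take the money?", "I did not take it.")
```

The resulting counts can be fed to any standard classifier. Because the same word appears under different keys depending on who said it, the model can pick up cases where a question word "prompts" a word in the answer.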
|
190 |
Customer Churn Predictive Heuristics from Operator and Users' Perspective / MOUNIKA REDDY, CHANDIRI, January 2016
Telecommunication organizations face growing customer-service pressure as they launch various user-desired services. Delivering poor customer experiences puts customer relationships and revenues at risk. One of the metrics used by telecommunications companies to assess their relationship with customers is “churn”. After substantial research in the field of churn prediction over many years, Big Data analytics with data mining techniques was found to be an efficient way of identifying churn. These techniques are usually applied to predict customer churn by building models, classifying patterns, and learning from historical data. Although some work has already been undertaken with regard to the users’ perspective, it appears to be in its infancy. The aim of this thesis is to validate churn predictive heuristics from the operator perspective and close to the user end. This involves conducting experiments with different groups of people regarding their data usage, designing a model that is close to the user end, and fitting it to the data obtained through the survey. It also involves correlating and validating the examined churn indicators, and validating the traffic volume variation against the users’ feedback collected by accompanying theses. A literature review is done to analyze previous work and identify the difficulties faced in analyzing users’ feelings, and to understand methodologies for getting around problems with the accuracy of churn prediction algorithms. Experiments are conducted with different groups of people across the globe. Their experiences with the quality of calls and data, whether they are looking to change in the future, and what their reasons for churn would be are analyzed. Their feedback is validated using existing heuristics. The collected data set is analyzed statistically and validated against different datasets obtained from operators’ data.
Statistical and Big Data analysis has also been done on data provided by an operator on the monthly data volume usage of its active and churned customers. A possible correlation of user churn with users’ feedback is studied by calculating percentages, and the results are further correlated with those from the operators’ data and the data produced by the mobile app. The results show that the monthly volumes have little decision power and that additional attributes, such as higher time resolution, age, gender and others, are needed. The global survey, however, showed similarities with the operator’s customers’ feedback and with issues “around the globe”, such as data plan issues, pricing, and issues with connectivity and speed. Data preprocessing and feature selection proved to be the key factors. Churn predictive models gave a better classification rate of 69.7 % when more attributes were provided. Classification of telecom operators’ data gave an accuracy of 51.7 % after preprocessing and for the variables we chose. Finally, a close observation of the end user revealed the possibility of a much higher classification precision of 95.2 %.
|