Global ETD Search

191	A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods Sedghi, Elham 30 March 2017 Early detection and treatment of stroke can save lives. Before any procedure is planned, the patient is traditionally subjected to a brain scan such as Magnetic Resonance Imaging (MRI) to make sure he or she receives safe treatment. Before any imaging is performed, the patient is checked into the Emergency Room (ER), and clinicians from the Stroke Rapid Assessment Unit (SRAU) evaluate the patient's signs and symptoms. The question we address in this thesis is: Can Data Mining (DM) algorithms be employed to reliably predict the occurrence of stroke in a patient based on the signs and symptoms gathered by the clinicians and other staff in the ER or the SRAU? A reliable DM algorithm would help clinicians decide whether to escalate a case or classify it as a non-life-threatening mimic and spare the patient unnecessary imaging and tests. Such an algorithm would not only make the life of patients and clinicians easier but would also enable hospitals to cut costs. Most of the signs and symptoms gathered by clinicians in the ER or the SRAU are stored in free-text format in hospital information systems. Using techniques from Natural Language Processing (NLP), the vocabularies of interest can be extracted and classified. A big challenge in this process is that medical narratives are full of misspelled words and clinical abbreviations. It is well known that the quality of data mining results depends crucially on the quality of the input data. In this thesis, as a first contribution, we describe a procedure to preprocess the raw data and transform it into clean, well-structured data that can be effectively used by DM learning algorithms. 
Another contribution of this thesis is a set of carefully crafted rules for detecting negated meaning in free-text sentences. Using these rules, we were able to get the correct semantics of sentences and provide much more useful datasets to DM learning algorithms. This thesis consists of three main parts. In the first part, we focus on building classifiers to reliably distinguish stroke and Transient Ischemic Attack (TIA) from mimic cases. For this, we used text extracted from the "chief complaint" and "history of patient illness" fields available in the patients' files at the Victoria General Hospital (VGH). In collaboration with stroke specialists, we identified a well-defined set of stroke-related keywords. Next, we created practical tools to accurately assign keywords from this set to each patient. Then, we performed extensive experiments to find the right learning algorithm for building the best classifier, one that provides a good balance between sensitivity, specificity, and a host of other quality indicators. In the second part, we focus on the most important mimic case, migraine, and how to effectively distinguish it from stroke or TIA. This is a challenging problem because migraine has many signs and symptoms that are similar to those of stroke or TIA. Another challenge we address is the imbalance of our datasets with respect to migraine: the migraine cases are a minority of the overall cases. To alleviate this rarity problem, we propose a randomization procedure which is able to drastically improve the classifier quality. Finally, in the third part, we provide a detailed study of data mining algorithms for extracting the most important predictors that can help to detect and prevent posterior circulation stroke. We compared our findings with the attributes reported by the Heart and Stroke Foundation of Canada, and the features found in our study performed better in accuracy, sensitivity, and ROC. 
/ Graduate Data Mining Natural Language Processing
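The abstract of entry 191 mentions hand-crafted rules for detecting negated meaning in clinical free text but does not list them. The following is only a minimal sketch of what a trigger-word rule of that kind could look like; the trigger list, the scope-breaking words, and the function name `negated_keywords` are all illustrative assumptions, not the thesis's actual rules.

```python
import re

# Hypothetical negation triggers; the real rule set is not given in the abstract.
NEGATION_TRIGGERS = {"no", "not", "denies", "denied", "without"}
# Hypothetical words that end the scope of a negation.
SCOPE_BREAKERS = {"but", "however", "except"}


def negated_keywords(sentence, keywords):
    """Return the keywords that occur after a negation trigger and
    before the next scope breaker (or the end of the sentence)."""
    wanted = {k.lower() for k in keywords}
    negated = set()
    in_scope = False
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in NEGATION_TRIGGERS:
            in_scope = True          # a negation scope opens here
        elif token in SCOPE_BREAKERS:
            in_scope = False         # the scope is closed again
        elif in_scope and token in wanted:
            negated.add(token)
    return negated


if __name__ == "__main__":
    note = "Patient denies headache and vertigo but reports weakness."
    print(negated_keywords(note, ["headache", "vertigo", "weakness"]))
```

With a rule like this, a keyword such as "headache" in "denies headache" would be recorded as absent rather than present before the data reaches a classifier.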
192	Data Visualization for the Benchmarking Engine Joish, Sudha 16 May 2003 In today's information age, data collection is not the ultimate goal; it is simply the first step in extracting knowledge-rich information to shape future decisions. In this thesis, we present ChartVisio - a simple web-based visual data-mining system that lets users quickly explore databases and transform raw data into processed visuals. It is highly interactive, easy to use and hides the underlying complexity of querying from its users. Data from tables is internally mapped into charts using aggregate functions across tables. The tool thus integrates querying and charting into a single general-purpose application. ChartVisio has been designed as a component of the Benchmark data engine, being developed at the Computer Science department, University of New Orleans. The data engine is an intelligent website generator and users who create websites using the Data Engine are the site owners. Using ChartVisio, owners may generate new charts and save them as XML templates for prospective website surfers. Everyday Internet users may view saved charts with the touch of a button and get real-time data, since charts are generated dynamically. Website surfers may also generate new charts, but may not save them as templates. As a result, even non-technical users can design and generate charts with minimal time and effort. visual data mining data visualization
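Entry 192 says that table data is mapped into charts through aggregate functions. As a rough illustration of that idea only, the sketch below groups rows by a category column and aggregates a value column into (label, value) pairs that a charting layer could plot. The function name `aggregate_series`, the row layout, and the single-table scope are assumptions; ChartVisio's own implementation is not described in the abstract.

```python
from collections import defaultdict

# Aggregate functions a chart builder might offer (an assumed selection).
AGGREGATORS = {
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def aggregate_series(rows, category_key, value_key, func="sum"):
    """Group rows by category_key and aggregate value_key per group.

    Returns a list of (label, aggregated_value) pairs sorted by label,
    which is the shape a bar or pie chart typically needs.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    aggregate = AGGREGATORS[func]
    return [(label, aggregate(values)) for label, values in sorted(groups.items())]


if __name__ == "__main__":
    sales = [
        {"region": "east", "sales": 10},
        {"region": "west", "sales": 5},
        {"region": "east", "sales": 20},
    ]
    print(aggregate_series(sales, "region", "sales", "sum"))
```

Regenerating the series from the live rows each time a chart is requested is one way to get the "real-time data" behaviour the abstract describes.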
193	Educational Data Mining : En kvalitativ studie med inriktning på dataanalys för att hitta mönster i närvarostatistik / Educational Data Mining : A qualitative study focusing on data analysis to find patterns in presence statistics Borg, Olivia January 2019 The study focuses on finding patterns in the attendance statistics of students who are absent from school. The results can then be used as a basis for decision-making by schools or by other organizations interested in EDM applied to attendance statistics. The work took a qualitative approach, with a case study consisting of a literature study and an implementation. The literature study was used to gain an understanding of common approaches within EDM, which then formed the basis for the implementation, which followed the CRISP-DM process. The project resulted in five patterns defined through data analysis. The patterns show absence from a time perspective and per subject and can form the basis for future decision-making. Data Mining Educational Data Mining Patterns Information Systems
194	Interactive data mining and visualization on multi-dimensional data. January 1999 by Chu, Hong Ki. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 75-79). / Abstracts in English and Chinese. / Acknowledgments / Abstract / Chapter 1 --- Introduction / Chapter 1.1 --- Problem Definitions / Chapter 1.2 --- Experimental Setup / Chapter 1.3 --- Outline of the thesis / Chapter 2 --- Survey on Previous Researches / Chapter 2.1 --- Association rules / Chapter 2.2 --- Clustering / Chapter 2.3 --- Motivation / Chapter 3 --- IDAN on discovering quantitative association rules / Chapter 3.1 --- Briefing / Chapter 3.2 --- A-Tree / Chapter 3.3 --- Insertion Algorithm / Chapter 3.4 --- Visualizing Association Rules / Chapter 4 --- IDAN on discovering patterns of clustering / Chapter 4.1 --- Briefing / Chapter 4.2 --- A-Tree / Chapter 4.3 --- Dimensionality Curse / Chapter 4.3.1 --- Discrete Fourier Transform / Chapter 4.3.2 --- Discrete Wavelet Transform / Chapter 4.3.3 --- Singular Value Decomposition / Chapter 4.4 --- IDAN - Algorithm / Chapter 4.5 --- Visualizing clustering patterns / Chapter 4.6 --- Comparison / Chapter 5 --- Performance Studies / Chapter 5.1 --- Association Rules / Chapter 5.2 --- Clustering / Chapter 6 --- Survey on data visualization techniques / Chapter 6.1 --- Geometric Projection Techniques / Chapter 6.1.1 --- Scatter-plot Matrix / Chapter 6.1.2 --- Parallel Coordinates / Chapter 6.2 --- Icon-based Techniques / Chapter 6.2.1 --- Chernoff Face / Chapter 6.2.2 --- Stick Figures / Chapter 6.3 --- Pixel-oriented Techniques / Chapter 6.4 --- Hierarchical Techniques / Chapter 7 --- Conclusion / Bibliography / Data mining Visualization--Data processing
195	Bioinformatic mining and analysis of genetic elements in genomes. / CUHK electronic theses & dissertations collection January 2013 In the post-genomic era, it is a huge challenge to detect the functional elements in the "ocean" of data and provide meaningful biological inferences. Here, many interesting functional elements have been characterized and analyzed among targeted genomes. / First, by compiling more than 70,000 experimentally determined post-translational modification (PTM) events from 7 eukaryotic organisms, the features and functions of proteins regulated by multiple types of PTMs (Mtp-Proteins) are detected and analyzed in comparison with proteins harboring no known PTM target site. (1) The Mtp-Proteins are found to be significantly enriched in protein complexes, to have more protein partners, and to act preferentially as hubs in the protein-protein interaction (PPI) network. (2) Mtp-Proteins also possess a distinct functional focus and biased subcellular locations. 
(3) Overall, about 80% of the analyzed PTM events are embedded in intrinsically disordered regions (IDRs), and most Mtp-Proteins have more IDRs than proteins without PTM sites. This suggests that IDRs may largely explain why some proteins can harbor so many extraordinary functions. (4) Interestingly, some particular Mtp-Proteins whose PTMs are biased toward ordered regions are mainly related to "protein-DNA complex assembly". (5) We further evaluated the energetic effects of PTMs on the stability of PPIs and found that only a small fraction of single PTM events influence the binding energy by more than 2 kcal/mol, but combinations of PTM types, e.g. phosphorylation together with acetylation, can change the binding energy dramatically. / In the second part, the components of the ubiquitination system (ubiquitin, E1, E2, E3 and the substrates of E3) are identified and analyzed comparatively across 74 fungal genomes. The main results are: (1) The ubiquitin number is significantly higher in the mushroom-forming genomes than in other Basidiomycota genomes. (2) The number of E1, with an average of 2.92, is consistent among most genomes. However, the number of E2 differs between mushroom-forming genomes and other Basidiomycota genomes. (3) For the E3 candidates, the numbers of Paracaspase and F-box domains in the mushroom-forming genomes are significantly higher than in the other Basidiomycota genomes. These results suggest that the ubiquitination system may play a vital role in the divergence of fungal morphogenesis, especially in mushroom formation. / Then, the focus shifts to genomic islands (GIs). Compared to the whole genome, transcription initiation positions are found, for the first time, to be highly enriched in GI regions. Based on this heterogeneous transcriptional regulatory signal, a novel procedure for genomic island detection, GIST (Genome-island Identification by Signals of Transcription), is designed. 
Interestingly, our method demonstrates higher sensitivity in detecting genomic islands harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property and shorter gene length. Finally, using GIST, many interesting GIs are detected and analyzed in the German outbreak strain TY-2482 for the first time. / In summary, this work not only considerably expands our understanding of several functional genetic elements, such as genomic islands and proteins regulated by combinations of multiple PTMs, but also provides important tools and clues, such as GIST and the potential E3 expansion in mushroom-forming fungi, for further related studies. / Detailed summary in vernacular field only. / Huang, Qianli. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 161-186). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese. 
/ Abstract / Abstract in Chinese / Abbreviations / Acknowledgements / Declaration / Table of Contents / List of Figures / List of Tables / Chapter 1 --- Literature Review / Chapter 1.1 --- General introduction / Chapter 1.2 --- Post-translational modification / Chapter 1.2.1 --- Combinational multiple types of post-translational modification / Chapter 1.3 --- Genomic islands / Chapter 1.3.1 --- Brief introduction / Chapter 1.3.2 --- Bioinformatic tools and database for identification of Genomic islands / Chapter 1.4 --- Objectives and significance / Chapter 2 --- Systematic analysis on features and functions of proteins regulated by combinational multiple types of post-translational modifications / Chapter 2.1 --- Introduction / Chapter 2.2 --- Materials and Methods / Chapter 2.2.1 --- Annotation of PTM pattern and analyses on target residues / Chapter 2.2.2 --- Classification of Human Proteins / Chapter 2.2.3 --- Dataset of human protein-protein interactions (PPIs) and Construction of PPI network / Chapter 2.2.4 --- Calculation of Binding Energy / Chapter 2.2.5 --- Functional characterization and subcellular localization analysis / Chapter 2.2.6 --- Annotating IDR regions / Chapter 2.2.7 --- Statistical analyses / Chapter 2.3 --- Results / Chapter 2.3.1 --- Combinational interactions of multiple PTM types are undergoing evolutionary selection / Chapter 2.3.2 --- Evolutionary profile of modified amino acid residues / Chapter 2.3.3 --- Mtp-Proteins are enriched in the protein complex / Chapter 2.3.4 --- Multiple PTMs enable target protein function as hub or super-hub in PPI network / Chapter 2.3.5 --- Energetic effect of PTMs on the Stability of protein-protein binding / Chapter 2.3.6 --- Mtp-Proteins demonstrate distinct function focus / Chapter 2.3.7 --- Mtp-Proteins: located preferedly in Cytoplasm and Nucleus / Chapter 2.3.8 --- Why Mtp-Proteins possess so many special features : importance of IDR / Chapter 2.4 --- Discussion / Chapter 2.4.1 --- The hints from the features of Mtp-Proteins / Chapter 2.4.2 --- The implication of combinational interaction between two different functional PTM categories: biased locating in IDRs and ordered regions respectively / Chapter 3 --- Genome-wide comparative analyses of ubiquitome among basidiomycota and other typical fungi genomes / Chapter 3.1 --- Introduction / Chapter 3.2 --- Materials and Methods / Chapter 3.2.1 --- Genome sequences and annotation acquirement. / Chapter 3.2.2 --- Bioinformatic prediction of components in ubiquitome / Chapter 3.3 --- Results / Chapter 3.3.1 --- Identification of ubiquitin candidates among 74 fungi genomes / Chapter 3.3.2 --- Detection of potential E1 and E2 among all considered genomes / Chapter 3.3.3 --- Prediction and comparative analysis of different types of E3 / Chapter 3.3.4 --- The possible substrates of E3 / Chapter 3.4 --- Discussion / Chapter 4 --- Genomic islands Identification by Signals of Transcription / Chapter 4.1 --- Introduction / Chapter 4.2 --- Materials and Methods / Chapter 4.2.1 --- Genome sequence and annotation data / Chapter 4.2.2 --- Transcription start points (TSPs) scanning / Chapter 4.2.3 --- Genomic island dataset construction / Chapter 4.2.4 --- GIST: Genomic-island Identification by Signal of Transcription / Chapter 4.2.5 --- Functional characterization and subcellular localization analysis / Chapter 4.2.6 --- Codon usage, GC content and gene length / Chapter 4.2.7 --- Statistical analyses / Chapter 4.3 --- Results / Chapter 4.3.1 --- High-density transcriptional initiation signals associated with GIs / Chapter 4.3.2 --- Predict the potential novel GIs through GIST: Genomic-island Identification by Signal of Transcription / Chapter 4.3.3 --- Comparative Analysis: Distribution of gene function categories / Chapter 4.3.4 --- Comparative Analysis: Divergence of subcellular locations / Chapter 4.3.5 --- Comparative Analysis: GC property and gene length / Chapter 4.3.6 --- Hints of "non-optimal" codon usage bias / Chapter 4.3.7 --- Application of GIST to analyze GIs in the German E. coli O104:H4 outbreak strain / Chapter 4.4 --- Discussion / Chapter 5 --- Concluding remarks / References / Bioinformatics--Data processing Data mining
196	Practical and theoretical applications of the Regularity Lemma Song, Fei 22 April 2013 The Regularity Lemma of Szemerédi is a fundamental tool in extremal graph theory with a wide range of applications in theoretical computer science. Partly in recognition of his work on the Regularity Lemma, Endre Szemerédi won the Abel Prize in 2012. In this thesis we present both practical and theoretical applications of the Regularity Lemma. The practical applications concern the important problem of data clustering; the theoretical applications concern the monochromatic vertex partition problem. In spite of its numerous applications in establishing theoretical results, the Regularity Lemma has a drawback: it requires the graphs under consideration to be astronomically large, which limits its practical utility. As stated by Gowers, it has been "well beyond the realms of any practical applications"; the existing applications have been theoretical and mathematical. In the first part of the thesis, we propose to change this by modifying the constructive versions of the Regularity Lemma. While this affects the generality of the result, it also makes it more useful for much smaller graphs. We call this result the practical regularity partitioning algorithm and the resulting clustering technique Regularity Clustering. This is the first integrated attempt to make the Regularity Lemma applicable in practice. We present results on applying regularity clustering to a number of benchmark datasets and compare the results with k-means clustering and spectral clustering. Finally, we demonstrate its application in Educational Data Mining to improve student performance prediction. In the second part of the thesis, we study the monochromatic vertex partition problem. 
To begin we briefly review some related topics and several proof techniques that are central to our results, including the greedy and absorbing procedures. We also review some of the current best results before presenting ours, where the Regularity Lemma has played a critical role. Before concluding we discuss some future research directions that appear particularly promising based on our work. data mining regularity lemma combinatorics
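For readers unfamiliar with the notion behind entry 196, the sketch below illustrates the standard textbook definitions that the Regularity Lemma is built on: the edge density d(A, B) = e(A, B) / (|A||B|) between two disjoint vertex sets, and an epsilon-regular pair, in which every pair of sufficiently large subsets has density within epsilon of d(A, B). It is not the thesis's practical partitioning algorithm, which the abstract does not specify; the function names are illustrative, and the check is brute force, so it is only usable on very small examples.

```python
from itertools import combinations
from math import ceil


def density(edges, A, B):
    """Edge density d(A, B) = e(A, B) / (|A| * |B|) for disjoint vertex sets.

    `edges` is a set of (u, v) tuples for an undirected graph.
    """
    count = sum(1 for a in A for b in B if (a, b) in edges or (b, a) in edges)
    return count / (len(A) * len(B))


def is_regular_pair(edges, A, B, eps):
    """Brute-force check that (A, B) is eps-regular.

    Every X in A with |X| >= eps|A| and Y in B with |Y| >= eps|B| must
    satisfy |d(X, Y) - d(A, B)| <= eps. Exponential time: toy inputs only.
    """
    d = density(edges, A, B)
    min_x = max(1, ceil(eps * len(A)))
    min_y = max(1, ceil(eps * len(B)))
    for size_x in range(min_x, len(A) + 1):
        for X in combinations(A, size_x):
            for size_y in range(min_y, len(B) + 1):
                for Y in combinations(B, size_y):
                    if abs(density(edges, X, Y) - d) > eps:
                        return False
    return True


if __name__ == "__main__":
    complete = {(0, 2), (0, 3), (1, 2), (1, 3)}
    print(density(complete, [0, 1], [2, 3]), is_regular_pair(complete, [0, 1], [2, 3], 0.5))
```

The exponential cost of this naive check hints at why the abstract stresses modified, constructive versions of the lemma for practical use.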
197	Visually Mining Interesting Patterns in Multivariate Datasets Guo, Zhenyu 06 January 2013 Data mining for patterns and knowledge discovery in multivariate datasets are very important processes and tasks to help analysts understand the dataset, describe the dataset, and predict unknown data values. However, conventional computer-supported data mining approaches often limit the user from getting involved in the mining process and performing interactions during the pattern discovery. Besides, without a visual representation of the extracted knowledge, analysts can have difficulty explaining and understanding the patterns. Therefore, instead of directly applying automatic data mining techniques, it is necessary to develop appropriate techniques and visualization systems that allow users to interactively perform knowledge discovery, visually examine the patterns, adjust the parameters, and discover more interesting patterns based on their requirements. In this dissertation, I will discuss different proposed visualization systems to assist analysts in mining patterns and discovering knowledge in multivariate datasets, including their design, implementation, and evaluation. Three different types of patterns are proposed and discussed: trends, clusters of subgroups, and local patterns. For trend discovery, the parameter space is visualized to allow the user to visually examine the space and find where good linear patterns exist. For cluster discovery, the user is able to interactively set the query range on a target attribute and retrieve all the sub-regions that satisfy the user's requirements. The sub-regions that satisfy the same query and are near each other are grouped and aggregated to form clusters. For local pattern discovery, the patterns for the local sub-region with a focal point and its neighbors are computationally extracted and visually represented. 
To discover interesting local neighbors, the extracted local patterns are integrated and visually shown to the analysts. Evaluations of the three visualization systems using formal user studies are also performed and discussed. visual data mining multivariate visualization
198	Learning the Effectiveness of Content and Methodology in an Intelligent Tutoring System Dailey, Matthew D 03 May 2011 Classroom instruction time is a valuable yet scarce resource to teachers, who must decide how to best meet their objectives by selecting which topics to spend time on and when to move forward. Intelligent Tutoring Systems (ITS) are a powerful tool for teachers in this regard, allowing them to measure their students' current level of knowledge, helping them gauge student knowledge acquisition, and providing them with valuable insight into learning methodologies. By using ITS to identify the effectiveness of proven methods of instruction, we can more effectively teach students both in and outside of the classroom. In this paper we review the results and contributions of a new Bayesian data mining method which can be used to identify what works in an ITS and how it can be used to learn from data which is not in the typical randomized controlled trial design. We then discuss modifications to this dataset which use more knowledge about the students to improve accuracy. Lastly we evaluate this model on detecting and predicting long term student retention, and discuss methods to improve its predictive accuracy. educational data mining student modeling
199	A study of frequent pattern and association rule mining: with applications in inventory update and marketing. January 2004 Wong, Chi-Wing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 149-153). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- MPIS / Chapter 1.2 --- ISM / Chapter 1.3 --- MPIS and ISM / Chapter 1.4 --- Thesis Organization / Chapter 2 --- MPIS / Chapter 2.1 --- Introduction / Chapter 2.2 --- Related Work / Chapter 2.2.1 --- Item Selection Related Work / Chapter 2.3 --- Problem Definition / Chapter 2.3.1 --- NP-hardness / Chapter 2.4 --- Cross Selling Effect by Association Rules / Chapter 2.5 --- Quadratic Programming Method / Chapter 2.6 --- Algorithm MPIS_Alg / Chapter 2.6.1 --- Overall Framework / Chapter 2.6.2 --- Enhancement Step / Chapter 2.6.3 --- Implementation Details / Chapter 2.7 --- Genetic Algorithm / Chapter 2.7.1 --- Crossover / Chapter 2.7.2 --- Mutation / Chapter 2.8 --- Performance Analysis / Chapter 2.8.1 --- Preparation Phase / Chapter 2.8.2 --- Main Phase / Chapter 2.9 --- Experimental Result / Chapter 2.9.1 --- Tools for Quadratic Programming / Chapter 2.9.2 --- Partition Matrix Technique / Chapter 2.9.3 --- Data Sets / Chapter 2.9.4 --- Empirical Study for GA / Chapter 2.9.5 --- Experimental Results / Chapter 2.9.6 --- Scalability / Chapter 2.10 --- Conclusion / Chapter 3 --- ISM / Chapter 3.1 --- Introduction / Chapter 3.2 --- Related Work / Chapter 3.2.1 --- Network Model / Chapter 3.3 --- Problem Definition / Chapter 3.4 --- Association Based Cross-Selling Effect / Chapter 3.5 --- Quadratic Programming / Chapter 3.5.1 --- Quadratic Form / Chapter 3.5.2 --- Algorithm / Chapter 3.5.3 --- Example / Chapter 3.6 --- Hill-Climbing Approach / Chapter 3.6.1 --- Efficient Calculation of Formula of Profit Gain / Chapter 3.6.2 --- FP-tree Implementation / Chapter 3.7 --- Empirical Study / Chapter 3.7.1 --- Data Set / Chapter 3.7.2 --- Experimental Results / Chapter 3.8 --- Conclusion / Chapter 4 --- Conclusion / Bibliography / Data mining Selling--Data processing
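Entry 199 lists only chapter titles, but several of them rest on association rules. As background only, the sketch below computes the two standard measures of a rule X -> Y over a list of transactions: support (the fraction of transactions containing all items of X and Y) and confidence (support of X and Y together divided by support of X). The function names and the toy basket data are illustrative assumptions and are not taken from the thesis.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for transaction in transactions if items <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent) / support(antecedent);
    returns 0.0 when the antecedent never occurs.
    """
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "milk"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

A high-confidence rule such as bread -> milk is the kind of signal a cross-selling analysis could weigh when deciding which items to keep in stock.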
200	Computational models for contrastive opinion mining and aspect extraction Ibeke, Emmanuel Ebuka January 2018 With the growing popularity and availability of opinion-rich resources such as social media platforms and networks, new opportunities arise as people can now share their opinions and also seek or understand the opinions of others about a specific topic or event. This growth has fuelled interest in opinion mining, which seeks to understand opinions, attitudes, judgements and evaluations with respect to an entity or its aspects. The proliferation of reviews, ratings and online expressions has turned into a valuable asset to businesses seeking to manage their reputation, market their products, or identify new opportunities through opinion analysis. On the side of consumers, opinion mining serves as an information source that can support decision making. In this research, we focus on some fundamental challenges in opinion mining and make three contributions. First, we develop a curated corpus for training and evaluating opinion mining models. This corpus annotates sentiment and topic information at both sentence and review levels. It also captures the sentiment and topic time-variance information of the reviews. We demonstrate through experiments that this dataset supports opinion mining tasks such as contrastive opinion mining, and joint sentence and document level sentiment and topic analysis. As the corpus has a time-variance characteristic, it could also support studies in sentiment/topic dynamic analysis. Second, we propose a model for mining contrastive opinion from textual data (contraLDA). Unlike existing models that require input data to be separated into different collections beforehand, contraLDA models contrastive opinion from both single and multiple text collections. The model can also be flexibly trained in weakly-supervised and fully-supervised settings. 
In addition, the contraLDA model not only mines contrastive opinion but also quantifies the strength of opinion contrastiveness towards the topic of interest. The contraLDA model extracts relevant sentences related to the topics, making sentiment-bearing topics more interpretable. Third, we present an aspect extraction method which integrates a Natural Language Processing (NLP) algorithm and word embedding model to identify implicit and explicit aspect expressions from texts. Unlike existing systems, the proposed approach also maps aspect expressions to their corresponding aspect categories. This process allows easy identification of sentences about different aspects of a product. We demonstrate that this unsupervised approach is comparable to state-of-the-art models. 004 Public opinion ; Data mining
