  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

APPLICATIONS OF DATA MINING IN HEALTHCARE

Bo Peng (6618929) 10 June 2019 (has links)
With increases in the quantity and quality of healthcare-related data, data mining tools have the potential to improve people's standard of living through personalized and predictive medicine. In this thesis we improve the state of the art in data mining for several problems in the healthcare domain. In problems such as drug-drug interaction prediction and Alzheimer's Disease (AD) biomarker discovery and prioritization, current methods either require tedious feature engineering or deliver unsatisfactory performance. New, effective computational tools are needed to tackle these complex problems. In this dissertation, we develop new algorithms for two healthcare problems: high-order drug-drug interaction prediction and amyloid imaging biomarker prioritization in Alzheimer's Disease. Drug-drug interactions (DDIs) and their associated adverse drug reactions (ADRs) represent a significant detriment to public health. Existing research on DDIs primarily focuses on pairwise DDI detection and prediction; effective computational methods for high-order DDI prediction are still needed. In this dissertation, I present a deep-learning-based model, D3I, for cardinality-invariant and order-invariant high-order DDI prediction. The proposed model achieves an F1 score of 0.740 and an AUC of 0.847 on high-order DDI prediction, and outperforms classical methods on order-2 DDI prediction. These results demonstrate the strong potential of D3I and deep-learning-based models in tackling the prediction of high-order DDIs and their induced ADRs. The second problem I consider in this thesis is amyloid imaging biomarker discovery, for which I propose an innovative machine learning paradigm enabling precision medicine in this domain. The paradigm tailors the imaging biomarker discovery process to the individual characteristics of a given patient. I implement this paradigm using a newly developed learning-to-rank method, PLTR.
The PLTR model seamlessly integrates two objectives for joint optimization: pushing up relevant biomarkers and ranking among relevant biomarkers. The empirical study of PLTR, conducted on ADNI data, yields promising results in identifying and prioritizing individual-specific amyloid imaging biomarkers based on the individual's structural MRI data. The resulting top-ranked imaging biomarkers have the potential to aid personalized diagnosis and disease subtyping.
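The abstract does not detail D3I's architecture, but the "cardinality-invariant and order-invariant" property it names can be illustrated with a minimal Deep-Sets-style sketch: embed each drug, pool with a symmetric function, and score the pooled vector. The embedding table, pooling choice, and linear read-out below are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: each drug id maps to a vector (random here; learned in practice).
EMBED_DIM = 8
drug_embeddings = rng.normal(size=(100, EMBED_DIM))

def set_representation(drug_ids):
    """Sum-pool per-drug embeddings: the result is the same for any ordering
    of the set (order invariance) and is defined for any set size
    (cardinality invariance)."""
    vecs = drug_embeddings[list(drug_ids)]
    return np.tanh(vecs).sum(axis=0)          # element map = tanh, pool = sum

def interaction_score(drug_ids, w, b):
    """Score a drug set with a linear read-out on the pooled representation."""
    return float(w @ set_representation(drug_ids) + b)

w, b = rng.normal(size=EMBED_DIM), 0.0
s1 = interaction_score([3, 17, 42], w, b)
s2 = interaction_score([42, 3, 17], w, b)     # same set, different order
```

Because the pooling is symmetric, `s1` and `s2` agree up to floating-point rounding regardless of how the drugs are listed.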
2

Intelligent computational solutions for constitutive modelling of materials in finite element analysis

Faramarzi, Asaad January 2011 (has links)
Over the past decades simulation techniques, and in particular the finite element method, have been used successfully to predict the response of systems across a whole range of industries including aerospace, automotive, chemical processing, geotechnical engineering and many others. In these numerical analyses, the behaviour of the actual material is approximated with that of an idealised material that deforms in accordance with some constitutive relationships. Therefore, the choice of an appropriate constitutive model that adequately describes the behaviour of the material plays an important role in the accuracy and reliability of the numerical predictions. During the past decades several constitutive models have been developed for various materials. In recent years, driven by rapid and effective developments in computational software and hardware, alternative computer-aided pattern recognition techniques have been introduced to the constitutive modelling of materials. The main idea behind pattern recognition systems such as neural networks, fuzzy logic or genetic programming is that they learn adaptively from experience and extract various discriminants, each appropriate for its purpose. In this thesis a novel approach is presented and employed to develop constitutive models for materials in general, and soils in particular, based on evolutionary polynomial regression (EPR). EPR is a hybrid data mining technique that searches for symbolic structures (representing the behaviour of a system) using a genetic algorithm and estimates the constant values by the least squares method. Stress-strain data from experiments are employed to train and develop EPR-based material models. The developed models are compared with some of the existing conventional constitutive material models and their advantages are highlighted. It is also shown that the developed EPR-based material models can be incorporated in finite element (FE) analysis.
Different examples are used to verify the developed EPR-based FE model. The results of the EPR-FEM are compared with those of a standard FEM where conventional constitutive models are used to model the material behaviour. These results show that EPR-FEM can be successfully employed to analyse different structural and geotechnical engineering problems.
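The two-stage idea behind EPR described above (search over symbolic structures, then fit constants by least squares) can be sketched compactly. The genetic algorithm is replaced here by an exhaustive search over a tiny exponent grid, and the data are synthetic; both are simplifying assumptions for illustration only.

```python
import itertools
import numpy as np

def fit_structure(X, y, exponents):
    """Given a candidate symbolic structure (a tuple of exponent vectors,
    one per polynomial term), build the term matrix and estimate the
    constants by least squares, as EPR does for each candidate."""
    terms = np.column_stack([np.prod(X ** np.array(e), axis=1) for e in exponents])
    coeffs, *_ = np.linalg.lstsq(terms, y, rcond=None)
    residual = y - terms @ coeffs
    return coeffs, float(np.sqrt(np.mean(residual ** 2)))

# Toy stress-strain-like data generated from y = 2*x0 + 0.5*x0*x1.
rng = np.random.default_rng(1)
X = rng.uniform(0.1, 1.0, size=(50, 2))
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1]

# Exhaustive search over two-term structures with exponents in {0, 1, 2}
# stands in for the genetic algorithm of real EPR.
best = min(
    (fit_structure(X, y, pair) + (pair,)
     for pair in itertools.combinations(itertools.product(range(3), repeat=2), 2)),
    key=lambda t: t[1],
)
coeffs, rmse, structure = best
```

The search recovers the generating structure (terms x0 and x0*x1) with a near-zero residual, illustrating how structure search and least-squares constant estimation divide the work.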
3

Detecting fraud in cellular telephone networks

Van Heerden, Johan H. 12 1900 (has links)
Thesis (MSc)--University of Stellenbosch, 2005. / ENGLISH ABSTRACT: Cellular network operators globally lose between 3% and 5% of their annual revenue to telecommunications fraud. Hence it is of great importance that fraud management systems are implemented to detect, alarm, and shut down fraud within minutes, minimising revenue loss. Modern proprietary fraud management systems employ (i) classification methods, most often artificial neural networks learning from classified call data records to classify new call data records as fraudulent or legitimate, (ii) statistical methods building subscriber behaviour profiles based on the subscriber’s usage in the cellular network and detecting sudden changes in behaviour, and (iii) rules and threshold values defined by fraud analysts, utilising their knowledge of valid fraud cases and the false alarm rate as guidance. The purpose of this thesis is to establish a context for and evaluate the performance of well-known data mining techniques that may be incorporated in the fraud detection process. Firstly, a theoretical background of various well-known data mining techniques is provided and a number of seminal articles on fraud detection, which influenced this thesis, are summarised. The cellular telecommunications industry is introduced, including a brief discussion of the types of fraud experienced by South African cellular network operators. Secondly, the data collection process and the characteristics of the collected data are discussed. Different data mining techniques are applied to the collected data, demonstrating how user behaviour profiles may be built and how fraud may be predicted. An appraisal of the performances and appropriateness of the different data mining techniques is given in the context of the fraud detection process.
Finally, an indication of further work is provided in the conclusion to this thesis, in the form of a number of recommendations for possible adaptations of the fraud detection methods and improvements thereof. A combination of data mining techniques that may be used to build a comprehensive fraud detection model is also suggested. (The record also carries an Afrikaans abstract with the same content.)
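The second class of methods the abstract describes, statistical subscriber behaviour profiles with sudden-change detection, can be illustrated with a minimal z-score test against a subscriber's own usage history. The threshold and the usage numbers are illustrative assumptions, not values from the thesis.

```python
from statistics import mean, stdev

def is_anomalous(history, todays_usage, z_threshold=3.0):
    """Flag a sudden departure from the subscriber's own usage profile:
    a z-score test of today's call minutes against the recorded history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return todays_usage != mu
    return abs(todays_usage - mu) / sigma > z_threshold

# A subscriber who normally talks about 30 minutes a day suddenly logs 400.
history = [28, 31, 30, 29, 33, 27, 32, 30]
```

A real system would maintain rolling profiles per subscriber and per feature (minutes, destinations, time of day) and combine several such tests, but the per-feature check reduces to this shape.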
4

Učení bez učitele / Unsupervised learning

Kantor, Jan January 2008 (has links)
The purpose of this work is to describe techniques commonly used in the cluster-analysis process of unsupervised learning. The thesis consists of two parts. The first part focuses on the theory of selected algorithms, describing the advantages and disadvantages of each discussed method, and on validation of cluster quality. There are many ways to estimate and compute clustering quality based on internal and external knowledge, which this part covers; a good technique for validating clustering quality is one of the most important parts of cluster analysis. The second part deals with applying different clustering techniques and programs to real datasets and comparing the results with the true dataset partitioning and with published related work.
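The external validation the abstract mentions, comparing a clustering against the true dataset partitioning, can be illustrated with the Rand index: the fraction of point pairs on which the two partitions agree. This is one standard external measure, chosen here for illustration; the thesis surveys several.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """External cluster validation: fraction of point pairs that the two
    partitions treat the same way (co-clustered in both, or split in both).
    Label names do not matter, only the grouping."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # identical partition, clusters merely relabelled
guess = [0, 0, 1, 1, 1, 1]     # one point moved to the wrong cluster
```

`rand_index(truth, perfect)` is 1.0 despite the swapped labels, which is exactly why pair-counting measures are preferred over raw label accuracy for clustering.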
5

Διάγνωση, πρόγνωση και υποστήριξη θεραπευτικής αγωγής κακοηθών λεμφωμάτων με χρήση τεχνητής νοημοσύνης / Diagnosis, prognosis and treatment support of malignant lymphomas using artificial intelligence

Δράκος, Ιωάννης 13 July 2010 (has links)
This dissertation aims to create an efficient model for biomedical data integration. Starting with an analytical approach to medical knowledge and the problems that arise from the way medical data are produced, it proceeds to solve the individual integration issues within a single medical domain, and concludes with a working framework for mass data integration originating from multiple sources and fields of knowledge. It continues with the design of a database model that follows a "horizontal" logic and is efficient enough to answer complex, large-scale queries in real time, even for real-life medical questions.
The proof of concept of the working framework and its goal of mass data integration is demonstrated through a medical information system. By taking advantage of the "horizontal" database design, the presented system can manage flow cytometry measurements originating from any available hardware and, by integrating the cytometric data with other types of haematological data, can answer both everyday and research medical questions. All original research results produced within the scope of this dissertation were published in international research journals and in international and Greek peer-reviewed conferences.
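The abstract does not specify the schema behind its "horizontal" design; one common way to realise such a design is an entity-attribute-value layout, sketched below with SQLite. The table shape, attribute names, and values are illustrative assumptions, not the dissertation's actual schema.

```python
import sqlite3

# Entity-attribute-value layout: every measurement, from any instrument or
# lab domain, lands in the same three columns, so adding a new test type
# (e.g. a new flow-cytometry marker) requires no schema change.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (patient_id TEXT, attribute TEXT, value REAL)")
con.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [
        ("p1", "CD19_pct", 12.5),   # flow-cytometry marker (hypothetical values)
        ("p1", "WBC", 6.8),         # routine haematology
        ("p2", "CD19_pct", 44.0),
        ("p2", "WBC", 15.2),
    ],
)

def patients_where(attr, predicate):
    """Cross-domain query: patients whose stored value satisfies a predicate."""
    rows = con.execute("SELECT patient_id, value FROM obs WHERE attribute = ?", (attr,))
    return sorted(pid for pid, v in rows if predicate(v))

flagged = patients_where("CD19_pct", lambda v: v > 30)
```

The trade-off of this layout is that wide per-patient views require pivoting, which is why the dissertation's emphasis on real-time response to complex queries is a genuine engineering problem rather than a given.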
6

Návrh a implementace Data Mining modelu v technologii MS SQL Server / Design and implementation of Data Mining model with MS SQL Server technology

Peroutka, Lukáš January 2012 (has links)
This thesis focuses on the design and implementation of a data mining solution with real-world data. The task is analysed and processed, and its results evaluated. The mined data set contains study records of students from the University of Economics, Prague (VŠE) over the past three years. The first part of the thesis covers the theory of data mining: the definition of the term and the history and development of the field. Current best practices and methodology are described, as well as methods for determining data quality and for pre-processing data ahead of the actual mining task. The most common data mining techniques are introduced, including their basic concepts, advantages and disadvantages. This theoretical basis is then used to implement a concrete data mining solution with educational data. The source data set is described and analysed, and some of the data are chosen as input for the created models. The solution is based on the MS SQL Server data mining platform, and its goal is to find, describe and analyse potential associations and dependencies in the data. The results of the respective models are evaluated, including their potential added value. Possible extensions and suggestions for further development of the solution are also mentioned.
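The associations and dependencies the abstract sets out to find are conventionally measured by the support and confidence of association rules. The sketch below computes both for a toy rule over hypothetical student records; the course names and the rule itself are invented for illustration, not taken from the VŠE data.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent over a
    list of item sets (e.g. the set of courses passed per student record)."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

# Hypothetical toy records: sets of passed courses per student.
records = [
    {"math", "stats", "datamining"},
    {"math", "stats"},
    {"math", "datamining"},
    {"stats", "datamining"},
    {"math", "stats", "datamining"},
]
sup, conf = rule_stats(records, {"math", "stats"}, {"datamining"})
```

Here the rule {math, stats} -> {datamining} holds in 2 of 5 records (support 0.4) and in 2 of the 3 records matching the antecedent (confidence 2/3); mining platforms such as the one used in the thesis enumerate and rank many such rules automatically.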
7

Text mining for social harm and criminal justice application

Ritika Pandey (9147281) 30 July 2020 (has links)
Increasing rates of social harm events and the plethora of available text data demand text mining techniques, both to better understand the causes of social harm and to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction, and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime than the official UCR categorization, with important implications for hotspot policing. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data: we propose a prediction model that classifies users' transitions from a casual drug discussion forum to a recovery drug discussion forum and estimates the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating the opioid crisis. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of the key features that determine whether a homicide is ultimately solved. For this purpose we perform named entity recognition to identify witnesses, detectives, and suspects in each chronology, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare performance over several choices of methodology for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
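The final linking step the abstract describes, turning recognized entities and evidence into a knowledge graph, can be sketched as a co-mention graph: entities appearing in the same chronology entry get an edge whose weight counts their co-mentions. The entity tags and entries below are hypothetical stand-ins for NER output, not data from the thesis.

```python
from collections import defaultdict

def build_graph(mentions):
    """Minimal knowledge-graph sketch: connect every pair of entities that
    co-occur in the same chronology entry; edge weight = co-mention count."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in mentions:
        uniq = sorted(set(entities))
        for i, a in enumerate(uniq):
            for b in uniq[i + 1:]:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph

# Hypothetical chronology entries, as if already run through NER and
# keyword expansion (DET = detective, WIT = witness, SUS = suspect, EV = evidence).
entries = [
    ["DET:Smith", "WIT:Jones", "EV:shell_casing"],
    ["DET:Smith", "WIT:Jones"],
    ["DET:Smith", "SUS:Doe", "EV:shell_casing"],
]
g = build_graph(entries)
degree = {node: len(nbrs) for node, nbrs in g.items()}
```

Network statistics such as the degree computed here are the kind of graph features the thesis relates to homicide solvability.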
8

Transparent Forecasting Strategies in Database Management Systems

Fischer, Ulrike, Lehner, Wolfgang 02 February 2023 (has links)
Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real time from vast numbers of sources. At the same time, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time: past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies for integrating time series forecasting inside a database and discuss individual techniques from the database community. We conclude by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
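The core operation the article builds on, estimating future values of a time series to answer queries about not-yet-available data, can be illustrated with one of the simplest forecasting models, simple exponential smoothing. The model choice and smoothing factor are illustrative; the article surveys a range of forecasting techniques.

```python
def ses_forecast(series, horizon, alpha=0.3):
    """Simple exponential smoothing: fold the observed history into a single
    smoothed level, then extend that level as a flat forecast for the
    requested number of future (missing) periods."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level   # blend new value into level
    return [level] * horizon

# A query asking for the next three periods of an incomplete series would be
# answered transparently from the model rather than from stored rows.
history = [10.0, 12.0, 11.0, 13.0, 12.0]
future = ses_forecast(history, horizon=3)
```

A forecast-enabled database would maintain such models inside the engine and keep them current as new tuples arrive, so that a query spanning past and future timestamps sees one seamless view.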
