141

Análise de agrupamentos baseada na topologia dos dados e em mapas auto-organizáveis. / Data clustering based on data topology and self-organizing maps.

Boscarioli, Clodis 16 May 2008
In the context of large-scale decision making, the analysis of massively stored data is increasingly a necessity in the most varied fields of knowledge. Data analysis involves different tasks, which can be carried out by different techniques and strategies, such as data clustering. This research focuses on the task of data clustering using Self-Organizing Maps (SOM) as the main artifact. SOM is an artificial neural network based on competitive, unsupervised learning, which means that training is driven entirely by the data and that the neurons of the map compete with one another. This network has the ability to form mappings that quantize the data while preserving their topology. This work introduces a new SOM-based clustering methodology that considers both the topological map generated by the SOM and the topology of the data during the clustering process. An experimental and comparative analysis is presented, demonstrating the potential of the proposal and highlighting the main contributions of the work.
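As a rough illustration of the kind of SOM pipeline the abstract describes (not the thesis's own method), the sketch below trains a small map with the third-party MiniSom library and reads off each sample's best-matching unit; the dataset, map size, and training parameters are all illustrative assumptions.

```python
# Minimal SOM clustering sketch using MiniSom and scikit-learn.
import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_iris().data)

# 8x8 map, one prototype (codebook vector) per neuron.
som = MiniSom(8, 8, X.shape[1], sigma=1.5, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, 5000)  # competitive, unsupervised training

# Each sample is assigned to its best-matching unit (BMU); clusters can
# then be read off by grouping BMUs, preserving the data topology.
bmus = np.array([som.winner(x) for x in X])
print(bmus[:5])
```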
142

O fenômeno blockchain na perspectiva da estratégia tecnológica: uma análise de conteúdo por meio da descoberta de conhecimento em texto / The blockchain phenomenon from a technology strategy perspective: a content analysis through knowledge discovery in text

Fernandes, Marcelo Vighi 27 August 2018
The Information and Communication Technologies (ICT) revolution made companies realize the importance of technology strategy for their survival. Blockchain is a decentralized transaction and data management technology first developed for the bitcoin digital currency. Interest in blockchain technology has grown steadily since the term was coined, making the phenomenon one of the main topics of research and publication on the Web. The main objective of this work is to understand how the blockchain phenomenon is affecting technology strategy. To that end, an exploratory study was conducted using the Knowledge Discovery in Text (KDT) process, applying text mining tools to collect and analyze the content of a set of Web news stories about blockchain technology: 2,605 English-language news items published between 2015 and 2017. The study generated six propositions, showing that the phenomenon is affecting the technology strategy of the financial industry by directing the sector's focus toward solutions built on decentralized architectures, and that companies' strategic technological focus has driven the development of private blockchain technologies. The study also identified the benefits this technology brings to cross-border payment systems, reducing intermediaries and improving processes, and found that it has the potential to affect transactions through a common electronic platform. Regarding the technology's degree of maturity, the findings were discussed against the theory of the diffusion of innovation, leading to the conclusion that blockchain sits at the threshold between the Innovators and Early Adopters categories. The map produced by this research will help companies and professionals identify opportunities to direct their technology strategies toward blockchain technology.
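The KDT process the study applies can be sketched in miniature: vectorize a news corpus and surface latent themes. The example below is a hypothetical illustration using scikit-learn's TF-IDF vectorizer and NMF topic factorization, not the study's actual toolchain; the three-document corpus and component count are invented.

```python
# Toy knowledge-discovery-in-text pass: TF-IDF weighting plus topic
# factorization over a tiny fabricated blockchain-news corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

news = [
    "Banks test blockchain for cross-border payments",
    "Private blockchain pilots grow in the financial industry",
    "Bitcoin and decentralized ledgers attract investors",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(news)

nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)  # document-topic weights

terms = tfidf.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```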
143

Descoberta de conhecimento aplicado à base de dados textual de saúde / Knowledge discovery applied to a textual health database

Barbosa, Alexandre Nunes 26 March 2012
This study proposes a process for investigating the content of a database of descriptive and pre-structured health data, specifically from the area of Rheumatology. Three sets of interest were composed for the investigation. The first comprises one class of descriptive content related exclusively to Rheumatology in general and another whose content belongs to other areas of medicine. The second and third sets were constituted after statistical analysis of the database: one formed by the descriptive content associated with the five most frequent ICD codes, the other by the descriptive content associated with the three most frequent ICD codes related exclusively to Rheumatology. These sets were pre-processed with classic techniques such as stopword removal and stemming. To extract patterns whose interpretation results in knowledge, classification and association techniques were applied to the sets of interest, aiming to relate the textual content that describes disease symptoms to the pre-structured content that defines the diagnosis of those diseases. These techniques were realized by applying the Support Vector Machines classification algorithm and the Apriori algorithm for extracting association rules. To support the process, theoretical references on data mining were surveyed, together with scientific work on text mining and Electronic Medical Records, focusing on the content of the databases used, the pre-processing and mining techniques employed in the literature, and the reported results. The classification technique achieved accuracy above 80%, demonstrating the algorithm's capacity to correctly label health data related to the domain of interest. Associations between textual and pre-structured content were also discovered which, according to expert analysis, may raise questions about the use of certain ICD codes at the data's place of origin.
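A minimal sketch of the two algorithms named above, on invented data: a linear SVM over symptom text, plus a tiny hand-rolled frequent-pair pass in the spirit of Apriori (a real Apriori implementation would iterate over growing itemset sizes). The notes and ICD labels are hypothetical.

```python
# SVM over symptom text plus Apriori-style support/confidence counting.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

notes = ["joint pain and morning stiffness", "chest pain and dyspnea",
         "swollen joints with stiffness", "persistent cough and fever"]
icd = ["M06", "I20", "M06", "J18"]  # hypothetical ICD labels

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(notes, icd)
print(clf.predict(["morning stiffness in both hands"]))

# Transactions pair each note's terms with its diagnosis code.
transactions = [set(n.split()) | {c} for n, c in zip(notes, icd)]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

for a, b in combinations({"stiffness", "pain", "M06", "I20"}, 2):
    s = support({a, b})
    if s >= 0.25:  # minimum support threshold (illustrative)
        conf = s / support({a})
        print(f"{a} -> {b}  support={s:.2f} confidence={conf:.2f}")
```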
144

Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies

Kulhavý, Lukáš January 2010
This thesis focuses on data mining technology and its possible practical use in health insurance companies. It defines the term data mining and its relation to knowledge discovery in databases, describing, among other things, the methodologies that structure the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). It also surveys possible practical applications, technologies, and products available on the market, both free and commercial. An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks, and others) provides the theoretical basis on which practical applications over real data from real health insurance companies are built: applications seeking the causes of increased remittances and predicting customer churn. I solved these applications in the freely available systems Weka and LISP-Miner. The objective is to demonstrate data mining capabilities over this type of data, and the capabilities of the Weka and LISP-Miner systems, in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining, offering an insight into these technologies and their benefits for data mining: the possibilities of cloud computing are presented on the Amazon EC2 system, while grid computing can be used through the Weka Experimenter interface.
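The thesis runs its tasks in Weka and LISP-Miner; as a rough Python stand-in, the sketch below grows a decision tree (Weka's J48 is a C4.5 variant) and evaluates it with 10-fold cross-validation, echoing the CRISP-DM modelling and evaluation phases. The insurance-claims-like data is fabricated.

```python
# Decision tree induction with 10-fold cross-validation on toy data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # e.g. age, visits, cost, tenure
y = (X[:, 2] + 0.5 * X[:, 1] > 0).astype(int)  # hypothetical churn flag

tree = DecisionTreeClassifier(max_depth=3, criterion="entropy")
scores = cross_val_score(tree, X, y, cv=10)    # 10-fold, as Weka defaults to
print(f"mean accuracy: {scores.mean():.2f}")
```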
146

Análise de grandezas cinemáticas e dinâmicas inerentes à hemiparesia através da descoberta de conhecimento em bases de dados / Analysis of kinematic and dynamic data inherent to hemiparesis through knowledge discovery in databases

Moretti, Caio Benatti 31 March 2016
As a result of higher life expectancy worldwide, the probability of accidents and physical trauma in everyday life grows, bringing an increasing demand for rehabilitation. Physical therapy under the paradigm of robotic rehabilitation with serious games offers the patient greater motivation and engagement with the treatment; its use is recommended by the American Heart Association (AHA), which assigns it the highest rating (Level A) for inpatients and outpatients. However, the rich analytical potential of the data collected by the robotic devices involved is little explored, forgoing information that could be of great value to treatment. This work focuses on applying knowledge discovery techniques to classify the performance of patients diagnosed with chronic hemiparesis. The patients, inserted into a robotic rehabilitation environment, exercised with the InMotion ARM, a robotic device for upper-limb rehabilitation that also collects performance data. A knowledge discovery in databases roadmap was applied to the collected data, performing pre-processing, transformation (feature extraction), and then data mining with machine learning algorithms. The strategy culminated in a pattern classifier capable of distinguishing hemiparetic sides with 94% accuracy, with eight attributes feeding the input of the obtained mechanism. Interpreting this collection of attributes showed that force data are the most significant, comprising half of the composition of a sample.
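A minimal sketch of the classification step described, under fabricated signals: summary-statistic features are extracted from force and velocity series and a classifier is cross-validated to distinguish hemiparetic sides. The eight features merely mirror the abstract's eight attributes; the actual features, data, and classifier in the thesis may differ.

```python
# Feature extraction from movement signals, then cross-validated
# classification of hemiparetic side (all data fabricated).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, t = 120, 500
force = rng.normal(size=(n, t))      # fabricated force time series
speed = rng.normal(size=(n, t))      # fabricated velocity time series
side = rng.integers(0, 2, size=n)    # 0 = left, 1 = right (labels)

# Transformation phase of the KDD roadmap: summary statistics per signal.
X = np.column_stack([force.mean(1), force.std(1), force.max(1), force.min(1),
                     speed.mean(1), speed.std(1), speed.max(1), speed.min(1)])

print(cross_val_score(SVC(), X, side, cv=5).mean())
```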
147

Deep Learning Black Box Problem

Hussain, Jabbar January 2019
The application of neural networks in deep learning is growing rapidly due to their ability to outperform other machine learning algorithms on many kinds of problems. One big disadvantage of deep neural networks, however, is that the internal logic by which they reach a desired output is neither understandable nor explainable; this behavior is known as the “black box” problem. This leads to the first research question: how prevalent is the black box problem in the research literature during a specific period of time? Black box problems are usually addressed by so-called rule extraction, which motivates the second research question: what rule extraction methods have been proposed to solve such problems? To answer these questions, a systematic literature review was conducted, collecting printed and online articles published in high-ranking journals and conference proceedings on the topics of the black box and rule extraction; this set of articles formed the unit of analysis. The results show a gradually increasing interest in black box problems over time, mainly because of new technological developments. The thesis also provides an overview of the different methodological approaches proposed as rule extraction methods.
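One family of rule extraction methods covered by such surveys is pedagogical extraction, where an interpretable surrogate is fitted to the black-box network's predictions. The sketch below illustrates that idea with scikit-learn on a public dataset; it is a generic illustration, not a method from the thesis.

```python
# Pedagogical rule extraction: approximate a neural network with a
# shallow decision tree trained on the network's own predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
black_box = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
).fit(X, y)

# Fit the surrogate to the network's predictions, not the true labels:
# the tree then mimics the black box with readable if/then rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate,
                  feature_names=list(load_breast_cancer().feature_names)))
```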
148

Temporale Aspekte entdeckten Wissens / Temporal Aspects of Discovered Knowledge

Baron, Steffan 06 October 2004
Over the past years, the number and size of available datasets have grown significantly, making the development of techniques for discovering knowledge in this data a major challenge. Traditionally the emphasis has been on criteria such as performance and scalability; in recent years, however, the temporal dimension of the data has also become a focus of interest, and methods have been developed for maintaining the discovered knowledge. These approaches rest on the observation that data is often collected over a long period of time and is therefore subject to the same changes as the aspects of reality it captures; as the data changes, changes in the analysis results are to be expected. It is not enough to keep the results up to date: their development over time must also be captured. In this work, knowledge discovery is understood as a continuous process: data is collected over a potentially long period and analysed at specific intervals. Each analysis yields a set of patterns, which are recorded in a rule base and whose development is tracked. Starting from a temporal data model that captures both the content of a pattern and its statistical properties, a comprehensive framework for monitoring and analysing the evolution of discovered knowledge is developed, integrating the many facets of pattern evolution and supporting trend recognition. The framework makes it possible to detect and assess different types of pattern change according to qualitative, quantitative, and temporal criteria; it also permits the temporal properties of the discovered relationships to be used as a criterion for their relevance, and enables the causes of observed changes to be determined. The proposed concepts were examined thoroughly in two case studies.
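The monitoring idea can be sketched with toy data: mine the same patterns in successive time windows, record their statistics in a rule base, and flag notable change. The transactions, single-item patterns, and change threshold below are all illustrative assumptions, far simpler than the framework described.

```python
# Track pattern support across time windows and flag large changes.
from collections import defaultdict

windows = {                      # fabricated transactions per period
    "2004-Q1": [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}],
    "2004-Q2": [{"a", "b"}, {"b", "c"}, {"b"}, {"b", "c"}],
}

rule_base = defaultdict(list)    # pattern -> [(period, support), ...]
for period, txs in sorted(windows.items()):
    for item in {"a", "b", "c"}:
        support = sum(item in t for t in txs) / len(txs)
        rule_base[item].append((period, support))

for item, history in rule_base.items():
    (p0, s0), (p1, s1) = history[0], history[-1]
    if abs(s1 - s0) >= 0.25:     # crude change-detection threshold
        print(f"pattern {{{item}}}: support {s0:.2f} -> {s1:.2f}")
```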
149

企業網路下之資料發掘 / Data Mining in the Intranet Environment

金士俊, Chin, Shi-Chun Unknown Date
In recent years, with the rapid spread of Intranets in enterprises large and small, finding information on the Intranet has become an important but problematic issue. For the many enterprises that still have difficulty gauging the complexity of their own Intranets, identifying genuinely useful, latent information and knowledge among the huge number of documents on the Intranet is no easy task. Appropriate data mining approaches and techniques can only be worked out by examining each enterprise's Intranet architecture along deeper dimensions of analysis. This research first reviews the literature and current practice in data mining and Intranet applications, proposes reference dimensions for evaluating the complexity of Intranet systems, and defines simple and complex Intranet types. Adopting Han's (1995) concept hierarchy and multiple-layered database, together with the approach to Internet data mining proposed in Yang & Chin (2000), and paying particular attention to the issues of authorization and knowledge view, it then proposes frameworks for data content mining under simple and complex Intranets, designs three types of data mining approaches suitable for them, and develops a prototype implementing two of the approaches to verify feasibility. Finally, the practicality and potential uses of the data mining approaches and prototype are discussed, and directions for future research are suggested.
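Han-style concept-hierarchy generalization, which this research adopts as a basis, can be illustrated with a toy roll-up: raw document terms are replaced by higher-level concepts before mining. The hierarchy and documents below are invented.

```python
# Roll document terms up a concept hierarchy before mining them.
hierarchy = {           # leaf term -> parent concept
    "invoice": "finance", "budget": "finance",
    "resume": "hr", "payroll": "hr",
}

docs = [["invoice", "budget"], ["resume", "payroll", "invoice"]]

def roll_up(doc, level_map):
    """Replace each term with its higher-level concept, deduplicated."""
    return sorted({level_map.get(term, term) for term in doc})

generalized = [roll_up(d, hierarchy) for d in docs]
print(generalized)  # [['finance'], ['finance', 'hr']]
```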
150

Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline

Razavi, Amir Reza January 2007
In medicine, data are produced from different sources and continuously stored in data repositories. Quality registries are examples of these growing databases. In Sweden there are many cancer registries in which data on cancer patients are gathered and recorded, used mainly for reporting survival analyses to higher-level health authorities. In this thesis, a breast cancer quality registry operating in south-east Sweden is used as the data source for newer analytical techniques, namely data mining as part of the knowledge discovery in databases (KDD) methodology. Analyses sift through these data in order to find interesting information and hidden knowledge.

KDD consists of multiple steps, starting with gathering data from different sources and preparing them in pre-processing stages prior to data mining. The data were cleaned of outliers and noise, and missing values were handled. A proper subset of the data was then chosen by canonical correlation analysis (CCA) in a dimensionality-reduction step; this technique was chosen because there were multiple outcomes and the variables had complex relationships to one another. After preparation, the data were analyzed with decision tree induction, a simple and efficient data mining method. To show the benefits of proper data pre-processing, results from data mining with pre-processing were compared with results from data mining without it. The comparison showed that pre-processing yields a more compact model with better performance in predicting the recurrence of cancer.

An important part of knowledge discovery in medicine is to increase the involvement of medical experts in the process. This starts with enquiry about current problems in their field, which leads to finding areas where computer support can be helpful. The experts can suggest potentially important variables and should then approve and validate new patterns or knowledge as predictive or descriptive models. If the performance of a model can be shown to be comparable to that of domain experts, it is more probable that the model will be used to support physicians in their daily decision-making. In this thesis, the model was validated by comparing predictions made by data mining with those made by domain experts, without finding any significant difference between them.

Breast cancer patients treated with mastectomy are recommended to receive radiotherapy, called postmastectomy radiotherapy (PMRT), for which a clinical guideline exists. A history of this treatment is stored in breast cancer registries. These datasets were analyzed using rules from the guideline, identifying cases that had not been treated according to it. Data mining revealed patterns of non-compliance with the PMRT guideline, and further analysis revealed some reasons for the non-compliance. These patterns were then compared with reasons acquired from manual inspection of patient records; the comparison showed that the patterns resulting from data mining were limited to the variables stored in the registry, so a prerequisite for better results is the availability of comprehensive datasets.

Medicine can take advantage of the KDD methodology in different ways, above all by reusing information and exploring hidden knowledge obtainable through advanced analysis techniques. The results depend on good collaboration between medical informaticians and domain experts and on the availability of high-quality data.
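The CCA-then-decision-tree sequence described above can be sketched on fabricated, registry-like data; the component count, outcome definition, and tree depth below are illustrative assumptions, not the thesis's settings.

```python
# Dimensionality reduction with CCA followed by decision tree induction.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 10))               # predictors (fabricated)
Y = np.column_stack([X[:, 0] + rng.normal(size=300) * 0.5,
                     X[:, 1] + rng.normal(size=300) * 0.5])  # two outcomes

cca = CCA(n_components=2).fit(X, Y)
X_c, _ = cca.transform(X, Y)                 # reduced predictor space

recurrence = (Y[:, 0] > 0).astype(int)       # hypothetical binary outcome
tree = DecisionTreeClassifier(max_depth=3).fit(X_c, recurrence)
print(f"training accuracy: {tree.score(X_c, recurrence):.2f}")
```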
