51 |
Visual Analytics como ferramenta de auxílio ao processo de KDD: um estudo voltado ao pré-processamento. Cini, Glauber, 29 March 2017 (has links)
Submitted by JOSIANE SANTOS DE OLIVEIRA (josianeso) on 2017-06-27T13:53:26Z
No. of bitstreams: 1
Glauber Cini_.pdf: 2121004 bytes, checksum: c1f55ddc527cdaeb7ae3c224baea727a (MD5) / Made available in DSpace on 2017-06-27T13:53:26Z (GMT). No. of bitstreams: 1
Glauber Cini_.pdf: 2121004 bytes, checksum: c1f55ddc527cdaeb7ae3c224baea727a (MD5)
Previous issue date: 2017-03-29 / Nenhuma / O Visual Analytics consiste na combinação de métodos inteligentes e automáticos com a capacidade de percepção visual do ser humano, visando a extração de conhecimento de conjuntos de dados. Esta capacidade visual é apoiada por interfaces interativas, sendo a de maior importância para este trabalho a visualização por Coordenadas Paralelas. Todavia, ferramentas que disponham de ambos os métodos, automáticos (KDD) e visuais (Coordenadas Paralelas), de forma genérica e integrada mostram-se primordiais. Deste modo, este trabalho apresenta um modelo integrado entre o processo de KDD e o de Visualização de Informação utilizando as Coordenadas Paralelas, com ênfase no make sense of data, ao ampliar a possibilidade de exploração dos dados ainda na etapa de pré-processamento. Para demonstrar o funcionamento deste modelo, um plugin foi desenvolvido sobre a ferramenta WEKA. Este módulo é responsável por ampliar as possibilidades de utilização da ferramenta escolhida, expandindo suas funcionalidades a ponto de conceituá-la como uma ferramenta Visual Analytics. Junto à visualização de Coordenadas Paralelas disponibilizada, também se viabiliza a interação por permutação das dimensões (eixos), a interação por seleção de amostras (brushing) e a possibilidade de detalhamento das mesmas na própria visualização.
/ Visual Analytics is the combination of intelligent, automatic methods with the capability of human visual perception, aiming to extract knowledge from data sets. This visual capability is supported by interactive interfaces, the most important of which for this work is the Parallel Coordinates visualization. However, tools that offer both the automatic (KDD) and the visual (Parallel Coordinates) methods in a generic and integrated way prove essential. Thus, this work presents a model integrating the KDD process and Information Visualization through Parallel Coordinates, with emphasis on making sense of data, by extending the possibilities of data exploration as early as the preprocessing stage. To demonstrate the operation of this model, a plugin was developed on top of the WEKA tool. This module expands the functionality of the chosen tool to the point of qualifying it as a Visual Analytics tool. Along with the Parallel Coordinates visualization provided, it also enables interaction by permutation of the dimensions (axes), interaction by selection of samples (brushing), and detailing of the selected samples within the visualization itself.
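The two interactions this abstract emphasizes, shared-scale axes and brushing, reduce to small data transformations. A minimal sketch in Python (not code from the thesis; the data and function names are invented for illustration):

```python
# Sketch of the two core operations behind a Parallel Coordinates view:
# per-axis normalization (so all axes share one vertical scale) and
# "brushing" (selecting samples whose value on one axis falls in a range).

def normalize_axes(samples, dims):
    """Rescale each dimension to [0, 1]; constant dimensions map to 0.5."""
    lo = {d: min(s[d] for s in samples) for d in dims}
    hi = {d: max(s[d] for s in samples) for d in dims}
    return [
        {d: (s[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.5 for d in dims}
        for s in samples
    ]

def brush(samples, dim, low, high):
    """Keep only the samples whose value on `dim` lies inside [low, high]."""
    return [s for s in samples if low <= s[dim] <= high]

data = [
    {"age": 20, "weight": 60.0},
    {"age": 40, "weight": 90.0},
    {"age": 60, "weight": 75.0},
]
scaled = normalize_axes(data, ["age", "weight"])
selected = brush(data, "age", 30, 70)   # brushing on the "age" axis keeps 2 samples
```

Permuting the axis order, the third interaction mentioned, is then just reordering the `dims` list before drawing.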
|
52 |
Interestingness Measures for Association Rules in a KDD Process: Post-Processing of Rules with the ARQAT Tool. Huynh, Xuan-Hiep, 07 December 2006 (has links) (PDF)
This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a main research topic and an application field in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. In the last decade much research has been published about association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. On the other hand, they often deliver a large number of rules. As a consequence, a postprocessing task is required to help the user understand the results. One way to reduce the number of rules - to validate or to select the most interesting ones - is to use interestingness measures adapted both to the user's goals and to the dataset studied. Selecting the right interestingness measures is an open problem in KDD. Many measures have been proposed to extract knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications, but others are not. In this thesis, we propose to study the set of interestingness measures available in the literature, in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to his or her needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), to facilitate the analysis of the behavior of about 40 interestingness measures. In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall.
These graphs are also used to identify clusters of similar measures. Moreover, we propose a series of comparative studies of the correlations between interestingness measures on several datasets. We discovered a set of correlations that is not very sensitive to the nature of the data used, which we call stable correlations. Finally, 14 graphical and complementary views structured on 5 levels of analysis - ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis - are illustrated in order to show the interest of both the exploratory approach and the use of complementary views.
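The measure-to-measure correlation analysis described above can be illustrated with a small sketch (not ARQAT itself; the rule counts are invented): two classic interestingness measures for a rule A -> B, and a hand-rolled Spearman rank correlation between their values over a set of rules:

```python
# Two classic interestingness measures for a rule A -> B, computed from
# contingency counts, plus a Spearman rank correlation between measures.

def confidence(n_ab, n_a):
    return n_ab / n_a                       # estimate of P(B|A)

def lift(n_ab, n_a, n_b, n):
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def spearman(xs, ys):
    """Spearman rho for untied values: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical counts (n_ab, n_a, n_b) for three rules over n = 100 transactions.
rules = [(30, 40, 50), (10, 20, 60), (24, 60, 30)]
confs = [confidence(ab, a) for ab, a, b in rules]
lifts = [lift(ab, a, b, 100) for ab, a, b in rules]
rho = spearman(confs, lifts)   # one edge weight of a correlation graph
```

Computing such a rho for every pair of measures, and keeping edges above a threshold, yields the correlation graphs whose connected clusters group similar measures.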
|
53 |
Mineração de dados em múltiplas tabelas fato de um data warehouse. Ribeiro, Marcela Xavier, 19 May 2004 (has links)
Made available in DSpace on 2016-06-02T19:05:14Z (GMT). No. of bitstreams: 1
DissMXR.pdf: 889186 bytes, checksum: fe616fa6309b5ac267855726e8a6938b (MD5)
Previous issue date: 2004-05-19 / Financiadora de Estudos e Projetos / The progress of information technology has allowed huge amounts of data to be stored. Those data, when submitted to a process of knowledge discovery, can yield interesting results. Data warehouses are repositories of high-quality data. A procedure that has been adopted in big companies is the joint use of data warehouse and data mining technologies, where the process of knowledge discovery takes advantage of the high quality of the warehouse's data. When the data warehouse has information about more than one subject, it also has more than one fact table. The joint analysis of multiple fact tables can bring interesting knowledge, for instance the relationship between purchases and sales in a company. This research presents a technique to mine data from multiple fact tables of a data warehouse, which is a new kind of association rule mining. / O progresso da tecnologia de informação permitiu que quantidades cada vez maiores de dados fossem armazenadas. Esses dados, no formato original de armazenamento, não trazem conhecimento, porém, quando tratados e passados por um processo de extração de conhecimento, podem revelar conhecimentos interessantes. Os data warehouses são repositórios de dados com alta qualidade. Um procedimento que vem sendo amplamente adotado nas grandes empresas é a utilização conjunta das tecnologias de data warehouse e de mineração de dados, para que o processo de descoberta de conhecimento se beneficie da alta qualidade dos dados do data warehouse. Data warehouses que envolvem mais de um assunto também envolvem mais de uma tabela fato (tabelas que representam o assunto do data warehouse). A análise em conjunto de múltiplos assuntos de um data warehouse pode revelar padrões interessantes, como o relacionamento entre as compras e as vendas de determinada organização. Este projeto de pesquisa está direcionado ao desenvolvimento de técnicas de mineração de dados envolvendo múltiplas tabelas fato de um data warehouse, que é um novo tipo de mineração de regras de associação.
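The core idea, relating facts stored in different fact tables through a shared dimension key, can be sketched as follows (a toy illustration only, not the thesis technique; the tables, keys and counts are invented):

```python
# Two fact tables of a star schema linked by a shared product key:
# the support of a cross-table pattern is the raw material from which
# association rules spanning both tables would be derived.

purchases = [  # fact table 1: (product_key, supplier)
    (1, "S1"), (2, "S1"), (3, "S2"),
]
sales = [      # fact table 2: (product_key, region)
    (1, "north"), (1, "south"), (2, "north"),
]

def cross_support(purchases, sales, supplier, region):
    """Fraction of all products bought from `supplier` AND sold in `region`."""
    bought = {p for p, s in purchases if s == supplier}
    sold = {p for p, r in sales if r == region}
    products = {p for p, _ in purchases} | {p for p, _ in sales}
    return len(bought & sold) / len(products)

sup = cross_support(purchases, sales, "S1", "north")  # products 1 and 2 qualify
```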
|
54 |
A Framework for How to Make Use of an Automatic Passenger Counting System. Fihn, John; Finndahl, Johan, January 2011 (has links)
Most modern cities today face tremendous traffic congestion, a consequence of the increasing use of private motor vehicles. Public transport plays a crucial role in reducing this traffic, but to be an attractive alternative to private motor vehicles it needs to provide services that suit the citizens' requirements for travelling. A system that can provide transit agencies with rapid feedback about the usage of their transport network is the Automatic Passenger Counting (APC) system, which registers the number of passengers boarding and alighting a vehicle. Knowledge about passengers' travel behaviour can be used by transit agencies to adapt and improve their services, but to achieve this knowledge transit agencies need to know how to use an APC system. This thesis investigates how a transit agency can make use of an APC system. The research took place in Melbourne, where Yarra Trams, operator of the tram network, is now putting effort into how to utilise the APC system. A theoretical framework based on theories about Knowledge Discovery from Data, System Development, and Human Computer Interaction is built, tested, and evaluated in a case study at Yarra Trams. The case study resulted in a software system that can process and model Yarra Trams' APC data. The result of the research is a proposal of a framework consisting of different steps and events that can be used as a guide for a transit agency that wants to make use of an APC system.
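The basic APC computation underlying such a framework can be sketched in a few lines (an illustrative example, not Yarra Trams' system; the stop counts are invented): turning per-stop boarding and alighting counts into the on-board load profile that capacity planning relies on.

```python
# Derive the running passenger load of one vehicle run from per-stop
# boarding/alighting counts, assuming the vehicle starts empty.

def load_profile(boardings, alightings):
    """Passenger load on board after each stop."""
    load, profile = 0, []
    for on, off in zip(boardings, alightings):
        load += on - off
        profile.append(load)
    return profile

# Hypothetical counts for a five-stop tram run.
profile = load_profile([12, 8, 5, 0, 0], [0, 3, 6, 9, 7])
peak = max(profile)   # the critical figure for capacity planning
```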
|
55 |
Plataforma para la Extracción y Almacenamiento del Conocimiento Extraído de los Web Data. Rebolledo Lorca, Víctor, January 2008 (has links)
No description available.
|
56 |
Sinkhole Hazard Assessment in Minnesota Using a Decision Tree Model. Gao, Yongli; Alexander, E. Calvin, 01 May 2008 (has links)
An understanding of what influences sinkhole formation, and the ability to accurately predict sinkhole hazards, is critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted into cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map in Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota, but excludes potential controlling factors such as structural controls, topographic settings, human activities and land use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained.
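A fitted decision tree of this kind is typically frozen into cell-by-cell rules for the GIS: each leaf becomes a probability class applied to every raster cell. A toy stand-in (the thresholds below are invented for illustration, not the published model):

```python
# A hypothetical decision-tree-as-rules function: classify one raster cell
# into a sinkhole probability class from three of the predictors named in
# the abstract. Threshold values are made up, not from the study.

def sinkhole_class(depth_to_bedrock_m, dist_to_nearest_sinkhole_m, density_per_km2):
    """Return a qualitative sinkhole probability class for one cell."""
    if density_per_km2 > 5 and dist_to_nearest_sinkhole_m < 500:
        return "high"
    if depth_to_bedrock_m < 15 and dist_to_nearest_sinkhole_m < 2000:
        return "moderate"
    return "low"

# Applying the rules over a grid of cells mimics the cartographic model.
cells = [(10, 300, 8), (12, 1500, 1), (40, 5000, 0)]
classes = [sinkhole_class(*c) for c in cells]
```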
|
57 |
Knowledge discovery and machine learning for capacity optimization of Automatic Milking Rotary System. Xie, Tian, January 2016 (has links)
Dairy farming, as one part of agriculture, has thousands of years of history. The increasing demand for dairy products and the rapid development of technology have brought dairy farming tremendous changes. Starting with hand milking, dairy farming went through vacuum bucket milking, pipeline milking, and now parlour milking. Automatic and technical milking systems provide farmers with high-efficiency milking, effective herd management and, above all, booming income. DeLaval Automatic Milking Rotary (AMRTM) is the world's leading automatic milking rotary system. It presents an ultimate combination of technology and machinery which brings dairy farming significant benefits. The AMRTM technical milking capacity is 90 cows per hour. However, constrained by farm management, cow condition and system configuration, the actual capacity is lower than the technical value. In this thesis, an optimization system is designed to analyze and improve AMRTM performance. The research focuses on cow behavior and AMRTM robot timeouts. By applying knowledge discovery in databases (KDD), building a machine learning cow behavior prediction system and developing modeling methods for system simulation, optimizing solutions are proposed and validated.
/ Mjölkproduktion är en del av vårt jordbruks tusenåriga historia. Med ökande krav på mejeriprodukter tillsammans med den snabba utvecklingen utav tekniken för det enorma förändringar i mjölkproduktionen. Mjölkproduktion började inledningsvis med handmjölkning, sedan har mjölkproduktionsmetoder utvecklats genom olika tekniker och gett oss t.ex. vakuummjölkning, rörledningsmjölkning, fram till dagens mjölkningskarusell. Nu har det automatiska och tekniska mjölkningssystemet försett bönder med högeffektiv mjölkning, effektiv djurhållning och framför allt blomstrande inkomster. DeLaval Automatic Milking Rotary (AMRTM) är världens ledande automatiska roterande mjölkningssystem. Den presenterar en ultimat kombination av teknik och maskiner som ger mjölkproduktionen betydande fördelar. DeLaval Automatic Milking Rotary tekniska mjölkningskapacitet är 90 kor per timme. Den begränsas utav jordbruksdrift, tillståndet hos kor och hantering av systemet. Det gör att den faktiska kapaciteten blir lägre än den tekniska. I denna avhandling undersöks hur ett optimeringssystem kan analysera och förbättra DeLaval Automatic Milking Rotary prestanda genom fokusering på kors beteenden och robot-timeout. Genom att tillämpa kunskap från databas (KDD), skapa maskininlärande system som förutsäger kors beteenden samt utveckla modelleringsmetoder för systemsimulering, ges lösningsförslag av optimering samt validering.
|
58 |
Análise inteligente de dados em um banco de dados de procedimentos em cardiologia intervencionista / Intelligent data analysis in an interventional cardiology procedures database. Campos Neto, Cantídio de Moura, 02 August 2016 (has links)
O tema deste estudo abrange duas áreas do conhecimento: a Medicina e a Ciência da Computação. Consiste na aplicação do processo de descoberta de conhecimento em base de Dados (KDD - Knowledge Discovery in Databases), a um banco de dados real na área médica denominado Registro Desire. O Registro Desire é o registro mais longevo da cardiologia intervencionista mundial, unicêntrico e acompanha por mais de 13 anos 5.614 pacientes revascularizados unicamente pelo implante de stents farmacológicos. O objetivo é criar por meio desta técnica um modelo que seja descritivo e classifique os pacientes quanto ao risco de ocorrência de eventos cardíacos adversos maiores e indesejáveis, e avaliar objetivamente seu desempenho. Posteriormente, apresentar as regras extraídas deste modelo aos usuários para avaliar o grau de novidade e de concordância do seu conteúdo com o conhecimento dos especialistas. Foram criados modelos simbólicos de classificação pelas técnicas da árvore de decisão e regras de classificação utilizando para a etapa de mineração de dados os algoritmos C4.5, Ripper e CN2, em que o atributo-classe foi a ocorrência ou não do evento cardíaco adverso. Por se tratar de uma classificação binária, os modelos foram avaliados objetivamente pelas métricas associadas à matriz de confusão como acurácia, sensibilidade, área sob a curva ROC e outras. O algoritmo de mineração processa automaticamente todos os atributos de cada paciente exaustivamente para identificar aqueles fortemente associados com o atributo-classe (evento cardíaco) e que irão compor as regras. Foram extraídas as principais regras destes modelos de modo indireto, por meio da árvore de decisão ou diretamente pela regra de classificação, que apresentaram as variáveis mais influentes e preditoras segundo o algoritmo de mineração. Os modelos permitiram entender melhor o domínio de aplicação, relacionando a influência de detalhes da rotina e as situações associadas ao procedimento médico. 
Pelo modelo, foi possível analisar as probabilidades da ocorrência e da não ocorrência de eventos em diversas situações. Os modelos induzidos seguiram uma lógica de interpretação dos dados e dos fatos com a participação do especialista do domínio. Foram geradas 32 regras, das quais três foram rejeitadas, 20 foram regras esperadas e sem novidade, e 9 foram consideradas regras não tão esperadas, mas que tiveram grau de concordância maior ou igual a 50%, o que as torna candidatas à investigação para avaliar sua eventual importância. Tais modelos podem ser atualizados ao aplicar novamente o algoritmo de mineração ao banco com os dados mais recentes. O potencial dos modelos simbólicos e interpretáveis é grande na Medicina quando aliado à experiência do profissional, contribuindo para a Medicina baseada em evidência. / The main subject of this study comprehends two areas of knowledge, Medicine and Computer Science. Its purpose is to apply the Knowledge Discovery in Databases (KDD) process to the DESIRE Registry, a real database in the medical area. The DESIRE Registry is the world's oldest registry in interventional cardiology; it is unicentric and has been following up 5,614 revascularized patients for more than 13 years, treated solely with pharmacological stent implants. The goal is to use this technique to create a descriptive model that classifies patients according to the risk of major adverse cardiac events (MACE) and to objectively evaluate its performance; later, to present the rules drawn from this model to users, to assess the degree of novelty of their content and its agreement with the knowledge of experts. Symbolic classification models were created using decision trees and classification rules, employing the C4.5, Ripper and CN2 algorithms in the data mining step, where the class attribute is the presence or absence of a MACE. As the classification is binary, the models were objectively evaluated by metrics associated with the confusion matrix, such as accuracy, sensitivity and area under the ROC curve, among others. The data mining algorithm automatically and exhaustively processes the attributes of each patient in order to identify those most predictive of the class attribute (MACE), on which the rules will be based. The main rules of these models were extracted indirectly, through the decision tree, or directly, through the classification rules, showing the most influential and predictive variables according to the mining algorithm. The models allowed a better understanding of the application domain, relating the influence of routine details and situations associated with the medical procedures. The model made it possible to analyse the probability of occurrence or non-occurrence of events in different situations. The induction of the models followed an interpretation of the data and facts with the participation of the domain expert. Thirty-two rules were generated, of which three were rejected, 20 were expected rules without novelty, and 9 were considered not so expected but had a degree of agreement greater than or equal to 50%, which makes them candidates for investigation to assess their possible importance. These models can be updated by reapplying the mining process to the database with the most recent data. The potential of interpretable symbolic models is great in Medicine when combined with professional experience, contributing to evidence-based medicine.
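The confusion-matrix evaluation described above can be sketched as follows (a minimal illustration; the labels and predictions are made up, and only three of the metrics are shown):

```python
# Confusion matrix of a binary classifier (1 = MACE occurred, 0 = no MACE)
# and the metrics derived from it: accuracy, sensitivity, specificity.

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # invented ground truth
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # invented model output
tp, tn, fp, fn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)        # recall on the MACE class
specificity = tn / (tn + fp)
```

The area under the ROC curve, also cited in the abstract, additionally requires the model's ranked scores rather than hard 0/1 predictions.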
|
59 |
Smart Marketing na TV Digital Interativa através de um sistema de recomendação de anúncios / Smart Marketing on Interactive Digital TV through an advertising recommendation system. Santos, Alan Menk dos, 03 December 2012 (has links)
Made available in DSpace on 2016-04-04T18:31:34Z (GMT). No. of bitstreams: 1
Alan Menk dos Santos.pdf: 6433244 bytes, checksum: d8118b5fa4198a1f3792738316afd65a (MD5)
Previous issue date: 2012-12-03 / With the implementation of the Brazilian Digital TV System (SBTVD) comes a range of new opportunities and possibilities for both viewers and TV stations. Viewers will have an immense number of channels, programs and interactive advertisements; for TV stations, the possibility of advertising in new media increases. In this context, the opportunity arises for a recommendation system for applications and interactivity portals. This dissertation presents a proposal for advertising personalization in applications and portals of the digital TV environment, in order to bring a better experience to the viewer, a new source of income for the broadcasters, and also a greater acceptance of specialized products. This work develops an application for interactive Digital TV called Smart Marketing, capable of capturing viewer navigation data through both implicit and explicit means and presenting customized advertising based on the process of knowledge discovery. Developed on top of the AstroTV middleware, compatible with the Brazilian specification, the application was evaluated by means of an experiment with users of varied profiles, applying the knowledge discovery process, which used classification and clustering tasks, to the generated database. The results indicated the quality of the recommendation generated by Smart Marketing. / Com a implantação do Sistema Brasileiro de TV Digital (SBTVD), inicia-se uma gama de novas oportunidades e possibilidades tanto para o telespectador quanto para as emissoras de TV. Os telespectadores terão uma imensa quantidade de canais, programas e propagandas interativas. Para as emissoras de TV, aumenta a possibilidade de propagandas em novos meios de comunicação. Neste contexto, surge a oportunidade de um sistema de recomendação para os aplicativos e portais de interatividade. Esta dissertação apresenta uma proposta de personalização de propaganda em aplicativos e portais do ambiente de TV Digital com o objetivo de trazer uma melhor experiência ao telespectador, uma nova forma de obtenção de recursos por parte das teledifusoras e também uma maior aceitação de produtos especializados. Este trabalho desenvolve um aplicativo para a TV Digital interativa denominado Smart Marketing, capaz de capturar os dados de navegação do telespectador tanto por meio implícito quanto explícito, realizando a apresentação de publicidades personalizadas a partir do processo de descoberta do conhecimento. Elaborado a partir do middleware AstroTV, compatível com a especificação brasileira, sua aplicação foi avaliada por meio de experimento que utilizou usuários com perfis variados, aplicando na base de dados gerada o processo de descoberta de conhecimento, o qual se utilizou das tarefas de classificação e agrupamento. Os resultados obtidos indicaram a qualidade da recomendação gerada pelo Smart Marketing.
|
60 |
Aplicação do processo de descoberta de conhecimento em dados do poder judiciário do estado do Rio Grande do Sul / Applying the Knowledge Discovery in Databases (KDD) Process to Data of the Judiciary Power of Rio Grande do Sul. Schneider, Luís Felipe, January 2003 (has links)
Para explorar as relações existentes entre os dados, abriu-se espaço para a procura de conhecimento e informações úteis não conhecidas a partir de grandes conjuntos de dados armazenados. A este campo deu-se o nome de Descoberta de Conhecimento em Base de Dados (DCBD), o qual foi formalizado em 1989. O DCBD é composto por um processo de etapas ou fases, de natureza iterativa e interativa. Este trabalho baseou-se na metodologia CRISP-DM. Independente da metodologia empregada, este processo tem uma fase que pode ser considerada o núcleo da DCBD, a "mineração de dados" (ou modelagem, conforme CRISP-DM), à qual está associado o conceito "classe de tipo de problema", bem como as técnicas e algoritmos que podem ser empregados em uma aplicação de DCBD. Destacaremos as classes associação e agrupamento, as técnicas associadas a estas classes, e os algoritmos Apriori e K-médias. Toda esta contextualização estará compreendida na ferramenta de mineração de dados escolhida, Weka (Waikato Environment for Knowledge Analysis). O plano de pesquisa está centrado em aplicar o processo de DCBD no Poder Judiciário no que se refere a sua atividade fim, julgamentos de processos, procurando por descobertas a partir da influência da classificação processual em relação à incidência de processos, ao tempo de tramitação, aos tipos de sentenças proferidas e à presença da audiência. Também será explorada a procura por perfis de réus, nos processos criminais, segundo características como sexo, estado civil, grau de instrução, profissão e raça. O trabalho apresenta nos capítulos 2 e 3 o embasamento teórico de DCBD, detalhando a metodologia CRISP-DM. No capítulo 4 explora-se toda a aplicação realizada nos dados do Poder Judiciário e, por fim, no capítulo 5 são apresentadas as conclusões. / With the purpose of exploring existing connections among data, a space has been created for the search of knowledge and useful unknown information in large sets of stored data. This field was dubbed Knowledge Discovery in Databases (KDD) and was formalized in 1989. KDD consists of a process made up of iterative and interactive stages or phases. This work was based on the CRISP-DM methodology. Regardless of the methodology used, this process features a phase that may be considered the nucleus of KDD, the "data mining" phase (or modeling, according to CRISP-DM), with which the concept of "problem type class" is associated, as well as the techniques and algorithms that may be employed in a KDD application. This study highlights the association and clustering classes, the techniques associated with them, and the Apriori and K-means algorithms. All this contextualization is embodied in the selected data mining tool, Weka (Waikato Environment for Knowledge Analysis). The research plan focuses on applying the KDD process to the Judiciary Power's core activity, the judgment of court proceedings, seeking findings on the influence of the procedural classification on the incidence of proceedings, their duration, the kinds of sentences pronounced and the presence of hearings. The search for defendants' profiles in criminal proceedings, according to characteristics such as sex, marital status, educational background, profession and race, is also explored. Chapters 2 and 3 present the theoretical grounds of KDD, detailing the CRISP-DM methodology. Chapter 4 explores the application performed on the data of the Judiciary Power and, lastly, in Chapter 5, conclusions are drawn.
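The Apriori task mentioned above (run in Weka in the study) can be sketched with a hand-rolled frequent-itemset pass (an illustration of the principle only; the transactions are invented and real Apriori prunes candidates level by level):

```python
# Count 1- and 2-itemsets over a tiny set of transactions and keep those
# meeting a minimum support, the first step toward association rules.

from itertools import combinations

def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    counts = {}
    for t in transactions:
        for size in (1, 2):
            for items in combinations(sorted(t), size):
                counts[items] = counts.get(items, 0) + 1
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

# Invented case-attribute "transactions" in the spirit of the study's data.
tx = [
    {"criminal", "hearing"}, {"criminal", "hearing"},
    {"civil"}, {"criminal"},
]
freq = frequent_itemsets(tx, 0.5)
```

From the frequent itemsets, rules such as criminal -> hearing are then scored by confidence, exactly what Weka's Apriori associator reports.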
|