Global ETD Search

1	Robust and Efficient Feature Selection for High-Dimensional Datasets Mo, Dengyao 19 April 2011 No description available. Information Systems; Feature Selection; Data Mining; Machine Learning; Statistical Modeling; Knowledge Discovery in Database
2	DDAAV DETECTOR DO DESEMPENHO DO ALUNO EM AVAs / DDAAV DETECTOR PERFORMANCE OF STUDENTS IN VLES Mühlbeier, Andreia Rosangela Kessler 15 April 2014 Conselho Nacional de Desenvolvimento Científico e Tecnológico / Virtual learning environments (VLEs) benefit from advances in the use of technology in education, which enable more dynamic and meaningful learning. As interaction in these environments increases, the volume of stored data grows considerably. Knowledge Discovery in Databases (KDD) has been used successfully in several areas, and in academia some of its results have been used to assist teachers. This dissertation describes a study that follows the KDD steps and uses the WEKA tool (free data mining software), specifically the J48 algorithm, to apply data mining techniques to the information stored in the database in order to detect student performance while the course is running. The research scenario was built from assessment data for the course Introduction to the Integration of Media in Education, part of the Specialization Course in Media in Education, with 134 students distributed across 5 different centers (polos). The results indicate that applying the rules produced by the algorithm can be a valuable instrument for the teacher during the course, and not only a posteriori, because it allows immediate positive intervention in the variables that affect the learner's success, such as type of material, discussions, activities, methodologies, and strategies. Knowledge discovery in database; Student performance; WEKA
3	Sinkhole Hazard Assessment in Minnesota Using a Decision Tree Model Gao, Yongli, Alexander, E. Calvin 01 May 2008 An understanding of what influences sinkhole formation and the ability to accurately predict sinkhole hazards are critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted into cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map for Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota but excludes potential controlling factors such as structural control, topographic settings, human activities and land use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained. decision tree model; karst feature database (KFD); knowledge discovery in database (KDD); Minnesota; nearest neighbor analysis (NNA); sinkhole probability
4	Uso de redes neurais artificiais para descoberta de conhecimento sobre a escolha do modo de viagem / Using artificial neural network for the discovery of mode travel choice knowledge Wermersch, Fábio Glauco 09 May 2002 This research aimed at a better understanding of the travel mode choice process. It employed the inductive, data-driven approach of data mining, free from a priori assumptions, using artificial neural networks (ANN) as the mining tool to look for knowledge, or useful information, about the choice process that could indicate which of the decision structures underlying the modal choice models considered comes closest to the observed one. Assuming that the process contains a pattern that an ANN can capture, an ANN model was fitted (trained) to the data, and the knowledge contained in the trained model was extracted with Trepan (Trees parroting network), an algorithm for extracting decision trees from ANNs; the extracted trees were analysed and interpreted in the light of the objectives of this research. The data used in this knowledge discovery process come from a household interview survey carried out in the city of Bauru - SP to estimate the city's origin-destination (O-D) trip matrix. Four decision trees with simple structures and a predictive accuracy of approximately 75% for the three travel modes studied were obtained. Although the knowledge extracted from the trained ANN models did not indicate which of the decision structures underlying the modal choice models is closest to that of the neural model, the resulting trees revealed a compensating relationship between the sex attribute and the attributes related to household economic capacity in the decision to choose the car mode for a trip. The results also suggest that no more than one input attribute describing the displacement made in a trip is needed for ANN modelling of travel mode choice in the studied context. Artificial intelligence; Artificial neural network; Choice models; Data mining; Decision trees; Knowledge discovery in database; Transport demand analysis
6	Análise de desempenho de vendas em telecomunicações utilizando técnicas de mineração de dados / Analysis of business development in telecommunication using data minig techniques Mattozo, Teófilo Camara 22 November 2007 Telecommunications is one of the most dynamic and strategic areas in the world today. Organizations constantly seek new management practices in an increasingly competitive environment where resources are becoming scarce. In this scenario, data obtained from business and corporate processes have gained importance, although in most cases these data are not yet adequately explored. Knowledge Discovery in Databases (KDD) therefore appears as an option for studying complex problems in different areas of management. This work proposes a systematization of KDD activities that integrates concepts from the CRISP-DM, SEMMA and FAYYAD methodologies in an interactive environment, together with a study of the viability of multiple linear regression models for explaining corporate telecommunications sales using performance indicators. A statistical method was outlined to analyze the effect of these indicators on sales productivity. Based on careful business and statistical analyses, equations were defined and fitted, with their respective coefficients of determination. Hypothesis tests were also conducted on the parameters in order to validate the regression models and assess the quality of their fit. The results show that there is a relationship between these performance indicators and the volume of sales. Performance Indicators; Knowledge Discovery in Database; Business Management in Telecommunication; Decision Support Systems
7	Modul pro dolování v časových řadách systému pro dolování z dat / Time-Serie Mining Module of a Data Mining System Klement, Ondřej January 2010 The subject of this master's thesis is the extension of an existing data mining system with a module for time series data mining. The thesis begins with a general introduction to data mining and continues with time series analysis. It then covers some of the current tasks and algorithms used in time series data mining, followed by the implementation concept and a description of the chosen mining method. Possible future improvements to the system are discussed at the end of the thesis.
8	Algorithmes pour la fouille de données et la bio-informatique / Algorithms for data mining and bio-informatics Mondal, Kartick Chandra 12 July 2013 Knowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Among data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics, and they have gained importance in many domains in recent years. However, no approach has been proposed to perform them in one process. Performing the extractions independently raises two problems: the resources required (memory, execution time and data accesses) and the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimal resources. This approach is based on the frequent closed patterns theoretical framework and uses a novel suffix-tree based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association rules, classification rules and bi-clusters, since the lists of data objects supporting each pattern and the hierarchical relationships between patterns are also extracted. This approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology. Databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that the extracted conceptual patterns can help in understanding and analyzing the nature of the relationships between interacting proteins. Data mining; Knowledge discovery in database; Bases of association rules; Classification rules; Conceptual association rules; Bi-clustering; Frequent closed itemsets; Closed itemset lattice; Galois connection; Formal concept analysis; Suffix-tree data structure
9	Využití data miningu v řízení podniku / Using data mining to manage an enterprise. Prášil, Zdeněk January 2010 The thesis focuses on data mining and its use in the management of an enterprise. It is structured into a theoretical and a practical part. The aim of the theoretical part was to identify: 1/ the most used data mining methods, 2/ typical application areas, 3/ typical problems solved in those application areas. The aim of the practical part was: 1/ to demonstrate the use of data mining in a small Czech e-shop for understanding the structure of its sales data, 2/ to demonstrate how data mining analysis can help improve marketing results. In my analysis of the literature I found that decision trees, linear and logistic regression, neural networks, segmentation methods and association rules are the most used data mining methods. CRM and marketing, financial institutions, insurance and telecommunication companies, retail trade and production are the application areas that use data mining the most. The specific data mining tasks focus on the relationships between marketing, sales and customers in order to improve business. In the analysis of the e-shop data I identified the types of goods that are bought together. Based on this finding I proposed that a strategy supporting this type of shopping is crucial for business success. In conclusion, I showed that data mining is an appropriate method for a small e-shop as well and has the capacity to improve its marketing strategy.
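Records 8 and 9 both mention association rules, and record 9 reports finding goods that are bought together. As a rough illustration only, the sketch below computes support and confidence for item pairs. The function name `pair_rules`, the `min_support` threshold, and the toy baskets are all hypothetical; nothing here is taken from the theses listed, and none of them is known to use this exact procedure.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for every item pair whose support reaches min_support."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Hypothetical toy baskets, invented for illustration.
baskets = [
    ["tea", "cup"],
    ["tea", "cup", "spoon"],
    ["tea", "spoon"],
    ["cup"],
]
rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting only pairs keeps the sketch short; full association rule miners such as Apriori extend the same support and confidence definitions to larger itemsets.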
