111 |
Descoberta de equivalência semântica entre atributos em bancos de dados utilizando redes neurais / Discovering semantic equivalences on attributes in databases using neural networks
Lima Junior, José January 2004
Com o crescimento das empresas que fazem uso das tecnologias de bancos de dados, os administradores destes bancos de dados criam novos esquemas a cada instante, e na maioria dos casos não existe uma normalização ou procedimentos formais para que tal tarefa seja desempenhada de forma homogênea, resultando assim em bases de dados incompatíveis, o que dificulta a troca de dados entre as mesmas. Quando os Sistemas de Bancos de Dados (SBD) são projetados e implementados independentemente, é normal que existam incompatibilidades entre os dados de diferentes SBD. Como principais conflitos existentes nos esquemas de SBD, podem ser citados problemas relacionados aos nomes dos atributos, armazenamento em diferentes unidades de medida, diferentes níveis de detalhes, atributos diferentes com mesmo nome ou atributos iguais com nomes diferentes, tipos de dado diferentes, tamanho, precisão, etc. Estes problemas comprometem a qualidade da informação e geram maiores custos em relação à manutenção dos dados. Estes problemas são conseqüências de atributos especificados de forma redundante. Estes fatos têm provocado grande interesse em descobrir conhecimento em banco de dados para identificar informações semanticamente equivalentes armazenadas nos esquemas. O processo capaz de descobrir este conhecimento em banco de dados denomina-se DCDB (Descoberta de Conhecimento em Bancos de Dados). As ferramentas disponíveis para a execução das tarefas de DCDB são genéricas e derivadas de outras áreas do conhecimento, em especial, da estatística e inteligência artificial. As redes neurais artificiais (RNA) têm sido utilizadas em sistemas cujo propósito é a identificação de padrões, antes desconhecidos. Estas redes podem aprender similaridades entre os dados, diretamente de suas instâncias, sem conhecimento a priori. Uma RNA que tem sido usada com êxito para identificar equivalência semântica é o Mapa Auto-Organizável (SOM). 
Esta pesquisa objetiva descobrir, de modo semi-automatizado, equivalência semântica entre atributos de bases de dados, contribuindo para o gerenciamento e integração das mesmas. O resultado da pesquisa gerou uma sistemática para o processo de descoberta e uma ferramenta que a implementa. / With the growing number of companies using database technologies, database administrators are constantly creating new schemas, and in most cases there is no standardization or formal procedure to ensure the task is carried out uniformly. The result is incompatible databases, which hinders data exchange between them. When database systems (DBS) are designed and implemented independently, incompatibilities among the data of different DBS are to be expected. The main conflicts found in DBS schemas include problems with attribute names, storage in different units of measurement, different levels of detail, different attributes with the same name or identical attributes with different names, and differing data types, sizes and precision. These problems compromise information quality and raise data maintenance costs; they are consequences of redundantly specified attributes. These facts have prompted great interest in knowledge discovery in databases as a way to identify semantically equivalent information stored in schemas. The process that discovers this knowledge is called KDD (Knowledge Discovery in Databases). The tools available for KDD tasks are generic and derived from other fields, in particular statistics and artificial intelligence. Artificial neural networks (ANN) have been used in systems whose purpose is to identify previously unknown patterns. These networks can learn similarities among the data directly from instances, without a priori knowledge. One ANN that has been used successfully to identify semantic equivalence is the Self-Organizing Map (SOM).
This research aims to discover, in a semi-automated way, semantic equivalence among database attributes, contributing to the management and integration of these databases. The work produced a systematic procedure for the discovery process and a tool that implements it.
|
112 |
Descoberta de conhecimento em bases de dados e estratégias de relacionamento com clientes: um estudo no setor de serviços
Fernandes, Marcelo Pires 12 February 2008
Made available in DSpace on 2016-03-15T19:26:36Z (GMT). No. of bitstreams: 1
Marcelo Pires Fernandes.pdf: 425391 bytes, checksum: 82c6fd61293544d4f47d5a6eec0f6580 (MD5)
Previous issue date: 2008-02-12 / The research problem concerns how companies in the services industry use customer databases to discover useful knowledge about their customers and thereby support the development of relationship strategies. The issue matters because, with growing competition and customer demands, companies need to treat their customers in a differentiated way so that they can keep the most profitable ones in their portfolio. To this end, the literature has suggested ever deeper integration among disciplines such as Relationship Marketing, CRM and Data Mining. This study investigated how the literature presents and describes database analysis processes and found several proposals that divide the process of knowledge discovery in databases into stages such as problem understanding, data understanding, data preparation, data modeling, model evaluation and deployment. The target population consisted of services companies operating in the cities of São Paulo and Rio de Janeiro, and a quantitative survey was carried out by applying a questionnaire to 67 professionals from that population. The survey investigated the level of use of the stages of the knowledge discovery process, of data mining techniques and of relationship strategies. It found that the companies surveyed make heavy use of the knowledge discovery stages identified in the literature, that only a small subset of the data mining techniques is used uniformly across them, and that the strategies with the highest levels of use are those related to acquiring new customers and identifying profitable ones. This last finding was somewhat surprising because it runs counter to the view of some authors, who argue that companies should focus their relationship strategies on customer retention. These results can support companies in developing customer relationship strategies based on an integrated analysis of business issues, customer information and quantitative models for analyzing that information, so as to turn it into useful knowledge for decision making. / O problema de pesquisa a ser investigado está associado ao modo como empresas do setor de serviços utilizam bases de dados para descobrir conhecimento sobre o cliente e embasar o desenvolvimento de estratégias de relacionamento. Este tema é importante, visto que em função do aumento da concorrência e da exigência dos clientes, as empresas precisam tratar seus clientes de forma diferenciada, de forma a manter em sua carteira aqueles mais rentáveis. Neste sentido, a literatura tem sugerido uma integração cada vez mais intensa entre disciplinas como Marketing de Relacionamento, CRM e Mineração de Dados. O presente trabalho estudou o modo como a literatura apresenta e descreve processos de análise de bases de dados e algumas propostas foram encontradas, propostas que segmentam o processo de descoberta de conhecimento em bases de dados em etapas como entendimento do problema, entendimento e preparação dos dados, modelagem dos dados, avaliação do modelo e implementação da solução desenvolvida. O universo estudado foi o de empresas do setor de serviços que atuam nas cidades de São Paulo e do Rio de Janeiro e uma pesquisa quantitativa foi realizada por meio da aplicação de um questionário a 67 respondentes. Nesta pesquisa, foi investigado o nível de utilização das etapas dos processos de descoberta de conhecimento em bases de dados, as técnicas de mineração utilizadas, bem como as estratégias de relacionamento adotadas com clientes. Constatou-se que as empresas pesquisadas possuem um alto nível de utilização das etapas de descoberta de conhecimento identificadas na literatura, que elas utilizam de forma uniforme apenas algumas das técnicas de mineração de dados identificadas na literatura e que, do ponto de vista de estratégias de relacionamento com clientes, as estratégias de aquisição de novos clientes e identificação dos melhores clientes possuem um nível de utilização superior ao de estratégias de retenção de clientes (considerando resultados da amostra). Esta última constatação, de certo modo, contraria o pensamento de algumas correntes teóricas, que defendem que as empresas devem focar suas estratégias de relacionamento na retenção de clientes. Estes resultados podem servir de apoio aos gestores das empresas, no que se refere aos processos de desenvolvimento de estratégias de relacionamento com clientes, sustentados em análise integrada dos aspectos de negócio envolvidos, informações sobre o cliente, bem como modelos quantitativos de análise destas informações, de forma a transformá-las em conhecimento útil para a tomada de decisão.
|
113 |
Otimização multiobjetivo e programação genética para descoberta de conhecimento em engenharia
Russo, Igor Lucas de Souza 26 January 2017
Submitted by Renata Lopes (renatasil82@gmail.com) on 2017-04-19T15:28:50Z
Approved for entry into archive by Adriana Oliveira (adriana.oliveira@ufjf.edu.br) on 2017-04-20T12:28:17Z (GMT)
Made available in DSpace on 2017-04-20T12:28:17Z (GMT)
No. of bitstreams: 1
igorlucasdesouzarusso.pdf: 2265113 bytes, checksum: 0eb7e55f7354359d8fb9419e6e6da17f (MD5)
Previous issue date: 2017-01-26 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / A área de Otimização envolve o estudo e emprego de métodos para determinação dos
parâmetros que levam à obtenção de soluções ótimas, de acordo com critérios denominados
objetivos. Um problema é classificado como multiobjetivo quando apresenta objetivos
múltiplos e conflitantes, que devem ser otimizados simultaneamente. Recentemente tem
crescido o interesse dos pesquisadores pela análise de pós-otimalidade, que consiste na
busca por propriedades intrínsecas às soluções ótimas de problemas de otimização e que
podem lançar uma nova luz à compreensão dos mesmos. Innovization (inovação através
de otimização, do inglês innovation through optimization) é um processo de descoberta de
conhecimento a partir de problemas de otimização na forma de relações matemáticas
entre variáveis, objetivos, restrições e parâmetros. Dentre as técnicas de busca que
podem ser utilizadas neste processo está a Programação Genética (PG), uma meta-heurística
bioinspirada capaz de evoluir programas de forma automatizada. Além de
numericamente válidos, os modelos encontrados devem utilizar corretamente as variáveis
de decisão em relação às unidades envolvidas, de forma a apresentar significado físico
coerente. Neste trabalho é proposta uma alternativa para tratamento das unidades através
de operações protegidas que ignoram os termos inválidos. Além disso, propõe-se aqui uma
estratégia para evitar a obtenção de soluções triviais que não agregam conhecimento sobre
o problema. Visando aumentar a diversidade dos modelos obtidos, propõe-se também a
utilização de um arquivo externo para armazenar as soluções de interesse ao longo da
busca. Experimentos computacionais são apresentados utilizando cinco estudos de caso
em engenharia para verificar a influência das ideias propostas. Os problemas tratados
aqui envolvem os projetos de: uma treliça de 2 barras, uma viga soldada, do corte de
uma peça metálica, de engrenagens compostas e de uma treliça de 10 barras, sendo este
último ainda não explorado na literatura de descoberta de conhecimento. Finalmente, o
conhecimento inferido no estudo de caso da estrutura de 10 barras é utilizado para reduzir
a dimensionalidade do problema. / The area of optimization involves the study and the use of methods to determine the
parameters that lead to optimal solutions, according to criteria called objectives. A
problem is classified as multiobjective when it presents multiple and conflicting objectives
which must be simultaneously optimized. Recently, researchers have shown growing
interest in post-optimality analysis, which consists in the search for intrinsic
properties of the optimal solutions of optimization problems. This can shed a new light on
the understanding of the optimization problems. Innovization (from innovation through
optimization) is a process of knowledge discovery from optimization problems in the form
of mathematical relationships between variables, objectives, constraints, and parameters.
Genetic Programming (GP), a search technique that can be used in this process, is a
bio-inspired metaheuristic capable of evolving programs automatically. In addition to
being numerically valid, the models found must correctly use the decision variables with
respect to the units involved, in order to present coherent physical meaning. In this work,
a method is proposed to handle the units through protected operations which ignore
invalid terms. Also, a strategy is proposed here to avoid trivial solutions that do not add
knowledge about the problem. To increase the diversity of the models obtained, the use of
an external archive is also proposed to store the solutions of interest found
during the search. Computational experiments are presented using five case studies in
engineering to verify the influence of the proposed ideas. The problems dealt with here are
the designs of: a 2-bar truss, a welded beam, the cutting of a metal part, composite gears,
and a 10-bar truss. The latter was not previously explored in the knowledge discovery
literature. Finally, the inferred knowledge in the case study of the 10-bar truss structure
is used to reduce the dimensionality of that problem.
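As a rough illustration of the unit handling described in the abstract above, the following Python sketch shows one way a "protected" addition could ignore a dimensionally invalid term. The `Quantity` type, the (length, mass, time) exponent tuple and the rule of keeping the left operand are assumptions made for this sketch; they are not details taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value with unit exponents for (length, mass, time); hypothetical layout."""
    value: float
    units: tuple


def multiply(a, b):
    # Multiplication is always dimensionally valid: exponents add up.
    return Quantity(a.value * b.value,
                    tuple(x + y for x, y in zip(a.units, b.units)))


def protected_add(a, b):
    # Assumed rule: if the units differ, the term b is invalid and is ignored.
    if a.units == b.units:
        return Quantity(a.value + b.value, a.units)
    return a


length = Quantity(2.0, (1, 0, 0))   # metres
force = Quantity(5.0, (1, 1, -2))   # newtons
area = multiply(length, length)     # square metres
```

With these definitions, `protected_add(length, force)` simply returns `length`, so an evolved expression that mixes incompatible units still evaluates to a physically meaningful quantity.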
|
114 |
Méthodologie d’extraction de connaissances spatio-temporelles par fouille de données pour l’analyse de comportements à risques : application à la surveillance maritime / Methodology of spatio-temporal knowledge discovery through data mining for risk behavior analysis : application to maritime traffic monitoring
Idiri, Bilal 17 December 2013
Les progrès technologiques en systèmes de localisation (AIS, radar, GPS, RFID, etc.), de télétransmission (VHF, satellite, GSM, etc.), en systèmes embarqués et leur faible coût de production a permis leur déploiement à une large échelle. Énormément de données sur les déplacements d'objets sont produites par le biais de ces technologies et utilisées dans diverses applications de surveillance temps-réel comme la surveillance du trafic maritime. L'analyse a posteriori des données de déplacement de navires et d'événements à risques peut présenter des perspectives intéressantes pour la compréhension et l'aide à la modélisation des comportements à risques. Dans ce travail de thèse une méthodologie basée sur la fouille de données spatio-temporelle est proposée pour l'extraction de connaissances sur les comportements potentiellement à risques de navires. Un atelier d'aide à l'analyse de comportements de navires fondé sur cette méthodologie est aussi proposé. / Technological advances in positioning systems (AIS, radar, GPS, RFID, etc.), remote transmission (VHF, satellite, GSM, etc.) and embedded systems, together with their low production cost, have enabled their deployment on a large scale. These technologies produce huge amounts of data on moving objects, which are used in various real-time monitoring applications such as maritime traffic surveillance. Post-hoc analysis of ship movement data and risk events may offer interesting prospects for understanding risky behaviors and supporting their modeling. In this thesis, we propose a methodology based on spatio-temporal data mining for discovering knowledge about potentially risky ship behaviors. A workbench for analyzing ship behavior, built on this methodology, is also proposed.
|
115 |
A Framework for How to Make Use of an Automatic Passenger Counting System
Fihn, John, Finndahl, Johan January 2011
Most modern cities today face severe traffic congestion as a consequence of the increasing use of private motor vehicles. Public transport plays a crucial role in reducing this traffic, but to be an attractive alternative to private motor vehicles it needs to provide services that suit citizens' travel requirements. A system that can give transit agencies rapid feedback about the usage of their transport network is the Automatic Passenger Counting (APC) system, which registers the number of passengers boarding and alighting a vehicle. Knowledge about passengers' travel behaviour can be used by transit agencies to adapt and improve their services to satisfy those requirements, but to obtain this knowledge transit agencies need to know how to use an APC system. This thesis investigates how a transit agency can make use of an APC system. The research took place in Melbourne, where Yarra Trams, the operator of the tram network, is now putting effort into working out how to utilise its APC system. A theoretical framework based on theories about Knowledge Discovery from Data, System Development, and Human Computer Interaction is built, tested, and evaluated in a case study at Yarra Trams. The case study resulted in a software system that can process and model Yarra Trams' APC data. The result of the research is a proposed framework consisting of different steps and events that can be used as a guide by a transit agency that wants to make use of an APC system.
|
116 |
Approche évolutionnaire et agrégation de variables : application à la prévision de risques hydrologiques / Evolutionary approach and variable aggregation : application to hydrological risks forecasting
Segretier, Wilfried 10 December 2013
Les travaux de recherche présentés dans ce mémoire s'inscrivent dans la lignée des approches de modélisation hydrologiques prédictives dirigées par les données. Nous avons particulièrement développé leur application sur le contexte difficile des phénomènes de crues éclairs caractéristiques des bassins versants de la région Caraïbe, qui pose un défi sécuritaire. En envisageant le problème de la prévision de crues comme un problème d'optimisation combinatoire difficile, nous proposons d'utiliser la notion de métaheuristiques, à travers les algorithmes évolutionnaires, notamment pour leur capacité à parcourir efficacement de grands espaces de recherche et à fournir des solutions de bonne qualité en des temps d'exécution raisonnables. Nous avons présenté l'approche de prédiction AV2D : Aggregate Variable Data Driven, dont le concept central est la notion de variable agrégée. L'idée sous-jacente à ce concept est de considérer le pouvoir prédictif de nouvelles variables définies comme le résultat de fonctions statistiques, dites d'agrégation, calculées sur des données correspondant à des périodes de temps précédant un événement à prédire. Ces variables sont caractérisées par des ensembles de paramètres correspondant à leurs propriétés. Nous avons introduit les variables agrégées hydrométéorologiques permettant de répondre au problème de la classification d'événements hydrologiques. La complexité du parcours de l'espace de recherche engendré par les paramètres définissant ces variables a été prise en compte grâce à la mise en oeuvre d'un algorithme évolutionnaire particulier dont les composants ont été spécifiquement définis pour ce problème. Nous avons montré, à travers une étude comparative avec d'autres approches de modélisation dirigées par les données, menée sur deux cas d'études de bassins versants caribéens, que l'approche AV2D est particulièrement bien adaptée à leur contexte.
Nous étudions par la suite les bénéfices offerts par les approches de modélisation hydrologiques modulaires dirigées par les données, en définissant un procédé de division en sous-processus prenant en compte les caractéristiques particulières des bassins versants auxquels nous nous intéressons. Nous avons proposé une extension des travaux précédents à travers la définition d'une approche de modélisation modulaire SM2D : Spatial Modular Data Driven, consistant à considérer des sous-processus en divisant l'ensemble des exemples à classifier en sous-ensembles correspondant à des comportements hydrologiques homogènes. Nous avons montré, à travers une étude comparative avec d'autres approches dirigées par les données mises en oeuvre sur les mêmes sous-ensembles de données, que cette approche permet d'améliorer les résultats de prédiction, particulièrement à court terme. Nous avons enfin proposé la modélisation d'un outil de pi / The work presented in this thesis is in the area of data-driven hydrological modeling approaches. We particularly investigated their application to the difficult problem of flash flood phenomena typically observed in Caribbean watersheds. By considering flood prediction as a combinatorial optimization problem, we propose to use metaheuristics, through evolutionary algorithms, especially for their capacity to explore large search spaces efficiently and to provide good solutions in reasonable execution times. We proposed the hydrological prediction approach AV2D: Aggregate Variable Data Driven, whose central concept is the notion of aggregate variable. The underlying idea of this concept is to consider the predictive power of new variables defined as the results of statistical functions, called aggregation functions, computed on data corresponding to time periods preceding an event to predict. These variables are characterized by sets of parameters corresponding to their specifications.
We introduced hydro-meteorological aggregate variables that address the classification problem of hydrological events. We showed, through a comparative study on two typical Caribbean watersheds using several common data-driven modelling techniques, that the AV2D approach is particularly well suited to the studied context. We also study the benefits offered by modular approaches through the definition of the SM2D: Spatial Modular Data Driven approach, which consists in considering sub-processes partly defined by spatial criteria. We showed that the results obtained by AV2D on these sub-processes increase performance, particularly for short-term prediction. Finally, we proposed the modeling of a generic control tool for hydro-meteorological prediction systems, H2FCT: Hydro-meteorological Flood Forecasting Control Tool
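The notion of an aggregate variable described in the abstract above (a statistical function computed over a time period preceding the event to predict) can be sketched in Python as follows. The parameter names `window`, `lag` and `func`, the set of aggregation functions and the sample rainfall series are all assumptions for illustration; the thesis's actual parameterization and its evolutionary search over these parameters are not reproduced here.

```python
from statistics import mean

# Hypothetical catalogue of aggregation functions.
AGGREGATORS = {"mean": mean, "max": max, "min": min, "sum": sum}


def aggregate_variable(series, event_index, window, lag, func):
    """Apply `func` to the `window` samples that end `lag` steps before the event."""
    end = event_index - lag
    start = end - window
    if start < 0 or end > len(series):
        raise ValueError("window falls outside the series")
    return AGGREGATORS[func](series[start:end])


# Made-up rainfall measurements, one value per time step.
rainfall = [0.0, 1.0, 4.0, 9.0, 2.0, 0.5, 0.0]

# Maximum rainfall over the 3 steps ending 1 step before the event at index 5.
value = aggregate_variable(rainfall, event_index=5, window=3, lag=1, func="max")
```

Each choice of (`func`, `window`, `lag`) defines a different candidate predictor, which is why the parameter space grows quickly and a search heuristic becomes attractive.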
|
117 |
Empirické porovnání systémů dobývání znalostí z databází / Empirical comparison of systems for knowledge discovery in databases
Benešová, Kristýna January 2008
With the growing amount of collected and stored data, the need and interest of the owners of these data in exploiting their potential for further decision-making is also growing. New approaches and methods drawing on computer science, statistics and machine learning are therefore being developed to meet this need. The aim of this diploma thesis is to present the process of knowledge discovery in databases on the Tinnitus medical data and to introduce the LISp-Miner and Weka systems, which support this process. The theoretical part of the thesis summarizes the basic characteristics and approaches of the knowledge discovery process. The practical part is devoted to carrying out the whole process step by step. In the modeling step itself, the aforementioned academic systems LISp-Miner and Weka are used. The last section of the practical part presents the results achieved and the author's own evaluation of the systems.
|
118 |
Automatizace předzpracování dat za využití doménových znalosti / Automation of data preprocessing using domain knowledge
Beskyba, Jan January 2014
In this work we propose a solution that would help automate part of the knowledge discovery in databases process. Domain knowledge plays an important role in this automation and must be incorporated into the proposed data preparation program. The introduction covers the theoretical basis of knowledge discovery in databases, with an emphasis on domain knowledge. We then turn to the basic principles of data preprocessing and to the scripting language LMCL, which could form part of the design of a new application for automated data preparation. Finally, we deal with the design of the data preprocessing application, which is verified on data from the House of Commons.
|
119 |
Aplikace data miningu v podnikové praxi / Data mining applications in business practice
Trávníček, Petr January 2011
Over the last decades, knowledge discovery from databases, one of the disciplines of information and communication technologies, has developed into its current state and is attracting increasing interest, not only from major corporations. This diploma thesis deals with data mining, paying primary attention to its practical use in a business environment. The first objective is to review the possibilities of data mining applications and to break down implementation techniques, focusing on specific data mining methods and algorithms as well as on the adaptation of business processes. This objective is addressed in the theoretical part of the thesis, which covers the principles of data mining, the knowledge discovery from databases process, commonly used data mining methods and algorithms, and the tasks typically implemented in this domain. A further objective is to demonstrate the benefits of data mining on a model example presented in the practical part of the thesis. Besides the evaluation of the data mining models created, the practical part also contains a design of subsequent steps that would enable higher efficiency in specific areas of the given business. I consider this design, together with the characterization of the knowledge discovery in databases process, to be the most beneficial contributions of the thesis.
|
120 |
Mineração de dados aplicada à classificação do risco de evasão de discentes ingressantes em instituições federais de ensino superior
AMARAL, Marcelo Gomes do 08 July 2016
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2017-07-11T14:35:16Z
Made available in DSpace on 2017-07-11T14:35:16Z (GMT)
No. of bitstreams: 3
license_rdf: 811 bytes, checksum: e39d27027a6cc9cb039ad269a5db8e34 (MD5)
projeto_v26016.pdf: 1271790 bytes, checksum: f724d8523f2ffdb11ce599aff1eb8eb6 (MD5)
projeto_v26016.pdf: 1271790 bytes, checksum: f724d8523f2ffdb11ce599aff1eb8eb6 (MD5)
Previous issue date: 2016-07-08 / As Instituições Federais de Ensino Superior (IFES) possuem um
importante papel no desenvolvimento social e econômico do país, contribuindo
para o avanço tecnológico e científico e fomentando investimentos. Nesse
sentido, entende-se que um melhor aproveitamento dos recursos educacionais
ofertados pelas IFES contribui para a evolução da educação superior, como um
todo. Uma maneira eficaz de atender esta necessidade é analisar o perfil dos
estudantes ingressos e procurar prever, com antecedência, casos indesejáveis
de evasão que, quanto mais cedo identificados, melhor poderão ser estudados
e tratados pela administração. Neste trabalho, propõe-se a definição de uma
abordagem para aplicação de técnicas diretas de Mineração de Dados
objetivando a classificação dos discentes ingressos de acordo com o risco de
evasão que apresentam. Como prova de conceito, a análise dos aspectos
inerentes ao processo de Mineração de Dados proposto se deu por meio de
experimentações conduzidas no ambiente da Universidade Federal de
Pernambuco (UFPE). Para alguns dos algoritmos classificadores, foi possível
obter uma acurácia de classificação de 73,9%, utilizando apenas dados
socioeconômicos disponíveis quando do ingresso do discente na instituição,
sem a utilização de nenhum dado dependente do histórico acadêmico. / Brazil's Federal Institutions of Higher Education play an
important role in the social and economic development of the country,
contributing to technological and scientific advances and encouraging
investment. It follows that better use of the educational
resources offered by those institutions contributes to the evolution of higher
education as a whole. An effective way to meet this need is to analyze the
profile of incoming students and try to predict, as early as possible,
undesirable cases of dropout, which, the earlier they are identified, the better
they can be examined and addressed by the institution's administration. This work
proposes an approach for the direct application of Data Mining techniques to
classify incoming students according to their dropout risk. As a proof of concept,
the proposed Data Mining approach was evaluated through experiments
conducted at the Federal University of Pernambuco (UFPE). Some of the classification
algorithms tested achieved a classification accuracy of 73.9% using only
socioeconomic data available at the time of the student's admission to the institution,
without using any data that depends on the student's academic record.
|