1 |
Data mining system in E-health system. Zhu, Chenguang. January 2014.
No description available.
|
2 |
Predictive data mining in a collaborative editing system: the Wikipedia articles for deletion process. Ashok, Ashish Kumar. January 1900.
Master of Science / Department of Computing and Information Sciences / William H. Hsu / In this thesis, I examine the Articles for Deletion (AfD) system in Wikipedia, a large-scale collaborative editing project. Articles in Wikipedia can be nominated for deletion by registered users, who are expected to cite criteria from the Wikipedia deletion policy. For example, an article can be nominated for deletion if it contains copyright violations, vandalism, or advertising or other spam without relevant content. Articles whose subject matter does not meet the notability criteria, or any other content not suitable for an encyclopedia, are also subject to deletion.
The AfD page for an article is where Wikipedians (users of Wikipedia) discuss whether the article should be deleted. Listed articles are normally discussed for at least seven days, after which the deletion process proceeds based on community consensus. The page may then be kept, merged or redirected, transwikied (i.e., copied to another Wikimedia project), renamed/moved to another title, userfied (migrated to a user subpage), or deleted per the deletion policy. Users can vote to keep, delete, or merge the nominated article, and these votes can be viewed on the article's AfD page. However, this polling does not necessarily determine the outcome of the AfD process; in fact, Wikipedia policy specifically stipulates that a vote tally alone should not be considered a sufficient basis for a decision to delete or retain a page.
In this research, I apply machine learning methods to determine how the final outcome of an AfD process is affected by factors such as the difference between versions of an article, the number of edits, and the number of disjoint edits (according to some contiguity constraints). My goal is to predict the outcome of an AfD by analyzing the AfD page and the editing history of the article. The technical objective is to extract features from the AfD discussion and from the version history (as reflected in the edit history page) that capture factors such as those discussed above, can be tested for relevance, and provide a basis for inductive generalization over past AfDs. Applications of such feature analysis include prediction and recommendation, with the performance goal of improving the precision and recall of AfD outcome prediction.
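The performance goal above is stated in terms of precision and recall. As a minimal sketch of how those two metrics are computed for a binary AfD outcome, the Python below treats "delete" as the positive class (an assumption; the thesis may define its classes differently) and uses hypothetical labels, not data from the study:

```python
def precision_recall(y_true, y_pred, positive="delete"):
    """Precision and recall for one positive class (positive label is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical outcomes for five AfD discussions (not taken from the thesis).
actual = ["delete", "keep", "delete", "delete", "keep"]
predicted = ["delete", "delete", "keep", "delete", "keep"]
print(precision_recall(actual, predicted))
```

Here two of the three predicted deletions are correct and two of the three actual deletions are found, so both metrics come out to 2/3.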
|
3 |
Use of data mining techniques on medical data. Ρήγας, Λάμπρος. 25 May 2015.
An introduction to the data mining (knowledge discovery) process and the application of data mining techniques to patients' medical data using the Weka machine learning platform.
|
4 |
APPLICATION OF DATA MINING TO DISCOVER PATTERNS IN THE PROFILE OF STUDENTS IN THE SI-UnUCET-UEG COURSE. Del-fiaco, Ronaldo de Castro. 13 March 2012.
Previous issue date: 2012-03-13 / Data Mining (DM) is a part of the process of Knowledge Discovery in Databases. Carrying it out requires knowledge of several areas, such as computer science, statistics, management science, and the business itself. In particular, it can be applied to discover knowledge that allows an educational manager to improve the quality of the teaching-learning process in which he/she is involved. This work presents the theoretical background of data mining and describes and analyzes a case study whose main objective is to apply the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to identify the profile of students who complete the Bachelor of Information Systems course at the University Unit of Exact and Technological Sciences (UnUCET) of the State University of Goiás (UEG) at Anápolis within the minimum time foreseen by the course's pedagogical project. It describes the preparation of the data used in the process and then identifies the best analysis proposals for the case study. As input data, both the students' academic transcripts and their answers to the socioeconomic and cultural questionnaire applied to university entrance exam candidates are used as attributes for evaluating the decision tree algorithms implemented in the data mining tool WEKA. The study showed that data mining requires a professional who masters DM theory, can correctly calibrate the tools, and has extensive knowledge of the business in order to determine the data mining goals and interpret the results.
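Decision tree learners like the ones evaluated in WEKA choose each split by how much an attribute reduces class impurity. A minimal sketch of entropy and information gain follows; this is the ID3-style criterion (C4.5, which WEKA implements as J48, refines it into gain ratio), and the attributes `works` and `graduated` are hypothetical, not taken from the study:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by partitioning rows on one categorical attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical student records; attribute names are illustrative only.
students = [
    {"works": "no", "graduated": "on_time"},
    {"works": "no", "graduated": "on_time"},
    {"works": "yes", "graduated": "late"},
    {"works": "yes", "graduated": "on_time"},
]
print(information_gain(students, "works", "graduated"))
```

A tree learner would compute this score for every candidate attribute and split on the highest one, then recurse on each branch.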
|
5 |
DATA MINING APPLIED TO THE CLASSIFICATION OF ICMS TAXPAYERS OF SEFAZ-GO. Rocha, Santiago Meireles. 18 August 2017.
Previous issue date: 2017-08-18 / With the exponential increase in the volume of stored data, and the high potential of the knowledge hidden in these data to aid organizations' strategies and decision making, much has been invested in information technology and telecommunications. The purpose of this dissertation was to apply the Knowledge Discovery in Databases (KDD; DCBD in Portuguese) process to classify the ICMS taxpayers of SEFAZ-GO as High Evaders or Low Evaders, through the supervised classification data mining task, implemented by the J48 algorithm on the WEKA computing platform. Three experiments were carried out with a sample of data on ICMS taxpayers from the wholesale sector of the city of Goiânia-GO, with attributes selected from the Tax Code of the State of Goiás. During the experiments, the AttributeSelection and Discretize algorithms were applied for attribute reduction and for the transformation of continuous variables into discrete ones, respectively. The confusion matrix and the Kappa coefficient were used as validation metrics for the proposed model. After each experiment, classification rules were extracted, thus forming the proposed predictive classification model. In the best scenario, a correct classification rate of 84% was obtained. Data mining is a reality within many organizations and can be a strong ally in fulfilling the far-from-trivial task of knowledge discovery in corporate databases.
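The abstract validates the model with a confusion matrix and the Kappa coefficient. As a minimal sketch of how accuracy and Cohen's kappa both follow from a confusion matrix (rows = actual class, columns = predicted class), the counts below are hypothetical and are not the dissertation's results:

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa from a square confusion matrix (rows = actual)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement: for each class, row marginal times column marginal.
    expected = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(k)) for i in range(k)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    return observed, kappa


# Hypothetical counts: [[high->high, high->low], [low->high, low->low]]
cm = [[45, 5],
      [15, 35]]
acc, kappa = accuracy_and_kappa(cm)
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")  # accuracy=0.80 kappa=0.60
```

Kappa discounts the agreement expected by chance (0.5 here, given the marginals), which is why it is reported alongside raw accuracy.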
|
6 |
Using machine learning to classify news articles. Lagerkrants, Eleonor; Holmström, Jesper. January 2016.
In today’s society, a large portion of the world’s population gets their news on electronic devices. This opens up the possibility of enhancing their reading experience by personalizing news for readers based on their previous preferences. We conducted an experiment to find out how accurately a Naïve Bayes classifier can select articles that a user might find interesting. Our experiments were done with two users who read and classified 200 articles as interesting or not interesting. Those articles were divided into four datasets with sizes 50, 100, 150, and 200. We used a Naïve Bayes classifier with 16 different configurations of settings to classify the articles into two categories. From these experiments, we found several configurations that showed good results, and one was chosen as a good general setting for this kind of problem. We found that for datasets larger than 50 articles, there was no significant increase in classification confidence.
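To illustrate the kind of classifier used here, a minimal multinomial Naive Bayes sketch over bag-of-words tokens with add-one (Laplace) smoothing follows. The thesis does not state which Naive Bayes variant or which 16 configurations it used, so the tokenization, smoothing, class names, and toy articles are all assumptions:

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = log(n_label / n_docs)          # log prior
            for tok in doc.lower().split():
                if tok in self.vocab:              # ignore tokens never seen in training
                    count = self.word_counts[label][tok]
                    score += log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training articles and labels (not the thesis's data).
articles = [
    "football match result",
    "election results parliament",
    "football transfer news",
    "parliament budget vote",
]
labels = ["interesting", "not_interesting", "interesting", "not_interesting"]
model = NaiveBayes().fit(articles, labels)
print(model.predict("football news"))    # interesting
print(model.predict("parliament vote"))  # not_interesting
```

Working in log space avoids underflow when many small word probabilities are multiplied together.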
|
7 |
Open-Source Machine Learning: R Meets Weka. Hornik, Kurt; Buchta, Christian; Zeileis, Achim. January 2007.
Two of the prime open-source environments available for machine/statistical learning in data mining and knowledge discovery are the software packages Weka and R, which have emerged from the machine learning and statistics communities, respectively. To make the different sets of tools from both environments available in a single unified system, the R package RWeka is proposed, which interfaces Weka's functionality to R. With only a thin layer of (mostly R) code, a set of general interface generators is provided that can set up interface functions with the usual "R look and feel", re-using Weka's standardized interface of learner classes (including classifiers, clusterers, associators, filters, loaders, savers, and stemmers) with associated methods. / Series: Research Report Series / Department of Statistics and Mathematics
|
8 |
Benchmarking Open-Source Tree Learners in R/RWeka. Schauerhuber, Michael; Zeileis, Achim; Meyer, David; Hornik, Kurt. January 2007.
The two most popular classification tree algorithms in machine learning and statistics - C4.5 and CART - are compared in a benchmark experiment together with two other more recent constant-fit tree learners from the statistics literature (QUEST, conditional inference trees). The study assesses both misclassification error and model complexity on bootstrap replications of 18 different benchmark datasets. It is carried out in the R system for statistical computing, made possible by means of the RWeka package which interfaces R to the open-source machine learning toolbox Weka. Both algorithms are found to be competitive in terms of misclassification error - with the performance difference clearly varying across data sets. However, C4.5 tends to grow larger and thus more complex trees. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
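The benchmark design estimates misclassification error over bootstrap replications. A minimal sketch of that idea follows, assuming out-of-bag evaluation (each model is trained on a bootstrap sample and tested on the observations not drawn); the one-split "stump" learner and the toy data are hypothetical stand-ins for C4.5 or CART and the 18 benchmark datasets, chosen only to keep the example short:

```python
import random


def fit_stump(xs, ys):
    """Hypothetical depth-1 tree on one numeric feature: threshold with fewest errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bootstrap_error(xs, ys, replications=50, seed=0):
    """Mean out-of-bag misclassification error over bootstrap replications."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(replications):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        oob = set(range(n)) - set(idx)
        if not oob:
            continue  # every observation was drawn; nothing left to test on
        model = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        errors.append(sum(model(xs[i]) != ys[i] for i in oob) / len(oob))
    return sum(errors) / len(errors) if errors else float("nan")


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(bootstrap_error(xs, ys))
```

Even on this perfectly separable toy data the out-of-bag error is usually above zero, because a bootstrap sample can miss the points near the class boundary; averaging over replications is what makes the estimate stable.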
|
9 |
Fitness Function for a Subscriber. Podapati, Sasidhar. January 2017.
Mobile communication has become a vital part of modern communication. With the rise in mobile phone usage, the cost of network infrastructure has become a deciding factor. Subscriber mobility patterns have a major effect on the load of radio cells in the network, so analyzing subscriber mobility data is a top priority. This work aims to classify the entire dataset provided by Telenor into two main groups, “Infrastructure Stressing” and “Infrastructure Friendly”, with respect to their impact on the mobile network. The research also aims to predict the behavior of a new subscriber based on their MOSAIC group. A heuristic method is formulated to characterize subscribers into three different segments based on their mobility. Tetris Optimization is used to reveal the “Infrastructure Stressing” subscribers in the mobile network. All experiments were conducted on the subscriber trajectory data provided by the telecom operator. The results reveal that 5 percent of the subscribers in the entire dataset are “Infrastructure Stressing”. A classification model is developed and evaluated, using the WEKA machine learning tool, to label a new subscriber as friendly or stressing. Naïve Bayes, k-nearest neighbor, and J48 decision tree are the classification algorithms used to train the model and to find the relations between features in the labeled subscriber dataset.
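Of the three classifiers named, k-nearest neighbor is the simplest to sketch. The example below labels a new subscriber by majority vote among the k closest labeled subscribers; the two features (distinct cells visited per day, handovers per hour) and all values are hypothetical, not the Telenor data:

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features per subscriber: (distinct cells per day, handovers per hour).
subscribers = [
    ((2, 0.5), "friendly"),
    ((3, 0.8), "friendly"),
    ((4, 1.0), "friendly"),
    ((25, 9.0), "stressing"),
    ((30, 12.0), "stressing"),
]
print(knn_predict(subscribers, (5, 1.2)))    # friendly
print(knn_predict(subscribers, (28, 10.0)))  # stressing
```

In practice the features would usually be scaled first, since Euclidean distance is dominated by the attribute with the largest numeric range.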
|
10 |
A Database For Exploratory Analysis of Human Sleep. Misra, Shivin Satyawon. 26 March 2008.
This thesis focuses on the design, development, and exploratory analysis of a human sleep data repository. We have successfully collected comprehensive data for 1,046 sleep disorder patients and created a terabyte-scale database system to handle it. The data for each patient were collected from the patient's medical records and from the patient's all-night sleep study (about 0.6 gigabytes per patient in total). Data collected from the medical record contain more than 70 attributes, including demographic data; smoking, drinking, and exercise habits; depression and daytime sleepiness questionnaires; and overall medical history. Data collected from the all-night sleep study consist of 50-55 time-series signals recorded over a period of 6-8 hours at the hospital's sleep clinic. These signals include, among others, an electroencephalogram, electromyogram, electrooculogram, electrocardiogram, and signals tracking blood oxygen level, body position, limb movements, snoring, and blood pressure. An additional 350 attributes summarize sleep-related events taking place during the night-long study, including sleep stages, arousals, and respiratory disturbances. During the development of our database system, particular attention was paid to a design that effectively handles the data size and complexity, describes the structure of sleep data in clinically meaningful terms, and facilitates the discovery of patterns in sleep data using machine learning algorithms. We have interfaced our database with Weka, a well-known data mining system. To the best of our knowledge, our database is one of the world's largest and most comprehensive in the domain of human sleep disorders.
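The thesis does not say how its database-to-Weka interface was built. One common route into Weka is its ARFF text format, so the sketch below serializes rows to ARFF under that assumption; the relation name, attribute names, and values are hypothetical, and names containing spaces would additionally need quoting:

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to Weka's ARFF text format.

    attributes: list of (name, type), where type is "NUMERIC" or a list of
    nominal values. Missing values (None) are written as '?'.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical patient attributes; not the thesis's schema.
attrs = [("age", "NUMERIC"), ("smoker", ["yes", "no"]), ("arousal_index", "NUMERIC")]
rows = [(54, "yes", 21.3), (61, "no", None)]
print(to_arff("sleep_patients", attrs, rows))
```

A file written this way can be opened in the Weka Explorer or loaded programmatically, with `?` marking a missing value.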
|