  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Uma abordagem temporal para identificação precoce de estudantes de graduação a distância com risco de evasão utilizando técnicas de mineração de dados / A temporal approach for early identification of distance-learning undergraduate students at risk of dropout using data mining techniques

Santos, Ramon Nóbrega dos 29 May 2015 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Through the use of data mining techniques, most commonly classification algorithms, it is possible to build predictive models able to identify students at risk of dropout early. Several studies have used data obtained from a Virtual Learning Environment (VLE) to build models that predict performance in a single course discipline. However, no study had been carried out to develop a dropout-prediction model for longer-duration distance graduation courses that integrates VLE-based performance prediction, enabling early prediction during the first semester and throughout the following ones. This work therefore proposes a dropout-identification approach for distance graduation courses that uses rule-based classification to first identify the disciplines and grade thresholds with the greatest influence on dropout, so that VLE performance models can be used to detect students at risk of dropout throughout the whole course. Experiments were carried out with four rule-based classification algorithms: JRip, OneR, PART and Ridor. The temporal approach proved advantageous: accuracy improved over the semesters, and important rules for early identification of at-risk students were discovered. Among the algorithms, JRip and PART obtained the best predictive results, with an average accuracy of 81% at the end of the first semester. Furthermore, with the proposed partition methodology, in which the attributes of the predictive models are applied incrementally, it was possible to discover rules potentially useful for dropout prevention.
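As an illustration of the kind of model the abstract describes, here is a minimal ordered rule list in Python. The discipline names, grade thresholds, and rule format are hypothetical; the thesis used Weka's JRip, OneR, PART and Ridor learners, not this code.

```python
# Minimal ordered rule list, in the spirit of rule-based classifiers
# such as JRip/PART: each rule tests a (hypothetical) discipline grade
# against a threshold; the first matching rule fires, with a default
# class if none matches.

RULES = [
    # (discipline, comparator, threshold, predicted_class)
    ("calculus_1", "<", 4.0, "dropout"),
    ("intro_prog", "<", 5.0, "dropout"),
]
DEFAULT = "retained"

def predict(student):
    """Apply the rules in order; return the class of the first match."""
    for discipline, op, threshold, label in RULES:
        grade = student.get(discipline)
        if grade is None:
            continue  # attribute not yet available (later semester)
        if op == "<" and grade < threshold:
            return label
    return DEFAULT

print(predict({"calculus_1": 3.2, "intro_prog": 7.0}))  # dropout
print(predict({"calculus_1": 6.5, "intro_prog": 8.0}))  # retained
```

Skipping missing attributes mirrors the incremental, per-semester application of attributes described in the abstract: early in the course, only a few grades exist and only the rules over those grades can fire.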
52

Analysis and classification of spatial cognition using non-linear analysis and artificial neural networks / Análise e classificação da capacidade cognitiva espacial utilizando técnicas de análise não-linear e redes neurais artificiais

Maron, Guilherme January 2014 (has links)
The main objective of this work is to propose, develop, test, and present a method for classifying the degree of development of spatial cognition in different individuals. Thirty-seven undergraduate students had their electroencephalograms (EEG) recorded while engaged in mental rotation tasks with 3-D images. Their degree of spatial cognition development was evaluated using the BPR-5 psychological test. The Largest Lyapunov Exponent (LLE) was calculated from each of the 8 channels recorded in each EEG. The LLEs were then used as input tuples for five different classifiers: i) multi-layer perceptron, ii) radial basis function (RBF) artificial neural network, iii) voted perceptron, iv) support vector machines, and v) k-nearest neighbors. The best result was achieved by an RBF network with 4 clusters and the Puk kernel function. A statistical analysis of brain-activity differences, based on the calculated LLEs, was also performed between the two groups of interest, SI+ (participants with a presumably higher degree of spatial cognition development) and SI- (control group), during the mental rotation tasks. An average difference of 16% was found between the groups. The proposed classification method can contribute to, and interact with, other processes in the analysis and study of human spatial cognition, as well as in the understanding of human intelligence as a whole. A better understanding and evaluation of an individual's cognitive capabilities could reveal elements of motivation, ease, or natural inclination, possibly affecting life and career decisions positively.
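The classification step on LLE feature vectors can be sketched with a plain k-nearest-neighbors classifier, one of the five classifiers the abstract lists. The feature values and labels below are invented for illustration; the thesis's best results came from the RBF network, not k-NN.

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` (a feature vector, e.g. per-channel LLEs)
    by majority vote among the k nearest training examples."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical 2-channel LLE vectors for the two groups.
train = [
    ((0.12, 0.10), "SI+"), ((0.14, 0.11), "SI+"), ((0.13, 0.12), "SI+"),
    ((0.22, 0.20), "SI-"), ((0.21, 0.19), "SI-"), ((0.23, 0.21), "SI-"),
]
print(knn_predict(train, (0.13, 0.11)))  # SI+
```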
54

Avaliação do uso de classificadores para verificação de atendimento a critérios de seleção em programas sociais / Evaluation of the use of classifiers to verify compliance with selection criteria in social programs

Santos, Cinara de Jesus 07 March 2017 (has links)
Classifiers are group separators that organize data by grouping elements with similar traits, allowing pattern recognition and the identification of elements that do not fit. Classification procedures appear in everyday processes such as clinical or imaging exams, automatic grain separators in agribusiness, probability identifiers, character recognition, and biometric identification by fingerprint, iris, face, etc. This study uses a database of the Ministry of Social Development and Fight against Hunger (MDS) containing information on beneficiaries of the Bolsa Família Program (PBF). The data describe the home environment, the level of education of household residents, their use of public health services, and financial information (family income and expenses). The focus of this study is not to evaluate the PBF but to analyze the behavior of classifiers applied to social-program databases, since these have certain peculiarities. On the variables that describe a family as a PBF beneficiary or not, we tested three classification algorithms: logistic regression, binary decision trees, and multi-layer artificial neural networks. Their performance was measured by metrics computed from the so-called confusion matrix. Since the error and hit rates of one class are not the complement of those of the other class, it is of the utmost importance that both classes be correctly identified. A satisfactory performance for both classes in the same scenario was not achieved: identification of the minority group showed low efficiency even after resampling followed by reapplication of the three chosen classification processes, which points to the need for further experiments.
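The per-class view of the confusion matrix that the abstract emphasizes can be sketched as follows. The labels and counts are illustrative, not the thesis's data; the example shows how a majority-class predictor scores high accuracy while missing the minority class entirely.

```python
def confusion_metrics(y_true, y_pred, positive):
    """Per-class recall from a binary confusion matrix; with imbalanced
    classes, high accuracy can hide a minority-class recall of zero."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)   # recall of the positive (minority) class
    specificity = tn / (tn + fp)   # recall of the negative (majority) class
    accuracy = (tp + tn) / len(y_true)
    return accuracy, sensitivity, specificity

# 8 non-beneficiaries, 2 beneficiaries; classifier predicts majority class.
y_true = ["no"] * 8 + ["yes"] * 2
y_pred = ["no"] * 10
acc, sens, spec = confusion_metrics(y_true, y_pred, positive="yes")
print(acc, sens, spec)  # 0.8 0.0 1.0
```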
55

Využití hyperspektrálních dat ke klasifikaci vegetace alpínského bezlesí v Krkonoších / Hyperspectral data for classification of alpine treeless vegetation in the Krkonoše Mts.

Andrštová, Martina January 2014 (has links)
ABSTRACT The Master's thesis is part of the HyMountEcos project, which deals with a complex evaluation of mountain ecosystems in the Giant Mountains (Krkonoše) National Park using hyperspectral data. The area of interest is the alpine treeless zone of the park. The main goal was to create a detailed methodology for classifying vegetation cover from hyperspectral data acquired by the AISA DUAL and APEX sensors, to find a classification method that improves accuracy over results reported in the literature, and to compare the accuracy achieved with the two data types. Several classification algorithms (Spectral Angle Mapper, Linear Spectral Unmixing, Support Vector Machine, MESMA and Neural Net) were applied, and the classification results were statistically evaluated and compared in the next part of the work. The Neural Net method was found to be the most accurate, giving the best results both for APEX data (overall accuracy 96%, Kappa coefficient 0.95) and for AISA DUAL data (overall accuracy 90%, Kappa coefficient 0.88). The resulting accuracy of the classification (overall and also for some classes) reached...
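The Spectral Angle Mapper mentioned among the tested algorithms can be sketched in a few lines of Python. The reference spectra below are invented toy values; the thesis worked with real AISA DUAL and APEX imagery and class spectra.

```python
import math

def spectral_angle(a, b):
    """Angle (radians) between two spectra; a small angle means a
    similar spectral shape, independent of overall brightness."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def sam_classify(pixel, references):
    """Assign the pixel to the reference class with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Hypothetical 3-band reference spectra.
refs = {"grass": (0.05, 0.08, 0.40), "rock": (0.20, 0.22, 0.25)}
print(sam_classify((0.10, 0.16, 0.80), refs))  # grass (same shape, 2x brighter)
```

Brightness invariance is the reason SAM is popular for mountain terrain: the test pixel above is exactly twice as bright as the grass reference, yet the angle between them is zero.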
56

Detecting Signal Corruptions in Voice Recordings for Speech Therapy / Igenkänning av Signalproblem i Röstinspelningar för Logopedi

Nylén, Helmer January 2021 (has links)
When recording voice samples from a patient in speech therapy, the quality of the recording may be affected by various signal corruptions, for example background noise or clipping. The equipment and expertise required to identify small disturbances are not always present at smaller clinics. This study therefore investigates machine learning algorithms to automatically detect selected corruptions in speech signals, including infrasound and random muting. Five algorithms are analyzed: kernel-substitution-based Support Vector Machine, Convolutional Neural Network, Long Short-term Memory (LSTM), Gaussian Mixture Model based Hidden Markov Model, and Generative Model based Hidden Markov Model. A tool to generate datasets of corrupted recordings is developed to test the algorithms in both single-label and multi-label settings. Mel-frequency cepstral coefficients are used as the main features. For each type of corruption, different ways to increase classification accuracy are tested, for example using a Voice Activity Detector to filter out less relevant parts of the recording, changing the feature parameters, or using an ensemble of classifiers. The experiments show that a machine learning approach is feasible for this problem, as a balanced accuracy of at least 75% is reached on all tested corruptions. While the single-label study gave mixed results, with no algorithm clearly outperforming the others, in the multi-label case the LSTM generally performs better than the other algorithms; notably, it achieves over 95% balanced accuracy on both white noise and infrasound. As the algorithms are trained only on spoken English phrases, the usability of the tool in its current state is limited, but the experiments are easily extended to other types of audio recordings, corruptions, features, or classification algorithms.
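The Voice Activity Detector used to filter out less relevant parts of a recording can be approximated by a simple frame-energy threshold. This is a crude stand-in for illustration only; the frame length, threshold, and signal below are invented, and the thesis's experiments used a real VAD together with MFCC features.

```python
def energy_vad(samples, frame_len=4, threshold=0.01):
    """Return per-frame speech/non-speech flags from mean squared energy."""
    flags = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        flags.append(energy > threshold)
    return flags

# Silence, then a loud burst, then near-silence (3 frames of 4 samples).
signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.4, 0.6, -0.5, 0.001, 0.0, -0.001, 0.0]
print(energy_vad(signal))  # [False, True, False]
```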
57

Learning Algorithms Using Chance-Constrained Programs

Jagarlapudi, Saketha Nath 07 1900 (has links)
This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constrained approaches lead to improved algorithms for three important learning problems: classification with specified error rates, large-dataset classification, and Ordinal Regression (OR). Using moments of the training data, the CCPs are posed as Second Order Cone Programs (SOCPs), and novel iterative algorithms for solving the resulting SOCPs are derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to moment estimation errors. A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ a chance-constraint for each class, implying that the actual misclassification rates do not exceed the specified ones. The formulation is applied to the case of biased classification. The problems of large-dataset classification and ordinal regression are addressed by formulations that employ chance-constraints for clusters in the training data rather than for each data point. Since the number of clusters can be substantially smaller than the number of data points, the resulting formulations are small, with few inequalities, and hence scale well to large datasets. The scalable classification and OR formulations are extended to feature spaces, and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this structure, fast iterative solvers that outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve speedups as high as 10000 times when the specialized SOCP solvers are employed. The proposed formulations involve second order moments of the data and hence are susceptible to moment estimation errors; a generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for moments are derived, and it is shown that when either confidence set is employed, the robust formulations also yield SOCPs.
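The central reduction the abstract relies on, turning a moment-based chance constraint into a second-order cone constraint, is the standard multivariate one-sided Chebyshev bound. The sketch below follows the general literature on moment-robust chance constraints and is not copied from the thesis:

```latex
% For a random vector $x$ with mean $\mu$ and covariance $\Sigma$,
% requiring
%   $\Pr(w^{\top} x + b \ge 0) \ge \eta$
% to hold for every distribution with these first two moments is
% equivalent to the second-order cone constraint
w^{\top}\mu + b \;\ge\; \kappa \,\lVert \Sigma^{1/2} w \rVert_{2},
\qquad \kappa = \sqrt{\tfrac{\eta}{1-\eta}} .
```

Because the left side is linear in $(w, b)$ and the right side is a norm of a linear map of $w$, the constraint defines a second-order cone, which is why the resulting learning problems are SOCPs.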
58

Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification

Ranganath, B N 09 1900 (has links)
Data mining seeks valid, novel, potentially useful, and ultimately understandable abstractions in data. Frequent itemset mining is one of the important data mining approaches for finding those abstractions in the form of patterns. Frequent closed itemsets provide complete and condensed information for generating non-redundant association rules. For many applications, mining all frequent itemsets is unnecessary and mining frequent closed itemsets is adequate. Compared to frequent itemset mining, frequent closed itemset mining generates fewer itemsets and therefore improves the efficiency and effectiveness of these tasks. Much recent research on closed itemset mining targets traditional databases, where multiple scans are needed and, whenever new transactions arrive, additional scans must be performed on the updated transaction database; such methods are therefore unsuitable for data stream mining. Mining frequent itemsets from data streams has many potential and broad applications; emerging stream applications that require association rule mining include network traffic monitoring and web click-stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volume, and with changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. Recent work on data stream mining based on the sliding window method slides the window by one transaction at a time; when the window is large and the support threshold is low, existing methods consume significant time and greatly increase user response time. In our first work, we propose a novel algorithm, Stream-Close, based on the sliding window model, to mine frequent closed itemsets from data streams within the current sliding window.
We enhance the scalability of the algorithm with several optimization techniques, such as sliding the window by multiple transactions at a time and novel pruning techniques that considerably reduce the number of candidate itemsets examined for closure checking. Our experimental studies show that the proposed algorithm scales well with large data sets. Still, the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemset mining infeasible in many applications, since users cannot interpret the large volume of output (which can exceed the data itself when the support threshold is low), and it may require extra applications that post-process the output of the original algorithm to reduce its size. Recent work on clustering of itemsets considers strictly either expression (the items present in an itemset) or support of the itemsets, or partially both, to reduce the number of itemsets; the drawback of these approaches is that in some situations the number of itemsets does not decrease, due to this restricted view. We therefore propose a new notion of frequent itemsets, called clustered itemsets, which considers both expression and support in summarizing the output. We introduce a new distance measure with respect to expressions and prove the problem of mining clustered itemsets to be NP-hard. In our second work, we propose a deterministic locality-sensitive-hashing-based classifier using clustered itemsets. Locality sensitive hashing (LSH) is a technique for efficiently finding a nearest neighbour in high-dimensional data sets.
The idea of locality sensitive hashing is to hash the points using several hash functions so that, for each function, the probability of collision is much higher for objects that are close to each other than for those that are far apart. We propose an LSH-based approximate nearest neighbour classification strategy. A problem with LSH is that it chooses hash functions randomly, and evaluating a large number of hash functions can increase query time. From a classification point of view, since LSH chooses randomly from a family of hash functions, the buckets may contain points belonging to other classes, which may hurt classification accuracy. To overcome these problems, we propose class-association-rule-based hash functions, which ensure that the buckets corresponding to the class association rules contain points from the same class. However, associative classification involves generating and examining a large number of candidate class association rules, so we use the clustered itemsets, which reduce the number of class association rules to be examined. We also establish a formal connection between the clustering parameter (delta, used in the generation of clustered frequent itemsets) and discriminative measures such as information gain. Our experimental studies show that the proposed method achieves higher accuracy than the LSH-based near-neighbour classification strategy.
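The defining property of a closed itemset, that no proper superset has the same support, can be checked directly in Python. This brute-force sketch is for illustration only (exponential in the number of items); the toy transactions are invented, and the thesis's Stream-Close algorithm uses pruning and sliding windows instead.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

def frequent_closed(transactions, minsup):
    """Brute-force frequent closed itemsets: keep a frequent itemset
    only if every one-item extension has strictly lower support."""
    items = set().union(*transactions)
    closed = []
    for r in range(1, len(items) + 1):
        for combo in combinations(sorted(items), r):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup < minsup:
                continue
            if all(support(s | {x}, transactions) < sup for x in items - s):
                closed.append((s, sup))
    return closed

T = [frozenset("ab"), frozenset("abc"), frozenset("ac")]
for s, sup in frequent_closed(T, minsup=2):
    print(sorted(s), sup)  # ['a'] 3, ['a', 'b'] 2, ['a', 'c'] 2
```

Note how {b} (support 2) is frequent but not closed, because its superset {a, b} has the same support; this is the redundancy that closed itemset mining eliminates.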
59

Análise de similaridade entre classes e padrões de ativação neuronal. / Analysis of similarity between classes and patterns of neuronal activation.

SARAIVA, Eugênio de Carvalho. 04 April 2018 (has links)
Submitted by Johnny Rodrigues (johnnyrodrigues@ufcg.edu.br) on 2018-04-04T21:48:36Z No. of bitstreams: 1 EUGÊNIO DE CARVALHO SARAIVA - DISSERTAÇÃO PPGCC 2014..pdf: 2813039 bytes, checksum: 9b76f48c8df4aee95923a8ce5f0385ce (MD5) / Made available in DSpace on 2018-04-04T21:48:36Z (GMT). No. of bitstreams: 1 EUGÊNIO DE CARVALHO SARAIVA - DISSERTAÇÃO PPGCC 2014..pdf: 2813039 bytes, checksum: 9b76f48c8df4aee95923a8ce5f0385ce (MD5) Previous issue date: 2014-07-30 / A growing number of technologies use classification algorithms to automate tasks. In Neuroscience in particular, classification algorithms have been used to test hypotheses about the functioning of the central nervous system. However, the relationship between classes of neuronal activation patterns in specific brain areas resulting from sensory experiences has received little attention. In the context of Computational Neuroscience, this work presents an analysis of the level of similarity between classes of neuronal activation patterns, using unsupervised and semi-supervised learning approaches, in specific brain areas of rats in contact with objects, recorded during an experiment in which the animals freely explored the objects. The classes were defined according to treatments built from specific levels of a set of 8 factors (Animal, Brain Region, Object or Pair of Objects, Clustering Algorithm, Metric, Bin, Window, and Contact Interval). In total, 327,680 treatments were analyzed. Hypotheses were formulated about the relationship of each factor to the level of similarity between treatments, and were verified through statistical tests between the distributions representing each class: normality tests (Shapiro-Wilk, QQ-plot), analysis of variance, and a test for differences in central tendency (Kruskal-Wallis).
Based on the results of the studies using the unsupervised approach, it was inferred that the processes of acquiring the activation patterns and of defining them by an observer were subject to a non-significant amount of noise from uncontrollable causes. Under the semi-supervised approach, it was observed that not all degrees of similarity between pairs of object classes are equal for a given treatment, which indicates that the similarity between classes of neuronal activation patterns is sensitive to all the factors analyzed and provides evidence of the complexity of neuronal coding.
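The final step of the testing pipeline described above, comparing the central tendency of several treatment distributions, uses the Kruskal-Wallis test. The following is an illustrative sketch of the Kruskal-Wallis H statistic in plain Python; it is not the author's code, and the sample data are invented for demonstration.

```python
# Illustrative sketch: Kruskal-Wallis H statistic, the non-parametric test
# used in the study to compare central tendency across treatment classes.
# Not the author's code; the sample groups below are invented toy data.

def average_ranks(values):
    """Rank the pooled observations, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0  # positions i..j correspond to 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Clearly separated groups yield a large H; identical groups yield H ≈ 0.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6]))  # ≈ 3.857 (= 27/7)
print(kruskal_wallis_h([1, 2, 3], [1, 2, 3]))  # ≈ 0.0
```

In practice the H statistic is compared against a chi-squared distribution with k − 1 degrees of freedom to obtain a p-value; a library routine such as `scipy.stats.kruskal` also applies a tie correction omitted here for brevity.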
60

Multi-label Classification with Multiple Label Correlation Orders And Structures

Posinasetty, Anusha January 2016 (has links) (PDF)
Multi-label classification has attracted much interest in recent times due to the wide applicability of the problem and the challenges involved in learning a classifier for multi-labeled data. A crucial aspect of multi-label classification is to discover the structure and order of correlations among labels and their effect on the quality of the classifier. In this work, we propose a structural Support Vector Machine (structural SVM) based framework which enables us to systematically investigate the importance of label correlations in multi-label classification. The proposed framework is very flexible, provides a unified approach to handling multiple correlation orders and structures in an adaptive manner, and helps to effectively assess the importance of label correlations in improving generalization performance. We perform extensive empirical evaluation on several datasets from different domains and present results on various performance metrics. Our experiments provide, for the first time, interesting insights into the following questions: a) Are label correlations always beneficial in multi-label classification? b) What effect do label correlations have on the multiple performance metrics typically used in multi-label classification? c) Is label correlation order significant, and if so, what would be the favorable correlation order for a given dataset and a given performance metric? d) Can we make useful suggestions on the label correlation structure?
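A preliminary step toward the questions raised above is simply measuring how strongly labels co-occur. The sketch below (not the thesis framework) estimates second-order label correlation via the phi coefficient over a binary label matrix; the matrix `Y` is invented toy data, and higher-order structures would require analogous statistics over label tuples.

```python
# Illustrative sketch: second-order (pairwise) label correlation in a
# multi-label dataset, measured with the phi coefficient. Not the thesis
# framework; the label matrix Y below is invented toy data.
import math
from itertools import combinations

def phi_coefficient(col_a, col_b):
    """Phi coefficient between two binary label columns (Pearson r for 0/1 data)."""
    n = len(col_a)
    n11 = sum(1 for a, b in zip(col_a, col_b) if a and b)  # both labels present
    n1 = sum(col_a)
    m1 = sum(col_b)
    denom = math.sqrt(n1 * (n - n1) * m1 * (n - m1))
    if denom == 0:  # a constant label carries no measurable correlation
        return 0.0
    return (n * n11 - n1 * m1) / denom

def pairwise_label_correlations(Y):
    """Map each label pair (i, j) to its phi coefficient over the dataset Y."""
    num_labels = len(Y[0])
    cols = [[row[k] for row in Y] for k in range(num_labels)]
    return {(i, j): phi_coefficient(cols[i], cols[j])
            for i, j in combinations(range(num_labels), 2)}

# Toy multi-label matrix: rows = instances, columns = binary labels.
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
for pair, phi in sorted(pairwise_label_correlations(Y).items()):
    print(pair, round(phi, 3))  # labels 0 and 1 always co-occur: phi = 1.0
```

Pairs with near-zero phi suggest labels that can be predicted independently, while strongly correlated pairs are the candidates for joint modelling in a correlation-aware classifier.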
