  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Contributions to Computational Methods for Association Extraction from Biomedical Data: Applications to Text Mining and In Silico Toxicology

Raies, Arwa B. 29 November 2018 (has links)
The task of association extraction involves identifying links between different entities. Here, we make contributions to two applications related to the biomedical field. The first application is in the domain of text mining and aims at extracting associations between methylated genes and diseases from biomedical literature. Gathering such associations can benefit disease diagnosis and treatment decisions. We developed the DDMGD database to provide a comprehensive repository of information related to genes methylated in diseases, gene expression, and disease progression. Using DEMGD, a text mining system that we developed, together with additional post-processing, we extracted ~100,000 such associations from free text. The accuracy of the extracted associations is 82%, as estimated on 2,500 hand-curated entries. The second application is in the domain of computational toxicology, which aims at identifying relationships between chemical compounds and toxicity effects. Identifying toxicity effects of chemicals is a necessary step in many processes, including drug design. To extract these associations, we propose using multi-label classification (MLC) methods. These methods have not undergone comprehensive benchmarking in the domain of predictive toxicology, which could help in identifying guidelines for overcoming their existing deficiencies. Therefore, we performed extensive benchmarking and analysis of ~19,000 MLC models. We demonstrated variability in the performance of these models under several conditions and determined the best-performing model, which achieves an accuracy of 91% on an independent testing set. Finally, we propose a novel framework, LDR (learning from dense regions), for developing MLC and multi-target regression (MTR) models from datasets with missing labels. The framework is generic, so it can be applied to predict associations between samples and discrete or continuous labels.
Our assessment shows that LDR performed better than the baseline approach (i.e., the binary relevance algorithm) when evaluated using four MLC and five MTR datasets. LDR achieved accuracy scores of up to 97% using testing MLC datasets, and R2 scores up to 88% for testing MTR datasets. Additionally, we developed a novel method for minority oversampling to tackle the problem of imbalanced MLC datasets. Our method improved the precision score of LDR by 10%.
22

An Improved Classifier Chain Ensemble for Multi-Dimensional Classification with Conditional Dependence

Heydorn, Joseph Ethan 01 July 2015 (has links) (PDF)
We focus on multi-dimensional classification (MDC) problems with conditional dependence, which we call multiple output dependence (MOD) problems. MDC is the task of predicting a vector of categorical outputs for each input. Conditional dependence in MDC means that the choice for one output value affects the choice for others, so it is not desirable to predict outputs independently. We show that conditional dependence in MDC implies that a single input can map to multiple correct output vectors. This means it is desirable to find multiple correct output vectors per input. Current solutions for MOD problems are not sufficient because they predict only one of the correct output vectors per input, ignoring all others. We modify four existing MDC solutions, including chain classifiers, to predict multiple output vectors. We further create a novel ensemble technique named weighted output vector ensemble (WOVE), which combines these multiple predictions from multiple chain classifiers in a way that preserves the integrity of output vectors and thus preserves conditional dependence among outputs. We verify the effectiveness of WOVE by comparing it against seven other solutions on a variety of data sets and find that it shows significant gains over existing methods.
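The point about independent prediction can be illustrated with a small sketch. It is not WOVE or any method from the thesis; the conditional distribution below is invented purely for illustration, and the function names are hypothetical. It shows how picking each output separately can yield a vector that is less probable than the best complete vectors, and why returning several complete vectors per input can be useful.

```python
# Hypothetical conditional distribution P(y1, y2 | x) for one input x.
# The probabilities are made up for illustration; they are not from the thesis.
joint = {
    (0, 0): 0.40,
    (1, 0): 0.25,
    (1, 1): 0.35,
    (0, 1): 0.00,
}


def marginal(joint, index, value):
    """Probability that output `index` equals `value`, summed over the other outputs."""
    return sum(p for y, p in joint.items() if y[index] == value)


def predict_independently(joint):
    """Pick the most probable value of each binary output separately."""
    n_outputs = len(next(iter(joint)))
    return tuple(
        max((0, 1), key=lambda v: marginal(joint, i, v)) for i in range(n_outputs)
    )


def predict_jointly(joint, k=1):
    """Return the k most probable complete output vectors."""
    ranked = sorted(joint.items(), key=lambda item: item[1], reverse=True)
    return [y for y, _ in ranked[:k]]


independent = predict_independently(joint)
top_vectors = predict_jointly(joint, k=2)
print("independent prediction:", independent, "with probability", joint[independent])
print("two most probable vectors:", top_vectors)
```

In this toy case the independent prediction combines the individually likeliest values into a vector that is less probable than either of the two best complete vectors, which is the situation conditional dependence creates.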
23

Estudos in silico com alcaloides oriundos de produtos naturais / In silico studies with alkaloids derived from natural products

Lorenzo, Vitor Prates 26 February 2016 (has links)
The use of plants for medicinal purposes is one of humanity's oldest forms of medical practice. Alkaloids stand out among plant compounds for their rich structural variety and broad pharmacological properties. Computer-aided drug design relies on ligand-based or target-based strategies. When developing new compounds, structure-based techniques such as docking can be applied to study a given receptor and its ligand by evaluating ligand-protein interactions. Ligand-based methods instead use a database of known ligands to evaluate parameters (molecular descriptors) that can assist in developing more potent compounds. This study performed in silico studies to investigate drug-target interactions of alkaloids derived from natural products, and of their analogues, with relevant pharmacological activity. Different molecular descriptors and methodologies were used. Chapter 2 evaluates the interaction of the bisindole alkaloid caulerpine (CLP), together with a database of 109 analogues, with monoamine oxidase B (MAO-B), an enzyme involved in Alzheimer's disease (AD). A chemical pattern for inhibition was observed among the CLP analogues: substitution of the radicals should be asymmetric, with differing polarity. The ligand-based and structure-based studies, combined with the drug-like classification, suggest that the CLP chemical skeleton has potential use in the treatment of AD. In chapter 3, 8 alkaloids isolated from Cissampelos sympodialis and 101 derivatives had their inhibitory potential against enzymes involved in degenerative diseases (BACE, GSK-3β and MAO-A) assessed by in silico methods. Consensus analysis showed affinity of bisbenzylisoquinoline alkaloids for BACE, including the natural alkaloids roraimine and simpodialine-β-N-oxide, supporting interest in investigating this skeleton as an antagonist of this enzyme. Chapter 4 evaluates the multi-target potential of 148 aporphine alkaloids from Annonaceae against Leishmania donovani. Six enzymes of this neglected disease were used in the theoretical study, which was combined with experimental data available for four alkaloids in the database, all with pIC50 values below 5.26. The alkaloid xyloguyelline was identified as a potential multi-target agent, showing activity against 5 of the 6 enzymes evaluated, with a probability of activity above 60%. In chapter 5, fragment descriptors were used to build a ligand-based model, in parallel with molecular docking, to predict the cytotoxic and anti-topoisomerase II activity of azaphenanthrene alkaloids. The cytotoxic activity of this alkaloid skeleton is well described in the literature, with several molecules active against tumor cell lines. The analogues IMB 6 and IMB 23 showed interesting activity and selectivity, with MolDock energy similar to that of liriodenine, a compound characterized by potent antitumor action but high toxicity. Nuclear magnetic resonance (NMR) spectroscopy provides important structural information, and chapter 6 discusses the importance of this technique for generating molecular descriptors. Studies that successfully applied NMR descriptors in computer-aided drug design are described, along with several QSAR and QSPR studies based on chemical shift data.
24

Systèmes de recommandation pour la publicité en ligne / Recommendation systems for online advertising

Sidana, Sumit 08 November 2018 (has links)
This thesis is dedicated to the study of recommendation systems for implicit feedback (clicks), mostly using learning-to-rank and neural-network-based approaches. In this line, we derive a novel neural network model that jointly learns a new representation of users and items in an embedded space, as well as the preference relation of users over pairs of items, and we give a theoretical analysis showing that the model is learnable in the sense of the empirical risk minimization principle. The model also performs well against other state-of-the-art models on several collections. In addition, we contribute to the creation of two novel, publicly available collections for recommendation that record the behavior of customers of two European leaders in eCommerce advertising, Kelkoo (https://www.kelkoo.com/) and Purch (http://www.purch.com/). Both datasets gather implicit feedback from users, in the form of clicks, along with a rich set of contextual features regarding both customers and offers. Purch's dataset is affected by popularity bias. We therefore propose a simple yet effective strategy to overcome this bias while designing an efficient and scalable recommendation algorithm, by introducing diversity based on an appropriate representation of items. This collection also contains contextual information about offers in the form of text. We use this textual information in novel time-aware topic models and show that using topics as contextual information in Factorization Machines improves performance. In this vein, and in conjunction with a detailed description of the datasets, we report the performance of six state-of-the-art recommender models. Keywords: recommendation systems, data sets, learning-to-rank, neural networks, popularity bias, diverse recommendations, contextual information, topic models.
25

Multi-camera Computer Vision for Object Tracking: A comparative study

Turesson, Eric January 2021 (has links)
Background: Video surveillance is a growing area that can help deter crime, support investigations, and gather statistics, to name a few of its uses. Its efficiency could be increased by introducing tracking, and more specifically tracking between cameras in a network. Automating this process could reduce the need for humans to monitor and review footage, since the system can track targets and inform the relevant people on its own. This has a wide array of applications, such as forensic investigation, crime alerting, and tracking down people who have disappeared. Objectives: First, we investigate the common setup of real-time multi-target multi-camera tracking (MTMCT) systems. Next, we investigate how the components of an MTMCT system affect each other and the complete system. Lastly, we examine how image enhancement affects MTMCT. Methods: To achieve our objectives, we conducted a systematic literature review to gather information. Using this information, we implemented an MTMCT system and evaluated its components to see how they interact in the complete system. Lastly, we implemented two image enhancement techniques to see how they affect MTMCT. Results: Most often, an MTMCT system is built from a detector that discovers objects, a tracker that follows the objects within a single camera, and a re-identification method that ensures objects keep the same ID across cameras. The components have a considerable effect on each other and can both sabotage and improve one another. For example, the quality of the bounding boxes affects the data that re-identification can extract. The image enhancement we used did not introduce any significant improvement. Conclusions: The most common structure for MTMCT is detection, tracking, and re-identification.
Our findings show that all the components affect each other, but re-identification is the one most affected by the other components and by image enhancement. The two tested image enhancement techniques did not introduce enough improvement, but other image enhancement techniques could be used to make MTMCT perform better. The MTMCT system we constructed did not reach real-time performance.
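The re-identification step of the detection, tracking and re-identification structure can be sketched as follows. This is an illustration only, not the system built in the thesis: it assumes each single-camera track has already been summarized as an appearance embedding, and the function names, the cosine-similarity matching rule and the 0.8 threshold are all hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_global_ids(gallery, new_tracks, threshold=0.8):
    """Match tracks from one camera against identities already seen elsewhere.

    gallery:    dict mapping global ID -> embedding (updated in place)
    new_tracks: dict mapping local track ID -> embedding
    threshold:  assumed minimum similarity for two tracks to share an identity
    Returns a dict mapping local track ID -> global ID.
    """
    assignments = {}
    next_id = max(gallery, default=-1) + 1
    for track_id, embedding in new_tracks.items():
        best_id, best_score = None, threshold
        for global_id, known in gallery.items():
            score = cosine_similarity(embedding, known)
            if score >= best_score:
                best_id, best_score = global_id, score
        if best_id is None:
            # No sufficiently similar identity: register a new one.
            best_id = next_id
            gallery[best_id] = embedding
            next_id += 1
        assignments[track_id] = best_id
    return assignments


# Made-up embeddings: two identities seen by camera A, two tracks from camera B.
gallery = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
camera_b_tracks = {"b1": [0.9, 0.1, 0.0], "b2": [0.0, 0.0, 1.0]}
assignments = assign_global_ids(gallery, camera_b_tracks)
print(assignments)
```

Because the embeddings are computed from the detector's bounding boxes, a poorly cropped box changes the embedding and can push the similarity below the threshold, which is one way the components can degrade each other.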
26

Received radiation dose assessment for nuclear plants personnel by video-based surveillance

Jorge, Carlos Alexandre Fructuoso July 2015 (has links)
This work proposes the development of a system to evaluate the radiation dose received by nuclear plant personnel. The system is conceived to complement the existing approaches to radiological protection, thus offering redundancy, which is desirable for the operation of critical plants. The proposed system must operate independently of the actions performed by the operators under evaluation. It was therefore decided to base it on methods used for video surveillance. The nuclear plant used as an example is the Argonauta Nuclear Research Reactor, belonging to the Instituto de Engenharia Nuclear, Comissão Nacional de Energia Nuclear (Nuclear Engineering Institute, National Nuclear Energy Commission). During this thesis research, both radiation dose rate distribution and video databases were obtained. Methods available in the literature for target detection and/or tracking were evaluated on this database. From these results, a new system was proposed to meet the requirements of this particular application. Given the tracked positions of each worker, the radiation dose each one receives during task execution is estimated and may serve as part of a decision support system.
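The last step, estimating dose from tracked positions, can be sketched as a sum of dose rate multiplied by time spent at each position. This is a minimal illustration under assumptions the abstract does not state: a dose-rate map stored as a regular grid, rates in µSv/h, positions in metres, and a piecewise-constant rate between consecutive track samples. The function name and all numbers are hypothetical.

```python
def accumulate_dose(track, dose_rate_map, cell_size):
    """Estimate the dose received along one worker's track.

    track:         list of (time_s, x_m, y_m) samples sorted by time
    dose_rate_map: 2D list indexed [row][col], assumed dose rates in µSv/h
    cell_size:     assumed side length of one grid cell, in metres
    Returns the accumulated dose in µSv.
    """
    total = 0.0
    for (t0, x0, y0), (t1, _, _) in zip(track, track[1:]):
        # Use the dose rate of the cell occupied at the start of the interval.
        row = int(y0 // cell_size)
        col = int(x0 // cell_size)
        hours = (t1 - t0) / 3600.0
        total += dose_rate_map[row][col] * hours
    return total


# Made-up 2 x 2 map and a track that spends 30 minutes in each of two cells.
dose_rate_map = [
    [1.0, 2.0],
    [4.0, 8.0],
]
track = [(0, 0.5, 0.5), (1800, 1.5, 0.5), (3600, 1.5, 1.5)]
dose = accumulate_dose(track, dose_rate_map, cell_size=1.0)
print(f"estimated dose: {dose:.2f} µSv")
```

The final sample only closes the last interval, so the accuracy of such an estimate depends on how densely the tracker reports positions relative to how quickly the dose rate changes across the room.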
27

Designing multi-target salesforce incentive contract

HUANG, Wenxin 07 September 2015 (has links)
Multi-target incentive contracts are widely observed in practice as a way to stimulate salesforce effort. However, little is known about their effectiveness and the issues involved in designing them. In this thesis, we investigate the incentive contracting problem between a manufacturer and an agent when the realized sales of a product are affected by both the agent's selling effort and the agent's type. The agent's type is uncertain to the manufacturer, whereas the agent observes her actual type when exerting her selling effort, which is likewise unobservable by the manufacturer. For the contract design problem, we develop a principal-agent model with both moral hazard and adverse selection. We examine the manufacturer's optimal contract parameter decisions when employing a single multi-target contract for an agent who can be of different types. Because menu contracts are commonly studied in the literature on the adverse selection problem, we also study a menu of single-target contracts and examine the manufacturer's optimal contract parameter decisions. We then compare the performance of the two types of contract. We arrive at a number of managerial insights regarding the design and performance of the multi-target contract and the menu contract.
28

Apprentissage en ligne de signatures audiovisuelles pour la reconnaissance et le suivi de personnes au sein d'un réseau de capteurs ambiants / Online learning of audiovisual signatures for people recognition and tracking within a network of ambient sensors

Decroix, François-Xavier 20 December 2017 (has links)
The neOCampus operation, started in 2013 by Paul Sabatier University in Toulouse, aims to create a connected, innovative, intelligent and sustainable campus by drawing on the skills of 11 laboratories and several industrial partners. These multidisciplinary skills are combined to improve the daily comfort of campus users (students, teachers, administrative staff) and to reduce the ecological footprint of the campus. The intelligence we want to bring to the campus of the future requires giving its buildings a perception of their internal activity. Indeed, optimizing energy resources requires characterizing users' activities so that the building can adapt to them automatically. Since human activity is open to multiple levels of interpretation, our work focuses on extracting people's movements, its most elementary component. Characterizing users' activities in terms of movement relies on data extracted from cameras and microphones distributed in a room, which form a sparse network of heterogeneous sensors. From these data, we seek to extract an audiovisual signature and a rough localization of the people passing through this sensor network. While preserving individual privacy, the signature must be discriminative, to distinguish one person from another, and compact, to reduce processing time and allow the building to adapt itself. Given these constraints, the characteristics we model are the timbre of the speaker's voice and the appearance of their clothing in terms of color distribution. The scientific contributions of this thesis thus lie at the intersection of the speech processing and computer vision communities, introducing methods for fusing audio and visual signatures of individuals. To achieve this fusion, new sound source localization cues and an audiovisual adaptation of a multi-target tracking method were introduced; these are the main contributions of this work. The thesis is structured in 4 chapters. The first presents the state of the art in visual person re-identification and speaker recognition. Because the acoustic and visual modalities are not correlated, two signatures, one for video and one for audio, are computed separately using existing methods from the literature; chapter 2 details their computation. The fusion of these signatures is then treated as a problem of matching audio and video observations whose corresponding detections are spatially coherent and compatible, for which two novel association strategies are introduced in chapter 3. The spatio-temporal coherence of the audio and visual observations is then addressed in chapter 4, in a multi-target tracking context.
29

Novel Support Vector Machines for Diverse Learning Paradigms

Melki, Gabriella A 01 January 2018 (has links)
This dissertation introduces novel support vector machines (SVM) for the following traditional and non-traditional learning paradigms: Online classification, Multi-Target Regression, Multiple-Instance classification, and Data Stream classification. Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets' correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model's prediction performance while reducing computational complexity. Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. This contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative each; these representatives constitute the instances that contribute most to the model. Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criterion and eliminates the need for a regularization term. It instead uses early stopping.
Because of these characteristics, OLLAWV was shown to efficiently produce sparse models while maintaining competitive accuracy. OLLAWV's online nature and success in traditional classification inspired its implementation, as well as that of its predecessor, OnLine Learning Algorithm - List 2 (OLLA-L2), under the batch data stream classification setting. These two algorithms were chosen because their properties are a natural remedy for the time and memory constraints that arise in the data stream problem. OLLA-L2's low spatial complexity addresses the memory constraints imposed by the data stream setting, and OLLAWV's fast run time, early self-stopping capability, and ability to produce sparse models agree with both memory and time constraints. The preliminary results for OLLAWV showed superior performance to its predecessor, so it was chosen for the final set of experiments against current popular data stream methods. Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and more scalable solutions than the methods compared, making them competitive in their respective fields.
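The idea of ordering targets by their correlations before chaining can be sketched as follows. The abstract does not say how the maximum correlation chain is formed, so the ordering rule here (sort targets by the sum of their absolute Pearson correlations with all other targets) is an assumption made for illustration, and the function names and data are hypothetical. No SVR model is trained; the sketch only produces a chain order.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def correlation_chain_order(targets):
    """Order target names by decreasing total absolute correlation.

    targets: dict mapping target name -> list of observed values
    Assumed rule (not confirmed by the source): targets that are most
    correlated with the rest come first in the chain.
    """
    names = list(targets)
    strength = {
        a: sum(abs(pearson(targets[a], targets[b])) for b in names if b != a)
        for a in names
    }
    return sorted(names, key=lambda name: strength[name], reverse=True)


# Made-up target columns for four training samples.
targets = {
    "t1": [1, 2, 3, 4],
    "t2": [2, 4, 6, 9],
    "t3": [5, 1, 4, 2],
}
order = correlation_chain_order(targets)
print(order)
```

In a chained regressor, each model in this order would receive the predictions of the earlier targets as extra input features, so placing strongly correlated targets early lets later models benefit from them.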
30

Redes de regras de associação filtradas e multialvo / Filtered and multi-target association rules networks

Calçada, Dario Brito 21 March 2019 (has links)
The discovery of Association Rules is a data mining task that seeks to identify patterns in datasets and, once those patterns are interpreted, to yield specific knowledge about the problem under analysis. Association Rules Mining can be used as a methodology for discovering hypotheses or candidate theories in a knowledge domain. However, the mining process generates a large number of rules, exceeding the user's capacity to explore them. This can make the analysis impracticable and can negatively affect the outcome of some knowledge extraction algorithms. Several approaches have therefore been proposed to guide the user in exploring the discovered Association Rules, especially using Network structures, which make it possible to analyze the relations between rules. In this context, this work was motivated by the potential use of Networks to optimize knowledge identification in Association Rules Mining processes by formulating explainable approaches. Another motivation arises from the gap regarding the use of Networks in multi-target tasks, which are inherent to several real-world applications. This work was intended to advance research on Association Rules Mining with Networks with respect to methods for generating validatable hypotheses with one or two target items, in terms of both the interpretability and the expressiveness of the representations built. A Systematic Mapping of the literature was carried out to establish the state of the art on how Networks can assist Association Rules Mining processes. This work proposes and develops a method for selecting and evaluating the minimum support and confidence measures used in extracting Association Rules, based on Network Centrality Measures; its main contribution is an objective criterion for the extraction of Association Rules. Two new Networks were also proposed, developed and validated: the Filtered Association Rules Networks (Filtered-ARNs) and the Multi-Target Association Rules Networks (MTARNs). They had a positive impact on knowledge identification through mathematical proof of the influence between the elements of an Association Rule, and they extended the capacity for knowledge extraction in studies of multi-target applications.
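The support and confidence measures that the abstract refers to have standard definitions, which the following sketch computes on a toy transaction set. It does not implement Filtered-ARNs, MTARNs, or the centrality-based threshold selection described in the thesis; the transactions and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

A mining run keeps only rules whose support and confidence exceed chosen minimums, so the number of rules a user must explore depends directly on how those two thresholds are set.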
