  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Learning lost temporal fuzzy association rules

Matthews, Stephen January 2012
Fuzzy association rule mining discovers patterns in transactions, such as shopping baskets in a supermarket, or Web page accesses by a visitor to a Web site. Temporal patterns can be present in fuzzy association rules because the underlying process generating the data can be dynamic. However, existing solutions may not discover all interesting patterns because of a previously unrecognised problem that is revealed in this thesis. The contextual meaning of fuzzy association rules changes because of the dynamic feature of data. The static fuzzy representation and traditional search method are inadequate. The Genetic Iterative Temporal Fuzzy Association Rule Mining (GITFARM) framework solves the problem by utilising flexible fuzzy representations from a fuzzy rule-based system (FRBS). The combination of temporal, fuzzy and itemset space was simultaneously searched with a genetic algorithm (GA) to overcome the problem. The framework transforms the dataset to a graph for efficiently searching the dataset. A choice of model in fuzzy representation provides a trade-off in usage between an approximate and descriptive model. A method for verifying the solution to the hypothesised problem was presented. The proposed GA-based solution was compared with a traditional approach that uses an exhaustive search method. It was shown how the GA-based solution discovered rules that the traditional approach did not. This shows that simultaneously searching for rules and membership functions with a GA is a suitable solution for mining temporal fuzzy association rules. So, in practice, more knowledge can be discovered for making well-informed decisions that would otherwise be lost with a traditional approach.
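To make the notion of a fuzzy association rule concrete, the sketch below computes the fuzzy support of an itemset over a handful of baskets. It is only an illustration and not the GITFARM framework: the triangular membership function, the use of the minimum to combine membership degrees, and the names `triangular`, `fuzzy_support`, `baskets` and `medium` are all assumptions introduced here.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(transactions, itemset):
    """Mean, over all transactions, of the smallest membership degree in the itemset.

    `itemset` maps an item name to a membership function of its quantity.
    Items missing from a transaction are treated as quantity 0.
    """
    total = 0.0
    for transaction in transactions:
        degrees = [mf(transaction.get(item, 0.0)) for item, mf in itemset.items()]
        total += min(degrees)  # assumption: minimum used to combine degrees
    return total / len(transactions)


# Hypothetical quantities bought per shopping basket.
baskets = [
    {"bread": 2, "milk": 3},
    {"bread": 4, "milk": 1},
    {"bread": 3, "milk": 3},
    {"milk": 2},
]


def medium(quantity):
    """Hypothetical fuzzy set 'medium quantity'."""
    return triangular(quantity, 1, 3, 5)


support = fuzzy_support(baskets, {"bread": medium, "milk": medium})
print(support)
```

Because the support depends on where the membership function is placed, a fixed definition of "medium" can hide rules that only hold in some time periods, which is the kind of loss the abstract describes.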
2

Využití data miningových metod při zpracování dat z demografických šetření / Using data mining methods for demographic survey data processing

Fišer, David January 2015
Abstract: The goal of the thesis was to describe and demonstrate the principles of knowledge discovery in databases, also known as data mining (DM). The theoretical part of the thesis describes selected data mining methods and the basic principles of those DM techniques. In the second part of the thesis, a DM task is carried out in accordance with the CRISP-DM methodology. The practical part is divided into two parts and uses data from the American Community Survey (ACS). The first part contains a classification task whose goal was to determine whether the selected DM techniques can be used to handle missing data in surveys. The success rate of the classifications and of the subsequent prediction of data values in selected attributes was in the 55-80% range. The second part focused on discovering knowledge of interest using association rules and the GUHA method. Keywords: data mining, knowledge discovery in databases, statistical surveys, missing values, classification, association rules, GUHA method, ACS
3

On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas

Oliveira, Saullo Haniell Galvão de, 1988- 27 August 2018
Advisor: Fernando José Von Zuben / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Previous issue date: 2015 / Abstract: Biclustering involves the simultaneous clustering of objects and their attributes, thus defining local models for the two-way relationship between objects and attributes. Just like clustering, biclustering has a broad set of applications, ranging from support for recommender systems to gene expression data analysis. Initially, heuristics were proposed to find biclusters in numerical datasets; their main drawbacks are the possibility of missing some existing biclusters and the inability to maximize the volume of the biclusters obtained. More recently, efficient algorithms were conceived to enumerate all the biclusters, particularly in numerical datasets, so that they compose a complete set of maximal and non-redundant biclusters. However, the ability to enumerate biclusters revealed a challenging scenario: in noisy datasets, each true bicluster fragments into many biclusters with a high degree of overlap, which prevents a direct analysis of the results. Fragmentation happens regardless of the condition adopted to specify the internal coherence of valid biclusters, though the degree of fragmentation is associated with the noise level. Aiming to reverse the fragmentation, we propose two approaches for aggregating a set of biclusters that exhibit a high degree of overlap: one based on hierarchical clustering with single linkage, and the other directly exploring the overlap rate.
A pruning step is then employed to filter out intruder objects and/or attributes that were added as a side effect of aggregation. Both proposals were compared with each other and with the state of the art in several experiments, including real and artificial datasets. The two new aggregation mechanisms not only significantly reduced the number of biclusters, essentially defragmenting the true biclusters, but also consistently increased the quality of the whole solution, measured in terms of precision and recall when the true biclusters are known a priori / Master's / Computer Engineering / Master in Electrical Engineering
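The aggregation idea can be sketched as follows: biclusters that are linked by a chain of sufficiently large pairwise overlaps are merged into one, in the spirit of single linkage. This is a simplified illustration and not the dissertation's algorithm. The overlap measure (shared cells divided by the area of the smaller bicluster), the `threshold` parameter and the function names are assumptions, biclusters are assumed to be non-empty, and the pruning step is omitted.

```python
from itertools import combinations


def overlap_rate(b1, b2):
    """Shared cells divided by the area of the smaller bicluster (assumed measure)."""
    (rows1, cols1), (rows2, cols2) = b1, b2
    shared = len(rows1 & rows2) * len(cols1 & cols2)
    return shared / min(len(rows1) * len(cols1), len(rows2) * len(cols2))


def aggregate(biclusters, threshold=0.5):
    """Merge biclusters connected by chains of pairwise overlap >= threshold."""
    parent = list(range(len(biclusters)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(biclusters)), 2):
        if overlap_rate(biclusters[i], biclusters[j]) >= threshold:
            parent[find(i)] = find(j)

    # Each connected group becomes the union of its rows and of its columns.
    merged = {}
    for i, (rows, cols) in enumerate(biclusters):
        group_rows, group_cols = merged.setdefault(find(i), (set(), set()))
        group_rows |= rows
        group_cols |= cols
    return [(frozenset(r), frozenset(c)) for r, c in merged.values()]


# Hypothetical fragments: the first two overlap heavily, the third is separate.
fragments = [
    ({0, 1, 2}, {0, 1}),
    ({1, 2, 3}, {0, 1}),
    ({7, 8}, {4, 5}),
]
aggregated = aggregate(fragments)
print(aggregated)
```

Taking the union of rows and columns can pull in cells that belong to neither fragment, which is why the abstract follows aggregation with a pruning step.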
4

RECOMMENDATION SYSTEMS IN SOCIAL NETWORKS

Behafarid Mohammad Jafari (15348268) 18 May 2023
<p> The dramatic improvement in information and communication technology (ICT) has driven an evolution in learning management systems (LMSs). The rapid growth of LMSs has caused users to demand more advanced, automated, and intelligent services. CourseNetworking is a next-generation LMS that adopts machine learning to add personalization, gamification, and more dynamics to the system. This work proposes two recommender systems that can help improve CourseNetworking services. The first is a social recommender system that helps CourseNetworking track user interests and give more relevant recommendations. Recently, graph neural network (GNN) techniques have been employed in social recommender systems due to their high success in graph representation learning, including on social network graphs. Despite the rapid advances in recommender system performance, dealing with the dynamic property of social network data is one of the key challenges that remains to be addressed. In this research, a novel method is presented that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph, supplementing the graph with time span nodes that are used to define users' long-term and short-term preferences over time. The second service proposed for addition to Rumi services is a hashtag recommendation system that can help users label their posts quickly, resulting in improved searchability of content. In recent years, several hashtag recommendation methods have been proposed and developed to speed up the processing of texts and quickly find the critical phrases. These methods use different approaches and techniques to obtain critical information from a large amount of data. This work investigates the efficiency of unsupervised keyword extraction methods for hashtag recommendation and recommends the one with the best performance for use in a hashtag recommender system. </p>
5

Models and Representation Learning Mechanisms for Graph Data

Susheel Suresh (14228138) 15 December 2022
<p>Graph representation learning (GRL) has been increasingly used to model and understand data from a wide variety of complex systems spanning social, technological, bio-chemical and physical domains. GRL consists of two main components: (1) a parametrized encoder that provides representations of graph data and (2) a learning process to train the encoder parameters. Designing flexible encoders that capture the underlying invariances and characteristics of graph data is crucial to the success of GRL. On the other hand, the learning process drives the quality of the encoder representations, and developing principled learning mechanisms is vital for a number of growing applications in self-supervised, transfer and federated learning settings. To this end, we propose a suite of models and learning algorithms for GRL which form the two main thrusts of this dissertation.</p> <p><br></p> <p>In Thrust I, we propose two novel encoders which build upon a widely popular GRL encoder class called graph neural networks (GNNs). First, we empirically study the prediction performance of current GNN-based encoders when applied to graphs with heterogeneous node mixing patterns using our proposed notion of local assortativity. We find that GNN performance in node prediction tasks strongly correlates with our local assortativity metric, thereby introducing a limit. We propose to transform the input graph into a computation graph with proximity and structural information as distinct types of edges. We then propose a novel GNN-based encoder that operates on this computation graph and adaptively chooses between structure and proximity information. Empirically, adopting our transformation and encoder framework leads to improved node classification performance compared to baselines in real-world graphs that exhibit diverse mixing.</p> <p>Secondly, we study the trade-off between expressivity and efficiency of GNNs when applied to temporal graphs for the task of link ranking. 
We develop an encoder that incorporates a labeling approach designed to allow for efficient inference over the candidate set jointly, while provably boosting expressivity. We also propose to optimize a list-wise loss for improved ranking. With extensive evaluation on real-world temporal graphs, we demonstrate its improved performance and efficiency compared to baselines.</p> <p><br></p> <p>In Thrust II, we propose two principled encoder learning mechanisms for challenging and realistic graph data settings. First, we consider a scenario where only limited or even no labelled data is available for GRL. Recent research has converged on graph contrastive learning (GCL), where GNNs are trained to maximize the correspondence between representations of the same graph in its different augmented forms. However, we find that GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and provide sub-par performance in downstream tasks. We then propose a novel principle, termed adversarial-GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during the training by optimizing adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL by comparing with state-of-the-art GCL methods and achieve performance gains in semi-supervised, unsupervised and transfer learning settings using benchmark chemical and biological molecule datasets. </p> <p>Secondly, we consider a scenario where graph data is silo-ed across clients for GRL. We focus on two unique challenges encountered when applying distributed training to GRL: (i) client task heterogeneity and (ii) label scarcity. 
We propose a novel learning framework called federated self-supervised graph learning (FedSGL), which first utilizes a self-supervised objective to train GNNs in a federated fashion across clients; each client then fine-tunes the obtained GNNs based on its local task and available labels. Our framework enables the federated GNN model to extract patterns from the common feature (attribute and graph topology) space without the need for labels and without being biased by heterogeneous local tasks. An extensive empirical study of FedSGL on both node and graph classification tasks yields fruitful insights into how the level of feature / task heterogeneity, the adopted federated algorithm and the level of label scarcity affect the clients’ performance in their tasks.</p>
6

Node Centric Community Detection and Evolutional Prediction in Dynamic Networks

Oluwafolake A Ayano (13161288) 27 July 2022
<p>  </p> <p>Advances in technology have led to the availability of data from different platforms such as the web and social media. Much of this data can be represented as a network consisting of a set of nodes connected by edges. The nodes represent the items in the network, while the edges represent the interactions between the nodes. Community detection methods have been used extensively in analyzing these networks. However, community detection in evolving networks has been a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network’s history and cannot provide real-time information about the communities in the network.</p> <p>Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. </p> <p>To process such large networks efficiently and in a timely manner, there is a need for an adaptive analytical method that can process large networks without recomputing the entire network after it evolves and that treats all the edges involved with a node equally. </p> <p>We propose a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, which avoids recomputing the entire network from scratch and consequently achieves a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks. </p>
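A node-centric update can be illustrated with a very small sketch: when a node arrives, all of its edges are considered together and the existing community assignment is reused rather than recomputed. The majority-vote rule, the function name `add_node` and the integer community labels are assumptions made for illustration. The abstract does not describe the thesis's actual update rule.

```python
from collections import Counter


def add_node(communities, node, neighbours):
    """Assign `node` to the community that holds most of its neighbours.

    `communities` maps node -> integer community label and is updated in place.
    All edges of the new node vote at once, instead of one edge at a time.
    """
    votes = Counter(communities[n] for n in neighbours if n in communities)
    if votes:
        label = votes.most_common(1)[0][0]
    else:
        # No known neighbours: open a new community with an unused label.
        label = max(communities.values(), default=-1) + 1
    communities[node] = label
    return label


# Hypothetical existing community structure.
communities = {"a": 0, "b": 0, "c": 1}
label_d = add_node(communities, "d", ["a", "b", "c"])  # two neighbours in 0, one in 1
label_e = add_node(communities, "e", [])               # isolated newcomer
print(communities)
```

Only the new node's neighbourhood is inspected, so the cost of an update depends on the node's degree rather than on the size of the whole network.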
7

Skin lesion detection using deep learning

Rajit Chandra (12495442) 03 May 2022
<p>Skin lesions can be deadly if not detected early, and early detection can save many lives. Artificial intelligence and machine learning are helping healthcare in many ways, including the diagnosis of skin lesions. Computer-aided diagnosis helps clinicians detect cancer. This study was conducted to classify seven classes of skin lesion using convolutional neural networks. Two pre-trained models, DenseNet and Inception-v3, were employed, and accuracy, precision, recall, F1 score and ROC-AUC were calculated for every class prediction. Moreover, gradient class activation maps were used to help clinicians determine which regions of an image influence the model to make a certain decision. These visualizations are used for explainability of the model. Experiments showed that DenseNet performed better than Inception-v3. It was also noted that gradient class activation maps highlighted different regions when predicting the same class. The main contribution was to introduce visualizations into the lesion classification model that help clinicians understand the model's decisions, which will enhance the reliability of the model. In addition, different optimizers were employed with both models to compare the accuracies.</p>
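The per-class precision, recall and F1 score mentioned above can be computed from true and predicted labels as in the following sketch. The function name and the labels `class_a`, `class_b` and `class_c` are hypothetical placeholders. The data is made up and does not come from the thesis.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Made-up labels for five images.
y_true = ["class_a", "class_a", "class_b", "class_c", "class_b"]
y_pred = ["class_a", "class_b", "class_b", "class_c", "class_a"]
metrics = per_class_metrics(y_true, y_pred, ["class_a", "class_b", "class_c"])
for label, (precision, recall, f1) in metrics.items():
    print(label, precision, recall, f1)
```

Reporting the metrics per class matters when classes are imbalanced, because overall accuracy can look high while a rare class is almost never detected.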
8

Rewiring Police Officer Training Networks to Reduce Forecasted Use of Force

Ritika Pandey (9147281) 30 August 2023
<p><br></p> <p>Police use of force has become a topic of significant concern, particularly given its disparate impact on communities of color. Research has shown that officer-involved shootings, misconduct and excessive use of force complaints exhibit network effects: officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. Given that use of force and misconduct behavior appear to be transmissible across police networks, we ask whether police networks can be altered to reduce use of force and misconduct events in a limited scope.</p> <p><br></p> <p>In this work, we analyze a novel dataset from the Indianapolis Metropolitan Police Department on officer field training, subsequent use of force, and the role of network effects from field training officers. We construct a network survival model for analyzing time-to-event of use of force incidents involving new police trainees. The model includes network effects of the diffusion of risk from field training officers (FTOs) to trainees. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. The results show that FTO use of force history is the best predictor of a trainee's time to use of force in the survival model, and that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 8%. </p> <p>We then discuss the potential benefits and challenges associated with implementing such an algorithm in practice.</p> <p><br></p>
9

Learning From Data Across Domains: Enhancing Human and Machine Understanding of Data From the Wild

Sean Michael Kulinski (17593182) 13 December 2023
<p dir="ltr">Data is collected everywhere in our world; however, it often is noisy and incomplete. Different sources of data may have different characteristics, quality levels, or come from dynamic and diverse environments. This poses challenges for both humans who want to gain insights from data and machines which are learning patterns from data. How can we leverage the diversity of data across domains to enhance our understanding and decision-making? In this thesis, we address this question by proposing novel methods and applications that use multiple domains as more holistic sources of information for both human and machine learning tasks. For example, to help human operators understand environmental dynamics, we show the detection and localization of distribution shifts to problematic features, as well as how interpretable distributional mappings can be used to explain the differences between shifted distributions. For robustifying machine learning, we propose a causal-inspired method to find latent factors that are robust to environmental changes and can be used for counterfactual generation or domain-independent training; we propose a domain generalization framework that allows for fast and scalable models that are robust to distribution shift; and we introduce a new dataset based on human matches in StarCraft II that exhibits complex and shifting multi-agent behaviors. We showcase our methods across various domains such as healthcare, natural language processing (NLP), computer vision (CV), etc. to demonstrate that learning from data across domains can lead to more faithful representations of data and its generating environments for both humans and machines.</p>
10

TEMPORAL EVENT MODELING OF SOCIAL HARM WITH HIGH DIMENSIONAL AND LATENT COVARIATES

Xueying Liu (13118850) 09 September 2022
<p>    </p> <p>The counting process is fundamental to many real-world problems involving event data. The Poisson process, which serves as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, can be fitted to temporal event data, spatio-temporal event data, and event data with covariates. We study a Hawkes process fitted to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both concern the occurrence of events over time. We fit a Cox model to temporal event data accompanied by a large corpus that is processed into high-dimensional covariates, and we study the significant features that influence the intensity of events. </p>
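The self-exciting behaviour of a Hawkes process can be seen in its conditional intensity: a constant background rate plus a contribution from each past event that decays over time. The sketch below assumes an exponential kernel, lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)), which is a common textbook choice. The thesis describes a semi-parametric approach, so this kernel and the parameter names `mu`, `alpha` and `beta` are assumptions for illustration only.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t with an exponential kernel (assumed).

    mu    : background (Poisson) rate
    alpha : jump in intensity caused by each past event
    beta  : decay rate of that excitation
    Only events strictly before t contribute.
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return mu + excitation


# Hypothetical event times and parameters.
events = [1.0, 1.5, 4.0]
before_first = hawkes_intensity(0.5, events, mu=0.2, alpha=0.8, beta=1.0)
after_burst = hawkes_intensity(1.6, events, mu=0.2, alpha=0.8, beta=1.0)
print(before_first, after_burst)
```

Before any event the intensity equals the background rate, and shortly after a cluster of events it is raised, which is how the model captures events that trigger further events.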
