21

Organização flexível de documentos / Flexible organization of documents

Tatiane Nogueira Rios 25 March 2013 (has links)
Several methods have been developed to organize the growing number of textual documents. Such methods frequently use clustering algorithms to place documents on the same subject into the same cluster, under the assumption that documents in a cluster have similar content. However, there are situations in which documents belonging to different clusters also share characteristics. To overcome this drawback, it is necessary to develop methods that permit a soft organization of documents, i.e., methods that allow documents to be assigned to different clusters with different degrees of compatibility. Fuzzy clustering of textual documents is well suited to this kind of organization, since fuzzy clustering algorithms allow the same document to be compatible with more than one cluster. Although fuzzy clustering algorithms that enable a soft organization of documents have been developed, such organizations are usually evaluated only in terms of clustering performance. Yet clusters should have descriptors that adequately identify the topics they represent; in general, cluster descriptors have been extracted using some heuristic over a small set of documents, yielding only a superficial assessment of cluster meaning. Proper extraction and evaluation of cluster descriptors matters because descriptors are the terms that represent the collection and identify the topics covered by the documents. Therefore, in applications where fuzzy clustering is used for the soft organization of documents, an adequate description of the obtained clusters is as important as a good clustering, since in this setting the same descriptor may indicate the content of more than one cluster. This need motivated the thesis, whose goal was to investigate and develop methods for extracting descriptors of fuzzy clusters for the soft organization of documents. To this end, the following methods were developed: i) SoftO-FDCL (Soft Organization - Fuzzy Description Comes Last), in which descriptors of flat fuzzy clusters are extracted after the fuzzy clustering process, identifying the topics of the soft organization regardless of the fuzzy clustering algorithm used; ii) SoftO-wFDCL (Soft Organization - weighted Fuzzy Description Comes Last), in which descriptors of flat fuzzy clusters are also extracted after the fuzzy clustering process, using the membership degrees of the documents in each cluster, obtained from the fuzzy clustering, as a weighting factor for the candidate descriptor terms; iii) HSoftO-FDCL (Hierarchical Soft Organization - Fuzzy Description Comes Last), in which descriptors of hierarchical fuzzy clusters are extracted after the hierarchical fuzzy clustering process, identifying the topics of a soft hierarchical organization of documents. The thesis also presents an application of the SoftO-FDCL method to documents from the Canadian continuing medical education program, demonstrating the utility and applicability of the soft organization of documents in a real-world scenario.
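To make the descriptor-extraction idea concrete, the sketch below scores candidate terms for each fuzzy cluster by weighting document term vectors with their membership degrees and keeping the top-ranked terms. It is only an illustration of the general membership-weighted approach described above, not the thesis's SoftO-wFDCL procedure; the term-document matrix, membership matrix, and function name are assumptions.

```python
import numpy as np

def cluster_descriptors(tfidf, memberships, terms, top_k=5):
    """Pick candidate descriptor terms for each fuzzy cluster.

    tfidf       : (n_docs, n_terms) term weights for the collection
    memberships : (n_docs, n_clusters) membership degrees produced by any
                  fuzzy clustering algorithm (each row sums to 1)
    terms       : list of n_terms vocabulary strings
    Returns a list with the top_k candidate descriptors per cluster.
    """
    # Weight each document's term vector by its membership in the cluster,
    # so documents that belong more strongly contribute more to the score.
    scores = memberships.T @ tfidf          # shape (n_clusters, n_terms)
    descriptors = []
    for c in range(scores.shape[0]):
        top = np.argsort(scores[c])[::-1][:top_k]
        descriptors.append([terms[t] for t in top])
    return descriptors
```

Because the scoring is membership-weighted rather than based on a hard partition, the same term can naturally surface as a descriptor of more than one cluster, which is exactly the situation the abstract highlights.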
22

Exploratory Study of Fuzzy Clustering and Set-Distance Based Validation Indexes

Pangaonkar, Manali January 2012 (has links)
No description available.
23

A Recommendation System Based on Multiple Databases.

Goyal, Vivek 11 October 2013 (has links)
No description available.
24

A Relational Framework for Clustering and Cluster Validity and the Generalization of the Silhouette Measure

Rawashdeh, Mohammad Y. 23 October 2014 (has links)
No description available.
25

Tracking time evolving data streams for short-term traffic forecasting

Abdullatif, Amr R.A., Masulli, F., Rovetta, S. 20 January 2020 (has links)
Data streams have emerged over the last few years as a relevant topic and an efficient means of extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data from all connected road links are organized into chunks corresponding to an optimal time lag. The RLEM model is composed of two layers. In the first layer, we cluster the chunks using the Graded Possibilistic c-Means method. The second layer is made up of an ensemble of forecasters, each of them trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, when a new chunk of traffic flow data is presented as input to the RLEM, its memberships in all clusters are evaluated, and if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, obtaining in this way a forecast of traffic flow over a short-term horizon. The proposed RLEM model is evaluated on a synthetic data set, on a traffic flow data simulator, and on two real-world traffic flow data sets. The model gives accurate forecasts of traffic flow rates with outlier detection and shows good adaptation to non-stationary traffic regimes. Given its characteristics of outlier detection, accuracy, and robustness, RLEM can be fruitfully integrated into traffic flow management systems.
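The operational phase described above lends itself to a compact sketch: given the new chunk's cluster memberships, either flag it as an outlier or blend the per-cluster forecasters' predictions. This is only a schematic reading of the RLEM pipeline under stated assumptions; the forecaster objects, the membership source, and the outlier threshold are hypothetical, not the paper's implementation.

```python
import numpy as np

def ensemble_forecast(chunk, memberships, forecasters, outlier_threshold=0.1):
    """Combine per-cluster forecasters for one incoming chunk.

    chunk             : 1-D array of recent traffic-flow readings
    memberships       : array of the chunk's membership in each cluster,
                        as produced by the (possibilistic) clustering layer
    forecasters       : list of fitted models, one per cluster, each exposing
                        a predict(chunk) method that returns a scalar forecast
    outlier_threshold : if no membership exceeds this, flag the chunk
    Returns (forecast, is_outlier).
    """
    memberships = np.asarray(memberships, dtype=float)
    if memberships.max() < outlier_threshold:
        return None, True                      # outlier: no forecast issued
    weights = memberships / memberships.sum()  # normalise before combining
    preds = np.array([f.predict(chunk) for f in forecasters])
    return float(weights @ preds), False
```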
26

Solving Factorable Programs with Applications to Cluster Analysis, Risk Management, and Control Systems Design

Desai, Jitamitra 20 July 2005 (has links)
Ever since the advent of the simplex algorithm, linear programming (LP) has been used extensively and with great success in many diverse fields. The field of discrete optimization came to the forefront as a result of the impressive developments in linear programming. Although discrete optimization problems can be viewed as belonging to the class of nonconvex programs, only in recent times has optimization research confronted the more formidable class of continuous nonconvex optimization problems, where the objective function and constraints are often highly nonlinear and nonconvex functions defined in terms of continuous (and bounded) decision variables. Typical classes of such problems involve polynomial or, more generally, factorable functions. This dissertation focuses on employing the Reformulation-Linearization Technique (RLT) to enhance model formulations and to design effective solution techniques for solving several practical instances of continuous nonconvex optimization problems, namely, the hard and fuzzy clustering problems, risk management problems, and problems arising in control systems. Under the umbrella of the broad RLT framework, the contributions of this dissertation focus on developing models and algorithms, along with related theoretical and computational results, pertaining to three specific application domains. In the basic construct, through appropriate surrogation schemes and variable substitution strategies, we derive strong polyhedral approximations for the polynomial functional terms in the problem, and then rely on the demonstrated (robust) ability of the RLT to determine global optimal solutions for polynomial programming problems. The convergence of the proposed branch-and-bound algorithm follows from the tailored branching strategy coupled with the consistency and exhaustiveness properties of the enumeration tree. First, we prescribe an RLT-based framework geared towards solving the hard and fuzzy clustering problems. In the second endeavor, we examine two risk management problems, providing novel models and algorithms. Finally, in the third part, we provide a detailed discussion on studying stability margins for control systems using polynomial programming models along with specialized solution techniques. / Ph. D.
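For readers unfamiliar with the basic RLT construct mentioned above, the standard relaxation of a single bilinear term is sketched below: pairwise products of the variable bound factors are formed and then linearized by substituting a new variable for the product. This is the textbook bound-factor construction (equivalent to the McCormick envelope), shown only as background; it is not reproduced from the dissertation.

```latex
% For x \in [x^L, x^U] and y \in [y^L, y^U], multiply the bound factors
% (x - x^L)(y - y^L) \ge 0,  (x^U - x)(y^U - y) \ge 0,
% (x - x^L)(y^U - y) \ge 0,  (x^U - x)(y - y^L) \ge 0,
% expand, and substitute w = xy to obtain the linear RLT relaxation:
\begin{aligned}
w &\ge x^L y + y^L x - x^L y^L, \\
w &\ge x^U y + y^U x - x^U y^U, \\
w &\le x^L y + y^U x - x^L y^U, \\
w &\le x^U y + y^L x - x^U y^L.
\end{aligned}
```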
27

The Impact of Environmental Variables in Efficiency Analysis: A fuzzy clustering-DEA Approach

Saraiya, Devang 01 September 2005 (has links)
Data Envelopment Analysis (Charnes et al., 1978) is a technique used to evaluate the relative efficiency of any process or organization. The efficiency evaluation is relative, meaning each unit is compared with other processes or organizations. In real-life situations, different processes or units seldom operate in similar environments. Within a relative-efficiency context, if units operating in different environments are compared, the units that operate in less desirable environments are at a disadvantage. In order to ensure that the comparison is fair within the DEA framework, a two-stage framework is presented in this thesis. Fuzzy clustering is used in the first stage to group units with similar environments. In a subsequent stage, a relative efficiency analysis is performed within each of these groups. By approaching the problem in this manner, the influence of environmental variables on the efficiency analysis is removed. The concept of the environmental dependency index (EDI) is introduced in this thesis. The EDI reflects the extent to which the efficiency behavior of units is due to their environment of operation. The EDI also assists the decision maker in choosing appropriate peers to guide the changes that the inefficient units need to make. A more rigorous series of steps to obtain the clustering solution is also presented in a separate chapter (Chapter 5). / Master of Science
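A minimal sketch of the two-stage idea follows: units are first grouped (here by defuzzifying fuzzy-cluster memberships into hard labels) and an input-oriented CCR DEA model is then solved within each group, so units are only benchmarked against peers from comparable environments. The function names, the CCR formulation, and the use of scipy's linprog are assumptions for illustration; the thesis's exact models and the EDI computation are not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o (columns of X, Y are DMUs).

    X : (m_inputs, n_dmus) inputs, Y : (s_outputs, n_dmus) outputs.
    Decision vector is [theta, lambda_1, ..., lambda_n].
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                 # minimise theta
    A_in = np.hstack([-X[:, [o]], X])           # sum_j lambda_j x_ij <= theta * x_io
    A_out = np.hstack([np.zeros((s, 1)), -Y])   # sum_j lambda_j y_rj >= y_ro
    b_ub = np.r_[np.zeros(m), -Y[:, o]]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=b_ub,
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]

def efficiency_by_cluster(X, Y, memberships):
    """Stage 2: run DEA separately inside each fuzzy-derived group of units."""
    labels = np.argmax(memberships, axis=1)     # defuzzify stage-1 memberships
    scores = np.empty(X.shape[1])
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        for pos, dmu in enumerate(idx):
            scores[dmu] = ccr_efficiency(X[:, idx], Y[:, idx], pos)
    return scores
```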
28

Reducing domestic energy consumption through behaviour modification

Ford, Rebecca January 2009 (has links)
This thesis presents the development of techniques that enable appliance recognition in an Advanced Electricity Meter (AEM) to help individuals reduce their domestic electricity consumption. The key aspect is to provide immediate and disaggregated information, down to appliance level, from a single point of measurement. Three sets of features (short-term time domain, time-dependent finite state machine behaviour, and time of day) are identified by monitoring step changes in the power consumption of the home. Associated with each feature set is a membership which depicts the degree to which that feature set is representative of a particular appliance. These memberships are combined in a novel framework to effectively identify individual appliance state changes and hence appliance energy consumption. An innovative mechanism is developed for generating short-term time-domain memberships. Hierarchical and nearest-neighbour clustering are used to train the AEM by generating appliance prototypes which contain an indication of typical parameters. From these prototypes, probabilistic fuzzy memberships and possibilistic fuzzy typicalities are calculated for new data points corresponding to appliance state changes. These values are combined in a weighted geometric mean to produce novel memberships which are determined to be appropriate for the domestic model. A voltage-independent feature space in the short-term time domain is developed based on a model of the appliance's electrical interface. The components within that interface are calculated and these, along with an indication of the appropriate model, form a novel feature set which is used to represent appliances. The techniques developed are verified with real data and are 99.8% accurate in a laboratory-based classification in the short-term time domain. The work presented in this thesis demonstrates the ability of the AEM to accurately track the energy consumption of individual appliances.
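The combination step described above (blending a probabilistic membership with a possibilistic typicality through a weighted geometric mean) can be sketched as follows. The FCM-style membership and the Krishnapuram-Keller-style typicality formulas are standard assumptions standing in for the thesis's prototype model; the parameter names and defaults are hypothetical.

```python
import numpy as np

def combined_membership(distances, m=2.0, eta=None, w=0.5):
    """Blend a probabilistic membership with a possibilistic typicality.

    distances : (n_prototypes,) distances from one observed state change
                to each appliance prototype
    m         : fuzzifier (> 1) used in both terms
    eta       : (n_prototypes,) possibilistic scale parameters; defaults to
                the mean squared distance when not supplied
    w         : weight of the probabilistic term in the geometric mean
    """
    d2 = np.asarray(distances, dtype=float) ** 2 + 1e-12   # avoid divide-by-zero
    if eta is None:
        eta = np.full_like(d2, d2.mean())
    # FCM-style probabilistic membership: sums to 1 across prototypes
    u = 1.0 / ((d2[:, None] / d2[None, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    # Possibilistic typicality: independent of the other prototypes
    t = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))
    return u ** w * t ** (1.0 - w)      # weighted geometric mean of the two
```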
29

Approche robuste pour l’évaluation de la confiance des ressources sur le Web / A robust approach for Web resources trust assessment

Saoud, Zohra 14 December 2016 (has links)
This thesis in Computer Science falls within the field of trust management and, more specifically, recommender systems. These systems are usually based on users' qualitative and quantitative feedback from interacting with Web resources (e.g., movies, videos, and Web services). Recommender systems must cope with three types of uncertainty, arising from users' ratings, from users' identities, and from variations in Web resource performance over time. We propose a robust approach for trust assessment under these uncertainties. The first type of uncertainty refers to users' ratings. It stems from the vulnerability of the system in the presence of malicious users providing biased ratings. To tackle this uncertainty, we propose a fuzzy model of rater credibility. This model, based on a fuzzy clustering technique, distinguishes malicious users from strict users, who are usually excluded in existing approaches. The second type of uncertainty refers to users' identities. Indeed, a malicious user can purposely create virtual identities to provide multiple false ratings. To counter this type of attack, known as a Sybil attack, we propose a ratings filtering model based on users' credibility and the trust graph to which they belong. We propose two mechanisms: one for assigning capacities to users and one for selecting the users whose ratings are retained when evaluating trust. The first mechanism reduces the risk of involving multi-identity users; the second chooses paths in the trust graph containing users with maximum capacities. Both mechanisms use users' credibility as a heuristic. To deal with the uncertainty over a Web resource's ability to satisfy users' requests, we propose two approaches for assessing the trust of a Web resource, one deterministic and one probabilistic. The first consolidates the collected ratings while taking the raters' credibility into account. The second relies on probabilistic database theory and possible-worlds semantics: probabilistic databases offer a better representation of the uncertainty underlying users' credibility and also allow, through queries, an uncertain computation of a resource's trust. Finally, we developed the WRTrust (Web Resource Trust) system implementing our trust assessment approach. We carried out several experiments to evaluate the performance and robustness of our system. The results show a significant improvement in trust quality and in the system's robustness against false-rating attacks and Sybil attacks.
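As an illustration of the deterministic consolidation step (ratings weighted by rater credibility, with low-credibility raters filtered out), consider the sketch below. The threshold, the credibility source, and the function name are assumptions; the thesis's actual model, and its probabilistic counterpart, are not reproduced here.

```python
import numpy as np

def resource_trust(ratings, credibility, min_credibility=0.2):
    """Deterministic trust score for one Web resource.

    ratings         : (n_users,) ratings given to the resource (e.g. in [0, 1])
    credibility     : (n_users,) credibility of each rater, e.g. derived from a
                      fuzzy clustering of rating behaviour
    min_credibility : raters below this value are filtered out entirely
    """
    ratings = np.asarray(ratings, dtype=float)
    credibility = np.asarray(credibility, dtype=float)
    keep = credibility >= min_credibility          # drop suspected malicious raters
    if not keep.any():
        return None                                # no trustworthy evidence left
    w = credibility[keep]
    return float((w * ratings[keep]).sum() / w.sum())
```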
30

Agrupamento de dados fuzzy colaborativo / Collaborative fuzzy clustering

Coletta, Luiz Fernando Sommaggio 19 May 2011 (has links)
In recent decades, data mining techniques have played an important role in several areas of human knowledge. More recently, these tools have found room in a new and complex domain in which the data to be mined are physically distributed. In this domain, specific data clustering algorithms can be used, in particular some variants of the widely used Fuzzy C-Means (FCM) algorithm, which have been investigated under the name of collaborative fuzzy clustering. Aiming to overcome some of the limitations found in two of these algorithms, five new algorithms were developed in this work. These algorithms were studied in two specific application scenarios that take into account two assumptions about the data (i.e., whether the data come from the same population or from different populations). In practice, such assumptions, together with the difficulty of defining some of the parameters that may be required, can guide the user's choice among the available algorithms. In this sense, illustrative examples highlight the performance differences between the studied and the newly developed algorithms, allowing some conclusions to be drawn that can be useful when applying collaborative fuzzy clustering in practice. Analyses of time, space, and communication complexity were also carried out. / Data mining techniques have played an important role in several areas of human knowledge. More recently, these techniques have found space in a new and complex setting in which the data to be mined are physically distributed. In this setting, algorithms for data clustering can be used, such as some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentations of two FCM-based clustering algorithms used to cluster distributed data, arriving at constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines for selecting a specific algorithm depending upon the nature of the data environment and the assumptions made about the number of clusters. A thorough complexity analysis covering space, time, and communication aspects is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
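Since the work above builds directly on the Fuzzy C-Means algorithm, a minimal single-site FCM loop is sketched below for reference: alternate between membership-weighted center updates and inverse-distance membership updates until the partition stabilises. This is only the standard FCM baseline; the collaborative variants developed in the thesis additionally exchange partition or prototype information between sites, which is not shown, and the parameter defaults here are arbitrary.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard single-site Fuzzy C-Means; collaborative variants extend this loop.

    X : (n_samples, n_features) data held at one site
    c : number of clusters, m : fuzzifier (> 1)
    Returns (centers, U) with U of shape (n_samples, c), rows summing to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                     # random fuzzy partition
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted means
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :])
                       ** (1.0 / (m - 1.0))).sum(axis=2)  # inverse-distance update
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U
```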
