  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
451

Pivot-based Data Partitioning for Distributed k Nearest Neighbor Mining

Kuhlman, Caitlin Anne 20 January 2017 (has links)
This thesis addresses the need for a scalable distributed solution for k-nearest-neighbor (kNN) search, a fundamental data mining task. This unsupervised method poses particular challenges on shared-nothing distributed architectures, where global information about the dataset is not available to individual machines. The distance to search for neighbors is not known a priori, so a dynamic data partitioning strategy is required to guarantee that exact kNN can be found autonomously on each machine. Pivot-based partitioning has been shown to facilitate bounding of partitions; however, state-of-the-art methods suffer from prohibitive data duplication (upwards of 20x the size of the dataset). This work presents PkNN, an innovative method for exact distributed kNN search. The key idea is to perform computation over several rounds, leveraging pivot-based data partitioning at each stage. Aggressive data-driven bounds limit communication costs, and a number of optimizations are designed for efficient computation. An experimental study on large real-world data (over 1 billion points) compares PkNN to the state-of-the-art distributed solution, demonstrating that the benefits of additional stages of computation in PkNN heavily outweigh the added I/O overhead. PkNN achieves a data duplication rate close to 1, significant speedup over previous solutions, and scales effectively in data cardinality and dimension. PkNN can facilitate distributed solutions to other unsupervised learning methods that rely on kNN search as a critical building block. As one example, a distributed framework for the Local Outlier Factor (LOF) algorithm is given. Testing on large real-world and synthetic data with varying characteristics measures the scalability of PkNN and the distributed LOF framework in data size and dimensionality.
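The multi-round idea in the abstract can be sketched in miniature. The following is a simplified single-machine, two-round illustration of pivot-based partitioning with a triangle-inequality pruning bound; the function names, the specific bound, and the setting are illustrative assumptions, not the thesis's actual PkNN implementation.

```python
import math

def nearest_pivot(p, pivots):
    """Index of the pivot closest to point p."""
    return min(range(len(pivots)), key=lambda j: math.dist(p, pivots[j]))

def local_knn(q, candidates, k):
    """Brute-force k nearest neighbors of q among candidates (excluding q)."""
    return sorted((p for p in candidates if p != q),
                  key=lambda p: math.dist(q, p))[:k]

def pknn(points, pivots, k):
    """Round 1: partition points by nearest pivot and compute a local kNN;
    the k-th local distance r upper-bounds the true kNN distance.
    Round 2: pull candidates only from partitions j that could intersect
    the ball B(q, r), via the triangle-inequality test
    dist(q, pivot_j) <= dist(q, pivot_i) + 2r."""
    parts = {j: [] for j in range(len(pivots))}
    for p in points:
        parts[nearest_pivot(p, pivots)].append(p)
    result = {}
    for q in points:
        i = nearest_pivot(q, pivots)
        local = local_knn(q, parts[i], k)
        r = math.dist(q, local[-1])  # upper bound on true kNN distance
        cands = list(parts[i])
        for j in range(len(pivots)):
            if j != i and math.dist(q, pivots[j]) <= math.dist(q, pivots[i]) + 2 * r:
                cands += parts[j]
        result[q] = local_knn(q, cands, k)
    return result
```

In a real shared-nothing setting each partition lives on a different machine, and the round-2 test governs which queries must be replicated across machines, which is where the bound limits duplication.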
452

Data mining aplicado ao serviço público, extração de conhecimento das ações do Ministério Público Brasileiro / Data mining applied to the public sector: knowledge extraction from the actions of the Brazilian Ministério Público

Guimarães, William Sérgio Azevêdo January 2000 (has links)
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação.
453

Computational approaches for engineering effective teams

Golshan, Behzad 04 December 2016 (has links)
The performance of a team depends not only on the abilities of its individual members, but also on how these members interact with each other. Inspired by this premise and motivated by a large number of applications in educational, industrial, and management settings, this thesis studies a family of problems, known as team-formation problems, that aim to engineer teams that are effective and successful. The major challenge in this family of problems is dealing with the complexity of the human team participants. Specifically, each individual has their own objectives, demands, and constraints that might conflict with the desired team objective. Furthermore, different collaboration models lead to different instances of team-formation problems. In this thesis, we introduce several such models and describe techniques and efficient algorithms for various instantiations of the team-formation problem. This thesis consists of two main parts. In the first part, we examine three distinct team-formation problems that are of significant interest in (i) educational settings, (ii) industrial organizations, and (iii) management settings, respectively. What constitutes an effective team in each of these settings depends entirely on the objective of the team. For instance, the performance of a team (or a study group) in an educational setting can be measured as the amount of learning and collaboration that takes place inside the team. In industrial organizations, desirable teams are those that are cost-effective and highly profitable. Finally, in management settings, a body of research shows that teams with faultlines are prone to performance decrements; the challenge is thus to form teams that are free of faultlines, that is, teams that are robust and less likely to break apart due to disagreements. The first part of the thesis discusses approaches for formalizing these problems and presents efficient computational methods for solving them.
In the second part of the thesis, we consider the problem of improving the functioning of existing teams. More precisely, we show how we can use models from social theory to capture the dynamics of the interactions between the team members. We further discuss how teams can be modified so that the interaction dynamics lead to desirable outcomes such as higher levels of agreement or lesser tension and conflict among the team members.
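The abstract does not name the specific social-theory model used for interaction dynamics; one classic candidate is DeGroot-style opinion averaging, sketched below purely as a hypothetical illustration of how repeated interaction can drive a team toward agreement.

```python
def degroot_step(opinions, weights):
    """One round of DeGroot-style averaging: each member's new opinion is a
    weighted mean of all members' current opinions (each row of `weights`
    sums to 1, giving how much member i trusts each teammate)."""
    n = len(opinions)
    return [sum(weights[i][j] * opinions[j] for j in range(n)) for i in range(n)]

def simulate(opinions, weights, rounds=50):
    """Iterate the averaging step; under mild conditions on the weight
    matrix, opinions converge toward a consensus value."""
    for _ in range(rounds):
        opinions = degroot_step(opinions, weights)
    return opinions
```

In this framing, "modifying the team" corresponds to editing the weight matrix (who listens to whom), and desirable outcomes such as higher agreement correspond to faster or tighter convergence.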
454

Matrix completion with structure

Ruchansky, Natali 07 December 2016 (has links)
Often, data organized in matrix form contains missing entries. Further, such data is often observed to be effectively low-rank, which has led to interest in the particular problem of low-rank matrix-completion: given a partially-observed matrix, estimate the missing entries such that the output completion is low-rank. The goal of this thesis is to improve matrix-completion algorithms by explicitly analyzing two sources of information in the observed entries: their locations and their values. First, we provide a categorization of a new approach to matrix-completion, which we call structural. Structural methods quantify the possibility of completion using tests applied only to the locations of known entries. By framing each test as the class of partially-observed matrices that pass the test, we provide the first organizing framework for analyzing the relationship among structural completion methods. Building on the structural approach, we then develop a new algorithm for active matrix-completion that is combinatorial in nature. The algorithm uses just the locations of known entries to suggest a small number of queries to be made on the missing entries that allow it to produce a full and accurate completion. If a budget is placed on the number of queries, the algorithm outputs a partial completion, indicating which entries it can and cannot accurately estimate given the observations at hand. Finally, we propose a local approach to matrix-completion that analyzes the values of the observed entries to discover a structure that is more fine-grained than the traditional low-rank assumption. Motivated by the Singular Value Decomposition, we develop an algorithm that finds low-rank submatrices using only the first few singular vectors of a matrix. By completing low-rank submatrices separately from the rest of the matrix, the local approach to matrix-completion produces more accurate reconstructions than traditional algorithms.
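The structural idea, that the locations of known entries alone can decide completability, is easiest to see in the rank-1 case: a missing entry M[i][j] of a rank-1 matrix is determined whenever three observed entries form an "L" with it, since M[i][j] = M[i][l] * M[k][j] / M[k][l]. The toy sketch below is only an illustration of that principle, not one of the thesis's algorithms.

```python
def complete_rank1(M):
    """Fill missing entries (None) of a partially-observed rank-1 matrix
    using the identity M[i][j] = M[i][l] * M[k][j] / M[k][l], which needs
    only three observed entries forming an 'L' with the target entry.
    Repeats passes until no further progress is possible."""
    n, m = len(M), len(M[0])
    M = [row[:] for row in M]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(m):
                if M[i][j] is not None:
                    continue
                for k in range(n):
                    for l in range(m):
                        if (M[i][l] is not None and M[k][j] is not None
                                and M[k][l] not in (None, 0)):
                            M[i][j] = M[i][l] * M[k][j] / M[k][l]
                            changed = True
                            break
                    if M[i][j] is not None:
                        break
    return M
```

Whether every missing entry gets filled depends only on where the observed entries sit, which is exactly the kind of location-based test structural methods formalize.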
455

Applying data mining techniques over big data

Al-Hashemi, Idrees Yousef January 2013 (has links)
Thesis (M.S.C.S.) PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you. / The rapid development of information technology in recent decades means that data appear in a wide variety of formats: sensor data, tweets, photographs, raw data, and unstructured data. Statistics show that 800,000 petabytes were stored in the world in 2000. Today's internet holds about 0.1 zettabytes of data (a zettabyte is about 10^21 bytes), and this number is projected to reach 35 ZB by 2020. With such an overwhelming flood of information, present data management systems cannot scale to this huge amount of raw, unstructured data, known in today's parlance as Big Data. In the present study, we show the basic concepts and design of Big Data tools, algorithms, and techniques. We compare classical data mining algorithms to Big Data algorithms by using Hadoop/MapReduce as a core implementation of Big Data for scalable algorithms. We implemented the K-means and Apriori algorithms with Hadoop/MapReduce on a 5-node Hadoop cluster. We explore NoSQL databases for semi-structured, massive-scale data, using MongoDB as an example. Finally, we compare the performance of HDFS (Hadoop Distributed File System) and MongoDB storage for these two algorithms.
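The K-means decomposition onto MapReduce mentioned above can be sketched in pure Python standing in for Hadoop: the map phase keys each point by its nearest centroid, and the reduce phase averages each key's points into the next centroid. Function names are illustrative, not the thesis's code.

```python
import math

def kmeans_mapreduce(points, centroids, iterations=10):
    """K-means in MapReduce style: map emits (nearest-centroid-index, point)
    pairs; reduce averages the points assigned to each index to produce the
    next round's centroids."""
    for _ in range(iterations):
        # Map phase: key each point by its nearest centroid.
        groups = {}
        for p in points:
            key = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
            groups.setdefault(key, []).append(p)
        # Reduce phase: new centroid = component-wise mean of its group
        # (an empty group keeps its old centroid).
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g))
            for g in (groups.get(c, [centroids[c]]) for c in range(len(centroids)))
        ]
    return centroids
```

On Hadoop, the map and reduce phases would run as separate distributed jobs per iteration, with centroids broadcast to mappers between rounds.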
456

Classificação de contribuintes / Taxpayer classification

Corvalão, Eder Daniel 24 October 2012 (has links)
Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Produção, Florianópolis, 2009. / The term taxpayer applies to any individual or legal entity that the law obliges to fulfill a tax obligation. It is the duty of the tax administration to monitor and audit the correct fulfillment of the fiscal obligations of taxpaying companies. Since monitoring every company is impossible, the process of selecting taxpayers to be audited becomes vitally important. With the growing volume of information submitted by taxpayers, systematically stored in operational systems, and with the emergence of new data analysis tools combined with the evolution of computational resources, new alternatives arise for addressing the taxpayer selection problem. In this scenario, the field of data mining offers numerous applications across many areas, among them fraud detection. This thesis develops a formal model for classifying taxpayers based on the monthly activity data submitted to the audit department. The proposal seeks to preserve the economic and regional characteristics of each company through cluster analysis. Probabilistic models are then built to rank the taxpayers showing the strongest signs of irregularities. This ranking can be used to guide the selection of companies to be audited. For validation, the model was applied in a case study with the Secretaria da Fazenda do Estado de Santa Catarina. The selection of ICMS (tax on the circulation of goods and services) taxpayers was analyzed using monthly data from 2005 to 2007.
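The cluster-then-score idea can be illustrated with a deliberately simplified, distance-based stand-in: the thesis builds probabilistic models, whereas this sketch merely ranks firms by how far their feature vector sits from their group's centroid, treating large deviations as irregularity signals.

```python
import math
from statistics import mean

def irregularity_scores(groups):
    """Given groups of firms (each firm a feature vector, e.g. monthly
    activity figures), score each firm by its distance from the centroid of
    its own group: the farther a firm sits from the typical profile of its
    peers, the stronger the indication of irregularity."""
    scores = {}
    for members in groups.values():
        dims = len(members[0])
        centroid = tuple(mean(f[d] for f in members) for d in range(dims))
        for f in members:
            scores[f] = math.dist(f, centroid)
    return scores
```

Grouping firms before scoring is what preserves economic and regional characteristics: a firm is compared only against peers with a similar profile, not against the whole taxpayer population.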
457

Um sistema de recomendação baseado em filtragem colaborativa / A recommender system based on collaborative filtering

Bernartt, João Lourenço Vivan January 2008 (has links)
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica. / This work aims to contribute to research on recommender systems, particularly systems based on collaborative filtering, seeking to promote the development of information technologies in Brazil. To this end, it proposes a complete recommender system for the competition held by Netflix, aiming for better accuracy than the system in use by the company, Cinematch. As a result, the state of the art in recommender systems research and the mechanics of the competition are first presented. The author's contribution is then laid out through a description of the developed algorithm and the results achieved. Among these are qualification within the competition, the algorithm's good computational time, and an accuracy that surpassed the Cinematch system. Finally, conclusions about the results are drawn and prospects for continuing the work are established.
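A minimal user-based collaborative-filtering sketch in the spirit of the Netflix task: predict a user's rating of an item as a similarity-weighted average over other users who rated it, with cosine similarity computed on co-rated items. This is a generic baseline for illustration, not the author's competition algorithm.

```python
import math

def cosine(u, v, common):
    """Cosine similarity between two rating dicts over their common items."""
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict `user`'s rating of `item` as the similarity-weighted average
    of ratings given to `item` by users who co-rated items with `user`.
    Returns None when no informative neighbor exists."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        common = set(ratings[user]) & set(r)
        if not common:
            continue
        s = cosine(ratings[user], r, common)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None
```

Competition-grade systems layer much more on top (mean-centering, regularized latent-factor models, blending), but this captures the core collaborative-filtering step.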
458

Srovnání metod vyhodnocujících výsledky shlukování / Comparison of methods for evaluating clustering results

Polcer, Ondřej January 2015 (has links)
J. Žižka, O. Polcer: Comparison of methods evaluating results of clustering. Diploma thesis, Mendel University in Brno, 2015. This thesis describes data clustering in detail, the development of the author's own clustering application, its comparison with the Cluto program, and an analysis of the results.
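One standard way to evaluate a clustering result against a reference is the Rand index, sketched below; it is a generic example of the kind of evaluation such comparisons use, and is not claimed to be among the specific methods this thesis covers.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand index: the fraction of point pairs on which two clusterings
    agree, i.e. both place the pair in the same cluster, or both place it
    in different clusters. Ranges from 0 to 1 (identical partitions)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because it compares pair co-membership rather than label values, the index is invariant to renaming clusters, which is essential when comparing outputs of different clustering tools.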
459

Tři eseje o veřejných zakázkách / Three essays on public procurement

Skuhrovec, Jiří January 2017 (has links)
The core of the thesis lies in quantitative analysis of microeconomic data on public procurement and alternative forms of handling public money. It consists of three essays with one common attribute: extensive groundwork with data, including overlaps into legal and technical disciplines. The first essay examines the relationship between transparency of ownership structure and (i) the profits of firms winning public procurement contracts and (ii) competition for the contracts and the savings of the public authority. It identifies a significant advantage of firms with opaque ownership structures in terms of access to public money, and concludes with a possible explanation based on conflict of interest and corruption, which might channel such advantages. The second essay proposes and tests a novel methodology for benchmarking contracting authorities. The proposed rating measures deviation from best-practice recommendations in the areas of openness, competition, and transparency, and thereby indirectly measures efficiency and corruption potential in public procurement. Pilot results of the methodology are provided and extensively discussed for a group of Czech municipalities. The third essay investigates a crowding-out effect potentially introduced by the provision of EU funds. It studies the direct budgetary impacts of...
460

Integrace pokročilého objednávkového systému s vybraným ERP / Integration of an advanced ordering system with a selected ERP

Katovská, Petra January 2010 (has links)
No description available.
