1 |
Fast and Scalable Outlier Detection with Metric Access Methods / Detecção Rápida e Escalável de Casos de Exceção com Métodos de Acesso Métrico
Bispo Junior, Altamir Gomes, 25 July 2019
It is well known that existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. This dissertation describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets drawn from a variety of real-world tasks of practical relevance, such as spotting cyberattacks, clinical pathologies and abnormalities occurring in nature. We present our assessment of the results, pointing out the strengths and weaknesses of each technique from the application specialist's point of view, which is a shift from the commonly adopted designer-based point of view. Many of the techniques had unfeasibly high runtime requirements or failed to spot what the specialists consider to be outliers in their own data. To tackle this issue, we propose MetricABOD: a novel ABOD-based algorithm that makes the analysis up to thousands of times faster while being on average 26% more accurate than the most accurate related work. This improvement makes outlier detection practical in many real-world applications for which the existing methods present unstable accuracy or unfeasible runtime requirements. Finally, we studied two collections of text data to show that MetricABOD also works for adimensional, purely metric data.
2 |
Les systèmes complexes et la digitalisation des sciences. Histoire et sociologie des instituts de la complexité aux États-Unis et en France / Complex systems and the digitalization of sciences. History and sociology of complexity institutes in the United States and in France
Li Vigni, Guido Fabrizio, 26 November 2018
How should we think about the relationship between contemporary scientific cultures and the growing use of computers in the production of knowledge? This thesis proposes an answer to that question through a historical and sociological analysis of a scientific domain founded by the Santa Fe Institute (SFI) in the 1980s in the United States: the "complex systems sciences" (CSS). Made famous by popular books and articles, CSS spread to Europe and to other countries during the 1990s and 2000s. This work offers a history of the foundation of this domain, focusing on the SFI and on the French Complex Systems National Network (Réseau National des Systèmes Complexes). With a sociological approach rooted in Science & Technology Studies and in pragmatism, it then raises questions about the socio-epistemic status of this domain, about the ways evidence is produced in knowledge based on digital simulation and, finally, about the epistemic commitments held by complex systems specialists. The empirical material, consisting of about 200 interviews, several thousand pages of archives and a few laboratory visits, not only improves our knowledge of this field of research, whose language is widespread today but which is little studied by historians and sociologists; it also leads us to question three common opinions in the humanities and social sciences literature about digital sciences, namely: 1) that the computer produces increasingly interdisciplinary knowledge, 2) that it gives rise to a new type of knowledge that requires an entirely new epistemology to be understood, and 3) that it inevitably brings about neoliberal worldviews. This thesis deconstructs these three forms of technological determinism concerning the effects of the computer on scientific practices, by showing first that, in computational sciences, interdisciplinary relationships are not built without effort, nor peacefully or on an equal footing; second, that CSS researchers use forms of evidence production already developed in other disciplines; and third, that scientists' epistemic commitments can take a form close to the (neo)liberal vision, but also forms that depart from it or oppose it.