Global ETD Search

71	Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies Kulhavý, Lukáš January 2010 This thesis focuses on data mining technology and its possible practical use in health insurance companies. It defines the term data mining and its relation to knowledge discovery in databases, and explains it through, among other things, the methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). It also covers possible practical applications and the technologies and products available on the market, both free and commercial. An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) provides the theoretical basis on which the practical applications to real data from real health insurance companies are built. These applications seek the causes of increased remittances and predict churn. I solved them in the freely available systems Weka and LISP-Miner. The objective is to introduce and demonstrate data mining capabilities on this type of data, and to demonstrate the capabilities of Weka and LISP-Miner in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers an insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system; grid computing can be used through the Weka Experimenter interface.
72	On the Resilience of Network Coding in Peer-to-Peer Networks and its Applications Niu, Di 14 July 2009 Most current-generation P2P content distribution protocols use fine-granularity blocks to distribute content in a decentralized fashion. Such systems often suffer from significant variation in block distributions, such that certain blocks become rare or even unavailable, adversely affecting content availability and download efficiency. This phenomenon is further aggravated by peer dynamics, which are inherent in P2P networks. In this thesis, we quantitatively analyze how network coding may improve block availability and introduce resilience to peer dynamics. Since, in reality, network coding can only be performed within segments, each containing a subset of blocks, we explore the fundamental tradeoff between the resilience gain of network coding and its inherent coding complexity as the number of blocks in a segment varies. As another application of the resilience of network coding, we also devise an indirect data collection scheme based on network coding for the purpose of large-scale network measurements. Peer-to-Peer Content Distribution Network Coding File Sharing Churn Measurement Collection Coding Complexity Loss Resilience Data Collection Log Collection Avalanche BitTorrent Generation Segment 0544
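To make the segment-level coding idea in the abstract above concrete, here is a minimal sketch of random linear network coding inside one segment. It is an illustration under stated assumptions, not the scheme from the thesis: it works over GF(2) (plain XOR) for brevity, it represents each block as a Python integer, and the names `encode` and `decode` are hypothetical.

```python
import random

def encode(segment, rng):
    """Return one coded block: (coefficient bitmask, XOR of the chosen source blocks)."""
    n = len(segment)
    mask = 0
    while mask == 0:                       # skip the useless all-zero combination
        mask = rng.getrandbits(n)
    payload = 0
    for i, block in enumerate(segment):
        if (mask >> i) & 1:
            payload ^= block
    return mask, payload

def decode(coded, n):
    """Gaussian elimination over GF(2); returns the n source blocks, or None if rank < n."""
    pivots = {}                            # pivot bit -> (mask, payload), kept fully reduced
    for mask, payload in coded:
        # Cancel every existing pivot bit that appears in the incoming row.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload ^= ppayload
        if mask == 0:
            continue                       # linearly dependent: no new information
        bit = mask.bit_length() - 1        # highest remaining bit becomes the new pivot
        # Remove the new pivot bit from the rows stored earlier.
        for b, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> bit) & 1:
                pivots[b] = (pmask ^ mask, ppayload ^ payload)
        pivots[bit] = (mask, payload)
    if len(pivots) < n:
        return None
    return [pivots[i][1] for i in range(n)]
```

A peer can decode as soon as it holds any set of coded blocks of full rank, so no individual source block can become "rare". The price is the elimination step, whose cost grows with the number of blocks per segment, which is the tradeoff the abstract refers to.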
74	Application-driven Memory System Design on FPGAs Dai, Zefu 08 January 2014 Moore's Law has helped Field Programmable Gate Arrays (FPGAs) scale continuously in speed, capacity and energy efficiency, allowing the integration of ever-larger systems into a single FPGA chip. This challenges the productivity of developers trying to leverage the sea of FPGA resources. Higher-level design abstractions and programming models are needed to improve design productivity, and these in turn require memory architectural support on FPGAs. While previous efforts focus on computation-centric applications, we take a bandwidth-centric approach to designing memory systems. In particular, we investigate the scheduling, buffered switching and searching problems, which are common to a wide range of FPGA applications. Although the bandwidth problem has been extensively studied for general-purpose computing and application-specific integrated circuit (ASIC) designs, the proposed techniques are often not applicable to FPGAs. To achieve optimized design implementations, designers need to take into consideration both the underlying FPGA physical characteristics and the requirements of applications. We therefore extract design requirements from four driving applications for the selected problems, and address them by exploiting the physical architectures and available resources of FPGAs. Towards solving the selected problems, we advance the state of the art with a scheduling algorithm, a switch organization and a cache analytical model. These lead to performance improvements and resource savings, and make new approaches to well-known problems feasible. FPGA Memory System Multi-port Memory Controller Switch Fabric Cache Churn Rate Flash XML Quality-of-Service Bandwidth Guarantee Crosspoint Queued Switch Zipf's Law Online Video 0544
76	Technique de visualisation pour l’identification de l’usage excessif d’objets temporaires dans les traces d’exécution (Visualization technique for identifying the excessive use of temporary objects in execution traces) Duseau, Fleur 12 1900 Nowadays, large framework-intensive programs are developed using many layers of frameworks and middleware. Bloat, and particularly object churn, is a common performance problem in framework-intensive applications. Object churn consists of an excessive use of temporary objects. Identifying and understanding sources of churn is a difficult and labor-intensive task, despite recent advances in automated analysis techniques. We present an interactive visualization approach designed to help developers quickly and intuitively explore the behavior of their application with respect to object churn. We have implemented this technique in Vasco, a new flexible and scalable visualization platform. Vasco follows three main design goals. Firstly, data is collected from execution traces and analyzed in order to calculate and keep only the data needed to locate sources of object churn. As a result, large programs can be visualized while keeping a clear and understandable view. Secondly, the use of an intuitive view minimizes the cognitive effort required for the visualization task. Finally, the fluidity of transitions and interactions allows users to mentally preserve the context throughout their interactions. We demonstrate the effectiveness of the approach by identifying churn in three framework-intensive applications, including a commercial system. visualization framework-intensive applications object churn dynamic analysis execution traces
77	Prédiction de l'attrition en date de renouvellement en assurance automobile avec processus gaussiens (Predicting attrition at the renewal date in automobile insurance with Gaussian processes) Pannetier Lebeuf, Sylvain 08 1900 The auto insurance industry moves in cycles, with phases of profitability and phases of non-profitability. In the unprofitable phases, insurance companies generally react by raising premiums in an attempt to reduce losses. However, very large increases can drive customers en masse to competitors. Too high an attrition rate could hurt the company's long-term profitability. Proper management of rate increases is therefore crucial for an insurance company. This thesis aims to build a tool that simulates the shape of the insurance portfolio held by an insurer as a function of the rate change proposed to each insured. A procedure using univariate Gaussian process regression is developed. This procedure outperforms logistic regression, the model typically used for such tasks. data mining gaussian process attrition churn automobile insurance
78	Random forest em dados desbalanceados: uma aplicação na modelagem de churn em seguro saúde (Random forest on imbalanced data: an application to churn modeling in health insurance) Lento, Gabriel Carneiro 27 March 2017 In this work we study churn in health insurance, that is, predicting which clients will cancel the product or service within a preset time frame. Traditionally, the probability that a client will cancel the service is modeled using logistic regression. Recently, modern machine learning techniques have become popular in churn modeling, having been applied in telecommunications, banking, and car insurance, among other areas. One of the big challenges in this problem is that only a small fraction of all customers cancel the service, meaning that we have to deal with highly imbalanced classes. Under-sampling and over-sampling techniques have been used to overcome this issue. We use random forests, which are ensembles of decision trees, where each tree is fitted to a subsample of the data constructed using either under-sampling or over-sampling. We compare the distinct specifications of random forests using various metrics that are robust to imbalanced classes, both in-sample and out-of-sample. We observe that random forests using imbalanced random samples with fewer observations than the original sample perform best overall. Random forests also perform better than classical logistic regression, which is often used in health insurance companies to model churn. Under-sampling Over-sampling Imbalanced class Imbalanced data Health insurance Random forest Churn Mathematics Machine learning Data mining (Computing)
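The per-tree sampling step described in the abstract above can be sketched as follows. This is a hedged illustration rather than the author's procedure: the function name `undersampled_bootstrap` and the parameter `majority_per_minority` are hypothetical, and the rule of drawing a bootstrap of the minority class plus a scaled number of majority rows is one common way to under-sample, assumed here for concreteness.

```python
import random

def undersampled_bootstrap(labels, rng, minority=1, majority_per_minority=1.0):
    """Pick row indices for one tree.

    Draws (with replacement) as many minority rows as exist in the data, plus
    majority_per_minority times that many majority rows. A ratio of 1.0 gives a
    balanced sample; a larger ratio gives a smaller-than-original sample that
    is still imbalanced.
    """
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    k = len(minority_idx)
    m = round(k * majority_per_minority)
    sample = [rng.choice(minority_idx) for _ in range(k)]
    sample += [rng.choice(majority_idx) for _ in range(m)]
    return sample
```

Each tree in the forest would be trained on the rows returned by one call, so churners are no longer swamped by the far larger group of customers who stay.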
79	Understanding when customers leave: Defining customer health and how it correlates with software usage Åman, Robert January 2017 More and more businesses today focus on building long-term customer relationships with the objective of securing recurring revenues in competitive markets. As a result, management philosophies such as Customer Success have emerged, which underline the importance of knowing your customers in order to make them stay. A common way of tracking the well-being of a firm's customers is the use of customer health scores. Such tools monitor assembled data and indicate whether a customer is doing fine or is at risk of ending the business relationship. However, there is little to no consensus on what customer health actually means, or on how to identify suitable parameters for measuring it. Therefore, the purpose of this thesis has been to extend the existing knowledge of the business concept customer health, and to show how to identify relevant parameters for measuring customer health. To this end, a study was conducted at a software-as-a-service company operating in the field of digital marketing, using methods such as semi-structured interviews, ethnography, a web survey, data mining and statistical analysis. The results show that software usage differs between active and former customers, with the general tendency that high software usage indicates a higher propensity to stay as a customer. The study concludes that customer health is best defined as "the perceived value a customer experiences when using a product". In addition, the parameters found to best indicate customer health at the company studied were linked to customers’ software usage as well as their marketing set-up. Customer Success Customer Relationship Management customer health perceived value satisfaction loyalty retention churn SaaS Other Engineering and Technologies
80	Structured peer-to-peer overlays for NATed churn intensive networks Chowdhury, Farida January 2015 The widespread coverage and ubiquitous presence of mobile networks have propelled the usage and adoption of mobile phones to an unprecedented level around the globe. The computing capabilities of these mobile phones have improved considerably, supporting a vast range of third-party applications. Simultaneously, Peer-to-Peer (P2P) overlay networks have experienced tremendous growth in usage and popularity in recent years, particularly in fixed wired networks. In particular, Distributed Hash Table (DHT) based structured P2P overlay networks offer major advantages to users of mobile devices and networks, such as a scalable, fault-tolerant and self-managing infrastructure that has no single point of failure. Integrating P2P overlays into the mobile network seems a logical progression, considering the popularity of both technologies. However, it imposes several challenges that need to be handled, such as the limited hardware capabilities of mobile phones and churn-intensive mobile networks (churn being the frequent joining and leaving of nodes within a network) that offer limited yet expensive bandwidth. This thesis investigates the feasibility of extending P2P to mobile networks so that users can take advantage of both technologies. It uses OverSim, a P2P simulator, to experiment with the performance of various P2P overlays under high churn and constrained bandwidth consumption, the two most crucial constraints of mobile networks. The experimental results show that Kademlia and EpiChord are the two most appropriate P2P overlays for implementation in mobile networks. Furthermore, Network Address Translation (NAT) is a major barrier to the adoption of P2P overlays in mobile networks. Integrating NAT traversal approaches with P2P overlays is a crucial step for P2P overlays to operate successfully on mobile networks. This thesis presents a general approach to NAT traversal for ring-based overlays without the use of a single dedicated server, which is then implemented in OverSim. Several experiments have been performed under NATs to determine the suitability of the chosen P2P overlays in NATed environments. The results show that the performance of these overlays is comparable in terms of successful lookups in both NATed and non-NATed environments, with Kademlia and EpiChord exhibiting the best performance. The presence of NATs and the level of churn in a network influence the routing techniques used in P2P overlays. Recursive routing is more resilient to the IP connectivity restrictions posed by NATs but not very robust in high-churn environments, whereas iterative routing is better suited to high-churn networks but difficult to use in NATed environments. Kademlia supports both routing schemes, whereas EpiChord supports only iterative routing. This undermines the usefulness of EpiChord in NATed environments. In order to harness the advantages of both routing schemes, this thesis presents an adaptive routing scheme, called Churn Aware Routing Protocol (ChARP), which combines recursive and iterative lookups and lets nodes switch between recursive and iterative routing depending on their lifetimes. The proposed approach has been implemented in OverSim and several experiments have been carried out. The experimental results indicate improved performance, which in turn validates the applicability and suitability of ChARP in NATed environments. 004.6
