Global ETD Search

1	Vybrané metody pro aplikace pokročilých analytik v prostředí Cloud / Selected methods for applying advanced analytics in the Cloud environment Homola, Petr January 2013 (has links) No description available. data mining; clouds; CRISP-DM
2	Aplikace systému LISp-Miner na rozsáhlá reálná data / Using system LISp-Miner for large real data Hrnčíř, Jan January 2017 (has links) This thesis describes an advanced method of knowledge discovery in databases (KDD) implemented in the LISp-Miner system. The goal is to show the possibilities of coordinated use of the analytical tools and complex GUHA procedures in this system. The thesis follows the CRISP-DM methodology, which is described first and then applied in the subsequent sections. The author first introduces the domain area and then the data itself, which are processed to suit the needs of the analysis. The analytical questions to be answered are drawn from literature on the domain area. The work is intended as a guide for LISp-Miner users, so the use of the analytical tools and GUHA procedures is described in the most accessible way. LISp-Miner; KDD; DZD; GUHA; CRISP-DM
3	Model pro ohodnocení bonity klienta v pojišťovně / A model for rating client creditworthiness in an insurance company Píška, Vladimír January 2006 (has links) This diploma thesis deals with assessing client creditworthiness in a Czech commercial insurance company. It consists of two main logical parts: the preparation of a theoretical model of client creditworthiness and its practical verification on real data from one Czech insurance company. The preparation of the model follows the procedure described in the CRISP-DM methodology. The thesis first examines current ways of monitoring client creditworthiness in the Czech banking and non-banking sectors and analyzes how client creditworthiness is determined in American insurance companies. The construction of the client creditworthiness model for an insurance company follows. First, the areas to be monitored are identified, and suitable creditworthiness indicators are selected from these areas. The preparation of the model concludes with setting the weights of the individual indicators and describing the monitored creditworthiness categories. The second part deals with applying the prepared model in practice. It describes the physical architecture of the solution, the preparation of the data, the scoring application used, and the transfer of the model into this application. The further steps described are testing the model on a data sample and on the complete client portfolio of the cooperating insurance company. The results are analyzed, displayed in graphs, and then compared with the expected results. The thesis concludes with a discussion of the use of client creditworthiness in the real processes of an insurance company.
4	Expanding Data Mining Theory for Industrial Applications January 2012 (has links) The field of Data Mining is widely recognized and accepted for its applications in many business problems, where it guides decision-making processes based on data. However, in recent times the scope of these problems has grown, and the methods are under scrutiny for their applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? In this thesis I study the current methods, delineate their ideas, and illuminate the shortcomings that posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach which illustrates the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies that highlight different aspects of this approach. / Dissertation/Thesis / M.S. Computer Science 2012 Computer science Computer engineering CRISP-DM Data Mining KDD SEMMA
5	Analýza metod k odhalení znalosti v datech / Analysis of methods for discovering knowledge in data Procházková, Veronika January 2014 (has links) My diploma thesis deals with data mining and its use in the commercial sphere. The aim of my work was first to assemble knowledge about data mining and then to apply it to particular data. In the first part I gather theoretical information about data mining, focusing on its definition, methods, algorithms and most frequent uses. The second part consists of the practical application of the acquired knowledge to real-world data from mobile telecommunications.
6	Modelo de classificação multivariável para identificação de enchentes: um estudo empírico no sistema de monitoramento de rios e-noe / Multivariate classification model for identification of floods: an empirical study in the monitoring of e-noe rivers Brito, Lucas Augusto Vieira 17 May 2019 (has links) Nas últimas décadas, as enchentes vêm causando muitos problemas nas cidades, principalmente em grandes centros urbanos devido à alteração da paisagem natural e à impermeabilização do terreno. Geralmente esses eventos estão relacionados a eventos extremos de chuva, junto a um insuficiente sistema de drenagem para dar vazão ao escoamento gerado. Um ponto agravante - que colabora com o aumento da magnitude das enchentes - é o crescimento populacional desordenado. Assim, faltam políticas públicas, como um estudo prévio da região para alocação de pessoas de maneira eficiente. Na literatura, existem algumas soluções, como o uso da tecnologia de Redes de Sensores Sem Fio (RSSF), que podem ser implantadas no cenário urbano como forma de monitoramento de enchentes. Nesse cenário, um dos principais desafios para elaboração desses sistemas é emitir alertas para que desastres maiores sejam evitados. Porém, a utilização de uma única fonte de dados, unida a possíveis falhas que as RSSFs podem sofrer, acaba comprometendo o monitoramento e o alerta de enchentes. Uma outra abordagem é a utilização de modelos hidrológicos criados a partir de um estudos prévios do solo e da estrutura da bacia, pois eles são capazes de reproduzir o comportamento do escoamento da bacia a partir de séries temporais como entrada. Existem muitos modelos hidrológicos com diversas estruturas de dados e detalhamento da bacia hidrográfica, dos mais complexos - capazes de reproduzir a física dos processos de infiltração e o escoamento de água - até os mais simplificados, que utilizam parâmetros de ajustes que não são necessariamente relacionados aos fenômenos físicos envolvidos nesses processos. 
Porém, muitos desses modelos precisam de uma grande quantidade de dados para o seu desenvolvimento, tornando-os muito complexos e custosos. Dessa forma, esta dissertação de mestrado apresenta um modelo de identificação de enchentes baseado na mineração de dados e aprendizado de máquina, com o intuito de diminuir a complexidade e o custo dos modelos hidrológicos e a dependabilidade de uma única variável de sistemas de RSSF, além da vantagem de ser facilmente generalizável sem perder a eficiência na identificação de enchente. As variáveis utilizadas para o desenvolvimento do modelo são os dados de estações meteorológicas e o nível de água do canal. Assim, é utilizada a metodologia do Cross Industry Standard Process for Data Mining (CRISP-DM) para a mineração dos dados, por ser uma técnica objetiva que contém as melhores práticas para a exploração dos dados. Os resultados revelam que o modelo desenvolvido obteve uma acurácia de aproximadamente 87,8%, com o algoritmo Random_Forest. Além disso, nos testes de adaptabilidade e comparação com o Storm Water Management Model (SWMM), um modelo hidrológico amplamente conhecido na literatura, em uma mesma região de estudo, o modelo desenvolvido obteve resultados relevantes no contexto de identificação de enchente. Isso mostra que o modelo desenvolvido possui grande potencial de aplicação, principalmente por sua simplicidade de implementação e replicação sem comprometer a qualidade de identificação da ocorrência de enchentes. Consequentemente, algumas das principais contribuições deste trabalho são: (i) o modelo multivariável de identificação de enchente diminui a complexidade, custos e tempo de desenvolvimento em relação aos modelos hidrológicos e; (ii) o avanço do estado da arte em comparação aos trabalhos computacionais, por não depender de variáveis fixas e utilizar multivariáveis para identificar o padrão de enchentes. 
/ In recent decades, floods have caused many problems in cities, especially in large urban centers, due to the alteration of the natural landscape and the sealing of the soil surface. Generally, these events are related to extreme rainfall combined with a drainage system that is insufficient to carry away the resulting runoff. An aggravating factor, which contributes to the increase in flood magnitude, is disorderly population growth. Thus, public policies are lacking, such as a prior study of the region for the efficient allocation of people. In the literature, there are some solutions, such as the use of Wireless Sensor Network (WSN) technology, which can be deployed in the urban setting as a form of flood monitoring. In this scenario, one of the major challenges in designing these systems is issuing alerts so that major disasters are avoided. However, the use of a single data source, coupled with the failures that WSNs may suffer, compromises flood monitoring and alerting. Another approach is the use of hydrological models created from prior studies of the soil and the basin structure, as they are able to reproduce the runoff behavior of the basin from time series given as input. There are many hydrological models with diverse data structures and levels of detail of the river basin, from the most complex, capable of reproducing the physics of the infiltration and water flow processes, to the most simplified, which use tuning parameters that are not necessarily related to the physical phenomena involved in these processes. However, many of these models need a large amount of data for their development, making them very complex and costly. This master's dissertation presents a flood identification model based on data mining and machine learning in order to reduce the complexity and cost of hydrological models and the dependence of WSN systems on a single variable, with the added advantage of being easily generalizable without losing efficiency in flood identification. 
The variables used to develop the model are data from meteorological stations and the water level of the channel. The Cross Industry Standard Process for Data Mining (CRISP-DM) methodology is used for the data mining, since it is an objective technique that contains the best practices for data exploration. The results show that the developed model obtained an accuracy of approximately 87.8% with the Random_Forest algorithm. In addition, in the adaptability tests and in the comparison with the Storm Water Management Model (SWMM), a hydrological model widely known in the literature, in the same study region, the developed model obtained relevant results in the context of flood identification. This shows that the developed model has great application potential, mainly because of its simplicity of implementation and replication, without compromising the quality of flood identification. Consequently, some of the main contributions of this work are: (i) the multivariate flood identification model decreases complexity, costs and development time in relation to hydrological models; (ii) an advance over the state of the art in comparison with computational works, because it does not depend on fixed variables and uses multiple variables to identify the flood pattern. Aprendizado de máquina CRISP-DM Data mining Flood identification Identificação de enchentes Machine learning Mineração de dados RSSF WSN
7	Doménové znalosti, analytické otázky, systém LISp-Miner a data ADAMEK / Knowledge base, analytical questions, LISp-Miner system and ADAMEK data Kubín, Richard January 2009 (has links) This thesis deals with the steps involved in answering analytical questions with the LISp-Miner system on the ADAMEK medical data. Its objective is to define a procedure for using the 4ft-Miner and SD4ft-Miner procedures on the ADAMEK data, to explore the further use of formalized background knowledge, and to prepare the ground for automating these steps. The theoretical part summarizes the basic concepts and principles of association rules and the GUHA method. The practical part is based on the CRISP-DM methodology. The result of the thesis is a procedure for finding interesting association rules in different data, which is subsequently applied to the STULONG medical data in order to obtain suggestions for its revision. The data used come from EuroMISE and concern cardiological patients.
8	Získávání znalostí z marketingových dat / Knowledge discovery in marketing data Kazárová, Marie January 2020 (has links) Data mining techniques are used by companies to gain competitive advantages. In today's marketplace, they are also used by marketers, mainly for personalizing advertising and for maintaining long-term relationships with customers. Progress in knowledge discovery in databases and the availability of computational power bring not only positive impacts but also challenges. The practical part of the thesis explores and describes data mining techniques applied to an e-commerce dataset consisting of transaction and web analytics data. The experimental application aims to select the users who are most likely to react to a marketing communication and to identify the factors that influence them. The target segment of users is obtained by clustering. The classification model uses a decision tree algorithm to predict whether users will submit a transaction, with an accuracy of 75%. The results are useful for optimizing marketing and business strategy.
9	Churn inom SaaS : En fallstudie om betydelsefulla kundattribut inom ett SaaS-företag med B2B kunder / Churn in SaaS : A case study of significant customer attributes in a SaaS company with B2B customers Jonson, Filip, Hedvall, Love January 2021 (has links) Software as a service (SaaS) är en affärsmodell som syftar till att användaren prenumererar på en mjukvara. Mjukvaran levereras över internet vilket medför att användaren inte behöver tänka på mjukvaruuppdateringar och driftunderhåll av servrar. Churn innebär att användaren avslutar sin prenumeration hos ett företag och därmed slutar vara kund. Förvärv av nya kunder är en dyr process, som kan kosta upp till fem gånger mer än att sälja till en redan befintlig kund. Tidigare forskning inom churn har främst varit koncentrerad till telekombolag. Undersökningar har specialiserats på maskininlärningsmetoder för att studera churn. Tidigare studier beskriver att det finns begränsad forskning för churn inom SaaS-företag med B2B kunder. De studier som har undersökt churn har främst varit fallstudier där olika kundattribut har studerats utifrån generella- och beteendekundattribut. Studien har i samarbete med ett SaaS-företag undersökt flera kundattribut på ett lönehanteringssystem. Syftet har varit att undersöka vilka kundattribut som är intressanta att ta ut statistik på när churn studeras. Ovanstående ska medföra att det studerade företaget kan införskaffa insikter och arbeta mer med datadrivna beslut. För att förstå vilka kunder som väljer att avsluta sin prenumeration behövs data samlas in om kunderna. En kvantitativ fallstudie utfördes genom att undersöka flera kundattribut hos de kunder som har churnat. Undersökningen utfördes med modellen CRISP-DM för att genomföra dataanalysen på ett systematiskt tillvägagångsätt. Undersökningen studerade kundattribut utifrån variablerna generella- och beteendekundattribut. Dataanalysen genomfördes med hjälp av Python-kod och resultatet presenterades med grafer och tabeller. 
Studiens resultat visade att vissa värden på följande kundattribut var överrepresenterade vid churn: Kundtyper, Bolagsform, Antal Anställda, Licenser, Antal skickade specifikationer och inloggning. Tidigare forskning har undersökt olika kundattribut och funnit att de kan behöva anpassas för det studerade företaget. / Software as a service (SaaS) is a business model in which the user subscribes to software. The software is delivered over the internet, which means that the user does not have to think about software updates or the operational maintenance of servers. Churn means that the user cancels their subscription with a company and thereby ceases to be a customer. Acquiring new customers is an expensive process, which can cost up to five times more than selling to an existing customer. Previous research on churn has mainly concentrated on the telecommunications industry, where churn has long been a problem for companies. Research has concentrated on machine learning methods for studying churn. Previous research notes that there are few studies of churn with SaaS as a business model. Studies of churn have mainly been case studies in which different attributes have been examined in terms of general and behavioral customer attributes. This study, in collaboration with a SaaS company, examined several customer attributes of a payroll management system. The purpose was to investigate which customer attributes are worth collecting statistics on when churn is studied. This should enable the studied company to gain insights and rely more on data-driven decisions. To understand which customers unsubscribe, data about the customers needs to be collected. A quantitative case study was performed by examining several customer attributes of the customers who have churned. The study was carried out with the CRISP-DM model in order to conduct the data analysis systematically. 
The study examined customer attributes in terms of general and behavioral customer attributes. The data analysis was performed using Python code and the results were presented with graphs and tables. The results of the study showed that certain values of the following customer attributes were overrepresented in churn: Customer types, Business type, Number of Employees, Licenses, Number of specifications sent and Login. Previous research has examined various customer attributes and found that they may need to be adapted for the studied company. Churn SaaS CRISP-DM General customer attributes Behavioral customer attributes Churn SaaS CRISP-DM Generella kundattribut Beteendekundattribut Information Systems
10	Evaluating Frameworks for Implementing Machine Learning in Signal Processing : A Comparative Study of CRISP-DM, SEMMA and KDD Dåderman, Antonia, Rosander, Sara January 2018 (has links) Machine learning is when a computer can learn from data and draw its own conclusions without being explicitly programmed to do so. To implement machine learning effectively and correctly, it is important to have a structured framework to follow. Today, there exist several different frameworks, but no framework is suited for all purposes of machine learning. This thesis evaluates three chosen frameworks, CRISP-DM, SEMMA and KDD, for the purpose of implementing machine learning in signal processing. This study was conducted at Saab AB in Järfälla. The specific problem area of signal processing that was evaluated in the thesis was radar warning systems. A hypothesis is that they could become more efficient with machine learning. To evaluate the chosen frameworks, it was studied what was demanded from a framework when implementing machine learning in the chosen problem area. The evaluation was done with a theoretical comparison where no implementations of the different frameworks were done. The frameworks were evaluated through an evaluation method created by the authors. The evaluation method was used for the purpose of finding a framework suitable for signal processing when developing the software for a radar warning system. The result is that CRISP-DM is the most well-suited of the three frameworks. This is because it originates from a business perspective, is distinct in how to use it and is easy to implement in an agile process like Scrum. / Maskininlärning är när en dator kan lära sig från data och dra egna slutsatser utan att specifikt vara programmerad att göra det. För att lyckas med att implementera maskininlärning på ett effektivt sätt så krävs det att man följer ett tydligt ramverk. Idag finns det många ramverk men inget som är lämpat för alla typer av maskininlärning. 
Denna rapport utvärderar tre valda ramverk: CRISP-DM, SEMMA och KDD. Detta med syftet att implementera maskininlärning i signalbehandling. Studien utfördes på Saab AB i Järfälla. Det specifika problemområde inom signalbehandling som utvärderades i rapporten var radarvarningssystem. En hypotes är att de kan bli mer effektiva med maskininlärning. För att utvärdera de valda ramverken så studerades vad som krävdes av ett ramverk för det valda problemområdet. Utvärderingen skedde genom en teoretisk jämförelse där ingen implementation av de olika ramverken genomfördes. Ramverken utvärderades genom en utvärderingsmetod skapad av författarna. Utvärderingsmetoden användes med syftet att finna ett ramverk som var lämpligt för signalbehandling vid utveckling av mjukvara för ett radarvarningssystem. Resultatet var att CRISP-DM var den mest lämpade metoden. Detta för att den utgår från ett affärsperspektiv, har tydliga riktlinjer hur den ska användas och att den enkelt kan implementeras i agila processer såsom Scrum. Radar Warning System Saab Machine Learning CRISP-DM SEMMA KDD Computer and Information Sciences Data- och informationsvetenskap
