Global ETD Search

31	Predikce výsledků hokejových utkání pomocí data mining modelu / Ice Hockey Match Prediction Using Data Mining Model Matuš, Martin January 2014 This thesis focuses on creating and comparing prediction models for ice hockey matches, with an emphasis on world championship matches. The first part collects the theoretical knowledge needed to solve this problem, and the second applies it. The model creation approach follows the CRISP-DM data mining methodology, which also defines several chapters of this work. As input data for the models I used performance statistics of individual ice hockey players, which led me to implement a script that automatically downloads and aggregates player data from the Internet. The downloaded data were arranged to represent the ice hockey matches played during the championships (team A consisting of players X against team B consisting of players Y), with the result of the match added to each data row. The data were also analyzed to detect any quality issues prior to model creation and transformed into an integrated view. Result assessment consists of two parts. The first is a technical evaluation of the models using data from the testing data set; it also points out the practical usefulness of the models. The second compares the results with betting odds, which gives the business relevance of the model. This part uses open source data about the betting odds listed for the corresponding matches. Finally, the resulting model is used to predict matches of the group phase of the world championship taking place in Prague in 2015.
32	Využití systému LISp-Miner při analýze faktorů ovlivňujících dominanci sinic ve fytoplanktonu / Utilization of System LISp-Miner in the Analysis of the Factors Influencing the Dominance of Cyanobacteria in Phytoplankton Hlaváčová, Tereza January 2013 The aim of this work is to describe the steps involved in answering analytical questions with the LISp-Miner system on data from water analyses of 12 ponds in South Bohemia from 2007 to 2012. The analytical questions focus primarily on cyanobacteria and are based on instructions from the data owner, Povodí Vltavy, státní podnik. Besides describing the application of the KL-Miner, CF-Miner and 4ft-Miner procedures to the data, the work aims to prepare an automated process based on the steps taken while using these procedures. The theoretical part summarizes the basic concepts and principles of association rules and the GUHA method. The practical part follows the CRISP-DM methodology. The result is a proposed automated process for finding interesting rules in hydrobiological and hydrochemical data, together with a set of recommendations for better use of the database for KDD, including proposals for how to modify and prepare the data.
33	Analýza reálných dat produktové redakce Alza.cz pomocí metod DZD / Analysis of real data from Alza.cz product department using methods of KDD Válek, Martin January 2014 This thesis deals with data analysis using methods of knowledge discovery in databases (KDD). The goal is to select appropriate methods and tools for a specific project based on real data from the Alza.cz product department. The data are analyzed using association rules and decision rules in LISp-Miner and decision trees in RapidMiner, following the CRISP-DM methodology. The thesis is divided into three main sections. The first section is a theoretical summary of KDD: it defines the basic terms and describes the types of KDD tasks and methods. The second section introduces the CRISP-DM methodology. The practical part first introduces the company Alza.cz and its goals for this task. It then describes the basic structure of the data and their preparation for the next step (data mining). In conclusion, the results are evaluated and their possible uses are outlined.
34	Reálná úloha dobývání znalostí / The Real Knowledge Discovery Task Kolafa, Ondřej January 2012 The major objective of this thesis is to perform a real data mining task: classifying holders of term deposit accounts. The task uses anonymous data on bank customers with a low funds position. In accordance with the CRISP-DM methodology, the work proceeds through these steps: business understanding, data understanding, data preparation, modeling, evaluation and deployment. The RapidMiner application is used for modeling. The methods and procedures used in the task are described in the theoretical part, which introduces basic concepts of data mining with special regard to the CRM segment, the CRISP-DM methodology, and techniques suitable for this task. Because the proportions of long-term account holders and non-holders differed, the data set had to be balanced in favour of holders. In the final stage, twelve models were built. According to the chosen criteria (area under the curve and F-measure), the two best models (logistic regression and Bayesian network) were selected. The last stage of the data mining process discusses possible real-world use. This stage is developed only in the form of recommendations, because it cannot be applied to the real situation.
35	Automatizace dataminingového procesu v datech o dopravních nehodách v Londýně / Automation of a data mining process in the road accidents data from London by the LISp-Miner system Soukup, Tomáš January 2015 This thesis focuses on automated data mining and describes the steps involved in answering analytical questions with the LISp-Miner system on data containing road accident records. The analytical tasks were created primarily from domain knowledge of road accident statistics in Great Britain and from a previous analysis in my semester project. The aim of this thesis is to create an automated data mining process that analyzes the input data by applying the 4ft-Miner, Ac4ft-Miner and SD4ft-Miner procedures and looks for new knowledge for every single year of the analyzed period. The implementation language is LMCL, which makes it possible to use the functionality of the LISp-Miner system in an automated way. The created scripts could be used to analyze another dataset with the same structure or, with some manual changes to the initial parameters, quite different data.
36	Datamining - theory and it's application / Datamining - teorie a praxe Popelka, Aleš January 2012 This thesis deals with the technology called data mining. It first describes data mining as an independent discipline, and then its processing methods and most common uses. Data mining is then explained with the help of methodologies that describe all parts of the process of knowledge discovery in databases: CRISP-DM and SEMMA. The purpose of the study is to present new data mining methods and particular algorithms: decision trees, neural networks and genetic algorithms. These serve as a theoretical introduction, which is followed by a practical application: searching for the causes of meningoencephalitis development in a sample of patients. Decision trees in the Clementine system, one of the top data mining tools, were used for the analysis.
37	An exploratory study of manufacturing data and its potential for continuous process improvements from a production economical perspective Todorovac, Kennan, Wiking, Nils January 2021 Background: Continuous improvements in production are essential in order to compete on the market. To be an active competitor, companies need to know their strengths and weaknesses and to improve and develop their production continually. Today process industries generate enormous volumes of data, and data are considered a valuable source for companies to find new ways to boost the productivity and profitability of their operations. Data Mining (DM) is the process of discovering useful patterns and trends in large data sets. Several authors have pointed out data mining as a good data analysis process for manufacturing because of the large amount of data generated and collected from production processes. In manufacturing, DM has two primary goals: descriptive, with a focus on discovering patterns that describe the data, and predictive, where a model is used to determine future values of important variables. Objectives: The objective of this study was to gain a deeper understanding of how data collected from production can lead to insights about potential production economic improvements by following the CRISP-DM methodology. In particular, for the chosen production line, the study examined whether replenishment durations differ between procedures. Duration in this study is the time the line is halted during a material replenishment. The procedures in question are single-replenishment versus double-replenishment. The study also investigated whether the replenishment duration differs depending on which shift team carried out the replenishment procedure and at what shift time. Methods: In this study the CRISP-DM methodology was used to structure the data collected from the case company. 
The data was primarily historical data from a continuous production process. To verify the objective of the study, three hypotheses derived from the objective were tested using a t-test and a Bonferroni test. Results: The results showed that the duration of a double-replenishment is lower than that of two single-replenishments. Further results showed that there is a significant difference in the single-replenishment duration between the different shift times and between the different working teams. The interpretation of the results is that, in the short term, implementing double replenishments may reduce the throughput time and possibly also the lead time. Conclusions: This study could contribute knowledge for others who seek a way to use data to detect information or gain deeper knowledge about a continuous production process. The findings could be of specific interest to cable manufacturers and, in general, to continuous process manufacturers. Further conclusions are that time-based competition is one way of increasing competitive advantage in the market. By using data generated in manufacturing, it is possible to analyse and find valuable information that can contribute to continuous process improvements and increase competitive advantage.
Keywords: Process improvement; Manufacturing data; CRISP-DM; Lead time; Throughput time. Subject: Business Administration
38	Försäljningsprediktion : en jämförelse mellan regressionsmodeller / Sales prediction : a comparison between regression models Fridh, Anton, Sandbecker, Erik January 2021 Today, many companies in different industries, large and small, want to predict their sales. Among other reasons, they want to know how many products to buy or manufacture, which products to invest in over others, and which goods are good investments in the short term and which in the long term. In the past, this has been done with intuition and statistics: most people know that ski jackets do not sell well in the summer and that beach products do not sell well in the winter. This is a simple example, but what happens when complexity increases and there are a large number of products and stores? Machine learning makes a problem like this easier to manage. A machine learning algorithm is applied to a time series, which is a data set of ordered observations made at different times during a certain period. In this study, the time series is the sales of different products sold in different stores, and sales are to be predicted on a monthly basis. The time series is a dataset from Kaggle.com called "Predict Future Sales". The algorithms used in this study to handle this time series problem are XGBoost, MLP and MLR. In previous research, XGBoost, MLR and MLP have performed well on similar problems, focused on, among other things, car sales, availability and demand for taxis, and bitcoin prices. All algorithms performed well according to the evaluation metrics used in those studies, and this study uses the same metrics: R², MAE, RMSE and MSE. These measures are used in the results and discussion chapters to describe how well the algorithms perform. The main research question of the study is therefore: which of the algorithms MLP, XGBoost and MLR performs best according to R², MAE, RMSE and MSE on the time series "Predict Future Sales"? The time series is processed with a well-known approach called CRISP-DM, whose steps are followed; these include data understanding, data preparation and modeling. This method ultimately leads to the results, where the outcomes of the models created through CRISP-DM are presented. In the end, MLP achieved the best results according to the metrics, followed by MLR and XGBoost. MLP got an RMSE of 0.863, MLR 1.233 and XGBoost 1.262.
Keywords: Time series; XGBoost; MLP; MLR; Evaluation Metrics; Crisp-DM; "Predict Future Sales"; Machine Learning; Regression; Sales Prediction. Subject: Computer and Information Sciences
39	An investigation of the relationship between online activity on Studi.se and academic grades of newly arrived immigrant students : An application of educational data mining Menon, Akash, Islam, Nahida January 2017 This study analyzes the impact of an online educational resource on academic performance among newly arrived immigrant students in grades six to nine of the Swedish school system. The study focuses on the web-based educational resource Studi.se, made by Komplementskolan AB. The aim of the study was to investigate the relationship between academic performance and the use of Studi.se. Another purpose was to see which other factors can affect academic performance. The study used the data mining process Cross-Industry Standard Process for Data Mining (CRISP-DM) to understand and prepare the data and then to create a regression model, which is evaluated. The regression model tries to predict the dependent variable, grade, from the independent variables Studi.se activity, gender and years in Swedish schools. The data set includes the grades in mathematics, physics, chemistry, biology and religion of newly arrived students from six municipalities in Sweden that have access to Studi.se, as well as metrics of the students' activity on Studi.se. The results show a negative correlation between grade and the gender of the student across all subjects. In this report, the negative correlation means that female students perform better than male students. Furthermore, there was a positive correlation between the number of years a student has been in the same school and their academic grade. The study could not establish a statistically significant relationship between activity on Studi.se and the students' academic grades. Additional explanatory independent variables are needed to build a predictive model, and regression models other than multiple linear regression should be investigated. In the sample, a majority of the students have little or no activity on Studi.se despite having free access to the resource through the municipality.
Keywords: Educational data mining (EDM); data mining (DM); CRISP-DM; Statistical analysis; Multiple linear regression; null hypothesis and level of significance. Subject: Computer and Information Sciences
40	AN INVESTIGATION INTO SPECIFIC SEMINAL PLASMA PROTEINS AND THEIR EFFECT ON THE INNATE IMMUNE RESPONSE TO BREEDING IN THE MARE Fedorka, Carleigh Elizabeth 01 January 2017 The mare experiences a transient innate immune response to breeding, the resolution of which is crucial for optimal fertility. The majority of mares are able to modulate this inflammation in a timely fashion, but a subpopulation exists which fails to do so and is considered susceptible to persistent breeding-induced endometritis (PBIE). Seminal plasma has been shown to modulate aspects of this inflammation. Recently, two seminal plasma proteins have garnered interest for their immune modulating properties: cysteine-rich secretory protein-3 (CRISP-3) and lactoferrin. These proteins have been found to alter the binding between sperm and neutrophils based on sperm viability in vitro, but minimal work has evaluated their effect on endometrial mRNA expression of cytokines and inflammation in response to breeding. Experiments were performed to analyze the expression of equine CRISP-3. Found to be primarily synthesized in the ampulla of the vas deferens and to a lesser extent in the vesicular gland, CRISP-3 expression was only seen in the postpubertal stallion. Due to the effect of sperm viability on protein function in vitro, varying sperm populations were analyzed for their effect on gene expression in the uterus. It was determined that viable sperm suppressed the gene expression of the inflammatory modulating cytokine interleukin-6 (IL-6) in comparison to dead sperm. Next, the effect of CRISP-3 and lactoferrin on endometrial gene expression in the normal and susceptible mare was investigated. Neither protein had a significant effect on the mRNA expression of inflammatory cytokines in the normal mares at six hours post-breeding. In contrast, lactoferrin was found to significantly suppress the expression of the pro-inflammatory cytokine tumor necrosis factor (TNF)-α in susceptible mares. 
Due to this, lactoferrin was further analyzed as an immunomodulant for the treatment of PBIE. Susceptible mares were infused with varying doses of lactoferrin at six hours post-breeding. Although not in a dose-dependent fashion, lactoferrin was found to decrease both fluid retention and neutrophil migration, in addition to suppressing the expression of the pro-inflammatory cytokine interferon gamma (IFNγ) and increasing the gene expression of the anti-inflammatory cytokine interleukin-1 receptor antagonist (IL-1RN). In conclusion, CRISP-3 expression occurs in secretory aspects of the male reproductive tract, and appears to be upregulated after sexual maturation. Viability of spermatozoa affects the immune response to breeding and should be taken into consideration for experimental design and interpretation of data. The seminal plasma proteins CRISP-3 and lactoferrin have minimal effect on endometrial gene expression in normal mares, but lactoferrin suppresses the expression of TNF in susceptible mares. Finally, lactoferrin was found to function as a potent anti-inflammatory for the persistent inflammation seen in susceptible mares when administered post-breeding. This protein should be further investigated as a potential therapeutic for the treatment of persistent breeding-induced endometritis.
Keywords: Equine; Endometritis; Lactoferrin; CRISP-3; Uterus; Cytokine. Subjects: Animal Studies; Large or Food Animal and Equine Medicine; Veterinary Physiology
