Global ETD Search

791	Statistical Modeling of Dynamic Risk in Security Systems / Statistisk modellering av dynamisk risk i säkerhetssystem Singh, Gurpreet January 2020 Big data has been used regularly in finance and business to build forecasting models. It is, however, a relatively new concept in the security industry. This study predicts the technology-related alarm codes that will sound in the coming 7 days at location $L$ by observing the past 7 days. Logistic regression and neural networks are applied to solve this problem. Because the problem is multi-label in nature, logistic regression is applied in combination with binary relevance and classifier chains. The models are trained on data that has been labeled with two separate methods: the first labels the data by observing only location $L$, and the second considers $L$ and its surroundings. As the problem is multi-label, the labels are likely to be unbalanced, so a resampling technique, SMOTE, and random over-sampling are applied to increase the frequency of the minority labels. Recall, precision, and F1-score are calculated to evaluate the models. The results show that the second labeling method performs better for all models and that the classifier chains and binary relevance models perform similarly. Resampling the data with SMOTE increases the macro-average F1-scores for the binary relevance and classifier chains models; however, the neural network's performance decreases. SMOTE also performs better than random over-sampling. The neural network model outperforms the other two models on all methods and achieves the highest F1-score. / Big data has been used regularly in finance to build forecasting models; it is, however, a relatively new concept in the security industry. This study predicts which alarm codes will sound during the coming 7 days at location $L$ by observing the past 7 days. 
Logistic regression and neural networks are used to solve this problem. Because the problem is multi-label in nature, logistic regression is applied in combination with binary relevance and classifier chains. The models are trained on data that has been annotated with two separate methods. The first method annotates the data by observing only location $L$, and the second considers $L$ and its surroundings. Because the problem is multi-label, the annotation is likely to be unbalanced, so the resampling method SMOTE and random over-sampling are used to increase the frequency of minority labels. Recall, precision, and F1-score were measured to evaluate the models. The results show that the second annotation method performed better for all models and that classifier chains and binary relevance performed similarly. The binary relevance and classifier chains models trained on data resampled with SMOTE gave a higher macro-average F1-score, whereas the performance of the neural network dropped. SMOTE also performed better than random over-sampling. The neural network model outperformed the other two models on all methods and achieved the highest F1-score. Statistics applied mathematics machine learning forecasting neural networks logistic regression resampling classifier chains binary relevance Statistik tillämpad matematik maskininlärning neurala nätverk logistisk regression Mathematics Matematik
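Entry 791 evaluates its multi-label models with macro-average F1-scores. The following is a minimal sketch of that metric, not the thesis's own code: the function names and the toy label matrices are hypothetical, and an F1 of 0 is assumed for a label with no true positives. It illustrates why a model that ignores a minority label scores poorly on the macro average even when the majority label is predicted perfectly.

```python
from typing import List, Sequence


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for one binary label (1 = alarm code occurs, 0 = it does not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Assumption: no true positives means F1 = 0 (avoids division by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Unweighted mean of the per-label F1-scores (rows = samples, columns = labels)."""
    n_labels = len(Y_true[0])
    scores = [
        f1_for_label([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Hypothetical toy data: label 0 is frequent, label 1 is a minority label.
Y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
Y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]  # the minority label is never predicted
print(macro_f1(Y_true, Y_pred))
```

Because each label contributes equally to the mean, raising the frequency of minority labels through resampling can move this score even when overall accuracy barely changes.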
792	Product Similarity Matching for Food Retail using Machine Learning / Produktliknande matchning för livsmedel med maskininlärning Kerek, Hanna January 2020 Product similarity matching for food retail is studied in this thesis. The goal is to find products that are similar but not necessarily of the same brand, which can be used as a replacement for a product that is out of stock or is not available in a specific store. The aim of the thesis is to examine which machine learning model is best suited to perform the product similarity matching. The product data used for training the models were name, description, nutrients, weight, and filters (labels, for example organic). Product similarity matching was performed pairwise, and the similarity between the products was measured by Jaccard distance for text attributes and relative difference for numeric values. Random Forest, Logistic Regression, and Support Vector Machines were tested and compared to a baseline. The baseline computed the Jaccard distance for the product names and classified each pair based on a threshold value of that distance. The result was measured by accuracy, F-measure, and AUC score. Random Forest performed best on all evaluation metrics, and Logistic Regression, Random Forest, and Support Vector Machines all performed better than the baseline. / This report studies product similarity matching for groceries. The goal is to find products that are similar but do not necessarily have the same brand, which can serve as a replacement for a product that is sold out or not sold in a specific store. The purpose of this report is to examine which machine learning model is best suited to product similarity matching. The product data used to train the models were name, description, nutritional values, weight, and labeling (for example organic). 
The product matching was done pairwise, and similarity between the products was computed with the Jaccard index for text attributes and relative difference for numeric values. Random Forest, logistic regression, and Support Vector Machines were tested and compared against a baseline. In the baseline, the Jaccard index was computed only for the product names, and classification was done using a threshold value for the Jaccard index. The result was measured by accuracy, F-measure, and AUC. Random Forest performed best on all performance measures, and logistic regression, Random Forest, and Support Vector Machines all gave better results than the baseline. Product matching Machine Learning Random Forest Logistic Regression Support Vector Machines Produktmatchning maskininlärning Random Forest logistisk regression Support Vector Machines Probability Theory and Statistics Sannolikhetsteori och statistik
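The pairwise features and the name-based baseline described in entry 792 can be sketched as below. This is an illustration under stated assumptions, not the thesis's implementation: tokenizing on whitespace, normalizing the relative difference by the larger magnitude, the 0.5 threshold, and all function names are hypothetical choices.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens (assumed tokenization)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def relative_difference(x: float, y: float) -> float:
    """|x - y| scaled by the larger magnitude (one common definition; an assumption here)."""
    largest = max(abs(x), abs(y))
    if largest == 0:
        return 0.0
    return abs(x - y) / largest


def baseline_is_match(name_a: str, name_b: str, threshold: float = 0.5) -> bool:
    """Baseline classifier: a pair matches if the name distance is at most the threshold."""
    return jaccard_distance(name_a, name_b) <= threshold


# Hypothetical product pair: same product type, different labeling and weight.
print(jaccard_distance("organic whole milk 1l", "whole milk 1l"))
print(relative_difference(500.0, 400.0))
print(baseline_is_match("whole milk", "orange juice"))
```

In a learned model such as Random Forest, these distances would be the input features for each product pair rather than being thresholded directly.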
793	Modelling Non-life Insurance Policyholder Price Sensitivity: A Statistical Analysis Performed with Logistic Regression / Modellering av priskänslighet i sakförsäkring Hardin, Patrik, Tabari, Sam January 2017 This bachelor thesis within mathematical statistics studies the possibility of modelling the renewal probability for commercial non-life insurance policyholders. The project was carried out in collaboration with the non-life insurance company If P&C Insurance Ltd. at their headquarters in Stockholm, Sweden. The paper includes an introduction to underlying concepts within insurance and mathematics and a detailed review of the analytical process, followed by a discussion and conclusions. The first stages of the project were the initial collection and processing of explanatory insurance data and the development of a logistic regression model for policy renewal. An initial model was built, and modern methods of mathematics and statistics were applied in order to obtain a final model consisting of 9 significant characteristics. The regression model had a predictive power of 61%. This suggests that it is to a certain degree possible to predict the renewal probability of non-life insurance policyholders based on their characteristics. The results from the final model were ultimately translated into a measure of price sensitivity which can be implemented in both pricing models and CRM systems. We believe that price sensitivity analysis, if done correctly, is a natural step in improving the current pricing models in the insurance industry, and this project provides a foundation for further research in this area. / This bachelor's thesis in mathematical statistics examines the possibility of modelling the renewal rate for commercial non-life insurance customers. The work was carried out in collaboration with If Skadeförsäkring at its headquarters in Stockholm, Sweden. 
The thesis contains an introduction to underlying concepts in insurance and mathematics and a detailed overview of the project's analytical process, followed by a discussion and conclusions. The main parts of the project were the collection and processing of explanatory insurance data and the development and interpretation of a logistic regression model for the renewal rate. A first model was built, and modern methods in mathematics and statistics were applied to obtain a final regression model made up of 9 significant customer characteristics. The regression model had an explanatory power of 61%, which indicates that it is to a certain degree possible to explain the renewal rate of insurance customers from these characteristics. The results from the final model were finally translated into a price sensitivity measure, which enabled implementation in pricing models and CRM systems. We believe that price sensitivity analysis, if carried out correctly, is a natural step in the development of today's pricing models in the insurance industry, and this project lays a foundation for further studies in this area. Mathematical Statistics Regression Analysis Logistic Regression Generalized Linear Model Insurance Pricing Price Sensitivity Data Analysis Matematisk Statistik Regression Logistisk Regression Försäkringsprissättning Priskänslighet Dataanalys Computational Mathematics Beräkningsmatematik
794	How to identify downturns within an office submarket: A quantitative time series analysis of Stockholm CBD / Hur man identifierar nedgångar inom en kontorsmarknad Palmquist, Jacob January 2018 Over the last couple of years there has been a significant increase in demand for attractive office locations in Stockholm, consequently leading to all-time low office prime yields within the Central Business District (CBD) and raising warning signals of an overheated submarket. As the real estate market is crucial for the economy as a whole, it is essential to improve the understanding and predictability of future real estate cycles. This study produced three different logistic regression models with the purpose of identifying downturns in the office market of Stockholm CBD. The most successful model was able to predict 74% of the actual downturns occurring throughout 114 observed quarters between Q3 1989 and Q4 2017. The dependent downturn variable consists of prime yield, explained by variables on a national basis combined with submarket-specific variables. Another model contained variables regarding the confidence and expectations of tenants in Stockholm. However, that model was unsatisfactory, leading to this study’s suggestion of further research on fluctuations of demand related to the current characteristics of Stockholm CBD. / In recent years there has been a significant increase in demand for attractive office premises in Stockholm, which has resulted in record-low yield requirements within Stockholm Central Business District (CBD), indicating warning signals of an overheated submarket. Because the real estate market is crucial for the economy as a whole, it is important to improve the understanding and predictability of future real estate cycles. This study produced three different logistic regression models with the purpose of identifying downturns in the office market within Stockholm CBD. 
The most successful model was able to predict 74% of the actual downturns that occurred during 114 observed quarters between Q3 1989 and Q4 2017. The dependent downturn variable consists of prime yield, explained by variables on a national basis in combination with submarket-specific variables. Another model contained variables on the confidence and expectations of tenants in Stockholm. That model was unsatisfactory, however, which led this study to suggest further research on fluctuations in demand related to the current characteristics of Stockholm CBD. Forecasting Logistic regression Real estate cycles Office market Yield Stockholm CBD Prognoser Logistisk regression Fastighetscykler Kontorsmarknad Direktavkastningskrav Stockholm CBD Engineering and Technology Teknik och teknologier
795	Sexual Dimorphism Of The Posterior Pelvis Of The Robert J. Terry Anatomical Collection And The William M. Bass Donated Skeletal Collection Novak, Lauren M. 01 January 2010 Studies of sexual dimorphism of the sacrum have generally been conducted as part of broader population research or on living persons and cadavers, making the anthropological literature sparse. The greater sciatic notch and the preauricular sulcus of the ilium have both been found to show sexual dimorphism, although studies of these traits often define characteristics ambiguously and lack standardized measurements. This research was designed to reexamine and test the accuracy of standard scoring systems and measurements of the posterior pelvis used to determine sex and to establish new formulae combining traits and measurements to accurately determine sex using logistic regression analysis. A series of metric measurements and morphological scores were recorded for 104 males and 106 females of both European- and African-American ancestry from the William M. Bass and Terry Collections. In order to reexamine previous research conducted on the posterior pelvis, standard ratios of metric measurements were analyzed to determine ranges and cut-off values for males and females in this sample. The ratio of ala width to the maximum transverse diameter of the sacral base and the ratio of the length and width of the sciatic notch proved to be the most useful ratios in sex determination, though not as accurate as the formulae created using logistic regression. These data were also analyzed in SPSS using logistic regression to assess the usefulness of metric measurements and morphological scores of the posterior pelvis in sex determination. Using stepwise logistic regression, the combination of traits for both the sacrum and posterior ilium that is the most reliable and accurate for sex determination was identified. 
The values for these selected traits can be incorporated into the log odds formulas, which will classify an individual as male or female. The ultimate goal of this research was to provide physical anthropologists with logistic regression equations that can be used to estimate sex from the posterior ilium and sacrum. Two equations ranging in accuracy from 79-84% were developed to determine sex from the posterior pelvis. Bass William M. -- 1928- Ilium -- Sex differences Logistic regression analysis Sacrum -- Sex differences Terry Robert J. -- (Robert James) -- 1871-1966 Anthropology
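As background for the log odds formulas mentioned in entry 795, a fitted logistic regression generally takes the form below. The symbols $\beta_0, \dots, \beta_k$ (fitted coefficients) and $x_1, \dots, x_k$ (the selected measurements and scores) are generic notation, and the 0.5 cut-off is a common default assumed here; none of these are the coefficients, traits, or threshold reported in the thesis.

```latex
\[
\ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}}
\]
```

With $p$ read as the estimated probability of one sex category, an individual is assigned to that category when $p \geq 0.5$, which is equivalent to the log odds being non-negative.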
796	Florida School Indicator Report Data As Predictors Of High School Adequate Yearly Progress (AYP) Carr, John D 01 January 2011 The focus of this research was to identify variables reported in the 2008-2009 Florida School Indicator Report (FSIR) that had a statistical impact, positive or negative, on the likelihood that a school would achieve Adequate Yearly Progress (AYP) in reading or mathematics, using the logistic regression technique. This study analyzed four broad categories reported by the FSIR: academic, school, student, and teacher characteristics. FSIR and AYP data were collected for 468 Florida high schools that were categorized by the Florida Department of Education as presenting a comprehensive curriculum to grades 9-12 or grades 10-12. It was determined in this study that academic data associated with ACT results and the grade 11 FCAT Science were effective predictors of a school’s academic health in reading and mathematics. Student absenteeism showed the greatest impact on a school obtaining AYP in reading, while the percentage of students qualifying for free and disabled populations within a school showed the greatest impact on a school obtaining AYP in mathematics. Teachers teaching out of field were identified as having a negative influence on AYP in reading and mathematics, while a teacher’s experience was considered a positive influence on AYP in mathematics only. Further research is necessary to fully explore the use of logistic regression as a predictive tool at the state, school district, and school level. Academic achievement -- Florida Educational accountability -- Florida High schools -- Florida Logistic regression analysis -- Florida Mathematics -- Florida Reading -- Florida Education
797	Geological and Geochemical Controls on Non-Tuberculous Mycobacterium Transmission: Examples from Hawaii Robinson, Schuyler Thomas 01 June 2019 The opportunistic environmental microbes, non-tuberculous Mycobacterium (NTM), pose an increasing risk of disease and death in both immunodeficient and immunocompetent individuals in the USA and across the world. NTM lung disease is particularly prevalent in Hawaii, although the modes of NTM acquisition and transport in Hawaii are not fully understood. This study evaluated 149 soil and 50 water samples across the Hawaiian Islands to determine geochemical factors controlling NTM. Non-metric multidimensional scaling (NMDS) and principal component analyses (PCA) of modern soils show that variables such as Total Organic Carbon (TOC), pH, P, mafic silicate minerals, and Pb seem to control NTM presence, while transition metals and oxides such as TiO2, Zr, and Nb seem to control its absence, perhaps due to toxicity. Logistic regression modeling coupled with Kolmogorov-Smirnov testing supported that TOC and P could be used to explain the probability of NTM presence in modern soils. Kolmogorov-Smirnov, non-metric multidimensional scaling, and principal components analysis results suggest poor predictability of NTM presence in soils when evaluating mineralogy alone. The same statistical methods indicated that transition metals appeared to control NTM presence in stream water and that major cations and anions seemed to control NTM absence. However, additional bacterial stream data are needed to strengthen this finding. Additionally, an Oahu source water assessment and protection groundwater model was refined by including stream discharge data, including losses to the aquifer. NTM inhabits many environmental niches, although little is understood regarding the transport of NTM from the environment to indoor plumbing. However, transport from surface water to water-supply aquifers is likely important. 
This study analyzes groundwater flow from stream losses as a mechanism of NTM transport to water supplies. An updated MODFLOW groundwater model was developed for the north-east Oahu, Waimea River drainage. Results show hundreds of meters of lateral and tens of meters of vertical transport of NTM in 1-3 months. Additionally, geochemical modeling with Geochemist’s Workbench showed Fe oxy/hydroxides oversaturated in 100% of streams. Fe oxy/hydroxide affixed to NTM would potentially satisfy NTM’s preference for attachment and allow for colloidal transport through the aquifer. Mycobacteria soil chemistry logistic regression particle tracking Oahu Kilauea NTM HVO groundwater model disease PCA NMDS Life Sciences Microbiology Physical Sciences and Mathematics
798	Customer acquisition and onboarding at an online grocery company Borg, Ida January 2022 This master's thesis was carried out in collaboration with a Swedish online grocery company. The goal of the thesis is to investigate whether it is possible to explain the underlying factors that influence whether new customers are retained. Because of the difficulties of defining churn and retention in non-contractual settings, most of the literature is focused on contractual and subscription settings. There are a limited number of studies that try to predict customer churn in non-contractual businesses and even fewer that emphasize retention. This thesis aims to contribute to the field of retention in non-contractual business and also to highlight the assumptions and drawbacks of churn-related tasks. To achieve the goal of the thesis, a literature review is carried out together with two statistical learning approaches: a logistic regression model and an extreme gradient boosting model. The results show that it is possible to find the underlying factors that drive customers to be retained. The greatest drivers that could increase the probability of retaining new customers are the days between the first and second order, the second order value, and the total order value. / The thesis was carried out in collaboration with a Swedish online grocery company. The goal of the thesis is to investigate whether it is possible to explain the underlying factors that influence new customers to remain customers. Because of the difficulties of defining customer churn and customer retention in non-contractual businesses, most of the literature focuses on contractual and subscription settings. There are a limited number of studies that try to predict customer churn in non-contractual businesses and even fewer that focus on customer retention. 
This thesis aims to contribute to the field of customer retention in non-contractual businesses and also to highlight the assumptions and drawbacks of churn analyses. To achieve the goal of the thesis, a literature review is carried out together with two statistical learning methods: a logistic regression model and an extreme gradient boosting model. The results show that it is entirely possible to find the underlying factors that drive customers to stay. The greatest drivers that can increase the probability of customers being retained are the days between the first and second order, the second order value, and the total order value. retention churn customer acquisition customer onboarding logistic regression extreme gradient boosting model bibehållande av kunder kundbortfall kundförvärv kundonboarding logistisk regression extreme gradient boosting model Mathematics Matematik
799	Credit scoring using Logistic regression Hara Khanam, Iftho January 2023 In this thesis, we present the use of the logistic regression method to develop a credit scoring model using the raw data of 4447 customers of a bank. The customer data is collected under 14 independent explanatory variables and 1 default indicator. The objective of this thesis is to identify optimal coefficients. In order to clean the data, the raw data set was put through various data calibration techniques, such as Kurtosis, Skewness, and Winsorization, to eliminate outliers. On this winsorized dataset, LOGIT analysis is applied in two rounds with multiple statistical tests. These tests aim to estimate the significance of each independent variable and the model fitness. The optimal coefficients can be used to obtain the credit scores for new customers with a new data set and rank them according to their credit risk. Kurtosis Skewness Winsorization Logistic regression analysis Maximum likelihood estimation Newton–Raphson method T–ratio test P-value LR test Probability Theory and Statistics Sannolikhetsteori och statistik
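Entry 799 relies on winsorization to tame outliers before fitting the logistic regression. The sketch below shows one way that step could look; it is not taken from the thesis. The 10th/90th percentile cut-offs, the linear-interpolation percentile, the function names, and the sample values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Percentile by linear interpolation; pct is an integer from 0 to 100."""
    if not sorted_values:
        raise ValueError("percentile of an empty sequence is undefined")
    pos = pct * (len(sorted_values) - 1) / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def winsorize(values: Sequence[float], lower_pct: int = 10, upper_pct: int = 90) -> List[float]:
    """Clip values below/above the chosen percentiles to those percentile values."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical explanatory variable with one extreme value.
balances = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]
print(winsorize(balances))
```

Unlike trimming, winsorization keeps every customer record and only pulls the extreme values in to the chosen bounds, so the sample size available to the regression is unchanged.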
800	Deep Learning-Based Approach for Fusing Satellite Imagery and Historical Data for Advanced Traffic Accident Severity Sandaka, Gowtham Kumar, Madhamsetty, Praveen Kumar January 2023 Background. This research centers on tackling the serious global problem of traffic accidents. With more than a million deaths each year and numerous injuries, it’s vital to predict and prevent these accidents. By combining satellite images and data on accidents, this study uses a mix of advanced learning methods to build a model that can foresee accidents. This model aims to improve how accurately we predict accidents and understand what causes them. Ultimately, this could lead to better road safety, smoother maintenance, and even benefits for self-driving cars and insurance. Objective. The objective of this thesis is to create a predictive model that improves the accuracy of traffic accident severity forecasts by integrating satellite imagery and historical accident data and comparing this model with stand-alone data models. Through this hybrid approach, the aim is to enhance prediction precision and gain deeper insights into the underlying factors contributing to accidents, thereby potentially aiding in the reduction of accidents and their resulting impact. Method. The proposed method involves a literature review to find current image recognition models, followed by experimentation: training a Logistic Regression, Random Forest, SVM classifier, VGG19, and the hybrid model using the CNN and VGG19, and then comparing their performance using the metrics mentioned in the thesis work. Results. The performance of the proposed method is evaluated using various metrics, including precision, recall, F1 score, and confusion matrix, on a large dataset of labeled images. The results indicate that an accuracy of 81.7% is achieved in detecting traffic accident severity with the proposed approach, whereas the models built on structural data alone and on image data alone reached accuracies of 58.4% and 72.5%, respectively. 
The proposed method could potentially be used to detect safe and dangerous locations for accidents. Conclusion. Predictive modeling of traffic accidents is performed using three different types of datasets: structural data, satellite images, and a combination of both. The finalized architectures are an SVM classifier, VGG19, and a hybrid input model using CNN and VGG19. These models are compared in order to find the best-performing approach. The results indicate that the hybrid model has the best accuracy, 81.7%. Traffic accidents Satellite imagery Logistic Regression Random Forest SVC VGG19 hybrid model CNN self-driving cars best-performing algorithm literature review Computer Sciences Datavetenskap (datalogi)
