Global ETD Search

61	Data-driven models for e-commerce sales predictions Guseva, Liubov January 2022 Future predictions have various applications, including stock prices, house market prices, and company sales. For sales, predictions can guide future expectations and suggest ways to cut costs. Due to the value of predictions, researchers have developed a plenitude of prediction algorithms. Still, many companies use simplistic prediction algorithms that fail to provide accurate results. Also, the vast number of existing algorithms makes it difficult to find the best algorithm for a specific data set. In this thesis, I predicted an e-commerce company’s future sales based on its historical transaction data with three different models: the machine learning algorithm LSTM, a forecasting library released by Facebook called Prophet, and a model that I developed inspired by Prophet, called the Average Sales Prediction (ASP) model. I compared these models to each other and to a benchmark model. The benchmark model I used is one of the simplistic algorithms that some companies currently use. It takes the mean value of the past month’s sales to predict the upcoming day. Using the mean absolute percentage error (MAPE), I found that LSTM had the best overall performance on these data, with a MAPE of 18%. The second-best performing model was ASP, which resulted in a MAPE of 26%. Finally, Prophet resulted in a MAPE of 31%. The results gauge the company’s future performance and will help it improve its sales through streamlined workloads and better warehouse and transportation planning. The models can be enhanced further for sales that, for example, depend on the weather or other external factors. Mathematical Analysis Matematisk analys
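The benchmark and the error measure described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis code: the function names are hypothetical, and "the past month" is assumed to mean the last 30 daily observations.

```python
from statistics import mean


def benchmark_forecast(daily_sales, window=30):
    """Predict the next day's sales as the mean of the last `window` days.

    Assumption: "the past month" is taken to be the last 30 daily values.
    """
    if len(daily_sales) < window:
        raise ValueError("need at least `window` days of history")
    return mean(daily_sales[-window:])


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def rolling_benchmark(daily_sales, window=30):
    """One-day-ahead benchmark forecasts for every day after the first `window` days."""
    return [
        benchmark_forecast(daily_sales[:t], window)
        for t in range(window, len(daily_sales))
    ]
```

A model's MAPE can then be compared with the benchmark's by calling `mape(daily_sales[30:], rolling_benchmark(daily_sales))` on the same evaluation period.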
62	Approximate Bayesian Learning of Partition Directed Acyclic Graphs / Approximativ bayesiansk inlärning av partitionerade acykliska grafer Amundsson, Karl January 2016 Partition directed acyclic graphs (PDAGs) are a model whereby the conditional probability tables (CPTs) are partitioned into parts with equal probability. In this way, the number of parameters that need to be learned can be significantly reduced so that some problems become more computationally feasible. PDAGs have been shown to be connected to labeled DAGs (LDAGs) and the connection is summarized here. Furthermore, a clustering algorithm is compared to an exact algorithm for determining a PDAG. To evaluate the algorithm, we use it on simulated data where the expected result is known. / Partitionerade riktade acykliska grafer (engelska: PDAGs) är en modell där tabeller över betingade sannolikheter partitioneras i delar med lika sannolikhet. Detta gör att antalet parametrar som ska bestämmas kan reduceras, vilket i sin tur gör problemet beräkningsmässigt enklare. Ett känt samband mellan PDAGs och betecknade riktade acykliska grafer (engelska: LDAGs) sammanfattas här. Sedan jämförs en klustringsalgoritm med en algoritm som exakt bestämmer en PDAG. Klustringsalgoritmens pålitlighet kollas genom att använda den på simulerad data där det förväntade resultatet är känt. Mathematical Analysis Matematisk analys
63	Forecasting foreign exchange rates with large regularised factor models / Prediktering av valutakursförändringar med stora regulariserade faktormodeller Welander, Jesper January 2016 Vector autoregressive (VAR) models for time series analysis of high-dimensional data tend to suffer from overparametrisation as the number of parameters in a VAR model grows quadratically with the number of included predictors. In these cases, lower-dimensional structural assumptions are commonly imposed through factor models or regularisation. Factor models reduce the model dimension by projecting the observations onto a common lower-dimensional subspace, decomposing the variables into common and idiosyncratic terms, and might be preferred when predictors are highly collinear. Regularisation reduces overfitting by penalising certain features of the model estimates and might be preferred when, for example, only a few predictors are assumed important. We propose a regularised factor model where factors are constructed by projection onto a common subspace and where the transition matrices in a time series model with the resulting factors are estimated with regularisation. By the subspace estimation we hope to uncover underlying latent factors that explain the predictor dynamics and the additional penalisation is used to encourage additional sparsity and to impose a priori structural knowledge into the estimate. We investigate unsupervised and supervised subspace extraction and extend earlier results on dynamic subspace extraction. Additionally, we investigate element-wise regularisation by the ridge and lasso penalties and two extensions of the lasso penalty that encourage structural sparsity. The performance of the model is tested by forecasting log returns of exchange rates.
/ Vektor autoregressiva (VAR) modeller för tidsserieanalys av högdimensionell data tenderar att drabbas av överparametrisering eftersom antalet parametrar i modellerna växer kvadratiskt med antalet inkluderade prediktorer. I dessa fall används ofta lägredimensionella strukturella antaganden genom faktormodeller eller regularisering. Faktormodeller reducerar modellens dimension genom att projicera observationerna på ett lägredimensionellt underrum av gemensamma faktorer och kan föredras om prediktorerna är kollineära. Regularisering minskar överanpassning genom att bestraffa vissa egenskaper hos modellens estimerade parametrar och kan föredras när exempelvis endast ett mindre antal prediktorer antas betydande. Vi föreslår en regulariserad faktormodell där prediktorerna projiceras på ett gemensamt underrum för att skapa faktorer och där övergångsmatriserna i en tidsseriemodell med de resulterande faktorerna estimeras med en bestraffande term. Den lägredimensionella projiceringen används för att hitta latenta faktorer som beskriver dynamiken i prediktorerna och den ytterligare regulariseringen används för att premiera gleshet och a priori kunskap om modellens struktur. Både övervakade och oövervakade metoder undersöks för att estimera det gemensamma underrummet och vi generaliserar tidigare resultat om dynamisk estimering av underrum. Dessutom undersöks elementvis regularisering genom bestraffningstermerna ridge och lasso samt två varianter av lasso som premierar strukturell gleshet. Modellens prestanda testas genom att prediktera logaritmerade valutakursförändringar. Mathematical Analysis Matematisk analys
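Two ideas from the abstract above lend themselves to a small worked example: the quadratic growth of the VAR parameter count, and the way a lasso-type penalty produces exact zeros. The sketch below is illustrative only. The function names are hypothetical, and the soft-thresholding rule is the closed-form lasso solution for a single coefficient under an orthonormal design, which is a simplifying assumption and not necessarily the estimator used in the thesis.

```python
def var_coefficient_count(n_series, n_lags=1, intercept=True):
    """Number of free mean parameters in a VAR(n_lags) model with n_series variables.

    Each lag contributes an n_series x n_series transition matrix, so the
    count grows quadratically in n_series.
    """
    count = n_lags * n_series ** 2
    if intercept:
        count += n_series
    return count


def soft_threshold(coefficient, penalty):
    """Lasso shrinkage of one coefficient (assumes an orthonormal design).

    Coefficients with absolute value at most `penalty` are set exactly to
    zero, which is how the lasso encourages sparsity.
    """
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0
```

For example, doubling the number of series from 10 to 20 in a one-lag model without intercept raises the coefficient count from 100 to 400, while `soft_threshold(0.3, 0.5)` removes a small coefficient entirely.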
64	Optimal portfolio allocation by the martingale method in an incomplete and partially observable market / Optimal portföljallokering genom martingalmetoden i en inkomplett och partiellt observerad marknad Karlsson, Emil January 2016 In this thesis, we consider an agent who wants to maximize the expected utility of his terminal wealth with respect to the power utility by the martingale method. The assets that the agent can allocate his capital to are assumed to follow a stochastic differential equation and exhibit stochastic volatility. The stochastic volatility assumption makes the market incomplete, and therefore the martingale method does not have a unique solution. We resolve this issue by including fictitious assets that complete the market and solve the allocation problem in the completed market. From the optimal allocation in the completed market, we adjust the drift parameter of the fictitious assets so that our allocation does not include the fictitious assets in the portfolio strategy. We also consider the case when the assets have stochastic drift and the agent can only observe the price process, which makes the market information partially observable for the agent. Explicit results are presented for the fully and partially observable cases, and a feedback solution is obtained in the fully observable case when the asset and volatility are assumed to follow the Heston model. / I denna uppsats har vi en individ som vill maximera sin förväntade nytta med avseende till en potens nyttofunktion med martingal metoden. De tillgångar som agenten kan fördela sitt kapital till antas följa en stokastisk differentialekvation och dessa tillgångar antar stokastisk volatilitet. Den stokastiska volatilitet göra att marknaden blir ofullständig och implicerar att martingal metoden inte kommer anta en unik lösning.
Vi löser det problemet genom att inkludera fiktiva tillgångar som kompletterar marknaden och löser allokerings problem med martingal metoden i den nya fiktiva marknaden. Från den optimala allokeringen i den fiktiva marknaden, kommer vi att justera drift parametrarna för de fiktiva tillgångarna så att vår allokering inte inkluderar dem i portföljstrategin. Vi undersöker också fallet när tillgångarna har stokastisk drift och individen kan bara observera pris processen, vilket gör informationen på marknaden för individen delvis observerbar. Resultat presenteras för de fullt och delvis observerbara fallet och en lösning erhålls i det fullt observerbara fallet när tillgången och volatiliteten antas följa en Heston modell. Mathematical Analysis Matematisk analys
65	Combining Unsupervised and Supervised Statistical Learning Methods for Currency Exchange Rate Forecasting / Valutakursprediktion med hjälp av övervakade och oövervakade statistiska inlärningsmetoder Vasiljeva, Polina January 2016 In this thesis we revisit the challenging problem of forecasting currency exchange rates. We combine machine learning methods such as agglomerative hierarchical clustering and random forest to construct a two-step approach for predicting movements in the exchange rate between the Swedish krona and the US dollar. We use a data set with over 200 predictors comprised of different financial and macro-economic time series and their transformations. We perform forecasting for one week ahead with different parameterizations and find an average hit rate of 53%, with some of the parameterizations yielding hit rates as high as 60%. However, there is no clear indication that any combination of the methods and parameters outperforms the others in all of the tested cases. In addition, our results indicate that the two-step approach is sensitive to changes in the training set. This work has been conducted at the Third Swedish National Pension Fund (AP3) and KTH Royal Institute of Technology. / I denna uppsats analyserar vi det svårlösta problemet med att prognostisera utvecklingen för en valutakurs. Vi kombinerar maskininlärningsmetoder såsom agglomerativ hierarkisk klustring och Random Forest för att konstruera en modell i två steg med syfte att förutsäga utvecklingen av valutakursen mellan den svenska kronan och den amerikanska dollarn. Vi använder över 200 prediktorer bestående av olika finansiella och makroekonomiska tidsserier samt deras transformationer och utför prognoser för en vecka framåt med olika modellparametriseringar. En träffsäkerhet på i genomsnitt 53% erhålls, med några fall där en träffsäkerhet så hög som 60% kunde observeras.
Det finns emellertid ingen tydlig indikation på att det existerar en kombination av de analyserade metoderna eller parametriseringarna som är optimal inom samtliga av de testade fallen. Vidare konstaterar vi att metoden är känslig för förändringar i underliggande träningsdata. Detta arbete har utförts på Tredje AP-fonden (AP3) och Kungliga Tekniska Högskolan (KTH). Mathematical Analysis Matematisk analys
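The hit rate quoted in the abstract above is the share of forecasts that get the direction of the exchange-rate movement right. A minimal sketch follows, assuming that movements are given as signed changes and that a change of exactly zero counts as "not up"; the function name and that tie-breaking convention are illustrative assumptions, not taken from the thesis.

```python
def hit_rate(actual_changes, predicted_changes):
    """Fraction of periods in which the predicted direction matches the actual one.

    Assumption: a change of exactly zero is treated as "not up".
    """
    if not actual_changes or len(actual_changes) != len(predicted_changes):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for actual, predicted in zip(actual_changes, predicted_changes)
        if (actual > 0) == (predicted > 0)
    )
    return hits / len(actual_changes)
```

A hit rate of 0.5 corresponds to guessing the direction at random when up and down moves are equally likely, which is the natural reference point for the reported 53% average.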
66	Predicting Hourly Residential Energy Consumption using Random Forest and Support Vector Regression : An Analysis of the Impact of Household Clustering on the Performance Accuracy / Prognostisering av timvis elförbrukning i bostäder med Random Forest och Support Vector Regression : En analys av effekterna av klustring på noggrannheten Hedén, William January 2016 The recent increase of smart meters in the residential sector has led to large available datasets. The electricity consumption of individual households can be accessed in close to real time, which allows both the demand and supply side to extract valuable information for efficient energy management. Predicting electricity consumption should help utilities improve generation planning and demand-side management; however, this is not a trivial task, as consumption at the individual household level is highly irregular. In this thesis the problem of improving load forecasting is addressed using two machine learning methods, Support Vector Machines for regression (SVR) and Random Forest. For a customer base consisting of 187 households in Austin, Texas, predictions are made on three spatial scales: (1) individual household level (2) aggregate level (3) clusters of similar households according to their daily consumption profile. Results indicate that using Random Forest with K = 32 clusters yields the most accurate results in terms of the coefficient of variation. In an attempt to improve the aggregate model, it was shown that adding features describing the clusters’ historic load improved the performance of the aggregate model when using Random Forest with information based on the grouping into K = 3 clusters. The extended aggregate model did not outperform the cluster-based models. The work has been carried out at the Swedish company Watty. Watty performs energy disaggregation and management, allowing the energy usage of entire homes to be diagnosed in detail.
/ Den senaste tidens ökning av smarta elmätare inom bostadssektorn medför att vi har tillgång till stora mängder data. Hushållens totala elkonsumption är tillgänglig i nära realtid, vilket tillåter både tillgångssidan och efterfrågesidan att nyttja informationen för effektiv energihantering. Att förutsäga elförbrukningen bör hjälpa elbolag att förbättra planering för elproduktion och hantering av efterfrågesidan. Dock är detta inte en trivial uppgift, då elkonsumptionen på individuell husnivå är mycket oregelbunden. Denna masteruppsats föreslår att använda två välkända maskininlärningsalgoritmer för att lösa problemet med att förbättra lastprognoser, och dessa är Support Vector Machines för regression (SVR) och Random Forest. För en kundbas bestående av 187 hushåll i Austin, Texas, gör vi prognoser baserat på tre tillvägagångssätt: (1) enskilda hushåll (2) aggregerad nivå (3) kluster av liknande hushåll enligt deras dagliga förbrukningsprofil. Resultaten visar att Random Forest med K = 32 kluster ger de mest precisa resultaten i termer av variationskoefficienten. I ett försök att förbättra den aggregerade modellen visade det sig att genom att lägga till ytterligare prediktionsvariabler som beskriver klustrens historiska last, kunde precisionen förbättras genom att använda Random Forest med information från K = 3 olika kluster. Den förbättrade aggregerade modellen presterade inte bättre jämfört med de klusterbaserade modellerna. Arbetet har utförts vid det svenska företaget Watty. Watty utför energidisaggregering och energihantering, vilket gör att bostäders energianvändning kan analyseras i detalj. Mathematical Analysis Matematisk analys
67	Post Earnings-Announcement Drift : A Comparative Study on the Impact of Measure Calculations / Post Earnings-Announcement Drift : En jämförelsestudie av måttkonstruktionens påverkan Hagström, Hampus January 2016 This thesis investigates the phenomenon of post earnings-announcement drift, where good (bad) interim reports are followed by an upward (downward) drift in stock price. The main question is whether the specific construction of the drift measure has any impact on the drift. The results show that reported earnings are more suited than earnings per share as a measure of earnings. Stock price is seen to affect the decile sorting in many of the measures and as such, using the standard deviation of past expected earnings is recommended. No definitive conclusion can be drawn about the model of unexpected earnings for the standardized unexpected earnings measure. Standardized unexpected earnings outperform earnings announcement returns in the size of drift. / Examensarbetet undersöker fenomenet “Post-Earnings Announcement Drift” där bra (dåliga) kvartalsrapporter följs av generell uppgång (nedgång) i aktiepris. Huvudfrågeställningen är om en specifik beräkningsmetod av måttet har någon inverkan på resultatet, d.v.s. storleken eller riktning på avkastningen. Resultatet visar att periodens resultat/vinst är mer lämpat i beräkningssyfte än vinst per aktie. Aktiepris visar sig ha en påverkan vid indelning av deciler i många fall, och att använda standarddeviationen som standardiserande nämnare är rekommenderat. Ingen slutgiltig slutsats kan dras rörande modellen för oväntad vinst i fallet då måttet baseras på ”Standardized Unexpected Earnings”-modellen. Nämnda modell beskriver däremot det undersökta fenomenet bättre om man ser till storleken av aktieprisutveckling. Mathematical Analysis Matematisk analys
68	Dependence Modelling and Risk Analysis in a Joint Credit-Equity Framework Backman, Fredrik January 2015 This thesis is set in the intersection between separate types of financial markets, with emphasis on joint risk modelling. Relying on empirical findings pointing toward the existence of dependence across equity and corporate debt markets, a simulation framework intended to capture this property is developed. A few different types of models form building blocks of the framework, including stochastic processes describing the evolution of equity and credit risk factors in continuous time, as well as a credit rating based model, providing a mechanism for imposing dependent credit migrations and defaults for firms participating in the market. A flexible modelling framework results, proving capable of generating dependence of varying strength and shape, across as well as within studied markets. Particular focus is given to the way markets interact in the tails of the distributions. By means of simulation, it is highlighted that dependence as produced by the model tends to spread asymmetrically, with simultaneously extreme outcomes occurring more frequently in lower than in upper tails. Attempts to fit the model to observed market data featuring historical stock index and corporate bond index values are promising, as both the marginal distributions and the dependence connecting the investigated asset types appear largely replicable, although we conclude that further validation remains necessary. Mathematical Analysis Matematisk analys
69	Modeling and Forecasting Stock Index Returns using Intermarket Factor Models : Predicting Returns and Return Spreads using Multiple Regression and Classification Tingstrom, Emil January 2015 The purpose of this thesis is to examine the predictability of stock indices with regression models based on intermarket factors. The underlying idea is that there is some correlation between past price changes and future price changes, and that models attempting to capture this could be improved by including information derived from correlated assets to make predictions of future price changes. The models are tested using the daily returns from Swedish stock indices and evaluated from a portfolio perspective and by their statistical significance. Prediction of the direction of the price is also tested by support vector machine classification on the OMXS30 index. The results indicate that there is some predictability in the market, in disagreement with the random walk hypothesis. / Syftet med denna uppsats är att undersöka förutsägbara tendenser hos aktieindex med regressionsmodeller baserade på intermarket-faktorer. Den bakomliggande idén är att det existerar en viss korrelation mellan föregående prisrörelser och framtida prisrörelser, och att modeller som försöker fånga det kan förbättras genom att inkludera information från korrelerade tillgångar för att förutspå framtida prisförändringar. Modellerna testas med dagliga data på svenska aktieindex och utvärderas från ett portföljperspektiv och deras statistiska signifikans. Förutsägelser av riktningen hos priset testas också genom klassifikation med en Stödvektormaskin på OMXS30-index. Resultaten indikerar att det finns vissa förutsägbara tendenser i motsats till hypotesen om slumpmässiga aktiepriser. Mathematical Analysis Matematisk analys
70	Estimating the intrinsic dimensionality of high dimensional data / Metoder för estimering av effektiv dimension hos högdimensionella data Winiger, Joakim January 2015 This report presents a review of some methods for estimating what is known as intrinsic dimensionality (ID). The principle behind intrinsic dimensionality estimation is that it is frequently possible to find some structure in data which makes it possible to re-express the data using fewer coordinates (dimensions). The main objective of the report is to solve a common problem: given a (typically high-dimensional) dataset, determine whether some of its dimensions are redundant, and if so, find a lower-dimensional representation of it. We introduce different approaches for ID estimation, motivate them theoretically and compare them using both synthetic and real datasets. The first three methods estimate the ID of a dataset while the fourth finds a low-dimensional version of the data. This is a useful order in which to organize the task: given an estimate of the ID of a dataset, construct a simpler version of the dataset using this number of dimensions. The results show that it is possible to obtain a remarkable decrease in the number of dimensions of high-dimensional data. The different methods give similar results despite their different theoretical backgrounds and behave as expected when used on synthetic datasets with known ID. / Denna rapport ger en genomgång av olika metoder för skattning av inre dimension (ID). Principen bakom begreppet ID är att det ofta är möjligt att hitta strukturer i data som gör det möjligt att uttrycka samma data på nytt med ett färre antal koordinater (dimensioner). Syftet med detta projekt är att lösa ett vanligt problem: given en (vanligtvis högdimensionell) datamängd, avgör om antalet dimensioner är överflödiga, och om så är fallet, hitta en representation av datamängden som har ett mindre antal dimensioner.
Vi introducerar olika tillvägagångssätt för skattning av inre dimension, går igenom teorin bakom dem och jämför deras resultat för både syntetiska och verkliga datamängder. De tre första metoderna skattar den inre dimensionen av data medan den fjärde hittar en lägre-dimensionell version av en datamängd. Denna ordning är praktisk för syftet med projektet, när vi har en skattning av den inre dimensionen av en datamängd kan vi använda denna skattning för att konstruera en enklare version av datamängden som har detta antal dimensioner. Resultaten visar att för högdimensionell data går det att reducera antalet dimensioner avsevärt. De olika metoderna ger liknande resultat trots deras olika teoretiska bakgrunder, och ger väntade resultat när de används på syntetiska datamängder vars inre dimensioner redan är kända. Mathematical Analysis Matematisk analys
