Global ETD Search

741	Understanding and Exploiting commodity currencies : A Study using time series Regression / Att förstå och utnyttja råvaruvalutor : En statistisk analys baserat på tidsserieregression Dehoky, Dylan, Sikorski, Edward January 2017 This thesis within Industrial Economics and Applied Mathematics examines the term commodity currency. The thesis analyses the characteristics and consequences of such a currency from a macroeconomic perspective while discussing previous studies on the matter. The applied mathematical statistics section examines the correlation between the currency and the commodities of the exporting country through a time series regression. The regression uses the currency as the dependent variable, with the commodities as the covariates. Furthermore, a trading strategy is developed to see whether a profit can be made on the foreign exchange market by following commodity price movements. / Det här kandidatexamensarbetet är skrivet inom industriell ekonomi och tillämpad matematik och granskar termen råvaruvaluta (commodity currency). Uppsatsen analyserar, utifrån ett makroekonomiskt perspektiv, karaktärsdragen och konsekvenserna av en sådan valuta, samtidigt som den diskuterar tidigare studier inom ämnet. Delen inom tillämpad matematik undersöker korrelationen mellan valutan och råvarorna som landet exporterar genom en tidsserieregression. Regressionen är baserad på valutan som responsvariabel samtidigt som råvarorna representerar kovariaterna. Den färdiga modellen används sedan i en handelsstrategi som försöker förutspå växelkursens rörelser genom att titta på råvarornas rörelser. Commodity currencies regression analysis time series regression Dutch disease and trading strategy Computational Mathematics Beräkningsmatematik
742	The Relationship Between Internet Connectivity and Labor Productivity : A study on the correlation between Internet connectivity and labor productivity in the European Union Agbakwuru, Blaise, Jiang, Ruiyang January 2022 The level of labor productivity differs among the European Union countries, especially when a developing country is compared with a more developed country in the EU. This is an issue because high labor productivity is a necessary condition for a developing economy to realize economic growth and further economic development. At the same time, the share of individuals in an economy with access to the internet (internet connectivity) indicates how developed the economy is in terms of information and communication technology (ICT). Accordingly, the purpose of this paper is to ascertain whether there is a positive relationship between high internet connectivity and labor productivity in EU countries. Political and entrepreneurial decision-makers can use these findings to decide how much attention or budget to devote to the ICT sector to improve labor productivity. Adam Smith and Karl Marx’s theory on labor productivity is used to understand the factors that affect labor productivity. A panel data analysis using a fixed-effect model and a pooled OLS regression model is applied in the study to estimate the relationship. The results indicate that internet connectivity does not have a significant impact on labor productivity; that is, there was not enough evidence that the two are positively correlated. Labor Productivity Internet Connectivity Fixed Effect Regression Model Pooled OLS Regression Model European Union Economics Nationalekonomi
743	Automated Regression Testing Approach To Expansion And Refinement Of Speech Recognition Grammars Dookhoo, Raul 01 January 2008 This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar. ART takes as input a wave sound file containing speech and a newly created XML grammar file. It then simultaneously executes two tests: one with the wave file and the new grammar file and the other with the wave file and the oracle grammar. The comparison of the two test results determines whether the test was successful. This allows rapid, exhaustive evaluation of additions to grammar files to guarantee forward progress as the complexity of the voice domain grows. The data used in this research to derive results were taken from the LifeLike project. However, the capabilities of ART extend beyond LifeLike. The results gathered have shown that using a person's recorded voice to do regression testing is as effective as having the person do live testing. A cost-benefit analysis, using two published equations, one for Cost and the other for Benefit, was also performed to determine whether automated regression testing is really more effective than manual testing. Cost captures the salaries of the engineers who perform regression testing tasks and Benefit captures revenue gains or losses related to changes in product release time. ART had a benefit of $21461.08, higher than the benefit of $21393.99 for manual regression testing.
Coupled with its excellent error detection rates, ART has proven to be very efficient and cost-effective in speech grammar creation and refinement. regression testing speech recognition audio regression tester ART Computer Sciences Engineering
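The comparison step described in entry 743 can be sketched as follows. This is only an illustration under stated assumptions, not ART's implementation (which used Microsoft's Speech API and C#): grammars are modelled as plain sets of phrases, and `recognize` is a hypothetical stand-in for a speech recognition engine that receives an already transcribed utterance instead of a wave file. The phrases are invented for the example.

```python
from typing import Optional, Set


def recognize(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Hypothetical stand-in for a recognition engine.

    Returns the normalized phrase if the grammar accepts it, else None.
    """
    phrase = utterance.strip().lower()
    return phrase if phrase in grammar else None


def regression_test(utterance: str, new_grammar: Set[str], oracle_grammar: Set[str]) -> bool:
    """Pass when the new grammar yields the same result as the oracle grammar."""
    return recognize(utterance, new_grammar) == recognize(utterance, oracle_grammar)


# Illustrative data only; these phrases are not taken from the thesis.
oracle = {"open the door", "close the door"}
extended = oracle | {"lock the door"}  # a refinement that keeps the old phrases
broken = {"close the door"}            # a refinement that lost a phrase

# Replay every utterance the oracle accepts against each candidate grammar.
results = {
    "extended": all(regression_test(u, extended, oracle) for u in oracle),
    "broken": all(regression_test(u, broken, oracle) for u in oracle),
}
print(results)
```

Replaying the same recorded inputs against both grammars is what lets additions be checked repeatedly without a live speaker.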
744	Distributionally Robust Learning under the Wasserstein Metric Chen, Ruidi 29 September 2019 This dissertation develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. The learning problems that are studied include: (i) Distributionally Robust Linear Regression (DRLR), which estimates a robustified linear regression plane by minimizing the worst-case expected absolute loss over a probabilistic ambiguity set characterized by the Wasserstein metric; (ii) Groupwise Wasserstein Grouped LASSO (GWGL), which aims at inducing sparsity at a group level when there exists a predefined grouping structure for the predictors, through defining a specially structured Wasserstein metric for DRO; (iii) Optimal decision making using DRLR informed K-Nearest Neighbors (K-NN) estimation, which selects among a set of actions the optimal one through predicting the outcome under each action using K-NN with a distance metric weighted by the DRLR solution; and (iv) Distributionally Robust Multivariate Learning, which solves a DRO problem with a multi-dimensional response/label vector, as in Multivariate Linear Regression (MLR) and Multiclass Logistic Regression (MLG), generalizing the univariate response model addressed in DRLR. A tractable DRO relaxation is derived for each problem, establishing a connection between robustness and regularization, and obtaining upper bounds on the prediction and estimation errors of the solution. The accuracy and robustness of the estimator are verified through a series of synthetic and real data experiments. The experiments with real data are all associated with various health informatics applications, an application area which motivated the work in this dissertation. In addition to estimation (regression and classification), this dissertation also considers outlier detection applications.
Statistics Distributionally robust optimization Grouped variable selection Health informatics Multivariate linear regression Regression Classification Wasserstein metric
745	Tree-Based Methods and a Mixed Ridge Estimator for Analyzing Longitudinal Data With Correlated Predictors Eliot, Melissa Nicole 01 September 2011 Due to recent advances in technology that facilitate acquisition of multi-parameter defined phenotypes, new opportunities have arisen for predicting patient outcomes based on individual specific cell subset changes. The data resulting from these trials can be a challenge to analyze, as predictors may be highly correlated with each other or related to outcome within levels of other predictor variables. As a result, applying traditional methods like simple linear models and univariate approaches such as odds ratios may be insufficient. In this dissertation, we describe potential solutions including tree-based methods, ridge regression, mixed modeling, and a new estimator called a mixed ridge estimator with expectation-maximization (EM) algorithm. Data examples are provided. In particular, flow cytometry is a method of measuring a large number of particle counts at once by suspending them in a fluid and shining a beam of light onto the fluid. This is specifically relevant in the context of studying human immunodeficiency virus (HIV), where there exists a great potential to draw from the rich array of data on host cell-mediated response to infection and drug exposures, to inform and discover patient level determinants of disease progression and/or response to anti-retroviral therapy (ART). The data sets collected are often high dimensional with correlated columns, which can be challenging to analyze. We demonstrate the application and comparative interpretations of three tree-based algorithms for the analysis of data arising from flow cytometry in the first chapter of this manuscript. Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1 infected persons starting antiretroviral therapy with CD4 count between 200 and 350 cells/μl.
The tree-based approaches, namely, classification and regression trees (CART), random forests (RF) and logic regression (LR), were designed specifically to uncover complex structure in high dimensional data settings. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. Specifically, application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T cell activation, may be a better predictor than any single T cell/innate cell subset analyzed. In the following chapter, tree-based methods are compared to each other via a simulation study. Each has its merits in particular circumstances; for example, RF is able to identify the order of importance of predictors regardless of whether there is a tree-like structure. It is able to adjust for correlation among predictors by using a machine learning algorithm, analyzing subsets of predictors and subjects over a number of iterations. CART is useful when variables are predictive of outcome within levels of other variables, and is able to find the most parsimonious model using pruning. LR also identifies structure within the set of predictor variables, and nicely illustrates relationships among variables. However, due to the vast number of combinations of predictor variables that would need to be analyzed in order to find the single best LR tree, an algorithm is used that only searches a subset of potential combinations of predictors. Therefore, results may be different each time the algorithm is used on the same data set. Next we use a regression approach to analyzing data with correlated predictors. Ridge regression is a method of accounting for correlated data by adding a shrinkage component to the estimators for a linear model.
We perform a simulation study to compare ridge regression to linear regression over various correlation coefficients and find that ridge regression outperforms linear regression as correlation increases. To account for collinearity among the predictors along with longitudinal data, a new estimator that combines the applicability of ridge regression and mixed models using an EM algorithm is developed and compared to the mixed model. We find from a simulation study comparing our mixed ridge (MR) approach with a traditional mixed model that our new mixed ridge estimator is able to handle collinearity of predictor variables better than the mixed model, while accounting for random within-subject effects that regular ridge regression does not take into account. As correlation among predictors increases, power decreases more quickly for the mixed model than MR. Additionally, type I error rate is not significantly elevated when the MR approach is taken. The MR estimator gives us new insight into flow cytometry data and other data sets with correlated predictor variables that our tree-based methods could not give us. These methods all provide unique insight into our data that more traditional methods of analysis do not offer. CART Flow cytometry Logic Regression Mixed model Random Forest Ridge regression Biostatistics
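The shrinkage idea behind ridge regression, as mentioned in entry 745, can be illustrated with a small sketch. It assumes the standard closed form β = (XᵀX + λI)⁻¹Xᵀy for two centered predictors without an intercept; the function name `ridge_2d`, the penalty value and the data are invented for illustration and do not come from the dissertation, and the mixed ridge estimator itself is not reproduced here.

```python
import math
from typing import List, Tuple


def ridge_2d(x1: List[float], x2: List[float], y: List[float], lam: float) -> Tuple[float, float]:
    """Solve (X'X + lam*I) beta = X'y for two predictors and no intercept."""
    a = sum(v * v for v in x1) + lam          # X'X[0][0] + lam
    b = sum(u * v for u, v in zip(x1, x2))    # X'X[0][1]
    d = sum(v * v for v in x2) + lam          # X'X[1][1] + lam
    r1 = sum(u * v for u, v in zip(x1, y))    # X'y[0]
    r2 = sum(u * v for u, v in zip(x2, y))    # X'y[1]
    det = a * d - b * b
    # Invert the 2x2 system explicitly.
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


# Made-up, centered data with two strongly correlated predictors.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.8, 2.1]
y = [-4.2, -1.9, 0.3, 1.7, 4.1]

ols = ridge_2d(x1, x2, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
ridge = ridge_2d(x1, x2, y, lam=1.0)  # lam > 0 adds the shrinkage component

norm_ols = math.hypot(*ols)
norm_ridge = math.hypot(*ridge)
print(ols, ridge)
```

With nearly collinear predictors the least-squares coefficients are unstable, and the penalty pulls the coefficient vector toward zero, which is why its length under ridge is smaller than under ordinary least squares.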
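746	Simulating the future of the Ifugao rice terraces through observations of the past / 過去の観測を踏まえたイフガオ棚田の将米予測 Estacio, Ian 25 September 2023 Kyoto University / Doctoral degree by coursework (new system) / Doctor of Global Environmental Studies / 甲第24954号 / 地環博第245号 / 新制||地環||49 (University Library) / Department of Environmental Management, Graduate School of Global Environmental Studies, Kyoto University / (Chief examiner) Professor 星野敏, Associate Professor 鬼塚健一郎, Professor 伊藤孝行 / Falls under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Global Environmental Studies / Kyoto University / DFAM Remote Sensing GIS Spatial regression Logistic regression Agent-based modeling Geomatics Land use/cover change 450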
747	Modeling yield and aboveground live tree carbon dynamics in oak-gum-cypress bottomland hardwood forests Aryal, Suchana 12 May 2023 The importance of bottomland hardwood (BLH) forests to support the economy through timber production and carbon sequestration is acknowledged; however, their full potential is yet to be explored. This study developed variable density yield models for BLH oak-gum-cypress forests along the US Gulf Coast and Lower Mississippi River Delta. The models, with an adjusted R² of 98% for cubic foot growing stock volume and 77% for Doyle board foot sawlog volume, are expected to be valuable tools for landowners and managers seeking to make informed decisions about BLH forest management. A carbon stock model was also developed, and carbon sequestration was explored based on basal area increment. The results showed potential for carbon sequestration with an average carbon stock of 30.56 tons/acre and a maximum average discounted present value of carbon accumulation of $15.94/ton/acre/year. This provides valuable information to managers and landowners willing to participate in carbon credit markets. growth and yield fuzzy linear regression multiple linear regression bottomland hardwoods aboveground carbon Forest Sciences Life Sciences
748	Index replication within Corporate Investment Grade - With implementation of Lasso regression in order to analyze the impact of key figures / Replikering av index inom Corporate Investment Grade - Med implementering av Lasso regression för att analysera effekterna av nyckeltal Faiqi, Shaida January 2021 The fixed income market is not as exploited as other markets and has a more complex structure compared with the equity market. On the other hand, demand for research on the fixed income market has increased, which in turn has created greater interest in studying the characteristics of holdings in the market. This work studies whether it is possible to replicate indices through requirements for credit rating, sectors and mathematical key figures such as duration, convexity, duration times spread (DTS) and option adjusted spread (OAS). Replication is done through linear programming in Python. By implementing lasso regression, this study examines whether it is possible to exceed the index return by relaxing the requirements for key figures that are not selected after selection of variables in the regression. The investment company Alfred Berg has provided relevant data for this report. The data consists of information on all assets included in the index EUR Investment grade (ER00) over the period 2017-2021. The result of the replication follows the index returns, with small deviations, and the lasso regression selects the key figures DTS and OAS in its model. It is difficult to exceed the index return by focusing only on the key figures DTS and OAS. Analysis of other key figures and variables selected by the lasso regression could possibly produce better results and is suggested as further work. / Räntemarknaden är inte lika exploaterad som andra marknader och har en mer komplex struktur jämfört med aktiemarknaden.
Däremot har man sett att efterfrågan på forskning för räntemarknaden har ökat, vilket i sin tur skapat ett större intresse att studera egenskaperna av innehaven på marknaden. Detta arbete studerar om det går att replikera index genom krav på credit rating, sektor och matematiska nyckeltal som Duration, convexity, duration times spread (DTS) och option adjusted spread (OAS). Replikeringen sker genom linjär programmering i programmet Python. Genom att implementera Lasso regression undersöker detta arbete även om det går att överträffa avkastningen genom att minska kraven på nyckeltal som inte valts ut efter urval av variabler i regressionen. Investmentbolaget Alfred Berg har bidragit med data för denna rapport. Datan består av information om alla tillgångar som ingår i indexet EUR Investment Grade (ER00) under perioden 2017-2021. Resultatet visar att replikeringen av index är möjlig, med små avvikelser, och lasso regressionen väljer nyckeltalen DTS och OAS i sin modell. Det är svårt att överträffa index genom att endast fokusera på nyckeltalen DTS och OAS. Analys av andra nyckeltal och variabler som väljs ut av lasso regressionen kan skapa ett bättre resultat. Lasso Regression Linear Programming Index Tracking Lasso regression linjär programmering index replikering Other Mathematics Annan matematik
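How a lasso regression ends up selecting only some variables, as in entry 748, can be shown with the soft-thresholding operator. The sketch assumes the special case of an orthonormal design, where the lasso solution for the objective ½‖y − Xβ‖² + λ‖β‖₁ is the least-squares coefficient shrunk toward zero by λ and set to zero when it is smaller than λ in absolute value. The variable names `x1` to `x4`, the coefficients and the penalty are invented for illustration and are not results from the thesis.

```python
from typing import Dict, List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients for four standardized variables.
ols_coefs: Dict[str, float] = {"x1": 0.05, "x2": -0.10, "x3": 0.90, "x4": -0.60}
lam = 0.25  # assumed penalty strength

# Under an orthonormal design the lasso applies the operator coefficient-wise.
lasso_coefs = {name: soft_threshold(value, lam) for name, value in ols_coefs.items()}

# Variables with a nonzero coefficient are the ones the lasso "selects".
selected: List[str] = [name for name, value in lasso_coefs.items() if value != 0.0]
print(selected)
```

Variables with weak coefficients drop out entirely, while the remaining ones are kept with reduced magnitude; this is the sense in which a lasso model "selects" key figures.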
749	A Multi-Variate Regression Analysis on Telecommunication Sites in a Sub-Saharan Country / En regressionsanalys i flera variabler på telekommunikationsmaster i ett land i subsahariska Afrika Berisha, Elza, Holma, Hampus January 2023 The purpose of this bachelor thesis is to investigate how different variables impact voice and data traffic for a telecom operator that operates in an undisclosed Sub-Saharan African country. The data has been provided by said company. The models, generated by using multivariate linear regression analysis, have a high explanatory power, as evidenced by high coefficients of determination. However, it is important to recognize the persistence of certain systematic issues, which are most likely due to the absence of key explanatory variables. Addressing these limitations in future research efforts will lead to a more comprehensive understanding of the subject and more robust findings to determine which factors drive voice and data traffic. In the report, the telecommunication sites are segmented based on generated income. Two segmentation models were created to categorize sites based on their data and voice revenue quartiles. A color matrix was used to depict the results. The hypothesis that nearby sites are more likely to perform similarly was tested using a quartile-based scoring method. The regression analysis uncovered significant variables and revealed information about the relationship between various factors and data and voice traffic. The regression residuals were analyzed using qualitative cluster analysis, which revealed distinct clustering patterns. Overall, the study provides useful insights into data and voice traffic segmentation and performance analysis in the analyzed region. / Syftet med detta kandidatarbete är att undersöka hur olika variabler påverkar röst- och datatrafik för en telekom-operatör som är verksam i ett Subsahariskt afrikanskt land.
Studien använder sig av linjär regressionsanalys för att utveckla modeller som visar med en bra förklaringsgrad. Förklaringsgraden visas genom höga determinationskoefficienter. Men, trots ett bra resultat är det viktigt att ta hänsyn till systematiska problem hos modellerna. Problemen beror troligtvis på att viktiga förklarande variabler saknas i datan. Framtida forskningsinsatser bör därför sträva efter att åtgärda dessa begränsningar, och på så sätt uppnå en mer omfattande förståelse av ämnet och mer korrekt resultat. I rapporten segmenteras telekommunikationsmasterna baserat på genererad inkomst. Två segmenteringsmodeller har utvecklats för att kategorisera masterna enligt deras kvartiler för data- och röstintäkter. Resultaten visas visuellt med hjälp av en färgmatris. Dessutom prövades hypotesen att närliggande master uppvisar liknande prestanda med hjälp av en kvartilsbaserad poängmetod. Regressionsanalysen identifierar signifikanta variabler och ger insikter i relationen mellan olika faktorer mellan data- och rösttrafik. Vidare upptäcks, via kvalitativ klusteranalys av regressionsresterna, tydliga klustringsmönster i resultatet. Sammantaget ger denna studie värdefulla insikter i data- och rösttrafiksegmentering samt prestandaanalys i den analyserade regionen. telecommunication linear regression segmentation cluster analysis telekommunikation linjär regression segmentering klusteranalys Probability Theory and Statistics Sannolikhetsteori och statistik
750	An Investigation of How Well Random Forest Regression Can Predict Demand : Is Random Forest Regression better at predicting the sell-through of close to date products at different discount levels than a basic linear model? Jonsson, Estrid, Fredrikson, Sara January 2021 Allt eftersom klimatkrisen fortskrider ökar engagemanget kring hållbarhet inom företag. Växthusgaser är ett av de största problemen och matsvinn har därför fått mycket uppmärksamhet sedan det utnämndes till den tredje största bidragaren till de globala utsläppen. För att minska sitt bidrag rabatterar många matbutiker produkter med kort bästföredatum, vilket kommit att kräva en förståelse för hur priskänslig efterfrågan på denna typ av produkt är. Prisoptimering görs vanligtvis med så kallade Generalized Linear Models men då efterfrågan är ett komplext koncept har maskininlärningsmetoder börjat utmana de traditionella modellerna. En sådan metod är Random Forest Regression, och syftet med uppsatsen är att utreda ifall modellen är bättre på att estimera efterfrågan baserat på rabattnivå än en klassisk linjär modell. Vidare utreds det ifall ett tydligt linjärt samband existerar mellan rabattnivå och efterfrågan, samt ifall detta beror av produkttyp. Resultaten visar på att Random Forest tar bättre hänsyn till det komplexa samband som visade sig finnas, och i detta specifika fall presterar bättre. Vidare visade resultaten att det sammantaget inte finns något linjärt samband, men att vissa produktkategorier uppvisar svag linjäritet. / As the climate crisis continues to evolve, many companies focus their development on becoming more sustainable. With greenhouse gases being highlighted as the main problem, food waste has received a great deal of attention after being named the third largest contributor to global emissions. One way retailers have attempted to improve is by offering close-to-date produce at a discount, thereby reducing the amount of food thrown away.
To minimize waste, the level of discount must be optimized, and as the products can be seen as flawed, the known price-to-demand relation of the products may be insufficient. The optimization process has historically involved generalized linear regression models; however, demand is a complex concept influenced by many factors. This report investigates whether a Machine Learning model, Random Forest Regression, is better at estimating the demand for close-to-date products at different discount levels than a basic linear regression model. The discussion also includes an analysis of whether discounts always increase the will to buy and whether this depends on product type. The results show that Random Forest considers the many factors influencing demand to a greater extent and is superior as a predictor in this case. Furthermore, it was concluded that there is generally no clear linear relation; however, this does depend on product type, as certain categories showed some linearity. Random Forest Regression Linear Regression Food Waste Demand Prediction Computer and Information Sciences Data- och informationsvetenskap
