  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

PERFORMANCE EVALUATION OF UNIVARIATE TIME SERIES AND DEEP LEARNING MODELS FOR FOREIGN EXCHANGE MARKET FORECASTING: INTEGRATION WITH UNCERTAINTY MODELING

Wajahat Waheed (11828201) 13 December 2021 (has links)
The foreign exchange market is the largest financial market in the world, so prediction of foreign exchange rate values is of interest to millions of people. In this research, I evaluated the performance of Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Autoregressive Integrated Moving Average (ARIMA) and Moving Average (MA) models on the USD/CAD and USD/AUD exchange pairs for 1-day, 1-week and 2-week predictions. For LSTM and GRU, twelve macroeconomic indicators along with past exchange rate values were used as features, using data from January 2001 to December 2019. Predictions from each model were then integrated with uncertainty modeling to estimate the chance of a model's prediction being greater than or less than a user-defined target value, using the error distribution from the test dataset, Monte Carlo simulation trials and the ChancCalc Excel add-in. Results showed that ARIMA performs slightly better than LSTM and GRU for 1-day predictions on both the USD/CAD and USD/AUD exchange pairs. However, when the horizon is increased to 1 week and 2 weeks, LSTM and GRU outperform both ARIMA and moving average for both exchange pairs.
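As an illustration of the uncertainty-modeling step described above, the sketch below estimates the chance that a forecast exceeds a user-defined target by resampling test-set errors; the forecast value, target and error array are hypothetical placeholders, and the thesis itself uses the ChancCalc Excel add-in rather than this Python code.

```python
import numpy as np

def chance_above_target(point_forecast, target, test_errors, n_trials=10_000, seed=0):
    """Estimate P(actual value > target) by adding resampled test-set errors
    to the point forecast (Monte Carlo over the empirical error distribution)."""
    rng = np.random.default_rng(seed)
    simulated = point_forecast + rng.choice(test_errors, size=n_trials, replace=True)
    return (simulated > target).mean()

# Hypothetical example: a 1-day USD/CAD forecast of 1.3150 against a target of 1.3200,
# with residuals taken from the model's held-out test period.
test_errors = np.array([-0.004, 0.002, 0.0015, -0.001, 0.003, -0.0025, 0.0005])
print(chance_above_target(1.3150, 1.3200, test_errors))
```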
62

Demand Forecasting of Outbound Logistics Using Neural Networks

Otuodung, Enobong Paul, Gorhan, Gulten January 2023 (has links)
Long- and short-term volume forecasting is essential to companies' logistics service operations. It is crucial for logistics companies to predict the volumes of goods that will be delivered to various centers on any given day, as this assists in managing the efficiency of their business operations. This research aims to create a forecasting model for outbound logistics volumes by applying design science research methodology to build three machine-learning models and evaluate their performance. The dataset is provided by Tetra Pak AB, the world's leading food processing and packaging solutions company. Research methods were mainly quantitative, based on statistical data and numerical calculations. Three algorithms were implemented: encoder-decoder networks based on Long Short-Term Memory (LSTM), Convolutional Long Short-Term Memory (ConvLSTM), and Convolutional Neural Network Long Short-Term Memory (CNN-LSTM). Comparisons are made using the average Root Mean Square Error (RMSE) for six distribution centers (DC) of Tetra Pak. Results obtained from encoder-decoder networks based on LSTM are compared to those obtained with ConvLSTM and CNN-LSTM. All three algorithms performed very well, considering the training and test losses on our multivariate time series dataset. However, based on the average RMSE, there are only slight differences between the algorithms across all DCs.
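A minimal sketch of the kind of encoder-decoder LSTM the abstract describes, assuming daily multivariate input windows and a multi-day output horizon; the layer sizes, window lengths and feature count are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_past, n_future, n_features = 28, 7, 5   # assumed window sizes and feature count

model = Sequential([
    LSTM(64, input_shape=(n_past, n_features)),   # encoder: summarize the input window
    RepeatVector(n_future),                       # repeat the context for each output step
    LSTM(64, return_sequences=True),              # decoder: unroll over the forecast horizon
    TimeDistributed(Dense(1)),                    # one volume prediction per future day
])
model.compile(optimizer="adam", loss="mse")

# Dummy data with the assumed shapes; real inputs would be scaled DC volumes plus covariates.
X = np.random.rand(256, n_past, n_features)
y = np.random.rand(256, n_future, 1)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
rmse = float(np.sqrt(model.evaluate(X, y, verbose=0)))  # RMSE, the metric compared in the thesis
print(rmse)
```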
63

Assessment of foliar nitrogen as an indicator of vegetation stress using remote sensing : the case study of Waterberg region, Limpopo Province

Manyashi, Enoch Khomotso 06 1900 (has links)
Vegetation status is a key indicator of ecosystem condition in a particular area. The study objective was to estimate leaf nitrogen (N) as an indicator of vegetation water stress using vegetation indices, especially red edge based ones, and to assess how leaf N concentration is influenced by various environmental factors. Leaf nitrogen was estimated using univariate regression and the multivariate techniques of stepwise multiple linear regression (SMLR) and random forest. The effects of environmental parameters on leaf nitrogen distribution were tested through univariate regression and analysis of variance (ANOVA). Vegetation indices derived from analytical spectral device (ASD) data, resampled to RapidEye, were evaluated, and multivariate models were developed to predict leaf N. The best model was chosen based on the lowest root mean square error (RMSE) and the highest coefficient of determination (R2). Univariate results showed that a red edge based vegetation index, the MERIS Terrestrial Chlorophyll Index (MTCI), yielded higher leaf N estimation accuracy than the other vegetation indices. For the SMLR method, the simple ratio (SR) based on the red and near-infrared bands was the best vegetation index for leaf N estimation when the red edge band was excluded, and the simple ratio SR3 was the best when the red edge band was included. The random forest prediction model achieved the highest leaf N estimation accuracy; its best vegetation index was the Red Green Index (RGI1) when the red edge band was included and the Difference Vegetation Index (DVI1) when it was excluded. Both the univariate and multivariate results indicated that inclusion of the red edge band provides an opportunity to accurately estimate leaf N. Analysis of variance showed that vegetation and soil types have a significant effect on leaf N distribution (p-values < 0.05). Red edge based indices thus provide an opportunity to assess vegetation health using remote sensing techniques. / Environmental Sciences / M. Sc. (Environmental Management)
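A hedged sketch of the red edge index and random forest workflow described above, assuming reflectance in RapidEye-like red, red edge and near-infrared channels; the band values, the MTCI formulation used, and the leaf nitrogen data are illustrative placeholders rather than the thesis's actual dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def mtci(nir, red_edge, red):
    """MERIS Terrestrial Chlorophyll Index: (NIR - red edge) / (red edge - red)."""
    return (nir - red_edge) / (red_edge - red)

rng = np.random.default_rng(1)
n = 120
red = rng.uniform(0.03, 0.10, n)
red_edge = rng.uniform(0.10, 0.30, n)
nir = rng.uniform(0.30, 0.60, n)
leaf_n = rng.uniform(1.0, 3.5, n)  # placeholder leaf nitrogen concentrations (%)

# Red edge index plus raw bands as predictors, mirroring the index-based regressions.
X = np.column_stack([mtci(nir, red_edge, red), red, red_edge, nir])
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rmse = -cross_val_score(rf, X, leaf_n, scoring="neg_root_mean_squared_error", cv=5).mean()
print(f"cross-validated RMSE: {rmse:.3f}")
```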
64

Essays on Consumption: Aggregation, Asymmetry and Asset Distributions

Bjellerup, Mårten January 2005 (has links)
The dissertation consists of four self-contained essays on consumption. Essays 1 and 2 consider different measures of aggregate consumption, and Essays 3 and 4 consider how the distributions of income and wealth affect consumption from a macro and micro perspective, respectively. Essay 1 considers the empirical practice of seemingly interchangeable use of two measures of consumption: total consumption expenditure and consumption expenditure on nondurable goods and services. Using data from Sweden and the US in an error correction model, it is shown that consumption functions based on the two measures exhibit significant differences in several aspects of econometric modelling. Essay 2, co-authored with Thomas Holgersson, derives a univariate and a multivariate version of a test for asymmetry based on the third central moment. The logic behind the test is that the dependent variable should correspond to the specification of the econometric model: symmetric with linear models and asymmetric with non-linear models. The main result of the empirical application is that orthodox theory seems to be supported for both nondurable and durable consumption. The consumption of durables shows little deviation from symmetry in the four-country sample, while the consumption of nondurables is shown to be asymmetric in two out of four cases, the UK and the US. Essay 3 departs from the observation that introducing income uncertainty makes the consumption function concave, implying that the distributions of wealth and income are omitted variables in aggregate Euler equations. This implication is tested through estimation of the distributions over time and augmentation of consumption functions, using Swedish data for 1963-2000. The results show that only the dispersion of wealth is significant, the explanation of which is found in the marked changes in the group of households with negative wealth, a group that, according to a concave consumption function, has the highest marginal propensity to consume. Essay 4 attempts to empirically specify the nature of the alleged concavity of the consumption function. Using grouped household-level Swedish data for 1999-2001, it is shown that the marginal propensity to consume out of current resources, i.e. current income and net wealth, is strictly decreasing in current resources and net wealth, but approximately constant in income. Also, an empirical reciprocal to the stylized theoretical consumption function is estimated and shown to bear a close resemblance to the theoretical version.
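As a rough illustration of a third-central-moment asymmetry check of the kind Essay 2 builds on, the sketch below computes sample skewness for a consumption-growth series and scales it by its approximate large-sample standard error sqrt(6/n) under symmetry; this is a textbook skewness statistic used for illustration, not the specific univariate or multivariate test derived in the essay, and the data are invented.

```python
import numpy as np

def skewness_test(x):
    """Sample skewness (standardized third central moment) and a rough z-statistic
    using the large-sample standard error sqrt(6/n) under the null of symmetry."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = np.mean((x - x.mean()) ** 2)
    m3 = np.mean((x - x.mean()) ** 3)
    skew = m3 / m2 ** 1.5
    z = skew / np.sqrt(6.0 / n)
    return skew, z

# Hypothetical quarterly consumption growth rates (40 years of data).
rng = np.random.default_rng(0)
growth = rng.normal(0.005, 0.01, 160)
print(skewness_test(growth))
```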
65

Saggi su geografia e crescita / Essays on Geography and Growth

ACCETTURO, ANTONIO 21 February 2007 (has links)
I present one empirical essay and two original theoretical essays on the relationship between economic geography and growth. The empirical essay presents some stylized facts on the evolution of the spatial concentration of innovative activities in Italy over the period 1971-2001. Using Markov-based nonparametric techniques, I show that spatial concentration decreased over time, while the core of specialized regions remained highly persistent. The first theoretical essay presents a model of Romerian growth and industrial location characterized by congestion costs, in which a process of divergence and agglomeration need not be permanent and may be reverted once trade integration deepens. The second theoretical essay shows how the main predictions of geography-and-growth models extend to a Schumpeterian growth model.
67

Frequency Analysis of Floods - A Nonparametric Approach

Santhosh, D January 2013 (has links) (PDF)
Floods cause widespread damage to property and life in different parts of the world. Hence there is a paramount need to develop effective methods for design flood estimation to alleviate the risk associated with these extreme hydrologic events. Methods conventionally considered for the analysis of floods focus on estimating a continuous frequency relationship between the peak flow observed at a location and its corresponding exceedance probability, depicting plausible conditions over the planning horizon. These methods are commonly known as at-site flood frequency analysis (FFA) procedures. The available FFA procedures can be classified as parametric and nonparametric. Parametric methods are based on the assumption that the sample (at-site data) is drawn from a population with a known probability density function (PDF). Those procedures carry uncertainty associated with the choice of PDF and the method for estimating its parameters. Moreover, parametric methods are ineffective in modeling flood data whose PDF is multimodal. To overcome these shortcomings, a few studies have attempted kernel-based nonparametric (NP) methods as an alternative to parametric methods. NP methods are data driven and can characterize the uncertainty in data without prior assumptions as to the form of the PDF. Conventional kernel methods, however, suffer from the boundary leakage problem and from the normal reference rule (used for bandwidth estimation), both of which have implications for flood quantile estimates. To alleviate this, the focus of NP flood frequency analysis has been on the development of new kernel density estimators (kdes). Another issue in FFA is that, for certain applications, information on the whole hydrograph (e.g., time to peak flow, volume of the flood flow and duration of the flood event) is needed in addition to the peak flow. One option is to perform frequency analysis on each of the variables independently. However, these variables are not independent, and hence there is a need for multivariate analysis to construct multivariate PDFs and use the corresponding cumulative distribution functions (CDFs) to estimate the characteristics of the design flood hydrograph. In this perspective, the recent focus of flood frequency analysis studies has been on methods to derive joint distributions of flood hydrograph related variables in a nonparametric setting. Further, in a real-world scenario, it is often necessary to estimate design flood quantiles at target locations that have limited or no data. Regional Flood Frequency Analysis (RFFA) procedures have been developed for use in such situations. These procedures involve a regionalization procedure for identifying a homogeneous group of watersheds that are similar to the watershed of the target site in terms of flood response. Subsequently, regional frequency analysis (RFA) is performed, wherein the information pooled from the group (region) forms the basis for frequency analysis to construct a CDF (growth curve) that is subsequently used to arrive at quantile estimates at the target site. Though there are various procedures for RFFA, they are largely confined to a univariate framework that relies on a parametric approach to arrive at the required quantile estimates.
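To make the contrast between parametric and kernel-based at-site FFA concrete, the sketch below fits a GEV distribution and a conventional Gaussian KDE to a hypothetical annual peak-flow series and reads off the 100-year quantile from each; the data are synthetic, and scipy's gaussian_kde (with its normal-reference-style bandwidth) is only a stand-in, not the diffusion-based D-kde developed in the thesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic annual maximum flows (m^3/s) playing the role of at-site data.
peaks = stats.genextreme.rvs(c=-0.1, loc=800, scale=250, size=60, random_state=rng)

# Parametric at-site FFA: fit a GEV and invert its CDF at the 1% annual exceedance probability.
c, loc, scale = stats.genextreme.fit(peaks)
q100_gev = stats.genextreme.ppf(0.99, c, loc=loc, scale=scale)

# Conventional kernel alternative: Gaussian KDE, quantile obtained by numerical CDF inversion.
kde = stats.gaussian_kde(peaks)
grid = np.linspace(peaks.min() * 0.5, peaks.max() * 2.0, 2000)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]
q100_kde = np.interp(0.99, cdf, grid)

print(f"100-year flood estimate: GEV = {q100_gev:.0f}, KDE = {q100_kde:.0f}")
```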
Motivated by these findings, this thesis concerns the development of methodologies based on a linear diffusion process based adaptive kernel density estimator (D-kde) for at-site as well as regional FFA in univariate and bivariate settings. The D-kde alleviates the boundary leakage problem and avoids the normal reference rule by estimating the optimal bandwidth with the Botev-Grotowski-Kroese estimator (BGKE). The potential of the proposed methodologies in both univariate and bivariate settings is demonstrated by application to synthetic data sets of various sizes drawn from known unimodal and bimodal parametric populations, and to real-world data sets from India, the USA, the United Kingdom and Canada. In the context of at-site univariate FFA (considering peak flows), the performance of D-kde was found to be better than that of four parametric distribution based methods (Generalized Extreme Value, Generalized Logistic, Generalized Pareto and Generalized Normal), thirty-two 'kde and bandwidth estimator' combinations resulting from four commonly used kernels in conjunction with eight bandwidth estimators, and a local polynomial based estimator. In the context of at-site bivariate FFA considering 'peak flow-flood volume' and 'flood duration-flood volume' combinations, the proposed D-kde based methodology was shown to be effective when compared to seven commonly used copulas (Gumbel-Hougaard, Frank, Clayton, Joe, Normal, Plackett and Student's t) and to the Gaussian kernel in conjunction with conventional as well as BGKE bandwidth estimators. Sensitivity analysis indicated that selection of the optimum number of bins is critical in implementing D-kde in the bivariate setting. In the context of univariate regional flood frequency analysis (RFFA) considering peak flows, a methodology based on D-kde and index-flood methods is proposed, and its performance is shown to be better than that of the widely used L-moment and index-flood based method (the 'regional L-moment algorithm') through Monte-Carlo simulation experiments on homogeneous as well as heterogeneous synthetic regions, and through a leave-one-out cross-validation experiment performed on data sets pertaining to 54 watersheds in the Godavari river basin, India. In this context, four homogeneous groups of watersheds are delineated in the Godavari river basin using kernel principal component analysis (KPCA) in conjunction with fuzzy c-means cluster analysis in an L-moment framework, as an improvement over the heterogeneous regions in the river basin currently considered by the Central Water Commission, India. In the context of bivariate RFFA, two methods are proposed. They involve forming site-specific pooling groups (regions) based on either an L-moment based bivariate homogeneity test (R-BHT) or a bivariate Kolmogorov-Smirnov test (R-BKS), followed by RFA based on D-kde. Their performance is assessed by application to data sets pertaining to stations in the conterminous United States. Results indicate that the R-BKS method is better than R-BHT in predicting quantiles of bivariate flood characteristics at ungauged sites, although the pooling groups formed using R-BKS are, in general, smaller than those formed using R-BHT. In general, the performance of the methods improves with increasing size of the pooling groups.
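The index-flood idea underlying the regional methodology can be illustrated in a few lines of numpy: peak flows at each gauged site in the pooling group are scaled by their at-site mean (the index flood), the scaled data are pooled to form a regional growth curve, and a quantile of that curve is rescaled by the target site's index flood. The figures below are invented, and the empirical quantile stands in for the D-kde based growth curve the thesis actually uses.

```python
import numpy as np

# Hypothetical annual peak flows (m^3/s) at three gauged sites in a homogeneous region.
region = [
    np.array([310., 450., 290., 520., 380., 610., 275., 430.]),
    np.array([120., 160., 95., 210., 140., 175., 130., 150.]),
    np.array([780., 990., 640., 1150., 870., 930., 700., 1020.]),
]

# Scale each site by its index flood (at-site mean) and pool the dimensionless data.
growth_sample = np.concatenate([q / q.mean() for q in region])

# Regional growth-curve quantile for a 10-year event (annual exceedance probability 0.1).
growth_q90 = np.quantile(growth_sample, 0.90)

# Quantile at an ungauged target site whose index flood is estimated separately
# (e.g., from catchment attributes).
target_index_flood = 260.0
print(f"estimated 10-year flood at target site: {growth_q90 * target_index_flood:.0f} m^3/s")
```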
Overall, the results indicate that the D-kde always yields a bona fide PDF (and CDF) in the context of univariate as well as bivariate flood frequency analysis, as the probability density is nonnegative for all data points and integrates to unity over the valid range of the data. The performance of the D-kde based at-site and regional FFA methodologies is found to be effective in both univariate and bivariate settings, irrespective of the nature of the population and the sample size. A primary assumption underlying conventional FFA procedures has been that the time series of peak flow is stationary (temporally homogeneous). However, recent studies carried out in various parts of the world question this assumption of flood stationarity. In this perspective, a Time Varying Gaussian Copula (TVGC) based methodology is proposed in the thesis for flood frequency analysis in the bivariate setting, which relaxes the assumption of stationarity in flood related variables. It is shown to be more effective than seven commonly used stationary copulas through Monte-Carlo simulation experiments and by application to data sets pertaining to stations in the conterminous United States for which the null hypothesis that the peak flow data are non-stationary cannot be rejected.
68

Neural networks regularization through representation learning / Régularisation des réseaux de neurones via l'apprentissage des représentations

Belharbi, Soufiane 06 July 2018 (has links)
Neural network models, and deep models in particular, are among the state-of-the-art models in machine learning and have been applied in many different domains. Most successful deep neural models have many layers, which greatly increases their number of parameters. Training such models therefore requires a large number of labelled training samples, which are not always available in practice. One of the fundamental issues in neural networks is overfitting, which occurs when the model memorizes the training data and then generalizes poorly to new data; it often arises when large models are trained on few samples. Overfitting is the main problem addressed in this thesis. Many approaches have been proposed to prevent the network from overfitting and improve its generalization, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout and batch normalization. In this thesis, we tackle neural network overfitting from a representation learning perspective, considering the situation, common in real-world applications, where few training samples are available. We propose three contributions. The first, presented in chapter 2, deals with structured output problems, in which the output variable is high-dimensional and its components are linked by structural dependencies; we perform multivariate regression while exploiting these dependencies by learning them in an unsupervised way with autoencoders. Validated on a facial landmark detection problem, learning the structure of the output data is shown to improve the network's generalization and to speed up its training. The second contribution, described in chapter 3, concerns the classification task, where we exploit prior knowledge about the internal representations of the hidden layers. This prior is based on the simple idea that samples within the same class should have the same internal representation. We formulate this prior as a penalty added to the training cost to be minimized. Empirical experiments on MNIST and its variants show an improvement in the network's generalization, particularly when only few training samples are used. Our last contribution, presented in chapter 4, shows the interest of transfer learning in applications where only few samples are available. The idea consists in re-using the filters of convolutional networks pre-trained on large datasets such as ImageNet by plugging them into a new convolutional network with new dense layers, and then training the whole network on the target task. In collaboration with the "Henri Becquerel" cancer center in Rouen (Rouen Henri Becquerel Center), which provided the data, we built an automatic system based on this learning scheme for a medical application with only a small labelled dataset: localizing the third lumbar vertebra in a 3D CT scan. A pre-processing of the 3D CT scan to obtain a 2D representation and a post-processing to refine the decision are included in the proposed system. The use of transfer learning together with suitable pre- and post-processing yielded good results, allowing the model to be used in clinical routine.
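A minimal sketch of the transfer learning scheme described in the third contribution, assuming a Keras setup in which ImageNet-pretrained convolutional filters are frozen and new dense layers are trained on the target task; the base architecture, input size and regression head are illustrative choices, not the exact network used for the vertebra localization system.

```python
import tensorflow as tf

# Pre-trained convolutional filters from a large source task (ImageNet).
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the transferred filters; only the new head is trained

# New dense layers for the target task (here a 2-coordinate regression head, as an assumption).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(2),  # predicted (x, y) location in the 2D representation
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```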
