  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

FORECASTING AMERICAN INDUSTRIAL PRODUCTION WITH HIGH DIMENSIONAL ENVIRONMENTS FROM FINANCIAL MARKETS, SENTIMENTS, EXPECTATIONS, AND ECONOMIC VARIABLES

EDUARDO OLIVEIRA MARINHO 20 February 2020 (has links)
This thesis presents 6 different forecasting techniques for the monthly variation of the American Industrial Production Index in 3 different environments, totaling 18 models. In the first environment, the explanatory variables were the lags of the monthly variation of the Industrial Production Index together with 55 other market and expectation variables, such as sector returns, the market risk premium, implied volatility, interest rate risk premiums (corporate and long-term), consumer sentiment, and an uncertainty index. In the second environment, the FRED database with 130 economic variables was used as the set of explanatory variables. In the third environment, the most relevant variables from environments 1 and 2 were used. An improvement in predicting IP against an AR model was observed, along with some interpretations regarding the behavior of the American economy over the last 45 years (the importance of economic sectors, periods of uncertainty, and changes in the response to the risk premium, volatility, and interest rates).
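The abstract's central comparison is a high-dimensional forecast against an AR benchmark. A minimal sketch of that setup, with synthetic data standing in for the 55 market and expectation variables (all names and numbers below are illustrative, not from the thesis):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
# Hypothetical monthly series driven by 55 noisy predictors, only a
# few of them informative -- a stand-in for the market and
# expectation variables in the abstract. All numbers are illustrative.
n, p = 300, 55
X = rng.standard_normal((n, p))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * rng.standard_normal(n)

train, test = slice(0, 240), slice(240, 300)

# AR(1) benchmark: regress y_t on y_{t-1}.
ar = LinearRegression().fit(y[train][:-1].reshape(-1, 1), y[train][1:])
ar_pred = ar.predict(y[239:299].reshape(-1, 1))

# Lasso forecast using the high-dimensional predictors.
lasso = Lasso(alpha=0.05).fit(X[train], y[train])
lasso_pred = lasso.predict(X[test])

def mse(pred):
    return float(np.mean((y[test] - pred) ** 2))

print(mse(ar_pred), mse(lasso_pred))
```

When the target truly depends on the external predictors, the penalized regression beats the pure autoregression out of sample, which is the kind of improvement over the AR model the abstract reports.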

Predicting PV self-consumption in villas with machine learning

GALLI, FABIAN January 2021 (has links)
In Sweden, there is a strong and growing interest in solar power. In recent years, photovoltaic (PV) system installations have increased dramatically, and a large share are distributed, grid-connected PV systems, i.e. rooftop installations. Currently, the electricity export rate is significantly lower than the import rate, which has made the amount of self-consumed PV electricity a critical factor when assessing system profitability. Self-consumption (SC) is calculated using hourly or sub-hourly timesteps and is highly dependent on the solar patterns of the location of interest, the PV system configuration, and the building load. As these vary across potential installations, it is difficult to make estimations without historical data on both load and local irradiance, which are often hard to acquire or simply unavailable. A method to predict SC using information commonly available at the planning phase is therefore preferred. There is a scarcity of documented SC data and only a few reports treating the subject of mapping or predicting SC. This thesis therefore investigates the possibility of using machine learning to create models able to predict SC from the following inputs: annual load, annual PV production, tilt angle and azimuth angle of the modules, and latitude. Using the programming language Python, seven models are created with regression techniques, trained on real load data and simulated PV data from the south of Sweden, and evaluated using the coefficient of determination (R2) and the mean absolute error (MAE). The techniques are linear regression, polynomial regression, Ridge regression, Lasso regression, k-nearest neighbors (kNN), Random Forest, and Multi-Layer Perceptron (MLP), as well as the only other SC prediction model found in the literature. A parametric analysis of the models is conducted, removing one variable at a time to assess each model's dependence on that variable.
The results are promising: five out of eight models achieve an R2 value above 0.9 and can be considered good predictors of SC. The best-performing model, Random Forest, has an R2 of 0.985 and an MAE of 0.0148. The parametric analysis also shows that while additional input data is helpful, using only the annual load and the annual PV production is sufficient to make good predictions. These conclusions hold only for the southern region of Sweden, however, and are not applicable to areas outside the latitudes or country tested.
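The pipeline described above — tabular inputs, a Random Forest regressor, and R2/MAE evaluation — can be sketched with synthetic data. The self-consumption target below is a toy function of the load/PV ratio, not the thesis's real data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
# Hypothetical inputs: annual load (kWh), annual PV production (kWh),
# module tilt (deg), azimuth (deg from south), latitude (deg).
load = rng.uniform(3000, 15000, n)
pv = rng.uniform(2000, 12000, n)
tilt = rng.uniform(10, 60, n)
azimuth = rng.uniform(-90, 90, n)
lat = rng.uniform(55, 60, n)
# Toy SC target: grows with the load/PV ratio, bounded in (0, 1).
sc = 1.0 / (1.0 + np.exp(-(load / pv - 1.0))) + 0.02 * rng.standard_normal(n)
sc = np.clip(sc, 0.0, 1.0)

X = np.column_stack([load, pv, tilt, azimuth, lat])
X_tr, X_te, y_tr, y_te = train_test_split(X, sc, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
r2 = float(r2_score(y_te, pred))
mae = float(mean_absolute_error(y_te, pred))
print(round(r2, 3), round(mae, 4))
```

Rerunning the fit with columns dropped one at a time mirrors the thesis's parametric analysis: in this toy setup, only the load and PV columns matter, matching the finding that those two inputs suffice.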

PV self-consumption: Regression models and data visualization

Tóth, Martos January 2022 (has links)
In Sweden, the installed capacity of residential PV systems is increasing every year. The lack of a feed-in tariff scheme means that the techno-economic optimization of PV systems is mainly based on self-consumption. Calculating this parameter requires hourly building loads and hourly PV generation, data that cannot easily be obtained from households. A predictive model based on already available data would therefore be preferable. Existing machine learning models may be suitable and have been tested, but the literature on this topic is fairly sparse. The machine learning models use a dataset that includes real measured building load data, simulated PV generation data, and the self-consumption calculated from these two inputs. The simulation of PV generation can be based on a Typical Meteorological Year (TMY) weather file or on measured weather data. The TMY file can be generated more quickly and easily, but it is only spatially matched to the building load, while measured data is matched both temporally and spatially. This thesis investigates whether using a TMY file has any major impact on the performance of the regression models by comparing it to a model built on measured weather data; here the buildings are single-family houses from the south of Sweden. Different building types can have different load profiles, which can affect model performance, and because of this the effect of using a TMY file may be more significant. This thesis therefore also compares the impact of TMY file usage in the case of multi-family houses, and compares the two building types in terms of machine learning model performance. PV and battery prices are decreasing from year to year, and subsidies in Sweden offer a significant tax credit on battery investments combined with PV systems, which can make batteries profitable.
Lastly, this thesis evaluates the performance of the machine learning models after adding a battery to the system, for both TMY and measured data. The optimal system is also predicted based on self-consumption, PV generation, and battery size. The models have high accuracy: the random forest model is above 0.9 R2 for all cases. The results confirm that using the TMY file leads only to marginal errors, so it can be used for training the models. The battery model shows promising results, with R2 above 0.9 for four models: random forest, k-NN, MLP, and polynomial. The prediction of the optimal system also shows promising results for the polynomial model, with an 18% error in the predicted payback time compared to the reference.

Dynamics of elastic and viscous filaments

Brun, Pierre-Thomas 17 September 2012 (has links) (PDF)
This manuscript deals jointly with the dynamics of elastic and viscous filaments. Their dynamics are described by the Kirchhoff equations, which are discussed at length in the manuscript. The Lagrangian description of viscous filaments led to a numerical code for their simulation: Discrete Viscous Rods. It is validated on the example of coiling viscous filaments, similar to what is seen when honey falls onto toast. We then study a variant: the fluid sewing machine. The filament falls onto a conveyor belt whose speed is controlled, and forms a myriad of complex patterns resembling those of a real sewing machine. We produce a numerical phase diagram of these patterns and describe them within a unified kinematic formalism. The kinematics of the filament's contact with the belt is, in fact, at the origin of their formation: the filament combines exact fractions of its natural frequency to adapt to the speed prescribed by the belt, and the patterns are a trace of this adaptation. Next, we study an elastic example of filament dynamics: the unrolling of a spring of constant curvature, initially laid flat on a surface and held at one end, the other being released at t = 0. Elastic bending energy is converted as the spring lifts off the support. Its coiling is inertial and consists of a region whose radius is twice that of the spring at rest. The coil has a self-similar shape and advances at constant speed. We finish with a study of the dynamics of the lasso, investigated with a minimal model of a chain with a slip knot.

Contributions to statistical learning in sparse models

Alquier, Pierre 06 December 2013 (has links) (PDF)
This habilitation thesis covers various contributions to estimation and statistical learning in high-dimensional models, under different sparsity assumptions. The first part introduces the problem of high-dimensional statistics in a generic linear regression model. After reviewing the popular estimation methods for this model, new results from (Alquier & Lounici 2011) on aggregated estimators are presented. The second part essentially extends the results of the first part to the estimation of various time series models (Alquier & Doukhan 2011, Alquier & Wintenberger 2013, Alquier & Li 2012, Alquier, Wintenberger & Li 2012). Finally, the third part presents several extensions to non-parametric models or to more specific applications such as quantum statistics (Alquier & Biau 2013, Guedj & Alquier 2013, Alquier, Meziani & Peyré 2013, Alquier, Butucea, Hebiri, Meziani & Morimae 2013, Alquier 2013, Alquier 2008). In each section, estimators are proposed and, whenever possible, optimal oracle inequalities are established.

Non-parametric inference for Poissonian interactions

Sansonnet, Laure 14 June 2013 (has links) (PDF)
The purpose of this thesis is to study various non-parametric statistical problems within the framework of a model of Poissonian interactions. Such models are used, for example, in neuroscience to analyze the interactions between two neurons through their emission of action potentials during the recording of brain activity, or in genomics to study favored or avoided distances between two motifs along the genome. In this framework, we introduce a so-called reproduction function that quantifies the preferential positions of the motifs and can be modeled by the intensity of a Poisson process. First, we are interested in estimating this function, which is assumed to be very localized. We propose an adaptive estimation procedure based on thresholding wavelet coefficients, which is optimal from both the oracle and minimax points of view. Simulations and a genomics application on real data from the bacterium E. coli demonstrate the good practical behavior of our procedure. We then address the associated testing problems, which consist of testing whether the reproduction function is null. To this end, we construct a testing procedure that is minimax-optimal over weak Besov spaces and which has also shown good practical performance. Finally, we extend this work by studying a discrete, high-dimensional version of the previous model, for which we propose an adaptive Lasso-type procedure.

Learning algorithms for sparse classification

Sanchez Merchante, Luis Francisco 07 June 2013 (has links) (PDF)
This thesis deals with the development of estimation algorithms with embedded feature selection in the context of high-dimensional data, in both the supervised and unsupervised frameworks. The contributions of this work are materialized in two algorithms: GLOSS for the supervised domain and Mix-GLOSS for its unsupervised counterpart. Both algorithms are based on solving an optimal scoring regression regularized with a quadratic formulation of the group-Lasso penalty, which encourages the removal of uninformative features. The theoretical foundations proving that a group-Lasso penalized optimal scoring regression can be used to solve a linear discriminant analysis were first developed in this work. The theory that adapts this technique to the unsupervised domain by means of the EM algorithm is not new, but it had never been clearly exposed for a sparsity-inducing penalty. This thesis solidly demonstrates that using group-Lasso penalized optimal scoring regression inside an EM algorithm is possible. Our algorithms have been tested on real and artificial high-dimensional databases, with impressive results in terms of parsimony without compromising prediction performance.
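The group-Lasso penalty at the core of GLOSS can be illustrated with a generic proximal-gradient solver for group-sparse least squares — a simplified stand-in for the penalized optimal scoring problem, with all data sizes and parameter values synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy group-sparse problem: 10 groups of 3 features each, where only
# the first two groups carry signal. All sizes are illustrative.
n, p, n_groups = 200, 30, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:6] = [2.0, -2.0, 1.5, 1.0, -1.0, 0.5]  # groups 0 and 1 active
y = X @ beta_true + 0.1 * rng.standard_normal(n)

groups = np.arange(p) // 3
lam = 5.0
t = 1.0 / np.linalg.eigvalsh(X.T @ X).max()  # step size 1/L

beta = np.zeros(p)
for _ in range(2000):
    v = beta - t * (X.T @ (X @ beta - y))      # gradient step
    for g in range(n_groups):                  # group-wise soft-thresholding
        idx = groups == g
        norm = np.linalg.norm(v[idx])
        v[idx] = 0.0 if norm <= t * lam else v[idx] * (1 - t * lam / norm)
    beta = v

active = sorted(g for g in range(n_groups)
                if np.linalg.norm(beta[groups == g]) > 1e-10)
print(active)
```

The proximal step zeroes out entire groups at once, which is exactly the "removal of uninformative features" behavior the abstract describes: whole blocks of coefficients are either kept together or discarded together.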

Probabilistic solar power forecasting using partially linear additive quantile regression models: an application to South African data

Mpfumali, Phathutshedzo 18 May 2019 (has links)
MSc (Statistics) / Department of Statistics / This study discusses an application of partially linear additive quantile regression models in predicting medium-term global solar irradiance, using data from the Tellerie radiometric station in South Africa for the period August 2009 to April 2010. Variables are selected using the least absolute shrinkage and selection operator (Lasso) via hierarchical interactions, and the parameters of the developed models are estimated using Barrodale and Roberts's algorithm. The best models are selected based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), adjusted R squared (AdjR2), and generalised cross-validation (GCV). The accuracy of the forecasts is evaluated using the mean absolute error (MAE) and the root mean square error (RMSE). To improve forecast accuracy, a convex forecast combination algorithm is used, in which the average loss suffered by the models is based on the pinball loss function. A second forecast combination method, quantile regression averaging (QRA), is also used. The best set of forecasts is selected based on the prediction interval coverage probability (PICP), the prediction interval normalised average width (PINAW), and the prediction interval normalised average deviation (PINAD). The results show that QRA is the best model, since it produces more robust prediction intervals than the other models. The percentage improvement is calculated, and the results demonstrate that the QRA model yields a small improvement over the GAM with interactions and a larger percentage improvement over the convex forecast combination model. A major contribution of this dissertation is the inclusion of a non-linear trend variable and the extension of forecast combination models to include QRA. / NRF

Statistical and Machine Learning for assessment of Traumatic Brain Injury Severity and Patient Outcomes

Rahman, Md Abdur January 2021 (has links)
Traumatic brain injury (TBI) is a leading cause of death across all age groups and a serious public concern. However, medical science still lacks reliable TBI diagnostics and patient outcome prediction. In this thesis, I used a subset of the TBIcare data from Turku University Hospital in Finland to classify injury severity, patient outcomes, and CT (computerized tomography) findings as positive/negative. The dataset was derived from comprehensive metabolic profiling of serum samples from TBI patients. The study included 96 TBI patients, diagnosed as 7 severe (sTBI=7), 10 moderate (moTBI=10), and 79 mild (mTBI=79). Among them, there were 85 good recoveries (Good_Recovery=85) and 11 bad recoveries (Bad_Recovery=11), as well as 49 CT positive (CT.Positive=49) and 47 CT negative (CT.Negative=47). There was a total of 455 metabolites (features), excluding the three response variables. Feature selection techniques were applied to retain the most important features while discarding the rest. Subsequently, four methods were used for classification: Ridge regression, Lasso regression, a neural network, and deep learning. Ridge regression yielded the best results for the binary classifications, i.e., patient outcomes and CT positive/negative: the accuracy for CT positive/negative was 74% (AUC of 0.74), while the accuracy for patient outcomes was 91% (AUC of 0.91). For severity classification (a multi-class problem), the neural network performed well, with a total accuracy of 90%. Despite the limited number of data points, the overall results were satisfactory.

Forecasting hourly electricity demand in South Africa using machine learning models

Thanyani, Maduvhahafani 12 August 2020 (has links)
MSc (Statistics) / Department of Statistics / Short-term load forecasting in South Africa using machine learning and statistical models is discussed in this study. The research focuses on a comparative analysis of models for forecasting hourly electricity demand, carried out using South Africa's aggregated hourly load data from Eskom. The comparison covers support vector regression (SVR), stochastic gradient boosting (SGB), and artificial neural networks (NN), with a generalized additive model (GAM) as the benchmark. In both modelling frameworks, variable selection is done using the least absolute shrinkage and selection operator (Lasso). The SGB model yielded the lowest root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) on the testing data, and also the lowest RMSE, MAE, and MAPE on the training data. The models' forecasts are combined using a convex combination and quantile regression averaging (QRA). QRA was found to be the best forecast combination model based on the RMSE, MAE, and MAPE. / NRF
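The convex-combination step used here weights the individual models' forecasts so the blend does at least as well as its best member. A minimal numpy sketch with toy forecasts standing in for the SGB and NN models (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy hourly demand with a daily cycle, plus two hypothetical model
# forecasts with different error levels (stand-ins for e.g. the SGB
# and NN forecasts; all numbers are illustrative).
hours = np.arange(500)
y = 30 + 5 * np.sin(2 * np.pi * hours / 24)
f1 = y + rng.normal(0.0, 1.0, 500)  # stronger model
f2 = y + rng.normal(0.0, 2.0, 500)  # weaker model

# Convex combination: choose w in [0, 1] minimizing the RMSE of
# the combined forecast w*f1 + (1-w)*f2 on a validation set.
grid = np.linspace(0.0, 1.0, 101)
rmses = [float(np.sqrt(np.mean((y - (w * f1 + (1 - w) * f2)) ** 2)))
         for w in grid]
w_best = float(grid[int(np.argmin(rmses))])
rmse_best = min(rmses)
print(w_best, round(rmse_best, 3))
```

With independent errors, the optimal weight tilts toward the stronger model but stays strictly inside (0, 1): even the weaker forecast contributes, because its errors partly cancel those of the stronger one. QRA generalizes this idea by fitting quantile regressions on the pool of forecasts instead of a single weighted mean.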
