  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Meta-učení v oblasti dolování dat / Meta-Learning in the Area of Data Mining

Kučera, Petr January 2013 (has links)
This paper describes the use of meta-learning in the area of data mining. It describes the problems and tasks of data mining where meta-learning can be applied, with a focus on classification. It provides an overview of meta-learning techniques and their possible application in data mining, especially in model selection. It describes the design and implementation of a meta-learning system to support classification tasks in data mining. The system uses statistics and information theory to characterize data sets stored in the meta-knowledge base. A meta-classifier is created from this base and predicts the most suitable model for a new data set. The conclusion discusses the results of experiments with more than 20 data sets representing classification tasks from different areas and suggests possible extensions of the project.
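As a hedged sketch of the approach this abstract describes (the function names and the particular meta-features below are illustrative assumptions, not the thesis implementation), a meta-knowledge base can map simple statistical and information-theoretic dataset characteristics to the model that worked best on similar data:

```python
import numpy as np

def meta_features(X, y):
    """Characterize a dataset with simple statistical and
    information-theoretic meta-features (illustrative choices)."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = -np.sum(p * np.log2(p))                  # information-theoretic
    mean_corr = np.abs(np.corrcoef(X, rowvar=False)).mean()  # statistical
    return np.array([np.log(n), d, class_entropy, mean_corr])

def predict_best_model(meta_base, new_X, new_y):
    """1-NN meta-classifier over the meta-knowledge base: recommend the
    model that performed best on the most similar known dataset."""
    f = meta_features(new_X, new_y)
    feats = np.array([entry["features"] for entry in meta_base])
    nearest = np.argmin(np.linalg.norm(feats - f, axis=1))
    return meta_base[nearest]["best_model"]
```

In practice the meta-knowledge base would be filled by running many candidate classifiers on many datasets and recording the winner alongside each dataset's meta-features.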
232

Algorithmes optimaux de traitement de données pour des systèmes complexes d'information et télécommunication dans un environnement incertain / Optimal algorithms of data processing for complex information and telecommunication systems in an uncertain environment

Beltaief, Slim 08 September 2017 (has links)
This thesis is devoted to the problem of nonparametric estimation for continuous-time regression models. We consider the problem of estimating an unknown function S assumed to be periodic. The estimation is based on observations generated by a stochastic process; these observations may be in continuous or discrete time. To this end, we construct a series of projection estimators, approximating the unknown function S by a finite Fourier series. We consider the estimation problem in the adaptive setting, i.e. when the regularity of the function S is unknown. For this problem, we develop a new adaptive method based on the model selection procedure proposed by Konev and Pergamenshchikov (2012). This procedure first gives a family of estimators; we then choose the best possible one by minimizing a cost function. We also give an oracle inequality for the risk of our estimators and establish the minimax convergence rate.
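A minimal numerical sketch of the projection idea (the trigonometric basis, the penalty constant, and the selection rule below are simplifying assumptions, not the actual Konev–Pergamenshchikov procedure):

```python
import numpy as np

def fourier_basis(t, m):
    """Trigonometric basis: 1, sqrt(2)cos(2*pi*j*t), sqrt(2)sin(2*pi*j*t)."""
    cols = [np.ones_like(t)]
    for j in range(1, m + 1):
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
    return np.column_stack(cols)

def projection_estimate(t, y, m):
    """Least-squares projection of the observations onto the first
    2m+1 Fourier functions (discrete analogue of the L2 projection)."""
    Phi = fourier_basis(t, m)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ coef, coef

def select_dimension(t, y, m_max, pen=2.0):
    """Pick m by minimizing a penalized empirical cost, a crude
    stand-in for the model selection rule described above."""
    n = len(y)
    costs = []
    for m in range(m_max + 1):
        fit, coef = projection_estimate(t, y, m)
        costs.append(np.mean((y - fit) ** 2) + pen * len(coef) / n)
    return int(np.argmin(costs))
```

The adaptive step is the last function: the data, not prior knowledge of the regularity of S, decide how many Fourier terms to keep.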
233

Model Selection and Uniqueness Analysis for Reservoir History Matching

Rafiee, Mohammad Mohsen 28 January 2011 (has links)
“History matching” (model calibration, parameter identification) is an established method for determining representative reservoir properties such as permeability, porosity, relative permeability and fault transmissibility from a measured production history; however, the uniqueness of the selected model is always a challenge in successful history matching. Up to now, the uniqueness of history matching results in practice could be assessed only on the basis of individual technical experience and/or by repeating history matching with different reservoir models (different sets of parameters as the starting guess). The present study uses the stochastic theory of Kullback & Leibler (K-L) and its further development by Akaike (AIC), for the first time in reservoir engineering, to address the uniqueness problem. In addition, based on the AIC principle and the principle of parsimony, a penalty term for the objective function (OF) has been formulated empirically from geoscientific and technical considerations. Finally, a new formulation (Penalized Objective Function, POF) has been developed for model selection in reservoir history matching and has been tested successfully on a North German gas field.
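The penalty idea can be sketched numerically. The Gaussian-error AIC below is the standard textbook form; the penalty weight in the second function is a placeholder assumption standing in for the thesis's empirically formulated geoscience term:

```python
import numpy as np

def aic(residuals, k):
    """AIC for a Gaussian error model: n*ln(RSS/n) + 2k, where k is
    the number of calibrated reservoir parameters."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + 2 * k

def penalized_objective(of, k, weight):
    """Illustrative penalized objective: the ordinary history-match
    objective plus a parsimony penalty growing with the parameter
    count. `weight` is a stand-in, not the POF term of the thesis."""
    return of + weight * k
```

The effect is that a model with many calibrated parameters must improve the match substantially before it is preferred over a parsimonious one.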
234

Uncertainty Assessment of Hydrogeological Models Based on Information Theory

De Aguinaga, José Guillermo 03 December 2010 (has links)
There is a great deal of uncertainty in hydrogeological modeling. Overparametrized models increase uncertainty, since the information in the observations is distributed across all of the parameters. The present study proposes a new option to reduce this uncertainty: select a model which provides good performance with as few calibrated parameters as possible (a parsimonious model) and calibrate it using many sources of information. Akaike's Information Criterion (AIC), proposed by Hirotugu Akaike in 1973, is a statistical-probabilistic criterion based on information theory which allows us to select a parsimonious model. AIC formulates parsimonious model selection as an optimization problem across a set of proposed conceptual models. The AIC assessment is relatively new in groundwater modeling, and applying it with different sources of observations presents a challenge. In this dissertation, important findings on the application of AIC in hydrogeological modeling using different sources of observations are discussed. AIC is tested on groundwater models using three sets of synthetic data: hydraulic pressure, horizontal hydraulic conductivity, and tracer concentration. The impact of the following factors is analyzed: number of observations, types of observations, and order of calibrated parameters. These analyses reveal not only that the number of observations determines how complex a model can be, but also that their diversity allows for further complexity in the parsimonious model. However, a truly parsimonious model was only achieved when the order of calibrated parameters was properly considered: parameters which provide larger improvements in model fit should be considered first.
The approach of obtaining a parsimonious model by applying AIC with different types of information was successfully applied to an unbiased lysimeter model using two types of real data: evapotranspiration and seepage water. With this additional independent model assessment it was possible to underpin the general validity of the AIC approach.
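A small, standard companion to such an AIC ranking (a generic Burnham–Anderson-style device, not code from the dissertation) converts AIC differences across conceptual models into relative plausibility weights:

```python
import numpy as np

def akaike_weights(aics):
    """Akaike weights: exp(-delta_i/2) normalized over the candidate
    set, where delta_i is each model's AIC distance from the best."""
    a = np.asarray(aics, dtype=float)
    delta = a - a.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()
```

When one conceptual model carries almost all the weight, the selection is decisive; weights spread across several models signal that the observations cannot distinguish them.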
235

Modelling regime shifts for foreign exchange market data using hidden Markov models / Modellering av regimskiften för valutamarknadsdata genom dolda Markovkedjor

Persson, Liam January 2021 (has links)
Financial data is often said to follow different market regimes. These regimes, which cannot be observed directly, are assumed to influence the observable returns. In this thesis, such regimes are modeled using hidden Markov models. We investigate whether the five currency pairs EUR/NOK, USD/NOK, EUR/USD, EUR/SEK, and USD/SEK exhibit market regimes that can be described using hidden Markov modeling. We find the optimal number of states and study the mean, variance, and correlations in each market regime.
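A hedged sketch of the machinery involved: the forward algorithm below scores a Gaussian-emission HMM on a return series, and BIC can then compare candidate state counts. The parameters are assumed known here; in practice they would be fitted, e.g. by EM (as in `hmmlearn`), and this is not the thesis's code:

```python
import numpy as np

def gaussian_hmm_loglik(x, pi, A, means, stds):
    """Log-likelihood of returns x under a Gaussian-emission HMM,
    computed with the (scaled) forward algorithm."""
    x, pi, A = np.asarray(x), np.asarray(pi), np.asarray(A)
    means, stds = np.asarray(means), np.asarray(stds)
    # Emission densities, shape (n_obs, n_states)
    B = np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    alpha = pi * B[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[t]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik

def bic(loglik, n_params, n_obs):
    """BIC for comparing HMMs with different state counts (lower wins)."""
    return -2.0 * loglik + n_params * np.log(n_obs)
```

Selecting the number of states then amounts to fitting models with 1, 2, 3, ... states and keeping the one with the lowest BIC.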
236

Evoluční algoritmy pro vícekriteriální optimalizaci / Evolutionary Algorithms for Multiobjective Optimization

Pilát, Martin January 2013 (has links)
Multi-objective evolutionary algorithms have gained a lot of attention in recent years. They have proven to be among the best multi-objective optimizers and have been used in many industrial applications. However, their usability is hindered by the large number of evaluations of the objective functions they require, which can be expensive when solving practical tasks. In order to reduce the number of objective function evaluations, surrogate models can be used. These are simple and fast approximations of the real objectives. In this work we present the results of research carried out between 2009 and 2013. We present a multi-objective evolutionary algorithm with an aggregate surrogate model, and its newer version, which also uses a surrogate model for the pre-selection of individuals. In the next part we discuss the problem of selecting a particular type of model. We show which characteristics of the various models are important and desirable, and provide a framework which combines surrogate modeling with meta-learning. Finally, in the last part, we apply multi-objective optimization to the problem of hyper-parameter tuning. We show that additional objectives can make finding good parameters for classifiers faster.
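The pre-selection idea can be sketched as follows; the 1-NN surrogate and all names here are illustrative assumptions, not the aggregate surrogate model of the thesis:

```python
import numpy as np

def surrogate_preselect(candidates, archive_X, archive_y, n_keep):
    """Rank candidate offspring with a cheap 1-NN surrogate built from
    the archive of truly evaluated points, and keep only the most
    promising ones for expensive evaluation (minimization assumed)."""
    preds = []
    for c in candidates:
        i = np.argmin(np.linalg.norm(archive_X - c, axis=1))
        preds.append(archive_y[i])          # surrogate prediction for c
    order = np.argsort(preds)
    return [candidates[i] for i in order[:n_keep]]
```

Only the `n_keep` survivors are passed to the expensive objective; the rest are discarded on the surrogate's verdict alone, which is where the evaluation savings come from.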
237

Výběr řádu GARCH modelu / GARCH model selection

Turzová, Kristína January 2021 (has links)
The GARCH model estimates the volatility of a time series. Information criteria are often used to determine orders of the GARCH model, although their suitability is not known. This thesis focuses on the order selection of the GARCH model using information criteria. The simulation study investigates whether information criteria are appropriate for the model selection and how the selection depends on the order, number of observations, distribution of innovations, estimation method or model parameters. The predictive capabilities of models selected by information criteria are compared to the true model.
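To make the selection mechanics concrete, here is a hedged numpy sketch (not from the thesis): the GARCH(1,1) conditional-variance recursion and its Gaussian log-likelihood, from which AIC for each candidate order would be computed after maximizing over the parameters (e.g. with `scipy.optimize` or the `arch` package):

```python
import numpy as np

def garch11_loglik(r, omega, alpha, beta):
    """Gaussian log-likelihood of returns r under GARCH(1,1):
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    r = np.asarray(r)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # a common initialization choice
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + r ** 2 / sigma2)

def aic(loglik, n_params):
    """AIC; a GARCH(p, q) variance equation has 1 + p + q parameters."""
    return 2 * n_params - 2 * loglik
```

Order selection then fits each candidate (p, q), evaluates its maximized log-likelihood, and picks the order with the smallest criterion value.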
238

Determining the number of classes in latent class regression models / A Monte Carlo simulation study on class enumeration

Luo, Sherry January 2021 (has links)
A Monte Carlo simulation study on class enumeration with latent class regression models. / Latent class regression (LCR) is a statistical method used to identify qualitatively different groups, or latent classes, within a heterogeneous population, and is commonly used in the behavioural, health, and social sciences. Despite these many applications, which fit index correctly determines the number of latent classes is hotly debated. In addition, there are conflicting views on whether covariates should be included in the class enumeration process. We conduct a simulation study to determine the impact of covariates on class enumeration accuracy, and study the performance of several commonly used fit indices under different population models and modelling conditions. Our results indicate that, of the eight fit indices considered, the aBIC and BLRT are the best performing fit indices for class enumeration. Furthermore, we found that covariates should not be included in the enumeration procedure. Our results illustrate that an unconditional LCA model can enumerate as accurately as a conditional LCA model with its true covariate specification. Even in the presence of large covariate effects in the population, the unconditional model is capable of enumerating with high accuracy. As noted by Nylund and Gibson (2016), a misspecified covariate specification can easily lead to an overestimation of the number of latent classes. We therefore recommend performing class enumeration without covariates and determining a set of candidate latent class models with the aBIC; the BLRT can then be applied to the candidate set to confirm whether its results match those of the aBIC. Separating the BLRT from the initial enumeration in this way still allows one to use the BLRT while reducing the heavy computational burden associated with this fit index.
Subsequent analysis can then be pursued accordingly after the number of latent classes is determined. / Thesis / Master of Science (MSc)
239

[en] FORECASTING INDUSTRIAL PRODUCTION IN BRAZIL USING MANY PREDICTORS / [pt] PREVENDO A PRODUÇÃO INDUSTRIAL BRASILEIRA USANDO MUITOS PREDITORES

LEONARDO DE PAOLI CARDOSO DE CASTRO 23 December 2016 (has links)
[en] In this article we compared the forecasting accuracy of unrestricted and penalized regressions using many predictors for the Brazilian industrial production index. We focused on the least absolute shrinkage and selection operator (LASSO) and its extensions. We also proposed a combination of penalized regressions and a variable search algorithm (PVSA). Factor-based models were used as our benchmark specification. Our study produced three main findings. First, LASSO-based models outperformed the benchmark in short-term forecasts. Second, the PVSA outperformed the benchmark, regardless of the horizon. Finally, the best predictive variables were consistently chosen by all methods considered. As expected, these variables are closely related to Brazilian industrial activity; examples include vehicle production and cardboard production.
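As a sketch of the core estimator (a generic coordinate-descent LASSO, not the authors' code or the PVSA), shrinkage with many predictors can be implemented in a few lines:

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator, the source of LASSO's exact zeros."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent, minimizing
    (1/2n)||y - Xb||^2 + lam*||b||_1 (features assumed standardized)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]              # partial residual without j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b
```

With many candidate predictors, the penalty drives most coefficients exactly to zero, which is what makes the selected variables interpretable as the series most related to industrial activity.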
240

Automatic Development of Pharmacokinetic Structural Models

Hamdan, Alzahra January 2022 (has links)
Introduction: The current development strategy for population pharmacokinetic models is a complex and iterative process that is performed manually by modellers. Such a strategy is time-demanding, subjective, and dependent on the modeller's experience. This thesis presents a novel model building tool that automates the development of pharmacokinetic (PK) structural models. Methods: Modelsearch is a tool in the Pharmpy library, an open-source package for pharmacometric modelling, that searches for the best structural model using an exhaustive stepwise search algorithm. Given a dataset, a starting model and a pre-specified search space of structural model features, the tool creates and fits a series of candidate models that are then ranked based on a selection criterion, leading to the selection of the best model. The Modelsearch tool was used to develop structural models for 10 clinical PK datasets (5 orally and 5 i.v. administered drugs). A starting model for each dataset was generated using the assemblerr package in R, which included first-order (FO) absorption without any absorption delay for oral drugs, one-compartment disposition, FO elimination, a proportional residual error model, and inter-individual variability (IIV) on the starting model parameters with a correlation between clearance (CL) and central volume of distribution (VC). The model search space included aspects of absorption and absorption delay (for oral drugs), distribution and elimination. In order to understand the effects of different IIV structures on structural model selection, five model search approaches were investigated, differing in the IIV structure of the candidate models: 1. naïve pooling, 2. IIV on starting model parameters only, 3. additional IIV on the mean delay time parameter, 4. additional diagonal IIVs on newly added parameters, and 5. full block IIVs.
Additionally, the implementation of structural model selection in a fully automatic model development workflow was investigated. Three strategies were evaluated, SIR, SRI and RSI, depending on the development order of the structural model (S), IIV model (I) and residual error model (R). Moreover, the NONMEM errors encountered when using the tool were investigated and categorized so that they can be handled in the automatic model building workflow. Results: Differences in the final selected structural models for each drug were observed between the five model search approaches. The same distribution components were selected by Approaches 1 and 2 for 6/10 drugs. Approach 2 also identified an absorption delay component in 4/5 oral drugs, whilst the naïve pooling approach identified an absorption delay model in only 2 drugs. Compared to Approaches 1 and 2, Approaches 3, 4 and 5 tended to select more complex models and more often resulted in minimization errors during the search. For the SIR, SRI and RSI investigations, the same structural model was selected for 9/10 drugs, with a significantly higher run time for the RSI strategy compared to the other strategies. The NONMEM errors were categorized into four categories based on the handling suggestions, which is valuable for further improving the tool's automatic error handling. Conclusions: The Modelsearch tool was able to automatically select a structural model with different strategies for setting the IIV model structure. This novel tool enables the evaluation of numerous combinations of model components, which would not be possible using a traditional manual model building strategy. Furthermore, the tool is flexible and can support multiple research investigations into how best to implement structural model selection in a fully automatic model development workflow.
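The exhaustive flavour of such a search can be sketched generically. This is a hedged illustration in plain Python, with mock `fit` and `criterion` callables standing in for NONMEM runs and a selection criterion such as BIC; it is not the Pharmpy Modelsearch API:

```python
import itertools

def exhaustive_search(base_model, features, fit, criterion):
    """Illustrative exhaustive search over a structural-feature space:
    every combination of candidate features is added to the base model,
    "fitted", and ranked by a selection criterion (lower is better)."""
    best = base_model
    best_score = criterion(fit(base_model))
    for r in range(1, len(features) + 1):
        for combo in itertools.combinations(features, r):
            candidate = {**base_model, "features": set(combo)}
            score = criterion(fit(candidate))
            if score < best_score:
                best, best_score = candidate, score
    return best, best_score
```

Even this toy version shows why automation matters: with f candidate features there are 2^f candidate models to create, fit and rank, far beyond what a manual strategy would attempt.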
