Global ETD Search

1	Topological data analysis: applications in machine learning / Análise topológica de dados: aplicações em aprendizado de máquina Calcina, Sabrina Graciela Suárez 05 December 2018 Computational topology has recently seen important developments in data analysis, giving rise to the field of Topological Data Analysis. Persistent homology is a fundamental tool based on the topology of data that can be represented as points in a metric space. In this work, we apply techniques of Topological Data Analysis; more precisely, we use persistent homology to compute the most persistent topological features in data. The persistence diagrams are then processed as feature vectors for applying Machine Learning algorithms. For classification, we used the following classifiers: Partial Least Squares-Discriminant Analysis, Support Vector Machine, and Naive Bayes. For regression, we used Support Vector Regression and KNeighbors. Finally, we give a statistical approach to analyze the accuracy of each classifier and regressor. Betti numbers Classificação de proteínas Classificador Naive Bayes Classificador PLS-DA Classificador SVM Diagramas de persistencia Homologia persistente KNeighbors regressor Naive Bayes classifier Números de Betti Persistence diagrams Persistent homology PLS-DA classifier Protein classification Regressor KNeighbors Regressor SVR SVM classifier SVR regressor
2	Initialization Methods for System Identification Lyzell, Christian January 2009 In the system identification community a popular framework for the problem of estimating a parametrized model structure given a sequence of input and output pairs is given by the prediction-error method. This method tries to find the parameters which maximize the prediction capability of the corresponding model via the minimization of some chosen cost function that depends on the prediction error. This optimization problem is often quite complex with several local minima and is commonly solved using a local search algorithm. Thus, it is important to find a good initial estimate for the local search algorithm. This is the main topic of this thesis. The first problem considered is the regressor selection problem for estimating the order of dynamical systems. The general problem formulation is difficult to solve and the worst case complexity equals the complexity of the exhaustive search of all possible combinations of regressors. To circumvent this complexity, we propose a relaxation of the general formulation as an extension of the nonnegative garrote regularization method. The proposed method provides means to order the regressors via their time lag and a novel algorithmic approach for the ARX and LPV-ARX case is given. Thereafter, the initialization of linear time-invariant polynomial models is considered. Usually, this problem is solved via some multi-step instrumental variables method. For the estimation of state-space models, which are closely related to the polynomial models via canonical forms, the state of the art estimation method is given by the subspace identification method. It turns out that this method can be easily extended to handle the estimation of polynomial models. The modifications are minor and only involve some intermediate calculations where already available tools can be used. Furthermore, with the proposed method other a priori information about the structure can be readily handled, including a certain class of linear gray-box structures. The proposed extension is not restricted to the discrete-time case and can be used to estimate continuous-time models. The final topic in this thesis is the initialization of discrete-time systems containing polynomial nonlinearities. In the continuous-time case, the tools of differential algebra, especially Ritt's algorithm, have been used to prove that such a model structure is globally identifiable if and only if it can be written as a linear regression model. In particular, this implies that once Ritt's algorithm has been used to rewrite the nonlinear model structure into a linear regression model, the parameter estimation problem becomes trivial. Motivated by the above and the fact that most system identification problems involve sampled data, a version of Ritt's algorithm for the discrete-time case is provided. This algorithm is closely related to the continuous-time version and enables the handling of noise signals without differentiations. System identification Initialization methods Regressor selection Identifiability Automatic control Reglerteknik
3	Initialization Methods for System Identification Lyzell, Christian January 2009 In the system identification community a popular framework for the problem of estimating a parametrized model structure given a sequence of input and output pairs is given by the prediction-error method. This method tries to find the parameters which maximize the prediction capability of the corresponding model via the minimization of some chosen cost function that depends on the prediction error. This optimization problem is often quite complex with several local minima and is commonly solved using a local search algorithm. Thus, it is important to find a good initial estimate for the local search algorithm. This is the main topic of this thesis. The first problem considered is the regressor selection problem for estimating the order of dynamical systems. The general problem formulation is difficult to solve and the worst case complexity equals the complexity of the exhaustive search of all possible combinations of regressors. To circumvent this complexity, we propose a relaxation of the general formulation as an extension of the nonnegative garrote regularization method. The proposed method provides means to order the regressors via their time lag and a novel algorithmic approach for the ARX and LPV-ARX case is given. Thereafter, the initialization of linear time-invariant polynomial models is considered. Usually, this problem is solved via some multi-step instrumental variables method. For the estimation of state-space models, which are closely related to the polynomial models via canonical forms, the state of the art estimation method is given by the subspace identification method. It turns out that this method can be easily extended to handle the estimation of polynomial models. The modifications are minor and only involve some intermediate calculations where already available tools can be used. Furthermore, with the proposed method other a priori information about the structure can be readily handled, including a certain class of linear gray-box structures. The proposed extension is not restricted to the discrete-time case and can be used to estimate continuous-time models. The final topic in this thesis is the initialization of discrete-time systems containing polynomial nonlinearities. In the continuous-time case, the tools of differential algebra, especially Ritt's algorithm, have been used to prove that such a model structure is globally identifiable if and only if it can be written as a linear regression model. In particular, this implies that once Ritt's algorithm has been used to rewrite the nonlinear model structure into a linear regression model, the parameter estimation problem becomes trivial. Motivated by the above and the fact that most system identification problems involve sampled data, a version of Ritt's algorithm for the discrete-time case is provided. This algorithm is closely related to the continuous-time version and enables the handling of noise signals without differentiations. System identification Initialization methods Regressor selection Identifiability Control Engineering Reglerteknik
4	Factor analysis of the growth of startups / Faktoranalys av tillväxten av start-ups Stenharg, Jonatan, Räisänen, Marcus January 2022 The task of predicting start-up growth has been the subject of institutional as well as widespread individual research, and of acclaim for those who succeed. This work is an attempt to distill the alleged predictive factors from the large body of work that has already been documented, as well as to investigate reasonable but as yet untested variables. Conclusions are built with a multiple regression model exploring 7 regressors, with data spanning 2014-2019 to avoid the potentially abnormal impact of the Covid-19 crisis. Due to the choice of non-predictive regressors, the final result is an explanatory model, highlighting the importance of rigor in model building and in planning data collection for regression analysis. Most regressors had a non-significant or weak relationship with the response variable, but the final model reaches a degree of explanation of 51%. Even if it cannot be utilised as a predictive model, it may provide some interesting insight. In the final model, every regressor except one had an unexpected beta value, contradicting earlier research. Start-up Growth Prediction Regressor CAGR Venture Capital Unicorn SaaS Start-up Tillväxt Prediktion Regressor Riskkapital CAGR Enhörning SaaS Probability Theory and Statistics Sannolikhetsteori och statistik
5	Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris / Using machine learning to extract information from real estate listings in order to maximize selling price Ekeberg, Lukas, Fahnehjelm, Alexander January 2018 The Swedish real estate market has been digitalized over the past decade, and the current practice is to post real estate advertisements online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold in 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R² value of approx. 0.26 and a Mean Absolute Error of approx. 0.06. While the models did not predict the premium accurately, information could still be extracted from them. In conclusion, a high number of views and publication in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication below 15.5 and avoid publishing on a Tuesday. correlation linear regression decision tree regressor random forest regressor gini impurity pricing property market data features predictive models machine learning algorithms Computer and Information Sciences Data- och informationsvetenskap
6	Regressor and Structure Selection : Uses of ANOVA in System Identification Lind, Ingela January 2006 Identification of nonlinear dynamical models of a black box nature involves both structure decisions (i.e., which regressors to use and the selection of a regressor function), and the estimation of the parameters involved. The typical approach in system identification is often a mix of all these steps, which for example means that the selection of regressors is based on the fits that are achieved for different choices. Alternatively one could then interpret the regressor selection as based on hypothesis tests (F-tests) at a certain confidence level that depends on the data. It would in many cases be desirable to decide which regressors to use, independently of the other steps. A survey of regressor selection methods used for linear regression and nonlinear identification problems is given. In this thesis we investigate what the well known method of analysis of variance (ANOVA) can offer for this problem. System identification applications violate many of the ideal conditions for which ANOVA was designed and we study how the method performs under such non-ideal conditions. It turns out that ANOVA gives better and more homogeneous results compared to several other regressor selection methods. Some practical aspects are discussed, especially how to categorise the data set for the use of ANOVA, and whether to balance the data set used for structure identification or not. An ANOVA-based method, Test of Interactions using Layout for Intermixed ANOVA (TILIA), for regressor selection in typical system identification problems with many candidate regressors is developed and tested with good performance on a variety of simulated and measured data sets. Typical system identification applications of ANOVA, such as guiding the choice of linear terms in the regression vector and the choice of regime variables in local linear models, are investigated. 
It is also shown that the ANOVA problem can be recast as an optimisation problem. Two modified, convex versions of the ANOVA optimisation problem are then proposed, and it turns out that they are closely related to the nn-garrote and wavelet shrinkage methods, respectively. In the case of balanced data, it is also shown that the methods have a nice orthogonality property in the sense that different groups of parameters can be computed independently. System identification Regressor selection Analysis of variance Nonlinear systems Structure selection Automatic control Reglerteknik
7	Economic valuation of ecosystems and natural resources / Evaluation économique des écosystèmes et des ressources naturelles Kalisa, Thierry 26 May 2014 This dissertation investigates two methods for valuing environmental resources, the revealed-preference Travel Cost (TC) method and the stated-preference Contingent Valuation (CV) method, and makes the following contributions. In chapter 1, we show that, if both CV and TC data are available for the same observations, a better measure of willingness to pay (WTP) can be obtained by combining the two methods using the simulated maximum likelihood technique. In chapter 2, we show that the new special regressor approach could be a solution for treating endogeneity issues in CV. Using data on WTP for reducing subjective mortality risks due to arsenic in drinking water, we show that the endogeneity of the subjective mortality risk level can be treated effectively. Finally, in chapter 3, using a new survey about rural electrification in Rwanda, we propose a new design for the CV method that allows respondents to choose between a contribution in time or in money. Thus, in addition to measuring a conventional WTP, we also obtain a willingness to contribute time, measured in days, which is as relevant as WTP, or even more so, in the context of a developing country. Evaluation contingente Disposition à payer Risque subjectif Préférences révélées Maximum de vraisemblance simulé Arsenic dans l’eau potable Special regressor Electrification Temps Contingent valuation Willingness to Pay Subjective risk Revealed preferences Simulated maximum likelihood Arsenic in drinking water Special regressor Electrification Time
8	Predictive Autoscaling of Systems using Artificial Neural Networks Lundström, Christoffer, Heiding, Camilla January 2021 Autoscalers handle the scaling of instances in a system automatically based on specified thresholds such as CPU utilization. Reactive autoscalers do not take the delay of initiating a new instance into account, which may lead to overutilization. By applying machine learning methodology to predict future loads and the desired number of instances, it is possible to preemptively initiate scaling such that new instances are available before demand occurs. Leveraging efficient scaling policies keeps the costs and energy consumption low while ensuring the availability of the system. In this thesis, the predictive capability of different multilayer perceptron configurations is investigated to elicit a suitable model for a telecom support system. The results indicate that it is possible to accurately predict future load using a multilayer perceptron regressor model. However, the possibility of reproducing the results in a live environment is questioned as the dataset used is derived from a simulation. autoscaling predictive autoscaling machine learning artificial neural networks multilayer perceptrons MLP-regressor time series forecasting Computer Sciences Datavetenskap (datalogi)
9	Application of Machine Learning and AI for Prediction in Ungauged Basins Pin-Ching Li (16734693) 03 August 2023 Streamflow prediction in ungauged basins (PUB) is a process generating streamflow time series at ungauged reaches in a river network. PUB is essential for facilitating various engineering tasks such as managing stormwater, water resources, and water-related environmental impacts. Machine Learning (ML) has emerged as a powerful tool for PUB using its generalization process to capture the streamflow generation processes from hydrological datasets (observations). ML’s generalization process is impacted by two major components: data splitting process of observations and the architecture design. To unveil the potential limitations of ML’s generalization process, this dissertation explores its robustness and associated uncertainty. More precisely, this dissertation has three objectives: (1) analyzing the potential uncertainty caused by the data splitting process for ML modeling, (2) investigating the improvement of ML models’ performance by incorporating hydrological processes within their architectures, and (3) identifying the potential biases in ML’s generalization process regarding the trend and periodicity of streamflow simulations. The first objective of this dissertation is to assess the sensitivity and uncertainty caused by the regular data splitting process for ML modeling. The regular data splitting process in ML was initially designed for homogeneous and stationary datasets, but it may not be suitable for hydrological datasets in the context of PUB studies. Hydrological datasets usually consist of data collected from diverse watersheds with distinct streamflow generation regimes influenced by varying meteorological forcing and watershed characteristics. To address the potential inconsistency in the data splitting process, multiple data splitting scenarios are generated using the Monte Carlo method. The scenario with random data splitting results accounts for frequent covariate shift and tends to add uncertainty and biases to ML’s generalization process. The findings in this objective suggest the importance of avoiding the covariate shift during the data splitting process when developing ML models for PUB to enhance the robustness and reliability of ML’s performance. The second objective of this dissertation is to investigate the improvement of ML models’ performance brought by Physics-Guided Architecture (PGA), which incorporates ML with the rainfall abstraction process. PGA is a theory-guided machine learning framework integrating conceptual tutors (CTs) with ML models. In this study, CTs correspond to rainfall abstractions estimated by Green-Ampt (GA) and SCS-CN models. Integrating the GA model’s CTs, which involves information on dynamic soil properties, into PGA models leads to better performance than a regular ML model. On the contrary, PGA models integrating the SCS-CN model's CTs yield no significant improvement of ML model’s performance. The results of this objective demonstrate that the ML’s generalization process can be improved by incorporating CTs involving dynamic soil properties. The third objective of this dissertation is to explore the limitations of ML’s generalization process in capturing trend and periodicity for streamflow simulations. Trend and periodicity are essential components of streamflow time series, representing the long-term correlations and periodic patterns, respectively. When the ML models generate streamflow simulations, they tend to have relatively strong long-term periodic components, such as yearly and multiyear periodic patterns. In addition, compared to the observed streamflow data, the ML models display relatively weak short-term periodic components, such as daily and weekly periodic patterns. As a result, the ML’s generalization process may struggle to capture the short-term periodic patterns in the streamflow simulations. The biases in ML’s generalization process emphasize the demands for external knowledge to improve the representation of the short-term periodic components in simulating streamflow. Surface water hydrology Hydrology Machine Learning prediction in ungauged basins (PUB) streamflow predictions Random forest (RF) regressor LSTM neural networks
10	Analysis of Robustness in Lane Detection using Machine Learning Models Adams, William A. January 2015 No description available. Artificial Intelligence Automotive Engineering Engineering Computer Science Machine Learning ADAS Lane Detection Autoencoder Regressor Deep Network Deep Learning
