11 |
Aggregating predictions using Non-Disclosed Conformal Prediction
Carrión Brännström, Robin January 2019
When data are stored in different locations and pooling them is not allowed, predictive modeling suffers a loss of information. In this thesis, a new method called Non-Disclosed Conformal Prediction (NDCP) is adapted to a regression setting, so that predictions and prediction intervals can be aggregated from different data sources without exchanging any data. The method is built upon the Conformal Prediction framework, which produces predictions with confidence measures on top of any machine learning method. The method is evaluated on regression benchmark data sets using Support Vector Regression, with different sizes and settings for the data sources, to simulate real-life scenarios. The results show that the method produces conservatively valid prediction intervals even though, in some settings, the individual data sources do not manage to create valid intervals. NDCP also creates more stable intervals than the individual data sources. Thanks to its straightforward implementation, data owners who cannot share data but would like to contribute to predictive modeling would benefit from using this method.
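As a rough illustration of the idea, the following sketch computes a split-conformal prediction interval at each data source and then combines the intervals without sharing any raw data. The aggregation rule (the median of the lower and upper bounds) and all function names are assumptions made for this example; the abstract does not state how the thesis aggregates intervals.

```python
import math
from statistics import median


def conformal_halfwidth(cal_residuals, alpha):
    """Half-width of a split-conformal interval at confidence 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute
    calibration residual as the nonconformity threshold.
    """
    scores = sorted(abs(r) for r in cal_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this confidence level.
        return math.inf
    return scores[k - 1]


def source_interval(prediction, cal_residuals, alpha):
    """Interval computed locally by one data source."""
    q = conformal_halfwidth(cal_residuals, alpha)
    return (prediction - q, prediction + q)


def aggregate_intervals(intervals):
    """Combine per-source intervals; only the bounds are exchanged.

    ASSUMPTION: median of bounds, chosen here for illustration only.
    """
    lowers = [lo for lo, _ in intervals]
    uppers = [hi for _, hi in intervals]
    return (median(lowers), median(uppers))


if __name__ == "__main__":
    # Hypothetical residuals held privately by one source.
    residuals = [0.5, -1.0, 0.2, 1.5, -0.7, 0.9, -0.3]
    local = source_interval(10.0, residuals, alpha=0.25)
    print(aggregate_intervals([local, (8.0, 12.5), (9.5, 11.5)]))
```

Each source shares only its two interval bounds, which is what makes the scheme non-disclosing in this sketch.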
|
12 |
Use of social media to monitor and predict outbreaks and public opinion on health topics
Signorini, Alessio 01 December 2014
The world in which we live has changed rapidly over the last few decades. Threats of bioterrorism, influenza pandemics, and emerging infectious diseases coupled with unprecedented population mobility led to the development of public health surveillance systems. These systems are useful in detecting and responding to infectious disease outbreaks but often operate with a considerable delay and fail to provide the necessary lead time for optimal public health response.
In contrast, syndromic surveillance systems rely on clinical features (e.g., activities prompted by the onset of symptoms) that are discernible prior to diagnosis to warn of changes in disease activity. Although less precise, these systems can offer considerable lead time. Patient information may be acquired from multiple existing sources established for other purposes, including, for example, emergency department primary complaints, ambulance dispatch data, and over-the-counter medication sales. Unfortunately, these data are often expensive, sometimes difficult to obtain and almost always hard to integrate.
Fortunately, the proliferation of online social networks makes much more information about our daily habits and lifestyles freely available and easily accessible on the web. Twitter, Facebook and FourSquare are only a few examples of the many websites where people voluntarily post updates on their daily behaviors, health status, and physical location.
In this thesis we develop and apply methods to collect, filter and analyze the content of social media postings in order to make predictions. As a proof of concept we used Twitter data to predict public opinion in the form of the outcome of a popular television show. We then used the same methods to monitor and track public perception of influenza during the H1N1 epidemic, and even to predict disease burden in real time, which is a measurable advance over current public health practice. Finally, we used location-specific social media data to model human travel and show how these data can improve our prediction of disease burden.
|
13 |
Modeling Crash Severity and Speed Profile at Roadway Work Zones
Wang, Zhenyu 25 March 2008
Work zones tend to create hazardous conditions for drivers and construction workers because they generate conflicts between construction activities and traffic, aggravating existing traffic conditions and resulting in severe traffic safety and operational problems. Examining the influence of various factors on crash severity helps to understand the characteristics of work zone crashes. This understanding can be used to select proper countermeasures for reducing crash severity at work zones and improving work zone safety. In this dissertation, crash severity models were developed to explore the impact of these factors on crash severity for two work zone crash datasets (overall crashes and rear-end crashes). Partial proportional odds logistic regression, which relaxes the parallel regression assumption and provides more reasonable interpretations of the coefficients, was used to estimate the models. The factor impacts were summarized to indicate which factors are more likely to increase work zone crash severity and which tend to reduce it.
Because speed variation is an important factor in accidents in work zone areas, the work zone speed profile was analyzed and modeled to predict the distribution of speed as a function of distance to the starting point of lane closures. A new machine learning algorithm, support vector regression (SVR), was used to develop the speed profile model for freeway work zone sections under various scenarios because of its excellent generalization ability. A simulation-based experiment was designed to produce the speed data (output data) and scenario data (input data). Based on these data, the speed profile model was trained and validated. The speed profile model can be used as a reference for designing appropriate traffic control countermeasures to improve work zone safety.
|
14 |
Constrained Motion Particle Swarm Optimization for Non-Linear Time Series Prediction
Sapankevych, Nicholas 13 March 2015
Time series prediction techniques have been used in many real-world applications such as financial market prediction, electric utility load forecasting, weather and environmental state prediction, and reliability forecasting. The underlying system models and time series data generating processes are generally complex for these applications and the models for these systems are usually not known a priori. Accurate and unbiased estimation of time series data produced by these systems cannot always be achieved using well known linear techniques, and thus the estimation process requires more advanced time series prediction algorithms.
One type of time series interpolation and prediction algorithm that has been proven to be effective for these various types of applications is Support Vector Regression (SVR) [1], which is based on the Support Vector Machine (SVM) developed by Vapnik et al. [2, 3]. The underlying motivation for using SVMs is the ability of this methodology to accurately forecast time series data when the underlying system processes are typically nonlinear, non-stationary and not defined a priori. SVMs have also been proven to outperform other non-linear techniques, including neural-network based non-linear prediction techniques such as multi-layer perceptrons.
As with most time series prediction algorithms, there are typically challenges associated with applying a given heuristic to any general problem. One difficult challenge in using SVR to solve these types of problems is the selection of the free parameters associated with the SVR algorithm. There is no given heuristic for selecting SVR free parameters, and the user is left to adjust these parameters in an ad hoc manner.
The focus of this dissertation is to present an alternative to the typical ad hoc approach of tuning SVR for time series prediction problems by using Particle Swarm Optimization (PSO) to assist in the SVR free parameter selection process. Developed by Kennedy and Eberhart [4-8], PSO is a technique that emulates the process living creatures (such as birds or insects) use to discover food resources at a given geographic location. PSO has been proven to be an effective technique for many different kinds of optimization problems [9-11].
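To make the parameter-search idea concrete, here is a minimal sketch of plain global-best PSO searching over two parameters. It is not the constrained-motion variant the dissertation develops, and the objective `stand_in_validation_error` is a hypothetical placeholder for the validation error of an SVR trained with the candidate parameters; the parameter names and coefficient values are likewise assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Plain global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def stand_in_validation_error(params):
    """HYPOTHETICAL objective standing in for SVR validation error."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


if __name__ == "__main__":
    search_box = [(-5.0, 5.0), (-5.0, 5.0)]
    print(pso_minimize(stand_in_validation_error, search_box))
```

In practice the objective would train an SVR for each candidate and return its error on held-out data, which makes every evaluation far more expensive than in this toy setting.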
|
15 |
Geometric Tolerancing of Cylindricity Utilizing Support Vector Regression
Lee, Keun Joo 01 January 2009
In an age where quick turnaround times and high-speed manufacturing methods are becoming more important, quality assurance is a consistent bottleneck in production. With the development of cheap and fast computer hardware, it has become viable to use machine vision to collect data points from a machined part. The generation of these large point samples has created a need for a comprehensive algorithm that provides accurate results while being computationally efficient. The currently established methods are least-squares (LSQ) and non-linear programming (NLP). The LSQ method is often deemed too inaccurate and is prone to providing bad results, while the NLP method is computationally taxing. A novel method that uses support vector regression (SVR) to solve the NP-hard problem of evaluating the cylindricity of machined parts is proposed. This method was evaluated against LSQ and NLP in both accuracy and CPU processing time. An open-source, user-modifiable programming package was developed to test the model. Analysis of the test results shows the novel SVR algorithm to be a viable alternative for evaluating cylindricity in real-world manufacturing.
|
16 |
ICA-clustered Support Vector Regressions in Time Series Stock Price Forecasting
Chen, Tse-Cheng 29 August 2012
Financial time-series forecasting has long been discussed because of its importance for making informed investment decisions. This kind of problem, however, is intrinsically challenging because of the dynamic nature of the data. Most past research has focused on artificial neural network (ANN)-based approaches. It has been pointed out, however, that such approaches suffer from limited explanatory power and generalization ability.
The objective of this research is thus to propose a hybrid approach for stock price forecasting. Independent component analysis (ICA) is employed to reveal the latent structure of the observed time-series and remove noise and redundancy in the structure. It further assists clustering analysis. Support vector regression (SVR) models are then applied to enhance the generalization ability with separate models built based on the time-series data of companies in each individual cluster.
Two experiments are conducted accordingly. The results show that SVR has robust accuracy performance. More importantly, SVR models with ICA-based clustered data perform better than the single SVR model with all data involved. Our proposed approach does enhance the generalization ability of the forecasting models, which justifies the feasibility of its applications.
|
17 |
Parameter learning and support vector reduction in support vector regression
Yang, Chih-cheng 21 July 2006
The selection and learning of kernel functions is a very important but rarely studied problem in the field of support vector learning, even though the kernel function of a support vector regression has great influence on its performance. The kernel function maps the dataset from the original data space into a feature space, so problems that cannot be solved in low dimensions may be solved in a higher dimension through the kernel transform.
This paper makes two main contributions. First, we introduce the gradient descent method to the learning of kernel functions. Using gradient descent, we derive learning rules for the parameters that determine the shape and distribution of the kernel functions. We can therefore obtain better kernel functions by training their parameters with respect to the risk minimization principle. Second, in order to reduce the number of support vectors, we use the orthogonal least squares method. By choosing the representative support vectors, we can remove the less important support vectors from the support vector regression model.
The experimental results have shown that our approach can derive better kernel functions than others and has better generalization ability. Also, the number of support vectors can be effectively reduced.
|
18 |
A Multiple-Kernel Support Vector Regression Approach for Stock Market Price Forecasting
Huang, Chi-wei 05 August 2009
Support vector regression has been applied to stock market forecasting problems. However, the hyperparameters of the kernel functions usually need to be tuned manually. Multiple-kernel learning was developed to deal with this problem: the kernel matrix weights and Lagrange multipliers can be derived simultaneously through semidefinite programming. However, the time and space requirements of that approach are very demanding. We develop a two-stage multiple-kernel learning algorithm by incorporating sequential minimal optimization and the gradient projection method.
With this algorithm, the advantages of different hyperparameter settings can be combined and overall system performance can be improved. In addition, the user need not specify the hyperparameter settings in advance, so trial and error in determining appropriate hyperparameter settings can be avoided. Experimental results on datasets taken from the Taiwan Capitalization Weighted Stock Index show that our method performs better than other methods.
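The notion of combining several hyperparameter settings can be sketched as a convex combination of RBF kernels with different widths. In the sketch below the weights are simply given as inputs; the thesis learns them with its two-stage algorithm, which is not reproduced here. The function names and the choice of RBF base kernels are assumptions for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel_matrix(points, gammas, weights):
    """Kernel matrix of a convex combination of RBF kernels.

    ASSUMPTION: weights are supplied by the caller; a multiple-kernel
    learner would optimize them together with the regression model.
    """
    if len(gammas) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [[sum(w * rbf_kernel(x, y, g) for g, w in zip(gammas, weights))
             for y in points]
            for x in points]


if __name__ == "__main__":
    # Hypothetical two-dimensional inputs and three candidate widths.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    for row in combined_kernel_matrix(pts, [0.1, 1.0, 10.0], [0.5, 0.25, 0.25]):
        print(row)
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the combined matrix can be passed to any solver that accepts a precomputed kernel.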
|
19 |
Photovoltaic Systems: Forecasting for Demand Response Management and Environmental Modelling to Design Accelerated Aging Tests
January 2017
Distributed renewable energy generators now contribute a significant amount of energy to the grid. Consequently, the reliability adequacy of such generators depends on accurate forecasts of the energy they produce. The power output of solar PV systems depends on the stochastic variation of environmental factors (solar irradiance, ambient temperature and wind speed) and on random mechanical failures and repairs. Monte Carlo simulation, which is typically used to model such problems, becomes too computationally intensive, leading to simplifying state-space assumptions. Multi-state models for power system reliability offer greater flexibility in describing system state evolution and an accurate representation of probability. In this study, Universal Generating Functions (UGF) were used to solve such combinatorial problems. Eight grid-connected solar PV systems with a combined capacity of about 5 MW, located in a hot-dry climate (Arizona), were analyzed, and an accuracy of 98% was achieved when validated against real-time data. An analytics framework is provided to grid operators and utilities to effectively forecast the energy produced by distributed energy assets and, in turn, to develop strategies for effective Demand Response as the share of renewable distributed energy assets in the grid increases. The second part of this thesis extends the environmental modelling approach to develop an aging test to be run in conjunction with an accelerated test of solar PV modules. Accelerated Lifetime Testing procedures are used in the industry to determine the dominant failure modes that a product undergoes in the field, as well as to predict the lifetime of the product. UV exposure is one of the ten stressors that a PV module undergoes in the field; it causes browning of modules, leading to a drop in short-circuit current.
This thesis presents an environmental modelling approach for the hot-dry climate and extends it to develop an aging test methodology. Together with the accelerated tests, this would help achieve the goal of correlating field failures with accelerated tests and obtaining the acceleration factor. This knowledge would help predict PV module degradation in the field to within 30% of the actual value and help estimate the PV module lifetime accurately. / Dissertation/Thesis / Masters Thesis Industrial Engineering 2017
|
20 |
Regressão por vetores de suporte aplicado na determinação de propriedades físico-químicas de petróleo e biocombustíveis / Support vector regression applied to the determination of physicochemical properties of petroleum and biofuels
Filgueiras, Paulo Roberto, 1982- 26 August 2018
Advisor: Ronei Jesus Poppi / Doctoral thesis, Universidade Estadual de Campinas, Instituto de Química
Previous issue date: 2014 / Abstract: Crude oil is a complex mixture of heterogeneous chemical composition. Its full evaluation involves about 700 physicochemical assays, consuming 10 to 70 liters of sample over approximately one year of analysis. To reduce the time and amount of sample required, this thesis applies spectroscopic methods combined with Support Vector Regression (SVR) to determine some physicochemical properties of petroleum and biofuels. Different approaches to optimizing and interpreting SVR models were developed: a technique to determine the most important variables in model construction, estimation of confidence intervals for predictions, and assessment of trends in residuals. Four applications with different instrumental techniques were carried out. The first addressed the interpretation of SVR models built from mid-infrared (MIR) spectra to determine API gravity, kinematic viscosity and water content in petroleum. In the second, a method was developed to estimate the confidence interval of SVR models applied to proton nuclear magnetic resonance (1H NMR) spectra to determine the temperatures equivalent to 10%, 50% and 90% of distilled volume of petroleum. In the third, a method was developed to select spectral variables and optimize the SVR model parameters simultaneously by genetic algorithm, applied to carbon-13 nuclear magnetic resonance (13C NMR) spectra to determine saturates, aromatics, resins and asphaltenes (SARA) in petroleum. In the last application, spectral variables were selected using the synergy interval method, applied to near-infrared (NIR) spectra to quantify animal-fat biodiesel in blends with soybean biodiesel and B20 diesel.
The results indicate that SVR is an excellent tool for multivariate calibration applied to complex data such as petroleum and biofuels. / Doctorate / Analytical Chemistry / Doctor of Science
|