Global ETD Search

1	Methods to combine predictions from ensemble learning in multivariate forecasting Conesa Gago, Agustin January 2021 Making predictions is now highly important for any company, small or large: by analyzing the available data, companies can find new market opportunities and reduce risks and costs, among other benefits. Machine learning algorithms for time series can be used for predicting future values of interest. However, choosing the appropriate algorithm and tuning its metaparameters requires a great level of expertise. This creates an adoption barrier for small and medium enterprises that cannot afford to hire a machine learning expert for their IT team. For these reasons, this project studies different ways to make good predictions based on machine learning algorithms without requiring great theoretical knowledge from the users. Moreover, a software package that implements the prediction process has been developed. The software is an ensemble method that first predicts a value using several algorithms at the same time, and then combines their results, also taking into account the previous performance of each algorithm, to obtain a final prediction of the value. In addition, the solution proposed and implemented in this project can predict according to a concrete objective (e.g., optimizing the prediction, or not exceeding the real value), because not every prediction problem is subject to the same constraints. We have experimented with and validated the implementation on three different cases. In all of them, the ensemble outperformed each of the individual algorithms involved, with improvements of 45% to 95%. Machine learning Online supervised learning Ensemble method Regression Computer Systems Datorsystem
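The combination step described in this abstract (several algorithms predict, then their results are merged according to each algorithm's previous performance) can be illustrated with a minimal sketch. The weighting rule used here, the inverse of each algorithm's mean absolute error, is an assumption chosen for illustration and is not taken from the thesis; the function name `combine_predictions` and the sample numbers are hypothetical.

```python
from typing import Sequence


def combine_predictions(
    predictions: Sequence[float],
    past_errors: Sequence[Sequence[float]],
    eps: float = 1e-9,
) -> float:
    """Combine one prediction per algorithm into a single value.

    Each algorithm is weighted by the inverse of its mean absolute error
    on earlier time steps, so historically accurate algorithms count more.
    (Assumed weighting rule, for illustration only.)
    """
    if len(predictions) != len(past_errors):
        raise ValueError("need one error history per algorithm")
    inverse_mae = []
    for errors in past_errors:
        if not errors:
            raise ValueError("each error history must be non-empty")
        mae = sum(abs(e) for e in errors) / len(errors)
        inverse_mae.append(1.0 / (mae + eps))  # eps avoids division by zero
    total = sum(inverse_mae)
    weights = [w / total for w in inverse_mae]  # weights sum to 1
    return sum(w * p for w, p in zip(weights, predictions))


# Three hypothetical algorithms forecast the next value. The second one
# has had the smallest past errors, so the result is pulled toward 10.0.
forecast = combine_predictions(
    predictions=[12.0, 10.0, 15.0],
    past_errors=[[2.0, -2.0], [0.5, -0.5], [4.0, 4.0]],
)
print(round(forecast, 3))
```

An objective such as "do not exceed the real value" would need a different, asymmetric rule; this sketch covers only the plain accuracy-weighted case.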
2	A regression spline based approach to enhance the prediction accuracy of bicycle counter data Alkayali, Omar January 2022 Regression analysis has been used in previous research to predict the number of bicycles registered by a bicycle counter. An important step to improve the prediction is to include a long-term trend curve estimate in the formulation of the regression target variable. In this way, the deviation from the trend curve estimate, rather than the absolute number of bicycles, can be used as the target variable in the regression problem. This can help capture factors that are difficult, or even impossible, to model as input variables in the regression model, for example, larger infrastructural changes. This study aims to evaluate a regression spline-based approach to enhance the prediction accuracy of bicycle counter data. This is achieved by formulating a regression problem, generating trend curve estimates using regression splines, and evaluating the resulting curves using cross-validation on a set of chosen regression algorithms. We illustrate our approach by applying it to a time series recorded by a bicycle counter in the city of Malmö, Sweden. For the considered data set, our experimental results show that a spline trend curve estimate with between 12 and 19 knots, fitted to the time series, gives the best prediction. They also show that the use of ensemble methods leads to better prediction, with the G.B. Regressor showing the best performance with 19 knots. Bicycle counter regression spline trend curve prediction ensemble method Computer Sciences Datavetenskap (datalogi)
3	Bioinformatický nástroj pro predikci rozpustnosti proteinů / Bioinformatics Tool for Prediction of Protein Solubility Hronský, Patrik January 2016 This master's thesis addresses the solubility of recombinant proteins and its prediction. It describes protein synthesis as well as the process of creating recombinant proteins. Recombinant protein synthesis is of great importance, for example, to the pharmaceutical industry. This synthesis is not a simple task and does not always produce viable proteins. Protein solubility is an important factor in determining the viability of the resulting proteins. Companies that take part in recombinant protein synthesis naturally prefer to focus their effort and resources on proteins that will be viable in the end. In this regard, bioinformatics is of great help, as it can, with the help of machine learning, predict the solubility of proteins, for example based on their sequences. This thesis introduces the reader to the basic principles of machine learning and presents several machine learning methods used in the field of protein solubility prediction. It deals with the definition of a dataset, which is later used to test selected predictors as well as to train the ensemble predictor, which is the main focus of this thesis. It also covers several specific protein solubility predictors and explains the basic principles on which they are built, as well as the results of their testing. Finally, it presents the ensemble predictor of protein solubility.
4	Drift-Aware Ensemble Regression Rosenthal, Frank, Volk, Peter Benjamin, Hahmann, Martin, Habich, Dirk, Lehner, Wolfgang 13 January 2023 Regression models are often required for controlling production processes by predicting parameter values. However, the implicit assumption of standard regression techniques that the data set used for parameter estimation comes from a stationary joint distribution may not hold in this context because manufacturing processes are subject to physical changes like wear and aging, denoted as process drift. This can cause the estimated model to deviate significantly from the current state of the modeled system. In this paper, we discuss the problem of estimating regression models from drifting processes and we present ensemble regression, an approach that maintains a set of regression models (estimated from different ranges of the data set) according to their predictive performance. We extensively evaluate our approach on synthetic and real-world data. info:eu-repo/classification/ddc/004 ddc:004
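The idea of keeping several regression models, each estimated from a different range of the data and rated by predictive performance, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the models are simple straight-line fits over trailing windows, the last few observations are held out to score them, and the weights are inverse mean absolute errors. The names `fit_line` and `drift_aware_predict` and all parameter values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def drift_aware_predict(
    xs: Sequence[float],
    ys: Sequence[float],
    x_new: float,
    windows: Sequence[int] = (5, 10, 20),
    holdout: int = 3,
) -> float:
    """Predict y at x_new from models fitted on different trailing ranges.

    The last `holdout` points are kept out of fitting and used to score
    each model; models with a lower recent error get a higher weight.
    """
    train_x, train_y = xs[:-holdout], ys[:-holdout]
    test_x, test_y = xs[-holdout:], ys[-holdout:]
    scored = []
    for w in windows:
        a, b = fit_line(train_x[-w:], train_y[-w:])
        mae = sum(abs(a + b * x - y) for x, y in zip(test_x, test_y)) / holdout
        scored.append((1.0 / (mae + 1e-9), a + b * x_new))
    total = sum(weight for weight, _ in scored)
    return sum(weight * pred for weight, pred in scored) / total


# Synthetic drifting process: the slope changes from 1 to 3 at x = 20.
xs = [float(x) for x in range(30)]
ys = [x if x < 20 else 20.0 + 3.0 * (x - 20.0) for x in xs]
prediction = drift_aware_predict(xs, ys, x_new=30.0)
print(round(prediction, 3))
```

In this synthetic example the shortest window sees only post-drift data, fits the recent points almost exactly, and therefore dominates the combined prediction.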
5	PROBABILISTIC ENSEMBLE MACHINE LEARNING APPROACHES FOR UNSTRUCTURED TEXTUAL DATA CLASSIFICATION Srushti Sandeep Vichare (17277901) 26 April 2024 The volume of big data has surged, notably in unstructured textual data, comprising emails, social media, and more. Currently, unstructured data represents over 80% of global data, and its growth is propelled by digitalization. Unstructured text data analysis is crucial for applications such as social media sentiment analysis, customer feedback interpretation, and medical records classification. The complexity stems from the variability in language use, context sensitivity, and the nuanced meanings expressed in natural language. Traditional machine learning approaches, while effective in handling structured data, frequently fall short when applied to unstructured text data because of these complexities. Extracting value from this data requires advanced analytics and machine learning. Recognizing these challenges, we developed innovative ensemble approaches that combine the strengths of multiple conventional machine learning classifiers through a probabilistic approach, resulting in two novel models: the Consensus-Based Integration Model (CBIM) and the Unified Predictive Averaging Model (UPAM). The CBIM and UPAM ensemble models were applied to the Twitter (40,000 data samples) and National Electronic Injury Surveillance System (NEISS) (323,344 data samples) datasets, addressing various challenges in unstructured text analysis. On the NEISS dataset, the models achieved an unprecedented accuracy of 99.50%, demonstrating the effectiveness of ensemble models in extracting relevant features and making accurate predictions. On the Twitter dataset, used for sentiment analysis, the models demonstrated a significant boost in accuracy over conventional approaches, achieving a maximum of 65.83%. The results highlighted the limitations of conventional machine learning approaches when dealing with complex, unstructured text data, and the potential of ensemble models. The models exhibited high accuracy across various datasets and tasks, showcasing their versatility and effectiveness in obtaining valuable insights from unstructured text data. The results obtained extend the boundaries of text analysis and advance the field of natural language processing. probabilistic approach Machine Learning Ensemble Method ensemble approaches unstructured text data Injury classification sentiment analysis
6	Método baseado em rotação e projeção otimizadas para a construção de ensembles de modelos / Ensemble method based on optimized rotation and projection Ferreira, Ednaldo José 31 May 2012 The development of new techniques capable of producing predictive models with relatively low generalization errors is a constant in machine learning and related areas. In this context, combining a set of models into a so-called ensemble stands out for its theoretical and empirical potential to minimize the generalization error. Several methods for building ensembles are found in the literature. Among them, the rotation-based (RB) method has outperformed other classical methods. The RB method applies principal component analysis (PCA) for feature extraction as a rotation strategy to promote accuracy and diversity among the base models. However, this strategy does not ensure that the resulting direction is appropriate for the chosen supervised learning technique (SLT). Moreover, the RB method is not suitable for rotation-invariant SLTs and has not been widely validated with stable ones, which makes it inappropriate for, or restricted to, only some SLTs. This thesis proposes a new approach to feature extraction based on the concatenation of rotation and projection optimized for the SLT (called optimized roto-projection). The approach uses a metaheuristic to optimize the parameters of the roto-projection transformation, minimizing the error of the technique that directs the optimization. More importantly, optimized roto-projection is proposed as a fundamental part of a new ensemble method, called optimized roto-projection ensemble (ORPE). The results show that optimized roto-projection can reduce the dimensionality and complexity of the data and the model, and can also increase the performance of the SLT applied afterwards. ORPE outperformed, with statistical significance, RB and other methods using stable and unstable SLTs on classification and regression data sets from public and private domains. ORPE proved unrestricted and highly effective, taking first place in every dominance ranking performed. Aprendizado de Ensemble Aprendizado de máquina Ensemble learning Ensemble method Machine learning Método de ensemble Optimized roto-projection Optimized roto-projection ensemble Roto-projeção otimizada
8	Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection Dolo, Kgaugelo Moses 12 1900 Differential Evolution is a population-based stochastic search technique for optimization that is powerful and efficient over continuous spaces for solving differentiable and non-linear optimization problems. The weighted voting stacking ensemble method is an important technique that combines various classifier models. However, selecting the appropriate weights of the classifier models for the correct classification of transactions is a problem. This research study is therefore aimed at exploring whether the Differential Evolution optimization method is a good approach for defining the weighting function. Manual and random selection of weights for voting on credit card transactions has previously been carried out. However, a large number of fraudulent transactions were not detected by the classifier models, which means that a technique to overcome the weaknesses of the classifier models is required. Thus, the problem of selecting the appropriate weights was viewed in this study as a weight optimization problem. The dataset was downloaded from the Kaggle competition data repository. Various machine learning algorithms were used to cast weighted votes on the class of each transaction. The Differential Evolution optimization technique was used as the weighting function. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and Safe Level Synthetic Minority Oversampling Technique (SL-SMOTE) oversampling algorithms were modified to preserve the definition of SMOTE while improving performance. The results of this research study showed that the Differential Evolution optimization method is a good weighting function, which can be adopted as a systematic weight function for the weighted voting stacking ensemble method over various classification methods. / School of Computing / M. Sc. (Computing) Differential evolution Weighted voting Stacking ensemble method Class distribution Data distribution SMOTE Machine learning Big data Credit card fraud 364.163 Credit Card Fraud
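Treating weight selection as an optimization problem, as this abstract does, can be sketched with a small hand-written Differential Evolution loop (the common rand/1/bin variant). Everything specific here is an assumption for illustration and not taken from the thesis: the objective is the mean squared error of a weighted soft vote, the classifier probabilities and labels are made-up toy data, and the control parameters `f`, `cr`, `pop_size` and `generations` are arbitrary.

```python
import random
from typing import Callable, List, Sequence, Tuple


def weighted_vote_loss(
    weights: Sequence[float],
    probas: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Mean squared error of the weighted soft vote (lower is better)."""
    total = max(sum(weights), 1e-12)  # guard against all-zero weights
    loss = 0.0
    for row, label in zip(probas, labels):
        vote = sum(w * p for w, p in zip(weights, row)) / total
        loss += (vote - label) ** 2
    return loss / len(labels)


def differential_evolution(
    objective: Callable[[List[float]], float],
    dim: int,
    pop_size: int = 20,
    generations: int = 100,
    f: float = 0.8,
    cr: float = 0.9,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over [0, 1]^dim with rand/1/bin DE."""
    rng = random.Random(seed)
    # Start from equal weights plus random candidates in [0, 1].
    population = [[1.0] * dim] + [
        [rng.random() for _ in range(dim)] for _ in range(pop_size - 1)
    ]
    scores = [objective(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    gene = population[a][d] + f * (
                        population[b][d] - population[c][d]
                    )
                    gene = min(1.0, max(0.0, gene))  # clip to the bounds
                else:
                    gene = population[i][d]
                trial.append(gene)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


# Toy validation set: fraud probabilities from three hypothetical
# classifiers for six transactions, with the true labels (1 = fraud).
probas = [
    (0.9, 0.6, 0.2), (0.8, 0.4, 0.7), (0.2, 0.5, 0.6),
    (0.1, 0.3, 0.8), (0.7, 0.6, 0.4), (0.3, 0.4, 0.1),
]
labels = [1, 1, 0, 0, 1, 0]


def objective(weights: List[float]) -> float:
    return weighted_vote_loss(weights, probas, labels)


best_weights, best_loss = differential_evolution(objective, dim=3)
equal_loss = objective([1.0, 1.0, 1.0])
print(best_weights, best_loss, equal_loss)
```

Because the population starts with the equal-weight vector and selection is greedy, the returned loss can never be worse than that of equal weights. In practice a library routine such as SciPy's `scipy.optimize.differential_evolution` could replace the hand-written loop.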
9	[en] E-AUTOMFIS: INTERPRETABLE MODEL FOR TIME SERIES FORECASTING USING ENSEMBLE LEARNING OF FUZZY INFERENCE SYSTEM / [pt] E-AUTOMFIS: MODELO INTERPRETÁVEL PARA PREVISÃO DE SÉRIES MULTIVARIADAS USANDO COMITÊS DE SISTEMAS DE INFERÊNCIA FUZZY THIAGO MEDEIROS CARVALHO 17 June 2021 By definition, a time series represents the behavior of a variable as a function of time. To forecast a series, the model must be able to learn the temporal dynamics of the variables in order to obtain consistent future values. However, accurate time series prediction goes beyond choosing the most complex (or promising) model applicable to the type of problem, so the analysis step is fundamental to guide the fitting of the model. Specifically, for multivariate problems, AutoMFIS is a model based on fuzzy logic, developed not only to give accurate forecasts but also to make results explainable through semantically understandable rules. Even with such promising characteristics, this system has limitations that make it impractical for problems involving high-dimensional datasets. With the growing presence of ever larger datasets, the automatic synthesis of fuzzy systems needs to be adapted to cover this new class of forecasting problems. This dissertation therefore proposes an extension of the AutoMFIS model for forecasting high-dimensional time series, named e-AutoMFIS. Based on ensemble learning theory, the new methodology applies distributed learning to generate fuzzy rules. The main characteristics of the proposed model are described, highlighting the changes made to improve both the accuracy and the interpretability of the system. The proposed model is also evaluated on real-world case studies, in which its accuracy is compared with that of other methods from the literature. In addition, for each selected problem, interpretability is also assessed, with a discussion of the criteria used for the explainability analysis. [pt] BASE DE DADOS [pt] COMITE DE PREVISORES [pt] PREVISAO DE SERIES MULTIVARIADAS [pt] INTERPRETABILIDADE [pt] SISTEMA DE INFERENCIA FUZZY [en] BIG DATA [en] ENSEMBLE METHOD [en] INTERPRETABILITY [en] FUZZY INFERENCE SYSTEM
