Global ETD Search

1	Short-term wind power forecasting using artificial neural networks-based ensemble model
Chen, Qin, 20 July 2022
Short-term wind power forecasting is crucial for the efficient operation of power systems with high wind power penetration. Many approaches have been developed to forecast short-term wind power. In recent years, artificial neural network (ANN)-based approaches have become among the most effective and popular for this task, owing to the availability of large amounts of historical data and strong computational power. Although ANNs usually perform well for short-term wind power forecasting, further improvement can be obtained by selecting suitable input features and model parameters and by applying techniques such as spatial correlation and ensembling. In this research, the effect of input features, model parameters, spatial correlation, and ensemble techniques on the short-term wind power forecasting performance of ANN models was evaluated. Pearson correlation coefficients between wind speed and other meteorological variables, together with a basic ANN model, were used to determine the impact of different input features on forecasting performance. The effects of training sample resolution and training sample size on forecasting performance were also investigated. To separate the impact of the number of hidden layers from that of the number of hidden neurons, and to vary a single variable in each experiment, the same number of hidden neurons was used in each hidden layer. ANNs with a total of 20 hidden neurons are shown to be sufficient for the nonlinear multivariate wind power forecasting problems considered in this dissertation. ANNs with two hidden layers performed better than those with a single hidden layer because the additional hidden layer adds nonlinearity to the model.
However, ANNs with more than two hidden layers performed the same as or worse than those with two hidden layers. ANNs with too many hidden layers and hidden neurons can overfit the training data. A spatial correlation technique was used to include meteorological variables from highly correlated neighbouring stations as input features, giving the ANNs more information about surrounding conditions. The benefits of input feature selection, model parameter choice, spatial correlation, and ensembling were combined to form an ANN-based ensemble model that further improves on the forecasting performance of an individual ANN model. The simulation results show that each available meteorological variable affects forecasting performance to a different degree. Wind speed has the most significant impact on both short-term wind speed and wind power forecasting, whereas air temperature, barometric pressure, and air density have the smallest effects. The ANNs perform better with a higher data resolution and a significantly larger training sample, although training with a higher data resolution and a larger training sample requires more computational power and a longer training time. Using meteorological variables from highly correlated neighbouring stations significantly improves the forecasting accuracy at the target stations. An ANN-based ensemble model is shown to further enhance the forecasting performance of an individual ANN by incorporating a large amount of surrounding meteorological information in parallel without encountering the overfitting issue faced by a single ANN model.
Keywords: Artificial intelligence; artificial neural networks; ensemble model; particle swarm optimization
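The input-feature screening in this abstract ranks meteorological variables by their Pearson correlation with wind speed. A minimal plain-Python sketch of that ranking step follows; the function names and the toy readings are hypothetical, and the thesis may well have computed the coefficients with a statistics package instead.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)

def rank_features(target, candidates):
    """Rank candidate inputs by |r| with the target, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical hourly readings at one station (illustrative values only).
wind_speed = [3.1, 4.0, 5.2, 6.8, 7.5, 8.1]
candidates = {
    "air_temperature": [12.0, 11.5, 12.2, 11.8, 12.1, 11.9],
    "gust_speed": [4.0, 5.1, 6.3, 8.0, 9.1, 9.7],
}
for name, r in rank_features(wind_speed, candidates):
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value keeps strongly negative correlations near the top, since they are as informative as positive ones for a nonlinear model.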
2	Artificial immune systems based committee machine for classification application
Al-Enezi, Jamal, January 2012
A new adaptive learning committee machine based on Artificial Immune Systems (AIS) is developed in this thesis. The proposed approach efficiently tackles the general problem of clustering high-dimensional data. It also helps derive useful decisions and results in other application domains such as classification and prediction. AIS is a branch of computational intelligence inspired by the biological immune system, and it has attracted increasing interest among researchers developing immune-based models and techniques to solve diverse complex computational and engineering problems. This work presents some applications of AIS techniques to health problems and a thorough survey of existing AIS models and algorithms. The main focus of this research is building an ensemble model that integrates different AIS techniques (i.e., Artificial Immune Networks, Clonal Selection, and Negative Selection) to achieve better classification results. A new AIS-based ensemble architecture with adaptive learning features is proposed; it integrates different learning and adaptation techniques to overcome their individual limitations and to achieve synergetic effects through their combination. Various techniques for the design and enhancement of the new adaptive learning architecture are studied, including a neuro-fuzzy detector and an optimizer based on particle swarm optimization, to improve classification performance. An evaluation study was conducted to assess the performance of the proposed adaptive learning ensemble and to compare it with alternative combining techniques. Several classification experiments on different medical datasets are presented, and their findings are discussed.
The new adaptive learning architecture improves the accuracy of the ensemble and also improves on existing aggregation techniques. The thesis concludes by discussing the outcomes, assumptions, and limitations of the proposed methods and their implications for further research in this area.
006.3
3	Using Machine Learning to Accurately Predict Ambient Soundscapes from Limited Data Sets
Pedersen, Katrina Lynn, 04 October 2018
The ability to accurately characterize the soundscape, or combination of sounds, of diverse geographic areas has many practical implications. Interested parties include the United States military and the National Park Service, and applications also exist in areas such as public health, ecology, community and social justice noise analyses, and real estate. I use an ensemble of machine learning models to predict ambient sound levels throughout the contiguous United States. The data set consists of 607 training sites at which various acoustic metrics, such as overall daytime L50 levels and one-third octave frequency band levels, have been obtained. Data for 117 geospatial features, such as distance to the nearest road or airport and the percentage of industrialization or forest cover in a specific area, are available for the entire contiguous United States. I discuss initial model predictions in the spatial, frequency, and temporal domains, and the statistical advantages of using an ensemble of machine learning models, particularly for limited data sets. I also comment on uncertainty quantification for machine learning models trained on limited data sets.
Keywords: acoustics; ensemble model; machine learning; soundscape; statistics; uncertainty quantification; Physical Sciences and Mathematics
4	Automated Mental Disorders Assessment Using Machine Learning
Abaei Koupaei, Niloufar, 13 December 2021
Mental and behavioural disorders such as bipolar disorder and depression are critical healthcare issues that affected approximately 45 million and 264 million people worldwide, respectively, in 2020. Early detection and intervention are crucial for limiting the negative effects these illnesses can have on people’s lives. Although the symptoms of different mental disorders vary, they are generally characterized by a combination of abnormal behaviours, thoughts, and emotions. Mental disorders can affect one’s ability to relate to others and to function in daily life. To assess symptoms, clinicians often use structured clinical interviews and standard questionnaires. However, there is a scarcity of automated or technology-assisted tools that can simplify the diagnostic process. The main objective of this thesis is to investigate, develop, and propose automated methods for mental disorder detection. Our research focuses on bipolar disorder and depression, as they are two of the most common and debilitating mental illnesses. Bipolar disorder is one of the most prevalent mental illnesses in the world; its principal indicator is extreme mood swings ranging from manic to depressive states. We propose automatic ternary classification models for the manic states of bipolar disorder. We employ a dataset that uses the Young Mania Rating Scale to label patients’ manic states as Mania, Hypomania, or Remission. The dataset comprises audio-visual recordings of bipolar disorder patients undergoing a structured interview. We propose three bipolar disorder classification solutions. The first uses a hybrid LSTM-CNN model: a CNN extracts facial features from the video signals, and the resulting feature sequence is fed to an LSTM that determines the bipolar disorder state.
This solution achieved promising results on the development and test sets of the Turkish Audio-Visual Bipolar Disorder Corpus, with an Unweighted Average Recall (UAR) of 60.67% and 57.4%, respectively. The second solution uses additional features from the structured interview recordings: visual representations along with audio and textual cues. We extract Mel-Frequency Cepstral Coefficients and the Geneva Minimalistic Acoustic Parameter Set as audio features, and we compute linguistic and sentiment features from each subject’s transcript. After feature selection, a stacked ensemble classifier classifies all the fused features; three homogeneous CNNs form its first level and an MLP its second level. We also use reinforcement learning to optimize the networks and their hyperparameters. Our stacked ensemble solution outperforms existing models on the Turkish Audio-Visual Bipolar Disorder Corpus, with a UAR of 59.3% on the test set. To the best of our knowledge, this is the highest performance achieved on this dataset. The Turkish Audio-Visual Bipolar Disorder dataset contains a relatively small number of videos, and the labels for the test set are kept confidential by the dataset provider. This motivated us to train a classifier using a semi-supervised ladder network, which benefits from unlabelled data during training, for the third solution. Our goal was to investigate whether a classifier of bipolar disorder states can be trained on a mix of labelled and unlabelled data, which would alleviate the burden of labelling all the videos in the training set. We collect informative audio, visual, and textual features from the recordings to build a multimodal classifier of the manic states. The third model achieved a UAR of 53.7% and 60.0% on the test and development sets, respectively.
There is a growing demand for automated depression detection systems that counter subjective bias in diagnosis. We propose an automated depression severity detection model that uses multimodal fusion of audio and textual information. We train the model on the E-DAIC corpus, which labels each individual’s depression level with a Patient Health Questionnaire score. We use MFCCs and eGeMAPS as audio representations and Word2Vec embeddings for the textual modality, and we implement a stacked ensemble regressor to detect depression severity. The proposed model achieves a concordance correlation coefficient of 0.49 on the test set. To the best of our knowledge, this is the highest-performing model on this dataset.
Keywords: Machine Learning; Mental Health; Bipolar Disorder; Depression; Ensemble model; Classification/Regression
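The results in this abstract are reported as unweighted average recall (UAR) for classification and the concordance correlation coefficient (CCC) for regression. The sketch below implements the standard definitions of both metrics in plain Python; it is not the thesis's evaluation code, the function names are hypothetical, and population (1/n) moments are assumed for the CCC.

```python
def unweighted_average_recall(y_true, y_pred):
    """UAR: per-class recall averaged with equal weight for every class."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, label in enumerate(y_true) if label == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def concordance_correlation(x, y):
    """Lin's CCC: 2*cov / (var_x + var_y + (mean_x - mean_y)**2), population moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denom

# Majority-class predictor on an imbalanced toy set: accuracy 0.75, UAR 0.5.
truth = ["remission", "remission", "remission", "mania"]
guess = ["remission"] * 4
print(unweighted_average_recall(truth, guess))  # 0.5

# A constant offset keeps Pearson r at 1 but lowers CCC to 4/7.
print(concordance_correlation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

UAR suits small, imbalanced corpora because a classifier cannot score well by always predicting the majority class. CCC penalizes systematic offsets that Pearson correlation ignores, which matters when predicting questionnaire scores.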
5	W2R: an ensemble Anomaly detection model inspired by language models for web application firewalls security
Wang, Zelong; AnilKumar, Athira, January 2023
Web application attacks have increased tremendously due to the large number of users and applications. Industry is therefore paying more attention to web application firewalls (WAFs), which act as a shield between an application and the internet by filtering and monitoring HTTP traffic, and to improving their security. Most existing work relies either on traditional feature extraction or on deep methods that require no separate feature extraction step. We noticed that combining an unsupervised language model with a classic dimension reduction method is less explored for this problem. To address this gap, we propose a new unsupervised anomaly detection model that achieves better results than the existing state-of-the-art model for anomaly detection in WAF security. The approach has three stages: 1) feature extraction from HTTP traffic packets using natural language processing (NLP) methods such as word2vec and BERT; 2) dimension reduction by PCA and an autoencoder; and 3) anomaly detection with different techniques, including OCSVM, isolation forest, LOF, and combinations of these algorithms, to explore how each affects the results. We evaluated the model on the CSIC 2010 and ECML/PKDD 2007 datasets, on which it outperforms the existing state-of-the-art model.
Keywords: web application firewall; anomaly detection; word2vec; BERT; dimension reduction; ensemble model; Computer Sciences
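The third stage of W2R combines several anomaly detectors on the reduced embeddings. The abstract does not say how the detectors' scores are merged, so the sketch below assumes one common scheme: min-max normalize each detector's scores, then average them. To stay self-contained, it uses two deliberately simple, hypothetical stand-in detectors (k-nearest-neighbour distance and distance to the training centroid) in place of OCSVM, isolation forest, and LOF. The embeddings are assumed to be already computed, for example by word2vec/BERT followed by PCA.

```python
import math

def knn_score(train, x, k=2):
    """Mean distance from x to its k nearest training points (higher = more anomalous)."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k

def centroid_score(train, x):
    """Distance from x to the centroid of the training points."""
    dim = len(train[0])
    centroid = tuple(sum(t[d] for t in train) / len(train) for d in range(dim))
    return math.dist(x, centroid)

def min_max(scores):
    """Rescale scores to [0, 1] so detectors with different ranges are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def ensemble_scores(train, queries, detectors):
    """Average the min-max-normalized scores of several detectors for each query."""
    per_detector = [min_max([det(train, q) for q in queries]) for det in detectors]
    return [sum(col) / len(col) for col in zip(*per_detector)]

# Hypothetical 2-D embeddings of normal requests (e.g., after PCA), plus two queries.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
queries = [(0.05, 0.05), (3.0, 3.0)]
print(ensemble_scores(normal, queries, [knn_score, centroid_score]))  # [0.0, 1.0]
```

Normalizing over the query batch keeps the example short. A deployed WAF would more likely fix the normalization constants from scores on held-out normal traffic, so that a single request can be scored on its own.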
6	Tracking time evolving data streams for short-term traffic forecasting
Abdullatif, Amr R.A.; Masulli, F.; Rovetta, S., 20 January 2020
Data stream mining has emerged during the last few years as an efficient approach to extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data from all connected road links are organized into chunks corresponding to an optimal time lag. The RLEM has two layers. In the first layer, the chunks are clustered using the Graded Possibilistic c-Means method. The second layer consists of an ensemble of forecasters, each trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, when a new chunk of traffic flow data is presented as input to the RLEM, its memberships to all clusters are evaluated; if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, yielding a traffic flow forecast for a short-term horizon. The RLEM is evaluated on a synthetic data set, on data from a traffic flow simulator, and on two real-world traffic flow data sets. The model gives accurate forecasts of traffic flow rates with outlier detection and adapts well to non-stationary traffic regimes. Given its outlier detection, accuracy, and robustness, the RLEM can be fruitfully integrated into traffic flow management systems.
Keywords: Traffic forecasting; Fuzzy clustering; Big data; Ensemble model; Evolving data streams
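The RLEM operational phase has three steps: evaluate a new chunk's memberships to all clusters, reject it if it looks like an outlier, and otherwise combine the cluster-specific forecasters. The sketch below rests on several assumptions not stated in the abstract. It uses the plain possibilistic membership exp(-d²/β) rather than the actual Graded Possibilistic c-Means update, it applies a fixed outlier threshold on the largest membership, and it combines the forecasters by a membership-weighted average. The centroids, widths, and toy forecasters are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Forecaster = Callable[[Sequence[float]], float]

def memberships(chunk, centroids, widths):
    """Possibilistic-style membership exp(-d^2 / beta) of a chunk to each cluster."""
    return [math.exp(-math.dist(chunk, c) ** 2 / beta) for c, beta in zip(centroids, widths)]

def rlem_forecast(chunk, centroids, widths, forecasters: List[Forecaster],
                  outlier_threshold: float = 0.05) -> Optional[float]:
    """Membership-weighted ensemble forecast, or None if the chunk looks like an outlier."""
    u = memberships(chunk, centroids, widths)
    if max(u) < outlier_threshold:
        return None  # no cluster claims the chunk: treat it as an outlier
    return sum(w * f(chunk) for w, f in zip(u, forecasters)) / sum(u)

# Hypothetical setup: two traffic regimes, each with its own (toy) forecaster.
centroids = [(10.0, 10.0, 10.0), (50.0, 50.0, 50.0)]
widths = [100.0, 100.0]
forecasters = [
    lambda c: c[-1],              # persistence forecaster for the low-flow cluster
    lambda c: 2 * c[-1] - c[-2],  # linear-trend forecaster for the high-flow cluster
]
print(rlem_forecast((10.0, 11.0, 12.0), centroids, widths, forecasters))  # close to 12
print(rlem_forecast((900.0, 900.0, 900.0), centroids, widths, forecasters))  # None
```

Possibilistic memberships are not forced to sum to 1 across clusters, so a chunk far from every cluster receives uniformly low memberships. This is what makes the outlier test meaningful; a probabilistic (fuzzy c-means) membership would always assign the chunk somewhere.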
7	Is this the real life, or is this just fantasy? Assessing species distribution model realism and applicability with virtual and empirical species
Bevan, Hannah R, 01 January 2024
Species distribution models (SDMs) can be important tools for proactive conservation management if they are realistic. Unfortunately, achieving and assessing SDM realism is challenging given the general limitations of scientific models and empirical species data. We addressed the issue of achieving realism with high model quality and reproducibility by reviewing 200 SDMs and cataloguing their methods for data availability, response and predictor variables, model fitting, and model performance. We addressed the issue of assessing SDM realism by comparing known and predicted distributions of habitat suitability with simulated data under various model fitting choices. Finally, we applied the resulting lessons to empirical ensemble SDMs for the exotic ball python (Python regius) and the invasive Argentine black and white tegu (Salvator merianae), as case studies for mitigation management practices in Florida, and compared the outcomes. Fundamental SDM standards were addressed inconsistently in the literature and lacked transparency and replicability, which decreases SDM quality and increases methodological confusion. We provided a new checklist of well-supported guidelines to promote greater methodological consistency (and thus quality and reproducibility) and realism. Model realism varied with algorithm choice but was consistent across sample sizes and species types. No algorithm was perfectly realistic, but eight consistently produced high rates of realism and performance (and realism and performance were not strongly correlated). Ensemble strategies were consistently more robust than individual algorithms, so we recommended a new ensemble based on those eight high-performing algorithms.
We applied this ensemble strategy to our empirical SDMs, along with other ensemble groupings from the literature (including the most popular individual algorithm), to inform novel SDMs. Ensemble SDMs consistently performed well with the empirical data and outperformed the individual algorithm. These results, with both simulated and empirical demonstrations, help inform general SDM method guidance for a variety of native and nonnative species and can improve SDM realism and applications in the future.
Keywords: Biomod2; ensemble model; habitat suitability; nonnative species; species distribution model; virtual species
8	Regression Model to Project and Mitigate Vehicular Emissions in Cochabamba, Bolivia
Wagner, Christopher, 28 August 2017
No description available.
Keywords: Engineering; Environmental Engineering; Mechanical Engineering; Random Forest Model; Vehicular Fleet; Cochabamba, Bolivia; Vehicle Emissions; Predictive Ensemble Model
9	Strategies for Combining Tree-Based Ensemble Models
Zhang, Yi, 01 January 2017
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models themselves. Base models are typically trained on different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in their prediction accuracy across different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal was to develop a strategy for combining ensemble classifiers that yields higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods (random forest, extremely randomized trees, and eXtreme gradient boosting) were used to generate a set of base models. The outputs of classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models and (2) combining the selected base models. The methods were evaluated on public-domain data sets that have been used extensively for benchmarking classification models. The research established that the best ensemble approach was to apply random forest as the final ensemble method, integrating the selected base models with the factor scores of a multiple correspondence analysis.
Keywords: ensemble models; model selection; multiple correspondence analysis; predictive models; random forest, extremely randomized tree and eXtreme gradient boosting model; tree based ensemble model; Computer Sciences
10	Utilizing Hybrid Ensemble Prediction Model In Order to Predict Energy Demand in Sweden : A Machine-Learning Approach
Su, Binxin, January 2022
Conventional machine learning (ML) models and algorithms are advancing at a fast pace. Most of this development is due to hybrid and ensemble techniques, which are powerful tools for complementing and strengthening the efficiency of algorithms. At the same time, the development of and demand for renewable energy sources are increasing rapidly, driven by political and environmental concerns, since failure to act quickly enough could lead to an existential crisis. The transition from non-renewable to renewable energy sources raises new challenges because of the intermittent and variable nature of renewables, and accurate forecasting techniques play a crucial role in addressing these challenges. In this thesis, I present a hybrid ensemble machine learning model based on stacking that uses a Gradient Boosted Tree as a meta-learner to predict energy demand for the energy area SE3 in Sweden. The hybrid model is built from three component models, XGBoost, CatBoost, and Random Forest (RF), and uses only features extracted from the time-series data. To train and test the proposed hybrid model, hourly demand load data measuring energy consumption in SE3 from 2016 to 2021 were gathered from Svenska Kraftnät. The forecasting results are measured with a regression score (R-squared, which measures explained variance) and an accuracy score (measured in terms of the Mean Absolute Percentage Error). In an experimental setting, the hybrid model reaches an R-squared score of 0.9785 and an accuracy of 97.85%.
When used for day-ahead prediction on unseen data outside the scope of the training dataset, the hybrid model reaches an R-squared score of 0.9764 and an accuracy of 93.43%. The thesis concludes that the proposed methodology can accurately predict variations in energy demand and can serve as a framework for decision makers who need accurate forecasts of energy demand in Sweden.
Keywords: Energy Demand Prediction; Machine Learning; Hybrid Model; Ensemble Model; Random Forest; XGBoost; CatBoost; Computer and Information Sciences
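This abstract evaluates forecasts with R-squared and an "accuracy" measured via the mean absolute percentage error (MAPE), but it does not give the accuracy formula. The sketch below assumes the common convention accuracy (%) = 100 − MAPE. The load values and function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape_accuracy(y_true, y_pred):
    """Assumed convention (not confirmed by the abstract): accuracy (%) = 100 - MAPE."""
    return 100.0 - mape(y_true, y_pred)

# Hypothetical hourly loads and day-ahead forecasts (same units).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(round(mape(actual, forecast), 6))           # 5.0
print(round(mape_accuracy(actual, forecast), 6))  # 95.0
print(round(r_squared(actual, forecast), 4))      # 0.9957
```

The two scores measure different things: R-squared is relative to the variance of the series, while MAPE is relative to each actual value. This is why a model can keep R-squared near 0.98 while its MAPE-based accuracy drops on unseen day-ahead data.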
