61

Spatial Ensemble Distillation Learning Based Real-Time Crash Prediction and Management Framework

Islam, Md Rakibul 01 January 2023
Real-time crash prediction is a complex task, since no existing framework predicts crash likelihood, types, and severity together along with a real-time traffic management strategy. Developing such a framework presents various challenges, including non-independent and identically distributed data, imbalanced data, large model size, high computational cost, missing data, the sensitivity vs. false alarm rate (FAR) trade-off, estimation of traffic restoration time after crash occurrence, and real-world deployment strategy. A novel spatial ensemble distillation learning modeling technique is proposed to address these challenges. First, large-scale real-time data were used to develop a crash likelihood prediction model. Second, the proposed crash likelihood model's viability in predicting specific crash types was tested for real-world applications. Third, the framework was extended to predict crash severity in real time, categorizing crashes into four levels. The results demonstrated strong performance, with sensitivities of 90.35%, 94.80%, and 84.23% for all crashes, rear-end crashes, and sideswipe/angle crashes, and 83.32%, 81.25%, 83.08%, and 84.59% for fatal, severe, minor injury, and PDO crashes, respectively, while maintaining very low FARs. The methodology also reduces model size, lowers computational cost, improves sensitivity, and decreases FAR. These results can be used by traffic management centers to take measures to prevent crashes in real time through active traffic management strategies. The framework was further extended for efficient traffic management after any crash that occurs despite these strategies. In particular, it was extended to predict the traffic state after a crash, to predict the traffic restoration time based on the estimated post-crash traffic state, and to apply a three-step validation technique to evaluate the performance of the developed approach. Finally, real-world deployment strategies of the proposed methodologies for real-time crash prediction, along with crash types and severities, and for real-time post-crash management are discussed. Overall, the methodologies presented in this dissertation offer multifaceted novel contributions and have excellent potential to reduce fatalities and injuries.
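As a rough illustration of the ensemble-distillation idea described above (not the dissertation's actual architecture, features, or data), the sketch below trains one teacher classifier per spatial region and distills their averaged crash probabilities into a single compact student; the region grouping, model choices, and hyperparameters are assumptions made only for illustration.

```python
# Minimal sketch, assuming tabular numeric features X, binary crash labels y,
# and a hypothetical array of spatial region ids. Not the dissertation's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor

def train_spatial_teachers(X, y, regions):
    """Train one teacher classifier per spatial region."""
    teachers = {}
    for r in np.unique(regions):
        mask = regions == r
        clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
        teachers[r] = clf.fit(X[mask], y[mask])   # assumes both classes occur in each region
    return teachers

def distill_student(X, teachers):
    """Fit a small student that mimics the teachers' averaged crash probabilities."""
    soft_labels = np.mean([t.predict_proba(X)[:, 1] for t in teachers.values()], axis=0)
    student = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
    return student.fit(X, soft_labels)            # distillation as regression on soft labels

# Usage sketch:
# teachers = train_spatial_teachers(X_train, y_train, region_ids)
# student = distill_student(X_train, teachers)
# crash_risk = np.clip(student.predict(X_live), 0.0, 1.0)   # lightweight real-time score
```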
62

Transformer-based Source Code Description Generation : An ensemble learning-based approach / Transformatorbaserad Generering av Källkodsbeskrivning : En ensemblemodell tillvägagångssätt

Antonios, Mantzaris January 2022
Code comprehension can benefit significantly from high-level source code summaries. For most developers, understanding another developer's code, or code they themselves wrote in the past, is a time-consuming and frustrating task, yet it is necessary in software maintenance and whenever several people work on the same project. A fast, reliable and informative source code description generator can automate this procedure, which developers often avoid. The rise of Transformers has drawn attention to them, leading to the development of various Transformer-based models that tackle source code summarization from different perspectives. Most of these models, however, are treated as competitors, even though their complementarity could prove beneficial. To this end, an ensemble learning-based approach is followed to explore the feasibility and effectiveness of combining more than one powerful Transformer-based model. The base models used are PLBart and GraphCodeBERT, two models with different focuses, and the ensemble technique is stacking. The results show that such a model can improve the performance and informativeness of the individual models. However, it requires changes to the configuration of the respective models, which might harm them, as well as further fine-tuning at the aggregation phase to find the most suitable combination of base-model weights and next-token probabilities for the ensemble at hand. The results also revealed the need for human evaluation, since metrics like BiLingual Evaluation Understudy (BLEU) are not always representative of the quality of the produced summary. Even though the outcome is promising, further work should follow, driven by this approach and by the limitations left unresolved here, toward the development of a potential state-of-the-art (SOTA) model.
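To make the aggregation step concrete, here is a minimal, hypothetical sketch of mixing the next-token distributions of two base models during greedy decoding. The callables `step_fn_a` and `step_fn_b` stand in for already loaded and vocabulary-aligned PLBart and GraphCodeBERT decoding steps; this illustrates the general idea, not the thesis's implementation.

```python
import numpy as np

def ensemble_next_token(probs_a, probs_b, w_a=0.5):
    """Mix two next-token distributions over a shared vocabulary and return the argmax id."""
    assert probs_a.shape == probs_b.shape, "base models must be aligned to a shared vocabulary"
    mixed = w_a * probs_a + (1.0 - w_a) * probs_b
    mixed = mixed / mixed.sum()              # renormalize after mixing
    return int(np.argmax(mixed))

def greedy_ensemble_decode(step_fn_a, step_fn_b, bos_id, eos_id, w_a=0.5, max_len=64):
    """Greedy decoding in which each step averages the two base models' distributions.
    step_fn_a / step_fn_b take the generated prefix (a list of token ids) and return
    a 1-D numpy probability vector over the shared vocabulary."""
    prefix = [bos_id]
    for _ in range(max_len):
        tok = ensemble_next_token(step_fn_a(prefix), step_fn_b(prefix), w_a)
        prefix.append(tok)
        if tok == eos_id:
            break
    return prefix
```

In practice the weight `w_a` (and any richer per-token weighting) would be selected on validation data, which is where the stacking/aggregation phase mentioned above comes in.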
63

Automatic Change Detection in Visual Scenes

Brolin, Morgan January 2021
This thesis proposes a Visual Scene Change Detector (VSCD) system, which involves four parts: image retrieval, image registration, image change detection and panorama creation. Two prestudies are conducted in order to select an image retrieval method and an image change detection method. The two selected methods are then combined with a proposed image registration method and a proposed panorama creation method to form the proposed VSCD. The image retrieval prestudy compares a SIFT-related method with a bag-of-words-related method and finds the SIFT-related method to be superior. The image change detection prestudy evaluates eight different image change detection methods. Results from this prestudy show that the methods' performance depends on the image category, and that an ensemble method is the least dependent on the category of images. The ensemble method is found to be the best performing method, followed by a range filter method and then a Convolutional Neural Network (CNN) method. Using combinations of the two image retrieval methods and the eight image change detection methods, sixteen different VSCDs are formed and tested. The final result shows that the VSCD comprised of the best methods from the prestudies is the best performing one.
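For readers unfamiliar with the building blocks, the sketch below shows two stages of such a pipeline with OpenCV: SIFT matching with a RANSAC homography for registration, and a simple difference-based change map. It illustrates the generic techniques named in the abstract, not the thesis's evaluated retrieval, ensemble change-detection, or panorama methods; the ratio test and threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

def register(reference_gray, query_gray):
    """Warp query_gray onto reference_gray using SIFT matches and a RANSAC homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(reference_gray, None)
    kp2, des2 = sift.detectAndCompute(query_gray, None)
    matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio test
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference_gray.shape[:2]
    return cv2.warpPerspective(query_gray, H, (w, h))

def change_map(reference_gray, registered_gray, thresh=40):
    """Binary change mask from absolute differences of the aligned images."""
    diff = cv2.absdiff(reference_gray, registered_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask

# Usage sketch (grayscale uint8 images):
# ref = cv2.imread("scene_before.png", cv2.IMREAD_GRAYSCALE)
# qry = cv2.imread("scene_after.png", cv2.IMREAD_GRAYSCALE)
# mask = change_map(ref, register(ref, qry))
```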
64

Cooperative coevolutionary mixture of experts : a neuro ensemble approach for automatic decomposition of classification problems

Nguyen, Minh Ha, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2006
Artificial neural networks have been widely used for machine learning and optimization. A neuro ensemble is a collection of neural networks that work cooperatively on a problem. In the literature, it has been shown that by combining several neural networks, the generalization of the overall system can be enhanced beyond the separate generalization abilities of the individuals. Evolutionary computation can be used to search for a suitable architecture and weights for neural networks. When evolutionary computation is used to evolve a neuro ensemble, the result is usually known as an evolutionary neuro ensemble. In most real-world problems, we either know little about the problem or it is too complex to have a clear vision of how to decompose it by hand. Thus, it is usually desirable to have a method to automatically decompose a complex problem into a set of overlapping or non-overlapping sub-problems and assign one or more specialists (i.e. experts, learning machines) to each of these sub-problems. An important feature of neuro ensembles is automatic problem decomposition. Some neuro ensemble methods are able to generate networks where each individual network is specialized on a unique sub-task, such as mapping a subspace of the feature space. In real-world problems, this is usually an important feature for a number of reasons, including: (1) it provides an understanding of the decomposition nature of a problem; (2) if a problem changes, one can replace the network associated with the subspace where the change occurs without affecting the overall ensemble; (3) if one network fails, the rest of the ensemble can still function in their subspaces; (4) if one learns the structure of one problem, it can potentially be transferred to other similar problems. In this thesis, I focus on classification problems and present a systematic study of a novel evolutionary neuro ensemble approach which I call cooperative coevolutionary mixture of experts (CCME). Cooperative coevolution (CC) is a branch of evolutionary computation where individuals in different populations cooperate to solve a problem and their fitness is calculated based on their reciprocal interactions. The mixture of experts model (ME) is a neuro ensemble approach that can generate networks specialized on different subspaces of the feature space. By combining CC and ME, I obtain a powerful framework that is able to automatically form the experts and train each of them. I show that the CCME method produces competitive results in terms of generalization ability without increasing the computational cost when compared to traditional training approaches. I also propose two different mechanisms for visualizing the resultant decomposition in high-dimensional feature spaces. The first is a simple one in which data are grouped based on the specialization of each expert and a color map of the data records is visualized. The second relies on principal component analysis to project the feature space onto lower dimensions, whereby the decision boundaries generated by each expert are visualized through convex approximations. I also investigate the regularization effect of learning by forgetting on the proposed CCME. I show that learning by forgetting helps CCME generate neuro ensembles of low structural complexity while maintaining their generalization abilities.
Overall, the thesis presents an evolutionary neuro ensemble method whereby (1) the generated ensemble generalizes well; (2) it is able to automatically decompose the classification problem; and (3) it generates networks with small architectures.
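The combination rule at the heart of the mixture-of-experts part can be sketched in a few lines: a softmax gate assigns responsibilities to the experts and their class-probability outputs are averaged accordingly. The cooperative coevolutionary training of experts and gate described in the thesis is not shown here; the linear gate and callable experts below are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_predict(x, experts, gate_w, gate_b):
    """x: (d,) input; experts: list of callables, each returning a class-probability vector;
    gate_w: (n_experts, d) and gate_b: (n_experts,) parameters of a linear softmax gate."""
    gates = softmax(gate_w @ x + gate_b)                 # responsibility of each expert for x
    probs = np.stack([expert(x) for expert in experts])  # (n_experts, n_classes)
    return gates @ probs                                 # gated average over experts
```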
65

Forêts uniformément aléatoires et détection des irrégularités aux cotisations sociales / Detection of irregularities in social contributions using random uniform forests

Ciss, Saïp 20 June 2014
In this thesis we present an application of statistical learning to the detection of irregularities in social security contributions. Statistical learning aims to model problems in which there is a relationship, generally non-deterministic, between variables and the phenomenon one seeks to evaluate. An essential aspect of this modeling is the prediction of unknown occurrences of the phenomenon from data already observed. In the case of social contributions, the problem is represented by postulating a relationship between companies' contribution declarations and the audits carried out by the collection agencies. Audit inspectors certify a number of declarations as correct or incorrect and, where appropriate, notify the companies concerned of an adjustment. The learning algorithm "learns", through a model, the relationship between declarations and audit outcomes, and then produces an evaluation of all declarations not yet audited. The first part of the evaluation assigns a regular or irregular status to each declaration, with a certain probability. The second estimates the expected adjustment amounts for each declaration. Within the URSSAF (Union de Recouvrement des cotisations de Sécurité sociale et d'Allocations Familiales) of Île-de-France, and under a CIFRE contract (Conventions Industrielles de Formation par la Recherche), we developed a model for detecting irregularities in social contributions, which we present and detail throughout the thesis. The algorithm runs under the free software R. It is fully operational and was tested in a real-world setting during 2012. To guarantee its properties and results, probabilistic and statistical tools are necessary, and we discuss the theoretical aspects that accompanied its design. In the first part of the thesis, we give a general presentation of the problem of detecting irregularities in social contributions. In the second, we address detection specifically, through the data used to define and evaluate irregularities; in particular, the available data alone are sufficient to model detection. We also present a new random forest algorithm, named "random uniform forest", which constitutes the detection engine. In the third part, we detail the theoretical properties of random uniform forests. In the fourth, we present an economic point of view for cases where irregularities in social contributions are intentional, in the context of the fight against undeclared work; in particular, we examine the link between the financial situation of companies and social contribution fraud. The last part is devoted to the experimental and real-world results of the model, which we discuss. Each chapter of the thesis can be read independently of the others, and some notions are repeated to make the content easier to explore. / We present in this thesis an application of machine learning to irregularities in social contributions. These are, in France, all contributions due by employees and companies to the "Sécurité sociale", the French system of social welfare (alternative income in case of unemployment, health insurance, pensions, ...).
Social contributions are paid by companies to the URSSAF network, which is in charge of recovering them. Our main goal was to build a model able to detect irregularities with a low false-positive rate. We first begin the thesis by presenting the URSSAF, how irregularities can appear, how we can handle them, and what data we can use. We then present a new machine learning algorithm developed for this task, "random uniform forests" (and its R package "randomUniformForest"), a variant of Breiman's "random Forests" (tm) that shares the same principles but is built in a different way. We present the theoretical background of the model and provide several examples. We then use it to show, when irregularities amount to fraud, how the financial situation of firms can affect their propensity to commit fraud. In the last chapter, we provide a full evaluation of the declarations of social contributions of all firms in Île-de-France for the year 2013, using the model to predict whether or not declarations present irregularities.
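The defining trait of random uniform forests is that split points are drawn uniformly at random over a feature's range rather than optimized. The thesis's implementation is the R package randomUniformForest; as a rough Python analogue (an assumption, not that package), scikit-learn's ExtraTreesClassifier, which also draws random cut points, can be used to sketch how declarations might be scored for audit. The data shown are hypothetical.

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

def fit_irregularity_model(X, y):
    """X: numeric declaration features; y: 1 = irregular declaration, 0 = regular."""
    model = ExtraTreesClassifier(
        n_estimators=500,
        class_weight="balanced",   # audited irregularities are typically the minority class
        random_state=0,
        n_jobs=-1,
    )
    print("CV AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
    return model.fit(X, y)

# Ranking declarations for audit: higher probability = more likely irregular.
# scores = fit_irregularity_model(X_train, y_train).predict_proba(X_new)[:, 1]
```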
66

Horseshoe RuleFit : Learning Rule Ensembles via Bayesian Regularization

Nalenz, Malte January 2016
This work proposes Hs-RuleFit, a learning method for regression and classification that combines rule ensemble learning based on the RuleFit algorithm with Bayesian regularization through the horseshoe prior. To this end, the theoretical properties and potential problems of this combination are studied. A second step is the implementation, which utilizes recent sampling schemes to make Hs-RuleFit computationally feasible. Additionally, changes to the RuleFit algorithm are proposed, such as decision rule post-processing and the use of decision rules generated via Random Forest. Hs-RuleFit addresses the problem of finding highly accurate and yet interpretable models. The method proves capable of finding compact sets of informative decision rules that give good insight into the data. Through the careful choice of prior distributions, the horseshoe prior proves superior to the Lasso in this context. In an empirical evaluation on 16 real data sets, Hs-RuleFit shows excellent performance in regression and outperforms the popular methods Random Forest, BART and RuleFit in terms of prediction error. Its interpretability is demonstrated on selected data sets. This makes Hs-RuleFit a good choice for scientific domains in which interpretability is desired. Problems are found in classification, regarding the use of the horseshoe prior and rule ensemble learning in general. A simulation study is performed to isolate the problems, and potential solutions are discussed. Arguments are presented that the horseshoe prior could be a good choice in other machine learning areas, such as artificial neural networks and support vector machines.
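The RuleFit backbone that Hs-RuleFit builds on can be sketched as: grow shallow trees, turn their paths into binary rule features, then fit a sparse linear model on those features. The sketch below uses the Lasso for the sparse step purely for brevity; the thesis's contribution is to replace that step with a horseshoe-prior Bayesian fit (e.g. via MCMC sampling), which is not implemented here, and the tree settings are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

def extract_rules(tree, feature_names):
    """Return each root-to-leaf path of a fitted sklearn tree as a list of (feature, op, threshold)."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:          # leaf node
            if conditions:
                rules.append(list(conditions))
            return
        f, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conditions + [(f, "<=", thr)])
        recurse(t.children_right[node], conditions + [(f, ">", thr)])

    recurse(0, [])
    return rules

def rule_matrix(X_df, rules):
    """Evaluate every rule on a pandas DataFrame -> binary rule-feature matrix."""
    cols = []
    for rule in rules:
        sat = np.ones(len(X_df), dtype=bool)
        for f, op, thr in rule:
            sat &= (X_df[f].values <= thr) if op == "<=" else (X_df[f].values > thr)
        cols.append(sat.astype(float))
    return np.column_stack(cols)

def fit_rule_ensemble(X_df, y, n_trees=50, max_depth=3):
    forest = RandomForestRegressor(n_estimators=n_trees, max_depth=max_depth, random_state=0)
    forest.fit(X_df.values, y)
    rules = [r for est in forest.estimators_ for r in extract_rules(est, list(X_df.columns))]
    Z = rule_matrix(X_df, rules)
    linear = LassoCV(cv=5).fit(Z, y)             # stand-in for the horseshoe-regularized fit
    kept = [(rules[i], c) for i, c in enumerate(linear.coef_) if abs(c) > 1e-8]
    return linear, kept                          # `kept` is the interpretable rule set
```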
67

Creation of a vocal emotional profile (VEP) and measurement tools

Aghajani, Mahsa 10 1900
Speech is the dominant way of communication among humans. Voice signals carry both information and the emotion of the speaker, and the combination of the two helps the receiver better understand what the speaker means and decreases the probability of misunderstandings. Robots and computers can also benefit from this mode of communication: the capability to recognize emotions in speakers' voices helps computers serve human needs better, and this improvement in human-computer communication leads to increased user satisfaction. In this study we propose several approaches to detect emotions from speech computationally. We investigate how different machine learning and deep learning techniques and classifiers perform in detecting emotions from speech. The classifiers are trained with commonly used and well-known audio emotion datasets together with a custom dataset. This custom dataset was recorded from non-actor, non-expert people while trying to trigger the related emotions in them. The reason for including this dataset is to make the model proficient at recognizing emotions in people who are not as skilled as actors at reflecting their emotions in their voices.
The results from several machine learning and deep learning classifiers recognizing seven emotions (anger, happiness, sadness, neutrality, surprise, fear and disgust) are reported and analyzed. Models were evaluated with and without the custom dataset to show the effect of employing an imperfect dataset. In this study, leveraging deep learning techniques and ensemble learning methods surpassed the other techniques. Our best classifiers obtained accuracies of 90.41% and 91.96% when trained as recurrent neural networks and majority-voting ensemble classifiers, respectively.
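As a small illustration of the classical end of the spectrum evaluated above, the sketch below extracts mean MFCC features per utterance with librosa and trains a hard majority-voting ensemble of conventional classifiers. The recurrent-network models, the custom dataset, and the thesis's exact feature set are not reproduced; the file paths, label encoding, and hyperparameters are assumptions.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

EMOTIONS = ["anger", "happiness", "sadness", "neutral", "surprise", "fear", "disgust"]

def utterance_features(path, sr=16000, n_mfcc=40):
    """Load one audio file and summarize it as the mean MFCC vector."""
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # fixed-length vector per utterance

def build_voting_ensemble():
    """Majority vote over three heterogeneous classifiers."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
            ("svm", SVC(C=10, gamma="scale", random_state=0)),
            ("lr", LogisticRegression(max_iter=2000)),
        ],
        voting="hard",
    )

# Usage sketch:
# X = np.vstack([utterance_features(p) for p in wav_paths])
# y = np.array(labels)                 # integer indices into EMOTIONS
# clf = build_voting_ensemble().fit(X, y)
```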
68

Complex Vehicle Modeling: A Data Driven Approach

Schoen, Alexander C. 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / This thesis proposes an artificial neural network (NN) model to predict fuel consumption in heavy vehicles. The model uses predictors derived from vehicle speed, mass, and road grade. These variables are readily available from telematics devices that are becoming an integral part of connected vehicles. The model predictors are aggregated over a fixed distance traveled (i.e., a window) instead of a fixed time interval. It was found that 1 km windows are most appropriate for the vocations studied in this thesis. Two vocations were studied: refuse and delivery trucks. The proposed NN model was compared to two traditional models. The first is a parametric model similar to one found in the literature. The second is a linear regression model that uses the same features developed for the NN model. The confidence levels of the models built with these three methods were calculated in order to evaluate the models' variances. It was found that the NN models produce lower point-wise error; however, their stability is not as high as that of the regression models. In order to improve the variance of the NN models, an ensemble based on the average of 5-fold models was created. Finally, the confidence level of each model is analyzed in order to understand how much error is expected from each model. The mean training error was used to correct the ensemble predictions of the five K-fold models. The ensemble K-fold model predictions are more reliable than those of the single NN and have a narrower confidence interval than both the parametric and regression models.
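The 5-fold ensemble with mean-training-error correction lends itself to a compact sketch: train one network per fold, average their predictions, and shift the average by the mean training error. The MLP size, window features (assumed to be precomputed speed/mass/grade aggregates per 1 km window), and exact form of the correction below are illustrative assumptions rather than the thesis's setup.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def fit_kfold_ensemble(X, y, n_splits=5):
    """X, y: numpy arrays of window features and fuel consumption per window."""
    models, train_errors = [], []
    for train_idx, _ in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        m = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
        m.fit(X[train_idx], y[train_idx])
        models.append(m)
        train_errors.append(np.mean(m.predict(X[train_idx]) - y[train_idx]))
    bias = float(np.mean(train_errors))            # mean signed training error across folds
    return models, bias

def predict_fuel(models, bias, X_new):
    """Average the fold models and remove the estimated training bias."""
    preds = np.mean([m.predict(X_new) for m in models], axis=0)
    return preds - bias

# models, bias = fit_kfold_ensemble(X_train, y_train)
# fuel_per_window = predict_fuel(models, bias, X_new)
```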
69

Algorithmic Methods for Multi-Omics Biomarker Discovery

Li, Yichao January 2018
No description available.
70

Ensemble Classifier Design and Performance Evaluation for Intrusion Detection Using UNSW-NB15 Dataset

Zoghi, Zeinab 30 November 2020
No description available.
