  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Search for the Higgs boson in the ttH(H -> bb) channel and the identification of jets containing two B hadrons with the ATLAS experiment

Ticse Torres, Royer Edson 29 September 2016 (has links)
In July 2012, CERN announced the discovery of the Higgs boson, the last missing piece of the Standard Model (SM). The Higgs boson has since been observed in several channels, and precise measurement of its properties is now essential for probing possible deviations from the SM. This thesis presents a search for the Higgs boson produced in association with a top-quark pair and decaying to a b-quark pair, the ttH(H → bb) channel, using proton-proton collision data at √s = 13 TeV collected with the ATLAS detector in 2015 and 2016. The document details in particular the full reconstruction of the ttH(H → bb) system and the discrimination between the signal and the main background, tt + jets. Within the ttbb subset of this background, recent studies show a large fraction of events with jets containing two b-hadrons. A new algorithm has been developed to discriminate such jets from jets containing a single b-hadron; its description is presented in this thesis.
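As an illustration of the jet-classification task described above (not the thesis's actual algorithm), the following sketch trains a gradient-boosted classifier to separate jets containing two B hadrons from single-b-hadron jets. The two features and their synthetic distributions are assumptions chosen only to make the example runnable.

```python
# Illustrative only (not the thesis's algorithm): a classifier separating
# jets with two B hadrons from single-b-hadron jets. Both features and
# their distributions are synthetic assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
# Hypothetical per-jet features: double-b jets are given more tracks and
# a heavier secondary-vertex mass on average.
single_b = np.column_stack([rng.poisson(5, n), rng.gamma(2.0, 1.0, n)])
double_b = np.column_stack([rng.poisson(8, n), rng.gamma(3.0, 1.2, n)])
X = np.vstack([single_b, double_b]).astype(float)
y = np.concatenate([np.zeros(n), np.ones(n)])  # 1 = jet with two B hadrons

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```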
62

Improving XRD Analysis with Machine Learning

Drapeau, Rachel E. 14 August 2023 (has links) (PDF)
X-ray diffraction (XRD) analysis is an inexpensive method to quantify the relative proportions of mineral phases in a rock or soil sample. However, the analytical software available for XRD requires extensive user input to choose the phases to include in the analysis. Consequently, accuracy depends greatly on the experience of the analyst, especially as the number of phases in a sample increases (Raven & Self, 2017; Omotoso, 2006). The purpose of this project is to test whether incorporating machine learning methods into XRD software can improve the accuracy of analyses by assisting in the phase-picking process. To provide a sample of XRD patterns and known compositions large enough to train the machine learning models, I created a dataset of 1.5 million calculated XRD patterns of realistic mineral mixtures. These synthetic patterns were calculated using crystal structure files from the American Mineralogist Crystal Structure Database (AMCSD) together with mineral occurrence data from the Mineral Evolution Database (MED), to mimic the geologic knowledge used by expert analysts. Using this dataset, I trained and refined a variety of machine learning models to determine which is most accurate in identifying the correct mineral phases.
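The dataset construction can be pictured as mixing pure-phase patterns. This is a minimal sketch under strong assumptions: the pure-phase patterns are random Gaussian-peak stand-ins rather than patterns computed from AMCSD crystal structures, and mixtures are simple weighted sums.

```python
# Illustrative sketch: synthetic mixture patterns as weighted sums of
# pure-phase XRD patterns. Peak positions and intensities are random
# stand-ins, not real crystal structures.
import numpy as np

rng = np.random.default_rng(1)
two_theta = np.linspace(5, 70, 3000)

def pure_phase_pattern(n_peaks=8, width=0.15):
    """One synthetic phase: a few Gaussian peaks at random 2-theta positions."""
    pos = rng.uniform(10, 65, n_peaks)
    amp = rng.uniform(0.1, 1.0, n_peaks)
    return sum(a * np.exp(-0.5 * ((two_theta - p) / width) ** 2)
               for a, p in zip(amp, pos))

phases = [pure_phase_pattern() for _ in range(5)]

def synthetic_mixture(k=3):
    """Mix k distinct phases with Dirichlet weights; add measurement noise."""
    idx = rng.choice(len(phases), size=k, replace=False)
    w = rng.dirichlet(np.ones(k))
    pattern = sum(wi * phases[i] for wi, i in zip(w, idx))
    return pattern + rng.normal(0, 0.01, two_theta.size), dict(zip(idx.tolist(), w))

pattern, composition = synthetic_mixture()
print(composition)  # phase index -> weight fraction (the training label)
```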
63

The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks

Julock, Gregory Alan 01 January 2013 (has links)
Buffer overflows are a common type of network intrusion attack that continues to plague the networked community. Unfortunately, this type of attack is not well detected with current data mining algorithms. This research investigated the use of Random Forests, an ensemble technique that builds multiple decision trees and combines their votes to classify each instance, and compared its effectiveness in detecting buffer overflows with that of other data mining methods such as CART and Naïve Bayes. Random Forests was used for variable reduction, cost-sensitive classification was applied, and each method's detection performance was compared and reported along with its receiver operating characteristics. The experiment showed that Random Forests outperformed CART and Naïve Bayes in classification performance. Using a technique to identify the most important buffer-overflow variables, Random Forests was also able to improve upon its own classification performance.
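A hedged sketch of the pipeline shape the abstract describes, on synthetic data rather than the original network traffic: a cost-sensitive Random Forest, variable reduction via impurity importances, and ROC-based evaluation. The scikit-learn names are an assumption; the study's exact tooling is not specified.

```python
# Sketch: cost-sensitive Random Forest with variable reduction, on a
# synthetic imbalanced dataset standing in for network traffic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=4000, n_features=40, n_informative=8,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" penalizes misclassifying the rare attack class.
rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)

# Variable reduction: keep only the most important features, then refit.
top = np.argsort(rf.feature_importances_)[::-1][:10]
rf_small = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                  random_state=0).fit(X_tr[:, top], y_tr)

print("full AUC:  ", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
print("top-10 AUC:", roc_auc_score(y_te, rf_small.predict_proba(X_te[:, top])[:, 1]))
```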
64

Decision forests for computer Go feature learning

Van Niekerk, Francois April 2014 (has links)
Thesis (MSc)--Stellenbosch University, 2014. / In computer Go, moves are typically selected with the aid of a tree search algorithm. Monte-Carlo tree search (MCTS) is currently the dominant algorithm in computer Go, and the inclusion of domain knowledge has been shown to vastly improve the strength of MCTS engines. A successful approach to representing domain knowledge in computer Go is the use of appropriately weighted tactical features and pattern features, comprising a number of hand-crafted heuristics and a collection of patterns respectively. However, tactical features are hand-crafted specifically for Go, and pattern features are Go-specific, so it is unclear how either can easily be transferred to other domains. This work therefore proposes a new approach to representing domain knowledge: decision tree features. These features evaluate a state-action pair by descending a decision tree, with queries recursively partitioning the state-action pair input space, and returning a weight corresponding to the partition element represented by the resultant leaf node. Decision tree features are applied here to computer Go, to determine their feasibility in comparison with the state-of-the-art use of tactical and pattern features. In this application, each query along the decision tree descent path refines information about the board position surrounding a candidate move. The results show that a feature instance with decision tree features is a feasible alternative to the state-of-the-art use of tactical and pattern features in computer Go, in terms of both move prediction and playing strength, even though computer Go is a relatively well-developed research area. A move prediction rate of 35.9% was achieved with tactical and decision tree features, and they showed performance comparable to the state of the art when integrated into an MCTS engine with progressive widening. We conclude that decision tree features show potential as a method for automatically extracting domain knowledge in new domains. Such features can be used to evaluate state-action pairs for guiding search-based techniques, such as MCTS, or for action-prediction tasks.
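A minimal sketch of the decision tree feature idea as described above: internal nodes query a state-action pair and a leaf returns a weight. The specific queries (a liberty count, distance to the board edge) are illustrative assumptions, not the thesis's query set.

```python
# Sketch: a decision tree feature evaluates a (state, action) pair by
# descending queries until a leaf, whose learned weight is returned.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    query: Optional[Callable] = None   # predicate on (state, action); None at a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    weight: float = 0.0                # used only at leaves

def evaluate(node: Node, state, action) -> float:
    """Descend the tree with the state-action pair; return the leaf weight."""
    while node.query is not None:
        node = node.yes if node.query(state, action) else node.no
    return node.weight

# Toy usage: 'state' is a dict of board info, 'action' an (x, y) move on 19x19.
tree = Node(
    query=lambda s, a: s["liberties"].get(a, 0) <= 1,   # fills a low-liberty point?
    yes=Node(weight=0.1),
    no=Node(
        query=lambda s, a: min(a[0], a[1], 18 - a[0], 18 - a[1]) <= 2,  # near edge?
        yes=Node(weight=0.7),
        no=Node(weight=1.4),
    ),
)
state = {"liberties": {(3, 3): 4}}
print(evaluate(tree, state, (3, 3)))  # -> 1.4
```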
65

Oblique decision trees in transformed spaces.

Wickramarachchi, Darshana Chitraka January 2015 (has links)
Decision trees (DTs) play a vital role in statistical modelling. The simplicity and interpretability of the solution structure have made the method popular in a wide range of disciplines. In data classification problems, DTs recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. Axis-parallel splits, the simplest form of split, partition the feature space parallel to the feature axes; however, for some problem domains DTs with axis-parallel splits can produce complicated boundary structures. As an alternative, oblique splits are used to partition the feature space, potentially simplifying the boundary structure. Various approaches have been explored to find optimal oblique splits. One approach is based on optimisation techniques and is considered the benchmark; its major limitation is that the tree induction algorithm is computationally expensive. On the other hand, split-finding approaches based on heuristic arguments have gained popularity and have improved on benchmark methods. This thesis proposes a methodology to induce oblique decision trees in transformed spaces based on a heuristic argument. As the first goal of the thesis, a new oblique decision tree algorithm, called HHCART (HouseHolder Classification and Regression Tree), is proposed. The algorithm uses a series of Householder matrices to reflect the training data at each non-terminal node during tree construction. Householder matrices are constructed using the eigenvectors of each class's covariance matrix. Axis-parallel splits in the reflected (or transformed) spaces provide an efficient way of finding oblique splits in the original space. Experimental results show that the accuracy and size of HHCART trees are comparable with some benchmark methods in the literature. The appealing features of HHCART are that it can handle both qualitative and quantitative features in the same oblique split, and that it is conceptually simple and computationally efficient. Data mining applications often come with massive example sets, and inducing oblique DTs for such sets can consume considerable time. HHCART is a serial, memory-resident algorithm, which may be ineffective for massive example sets. As the second goal of the thesis, parallel-computing and disk-resident versions of the HHCART algorithm are presented, so that HHCART can be used irrespective of problem size. HHCART is a flexible algorithm, and the eigenvectors defining the Householder matrices can be replaced by other vectors deemed effective for oblique split finding. The third endeavour of this thesis explores this aspect. First, the normal vector of the angular bisector, introduced in the Geometric Decision Tree (GDT) algorithm, is used to construct the Householder reflection matrix; the proposed method produces better results than GDT for some problem domains. In the second case, Class Representative Vectors are introduced and used to construct the Householder reflection matrices; the resulting oblique trees produce classification results competitive with those of some benchmark decision trees. DTs are constructed using two approaches, top-down and bottom-up; HHCART is a top-down tree, the most common approach.

As the fourth idea of the thesis, the concept of HHCART is used to induce a new DT, HHBUT, using the bottom-up approach. The bottom-up approach performs cluster analysis prior to tree building to identify the terminal nodes. Using the Bayesian Information Criterion (BIC) to determine the number of clusters leads to accurate and compact trees when compared with cross-validation-based bottom-up trees. We suggest that HHBUT is a good alternative to existing bottom-up trees, especially when the number of examples is much higher than the number of features.
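The core HHCART step can be sketched as follows, with assumptions flagged: synthetic data, only the dominant eigenvector of one class's covariance matrix, and a plain Gini-based axis-parallel split search standing in for the thesis's full induction procedure.

```python
# Sketch of one HHCART node: reflect the data with a Householder matrix
# built from a class covariance eigenvector, then search axis-parallel
# splits in the reflected space (an oblique split in the original space).
import numpy as np
from sklearn.datasets import make_classification  # synthetic stand-in data

X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           random_state=0)

# Dominant eigenvector of one class's covariance matrix.
cov = np.cov(X[y == 0], rowvar=False)
_, vecs = np.linalg.eigh(cov)
d = vecs[:, -1]                       # unit eigenvector, largest eigenvalue

# Householder matrix H reflecting d onto the first coordinate axis e1
# (assumes d is not already aligned with e1, else u would vanish).
e1 = np.zeros_like(d); e1[0] = 1.0
u = d - e1
u /= np.linalg.norm(u)
H = np.eye(len(d)) - 2.0 * np.outer(u, u)   # symmetric, orthogonal

Xr = X @ H   # axis-parallel split on Xr = oblique split on X

def best_axis_split(Xr, y):
    """Exhaustive 1-D split search by Gini impurity (midpoint thresholds)."""
    best = (np.inf, None, None)
    for j in range(Xr.shape[1]):
        vals = np.unique(Xr[:, j])
        for t in (vals[:-1] + vals[1:]) / 2.0:
            left, right = y[Xr[:, j] <= t], y[Xr[:, j] > t]
            gini = sum(len(s) * (1 - ((np.bincount(s) / len(s)) ** 2).sum())
                       for s in (left, right)) / len(y)
            if gini < best[0]:
                best = (gini, j, t)
    return best

print(best_axis_split(Xr, y))  # (impurity, reflected axis, threshold)
```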
66

Geographic Relevance for Travel Search: The 2014-2015 Harvey Mudd College Clinic Project for Expedia, Inc.

Long, Hannah 01 January 2015 (has links)
The purpose of this Clinic project is to help Expedia, Inc. expand the search capabilities it offers to its users. In particular, the goal is to help the company respond to unconstrained search queries by generating a method to associate hotels and regions around the world with the higher-level attributes that describe them, such as "family-friendly" or "culturally-rich." Our team utilized machine-learning algorithms to extract metadata from textual data about hotels and cities. We focused on two machine-learning models: decision trees and Latent Dirichlet Allocation (LDA). The first appeared to be a promising approach, but would require more resources to replicate on the scale Expedia needs. On the other hand, we were able to generate useful results using LDA. We created a website to visualize these results.
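A small sketch of the LDA step using scikit-learn, on toy hotel descriptions; the real input would be Expedia's textual data, and the topic count here is an assumption.

```python
# Sketch: fit LDA topics on (toy) hotel descriptions, then inspect the
# top words per topic as candidate high-level attributes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "kids pool playground family rooms near the theme park",
    "museum district gallery historic old town architecture",
    "water slides kids club family suites babysitting",
    "cathedral walking tours local history heritage sites",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")  # e.g. a 'family-friendly' vs 'culturally-rich' split
```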
67

Predicting High-cost Patients in General Population Using Data Mining Techniques

Izad Shenas, Seyed Abdolmotalleb 26 October 2012 (has links)
In this research, we apply data mining techniques to nationally representative expenditure data from the US to predict very high-cost patients, those in the top 5 cost percentiles of the general population. Samples are derived from the Medical Expenditure Panel Survey's Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning, and balancing the data, the final MEPS dataset of 31,704 records is modeled with Decision Trees (including C5.0 and CHAID) and Neural Networks. Multiple predictive models are built and their performances analyzed using various measures including accuracy, G-mean, and Area under the ROC Curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC measures for the top-performing predictive models, ranging from 76% to 85% and from 0.812 to 0.942, respectively. Among a primary set of 66 attributes, the best predictors of the top 5% high-cost population include an individual's overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. It is worth noting that we do not consider the number of visits to care providers as a predictor, since it is highly correlated with expenditure and offers no new insight (i.e., it is a trivial predictor): we predict high-cost patients without knowing how many times a patient visited doctors or was hospitalized. Consequently, the results of this study can be used by policy makers, health planners, and insurers to plan and improve the delivery of health services.
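The evaluation measures are simple to state in code. Below is a sketch of G-mean, sqrt(sensitivity * specificity), alongside AUC on a toy imbalanced example; it is independent of which model (C5.0, CHAID, or a neural network) produced the scores.

```python
# Sketch of the abstract's evaluation measures for a binary
# high-cost/not-high-cost prediction. G-mean balances recall on both
# classes, which matters under a 5%/95% class imbalance.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def g_mean(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # recall on the high-cost class
    specificity = tn / (tn + fp)   # recall on the low-cost class
    return np.sqrt(sensitivity * specificity)

# Toy example: y_score could come from any of the study's models.
y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([.1, .2, .1, .4, .3, .2, .8, .6, .4, .7])
y_pred  = (y_score >= 0.5).astype(int)
print("G-mean:", g_mean(y_true, y_pred), "AUC:", roc_auc_score(y_true, y_score))
```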
68

Machine learning in embedded systems

Swere, Erick A. R. January 2008 (has links)
This thesis describes novel machine learning techniques specifically designed for use in real-time embedded systems. The techniques directly address three major requirements of such learning systems. Firstly, learning must be capable of being achieved incrementally, since many applications do not have a representative training set available at the outset. Secondly, to guarantee real-time performance, the techniques must be able to operate within a deterministic and limited time bound. Thirdly, the memory requirement must be limited and known a priori, to ensure the limited memory available to hold data in embedded systems will not be exceeded. The work described here has three principal contributions. The frequency table is a data structure specifically designed to reduce the memory requirements of incremental learning in embedded systems; it provides a compact representation of received data that is sufficient for decision tree generation. The frequency table decision tree (FTDT) learning method provides classification performance similar to existing decision tree approaches, but extends these to incremental learning while substantially reducing memory usage for practical problems. The incremental decision path (IDP) method is able to efficiently induce, from the frequency table of observations, the path through a decision tree that is necessary for the classification of a single instance. The classification performance of IDP is equivalent to that of existing decision tree algorithms, but since IDP allows the maximum number of partial decision tree nodes to be determined prior to the generation of the path, both the memory requirement and the execution time are deterministic. In this work, the viability of the techniques is demonstrated through application to real-time mobile robot navigation.
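A sketch of the frequency-table idea, in Python for brevity although the thesis targets embedded systems: per-attribute counts of (value, class) pairs act as sufficient statistics for split selection, with memory fixed by the table shape rather than by the number of observations. The table sizes here are arbitrary assumptions.

```python
# Sketch: a fixed-size frequency table supports incremental learning with
# a memory bound known a priori, as the abstract describes.
import numpy as np

N_ATTRS, N_VALUES, N_CLASSES = 4, 8, 3
table = np.zeros((N_ATTRS, N_VALUES, N_CLASSES), dtype=np.uint32)

def observe(instance, label):
    """Incremental update: O(N_ATTRS) time, no stored instances."""
    for a, v in enumerate(instance):
        table[a, v, label] += 1

def entropy(counts):
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(attr):
    """Computable from the table alone, with no pass over raw data."""
    total = table[attr].sum(axis=0)            # overall class counts
    gain = entropy(total)
    for v in range(N_VALUES):
        row = table[attr, v]
        if row.sum():
            gain -= row.sum() / total.sum() * entropy(row)
    return gain

rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.integers(0, N_VALUES, N_ATTRS)
    observe(x, int(x[0] % N_CLASSES))          # class depends on attribute 0
print([round(information_gain(a), 3) for a in range(N_ATTRS)])
```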
69

Machine Learning Multi-Stage Classification and Regression in the Search for Vector-like Quarks and the Neyman Construction in Signal Searches

Leone, Robert Matthew January 2016 (has links)
A search for vector-like quarks (VLQs) decaying to a Z boson, using multi-stage machine learning, was compared to a search using a standard square-cuts strategy. VLQs are predicted by several new theories beyond the Standard Model. The searches used 20.3 inverse femtobarns of proton-proton collisions at a center-of-mass energy of 8 TeV, collected with the ATLAS detector in 2012 at the CERN Large Hadron Collider. CLs upper limits on the production cross sections of vector-like top and bottom quarks were computed for VLQs produced singly or in pairs: Tsingle, Bsingle, Tpair, and Bpair. The two-stage machine learning classification strategy did not provide any improvement over the standard square-cuts strategy, but for Tpair, Bpair, and Tsingle, a third stage of machine learning regression was able to lower the upper limits at high signal masses by as much as 50%. Additionally, new test statistics were developed for use in the Neyman construction of confidence regions, to address deficiencies in current frequentist methods, such as the generation of empty-set confidence intervals. A new method for treating nuisance parameters was also developed that may provide better coverage properties than current methods used in particle searches. Finally, significance ratio functions were derived that allow a more nuanced interpretation of the evidence provided by measurements than is given by confidence intervals alone.
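For orientation, here is a toy Neyman construction for a single Poisson mean with no nuisance parameters: it shows only the standard belt-inversion machinery that the abstract's new test statistics would plug into, not those statistics themselves.

```python
# Toy Neyman construction: central (equal-tailed) confidence belt for a
# Poisson mean, inverted to get a confidence interval for one observation.
import numpy as np
from scipy.stats import poisson

CL = 0.90
mus = np.linspace(0.05, 20.0, 400)   # scan grid over the parameter

def acceptance_interval(mu):
    """Equal-tailed interval of counts with >= CL coverage under mu
    (conservative for a discrete observable)."""
    lo = int(poisson.ppf((1 - CL) / 2, mu))
    hi = int(poisson.ppf(1 - (1 - CL) / 2, mu))
    return lo, hi

def confidence_interval(n_obs):
    """Invert the belt: all mu whose acceptance interval contains n_obs."""
    accepted = [mu for mu in mus
                if acceptance_interval(mu)[0] <= n_obs <= acceptance_interval(mu)[1]]
    return (min(accepted), max(accepted)) if accepted else None

print(confidence_interval(3))   # roughly (0.8, 7.8) at 90% CL
```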
70

Let's Have a party! An Open-Source Toolbox for Recursive Partytioning

Hothorn, Torsten, Zeileis, Achim, Hornik, Kurt January 2007 (has links) (PDF)
Package party, implemented in the R system for statistical computing, provides basic classes and methods for recursive partitioning along with reference implementations for three recently-suggested tree-based learners: conditional inference trees and forests, and model-based recursive partitioning. / Series: Research Report Series / Department of Statistics and Mathematics
