1

Program distribution estimation with grammar models

Shan, Yin, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2005
This thesis studies grammar-based approaches in the application of Estimation of Distribution Algorithms (EDAs) to the tree representation widely used in Genetic Programming (GP). Although EDAs are becoming one of the most active fields in Evolutionary Computation (EC), the solution representation in most EDAs is a Genetic Algorithm (GA)-style linear representation. The more complex tree representations, resembling those of GP, have received only limited exploration. This is unfortunate, because tree representations provide a natural and expressive way of representing solutions for many problems. This thesis aims to help fill this gap, exploring grammar-based approaches to extending EDAs to GP-style tree representations. The thesis firstly provides a comprehensive survey of current research on EDAs, with emphasis on EDAs with GP-style tree representations. It attempts to clarify the relationship between EDAs with conventional linear representations and those with a GP-style tree representation, and to reveal the unique difficulties that face this research. Secondly, the thesis identifies desirable properties of probabilistic models for EDAs with GP-style tree representations, and derives the PRODIGY framework as a consequence. Thirdly, following the PRODIGY framework, three methods are proposed. The first is Program Evolution with Explicit Learning (PEEL); its incremental general-to-specific grammar learning method balances the effectiveness and efficiency of grammar learning. The second is Grammar Model-based Program Evolution (GMPE), which realises the PRODIGY framework by introducing elegant inference methods from the formal grammar field. GMPE performs well on some problems, and also provides a means to better understand some aspects of conventional GP, especially the building block hypothesis. The third is Swift GMPE (sGMPE), an extension of GMPE aimed at reducing computational cost. Fourthly, a more accurate Minimum Message Length metric for grammar learning in PRODIGY is derived. This metric leads to improved performance in the GMPE system, but may also be useful in grammar learning in general, and is relevant to the learning of other probabilistic graphical models.
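The abstract does not reproduce the algorithms themselves. For readers unfamiliar with the general idea of grammar-based EDA, the following is a minimal sketch of the core loop, not the thesis's PEEL, GMPE, or sGMPE methods: sample programs from a stochastic grammar, select the fittest, and re-estimate the production probabilities from the productions those programs used. The grammar, fitness function, and selection scheme below are illustrative assumptions.

```python
import random

# Illustrative grammar-based EDA over GP-style trees (a generic sketch,
# not the thesis's algorithms). The model is a stochastic grammar with one
# nonterminal EXPR and four productions; production probabilities are
# re-estimated each generation from the best-performing programs.
RULES = ["add", "mul", "x", "const1"]

def sample(probs, depth=0, max_depth=5):
    """Sample one program tree; also record which productions were used."""
    if depth >= max_depth:
        rule = random.choice(["x", "const1"])   # force a terminal when deep
    else:
        rule = random.choices(RULES, weights=[probs[r] for r in RULES])[0]
    if rule in ("x", "const1"):
        return (rule,), [rule]
    left, lu = sample(probs, depth + 1, max_depth)
    right, ru = sample(probs, depth + 1, max_depth)
    return (rule, left, right), [rule] + lu + ru

def evaluate(tree, x):
    """Evaluate a program tree at input x."""
    op = tree[0]
    if op == "x":
        return x
    if op == "const1":
        return 1.0
    a, b = evaluate(tree[1], x), evaluate(tree[2], x)
    return a + b if op == "add" else a * b

def fitness(tree):
    """Negative squared error against the toy target f(x) = x*x + 1."""
    return -sum((evaluate(tree, x) - (x * x + 1)) ** 2 for x in range(-5, 6))

probs = {r: 1.0 / len(RULES) for r in RULES}    # uniform initial model
for gen in range(30):
    pop = [sample(probs) for _ in range(200)]
    pop.sort(key=lambda p: fitness(p[0]), reverse=True)
    elite = pop[: len(pop) // 5]                # truncation selection
    counts = {r: 1.0 for r in RULES}            # Laplace smoothing
    for _, used in elite:
        for r in used:
            counts[r] += 1
    total = sum(counts.values())
    probs = {r: c / total for r, c in counts.items()}

best_tree, _ = pop[0]
print("best:", best_tree, "fitness:", fitness(best_tree))
```

Grammar learning in PEEL and GMPE goes well beyond re-weighting a fixed set of productions, for example by specialising productions incrementally, but the sample-select-re-estimate cycle above is the common skeleton.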
2

A Comparative Study Of Tree Encodings For Evolutionary Computing

Saka, Esin 01 July 2005
One of the most important factors in the success of evolutionary algorithms (EAs) on trees is their representation. The representation should exhibit efficiency, locality and heritability to enable effective evolutionary computing. Neville proposed three different methods for encoding labeled trees, the first of which is similar to Prüfer's encoding. In 2001, it was reported that Prüfer numbers are a poor representation of spanning trees for evolutionary search, since they have low locality for random trees. In this thesis, Neville's other two encodings, namely Neville branch numbers and Neville leaf numbers, are studied. Their properties, and algorithms for encoding and decoding them, are also examined with respect to their performance in EAs. Optimal algorithms for encoding and decoding Neville branch numbers, with time and space complexities of O(n), where n is the number of nodes, are given. The localities of Neville's encodings are investigated. It is shown that, although the localities of Neville branch and leaf numbers are perfect for star-type trees, they are low for random trees. Neville branch and Neville leaf numbers are compared with other codings in EAs and simulated annealing (SA) on four problems: the 'onemax tree problem', the 'degree-constrained minimum spanning tree problem', the 'all spanning trees problem' and the 'all degree-constrained spanning trees problem'. It is shown that neither the Neville nor the Prüfer encodings are suitable for EAs; these encodings are suitable only for tree enumeration and degree computation. Time- and space-optimal algorithms for the 'all spanning trees problem' (ASTP) on complete graphs are given using the Neville branch encoding. The computed time and space complexities for solving the ASTP of complete graphs are O(n^(n-2)) and O(n) if trees are only enumerated, and O(n^(n-1)) and O(n) if all spanning trees are printed, respectively, where n is the number of nodes. Similarly, the 'all degree-constrained spanning trees problem' of a complete graph is solvable in O(n^(n-1)) time and O(n) space.
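The Neville encodings themselves are not spelled out in the abstract. As a point of reference, the sketch below implements the closely related Prüfer encoding that the abstract compares against: encoding repeatedly removes the smallest-labelled leaf and records its neighbour, and decoding reverses the process using leaf degrees. Neville's branch and leaf numbers differ in which node is recorded or removed but follow the same encode/decode pattern; for simplicity this sketch uses a heap, giving O(n log n) rather than the O(n) bounds discussed in the thesis.

```python
import heapq

def prufer_encode(adj):
    """Prüfer sequence of a labeled tree given as {node: set(neighbors)}.
    Repeatedly removes the smallest-labelled leaf, recording its neighbour."""
    n = len(adj)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    leaves = [v for v in adj if deg[v] == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = next(iter(adj[leaf]))
        seq.append(parent)
        adj[parent].discard(leaf)
        adj[leaf].clear()
        deg[parent] -= 1
        if deg[parent] == 1:
            heapq.heappush(leaves, parent)
    return seq

def prufer_decode(seq):
    """Rebuild the edge list of the tree on nodes 0..len(seq)+1."""
    n = len(seq) + 2
    deg = [1] * n
    for v in seq:
        deg[v] += 1
    leaves = [v for v in range(n) if deg[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        deg[v] -= 1
        if deg[v] == 1:
            heapq.heappush(leaves, v)
    # exactly two nodes of degree 1 remain; join them
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# Example: the path 0-1-2-3 encodes to [1, 2] and decodes back to its edges.
tree = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(prufer_encode({v: set(ns) for v, ns in tree.items()}))  # [1, 2]
print(prufer_decode([1, 2]))  # [(0, 1), (1, 2), (2, 3)]
```

The low locality referred to in the abstract is visible even here: changing one entry of the sequence can redirect edges throughout the decoded tree, so nearby genotypes need not be nearby phenotypes.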
3

Multi-modal Neural Representations for Semantic Code Search / Multimodala neurala representationer för semantisk kodsökning

Gu, Jian January 2020
In recent decades, various software systems have gradually become the basis of our society. Programmers search existing code snippets from time to time in their daily work, so it would be beneficial and meaningful to have better solutions for the task of semantic code search, which is to find the most semantically relevant code snippets for a given query. Our approach is to introduce tree representations through multi-modal learning. The core idea is to enrich the semantic information of code snippets by preparing data of different modalities, while ignoring syntactic information. We design a novel tree structure named Simplified Semantic Tree and extract RootPath representations from it. We use the RootPath representation to complement the conventional sequential representation, namely the token sequence of the code snippet. Our multi-modal model receives a code-query pair as input and computes a similarity score as output, following a pseudo-siamese architecture. For each pair, besides the ready-made code sequence and query sequence, we extract an extra tree sequence from the Simplified Semantic Tree. There are three encoders in our model, which respectively encode these three sequences as vectors of the same length. We then combine the code vector with the tree vector into one joint vector, still of the same length, as the multi-modal representation of the code snippet. We introduce a triplet loss to ensure that the code and query vectors of the same pair are close in the shared vector space. We conduct experiments on a large-scale multi-language corpus, comparing against strong baseline models on specified performance metrics. Among the baseline models, the simplest, a Neural Bag-of-Words model, shows the most satisfying performance, indicating that syntactic information is likely to distract complex models from the critical semantic information. Results show that our multi-modal representation approach performs better, surpassing the baseline models by a wide margin in most cases. The key to our multi-modal model is that it deals entirely with semantic information and learns from data of multiple modalities.
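As an illustration of the pseudo-siamese setup and the triplet objective described above, here is a minimal PyTorch-style sketch. The encoder architecture (mean-pooled embeddings), vector dimensions, vocabulary size, and the in-batch negative sampling are assumptions made for the example; the thesis's actual encoders and the extraction of tree sequences from the Simplified Semantic Tree are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a pseudo-siamese, multi-modal code-search model in the spirit
# of the abstract. All architectural details here are illustrative
# assumptions, not the thesis's actual model.

class BagEncoder(nn.Module):
    """Embeds a token-id sequence and mean-pools it to a fixed-length vector."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
    def forward(self, ids):                        # ids: (batch, seq_len)
        mask = (ids != 0).float().unsqueeze(-1)    # ignore padding positions
        summed = (self.emb(ids) * mask).sum(1)
        return summed / mask.sum(1).clamp(min=1)

class CodeQueryModel(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.code_enc = BagEncoder(vocab_size, dim)    # code token sequence
        self.tree_enc = BagEncoder(vocab_size, dim)    # tree (RootPath) sequence
        self.query_enc = BagEncoder(vocab_size, dim)   # natural-language query
        self.fuse = nn.Linear(2 * dim, dim)            # joint code+tree vector

    def encode_code(self, code_ids, tree_ids):
        joint = torch.cat([self.code_enc(code_ids),
                           self.tree_enc(tree_ids)], dim=-1)
        return F.normalize(self.fuse(joint), dim=-1)

    def encode_query(self, query_ids):
        return F.normalize(self.query_enc(query_ids), dim=-1)

model = CodeQueryModel(vocab_size=10000)
loss_fn = nn.TripletMarginLoss(margin=0.5)

# One training step on random ids: the query is the anchor, its paired
# snippet the positive, and a snippet from another pair the negative.
q   = model.encode_query(torch.randint(1, 10000, (32, 20)))
pos = model.encode_code(torch.randint(1, 10000, (32, 50)),
                        torch.randint(1, 10000, (32, 50)))
neg = pos.roll(1, dims=0)                          # in-batch negatives
loss = loss_fn(q, pos, neg)
loss.backward()
```

The roll trick simply treats the snippet from the next pair in the batch as the negative, a common simplification; how negatives are actually mined would follow the thesis's experimental setup.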
