Global ETD Search

1	Snippet Generation for Provenance Workflows Bhatti, Ayesha January 2011 Scientists often need to know how data was derived in addition to what it is. The detailed tracking of data transformations, or provenance, supports result reproducibility, knowledge reuse and data analysis. Scientific workflows are increasingly being used to represent provenance as they are capable of recording complicated processes at various levels of detail. In the context of knowledge reuse and sharing, search technology is of paramount importance, especially considering the huge and ever-increasing amount of scientific data. It is computationally hard to produce a single exact answer to the user's query due to the sheer volume and complicated structure of provenance. One solution to this difficult problem is to produce a list of candidate matches and let the user select the most relevant result. Here search result presentation becomes very important, as the user is required to make the final decision by looking at the workflows in the result list. The presentation of these candidate matches needs to be brief, precise, clear and revealing. This is a challenging task in the case of workflows, as they contain textual content as well as graphical structure. Current workflow search engines such as Yahoo Pipes! or myExperiment ignore the actual workflow specification and use metadata to create summaries. Workflows that lack metadata yield poor summaries even when they are useful and relevant to the search criteria. This work investigates the possibility of creating meaningful and usable summaries, or snippets, based on the structure and specification of workflows. 
We shall (1) present relevant published work on snippet-building techniques, (2) explain how we mapped current techniques to our work, (3) describe how we identified techniques from interface design theory in order to build a usable graphical interface, (4) present the implementation of two new algorithms for workflow graph compression together with their complexity analysis, and (5) identify future work on our implementation and outline open research problems in the snippet-building field. provenance snippets graph compression
2	Exploiting abstract syntax trees to locate software defects Shippey, Thomas Joshua January 2015 Context. Software defect prediction aims to reduce the large costs associated with faults in a software system. A wide range of traditional software metrics have been evaluated as potential defect indicators. These traditional metrics are derived from the source code or from the software development process. Studies have shown that no metric clearly outperforms another, and identifying defect-prone code using traditional metrics has reached a performance ceiling. Less traditional metrics have also been studied, derived from the natural language of the source code. These newer, finer-grained metrics have shown promise within defect prediction. Aims. The aim of this dissertation is to study the relationship between short Java constructs and the faultiness of source code. To study this relationship, this dissertation introduces the concepts of a Java sequence and a Java code snippet. Sequences are created by using the Java abstract syntax tree. The ordering of the nodes within the abstract syntax tree creates the sequence, while small subsequences of this sequence are the code snippets. The dissertation tries to find a relationship between the code snippets and faulty and non-faulty code. It also looks at the evolution of the code snippets as a system matures, to discover whether the code snippets significantly associated with faulty code change over time. Methods. To achieve the aims of the dissertation, two main techniques have been developed: finding defective code, and extracting Java sequences and code snippets. Finding defective code has been split into two areas: finding the defect fix points and finding the defect insertion points. To find the defect fix points, an implementation of the bug-linking algorithm has been developed, called S + e . Two algorithms were developed to extract the sequences and the code snippets. 
The code snippets are analysed using the binomial test to find which ones are significantly associated with faulty and non-faulty code. These techniques have been applied to five different Java datasets: ArgoUML, AspectJ and three releases of Eclipse.JDT.core. Results. There are significant associations between some code snippets and faulty code. Frequently occurring fault-prone code snippets include those associated with identifiers, method calls and variables. Some code snippets significantly associated with faults occur only in faulty code. There are 201 code snippets that are significantly associated with faults across all five of the systems. The technique is unable to find any significant associations between code snippets and non-faulty code. The relationship between code snippets and faults seems to change as the system evolves, with more snippets becoming fault-prone as Eclipse.JDT.core evolved over the three releases analysed. Conclusions. This dissertation has introduced the concept of code snippets into software engineering and defect prediction. The use of code snippets offers a promising approach to identifying potentially defective code. Unlike previous approaches, code snippets are based on a comprehensive analysis of low-level code features and potentially allow the full set of code defects to be identified. Initial research into the relationship between code snippets and faults has shown that some code constructs or features are significantly related to software faults. The significant associations between code snippets and faults have provided additional empirical evidence for some already researched bad constructs within defect prediction. The code snippets have shown that some constructs significantly associated with faults are located in all five systems, and although this set is small, finding any defect indicators that transfer successfully from one system to another is rare. 005.1
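The binomial test mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say whether the test was one- or two-sided or what baseline fault rate was used, so the two-sided form, the function name `binomial_two_sided_p`, and all counts below are assumptions made purely for illustration.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties are not lost to rounding error.
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical counts (not taken from the dissertation): a snippet occurs
# 10 times, 9 of them in faulty code, and half of all code is assumed faulty.
faulty_occurrences = 9
total_occurrences = 10
baseline_faulty_rate = 0.5

p_value = binomial_two_sided_p(
    faulty_occurrences, total_occurrences, baseline_faulty_rate
)
print(f"p-value: {p_value:.4f}")
```

A snippet would be flagged as significantly associated with faulty code when this p-value falls below a chosen significance level. With thousands of candidate snippets, a multiple-comparison correction would normally be considered as well; the abstract does not state whether one was applied.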
3	A qualitative study: how Solution Snippets are presented in Stack Overflow and how those Solution Snippets need to be adapted for reuse Weeraddana, Nimmi Rashinika 22 March 2022 Researchers use datasets of Question-Solution pairs to train machine learning models, such as source code generation models. A Question-Solution pair contains two parts: a programming question and its corresponding Solution Snippet. A Solution Snippet is source code that solves a programming question. These datasets of Question-Solution pairs can be extracted from a number of different platforms. In this research, I study how Question-Solution pairs are extracted from Stack Overflow (SO). Datasets of Question-Solution pairs extracted from SO have two limitations: (1) according to the authors of these datasets, some Question-Solution pairs contain Solution Snippets that do not solve the question correctly, and (2) these datasets do not contain information on how Solution Snippets need to be adapted for reuse, information that would enhance the reusability of Solution Snippets. These limitations could adversely affect the quality of the code generated by machine learning models. In this research, I conducted a qualitative study to categorize the various presentations of Solution Snippets in SO's answers as well as how Solution Snippets can be adapted for reuse. By doing so, I identified eight categories of how Solution Snippets are presented in SO's answers and five categories of how Solution Snippets could be adapted. Based on these results, I identified several potential reasons why it is not easy to create datasets of Question-Solution pairs. The first categorization shows that finding the correct location of the Solution Snippet is challenging when there are several code blocks within the answer to the question. Subsequently, the researcher must identify which code within that code block is the Solution Snippet. 
The second categorization shows that most Solution Snippets appear challenging to adapt for reuse, and that how they might be adapted is not explicitly stated in them. These insights shed light on creating better quality datasets from questions and answers posted on Stack Overflow. / Graduate Stack Overflow Code Blocks Solution Snippets Code Reuse Qualitative Study Question-Solution pairs Datasets Source code generation Source code synthesis
4	Capturing Knowledge of Emerging Entities from the Extended Search Snippets Ngwobia, Sunday C. January 2019 No description available. Computer Science Information Systems Emerging entities Capturing Knowledge Knowledge Graph search snippets Entity embedding Enhanced corpus, entity types entailment
5	Semantic snippets via query-biased ranking of linked data entities / Snippets sémantiques via l'ordonnancement biaisé-requête des entités LOD Alsarem, Mazen 30 May 2016 In this thesis, we introduce a new interactive artifact for the SERP: the "Semantic Snippet". Semantic Snippets rely on the coexistence of the two webs to facilitate the transfer of knowledge to the user through a semantic contextualization of the user's information need. They make apparent the relationships between the information need and the most relevant entities present in the web page. Information Technology Semantic web Web of data Entity ranking Semantic snippets 025.040 72
6	Clustering the Web : Comparing Clustering Methods in Swedish / Webbklustring : En jämförelse av klustringsmetoder på svenska Hinz, Joel January 2013 Clustering (automatically sorting) web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms (k-means, agglomerative hierarchical clustering, and bisecting k-means) on a total of 32 corpora, and examines whether clustering web search previews, called snippets, instead of full texts can achieve reasonably good results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; results are inconclusive, however, regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. The effects of stop word and stemmer usage are not significant and do not appear to affect the clustering to any considerable degree. clustering web search results snippets k-means agglomerative hierarchical clustering bisecting k-means Swedish Human Computer Interaction
