About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Intelligent Code Inspection using Static Code Features : An approach for Java

Moriggl, Irene January 2010
Effective defect detection is still a hot issue in software quality assurance. Static source code analysis plays an important role here, since it enables automated defect detection in early stages of development. As defect detection can be seen as a classification problem, machine learning has recently been investigated for this purpose. This study presents a new model for automated defect detection by means of machine learners based on static Java code features. The model comprises the extraction of the necessary features as well as the application of suitable classifiers to them. It is realized by a prototype for the feature extraction and a study on the prototype's output in order to identify the most suitable classifiers. Finally, the overall approach is evaluated using an open source project. The suitability study and the evaluation show that several classifiers are suitable for the model and that the Rotation Forest, Multilayer Perceptron and JRip classifiers make the approach most effective: they detect defects with an accuracy higher than 96%. Although the approach is only realized as a prototype, it shows the potential to become an effective alternative to today's defect detection methods.
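A minimal sketch of the kind of pipeline this abstract describes, using scikit-learn stand-ins for the Weka classifiers named in the thesis (MLPClassifier for the Multilayer Perceptron; RandomForestClassifier as a rough analogue of Rotation Forest; JRip has no direct scikit-learn equivalent). The feature names and data are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Rows: code units (methods/classes); columns: static features such as
# [lines_of_code, cyclomatic_complexity, nesting_depth, num_parameters].
X = np.array([[120, 14, 4, 3], [15, 2, 1, 1], [340, 27, 6, 5], [48, 5, 2, 2],
              [210, 19, 5, 4], [9, 1, 1, 0], [175, 16, 3, 6], [33, 4, 2, 1]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = defective, 0 = clean

for clf in (MLPClassifier(max_iter=2000, random_state=0),
            RandomForestClassifier(random_state=0)):
    scores = cross_val_score(clf, X, y, cv=4)  # cross-validated accuracy
    print(type(clf).__name__, "accuracy: %.2f" % scores.mean())
```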
62

Information Visualization and Machine Learning Applied on Static Code Analysis

Kacan, Denis, Sidlauskas, Darius January 2008
Software engineers will possibly never see perfect source code in their lifetime, but they are seeing much better analysis tools for finding defects in software. The approaches used in static code analysis have evolved from simple code crawling to statistical and probabilistic frameworks. This work presents a new technique that incorporates machine learning and information visualization into static code analysis. The technique learns patterns in a program's source code using a normalized compression distance and applies them to classify code fragments as faulty or correct. Since the classification is frequently imperfect, the training process plays an essential role. A visualization element is used in the hope that it lets the user better understand the inner state of the classifier, making the learning process transparent. An experimental evaluation is carried out to assess the efficacy of an implementation of the technique, the Code Distance Visualizer. The outcome of the evaluation indicates that the proposed technique is reasonably effective in learning to differentiate between faulty and correct code fragments, and that the visualization element enables the user to discern when the tool's output is correct, and to take corrective action (further training or retraining) interactively, until the desired level of performance is reached.
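The normalized compression distance the abstract refers to is easy to sketch: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is a compressed length. The following illustrative implementation uses zlib as the compressor and a toy 1-nearest-neighbour rule over two labelled fragments (the fragments are invented; the thesis's actual classifier setup may differ):

```python
import zlib

def c(data: bytes) -> int:
    """Compressed length of data, used as an approximation of complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

faulty = b"if (p = NULL) { use(p); }"          # assignment instead of comparison
correct = b"if (p == NULL) { return; } use(p);"
fragment = b"if (q = NULL) { use(q); }"         # unseen code to classify

# Classify by the nearer labelled fragment (a 1-NN rule over NCD).
label = "faulty" if ncd(fragment, faulty) < ncd(fragment, correct) else "correct"
print(label)
```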
63

Static Code Features for a Machine Learning based Inspection : An approach for C

Tribus, Hannes January 2010
Delivering fault-free code is the clear goal of every developer, yet the best method to achieve this aim remains an open question. Although several approaches have been proposed in the literature, there is no overall best way. One recently proposed solution is to combine static source code analysis with the discipline of machine learning. An approach in this direction has been defined within this work, implemented as a prototype and subsequently validated. It shows a possible translation of a piece of source code into a machine learning algorithm's input and, furthermore, its suitability for the task of fault detection. In the context of the present work, two prototypes have been developed to show the feasibility of the presented idea. The output they generated on open source projects has been collected and used to train and rank various machine learning classifiers in terms of accuracy, false positive and false negative rates. The best among them have subsequently been validated again on an open source project. In the first study, at least six classifiers, including "MultiLayerPerceptron", "Ibk" and "ADABoost" on a "BFTree", performed convincingly. All except the latter, which failed completely, could be validated in the second study. Although it is only a prototype, it shows the suitability of some machine learning algorithms for static source code analysis.
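The ranking criteria named above (accuracy, false positive rate, false negative rate) can be sketched as follows. KNeighborsClassifier serves as a rough analogue of Weka's IBk (a k-nearest-neighbour learner), and the feature data is invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Columns: illustrative static C-code features, e.g.
# [lines_of_code, cyclomatic_complexity, nesting_depth].
X = np.array([[250, 21, 5], [12, 1, 1], [310, 30, 7], [40, 3, 2],
              [190, 17, 4], [22, 2, 1], [270, 25, 6], [35, 4, 2]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # 1 = faulty, 0 = fault-free

clf = KNeighborsClassifier(n_neighbors=3).fit(X[:6], y[:6])
pred = clf.predict(X[6:])                 # hold out the last two samples

tn, fp, fn, tp = confusion_matrix(y[6:], pred, labels=[0, 1]).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("false positive rate:", fp / (fp + tn))
print("false negative rate:", fn / (fn + tp))
```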
64

Using an XML-driven approach to create tools for program understanding : An implementation for Configura and CET Designer / Ett XML-drivet tillvägagångssätt för att skapa vertyg för programförståelse : En implementation för Configura och CET Designer

Wihlborg, Åsa January 2011
A major problem during development and maintenance of software is lack of quality documentation. Many programmers have problems identifying which information is relevant for someone with no knowledge of the system and therefore write incomplete documentation. One way to get around these problems would be to use a tool that extracts information from both comments and the actual source code and presents the structure of the program visually. This thesis aims to design an XML-driven system for the extraction and presentation of meta information about source code to that purpose. Relevant meta information in this case is, for example, which entities (classes, methods, variables, etc.) exist in the program and how they interact with each other. The result is a prototype implemented to manage two company-developed languages. The prototype demonstrates how the system can be implemented and shows that the approach is scalable. The prototype is not suitable for commercial use due to its abstraction level, but with the help of qualified XML databases there are great possibilities to build a usable system using the same techniques in the future.
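As a rough sketch of the XML-driven idea, the following uses Python's own ast module as a stand-in parser (the thesis targets two company-internal languages, not Python) and serializes classes and methods as XML metadata that a presentation layer could then query:

```python
import ast
import xml.etree.ElementTree as ET

source = """
class Repository:
    def add(self, item): ...
    def find(self, key): ...
"""

# Walk the parse tree and record entities as XML elements.
root = ET.Element("program")
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.ClassDef):
        cls = ET.SubElement(root, "class", name=node.name)
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                args = ",".join(a.arg for a in item.args.args)
                ET.SubElement(cls, "method", name=item.name, params=args)

print(ET.tostring(root, encoding="unicode"))
```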
65

The analysis of enumerative source codes and their use in Burrows‑Wheeler compression algorithms

McDonald, Andre Martin 10 September 2010
In the late 20th century the reliable and efficient transmission, reception and storage of information proved to be central to the most successful economies all over the world. The Internet, once a classified project accessible to a select few, is now part of the everyday lives of a large part of the human population, and as such the efficient storage of information is an important part of the information economy. The improvement of the information storage density of optical and electronic media has been remarkable, but the elimination of redundancy in stored data and the reliable reconstruction of the original data remain desired goals. The field of source coding is concerned with the compression of redundant data and its reliable decompression. The arithmetic source code, independently proposed by J. J. Rissanen and R. Pasco in 1976, revolutionized the field of source coding. Compression algorithms that use an arithmetic code to encode redundant data are typically more effective and computationally more efficient than compression algorithms that use earlier source codes such as extended Huffman codes. The arithmetic source code is also more flexible than earlier source codes, and is frequently used in adaptive compression algorithms. The arithmetic code remains the source code of choice, despite having been introduced more than 30 years ago.

The problem of effectively encoding data from sources with known statistics (i.e. where the probability distribution of the source data is known) was solved with the introduction of the arithmetic code. The probability distribution of practical data is seldom available to the source encoder, however. The source coding of data from sources with unknown statistics is a more challenging problem, and remains an active research topic.

Enumerative source codes were introduced by T. J. Lynch and L. D. Davisson in the 1960s. These lossless source codes have the remarkable property that they may be used to effectively encode source sequences from certain sources without requiring any prior knowledge of the source statistics. One drawback of these source codes is the computationally complex nature of their implementations. Several years after the introduction of enumerative source codes, J. G. Cleary and I. H. Witten proved that approximate enumerative source codes may be realized by using an arithmetic code. Approximate enumerative source codes are significantly less complex than the original enumerative source codes, but are also less effective. Researchers have been more interested in arithmetic source codes than enumerative source codes since the publication of the work by Cleary and Witten.

This thesis concerns the original enumerative source codes and their use in Burrows-Wheeler compression algorithms. A novel implementation of the original enumerative source code is proposed, with a significantly lower computational complexity than the direct implementation. Several novel enumerative source codes are introduced, including optimal fixed-to-fixed length source codes with manageable computational complexity. A generalization of the original enumerative source code, which covers more complex data sources, is also proposed. The generalized source code uses the Burrows-Wheeler transform, a low-complexity algorithm for converting the redundancy of sequences from complex data sources into a more accessible form. The generalized source code effectively encodes the transformed sequences using the original enumerative source code. It is demonstrated and proved mathematically that this source code is universal (i.e. the code has an asymptotic normalized average redundancy of zero bits). / Dissertation (MEng)--University of Pretoria, 2010. / Electrical, Electronic and Computer Engineering / unrestricted
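The core enumerative idea in the Lynch-Davisson style can be sketched for the simplest case the literature usually starts from: a binary sequence of known length n and weight w (number of ones) is encoded as its index in the lexicographic ordering of all such sequences. The index fits in about log2 C(n, w) bits and needs no source statistics; this is a minimal illustration, not the thesis's reduced-complexity implementation:

```python
from math import comb

def encode(bits: list[int]) -> int:
    """Lexicographic rank of a fixed-length, fixed-weight binary sequence."""
    n, k = len(bits), sum(bits)
    rank = 0
    for i, b in enumerate(bits):
        if b:
            # Count the sequences that place a 0 here instead.
            rank += comb(n - i - 1, k)
            k -= 1
    return rank

def decode(rank: int, n: int, w: int) -> list[int]:
    bits, k = [], w
    for i in range(n):
        t = comb(n - i - 1, k)   # sequences with a 0 at this position
        if rank >= t:
            bits.append(1); rank -= t; k -= 1
        else:
            bits.append(0)
    return bits

seq = [1, 0, 1, 1, 0, 0, 1, 0]
idx = encode(seq)
assert decode(idx, len(seq), sum(seq)) == seq
print(idx)   # 52: the rank among all length-8, weight-4 sequences
```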
66

Software Source Code Readability : A Mapping Study

Bexell, Andreas January 2020
Background: Building software systems is an iterative and collaborative endeavour, requiring developers not only to write code, but to maintain, expand, fix and enhance code already written. Reading code is therefore a central activity, and it is important that code is written in a manner that makes it readable. Objectives: To map the state-of-the-art of software source code readability, find the definitions and methods used to measure it, provide an overview of the kinds of factors considered to impact readability, and compare this with practitioners' experiences of software source code readability. Methods: A systematic literature review of 76 studies in 72 papers from the last 40 years, explicitly concerning software source code readability, is compared with the results of five interviews with practitioners, of which three are case studies of commits explicitly targeting readability. Results: While individual factors' contributions towards readability have been studied with some success, more general modelling studies often suffer from methodological problems, making them difficult to apply in practice or in studies of the correlation between software source code readability and other metrics. Conclusions: Key elements of the state-of-the-art have been implemented in practice; however, readability models are not used by the practitioners in this study. Several factors mentioned by practitioners are not considered by the included studies, and further qualitative study of software development practitioners may be needed.
67

Detekce plagiátů programových kódů / Plagiarism detection of program codes

Nečadová, Anežka January 2015
This semester thesis presents a definition of plagiarism and focuses primarily on the problem in the academic world. The main topic is the detection of plagiarism. The various steps of the detection process are discussed, and special attention is given to plagiarism detection in program codes. The work raises the question of the reliability of detection tools and divides plagiarism detection methods into basic groups. One chapter is devoted to metrics for comparing files. Two available plagiarism detection tools are mentioned. The last chapter analyzes the author's own design of a program for plagiarism detection in program codes. The detector was applied to a database of students' works, and the results were plotted.
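One family of file-comparison metrics such a detector can use is token-based n-gram similarity; the sketch below is illustrative only (the thesis's own metric may differ), with a deliberately crude tokenizer that normalizes identifiers and numbers so simple renaming does not hide copying:

```python
import re

def ngrams(code: str, n: int = 3) -> set[tuple[str, ...]]:
    # Replace identifiers and numbers with placeholders before n-gramming.
    tokens = ["ID" if t[0].isalpha() else "NUM" if t[0].isdigit() else t
              for t in re.findall(r"\w+|[^\w\s]", code)]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two files' token n-gram sets."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

original = "for (int i = 0; i < n; i++) sum += a[i];"
renamed  = "for (int k = 0; k < len; k++) total += arr[k];"
print(similarity(original, renamed))  # 1.0: renaming alone changes nothing
```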
68

Diff pro různé typy dokumentů (Red Hat) / Multiple Document Type Diff

Zemko, Michal January 2011
This thesis deals with comparing different types of files, especially source codes. It describes the problem of comparing source code and different ways of solving it, from simple line comparison to AST comparison. The chosen method was comparison based on lexical analysis, which is also described in the work together with tools for automating it. The goal of this thesis is to design and implement a modular application that compares different types of files. The implemented module compares source code in the programming languages C/C++, Java and Python, and is easily extendable to support other languages.
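A minimal sketch of comparison based on lexical analysis, assuming a crude regular-expression lexer: both inputs are reduced to token streams before diffing, so pure formatting changes produce no differences the way they would in a plain line diff:

```python
import re
from difflib import SequenceMatcher

def lex(code: str) -> list[str]:
    # Crude lexer: words (identifiers, keywords, numbers) and punctuation.
    return re.findall(r"\w+|[^\w\s]", code)

old = "int add(int a, int b) { return a + b; }"
new = "int add(int a,\n        int b)\n{\n    return a + b;\n}"

sm = SequenceMatcher(None, lex(old), lex(new))
print("similarity:", sm.ratio())          # 1.0: only formatting changed
for op, i1, i2, j1, j2 in sm.get_opcodes():
    if op != "equal":                      # report real token-level changes
        print(op, lex(old)[i1:i2], "->", lex(new)[j1:j2])
```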
69

För ett automatiserat återskapande av inbyggda systems funktionella arkitektur från källkod och produkt data / Towards automated recovery of embedded system functional architecture from source code and product data

Zamouche, Ahmed, Chammam, Oussama January 2013
"Towards automated recovery of embedded system functional architecture from source code and product data" The increased embedded system complexity in the automotive industry, together with the stricter safety constraints introduced by the ISO26262 standard, requires better knowledge of the product architecture. However, for existing products which were not developed according to a well-defined architecture model, that model needs to be recovered. The objective of this thesis work is to automate the recovery of the functional architecture in a vehicle, which is required for many ISO26262 activities. This thesis proposes and describes two embedded system models for the target system, and shows their use in generating user-friendly views. The recovery of the models is done by parsing embedded C code and fetching the vehicle's data, such as the involved ECUs, their addresses and CAN bus details. Two models are proposed for capturing the recreated information about an automotive embedded system: a product model for the embedded system and an architecture model for the embedded software. The product model is a simple embedded system model that includes only the hardware and software details needed for the task of generating the functional architecture. The embedded software architecture model is derived from the product model and abstracts away all hardware information: it covers only the high-level component-based software across all ECUs, omitting allocation and CAN bus information. The proposed models have been successfully used to generate the functional architecture for a couple of SCANIA trucks. The generation and recovery of the models was performed by a software tool developed for this purpose. In addition, a mapping from the embedded software model to the AUTOSAR standard has been proposed as a way to standardise the representation. The mapping to AUTOSAR showed that it is quite straightforward when not taking into consideration any possible ECU peripherals. In the future, representation of sensors and actuators should be included in the models. A more detailed study of the architecture model for the embedded software, with regard to data flow, should also be conducted to tackle issues related to the wrong data-flow paths found in this thesis. These issues arise in the step of CAN bus abstraction.
70

Automatic Detection of Source Code Plagiarism in Programming Courses / Automatisk identifiering av kodplagiat i programmeringskurser

Bergman, Adam January 2021
Source code plagiarism is an ongoing problem in programming courses at higher academic institutions. For this reason, different automated source code plagiarism detection tools have been developed. However, they require several manual steps before submissions can be compared. Linnaeus University uses GitLab to handle its students' code-related assignments but lacks an integrated workflow for checking submissions for plagiarism. Instead, Linnaeus University's plagiarism-checking process is done manually, which is challenging and time-consuming. This thesis is a case study at Linnaeus University, focusing on integrating one of the plagiarism detection tools with GitLab using continuous integration pipelines. The objectives have been to collect students' submissions, communicate with the plagiarism tool, and present the results visually within GitLab. The prototype has been evaluated with a set of manually created submissions with different levels of plagiarism, to ensure that the detection tool differentiates plagiarized and non-plagiarized submissions. Teachers at Linnaeus University have tested the workflow and assessed whether the prototype fulfills their requirements.
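A hypothetical sketch of such a pipeline as a .gitlab-ci.yml. The job layout, container image, script and paths are assumptions for illustration, not the thesis's actual configuration; JPlag is named only as an example detection tool, and its exact command-line invocation varies by version:

```yaml
# Hypothetical pipeline sketch -- all names and paths are illustrative.
stages:
  - plagiarism

plagiarism-check:
  stage: plagiarism
  image: eclipse-temurin:17              # a JRE to run the detection tool
  script:
    # Placeholder step: gather the student repositories to be compared.
    - ./collect_submissions.sh submissions/
    # Example tool invocation; check the tool's docs for the real flags.
    - java -jar jplag.jar submissions/ -r report
  artifacts:
    paths:
      - report/                          # results browsable from the job page
```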
