11 |
Code Profiling: Static Code Analysis. Borchert, Thomas. January 2008.
Capturing the quality of software and detecting sections that warrant further scrutiny are of high interest for industry as well as for education. Project managers request quality reports in order to evaluate the current status and to initiate appropriate improvement actions, and teachers feel the need to detect students who need extra attention and help with certain programming aspects. By means of software measurement, software characteristics can be quantified and the produced measures analyzed to gain an understanding of the underlying software quality. In this study, the technique of code profiling (the activity of creating a summary of distinctive characteristics of software code) was inspected, formalized and conducted by means of a sample group of 19 industry and 37 student programs. When software projects are analyzed by means of software measurements, a considerable amount of data is produced. The task is to organize the data and draw meaningful information from the measures produced, quickly and without high expense. The results of this study indicated that code profiling can be a useful technique for quick program comparisons and continuous quality observations, with several application scenarios in both industry and education.
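As a rough illustration of the profiling idea, a code profile can be as simple as a handful of counts computed per program. The metrics below (non-blank lines, function count, branch density) are invented for this sketch and are not the ones used in the thesis.

```python
import ast

def code_profile(source: str) -> dict:
    """Summarize a few distinctive characteristics of a Python program."""
    tree = ast.parse(source)
    lines = [l for l in source.splitlines() if l.strip()]   # non-blank lines
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    branches = [n for n in ast.walk(tree)
                if isinstance(n, (ast.If, ast.For, ast.While))]
    return {
        "loc": len(lines),
        "functions": len(funcs),
        "branches": len(branches),
        "branches_per_loc": round(len(branches) / max(len(lines), 1), 2),
    }

sample = "def f(x):\n    if x > 0:\n        return x\n    return -x\n"
profile = code_profile(sample)
```

Two such profiles can then be compared side by side, which is the kind of quick program comparison the study has in mind.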
|
12 |
Static Code Analysis: A Systematic Literature Review and an Industrial Survey. Ilyas, Bilal; Elkhalifa, Islam. January 2016.
Context: Static code analysis is a software verification technique that refers to the process of examining code without executing it in order to capture defects early, avoiding costly fixes later. The lack of realistic empirical evaluations in software engineering has been identified as a major issue limiting the ability of research to impact industry, in turn preventing feedback from industry that could improve, guide and orient research. Studies emphasize rigor and relevance as important criteria for assessing the quality and realism of research: rigor defines how adequately a study has been carried out and reported, while relevance defines the potential impact of the study on industry. Despite the importance of static code analysis techniques and their existence for more than three decades, empirical evaluations in this field are few in number and do not take rigor and relevance into consideration. Objectives: The aim of this study is to contribute toward bridging the gap between static code analysis research and industry by improving the ability of research to impact industry and vice versa. This study has two main objectives. First, developing guidelines for researchers, which will explore the existing research work in static code analysis to identify the current status, shortcomings, rigor and industrial relevance of the research, and the reported benefits/limitations of different static code analysis techniques, and finally give recommendations to researchers to help make future research more industry-oriented. Second, developing guidelines for practitioners, which will investigate the adoption of different static code analysis techniques in industry and identify the benefits/limitations of these techniques as perceived by industrial professionals.
Then the findings of the SLR and the survey are cross-analyzed to draw final conclusions and, finally, give recommendations to professionals to help them decide which techniques to adopt. Methods: A sequential exploratory strategy, characterized by the collection and analysis of qualitative data (systematic literature review) followed by the collection and analysis of quantitative data (survey), has been used to conduct this research. In order to achieve the first objective, a thorough systematic literature review was conducted using the Kitchenham guidelines. To achieve the second objective, a questionnaire-based online survey was conducted, targeting professionals from the software industry in order to collect their responses regarding the usage of different static code analysis techniques, as well as their benefits and limitations. The quantitative data obtained was subjected to statistical analysis to further interpret the data and draw results from it. Results: In static code analysis research, inspection and static analysis tools received significantly more attention than the other techniques. The benefits and limitations of static code analysis techniques were extracted, and seven recurrent variables were used to report them. The existing research work in the static code analysis field significantly lacks rigor and relevance, and the reasons behind this have been identified. Some recommendations are developed outlining how to improve static code analysis research and make it more industry-oriented. From the industrial point of view, static analysis tools are widely used, followed by informal reviews, while inspections and walkthroughs are rarely used. The benefits and limitations of different static code analysis techniques, as perceived by industrial professionals, have been identified along with the influential factors.
Conclusions: The SLR concluded that techniques having a formal, well-defined process and process elements have received more attention in research; however, this does not necessarily mean that one technique is better than the others. Experiments have been used widely as a research method in static code analysis research, but the outcome variables in the majority of the experiments are inconsistent. The use of experiments in an academic context contributed nothing to improve the relevance, while the inadequate reporting of validity threats and their mitigation strategies contributed significantly to the poor rigor of the research. The benefits and limitations of different static code analysis techniques identified by the SLR could not complement the survey findings, because the rigor and relevance of most of the studies reporting them was weak. The survey concluded that the adoption of static code analysis techniques in industry is more influenced by the software life-cycle models in practice in organizations, while software product type and company size do not have much influence. The amount of attention a static code analysis technique has received in research does not necessarily influence its adoption in industry, which indicates a wide gap between research and industry. However, the company size, product type, and software life-cycle model do influence professionals' perceptions of the benefits and limitations of different static code analysis techniques.
|
13 |
Security smells in open-source infrastructure as code scripts: A replication study. Hortlund, Andreas. January 2021.
With the rising number of servers used in production, virtualization technology engineers needed a new tool to help them manage the rising configuration workload. Infrastructure as code (IaC) is a term covering techniques and tools for defining wanted configuration states of servers in machine-readable code files, and it aims at solving the high workload induced by the configuration of several servers. With new tools, new challenges arise regarding the security of the IaC scripts that will take over the processing load. This study is about finding out how open-source developers perform when creating IaC scripts, in terms of how many security smells they insert into their scripts in comparison to previous studies, and how developers can mitigate these risks. Security smells are code patterns that indicate vulnerability and can lead to exploitation. Using data gathered from GitHub with a web-scraper tool created for this study, the author analyzed 400 repositories from Ansible and Puppet with a second tool created, tested and validated in a previous study. The Security Linter for Infrastructure as Code applies static code analysis to these repositories, testing them against a ruleset for weaknesses in code such as default admin accounts and hard-coded passwords, among others. The present study used both qualitative and quantitative methods to analyze the data. The results show that developers who actively participated in developing repositories with a creation date no later than 2019-01-01 produced fewer security smells than reported by Rahman et al. (2019b, 2020c), whose data source ranged up to November 2018: Ansible produced 9.2 compared to 28.8 security smells per thousand lines of code, and Puppet 13.6 compared to 31.1. The main limitation of the study is that it looks only at the most popular and most used tools at the time of writing, namely Ansible and Puppet. Further mitigation of the results from both studies can be achieved through training and education, as well as the use of tools such as SonarQube for static code analysis against custom rulesets before the scripts are pushed to public repositories.
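The smell detection itself can be approximated with a few line-level rules. The two rules below (hard-coded password, default admin user) are simplified stand-ins written for this sketch, not the linter's actual ruleset.

```python
import re

# Illustrative subset of IaC security-smell rules (names invented here):
SMELL_RULES = {
    "hard-coded password": re.compile(
        r"(password|passwd|pwd)\s*[:=]\s*['\"][^'\"]+['\"]", re.I),
    "default admin user": re.compile(
        r"(user|username)\s*[:=]\s*['\"]admin['\"]", re.I),
}

def scan_iac(script: str):
    """Return (line number, smell name) pairs for each rule hit."""
    findings = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        for smell, pattern in SMELL_RULES.items():
            if pattern.search(line):
                findings.append((lineno, smell))
    return findings

demo = 'user: "admin"\npassword: "hunter2"\nport: 8080\n'
hits = scan_iac(demo)
```

Dividing the number of hits by thousands of lines of code then gives the smell density figures the study compares across tools.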
|
14 |
Identifying and documenting false positive patterns generated by static code analysis tools. Reynolds, Zachary P. 18 July 2017.
Indiana University-Purdue University Indianapolis (IUPUI) / Static code analysis tools are known to flag a large number of false positives. A false positive is a warning message generated by a static code analysis tool for a location in the source code that does not have any known problems. This thesis presents our approach and results in identifying and documenting false positives generated by static code analysis tools. The goal of our study was to understand the different kinds of false positives generated so we can (1) automatically determine if a warning message from a static code analysis tool truly indicates an error, and (2) reduce the number of false positives developers must triage. We used two open-source tools and one commercial tool in our study. Our approach led to a hierarchy of 14 core false positive patterns, with some patterns appearing in multiple variations. We implemented checkers to identify the code structures of false positive patterns and to eliminate them from the output of the tools. Preliminary results showed that we were able to reduce the number of warnings by 14.0%-99.9% with a precision of 94.2%-100.0% by applying our false positive filters in different cases.
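A minimal sketch of such a false-positive filter follows; the pattern names and the "line above" heuristic are invented for illustration, not taken from the thesis. A warning is dropped when the flagged line or the line just above it matches a known false-positive code structure.

```python
import re

# Illustrative false-positive patterns (names are hypothetical):
FP_PATTERNS = [
    # Tools often flag a possible null dereference even when the value
    # was just guarded on the preceding line; treat that shape as an FP.
    ("guarded-null-deref", re.compile(r"if\s*\(\s*\w+\s*!=\s*null\s*\)")),
    ("debug-log-constant", re.compile(r"(logger|log)\.(debug|trace)\(")),
]

def filter_warnings(warnings, source_lines):
    """Drop warnings whose flagged line (or the one above) matches an FP pattern."""
    kept = []
    for w in warnings:  # each w is {"line": int, "msg": str}, 1-indexed line
        context = source_lines[max(w["line"] - 2, 0): w["line"]]
        if any(p.search(l) for _, p in FP_PATTERNS for l in context):
            continue  # filtered as a likely false positive
        kept.append(w)
    return kept

src = ["if (x != null) {", "  x.use();", "  y.use();"]
warns = [{"line": 2, "msg": "NP_NULL_DEREF"},
         {"line": 3, "msg": "NP_NULL_DEREF"}]
remaining = filter_warnings(warns, src)
```

The real checkers in the thesis match structural patterns rather than raw text, but the triage effect is the same: fewer warnings reach the developer.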
|
15 |
Using Machine Learning Techniques to Improve Static Code Analysis Tools' Usefulness. Alikhashashneh, Enas A. 08 1900.
Indiana University-Purdue University Indianapolis (IUPUI) / This dissertation proposes an approach to reduce, as far as possible, the cost of manually inspecting the large number of false positive warnings reported by Static Code Analysis (SCA) tools, using Machine Learning (ML) techniques. The proposed approach neither assumes a particular SCA tool nor depends on the specific programming language used to write the target source code or application. To reduce the number of false positive warnings, we first evaluated a number of SCA tools in terms of software engineering metrics using a synthetic source code benchmark, the Juliet test suite. From this evaluation, we concluded that the SCA tools report plenty of false positive warnings that need manual inspection. We then generated a number of datasets from source code that forced the SCA tool to generate either true positive, false positive, or false negative warnings. The datasets were then used to train four ML classifiers to classify the warnings collected from the synthetic source code. From the experimental results of the ML classifiers, we observed that the classifier built using the Random Forest (RF) technique outperformed the rest. Lastly, using this classifier and an instance-based transfer learning technique, we ranked a number of warnings aggregated from various open-source software projects. The experimental results show that the proposed approach to reducing the cost of manually inspecting false positive warnings outperformed the random ranking algorithm and was highly correlated with the ranked list that the optimal ranking algorithm generated.
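To picture how a trained model can separate true from false positives, here is a deliberately tiny stand-in for the Random Forest used in the dissertation: a single decision stump chosen by training accuracy. The two features (say, number of prior warnings in the file and a tool-reported confidence) are hypothetical examples, not the dissertation's feature set.

```python
def train_stump(rows, labels):
    """Pick the (feature, threshold, vote) split with the best training
    accuracy. Labels: 1 = true positive warning, 0 = false positive."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for vote in (0, 1):  # which class the >= threshold side predicts
                preds = [vote if r[f] >= t else 1 - vote for r in rows]
                acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
                if best is None or acc > best[0]:
                    best = (acc, f, t, vote)
    return best[1:]

def classify(stump, row):
    f, t, vote = stump
    return vote if row[f] >= t else 1 - vote

# Hypothetical training data: (warnings in file, tool confidence) -> label
rows = [(12, 0.9), (40, 0.8), (33, 0.1), (7, 0.2)]
labels = [1, 1, 0, 0]
stump = train_stump(rows, labels)
```

A Random Forest is, loosely, many such weak learners trained on bootstrap samples and combined by majority vote; libraries like scikit-learn provide production-quality implementations.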
|
16 |
MLpylint: Automating the Identification of Machine Learning-Specific Code Smells. Hamfelt, Peter. January 2023.
Background. Machine learning (ML) has rapidly grown in popularity, becoming a vital part of many industries. This swift expansion has brought new challenges regarding technical debt, maintainability, and the general software quality of ML systems. With ML applications becoming more prevalent, there is an emerging need for extensive research to keep up with the pace of developments. Currently, research on code smells in ML applications is limited, and there is a lack of tools and studies that address these issues in depth. This gap highlights the necessity for a focused investigation into the validity of ML-specific code smells in ML applications, setting the stage for this research study. Objectives. This study addresses the limited research on ML-specific code smells within Python-based ML applications. To achieve this, it begins with the identification of these ML-specific code smells. Once they are recognized, the next objective is to choose suitable methods and tools to design and develop a static code analysis tool based on code-smell criteria. After development, an empirical evaluation assesses both the tool's efficacy and performance. Additionally, feedback from industry professionals is sought to measure the tool's feasibility and usefulness. Methods. This research employed Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. Additionally, 160 open-source ML applications were sourced from GitHub. The tool was empirically tested against these applications, with a focus on assessing its performance and efficacy. Furthermore, using the static validation method, feedback on the tool's usefulness was gathered through an expert survey, which involved 15 ML professionals from Ericsson. Results.
The study introduced MLpylint, a tool designed to identify 20 ML-specific code smells in Python-based ML applications. MLpylint effectively analyzed 160 ML applications within 36 minutes, identifying 5,380 code smells in total, while highlighting the need for further refinement of each code-smell checker to accurately identify specific patterns. In the expert survey, 15 ML professionals from Ericsson acknowledged the tool's usefulness, user-friendliness and efficiency. However, they also indicated room for improvement in fine-tuning the tool to avoid ambiguous smells. Conclusions. Current studies on ML-specific code smells are limited, with few tools addressing them. The development and evaluation of MLpylint is a significant advancement in the ML software quality domain, enhancing reliability and reducing associated technical debt in ML applications. As the industry integrates such tools, it is vital that they evolve to detect code smells from new ML libraries, aiding developers in upholding superior software quality while also promoting further research in the ML software quality domain.
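To make the idea of an ML-specific smell checker concrete, here is a sketch of one AST-based rule in the spirit of such tools: flagging `train_test_split` calls that omit `random_state` (a reproducibility smell). The specific rule and function name are chosen for illustration; they are not claimed to be one of MLpylint's 20 checkers.

```python
import ast

def find_unseeded_splits(source: str):
    """Return line numbers of train_test_split calls without random_state,
    an example of a reproducibility-related ML code smell."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Handle both bare names and attribute calls (module.func).
            name = getattr(node.func, "id", getattr(node.func, "attr", ""))
            if name == "train_test_split":
                kwargs = {kw.arg for kw in node.keywords}
                if "random_state" not in kwargs:
                    smells.append(node.lineno)
    return smells

src = ("a = train_test_split(X, y)\n"
       "b = train_test_split(X, y, random_state=42)\n")
found = find_unseeded_splits(src)
```

A full tool would bundle many such checkers and report them with pylint-style message codes.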
|
17 |
On the notions and predictability of Technical Debt. Dalal, Varun. January 2023.
Technical debt (TD) is a by-product of short-term optimisation that results in long-term disadvantages. Because every system gets more complicated while it is evolving, technical debt can emerge naturally. Technical debt has a great impact on the financial cost of development, management, and deployment, and it also affects the time needed to maintain a project. As technical debt affects all parts of a development cycle, it is believed to be a major aspect of measuring the long-term quality of a software project. It is still not clear which aspects of a project impact and build upon existing measures of technical debt. Hence, in this experiment, the ultimate task is to estimate the generalisation error in predicting technical debt using software metrics and an adaptive learning methodology, since software metrics are considered to be absolute regardless of how they are estimated. The software metrics were compiled from an established data set, the Qualitas.class Corpus, and the notions of technical debt were collected from three different static analysis tools: SonarQube, Codiga, and CodeClimate. The adaptive learning methodology uses multiple parameters and multiple machine learning models to remove any form of bias in the estimation. The outcome suggests that it is not feasible, for now, to predict technical debt for small-sized projects using software-level metrics, but for bigger projects it can be a good idea to have a rough estimation of the number of hours needed to maintain the project.
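Estimating a generalisation error, as opposed to a training error, can be pictured with leave-one-out cross-validation: fit on all but one project, predict the held-out one, and average the errors. The sketch below uses a single-metric least-squares line (e.g. LOC predicting TD hours); the real experiment uses many metrics and multiple ML models, so this is only the shape of the procedure.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one software metric."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def loo_generalisation_error(xs, ys):
    """Leave-one-out CV: mean absolute error on held-out data points."""
    errors = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b = fit_line(tx, ty)
        errors.append(abs(a * xs[i] + b - ys[i]))
    return sum(errors) / len(errors)

# Perfectly linear toy data (hours = 2 * kLOC) gives zero held-out error.
err = loo_generalisation_error([1, 2, 3, 4], [2, 4, 6, 8])
```

With noisy real-world data the held-out error stays well above zero, which is exactly the signal the thesis uses to judge predictability per project size.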
|
18 |
Measurement and Development for Automated Secure Coding Solutions. Frantz, Miles Eugene. 09 September 2024.
With the rise of development efforts, there has also been a rise in source code vulnerabilities.
Advanced security tools have been created to identify these vulnerabilities throughout the lifetime of the developer's ecosystem and afterward, before the vulnerabilities are exposed.
One such popular method is Static Code Analysis (SCA), which scans developers' source code to identify potential vulnerabilities in the code. My Ph.D. work aims to help reduce exposed vulnerabilities by YIELDing, ENHANCing, and EVALUATing (EYE) SCA tools that identify vulnerabilities while the developer writes the code. We first look into evaluating tools that support developers with their source code by determining how accurate they are at identifying vulnerability information. Large Language Models (LLMs) have been on the rise recently with the introduction of Chat Generative Pre-trained Transformer (ChatGPT) 3.5, ChatGPT 4.1, Google Gemini, and many more.
Using a common framework, we created a zero-shot prompt instructing the LLM to identify whether there is a vulnerability in the provided source code and which Common Weakness Enumeration (CWE) value represents it. With our Python cryptographic benchmark PyCryptoBench, we sent vulnerable samples to four different LLMs and two different versions of the ChatGPT Application Program Interface (API). The samples allow us to measure how reliable each LLM is at identifying and defining vulnerabilities. The ChatGPT APIs include multiple reproducibility fields that allowed us to measure how reproducible the responses are. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. Cryptolation, our state-of-the-art (SOA) Python SCA tool, uses constant-propagation-supported variable inference to gain insight into the data-flow state through the program's execution. Python source code has ever-increasing complexity and a lack of SCA tools compared to Java. We compare Cryptolation with the other SOA SCA tools Bandit, Semgrep, and Dlint. To verify the Precision of our tool, we created the benchmark PyCryptoBench, which contains 1,836 test cases and encompasses five different language features. Next, we crawled over 1,000 cryptographic-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and sampled over 10,000 cryptographic-related Python projects. The results reveal that Cryptolation has 100% Precision on the benchmark, with the second-highest Precision on cryptographic-related projects. Finally, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we examine several developer surveys to determine their reasons for not adopting such tools: these are generally better aesthetics, usability, customization, and a low effort cost to use consistently.
To achieve this, we enhance the SOA Java SCA tool CryptoGuard with the following: integrated build tools, modern terminal Command Line Interface (CLI) usage, customizable and vendor-specific output formats, and no-install demos. / Doctor of Philosophy / With the rise of development efforts and source code, there has also been a rise in source code vulnerabilities. To match this, more advanced security tools have been created to identify these vulnerabilities before they are exposed. SCA tools are a popular method for identifying vulnerable source code, since they do not execute any code and can scan the code while the developer is writing it. Despite their popularity, there is still much room for improvement.
My Ph.D. work aims to help reduce the vulnerabilities exposed by EYE SCA tools that identify vulnerabilities while the developer writes the code. First, we look into evaluating tools that support and refine SCA by examining the accuracy and secureness of generative LLMs. LLMs have been on the rise recently with the introduction of ChatGPT 3.5 and, more recently, ChatGPT 4.1. ChatGPT is a conversation-based program in which you ask the program a question and it answers. It can explain small source code snippets to developers, provide source code examples, or even fix source code. While the developers of the LLMs have restricted certain aspects of the models, one of their main selling points is their source code assistance. With over 1,000 zero-shot prompts, we measure how accurate and reliable LLMs are in identifying the existence and details of vulnerabilities within source code. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. This tool is Cryptolation, a Python SCA tool that uses variable inference to determine variable values without execution. Python source code has ever-increasing complexity and a lack of tools compared to Java. We compare Cryptolation with four other SOA tools. To verify the Precision of our tool, we create the benchmark PyCryptoBench, with over 1,000 test cases encompassing five different language features.
Next, we crawled over 1,000 cryptographic-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and samples of the 10,000 cryptographic-related Python projects. The results reveal that Cryptolation has 100% Precision on the benchmark, with the second-highest Precision on cryptographic-related projects. Next, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we investigated current developer surveys to see what reasons developers identified for not adopting such tools: generally better aesthetics, usability, customization, and a low effort cost to use consistently. To achieve this, we enhance the SOA Java SCA tool CryptoGuard to address these adequately.
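Variable inference without execution can be pictured as a very small constant-propagation pass: record constant assignments, then substitute them at call sites. The sketch below handles only straight-line code and a hypothetical `generate_key` call; Cryptolation's actual analysis is far more general.

```python
import ast

def infer_key_sizes(source: str):
    """Toy constant propagation: track integer assignments and report the
    inferred key-size argument of generate_key() calls (no execution)."""
    env, findings = {}, []
    for stmt in ast.parse(source).body:  # assumes straight-line module code
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    env[target.id] = stmt.value.value  # propagate the constant
        for call in ast.walk(stmt):
            if (isinstance(call, ast.Call)
                    and getattr(call.func, "id", "") == "generate_key"):
                arg = call.args[0]
                if isinstance(arg, ast.Constant):
                    findings.append(arg.value)
                elif isinstance(arg, ast.Name) and arg.id in env:
                    findings.append(env[arg.id])  # value inferred statically
    return findings

sizes = infer_key_sizes("bits = 1024\nkey = generate_key(bits)\ngenerate_key(512)\n")
```

A checker built on top of this would then flag any inferred size below a policy threshold (e.g. 2048 bits for RSA) without ever running the program.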
|
19 |
Analysis of Security Findings and Reduction of False Positives through Large Language Models. Wagner, Jonas. 18 October 2024.
This thesis investigates the integration of State-of-the-Art (SOTA) Large Language Models (LLMs) into the process of reassessing security findings generated by Static Application Security Testing (SAST) tools. The primary objective is to determine whether LLMs are able to detect false positives (FPs) while maintaining a high true positive (TP) rate, thereby enhancing the efficiency and effectiveness of security assessments.
Four consecutive experiments were conducted, each addressing specific research questions. The initial experiment, using a dataset of security findings extracted from the OWASP Benchmark, identified the optimal combination of context items provided by the SAST tool SpotBugs which, when used with GPT-3.5 Turbo, reduced FPs while minimizing the loss of TPs. The second experiment, conducted on the same dataset, demonstrated that advanced prompting techniques, particularly few-shot Chain-of-Thought (CoT) prompting combined with Self-Consistency (SC), further improved the reassessment process. The third experiment compared both proprietary and open-source LLMs on an OWASP Benchmark dataset about one-fourth the size of the previously used dataset. GPT-4o achieved the highest performance, detecting 80 out of 128 FPs without missing any TPs, resulting in a perfect TPR of 100% and a decrease in FPR by 41.27 percentage points. Meanwhile, Llama 3.1 70B detected 112 out of the 128 FPs but missed 10 TPs, resulting in a TPR of 94.94% and a reduction in FPR by 56.62 percentage points. To validate these findings in a real-world context, the approach was applied to a dataset generated from the open-source project Mnestix using multiple SAST tools. GPT-4o again emerged as the top performer, detecting 26 out of 68 FPs while missing only one TP, resulting in a TPR decrease of 2.22 percentage points but simultaneously an FPR decrease of 37.57 percentage points.
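The reassessment step can be pictured as prompt assembly plus verdict parsing. The prompt wording, context-item names, and one-word answer format below are illustrative choices for this sketch (no actual LLM call is made here); the thesis experiments determine which context items and prompting techniques work best.

```python
def build_prompt(finding: dict, context_items: dict) -> str:
    """Assemble a zero-shot reassessment prompt from a SAST finding plus
    selected context items (source snippet, data-flow trace, ...)."""
    parts = [
        "You are reviewing a static-analysis security finding.",
        f"Rule: {finding['rule']} at {finding['file']}:{finding['line']}",
    ]
    for name, value in context_items.items():
        parts.append(f"--- {name} ---\n{value}")
    parts.append('Answer with exactly one word: "TRUE_POSITIVE" or "FALSE_POSITIVE".')
    return "\n".join(parts)

def parse_verdict(reply: str) -> bool:
    """True means the finding is kept (judged a true positive)."""
    return "FALSE_POSITIVE" not in reply.upper()

prompt = build_prompt(
    {"rule": "SQL_INJECTION", "file": "a.java", "line": 10},
    {"snippet": 'stmt.execute("SELECT * FROM t WHERE id=" + userInput);'},
)
```

Self-Consistency, as used in the thesis, amounts to sampling several replies per finding and taking the majority verdict from `parse_verdict`.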
|
20 |
Ranking of Android Apps based on Security Evidences. Ayush Maharjan (9728690). 07 January 2021.
With the large number of Android apps available in app stores such as Google Play, it has become increasingly challenging to choose among the apps. Users generally select apps based on the ratings and reviews of other users, or on recommendations from the app store. But with the increasing security and privacy concerns around mobile apps, it is very important to take security into consideration while choosing an app. This thesis proposes different ranking schemes for Android apps based on security evidences evaluated by the available static code analysis tools. It proposes ranking schemes based on the categories of evidences reported by the tools, based on the frequency of each category, and based on the severity of each evidence. The evidences are gathered, and rankings are generated, based on the theory of Subjective Logic. In addition to these ranking schemes, the tools themselves are evaluated against the Ghera benchmark. Finally, this work proposes two additional schemes to combine the evidences from different tools to provide a combined ranking.
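The Subjective Logic machinery used for the ranking can be sketched with binomial opinions: evidence counts become a (belief, disbelief, uncertainty) triple, opinions from different tools are combined with cumulative fusion, and the expected probability gives a rankable score. The formulas below follow the standard binomial-opinion definitions; the mapping from tool evidences to positive/negative counts is an assumption of this sketch.

```python
def opinion(r, s, W=2.0, a=0.5):
    """Binomial opinion (belief, disbelief, uncertainty, base rate) formed
    from r positive and s negative pieces of security evidence."""
    k = r + s + W
    return (r / k, s / k, W / k, a)

def fuse(o1, o2):
    """Cumulative fusion of two independent opinions (e.g. two tools)."""
    b1, d1, u1, a = o1
    b2, d2, u2, _ = o2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            u1 * u2 / k, a)

def score(o):
    """Expected probability b + a*u, usable as a ranking score."""
    b, d, u, a = o
    return b + a * u

combined = fuse(opinion(4, 0), opinion(4, 0))
```

A reassuring property of cumulative fusion is that fusing two opinions built from 4 positive evidences each agrees with one opinion built from all 8, so splitting evidence across tools does not distort the ranking.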
|