• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 3
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Theory and Patterns for Avoiding Regex Denial of Service

Hassan, Sk Adnan 01 June 2022 (has links)
Regular expressions are ubiquitous. They are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS(Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. ReDoS has a serious and wide impact: since applications written in most programming languages can be vulnerable to it, ReDoS has caused outages at prominent web services including Cloudflare and Stack Overflow. Due to the severity and prevalence of ReDoS, past work proposed mechanisms to identify and repair regexes. In this work, we set a different goal: helping developers avoid introducing regexes that could trigger ReDoS in the first place. A necessary condition for a regex to trigger ReDoS is to be infinitely ambiguous (IA). We propose a theory and a collection of anti-patterns to characterize infinitely ambiguous (IA) regexes. We evaluate our proposed anti-patterns in two complementary ways: quantitatively, over a dataset of 209,188 regexes from open- source software; and qualitatively, by observing humans using them in practice. In our large-scale evaluation, our anti-patterns characterized IA regexes with 100% precision and 99% recall, showing that they can capture the large majority of IA regexes, even when they are a simplified version of our theory. In our human experiment, practitioners applying our anti-patterns correctly assessed whether the regex that they were composing was IA or not in all of our studied regex-composition tasks. / Master of Science / Regular expressions are used by developers for different purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS(Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. ReDoS has caused outages at prominent web services including Cloudflare and Stack Overflow. ReDoS has a serious and wide impact: since applications written in most programming languages can be vulnerable to it. With this work, we wanted to help developers avoid introducing regexes that could trigger ReDoS in the first place. A necessary condition for a regex to trigger ReDoS is to be infinitely ambiguous (IA). We propose a theory and a collection of anti-patterns to characterize infinitely ambiguous (IA) regexes
2

Exploring the Process and Challenges of Programming with Regular Expressions

Michael, Louis Guy IV 27 June 2019 (has links)
Regular expressions (regexes) are a powerful mechanism for solving string-matching problems and are supported by all modern programming languages. While regular expressions are highly expressive, they are also often perceived to be highly complex and hard to read. While existing studies have focused on improving the readability of regular expressions, little is known about any other difficulties that developers face when programming with regular expressions. In this paper, we aim to provide a deeper understanding of the process of programming regular expressions by studying: (1) how developers make decisions through the process, (2) what difficulties they face, and (3) how aware they are about serious risks involved in programming regexes. We surveyed 158 professional developers from a diversity of backgrounds, and we conducted a series of interviews to learn more details about the difficulties and solutions that participants face in this process. This mixed methods approach revealed that some of the difficulties of regexes come in the shape of: inability to effectively search for them; fully validate them; and document them. Developers also reported cascading impacts of poor readability, lack of universal portability, and struggling with overall problem comprehension. The majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those that were aware of potential problems felt that they lacked the ability to identify problematic regexes. Our findings provide multiple implications for future work, including development of semantic regex search engines for regex reuse, and improved input generators for regex validation. / Master of Science / Regular expressions (regexes) are a method to search for a set of matching text. They are easily understood as a way to flexibly search beyond exact matching and are frequently seen in the capacity of the find functionality of ctrl-f. Regexes are also very common in source code for a range of tasks including form validation, where a program needs to confirm that a user provided information that conformed to a specific structure, such as an email address. Despite being a widely supported programming feature, little is known about how developers go about creating regexes or what they struggle with when doing so. To gain a better understanding of how regexes are created and reused, we surveyed 158 professional developers from a diversity of backgrounds and experience levels about their processes and perceptions about regexes. As a followup to the survey we conducted a series of interviews focusing on the challenges faced by developers when tackling problems for which they felt that a regex was worth using. Through the combination of these studies, we developed a detailed understanding of how professional developers create regexes as well as many of the struggles that they face when doing so. These challenges come in the form of the inability to effectively search for, fully validate, and document regexes, as well as the cascading impacts of poor readability, lack of universal portability, and overall problem comprehension.
3

On the Impact and Defeat of Regular Expression Denial of Service

Davis, James Collins 28 May 2020 (has links)
Regular expressions (regexes) are a widely-used yet little-studied software component. Engineers use regexes to match domain-specific languages of strings. Unfortunately, many regex engine implementations perform these matches with worst-case polynomial or exponential time complexity in the length of the string. Because they are commonly used in user-facing contexts, super-linear regexes are a potential denial of service vector known as Regular expression Denial of Service (ReDoS). Part I gives the necessary background to understand this problem. In Part II of this dissertation, I present the first large-scale empirical studies of super-linear regex use. Guided by case studies of ReDoS issues in practice (Chapter 3), I report that the risk of ReDoS affects up to 10% of the regexes used in practice (Chapter 4), and that these findings generalize to software written in eight popular programming languages (Chapter 5). ReDoS appears to be a widespread vulnerability, motivating the consideration of defenses. In Part III I present the first systematic comparison of ReDoS defenses. Based on the necessary conditions for ReDoS, a ReDoS defense can be erected at the application level, the regex engine level, or the framework/runtime level. In my experiments I report that application-level defenses are difficult and error prone to implement (Chapter 6), that finding a compatible higher-performing regex engine is unlikely (Chapter 7), that optimizing an existing regex engine using memoization incurs (perhaps acceptable) space overheads (Chapter 8), and that incorporating resource caps into the framework or runtime is feasible but faces barriers to adoption (Chapter 9). In Part IV of this dissertation, we reflect on our findings. By leveraging empirical software engineering techniques, we have exposed the scope of potential ReDoS vulnerabilities, and given strong motivation for a solution. To assist practitioners, we have conducted a systematic evaluation of the solution space. We hope that our findings assist in the elimination of ReDoS, and more generally that we have provided a case study in the value of data-driven software engineering. / Doctor of Philosophy / Software commonly performs pattern-matching tasks on strings. For example, when validating input in a Web form, software commonly tests whether an input fits the pattern of a credit card number or an email address. Software engineers often implement such string-based pattern matching using a tool called regular expressions (regexes). Regexes permit software engineers to succinctly describe the sequences of characters that make up common "languages" like the set of valid Visa credit card numbers (16 digits, starting with a 4) or the set of valid emails (some characters, an '@', and more characters including at least one'.'). Using regexes on untrusted user input in this manner may be a dangerous decision because some regexes take a long time to evaluate. These slow regexes can be exploited by attackers in order to carry out a denial of service attack known as Regular expression Denial of Service (ReDoS). To date, ReDoS has led to outages affecting hundreds of websites and tens of thousands of users. While the risk of ReDoS is well known in theory, in this dissertation I present the first large-scale empirical studies measuring the extent to which slow regular expressions are used in practice. I found that about 10% of real regular expressions extracted from hundreds of thousands of software projects can exhibit longer-than-expected worst-case behavior in popular programming languages including JavaScript, Python, and Ruby. Motivated by these findings, I then consider a range of ReDoS solution approaches: application refactoring, regex engine replacement, regex engine optimization, and resource caps. I report that application refactoring is error-prone, and that regex engine replacement seems unlikely due to incompatibilities between regex engines. Some resource caps are more successful than others, but all resource cap approaches struggle with adoption. My novel regex engine optimizations seem the most promising approach for protecting existing regex engines, offering significant time reductions with acceptable space overheads.

Page generated in 0.0395 seconds