
staDFA: An Efficient Subexpression Matching Method

Unknown Date (has links)
The main task of a lexical analyzer such as Lex [20], Flex [26], and RE/Flex [34] is to perform tokenization of a given input file within reasonable time and with limited storage requirements. Hence, most lexical analyzers use Deterministic Finite Automata (DFA) to tokenize input, ensuring that the running time of the lexical analyzer is linear (or close to linear) in the size of the input. However, DFA constructed from Regular Expressions (RE) are inadequate to indicate the positions and/or extents, in a matching string, of a given subexpression of the regular expression. This means that all implementations of trailing contexts in DFA-based lexical analyzers, including Lex, Flex and RE/Flex, produce incorrect results. For any matching string in the input (also called the lexeme) that matches a token's regular expression pattern, it is not always possible to tell the position of the part of the lexeme that matches a subexpression of the regular expression. For example, the string abba matches the pattern ab*/ba, but the position of the trailing context ba of the pattern in the string abba cannot be determined by a DFA-based matcher in the aforementioned lexical analyzers. There are algorithms based on Nondeterministic Finite Automata (NFA) that match subexpressions accurately. However, these algorithms are costly to execute and use backtracking or breadth-first search, running in non-linear time with polynomial or even exponential worst-case time complexity. A tagged DFA-based approach (TDFA) was pioneered by Ville Laurikari [15] to efficiently match subexpressions. However, TDFA are not perfectly suitable for lexical analyzers, since tagged DFA edges require sets of memory updates, which hampers the performance of DFA edge traversals when matching input. I will introduce a new DFA-based algorithm for efficient subexpression matching that performs memory updates in DFA states.
I propose the Store-Transfer-Accept Deterministic Finite Automata (staDFA). In my proposed algorithm, the subexpression matching positions and/or extents are stored in a Marker Position Store (MPS). The MPS is updated while the input is tokenized to provide the positions/extents of the sub-match. Compression techniques for DFA, such as Hopcroft's method [14], default transitions [18, 19], and other methods, can be applied to staDFA. For instance, this thesis provides a modified Hopcroft's method for the minimization of staDFA. / A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science. / Summer Semester 2018. / July 20, 2018. / Includes bibliographical references. / Robert A. van Engelen, Professor Directing Thesis; David Whalley, Committee Member; An-I Andy Wang, Committee Member.
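The trailing-context defect described in this abstract can be made concrete with a small sketch. Python's `re` engine is a backtracking (NFA-style) matcher, so it can report subexpression positions; the capture groups below are an illustrative stand-in of mine for the `ab*/ba` lexer pattern, not the thesis's implementation:

```python
import re

# The lexer pattern "ab*/ba" means: the lexeme is "ab*", followed by a
# trailing context "ba" that is matched but not consumed. We emulate it
# with capture groups in a backtracking (NFA-style) engine, which CAN
# report subexpression positions -- at non-linear worst-case cost.
m = re.fullmatch(r"(ab*)(ba)", "abba")
lexeme, trailing = m.group(1), m.group(2)
print(lexeme, trailing)  # ab ba
print(m.start(2))        # 2: index where the trailing context begins

# A classic DFA-based matcher (as in Lex/Flex) only learns that "abba"
# as a whole reaches an accepting state; the boundary at index 2 is
# lost, which is the defect the staDFA's Marker Position Store targets.
```

Note that the boundary is recoverable here only because the engine tracks group positions while backtracking, which is exactly the cost a DFA-based tokenizer avoids.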

Text Representation using Convolutional Networks

Zhang, Xiang 29 March 2019 (has links)
<p> This dissertation applies convolutional networks for learning representations of text, and it consists of several parts. The first part offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks. These results indicate that using low-level inputs, in this case characters, for convolutional networks could be feasible for text representation learning. The second part concerns which text encoding method might work for convolutional networks. We include a comprehensive comparison of different encoding methods for the task of text classification using 14 large-scale datasets in 4 languages including Chinese, English, Japanese and Korean. Different encoding levels are studied, including UTF-8 bytes, characters, words, romanized characters and romanized words. For all encoding levels, whenever applicable, we provide comparisons with linear models, fastText and convolutional networks. For convolutional networks, we compare between encoding mechanisms using character glyph images, one-hot (or one-of-n) encoding, and embedding. From these 473 models, one of the conclusions is that byte-level one-hot encoding works consistently best for convolutional networks. Based on this, in the third part of the dissertation we develop a convolutional network at the level of bytes for learning representations through the task of auto-encoding. The proposed model is a multi-stage deep convolutional encoder-decoder framework using residual connections, containing up to 160 parameterized layers. 
Each encoder or decoder contains a shared group of modules that consists of either pooling or up-sampling layers, making the network recursive in terms of abstraction levels in representation. The decoding process is non-sequential. Results for 6 large-scale paragraph datasets are reported, in 3 languages including Arabic, Chinese and English. Analyses are conducted to study several properties of the proposed model. Experiments are presented to verify that the auto-encoder can learn useful representations. In the fourth part of the dissertation, we apply the improved design from the previous auto-encoding model to text classification, adding comparisons between residual and dense connections. This further validates the choice of the architecture we made for the auto-encoding model, and the effectiveness of the recursive architecture with residual or dense connections.</p>
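The byte-level one-hot input format that the dissertation finds to work consistently best can be sketched as follows; the fixed sequence length and the 256-symbol byte alphabet are illustrative choices here, not the dissertation's hyperparameters:

```python
import numpy as np

def byte_one_hot(text: str, max_len: int = 16) -> np.ndarray:
    """One-hot encode the UTF-8 bytes of `text` as a (max_len, 256)
    matrix, truncating or zero-padding to a fixed length -- the kind of
    low-level input a byte-level ConvNet consumes."""
    data = text.encode("utf-8")[:max_len]
    out = np.zeros((max_len, 256), dtype=np.float32)
    for i, b in enumerate(data):
        out[i, b] = 1.0
    return out

x = byte_one_hot("héllo")
print(x.shape)  # (16, 256)
print(x.sum())  # 6.0 -- "héllo" is 6 bytes in UTF-8 ("é" takes 2)
```

Working at the byte level sidesteps per-language tokenization and vocabulary choices entirely, which is one reason it transfers across the four languages studied.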

Foundations and applications of program obfuscation

Paneth, Omer 05 February 2019 (has links)
Code is said to be obfuscated if it is intentionally difficult for humans to understand. Obfuscating a program conceals its sensitive implementation details and protects it from reverse engineering and hacking. Beyond software protection, obfuscation is also a powerful cryptographic tool, enabling a variety of advanced applications. Ideally, an obfuscated program would hide any information about the original program that cannot be obtained by simply executing it. However, Barak et al. [CRYPTO 01] proved that for some programs, such ideal obfuscation is impossible. Nevertheless, Garg et al. [FOCS 13] recently suggested a candidate general-purpose obfuscator which is conjectured to satisfy a weaker notion of security called indistinguishability obfuscation. In this thesis, we study the feasibility and applicability of secure obfuscation: - What notions of secure obfuscation are possible and under what assumptions? - How useful are weak notions like indistinguishability obfuscation? Our first result shows that the applications of indistinguishability obfuscation go well beyond cryptography. We study the tractability of computing a Nash equilibrium of a game, a central problem in algorithmic game theory and complexity theory. Based on indistinguishability obfuscation, we construct explicit games where a Nash equilibrium cannot be found efficiently. We also prove the following results on the feasibility of obfuscation. Our starting point is the Garg et al. obfuscator that is based on a new algebraic encoding scheme known as multilinear maps [Garg et al. EUROCRYPT 13]. 1. Building on the work of Brakerski and Rothblum [TCC 14], we provide the first rigorous security analysis for obfuscation. We give a variant of the Garg et al. obfuscator and reduce its security to that of the multilinear maps. Specifically, modeling the multilinear encodings as ideal boxes with perfect security, we prove ideal security for our obfuscator. 
Our reduction shows that the obfuscator resists all generic attacks that only use the encodings' permitted interface and do not exploit their algebraic representation. 2. Going beyond generic attacks, we study the notion of virtual-gray-box obfuscation [Bitansky et al. CRYPTO 10]. This relaxation of ideal security is stronger than indistinguishability obfuscation and has several important applications, such as obfuscating password-protected programs. We formulate a security requirement for multilinear maps which is sufficient, as well as necessary, for virtual-gray-box obfuscation. 3. Motivated by the question of basing obfuscation on ideal objects that are simpler than multilinear maps, we give a negative result showing that ideal obfuscation is impossible, even in the random oracle model, where the obfuscator is given access to an ideal random function. This is the first negative result for obfuscation in a non-trivial idealized model.

High-performance geometric vascular modelling

Qi, Quan January 2018 (has links)
Image-based high-performance geometric vascular modelling and reconstruction is an essential component of computer-assisted surgery in the diagnosis, analysis and treatment of cardiovascular diseases. However, it is an extremely challenging task to efficiently reconstruct the accurate geometric structures of blood vessels out of medical images. For one thing, the shape of an individual section of a blood vessel is highly irregular because of squeezing by other tissues and the deformation caused by vascular diseases. For another, a vascular system is a very complicated network of blood vessels with different types of branching structures. Although some existing vascular modelling techniques can reconstruct the geometric structure of a vascular system, they are either time-consuming or lack sufficient accuracy. What is more, these techniques rarely consider the interior tissue of the vascular wall, which consists of complicated layered structures. As a result, it is necessary to develop a better vascular geometric modelling technique, one that not only offers high performance and high accuracy in the reconstruction of vascular surfaces, but can also be used to model the interior tissue structures of the vascular walls. This research aims to develop a state-of-the-art patient-specific medical image-based geometric vascular modelling technique to solve the above problems. The main contributions of this research are: - Developed and proposed the Skeleton Marching technique to reconstruct the geometric structures of blood vessels with high performance and high accuracy. With the proposed technique, the highly complicated vascular reconstruction task is reduced to a set of simple localised geometric reconstruction tasks, which can be carried out in a parallel manner. These locally reconstructed vascular geometric segments are then combined together using shape-preserving blending operations to faithfully represent the geometric shape of the whole vascular system. 
- Developed and proposed the Thin Implicit Patch method to realistically model the interior geometric structures of the vascular tissues. This method allows the multi-layer interior tissue structures to be embedded inside the vascular wall to illustrate the geometric details of the blood vessel in the real world.

THINC: A Virtual and Remote Display Architecture for Desktop Computing and Mobile Devices

Baratto, Ricardo A. January 2011 (has links)
THINC is a new virtual and remote display architecture for desktop computing. It has been designed to address the limitations and performance shortcomings of existing remote display technology, and to provide a building block around which novel desktop architectures can be built. THINC is architected around the notion of a virtual display device driver, a software-only component that behaves like a traditional device driver, but instead of managing specific hardware, enables desktop input and output to be intercepted, manipulated, and redirected at will. On top of this architecture, THINC introduces a simple, low-level, device-independent representation of display changes, and a number of novel optimizations and techniques to perform efficient interception and redirection of display output. This dissertation presents the design and implementation of THINC. It also introduces a number of novel systems which build upon THINC's architecture to provide new and improved desktop computing services. The contributions of this dissertation are as follows: - A high performance remote display system for LAN and WAN environments. This system differs from existing remote display technologies in that it focuses on the architecture of the system as a mechanism to improve performance, and not just on the remote display protocol and compression techniques. - A novel mechanism to natively support multimedia content in a remote display system in a way that is both transparent to applications and format independent. - pTHINC, a system to deliver improved remote display support for mobile devices, both in terms of performance and usability, and provide a competitive, and in some cases superior, alternative to native mobile applications. - MobiDesk, a desktop utility computing infrastructure that enables service providers to host desktop sessions in fully virtualized environments. 
Hosted sessions can be remotely accessed using THINC, they can be migrated across computers to provide high availability, and they can be effectively and efficiently protected from denial of service attacks. - Moving beyond remote display, we show how THINC's architecture can be used to provide continuous, low-overhead recording of a desktop. Alongside it, we introduce a novel way to leverage desktop accessibility services to allow users to search their recording based on captured text content. We have implemented prototypes for these systems, evaluated their performance in a number of scenarios, and compared it to representative alternatives whenever possible. Our results demonstrate that THINC can provide superior remote display performance, and can be successfully used as a fundamental building block for new and improved desktop applications and services.

A Personal Virtual Computer Recorder

Laadan, Oren January 2011 (has links)
Continuing advances in hardware technology have enabled the proliferation of faster, cheaper, and more capable personal computers. Users of all backgrounds rely on their computers to handle ever-expanding information, communication, and computation needs. As users spend more time interacting with their computers, it is becoming increasingly important to archive and later search the knowledge, ideas and information that they have viewed through their computers. However, existing state-of-the-art web and desktop search tools fail to provide a suitable solution, as they focus on static, accessible documents in isolation. Thus, finding the information one has viewed among the ever-increasing and chaotic sea of data available from a computer remains a challenge. This dissertation introduces DejaView, a personal virtual computer recorder that enhances personal computers with the ability to process display-centric content to help users with all the information they see through their computers. DejaView continuously records a user's session to provide a complete WYSIWYS (What You Search Is What You've Seen) record of a desktop computing experience, enabling users to playback, browse, search, and revive records, making it easier to retrieve and interact with information they have seen before. DejaView records visual output, checkpoints corresponding application and file system states, and captures onscreen text with contextual information to index the record. A user can then browse and search the record for any visual information that has been previously displayed on the desktop, and revive and interact with the desktop computing state corresponding to any point in the record. DejaView introduces new, transparent operating system, display and file system virtualization techniques and novel semantic display-centric information recording, and combines them to provide its functionality without any modifications to applications, window systems, or operating system kernels. 
Our results demonstrate that DejaView can provide continuous low-overhead recording without any user-noticeable performance degradation, and allows users to playback, browse, search, and time-travel back to records fast enough for interactive use. This dissertation also demonstrates how DejaView's execution virtualization and recording extend beyond the desktop recorder context. We introduce a coordinated, parallel checkpoint-restart mechanism for distributed applications that minimizes synchronization overhead and uniquely supports complete checkpoint and restart of network state in a transport protocol independent manner, for both reliable and unreliable protocols. We introduce a scalable system that enables significant energy saving by migrating network state and applications off of idle hosts allowing the hosts to enter low-power suspend state, while preserving their network presence. Finally, we show how our techniques can be integrated into a commodity operating system, mainline Linux, thereby allowing the entire operating systems community to benefit from mature checkpoint-restart that is transparent, secure, reliable, efficient, and integral to the Linux kernel.

Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms

Collins, Rebecca L. January 2011 (has links)
Multi-core platforms have spread to all corners of the computing industry, and trends in design and power indicate that the shift to multi-core will become even more widespread in the future. As the number of cores on a chip rises, the complexity of memory systems and on-chip interconnects increases drastically. The programmer inherits this complexity in the form of new responsibilities for task decomposition, synchronization, and data movement within an application, which hitherto have been concealed by complex processing pipelines or deemed unimportant since tasks were largely executed sequentially. To some extent, the need for explicit parallel programming is inevitable, due to limits in the instruction-level parallelism that can be automatically extracted from a program. However, these challenges create a great opportunity for the development of new programming abstractions which hide the low-level architectural complexity while exposing intuitive high-level mechanisms for expressing parallelism. Many models of parallel programming fall into the category of data-centric models, where the structure of an application depends on the role of data and communication in the relationships between tasks. The utilization of the inter-core communication networks and effective scaling to large data sets are decidedly important in designing efficient implementations of parallel applications. The questions of how many low-level architectural details should be exposed to the programmer, and how much parallelism in an application a programmer should expose to the compiler, remain open-ended, with different answers depending on the architecture and the application in question. I propose that the key to unlocking the capabilities of multi-core platforms is the development of abstractions and optimizations which match the patterns of data movement in applications with the inter-core communication capabilities of the platforms. 
After a comparative analysis that confirms and stresses the importance of finding a good match between the programming abstraction, the application, and the architecture, this dissertation proposes two techniques that showcase the power of leveraging data dependency patterns in parallel performance optimizations. Flexible Filters dynamically balance load in stream programs by creating flexibility in the runtime data flow through the addition of redundant stream filters. This technique combines a static mapping with dynamic flow control to achieve light-weight, distributed and scalable throughput optimization. The properties of stream communication, i.e., FIFO pipes, enable flexible filters by exposing the backpressure dependencies between tasks. Next, I present Huckleberry, a novel recursive programming abstraction developed in order to allow programmers to expose data locality in divide-and-conquer algorithms at a high level of abstraction. Huckleberry automatically converts sequential recursive functions with explicit data partitioning into parallel implementations that can be ported across changes in the underlying architecture including the number of cores and the amount of on-chip memory. I then present a performance model for multi-core applications which provides an efficient means to evaluate the trade-offs between the computational and communication requirements of applications together with the hardware resources of a target multi-core architecture. The model encompasses all data-driven abstractions that can be reduced to a task graph representation and is extensible to performance techniques such as Flexible Filters that alter an application's original task graph. 
Flexible Filters and Huckleberry address the challenges of parallel programming on multi-core architectures by taking advantage of properties specific to the stream and recursive paradigms, and the performance model creates a unifying framework based on the communication between tasks in parallel applications. Combined, these contributions demonstrate that specialization with respect to communication patterns enhances the ability of parallel programming abstractions and optimizations to harvest the power of multi-core platforms.
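The divide-and-conquer transformation that Huckleberry performs on recursive functions can be sketched in miniature. The code below is a generic illustration of mine, not Huckleberry's actual interface: a thread pool stands in for the cores of a multi-core platform, and the explicit data partitioning mirrors the partitioning the programmer exposes to the abstraction.

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    # Standard merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def msort(xs):
    # The sequential recursive function the programmer writes.
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(msort(xs[:mid]), msort(xs[mid:]))

def parallel_msort(xs, workers=4):
    # The "divide" step is made explicit as a data partitioning; each
    # partition is sorted by a separate worker, and the results are
    # combined -- a hand-written analogue of the transformation the
    # abstraction applies automatically.
    if len(xs) <= 1:
        return list(xs)
    chunk = max(1, len(xs) // workers)
    parts = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(msort, parts))
    result = []
    for part in sorted_parts:
        result = merge(result, part)
    return result

print(parallel_msort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the partition sizes are an explicit parameter, the same source can be retuned when the number of cores or the amount of on-chip memory changes, which is the portability property the abstraction targets.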

Quantum Algorithms and Complexity for Numerical Problems

Zhang, Chi January 2011 (has links)
Quantum computing has attracted a lot of attention in different research fields, such as mathematics, physics and computer science. Quantum algorithms can solve certain problems significantly faster than classical algorithms. There are many numerical problems, especially those arising from quantum systems, which are notoriously difficult to solve using classical computers, since the computational time required often scales exponentially with the size of the problem. However, quantum computers have the potential to solve these problems efficiently, which is also one of the founding ideas of the field of quantum computing. In this thesis, we explore five computational problems, designing innovative quantum algorithms and studying their computational complexity. First, we design an adiabatic quantum algorithm based on Berry phases for the counting problem. Its running time is not as good as that of the optimal algorithm in the quantum circuit model, but better than that of classical randomized algorithms. Moreover, since the Berry phase is a purely geometric feature, the result should be robust to decoherence and resilient to certain kinds of noise. Since the counting problem is the foundation of many other numerical problems, such as high-dimensional integration and path integration, our adiabatic algorithms can be directly generalized to these kinds of problems. In addition, we study the quantum PAC learning model, offering an improved lower bound on the query complexity. The lower bound is very close to the best lower bound on query complexity known for the classical PAC learning model. We also study the algorithms and the cost of simulating a system evolving under a given Hamiltonian. We consider high order splitting methods that are particularly applicable in quantum simulation and obtain bounds on the number of exponentials required. Moreover, we derive the optimal order of convergence given the required error bound. 
We compare our complexity estimates to previously known ones and show the resulting speedup. Furthermore, we consider randomized algorithms for Hamiltonian simulation. The evolution is simulated by a product of exponentials in a random sequence and with random evolution times. Hence the final state of the system is approximated by a mixed quantum state. We provide a scheme to bound the error of the final quantum state in a randomized algorithm, and obtain randomized algorithms which have the same efficiency as certain deterministic algorithms but which are simpler to implement. We also apply Hamiltonian simulation to estimating the ground state energy of a multiparticle system, which is also known as the multivariate Sturm-Liouville eigenvalue problem. Since the cost of this problem grows exponentially with the number of particles using deterministic classical algorithms, it suffers from the curse of dimensionality. Quantum computers can vanquish the curse, and we exhibit a quantum algorithm whose total cost is linear in the number of particles.
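The splitting methods mentioned above approximate evolution under a Hamiltonian A + B by a product of exponentials of A and B alone. The numerical sketch below is a toy example with small random Hermitian matrices, not the thesis's many-particle setting; it compares the first-order Lie-Trotter product with the second-order symmetric (Strang) product against the exact propagator:

```python
import numpy as np

def propagator(h, t):
    """Exact evolution operator exp(-i h t) for a Hermitian matrix h,
    computed via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def random_hermitian(rng, n=4):
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2

rng = np.random.default_rng(0)
a, b = random_hermitian(rng), random_hermitian(rng)

t, n = 1.0, 64
dt = t / n
exact = propagator(a + b, t)

# First-order Lie-Trotter step e^{-iA dt} e^{-iB dt}, repeated n times.
trotter = np.linalg.matrix_power(propagator(a, dt) @ propagator(b, dt), n)
# Second-order Strang step e^{-iA dt/2} e^{-iB dt} e^{-iA dt/2}.
strang = np.linalg.matrix_power(
    propagator(a, dt / 2) @ propagator(b, dt) @ propagator(a, dt / 2), n)

err1 = np.linalg.norm(trotter - exact)
err2 = np.linalg.norm(strang - exact)
print(err1 > err2)  # True: the symmetric splitting converges faster
```

The first-order error shrinks like 1/n while the symmetric splitting's shrinks like 1/n², which is the kind of order-of-convergence trade-off, against the number of exponentials used, that the thesis bounds.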

Automatic Dialect and Accent Recognition and its Application to Speech Recognition

Biadsy, Fadi January 2011 (has links)
A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches to language identification that have been successfully employed by that community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches -- Discriminative Phonotactics and kernel-based methods. We test our best performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks. 
This approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, an EER of 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect from another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with the geographical proximity of the various dialects. As a final measure of the utility of our studies, we also show that it is possible to improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure improves the Word Error Rate (WER) for Levantine by 4.6% absolute (9.3% relative). In addition, we demonstrate in this thesis that, using a linguistically-motivated pronunciation modeling approach, we can improve the WER of a state-of-the-art ASR system by 2.2% absolute and 11.5% relative on Modern Standard Arabic.
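Equal Error Rate, the metric the results above are reported in, is the operating point where the false-accept rate equals the false-reject rate. A minimal threshold-sweep sketch of computing it from detection scores follows; this is a generic illustration, not the evaluation tooling used in the thesis:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed
    scores and taking the point where the larger of the false-accept
    rate (FAR) and false-reject rate (FRR) is smallest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = 1.0
    for th in np.unique(scores):
        far = np.mean(scores[~labels] >= th)  # non-targets accepted
        frr = np.mean(scores[labels] < th)    # targets rejected
        best = min(best, max(far, frr))
    return best

# Perfectly separated scores: the EER is zero.
print(equal_error_rate([0.9, 0.8, 0.2, 0.1],
                       [True, True, False, False]))  # 0.0
```

Reporting a single EER per task, as the abstract does, allows dialect pairs of very different difficulty (e.g. broad Arabic dialects vs. Southern/Non-Southern English) to be compared on one scale.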

Synthesis, Editing, and Rendering of Multiscale Textures

Han, Charles January 2011 (has links)
The study of textures (images with repeated visual content) has produced a number of useful tools and algorithms for analysis, synthesis, editing, rendering, and a variety of other applications. However, the recent rapid growth in data storage and computational abilities has expanded the notion of what constitutes a texture. Modern textures can often outstrip traditional assumptions on input size by several orders of magnitude. Additionally, these multiscale textures typically contain features at not just one scale but rather across a wide range of scales, further violating existing assumptions. In order to meaningfully capture the large-scale features present in multiscale textures, we introduce a new example-based input representation, the exemplar graph. This representation allows us to efficiently define textures spanning a large, or possibly infinite, range of visual scales. We develop a hierarchical, parallelizable algorithm for performing texture synthesis from an input exemplar graph. In addition to automated generation, an increasingly important application of texture synthesis is in interactive tools for guiding texture design. This modality is especially important for multiscale textures, as they pose special perceptual challenges to artists. We examine algorithmic and engineering optimizations to enable real-time analysis and synthesis of multiscale textures, and explore potential implications for editing tools. Finally, we study the issue of display. To accurately view a large image at a distance, some filtering operation must be performed. In many cases, such as traditional color images, the filtering operations are well-known. However, other texture representations, such as normal or displacement maps, present special difficulties for filtering. We treat the former case, presenting a principled analysis and algorithms for filtering and display of large normal maps.
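The special difficulty with filtering normal maps mentioned above can be seen with just two vectors. The sketch below is a generic illustration of the problem, not the dissertation's algorithm: naively averaging unit normals (as ordinary mipmap filtering would) shortens them, and renormalizing throws away the spread that encodes the surface's apparent roughness at a distance.

```python
import numpy as np

# Two unit normals tilted in opposite directions, as at a bumpy
# surface's finest mip level.
n1 = np.array([ 0.6, 0.0, 0.8])
n2 = np.array([-0.6, 0.0, 0.8])

avg = (n1 + n2) / 2                # naive color-style mipmap filtering
print(np.linalg.norm(n1))          # ~1.0: input is unit length
print(np.linalg.norm(avg))         # ~0.8: average is no longer unit

# Renormalizing recovers a unit vector but discards the angular spread
# of the original normals, i.e. the roughness a distant viewer should
# perceive; principled normal-map filtering must account for that
# lost variance rather than silently flattening the surface.
renorm = avg / np.linalg.norm(avg)
print(np.round(renorm, 3))         # the flat "up" direction [0, 0, 1]
```

The shortened average is in fact informative: its length drop measures exactly the variance that a principled filtering scheme needs to preserve.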
