91

Application of Digital Forensic Science to Electronic Discovery in Civil Litigation

Roux, Brian 15 December 2012 (has links)
Following changes to the Federal Rules of Civil Procedure in 2006 dealing with the role of Electronically Stored Information, digital forensics is becoming necessary to the discovery process in civil litigation. The development of case law interpreting the rule changes since their enactment defines how digital forensics can be applied to the discovery process, the scope of discovery, and the duties imposed on parties. Herein, pertinent cases are examined to determine what trends exist and how they affect the field. These observations buttress case studies involving discovery failures in large corporate contexts, along with insights on the technical reasons those discovery failures occurred and continue to occur. The state of the art in the legal industry for handling Electronically Stored Information is slow, inefficient, and extremely expensive. These failings exacerbate discovery failures by making the discovery process more burdensome than necessary. In addressing this problem, weaknesses of existing approaches are identified, and new tools are presented which cure these defects. By drawing on open source libraries, components, and other support, the presented tools exceed the performance of existing solutions by between one and two orders of magnitude. The transparent standards embodied in the open source movement allow for clearer defensibility of discovery practice sufficiency, whereas existing approaches entail difficult-to-verify closed-source solutions. Legacy industry practices of numbering documents with Bates numbers inhibit efficient parallel and distributed processing of electronic data into paginated forms. The failures inherent in legacy numbering systems are identified, and a new system is provided which eliminates these inhibitors while better modeling the nature of electronic data that does not lend itself to pagination; such non-paginated data includes databases and other file types that are machine-readable but not human-readable. In toto, this dissertation provides a broad treatment of digital forensics applied to electronic discovery, an analysis of current failures in the industry, and a suite of tools which address the weaknesses, problems, and failures identified.
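To illustrate the numbering argument: an identifier derived from document content can be computed independently on any worker, whereas a global Bates counter serializes the pipeline. The sketch below shows the idea only; the function name and format are invented here, and the dissertation's actual scheme is specified in the work itself.

```python
import hashlib

def document_id(content: bytes, prefix: str = "DOC") -> str:
    """Derive a stable identifier from document content alone.

    Because the ID depends only on the bytes of the document, any number
    of workers can label documents in parallel with no shared counter,
    unlike sequential Bates numbering. (Illustrative sketch; the
    prefix/truncation convention is invented.)
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{prefix}-{digest[:16]}"

# e.g. document_id(b"responsive email body") is deterministic and
# identical no matter which machine computes it.
```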
92

Reaper – Toward Automating Mobile Cloud Communication

Ward, Daniel R 06 August 2013 (has links)
Mobile devices connected to cloud-based services are becoming a mainstream method of delivering up-to-date and context-aware information to users. Connecting mobile applications to cloud services requires significant developer effort. Yet this communication code usually follows certain patterns, varying according to the specific type of data sent to and received from the server. By analyzing the causes of these variations, we can create a system that automates the creation of communication code between a mobile device and a cloud server. To automate code creation, a general pattern must be extracted. This general solution can then be applied to any database configuration. Automating this process frees up valuable development time, allowing developers to make other parts of the application and/or backend service a better experience for the end user.
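As a rough illustration of the pattern-extraction idea, template-based generation can emit one client function per endpoint. Everything below is invented for illustration (the spec format, names, and the requests-style session object); Reaper's actual generator is described in the thesis.

```python
# Assumed spec format: a list of {"name": ..., "path": ...} dicts.
TEMPLATE = '''\
def fetch_{name}(base_url, session):
    """Auto-generated: GET {path} and return the decoded JSON payload."""
    resp = session.get(base_url + "{path}")
    resp.raise_for_status()
    return resp.json()
'''

def generate_client(endpoints):
    # One fetch function per endpoint; the generated source can be
    # written to a module the mobile client imports.
    return "\n".join(TEMPLATE.format(name=e["name"], path=e["path"])
                     for e in endpoints)

print(generate_client([{"name": "orders", "path": "/api/orders"}]))
```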
93

ULTRA-FAST AND MEMORY-EFFICIENT LOOKUPS FOR CLOUD, NETWORKED SYSTEMS, AND MASSIVE DATA MANAGEMENT

Yu, Ye 01 January 2018 (has links)
Systems that process big data (e.g., high-traffic networks and large-scale storage) prefer data structures and algorithms with small memory footprints and fast processing speed. Efficient and fast algorithms play an essential role in system design, despite the improvement of hardware. This dissertation is organized around a novel algorithm called Othello Hashing. Othello Hashing supports ultra-fast and memory-efficient key-value lookup, and it fits the requirements of the core algorithms of many large-scale systems and big data applications. Using Othello Hashing, combined with domain expertise in cloud computing, computer networks, big data, and bioinformatics, I developed the following applications that resolve several major challenges in the area.

Concise: Forwarding Information Base. A Forwarding Information Base (FIB) is a data structure used by the data plane of a forwarding device to determine the proper forwarding actions for packets. The polymorphic property of Othello Hashing is the separation of its query and control functionalities, a perfect match for programmable networks such as Software-Defined Networks. Using Othello Hashing, we built a fast and scalable FIB named Concise. Extensive evaluation results on three different platforms show that Concise outperforms other FIB designs.

SDLB: Cloud Load Balancer. In a cloud network, the layer-4 load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. We built a software load balancer named SDLB using Othello Hashing techniques. SDLB accomplishes two functionalities with a single Othello query: finding the designated server for packets of ongoing sessions, and distributing new or session-free packets.

MetaOthello: Taxonomic Classification of Metagenomic Sequences. Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand. We built a system to support efficient classification of taxonomic sequences using their k-mer signatures.

SeqOthello: RNA-seq Sequence Search Engine. Advances in the study of functional genomics have produced a vast supply of RNA-seq datasets. However, how to quickly query and extract information from sequencing resources remains a challenging problem and has been the bottleneck for the broader dissemination of sequencing efforts. The challenge resides in both the sheer volume of the data and its unstructured representation. Using Othello Hashing techniques, we built the SeqOthello sequence search engine. SeqOthello is a reference-free, alignment-free, and parameter-free sequence search system that supports arbitrary sequence queries against large collections of RNA-seq experiments, enabling large-scale integrative studies using sequence-level data.
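For intuition, here is a minimal single-bit Othello-style lookup table in Python. It sketches the published Othello idea (a key maps to the XOR of one cell in each of two arrays, with cell values fixed by making the bipartite key graph acyclic); it is not the dissertation's implementation, and the class and method names are invented. Note that querying a key that was never inserted returns an arbitrary bit, which is acceptable in the FIB and load-balancer settings described above.

```python
import random

class Othello:
    """Sketch of a single-bit Othello-style lookup: two array reads
    and an XOR per query, with no key comparison."""

    def __init__(self, items, slots=None):
        # items: dict key -> bit (0 or 1). Oversize the arrays so an
        # acyclic pair of hash functions is easy to find.
        self.m = slots or max(4, 2 * len(items))
        for _ in range(200):
            self.sa = random.randrange(1 << 30)
            self.sb = random.randrange(1 << 30)
            if self._build(items):
                return
        raise RuntimeError("no acyclic hash pair found; increase slots")

    def _ha(self, k): return hash((self.sa, k)) % self.m
    def _hb(self, k): return self.m + hash((self.sb, k)) % self.m

    def _build(self, items):
        # Each key is an edge between a slot in array A and one in B.
        parent = list(range(2 * self.m))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        adj = {}
        for k, v in items.items():
            a, b = self._ha(k), self._hb(k)
            ra, rb = find(a), find(b)
            if ra == rb:              # cycle: retry with new hash seeds
                return False
            parent[ra] = rb
            adj.setdefault(a, []).append((b, v))
            adj.setdefault(b, []).append((a, v))
        # The edge set is a forest, so one traversal can force
        # arr[b] = arr[a] XOR v along every edge.
        self.arr = [0] * (2 * self.m)
        seen = set()
        for root in adj:
            if root in seen:
                continue
            seen.add(root)
            stack = [root]
            while stack:
                u = stack.pop()
                for w, v in adj[u]:
                    if w not in seen:
                        seen.add(w)
                        self.arr[w] = self.arr[u] ^ v
                        stack.append(w)
        return True

    def query(self, k):
        # Constant-time lookup; arbitrary result for non-inserted keys.
        return self.arr[self._ha(k)] ^ self.arr[self._hb(k)]
```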
94

Bi-Objective Optimization of Kidney Exchanges

Xu, Siyao 01 January 2018 (has links)
Matching people to their preferences is an algorithmic topic with real-world applications. One such application is the kidney exchange. The best "cure" for patients whose kidneys are failing is to replace the failing kidney with a healthy one. Unfortunately, biological factors (e.g., blood type) constrain the number of possible replacements. Kidney exchanges seek to alleviate some of this pressure by allowing a donor to give a kidney to a patient other than the one they most care about; in turn, that patient's donor gives her kidney to the patient the first donor most cares about. Roth et al. first discussed the classic kidney exchange problem. Freedman et al. expanded upon this work by optimizing a second objective in addition to maximal matching. In this work, I implement the traditional kidney exchange algorithm and expand upon more recent work by considering multi-objective optimization of the exchange. In addition, I compare the use of 2-cycles to 3-cycles. I offer two hypotheses regarding the results of my implementation, and I end with a summary and a discussion of potential future work.
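To make the 2-cycle versus 3-cycle comparison concrete, here is a toy Python formulation: enumerate short exchange cycles in the donor/patient compatibility digraph, then select vertex-disjoint cycles to maximize transplants. The data layout and brute-force search are invented for illustration; practical exchanges solve this with integer programming.

```python
def exchange_cycles(compat, max_len=3):
    """Enumerate 2- and 3-cycles in a compatibility digraph.
    compat[i] is the set of pair indices whose patient can receive
    pair i's donor kidney."""
    cycles = []
    for i in range(len(compat)):
        for j in compat[i]:
            if j <= i:
                continue                         # start cycles at their smallest index
            if i in compat[j]:
                cycles.append((i, j))            # 2-cycle: i -> j -> i
            if max_len >= 3:
                for k in compat[j]:
                    if k > i and k != j and i in compat[k]:
                        cycles.append((i, j, k)) # 3-cycle: i -> j -> k -> i
    return cycles

def best_exchange(cycles):
    """Exhaustively pick vertex-disjoint cycles maximizing transplants.
    Exponential in the number of cycles, so for small instances only."""
    best = []
    def rec(idx, used, chosen):
        nonlocal best
        if sum(map(len, chosen)) > sum(map(len, best)):
            best = list(chosen)
        for c in range(idx, len(cycles)):
            if used.isdisjoint(cycles[c]):
                rec(c + 1, used | set(cycles[c]), chosen + [cycles[c]])
    rec(0, set(), [])
    return best

# e.g. compat = [{1}, {0, 2}, {0}] yields cycles [(0, 1), (0, 1, 2)];
# best_exchange picks the 3-cycle, matching all three patients.
```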
95

MAnanA: A Generalized Heuristic Scoring Approach for Concept Map Analysis as Applied to Cybersecurity Education

Blake Gatto, Sharon Elizabeth 06 August 2018 (has links)
Concept Maps (CMs) are a well-known pedagogical technique for creating curricula, educating, teaching, and learning. Comprehension of concepts is determined by comparing candidate CMs against a master CM and evaluating their "goodness". Past techniques for comparing CMs have revolved around the creation of a subjective rubric. We propose a novel CM scoring scheme called MAnanA, based on a Fuzzy Similarity Scaling (FSS) score, to largely remove the subjectivity of rubrics from the process of grading a CM. We evaluate our framework against a predefined rubric, test it with CM data collected from the Introduction to Computer Security course at the University of New Orleans (UNO), and find that the scores obtained via MAnanA capture the trend we observed from the rubric via peak matching. Based on our evaluation, we believe that our framework can be used to objectify CM analysis.
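The FSS score itself is defined in the thesis; the sketch below only illustrates the general flavor of fuzzy, rubric-free CM comparison, matching (concept, link, concept) propositions by string similarity. The threshold, data layout, and function names are invented here and are not MAnanA's actual scoring rule.

```python
from difflib import SequenceMatcher

def prop_match(p, q, threshold=0.8):
    """Fuzzily match two (concept, link, concept) propositions
    component-wise, tolerating spelling variation."""
    return all(SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
               for a, b in zip(p, q))

def cm_score(candidate, master, threshold=0.8):
    """Fraction of master-map propositions matched by the candidate map."""
    if not master:
        return 1.0
    hits = sum(any(prop_match(p, q, threshold) for p in candidate)
               for q in master)
    return hits / len(master)

# e.g. cm_score({("firewall", "filters", "trafic")},
#               {("Firewall", "filters", "traffic")}) == 1.0
# despite the typo and capitalization difference.
```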
96

Scalable Community Detection using Distributed Louvain Algorithm

Sattar, Naw Safrin 23 May 2019 (has links)
Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm but fails to scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared-memory-based algorithm using OpenMP, which shows a 4-fold speedup but is limited to the available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load-balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around a 12-fold speedup, scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms on several large real-world networks.
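The kernel that all of these parallel variants accelerate is Louvain's local-moving phase. As a point of reference, here is a compact sequential sketch of that phase; it assumes an undirected weighted graph without self-loops and is not DPLAL's code.

```python
def louvain_one_level(adj):
    """One local-moving pass of sequential Louvain.
    adj: dict node -> dict of neighbour -> edge weight."""
    m2 = sum(sum(nbrs.values()) for nbrs in adj.values())  # 2m: each edge counted twice
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    comm = {u: u for u in adj}      # start from singleton communities
    tot = dict(deg)                 # total degree incident to each community
    improved = True
    while improved:
        improved = False
        for u in adj:
            cu = comm[u]
            tot[cu] -= deg[u]       # temporarily remove u from its community
            links = {}              # weight from u to each neighbouring community
            for v, w in adj[u].items():
                links[comm[v]] = links.get(comm[v], 0) + w
            # Proportional modularity gain; constant factors are dropped
            # because we only compare candidate communities.
            best_c = cu
            best_gain = links.get(cu, 0) - tot[cu] * deg[u] / m2
            for c, w_in in links.items():
                gain = w_in - tot[c] * deg[u] / m2
                if gain > best_gain:
                    best_c, best_gain = c, gain
            tot[best_c] += deg[u]
            if best_c != cu:
                comm[u] = best_c
                improved = True
    return comm                     # node -> community label after one level
```

The loop is inherently sequential because each move updates shared community totals, which is exactly the inter-process communication and load-balancing difficulty the abstract describes.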
97

Parallelizing a nondeterministic optimization algorithm

D'Souza, Sammy Raymond 01 January 2007 (has links)
This research explores the idea that for certain optimization problems there is a way to parallelize the algorithm such that the parallel efficiency can exceed one hundred percent. Specifically, a parallel compiler, PC, is used to apply shortcutting techniques to a metaheuristic, Ant Colony Optimization (ACO), to solve the well-known Traveling Salesman Problem (TSP) on a cluster running the Message Passing Interface (MPI). The results of both serial and parallel execution are compared using test datasets from TSPLIB.
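For context, the serial ACO baseline looks roughly like the sketch below; the parallel version distributes the ant loop across MPI ranks. Parameter names and values are illustrative, distances must be positive, and this is not the thesis's PC-generated code.

```python
import random

def aco_tsp(dist, n_ants=20, n_iters=100, alpha=1.0, beta=3.0, rho=0.5, q=1.0):
    """Minimal sequential Ant Colony Optimization for the TSP.
    dist: square matrix of positive inter-city distances."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]          # pheromone levels
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = random.randrange(n)
            tour, seen = [start], {start}
            while len(tour) < n:                 # build a tour city by city
                i = tour[-1]
                cand = [j for j in range(n) if j not in seen]
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = random.choices(cand, weights=weights)[0]
                tour.append(j)
                seen.add(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate, then deposit pheromone in proportion to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1 - rho)
        for tour, length in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += q / length
                tau[j][i] += q / length
    return best_tour, best_len
```

Because each ant builds its tour independently within an iteration, the ant loop parallelizes naturally, which is what makes ACO a good candidate for the cluster experiments described.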
98

Distributed multi-label learning on Apache Spark

Gonzalez Lopez, Jorge 01 January 2019 (has links)
This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k-nearest-neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pairwise distances, an approximate tree-based method that indexes the instances across multiple nodes, and an approximate locality-sensitive hashing method that builds multiple hash tables to index the data. The results indicate that the predictions of the tree-based method are on par with those of the exact method while reducing execution times in all scenarios. The aforementioned method is then used to evaluate the quality of a selected feature subset. The optimal adaptation of a multi-label feature selection criterion is discussed, and two distributed feature selection methods for multi-label problems are proposed: one that selects the feature subset maximizing the Euclidean norm of individual information measures, and one that selects the subset maximizing the geometric mean. The results indicate that each method excels in different scenarios, depending on the type of features and the number of labels. Rigorous experimental studies and statistical analyses over many multi-label metrics and datasets confirm that the proposals achieve better performance and scale better to big data than state-of-the-art methods.
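For intuition, the exact pairwise-distance variant reduces to the classic multi-label kNN computation below; the thesis's contribution is distributing this work across Spark executors. This plain-Python sketch (names invented) uses a simple majority-vote rule rather than the Bayesian posterior of full ML-kNN.

```python
import heapq
import math

def ml_knn_predict(train_X, train_Y, query, k=5):
    """Exact multi-label kNN via brute-force pairwise distances.
    train_X: list of feature vectors; train_Y[i]: set of labels of
    instance i. A label is predicted when a majority of the k nearest
    neighbours carry it."""
    dists = [(math.dist(x, query), i) for i, x in enumerate(train_X)]
    neighbours = [i for _, i in heapq.nsmallest(k, dists)]
    votes = {}
    for i in neighbours:
        for label in train_Y[i]:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, v in votes.items() if v > k / 2}

# e.g. ml_knn_predict([(0, 0), (0, 1), (5, 5)],
#                     [{"a"}, {"a", "b"}, {"c"}],
#                     (0, 0.4), k=2) -> {"a"}
```

The pairwise-distance step is what dominates at scale; the tree-based and hashing variants in the thesis trade a little accuracy for avoiding it.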
99

Active Tile Self-assembly and Simulations of Computational Systems

Karpenko, Daria 01 April 2015 (has links)
Algorithmic self-assembly has been an active area of research at the intersection of computer science, chemistry, and mathematics for almost two decades now, motivated by the natural self-assembly mechanism found in DNA and driven by the desire for precise control of nanoscale material manufacture and for the development of nanocomputing and nanorobotics. At the theoretical core of this research is the Abstract Tile Assembly Model (aTAM), the original abstract model of DNA tile self-assembly. Recent advances in DNA nanotechnology have produced strand displacement mechanisms that could allow DNA tiles to modify themselves during the assembly process by opening or closing certain binding sites, introducing new dynamics into tile self-assembly. We focus on one way of incorporating such signaling mechanisms for binding site activation and deactivation into the theoretical model of tile self-assembly by extending the aTAM to create the Active aTAM. We give appropriate definitions first for incorporating activation signals and then for incorporating deactivation signals and tile detachment into the aTAM. We then compare the Active aTAM to related models, such as the STAM, and examine some theoretical results. The goal of the work presented here is to define and demonstrate the power of the Active aTAM with and without deactivation. To this end, we provide four constructions of temperature 1 (also known as "non-cooperative") active tile assembly systems that can simulate other computational systems. The first construction concerns the simulation of an arbitrary temperature 2 (also known as "cooperative") standard aTAM system in the sense of producing equivalent structures with a scaling factor of 2 in each dimension; the second construction generates the time history of a given 1D cellular automaton. The third and fourth constructions make use of tile detachment in order to dynamically simulate arbitrary 1D and 2D cellular automata with assemblies that record only the current state updates and not the entire computational history of the specified automaton.
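For readers new to the model, the core aTAM rule the abstract builds on can be stated in a few lines: a tile attaches at an empty position when its glues matching already-placed neighbours reach the temperature threshold. The sketch below is an illustrative rendering of that rule with an invented data layout, not code from the dissertation; the Active aTAM additionally lets bound tiles fire signals that open or close binding sites.

```python
# Offsets and the opposite side for each direction.
SIDES = {"N": (0, 1, "S"), "S": (0, -1, "N"),
         "E": (1, 0, "W"), "W": (-1, 0, "E")}

def can_attach(assembly, tile, pos, tau=2):
    """aTAM attachment rule: a tile may attach at an empty position if
    the strengths of glues matching its placed neighbours sum to at
    least the temperature tau.
    assembly: dict (x, y) -> tile; tile: dict side -> (glue, strength)."""
    if pos in assembly:
        return False
    x, y = pos
    strength = 0
    for side, (dx, dy, opposite) in SIDES.items():
        nbr = assembly.get((x + dx, y + dy))
        if nbr is None:
            continue
        glue, s = tile.get(side, (None, 0))
        nglue, ns = nbr.get(opposite, (None, 0))
        if glue is not None and glue == nglue:
            # In the standard aTAM matching glues share a strength,
            # so min() is simply that strength.
            strength += min(s, ns)
    return strength >= tau
```

At tau = 1, the non-cooperative regime the four constructions use, a single matching glue suffices; at tau = 2, cooperative binding between two strength-1 glues is required, which is what the first construction must simulate without cooperation.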
100

The Maximum Clique Problem: Algorithms, Applications, and Implementations

Eblen, John David 01 August 2010 (has links)
Computationally hard problems are routinely encountered during the course of solving practical problems. This is commonly dealt with by settling for less-than-optimal solutions, through the use of heuristics or approximation algorithms. This dissertation examines the alternative possibility of solving such problems exactly, through a detailed study of one particular problem, the maximum clique problem. It discusses algorithms, implementations, and the application of maximum clique results to real-world problems. First, the theoretical roots of the algorithmic method employed are discussed. Then a practical approach is described, which separates out important algorithmic decisions so that the algorithm can be easily tuned for different types of input data. This general and modifiable approach is also meant as a tool for research, so that different strategies can easily be tried for different situations. Next, a specific implementation is described. The program is tuned, through experiments, to work best for two different graph types: real-world biological data and a suite of synthetic graphs. A parallel implementation is then briefly discussed and tested. After considering implementation, an example of applying these clique-finding tools to a specific case of real-world biological data is presented. Results are analyzed using both statistical and biological metrics. Then the development of practical algorithms based on clique-finding tools is explored in greater detail. New algorithms are introduced and preliminary experiments are performed. Next, some relaxations of clique are discussed, along with the possibility of developing new practical algorithms from these variations. Finally, conclusions and future research directions are given.
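As a point of reference for the exact method discussed, here is a compact branch-and-bound clique search in Python. It illustrates only the basic bounding and branching skeleton; the dissertation's solver layers tuned vertex orderings, preprocessing, and parallelism on top of this kind of core, and the function shown is an invented sketch.

```python
def max_clique(adj):
    """Branch-and-bound maximum clique search.
    adj: dict vertex -> set of neighbours (undirected, no self-loops)."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        # Bound: even taking every candidate cannot beat the best clique.
        if len(clique) + len(candidates) <= len(best):
            return
        if not candidates:
            best = list(clique)        # strictly larger, by the bound above
            return
        # Branch on candidates, highest connectivity first.
        cand = sorted(candidates, key=lambda v: len(adj[v] & candidates))
        while cand:
            if len(clique) + len(cand) <= len(best):
                return
            v = cand.pop()
            expand(clique + [v], set(cand) & adj[v])

    expand([], set(adj))
    return best

# e.g. a triangle with a pendant vertex:
# max_clique({1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}})
# returns the triangle [1, 2, 3] (in some order).
```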
