31

Region-based memory management for expressive GPU programming

Holk, Eric 10 August 2016 (has links)
Over the last decade, graphics processing units (GPUs) have seen their use broaden from purely graphical tasks to general purpose computation. The increased programmability required by demanding graphics applications has proven useful for a number of non-graphical problems as well. GPUs' high memory bandwidth and floating point performance make them attractive for general computation workloads, yet these benefits come at the cost of added complexity. One particular problem is that GPUs and their associated high performance memory typically lie on discrete cards that are separated from the host CPU by the PCI-Express bus. This requires programmers to carefully manage the transfer of data between CPU and GPU memory so that the right data is in the right place at the right time. Programmers must design data structures with serialization in mind in order to move data efficiently across the PCI bus. In practice, this leads to programmers working with only simple data structures, such as one- or two-dimensional arrays, and only the applications that can be easily expressed in terms of these structures. CPU programmers have long had access to richer data structures, such as trees or first-class procedures, which enable new and simpler approaches to solving certain problems.

This thesis explores the use of region-based memory management (RBMM) to overcome these data movement challenges. RBMM is a technique in which data is assigned to regions, and these regions can then be operated on as a unit. One of the first uses of regions was to amortize the cost of deallocation: many small objects would be allocated in a single region, and the region could be deallocated in a single operation, independent of the number of items in the region. In this thesis, regions are used as the unit of data movement between the CPU and GPU. Data structures are assigned to a region, so the runtime system does not have to be aware of the internal layout of a data structure. The runtime system can simply move the entire region from one device to another, keeping the internal layout intact and allowing code running on either device to operate on the data in the same way.

These ideas are explored through a new programming language called Harlan. Harlan is designed to simplify programming GPUs and other data parallel processors. It provides kernel expressions as its fundamental mechanism for parallelism. Kernels function similarly to a parallel map or zipWith operation from other functional programming languages. For example, the expression (kernel ([x xs] [y ys]) (+ x y)) evaluates to a vector where each element is the sum of the corresponding elements in xs and ys. Kernels can have arbitrary body expressions, which can even include other kernels, thereby supporting nested data parallelism. Harlan uses a region-based memory system to enable higher level programming features such as trees, algebraic data types (ADTs), and even first-class procedures. Like all data in Harlan, first-class procedures are device-independent, so a procedure created in GPU code can be applied in CPU code and vice versa.

Besides the design and a description of the implementation of Harlan, this thesis includes a type safety proof for a small model of Harlan's region system, as well as a number of small application case studies. The type safety proof provides formal support that Harlan ensures programs will have the right data in the right place at the right time. The application case studies show that Harlan and the ideas embodied within it are useful both for a number of traditional applications and for problems that are problematic for previous GPU programming languages. The design and implementation of Harlan, its proof of type safety, and the set of application case studies together show that region-based memory management is an effective way of enabling high-level features in languages targeting CPU/GPU systems and other machines with disjoint memories.
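As a rough illustration of the kernel form mentioned in the abstract, the following Python sketch (not Harlan code; the helper name is ours) mimics what (kernel ([x xs] [y ys]) (+ x y)) computes: an element-wise combination of input vectors, in the spirit of zipWith.

```python
# Illustrative sketch only: a sequential Python analogue of Harlan's
# kernel expression (kernel ([x xs] [y ys]) (+ x y)).
# In Harlan, each element of the result can be computed in parallel on the GPU.

def kernel(body, *sequences):
    """Apply `body` element-wise across the given sequences (a zipWith)."""
    return [body(*elems) for elems in zip(*sequences)]

xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]

# Corresponds to (kernel ([x xs] [y ys]) (+ x y))
result = kernel(lambda x, y: x + y, xs, ys)
print(result)  # [11, 22, 33, 44]
```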
32

Social Media Network Data Mining and Optimization

Jose, Neha Clare 13 June 2016 (has links)
Many small social aid organizations could benefit from collaborating with other organizations on common causes, but may not have the necessary social relationships. We present a framework for a recommender system for the Louisiana Poverty Initiative that identifies member organizations with common causes and aims to forge connections between these organizations. Our framework employs a combination of graph and text analyses of the organizations' Facebook pages. We use NodeXL, a plugin to Microsoft Excel, to download the Facebook graph and to interface with SNAP, the Stanford Network Analysis Platform, for calculating network measurements. Our framework extends NodeXL with algorithms that analyze the text found on the Facebook pages as well as the connections between organizations and individuals posting on those pages. As a substitute for more complex text data mining, we use a simple keyword analysis for identifying the goals and initiatives of organizations. We present algorithms that combine this keyword analysis with graph analyses that compute connectivity measurements for both organizations and individuals. The results of these analyses can then be used to form a recommender system that suggests new network links between organizations and individuals to let them explore collaboration possibilities. Our experiments on Facebook data from the Louisiana Poverty Initiative show that our framework will be able to collect the information necessary for building such a user-to-user recommender system.
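A minimal sketch of the kind of combined keyword-and-connectivity scoring the abstract describes (illustrative only; the weighting scheme and data structures are assumptions, not the framework's actual implementation):

```python
# Illustrative sketch: rank candidate partner organizations by combining
# keyword overlap (text analysis) with a simple connectivity measure
# (graph analysis). Weights and data structures are hypothetical.

def keyword_overlap(keywords_a, keywords_b):
    """Jaccard similarity between two organizations' keyword sets."""
    if not keywords_a or not keywords_b:
        return 0.0
    return len(keywords_a & keywords_b) / len(keywords_a | keywords_b)

def recommend(org, keywords, degree, alpha=0.7):
    """Score every other organization for `org`; higher means a stronger suggestion."""
    scores = {}
    max_deg = max(degree.values()) or 1
    for other in keywords:
        if other == org:
            continue
        text_score = keyword_overlap(keywords[org], keywords[other])
        graph_score = degree[other] / max_deg  # normalized connectivity
        scores[other] = alpha * text_score + (1 - alpha) * graph_score
    return sorted(scores, key=scores.get, reverse=True)

keywords = {
    "OrgA": {"housing", "poverty", "food"},
    "OrgB": {"food", "poverty", "health"},
    "OrgC": {"arts", "youth"},
}
degree = {"OrgA": 5, "OrgB": 12, "OrgC": 3}  # e.g. counts of Facebook interactions
print(recommend("OrgA", keywords, degree))   # ['OrgB', 'OrgC']
```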
33

Parameter Advising for Multiple Sequence Alignment

DeBlasio, Daniel Frank January 2016 (has links)
The problem of aligning multiple protein sequences is essential to many biological analyses, but most standard formulations of the problem are NP-complete. Due to both the difficulty of the problem and its practical importance, there are many heuristic multiple sequence aligners that a researcher has at their disposal. A basic issue that frequently arises is that each of these alignment tools has a multitude of parameters that must be set, and which greatly affect the quality of the alignment produced. Most users rely on the default parameter setting that comes with the aligner, which is optimal on average but can produce a low-quality alignment for the given inputs. This dissertation develops an approach called parameter advising to find a parameter setting that produces a high-quality alignment for each given input. A parameter advisor aligns the input sequences under each choice in a collection of parameter settings, and then selects the best of the resulting alignments. A parameter advisor has two major components: (i) an advisor set of parameter choices that are given to the aligner, and (ii) an accuracy estimator that is used to rank the alignments produced by the aligner. Alignment accuracy is measured with respect to a known reference alignment; in practice, a reference alignment is not available, and we can only estimate accuracy. We develop a new accuracy estimator, called Facet (short for "feature-based accuracy estimator"), that computes an accuracy estimate as a linear combination of efficiently-computable feature functions, whose coefficients are learned by solving a large-scale linear programming problem. We also develop an efficient approximation algorithm for finding, for a fixed estimator, an advisor set of a given cardinality; this cardinality should ideally be small, since the aligner is invoked for each parameter choice in the set. Using Facet for parameter advising boosts advising accuracy by almost 20% beyond using a single default parameter choice for the hardest-to-align benchmarks. This dissertation further applies parameter advising in two ways: (i) to ensemble alignment, which uses the advising process on a collection of aligners to choose both the aligner and its parameter setting, and (ii) to adaptive local realignment, which can align different regions of the input sequences with distinct parameter choices to conform to mutation rates as they vary across the lengths of the sequences.
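A schematic sketch of the parameter advising loop described above (illustrative Python; the aligner interface, feature functions, and coefficients are hypothetical placeholders, not Facet's actual ones):

```python
# Illustrative sketch of a parameter advisor: run the aligner once per parameter
# choice in the advisor set, score each resulting alignment with a feature-based
# accuracy estimator (a linear combination of feature values), and keep the best.
# The aligner callable and the feature functions are hypothetical stand-ins.

def estimate_accuracy(alignment, feature_fns, coefficients):
    """Facet-style estimate: a linear combination of feature function values."""
    return sum(c * f(alignment) for f, c in zip(feature_fns, coefficients))

def parameter_advisor(sequences, advisor_set, align, feature_fns, coefficients):
    """Return the alignment with the highest estimated accuracy."""
    best_alignment, best_score = None, float("-inf")
    for params in advisor_set:              # one aligner invocation per choice
        alignment = align(sequences, params)
        score = estimate_accuracy(alignment, feature_fns, coefficients)
        if score > best_score:
            best_alignment, best_score = alignment, score
    return best_alignment
```

This also makes the cost trade-off in the abstract concrete: the loop calls the aligner once per parameter choice, which is why a small advisor set is desirable.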
34

On Strategic Behavior in Networks

Johnson, Samuel David 14 June 2016 (has links)
As our understanding of complex social, economic, and technological systems improves, it is increasingly apparent that a full account of a system's macroscopic-level properties requires us to carefully explore the structure of local, pairwise interactions that take place at the microscopic level. Over the past two decades, networks have emerged as the de facto representation of such systems, leading to the genesis of the interdisciplinary field of network science. During this same period, we have witnessed an explosion of participation in and consumption of social media, advertising, and e-commerce on the internet; an ecosystem that is the embodiment of, and whose success is fundamentally coupled to, the use and exploitation of complex networks. What are the processes and mechanisms responsible for shaping these networks? Do these processes possess any inherent fairness? How can these structures be exploited for the benefit of strategic actors? In this dissertation, I explore these questions and present analytical results couched in a theory of strategic decision making: algorithmic game theory. (Abstract shortened by ProQuest.)
35

Discovery of Latent Factors in High-dimensional Data Using Tensor Methods

Huang, Furong 15 June 2016 (has links)
Unsupervised learning aims at the discovery of the hidden structure that drives observations in the real world. It is essential for success in modern machine learning and artificial intelligence. Latent variable models are versatile in unsupervised learning and have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision, and computational biology. Training latent variable models is challenging due to the non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low-order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. My thesis spans both theoretical analysis of the tensor decomposition framework and practical implementation of various applications.

This thesis presents theoretical results on convergence to the globally optimal solution of tensor decomposition using stochastic gradient descent, despite the non-convexity of the objective. This is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.

This thesis also presents large-scale deployments of spectral methods (matrix and tensor decomposition) carried out on CPU, GPU, and Spark platforms. Dimensionality reduction techniques such as random projection are incorporated for a highly parallel and scalable tensor decomposition algorithm. We obtain gains in both accuracy and running time of several orders of magnitude compared to state-of-the-art variational methods.

To solve real-world problems, more advanced models and learning algorithms are proposed. After introducing the tensor decomposition framework under the latent Dirichlet allocation (LDA) model, this thesis discusses the generalization of the LDA model to the mixed membership stochastic block model for learning hidden user commonalities or communities in social networks, a convolutional dictionary model for learning phrase templates and word-sequence embeddings, hierarchical tensor decomposition and a latent tree structure model for learning disease hierarchies in healthcare analytics, and a spatial point process mixture model for detecting cell types in neuroscience.
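As a toy illustration of the kind of tensor decomposition the abstract refers to, here is a rank-R CP approximation fitted by plain (full-batch) gradient descent with NumPy. This is a simplified sketch, not the thesis's stochastic algorithms or their convergence guarantees.

```python
# Illustrative sketch: fit a rank-R CP decomposition T ~ sum_r a_r (x) b_r (x) c_r
# by gradient descent on the squared reconstruction error. Toy code only; the
# thesis develops large-scale, stochastic, and randomized variants.
import numpy as np

def cp_gradient_descent(T, rank, steps=10000, lr=0.005, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank)) * 0.1
    B = rng.standard_normal((J, rank)) * 0.1
    C = rng.standard_normal((K, rank)) * 0.1
    for _ in range(steps):
        T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)    # current reconstruction
        R = T_hat - T                                   # residual tensor
        # Gradients of ||T_hat - T||_F^2 with respect to each factor matrix
        A -= lr * 2 * np.einsum('ijk,jr,kr->ir', R, B, C)
        B -= lr * 2 * np.einsum('ijk,ir,kr->jr', R, A, C)
        C -= lr * 2 * np.einsum('ijk,ir,jr->kr', R, A, B)
    return A, B, C

# Small synthetic example: decompose a random rank-2 tensor.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 2))
B0 = rng.standard_normal((5, 2))
C0 = rng.standard_normal((6, 2))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_gradient_descent(T, rank=2)
err = np.linalg.norm(np.einsum('ir,jr,kr->ijk', A, B, C) - T)
print(err)  # reconstruction error (should be small if the descent converged)
```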
36

Effects of clipping distortion on an Automatic Speaker Recognition system

Ramirez, Jose Luis 08 June 2016 (has links)
Clipping distortion is a common problem in the audio recording world, in which an audio signal is recorded at a higher amplitude than the recording system can represent, resulting in a portion of the acoustic event not being recorded. Several government agencies employ Automatic Speaker Recognition (ASR) systems in order to identify the speaker of an acquired recording. This is done automatically, using an unbiased approach, by running a questioned recording through an ASR system and comparing it to a pre-existing database of voice samples whose speakers are known. A matching speaker is indicated by a high likelihood between the questioned recording and one from the known database. It is possible that, during the process of making the questioned recording, the speaker was speaking too loudly into the recording device, a gain setting was set too high, or post-processing was applied to the point that clipping distortion was introduced into the recording. Clipping distortion results when the amplitude of an audio signal surpasses the maximum sampling value of the recording system; it affects the quantized audio signal by truncating peaks at the maximum value rather than recording the actual amplitude of the input signal. In theory, clipping distortion will negatively affect the likelihood ratios between two compared recordings of the same speaker, and this thesis tests that hypothesis. Currently there is no research that serves as a guideline for the limitations of using clipped recordings. This thesis investigates to what degree clipped material affects the performance of a Forensic Automatic Speaker Recognition system.
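A minimal sketch of the clipping effect described above, i.e. peaks truncated at the system's maximum value (illustrative only; the signal, sample rate, and threshold are arbitrary):

```python
# Illustrative sketch: hard clipping truncates samples that exceed the
# system's maximum representable amplitude, flattening the waveform's peaks.
import numpy as np

fs = 16000                                   # sample rate (Hz), arbitrary
t = np.arange(0, 0.01, 1 / fs)               # 10 ms of signal
clean = 1.5 * np.sin(2 * np.pi * 440 * t)    # "too loud": peaks exceed full scale

full_scale = 1.0                             # maximum value the system can record
clipped = np.clip(clean, -full_scale, full_scale)

# Fraction of samples affected by clipping distortion
print(np.mean(np.abs(clean) >= full_scale))
```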
37

Adding Semantics to Unstructured and Semi-structured Data on the Web

Bhagavatula, Chandra Sekhar 09 June 2016 (has links)
Acquiring vast bodies of knowledge in machine-understandable form is one of the main challenges in artificial intelligence. Information extraction is the task of automatically extracting structured, machine-understandable information from unstructured or semi-structured data. Recent advances in information extraction and the massive scale of data on the Web present a unique opportunity for artificial intelligence systems to perform large-scale automatic knowledge acquisition. However, to realize the full potential of the automatically extracted information, it is essential to understand its semantics.

A key step in understanding the semantics of extracted information is entity linking: the task of mapping a phrase in text to its referent entity in a given knowledge base. In addition to identifying entities mentioned in text, an AI system can benefit significantly from the organization of entities in a taxonomy. While taxonomies are used in a variety of applications, including IBM's Jeopardy-winning Watson system, they demand significant effort in their creation. They are either manually curated, or built using semi-supervised machine learning techniques.

This dissertation explores methods to automatically infer a taxonomy of entities, given the properties that are usually associated with them (e.g., as a City, Chicago is usually associated with properties like "population" and "area"). Our approach is based on the Property Inheritance hypothesis, which states that entities of a specific type in a taxonomy inherit properties from more general types. We apply this hypothesis to two distinct information extraction tasks, each of which is aimed at understanding the semantics of information mined from the Web. First, we describe two systems: (1) TABEL, a state-of-the-art system that performs entity linking on Web tables, and (2) SKEY, a system that extracts key phrases that summarize a document in a given corpus. We then apply topic models that encode our hypothesis in a probabilistic framework to automatically infer a taxonomy in each task.
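A tiny illustration of the Property Inheritance hypothesis stated above (the entities, properties, and set-based check are toy stand-ins; the dissertation's actual method encodes the hypothesis in probabilistic topic models rather than this literal set arithmetic):

```python
# Illustrative sketch: under the Property Inheritance hypothesis, entities of a
# specific type carry the properties of the more general type plus their own.
# Hypothetical data; not the dissertation's model.

entity_properties = {
    "Chicago":  {"population", "area", "mayor", "skyline"},
    "Toronto":  {"population", "area", "mayor"},
    "Illinois": {"population", "area", "governor"},
    "Ontario":  {"population", "area", "governor"},
}

def shared_properties(entities):
    """Properties common to every entity in a candidate type."""
    return set.intersection(*(entity_properties[e] for e in entities))

city_props = shared_properties(["Chicago", "Toronto"])      # {'population', 'area', 'mayor'}
region_props = shared_properties(["Illinois", "Ontario"])   # {'population', 'area', 'governor'}
place_props = shared_properties(list(entity_properties))    # {'population', 'area'}

# Consistent with inheritance: each specific type keeps all of the general type's properties.
print(place_props <= city_props and place_props <= region_props)  # True
```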
38

Use of GPU architecture to optimize Rabin fingerprint data chunking algorithm by concurrent programming

Wang, Sean 01 June 2016 (has links)
Data deduplication is a popular technique for increasing storage efficiency in data centers and corporate backup environments. Various caching techniques and metadata checks are available to prevent excessive file scanning. Because content-addressable chunking is inherently a serial operation, the chunking process often becomes the performance bottleneck of data deduplication. This project introduces a parallelized Rabin fingerprint algorithm suited to the GPU hardware architecture that aims to optimize the performance of the deduplication process.
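For context, here is a simplified serial content-defined chunker using a Rabin-style rolling hash (a Rabin-Karp polynomial hash rather than a true Rabin fingerprint over GF(2)); the window size, base, and boundary mask are arbitrary choices, and the project's contribution is a GPU-parallel version of this kind of fingerprinting, not this serial baseline.

```python
# Illustrative sketch: serial content-defined chunking with a rolling hash.
# A chunk boundary is declared wherever the low bits of the window hash are zero,
# so boundaries depend on content rather than fixed offsets.
import os

WINDOW = 48           # bytes in the rolling window
PRIME = 257           # base of the rolling polynomial hash
MODULUS = 1 << 31
MASK = (1 << 13) - 1  # ~8 KiB average chunks: cut when the low 13 bits are zero

def chunk_boundaries(data: bytes):
    """Yield end indices (exclusive) of content-defined chunks."""
    if len(data) < WINDOW:
        yield len(data)
        return
    pow_out = pow(PRIME, WINDOW - 1, MODULUS)  # weight of the byte leaving the window
    h = 0
    for i in range(WINDOW):                    # hash of the first window
        h = (h * PRIME + data[i]) % MODULUS
    last = 0
    for i in range(WINDOW, len(data)):
        if (h & MASK) == 0 and i - last >= WINDOW:
            yield i
            last = i
        # Slide the window: drop data[i - WINDOW], add data[i]
        h = ((h - data[i - WINDOW] * pow_out) * PRIME + data[i]) % MODULUS
    yield len(data)

blob = os.urandom(1 << 20)  # 1 MiB of random data
cuts = list(chunk_boundaries(blob))
print(len(cuts), "chunks, average size", len(blob) // len(cuts), "bytes")
```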
39

Geometry of Presentation Videos and Slides, and the Semantic Linking of Instructional Content (SLIC) System

Kharitonova, Yekaterina January 2016 (has links)
Presentation slides are now a de facto standard in most classroom lectures, business meetings, and conference talks. Until recently, electronic presentation materials have been disjointed from each other: the video file and the corresponding slides are typically available separately for viewing or download. In this work, we exploit the fact that video frames of a presentation and the corresponding slides are mapped onto one another by a geometric transformation called a homography. This mapping allows us to synchronize a video with the slides shown in it, enabling users to interactively view presentation materials and to search within and across presentations. We show how we can approximate homographies with affine transformations. Like the original homographies, such transformations allow us to project slides back into the video (i.e., perform backprojection), which improves their resulting appearance. The advantage of our method is that we use homographies to compress the original video, reducing the bandwidth used to transmit the video file, and then carry out backprojection using affine transformations on the client side. Additionally, we introduce a novel approach to slide appearance approximation, which improves SIFT-based matching for videos with out-of-plane rotation of the projection screen. This method also allows us to split each slide into three overlapping panels and generate rotated versions of each such panel. Using these panels during matching, we detect slide content that is projected onto the speaker (what we call "slide tattoos"). We treat these "tattoos" as implicit structured light, which provides hints about the scene geometry. We then use the homography obtained from detecting "slide tattoos" to compute a fundamental matrix. The main significance of this contribution is that it allows us to infer 3-D information from 2-D presentation materials. Finally, we present the Semantically Linked Instructional Content (SLIC) Portal, an online system for accessing presentations that exploits our slide-video matching. Aspects of the SLIC system fully developed or significantly improved as part of this work include:
* a publicly open web collection of video presentations indexed by slides
* a unified, clear interface displaying a video player along with slide images synchronized with their appearance in the video
* a categorization tree that allows browsing for presentations by topic/category
* the ability to query slide words within and across presentations; querying is integrated with the "browsing" mode, where the search results can be narrowed to only the selected categories
* easy integration with the audio transcript: the ability to preview and search within speech words
* cross-platform and mobile support.
We conducted user studies at the University of Arizona to measure the effect of synchronized presentation materials on learners, and discuss students' favorable response to the SLIC Portal, which they used during the experiments.
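A small sketch of the geometric mapping the abstract relies on: a homography maps a slide point into the video frame via homogeneous coordinates, and an affine transform is the special case with a trivial perspective row. The matrices below are made-up examples, not values from the system.

```python
# Illustrative sketch: apply a homography (projective) and its affine
# approximation to a slide-coordinate point. Matrices are arbitrary examples.
import numpy as np

H = np.array([[1.05, 0.02, 40.0],   # homography: last row carries perspective terms
              [0.01, 0.98, 25.0],
              [1e-5, 2e-5, 1.0]])

A = np.array([[1.05, 0.02, 40.0],   # affine approximation: perspective row is [0, 0, 1]
              [0.01, 0.98, 25.0],
              [0.0,  0.0,  1.0]])

def apply(M, point):
    """Map a 2-D point through a 3x3 transform using homogeneous coordinates."""
    x, y, w = M @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

slide_corner = (640.0, 480.0)
print("homography:", apply(H, slide_corner))
print("affine:    ", apply(A, slide_corner))  # nearly identical when perspective terms are small
```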
40

Building Privacy-Preserving Cryptographic Credentials from Federated Online Identities

Maheswaran, John 16 February 2016 (has links)
Third-party applications such as Quora or StackOverflow allow users to log in through a federated identity provider such as Facebook (Log in with Facebook), Google+ or Twitter. This process is called federated authentication. Examples of federated identity providers include social networks as well as other, non-social-network identity providers such as PayPal.

Federated identity providers have gained widespread popularity among users as a way to manage their online identity across the web. While protocols like OAuth and OpenID allow users to maintain a single set of credentials for federated authentication, such federated login can leak privacy-sensitive profile information, making the user's online activity more easily tracked.

To protect themselves, users could forego using such identities altogether, or limit the content of their profiles. Ideally, users could leverage their federated identities, but in a way that prevents third-party applications from accessing sensitive information. While anonymous authentication techniques have been proposed, their practicality depends on technologies such as PGP or complex encryption algorithms, which most users lack the knowledge or motivation to use effectively.

While federated identity providers offer a convenient and increasingly popular mechanism for federated authentication, they unfortunately also exacerbate many privacy and tracking risks. We present Crypto-Book, a privacy-preserving layer enabling federated authentication while reducing these risks.

Crypto-Book relies on a set of independently managed servers that collectively assign each federated identity credentials (either a public/private keypair or blinded signed messages). We propose two components: "credential producers" that create and issue privacy-preserving credentials to clients, and "credential consumers" that verify these privacy-preserving credentials to authenticate clients to third-party applications.

The credential producer servers have split trust and use a (t,n)-threshold cryptosystem to collaboratively generate client credentials. Using their credentials, clients can then leverage anonymous authentication techniques such as linkable ring signatures or blind signatures to log into third-party applications via credential consumers, while preserving privacy.

We have implemented our system and demonstrate its use with four distinct applications: a wiki system, an anonymous group communication system, a whistleblower submission system based on SecureDrop, and a privacy-preserving chat room system. Our results show that for anonymity sets of size 100 and 2048-bit DSA keys, Crypto-Book ring signature authentication takes 1.641s for signature generation by the client, 1.632s for signature verification on the server, and requires 8.761KB of communication bandwidth. Similarly, for partially blind signature authentication, each phase takes under 0.05s and requires 0.325KB of bandwidth.

Crypto-Book is practical and has low overhead: we deployed a privacy-preserving chat room system built on top of the Crypto-Book architecture. In a deployment within our research group, Crypto-Book group authentication took 1.607s end-to-end, an overhead of 1.2s compared to traditional, non-privacy-preserving federated authentication.
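As background for the (t,n)-threshold idea mentioned above, here is a minimal Shamir secret-sharing sketch, the standard textbook construction in which any t of n servers suffice to recover a value; it is shown only to illustrate the threshold concept, and Crypto-Book's actual cryptosystem, key material, and parameters differ.

```python
# Illustrative sketch: (t, n) Shamir secret sharing over a prime field.
# Any t of the n shares reconstruct the secret; fewer reveal nothing.
# Textbook background for the threshold concept, not Crypto-Book's scheme.
import random

P = 2**127 - 1  # a Mersenne prime used as the field modulus (toy choice)

def make_shares(secret, t, n):
    """Split `secret` into n shares, any t of which suffice to reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(secret=123456789, t=3, n=5)
print(reconstruct(shares[:3]))  # 123456789, recovered from any 3 of the 5 shares
```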
