461 |
Late radiation effects in radiotherapy: changes in the biomechanical properties of normal skin, and surgically produced lesions after X irradiation measured in vivo and in vitro. Baker, Mark Ralph, January 1993 (has links)
No description available.
|
462 |
PRIVACY PRESERVING DATA MINING FOR NUMERICAL MATRICES, SOCIAL NETWORKS, AND BIG DATA. Liu, Lian, 01 January 2015 (has links)
Motivated by increasing public awareness of the possible abuse of confidential information, widely seen as a significant hindrance to the development of e-society and of the medical and financial markets, a privacy-preserving data mining framework is presented. It lets data owners process their data so as to protect confidential information while keeping the data's utility within an acceptable bound.
First, among the many privacy-preserving methodologies, data perturbation methods are a popular class of techniques for balancing data utility against information privacy: they add noise drawn from a statistical distribution to an original numerical matrix. Using an analysis in the eigenspace of the perturbed data, the potential vulnerability of a popular perturbation scheme is examined when only a very small amount of information leaks from the privacy-preserving database. The vulnerability to such limited leakage is proved theoretically and illustrated experimentally.
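The following is a minimal sketch of this kind of additive perturbation and eigenspace analysis: Gaussian noise is added to a (here low-rank) numerical matrix, and a spectral projection of the perturbed data is used to estimate the original. The matrix sizes, noise level, and rank cutoff are illustrative assumptions, not the settings analyzed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original confidential data: a low-rank numerical matrix (illustrative sizes).
n, m, true_rank = 200, 50, 3
X = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, m))

# Additive perturbation: publish X + E, where E follows a statistical distribution.
noise_std = 0.5
E = rng.normal(0.0, noise_std, size=(n, m))
Y = X + E

# Eigenspace (spectral filtering) analysis: project the perturbed data onto its
# dominant singular subspace to estimate the original matrix.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = true_rank  # assumed known here; in practice estimated from the spectrum
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err_perturbed = np.linalg.norm(Y - X) / np.linalg.norm(X)
err_filtered = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(f"relative error of published data:      {err_perturbed:.3f}")
print(f"relative error after spectral filtering: {err_filtered:.3f}")
```

The filtered estimate is typically much closer to the original than the published perturbed matrix, which is the kind of leakage an eigenspace analysis can expose.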
Second, beyond numerical matrices, social networks play a critical role in the modern e-society. Security and privacy in social networks have received much attention following recent security scandals involving popular social network service providers. The need to keep confidential information from being disclosed therefore motivates the development of several privacy-preserving techniques for social networks.
The affinities (or weights) attached to edges are private, and their disclosure can compromise personal security. To protect the privacy of social networks, several algorithms are proposed, including Gaussian perturbation, a greedy algorithm, and a probabilistic random-walk algorithm. They can quickly modify the original data at large scale to satisfy different privacy requirements.
Third, the era of big data is arriving in both industry and academia, as the quantity of collected data grows exponentially. Three privacy-preservation issues are studied in this setting: obtaining high confidence in the accuracy of specific differentially private queries; updating a private summary of a binary stream quickly, accurately, and with I/O-awareness; and supporting mutual private information retrieval over big data. All three are handled with two core tools, differential privacy and the Chernoff bound.
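As a hedged illustration of the two backbones named above, the sketch below shows (i) the Laplace mechanism, the standard way to answer a count query with ε-differential privacy, and (ii) a Chernoff/Hoeffding-style bound used to size a sample of a binary stream for a target accuracy and confidence. The stream, ε, and accuracy targets are made-up example values; this is not the dissertation's algorithm.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# A binary stream (illustrative data, not from the dissertation).
stream = rng.integers(0, 2, size=100_000)

# Backbone 1: differential privacy via the Laplace mechanism.
# A count query has sensitivity 1, so adding Laplace(1/eps) noise gives eps-DP.
def dp_count(bits, eps):
    return int(bits.sum()) + rng.laplace(scale=1.0 / eps)

eps = 0.5
print("true count:", int(stream.sum()), " private count:", round(dp_count(stream, eps)))

# Backbone 2: a Chernoff/Hoeffding bound.
# To estimate the fraction of 1s within +/- t with confidence 1 - delta,
# a uniform sample of size n >= ln(2/delta) / (2 t^2) suffices.
t, delta = 0.01, 0.05
n = math.ceil(math.log(2 / delta) / (2 * t * t))
sample = rng.choice(stream, size=n, replace=False)
print(f"sample size from the bound: {n}, sample mean: {sample.mean():.3f}, "
      f"true mean: {stream.mean():.3f}")
```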
|
463 |
Multidimensional Visualization of News Articles / Flerdimensionel Visualisering av Nyhetsartiklar. Åklint, Richard; Khan, Muhammad Farhan, January 2015 (has links)
Large data sets are difficult to visualize. For a human to find structure and understand the data, good visualization tools are required. In this project a technique is developed that makes it possible for a user to look at complex data at different scales. The approach is familiar from geographical data, where zooming in and out gives a good feel for the spatial relationships in map data or satellite images; for other types of data, however, it is not obvious how much scaling should be done. An experimental application is developed that visualizes data in multiple dimensions from a large news article database. Using it, the user can select multiple keywords on different axes and create a visualization containing the news articles that match those keywords. The user can move around the visualization: if the camera is far away from the document icons, they are clustered into red coloured spheres; if the user moves the camera closer, the clusters break up into single document icons; and if the camera is very close to a document icon, it is possible to read the news article.
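A minimal sketch of the distance-dependent clustering behaviour described above: documents near the camera are rendered as individual icons, while far-away documents are grouped into cluster spheres. The grid-cell clustering, thresholds, and random document positions are illustrative assumptions, not the experimental application's actual logic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Document icons positioned by their keyword coordinates (illustrative data).
docs = rng.uniform(0, 100, size=(500, 3))

def level_of_detail(camera_pos, docs, near=30.0, cell=10.0):
    """Return individual icons near the camera and cluster spheres far away."""
    dist = np.linalg.norm(docs - camera_pos, axis=1)
    near_icons = docs[dist < near]                 # rendered as single documents
    far_docs = docs[dist >= near]
    # Group far documents into grid cells and represent each cell by one sphere.
    cells = np.floor(far_docs / cell).astype(int)
    spheres = {}
    for key, pos in zip(map(tuple, cells), far_docs):
        spheres.setdefault(key, []).append(pos)
    cluster_spheres = [(np.mean(v, axis=0), len(v)) for v in spheres.values()]
    return near_icons, cluster_spheres

icons, spheres = level_of_detail(np.array([50.0, 50.0, 50.0]), docs)
print(len(icons), "individual icons,", len(spheres), "cluster spheres")
```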
|
464 |
Middleware for online scientific data analytics at extreme scale. Zheng, Fang, 22 May 2014 (has links)
Scientific simulations running on High End Computing machines in domains like Fusion, Astrophysics, and Combustion now routinely generate terabytes of data in a single run, and these data volumes are only expected to increase. Since such massive simulation outputs are key to scientific discovery, the ability to rapidly store, move, analyze, and visualize data is critical to scientists' productivity. Yet there are already serious I/O bottlenecks on current supercomputers, and the movement toward the Exascale is further accelerating this trend. This dissertation is concerned with the design, implementation, and evaluation of middleware-level solutions that enable high-performance, resource-efficient online data analytics for processing massive simulation output data at large scales. Online data analytics can effectively overcome the I/O bottleneck for scientific applications at large scales by processing data as it moves through the I/O path. Online analytics can extract valuable insights from live simulation output in a timely manner, better prepare data for subsequent deep analysis and visualization, and achieve improved performance and reduced data movement costs (both in time and in power) compared to the conventional post-processing paradigm. The thesis identifies the key challenges for online data analytics based on the needs of a variety of large-scale scientific applications, and proposes a set of novel and effective approaches to efficiently program, distribute, and schedule online data analytics along the critical I/O path. In particular, its solution approach i) provides a high performance data movement substrate to support parallel and complex data exchanges between simulation and online data analytics, ii) enables placement flexibility of analytics to exploit distributed resources, iii) for co-placement of analytics with simulation codes on the same nodes, it uses fine-grained scheduling to harvest idle resources for running online analytics with minimal interference to the simulation, and finally, iv) it supports scalable, efficient online spatial indices to accelerate data analytics and visualization on the deep memory hierarchies of high end machines. Our middleware approach is evaluated with leadership scientific applications in domains like Fusion, Combustion, and Molecular Dynamics, and on different High End Computing platforms. Substantial improvements are demonstrated in end-to-end application performance and in resource efficiency at scales of up to 16384 cores, for a broad range of analytics and visualization codes. The outcome is a useful and effective software platform for online scientific data analytics, facilitating large-scale scientific data exploration.
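As a toy illustration of the online-analytics idea (processing data as it moves through the I/O path rather than post-processing it from disk), the sketch below runs a "simulation" producer and an analytics consumer concurrently over a bounded staging buffer. It is a generic producer-consumer pattern, not the middleware developed in this dissertation.

```python
import queue
import threading
import numpy as np

buf = queue.Queue(maxsize=4)   # bounded staging area between the two stages
STOP = None

def simulation(steps=5, chunk=100_000):
    # Stand-in for a simulation emitting one output chunk per timestep.
    rng = np.random.default_rng(3)
    for step in range(steps):
        buf.put((step, rng.standard_normal(chunk)))
    buf.put(STOP)

def online_analytics():
    # Lightweight in-transit reduction: keep summaries, drop the raw data.
    while True:
        item = buf.get()
        if item is STOP:
            break
        step, data = item
        print(f"step {step}: mean={data.mean():+.4f} max={data.max():.3f}")

t_sim = threading.Thread(target=simulation)
t_ana = threading.Thread(target=online_analytics)
t_sim.start(); t_ana.start()
t_sim.join(); t_ana.join()
```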
|
465 |
Cost-effective and privacy-conscious cloud service provisioning: architectures and algorithms. Palanisamy, Balaji, 27 August 2014 (links)
Cloud Computing represents a recent paradigm shift that enables users to share and remotely access high-powered computing resources (both infrastructure and software/services) contained in off-site data centers, thereby allowing a more efficient use of hardware and software infrastructures. This growing trend in cloud computing, combined with the demands for Big Data and Big Data analytics, is driving the rapid evolution of datacenter technologies towards more cost-effective, consumer-driven, privacy-conscious, and technology-agnostic solutions.
This dissertation takes a systematic approach to developing system-level techniques and algorithms that tackle the challenges of large-scale data processing in the Cloud and of scaling and delivering privacy-aware services with anytime-anywhere availability. We analyze the key challenges in effective provisioning of Cloud services in the context of MapReduce-based parallel data processing, considering the concerns of cost-effectiveness, performance guarantees, and user privacy, and we develop a suite of solution techniques, architectures, and models to support cost-optimized and privacy-preserving service provisioning in the Cloud.
At the cloud resource provisioning tier, we develop a utility-driven MapReduce Cloud resource planning and management system called Cura for cost-optimally allocating resources to jobs. While existing services require users to select a number of complex cluster and job parameters and to use those potentially sub-optimal per-job configurations, Cura's resource management achieves global resource optimization in the cloud by minimizing cost and maximizing resource utilization. We also address the challenges of resource management and job scheduling for large-scale parallel data processing in the Cloud in the presence of the networking and storage bottlenecks commonly experienced in Cloud data centers. We develop Purlieus, a self-configurable, locality-based data and virtual machine management framework that enables MapReduce jobs to access their data, including all input, output, and intermediate data, either locally or from close-by nodes, achieving significant improvements in job response time.
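A minimal sketch of the data-locality idea behind a framework like Purlieus: when placing a job's virtual machines, prefer nodes that already hold the job's input blocks, then nodes in the same rack, then any node with capacity. The cluster layout, block map, and greedy scoring are illustrative assumptions, not the actual Purlieus algorithms.

```python
# Illustrative greedy, locality-aware VM placement: prefer nodes that already
# hold the job's input blocks, then nodes in the same rack, then anywhere.
# The cluster layout and capacities are assumptions for this sketch only.

cluster = {                      # node -> (rack, free VM slots)
    "n1": ("rackA", 2), "n2": ("rackA", 1),
    "n3": ("rackB", 2), "n4": ("rackB", 2),
}
block_locations = {"b1": "n1", "b2": "n1", "b3": "n3"}   # input block -> node

def place_vms(job_blocks, vms_needed):
    free = {n: slots for n, (_, slots) in cluster.items()}
    racks = {n: r for n, (r, _) in cluster.items()}
    data_nodes = {block_locations[b] for b in job_blocks}
    data_racks = {racks[n] for n in data_nodes}

    def score(node):             # lower is better: data-local, rack-local, remote
        if node in data_nodes:
            return 0
        return 1 if racks[node] in data_racks else 2

    placement = []
    for node in sorted(free, key=score):
        while free[node] > 0 and len(placement) < vms_needed:
            placement.append(node)
            free[node] -= 1
    return placement

print(place_vms(["b1", "b2", "b3"], vms_needed=4))   # -> ['n1', 'n1', 'n3', 'n3']
```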
We then extend our cloud resource management framework to support privacy-preserving data access and efficient privacy-conscious query processing. Concretely, we propose and implement VNCache, an efficient solution for MapReduce analysis of cloud-archived log data for privacy-conscious enterprises. Through VNCache's seamless data streaming and prefetching model, Hadoop jobs begin execution as soon as they are launched, without requiring any a priori downloading. At the cloud consumer tier, we develop mix-zone based techniques for delivering anonymous cloud services to mobile users on the move through Mobimix, a novel road-network mix-zone based framework that enables real-time, location-based service delivery without compromising the content or location privacy of the consumers.
|
466 |
BIG DATA: From hype to reality. Danesh, Sabri, January 2014 (has links)
Big data is all of a sudden everywhere, and it is too big to ignore! It has been six decades since the computer revolution, four since the development of the microchip, and two since the rise of the modern Internet. More than a decade after the 90s ".com" fizz, can Big Data be the next Big Bang? Big data reveals part of our daily lives and has the potential to solve virtually any problem for a better urbanized globe. Big Data sources are also very interesting from an official statistics point of view. The purpose of this paper is to explore the conceptions of big data and the opportunities and challenges associated with using it, especially in official statistics. "A petabyte is the equivalent of 1,000 terabytes, or a quadrillion bytes. One terabyte is a thousand gigabytes. One gigabyte is made up of a thousand megabytes. There are a thousand thousand—i.e., a million—petabytes in a zettabyte" (Shaw 2014). And this is to be continued…
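For readers keeping track of the quoted figures, the decimal (SI) ladder of byte units can be restated in a few lines; this is just arithmetic, not material from the paper.

```python
# The decimal (SI) ladder behind the quoted figures, in bytes.
units = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]
for i, name in enumerate(units, start=1):
    print(f"1 {name}byte = 10^{3 * i} bytes")
# e.g. 1 petabyte = 10^15 bytes = 1,000 terabytes,
# and 1 zettabyte = 10^21 bytes = 1,000,000 petabytes.
```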
|
467 |
An artefact to analyse unstructured document data stores / by André Romeo Botes. Botes, André Romeo, January 2014 (has links)
Structured data stores have been the dominant technology for the past few decades. Although dominant, they lack the functionality to handle the 'Big Data' phenomenon. A new class of technology has recently emerged which stores unstructured data and can handle 'Big Data'. This study describes the development of an artefact to aid in the analysis of NoSQL document data stores in terms of relational database model constructs. Design science research (DSR) is the methodology followed in the study; it guides the understanding, design, and development of the problem, the artefact, and the solution. The study explores the existing literature on DSR, as well as on structured and unstructured data stores, and the literature review supplies the descriptive and prescriptive knowledge used in developing the artefact. The artefact is developed through a series of six activities derived from two DSR approaches. The problem domain is derived from the existing literature and from a real application environment (RAE): the reviewed literature provides a general problem statement, and a representative from NFM (the RAE) is interviewed in a situation analysis to provide a specific problem statement. An objective is formulated for the development of the artefact, and suggestions are made to address the problem domain in support of that objective. The artefact is designed and developed using the descriptive knowledge of structured and unstructured data stores, combined with prescriptive knowledge of algorithms, pseudo-code, continuous design and object-oriented design. It evolves through multiple design cycles into a final product that analyses document data stores in terms of relational database model constructs. The artefact is evaluated for acceptability and utility, which provides credibility and rigour to the research in the DSR paradigm: acceptability is demonstrated through simulation, and utility is evaluated in the RAE through a further interview with the NFM representative. Finally, the study is communicated by describing its findings, summarising the artefact and looking into future possibilities for research and application. / MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014
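To make the artefact's goal concrete, the sketch below shows one simple way a collection of schemaless JSON documents could be described in relational terms (candidate columns, observed types, and nullability). The sample documents and the mapping rules are illustrative assumptions; this is not the artefact developed in the study.

```python
# Illustrative only: infer a relational-style description (columns, types,
# nullability) from a small collection of schemaless JSON-like documents.
docs = [
    {"_id": 1, "name": "Alice", "age": 30, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
    {"_id": 3, "name": "Carol", "age": 27, "address": {"city": "Vanderbijlpark"}},
]

def describe(collection):
    columns = {}
    for doc in collection:
        for field, value in doc.items():
            columns.setdefault(field, set()).add(type(value).__name__)
    total = len(collection)
    for field in sorted(columns):
        present = sum(1 for d in collection if field in d)
        nullable = "NULL" if present < total else "NOT NULL"
        print(f"{field:<10} types={sorted(columns[field])}  {nullable}")

describe(docs)
```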
|
468 |
The fast multipole method at exascale. Chandramowlishwaran, Aparna, 13 January 2014 (has links)
This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for current and future systems. We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. We target the FMM because it is broadly applicable to a variety of scientific particle simulations used to study electromagnetic, fluid, and gravitational phenomena, among others. Importantly, the FMM has asymptotically optimal time complexity with guaranteed approximation accuracy. As such, it is among the most attractive solutions for scalable particle simulation on future extreme-scale systems.
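As a hedged illustration of the far-field idea underlying FMM-type methods, the sketch below replaces the influence of a well-separated cluster of sources with a single monopole term (the total charge placed at the charge center) and compares it with direct summation at one distant target. The particle data are random and the truncation to a monopole is a deliberate simplification; a real FMM uses higher-order expansions, translation operators, and a tree decomposition.

```python
import numpy as np

rng = np.random.default_rng(4)

# A cluster of positive sources in a small box, and a distant evaluation point.
src = rng.uniform(0.0, 1.0, size=(1000, 3))       # positions inside [0,1]^3
q = rng.uniform(0.5, 1.5, size=1000)              # positive "charges"
target = np.array([20.0, 0.0, 0.0])               # well separated from the box

# Direct sum of the 1/r potential at the target.
direct = np.sum(q / np.linalg.norm(target - src, axis=1))

# Far-field (monopole) approximation: total charge placed at the charge center.
Q = q.sum()
center = (q[:, None] * src).sum(axis=0) / Q
approx = Q / np.linalg.norm(target - center)

print(f"direct   = {direct:.6f}")
print(f"monopole = {approx:.6f}  (relative error {(approx - direct) / direct:.2e})")
```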
We specifically address two key challenges. The first challenge is how to engineer fast code for today's platforms. We present the first in-depth study of multicore optimizations and tuning for the FMM, along with a systematic approach for transforming a conventionally parallelized FMM into a highly tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance in the face of multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexity analysis of the FMM that considers both intra- and inter-node communication costs. Using these models, we present results for choosing the optimal algorithmic tuning parameter. This analysis also yields the surprising prediction that although the FMM is largely compute-bound today, and therefore highly scalable on current systems, the trajectory of processor architecture designs, if there are no significant changes, could cause it to become communication-bound as early as the year 2015. This prediction suggests the utility of our analysis approach, which directly relates algorithmic and architectural characteristics, for enabling a new kind of high-level algorithm-architecture co-design.
To demonstrate the scientific significance of the FMM, we present two applications: the direct simulation of blood, a multi-scale, multi-physics problem, and large-scale biomolecular electrostatics. MoBo (Moving Boundaries) is the infrastructure for the direct numerical simulation of blood and comprises two key algorithmic components, of which the FMM is one. We were able to simulate blood flow using Stokesian dynamics on 200,000 cores of Jaguar, a petaflop system, and achieve a sustained performance of 0.7 Petaflop/s. The second application, which we propose as future work in this thesis, is biomolecular electrostatics, where we solve for the electrical potential using the boundary-integral formulation discretized with boundary element methods (BEM). The computational kernel in solving the large linear system is a dense matrix-vector multiply, which we propose to compute using our scalable FMM. We propose to begin with the two-dielectric problem, where the electrostatic field is calculated using two continuum dielectric media, the solvent and the molecule. This is only a first step towards solving biologically challenging problems that have more than two dielectric media, ion-exclusion layers, and solvent-filled cavities.
Finally, given the difficulty of producing high-performance scalable code, productivity is a key concern. Numerical algorithms are being redesigned to take advantage of the architectural features of emerging multicore processors; these new classes of algorithms express fine-grained asynchronous parallelism and hence reduce the cost of synchronization. We performed the first extensive performance study of a recently proposed parallel programming model called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. The CnC model is well suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems. Our CnC implementations were able to match, and in some cases even exceed, competing vendor-tuned and domain-specific library codes. We combine these two distinct research efforts by expressing the FMM in CnC; this approach tries to marry performance with the productivity that will be critical on future systems. Looking forward, we would like to extend this work to distributed-memory machines, specifically by implementing the FMM in the new distributed CnC (distCnC) to express fine-grained parallelism that would require significant effort in alternative models.
|
469 |
An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce. Liu, Xuan, 25 March 2014 (has links)
We propose a new ensemble algorithm, the meta-boosting algorithm, which enables the original AdaBoost algorithm to improve the decisions made by different weak learners using a meta-learning approach. Better accuracy is achieved since the algorithm reduces both bias and variance. However, higher accuracy also brings higher computational complexity, especially on big data. We therefore propose a parallelized meta-boosting algorithm, Parallelized-Meta-Learning (PML), using the MapReduce programming paradigm on Hadoop. Experimental results on the Amazon EC2 cloud computing infrastructure show that PML reduces the computational cost enormously while achieving error rates lower than those obtained on a single computer. Since MapReduce has the inherent weakness of not directly supporting iterations in an algorithm, our approach is a win-win method: it not only overcomes this weakness but also secures good accuracy. We also compare this approach with a contemporary algorithm, AdaBoost.PL.
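A minimal map/reduce-flavoured sketch of parallel ensemble learning: weak learners are trained independently on data partitions (the "map" phase) and their votes are combined (the "reduce" phase). It illustrates the parallelization pattern only; it is not the meta-boosting or PML algorithm, and the dataset, partition count, and scikit-learn stumps are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)

# "Map": train a weak learner (a decision stump) on each data partition.
def map_phase(partitions):
    return [DecisionTreeClassifier(max_depth=1).fit(Xp, yp) for Xp, yp in partitions]

# "Reduce": combine the weak learners' votes into the ensemble prediction.
def reduce_phase(models, X):
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

k = 8
idx = np.array_split(rng.permutation(len(y)), k)
partitions = [(X[i], y[i]) for i in idx]

models = map_phase(partitions)
pred = reduce_phase(models, X)
print("ensemble training accuracy:", (pred == y).mean())
```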
|
470 |
Nonparametric Inference for High Dimensional Data. Mukhopadhyay, Subhadeep, 03 October 2013 (has links)
Learning from data, especially 'Big Data', is becoming increasingly popular under names such as Data Mining, Data Science, Machine Learning, Statistical Learning and High Dimensional Data Analysis. In this dissertation we propose a new related field, which we call 'United Nonparametric Data Science': applied statistics with "just in time" theory. It integrates the practice of traditional and novel statistical methods for nonparametric exploratory data modeling, and it is applicable to teaching introductory statistics courses that are closer to the modern frontiers of scientific research. Our framework covers small data analysis (combining traditional and modern nonparametric statistical inference) and big, high-dimensional data analysis (through statistical modeling methods that extend our unified framework for small data analysis).
The first part of the dissertation (Chapters 2 and 3) is oriented by the goal of developing a new theoretical foundation to unify many cultures of statistical science and statistical learning methods using the mid-distribution function, custom-made orthonormal score functions, the comparison density, the copula density, and LP moments and comoments. It is also examined how this elegant theory yields solutions to many important applied problems. In the second part (Chapter 4) we extend the traditional empirical likelihood (EL), a versatile tool for nonparametric inference, to the high dimensional context. We introduce a modified version of the EL method that is computationally simpler and applicable to a large class of "large p, small n" problems, allowing p to grow faster than n. This is an important step in generalizing the EL in high dimensions beyond the p ≤ n threshold where the standard EL and its existing variants fail. We also present a detailed theoretical study of the proposed method.
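For reference, the sketch below computes the classical (univariate) empirical likelihood ratio statistic for a mean in Owen's formulation, the standard tool that Chapter 4 extends to high dimensions. The sample, the hypothesized mean, and the chi-square calibration are the textbook setup, not the modified high-dimensional method proposed in the dissertation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=80)     # illustrative sample, true mean 2.0

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for the mean (Owen's formulation)."""
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:        # mu outside the data's convex hull
        return np.inf
    # Weights w_i = 1 / (n (1 + lam * d_i)); lam solves sum d_i / (1 + lam d_i) = 0.
    lo, hi = -1.0 / d.max(), -1.0 / d.min() # range keeping all weights positive
    eps = 1e-8 * (hi - lo)
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    lam = brentq(g, lo + eps, hi - eps)
    return 2.0 * np.sum(np.log1p(lam * d))

stat = el_log_ratio(x, mu=2.0)
print(f"-2 log R(2.0) = {stat:.3f}, "
      f"p-value ~ {1 - chi2.cdf(stat, df=1):.3f} (chi-square, 1 df)")
```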
|