  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
451

DCMS: A Data Analytics and Management System for Molecular Simulation

Berrada, Meryem 16 March 2015 (has links)
Although molecular simulation (MS) systems represent a major research tool in multiple scientific and engineering fields, there is still a lack of systems for effective data management and fast data retrieval and processing. This is mainly due to the nature of MS, which generates a very large amount of data: a system usually encompasses millions of data points, and one query usually runs over tens of thousands of time frames. For this purpose, we designed and developed a new application, DCMS (a Data Analytics and Management System for Molecular Simulation), intended to speed up the process of new discovery in the medical and physics fields. DCMS stores simulation data in a database and provides users with a user-friendly interface to upload, retrieve, query, and analyze MS data without having to deal with any raw data. In addition, we created a new indexing scheme, the Time-Parameterized Spatial (TPS) tree, to accelerate query processing through indexes that take advantage of the locality relationships between atoms. The tree was implemented directly inside the PostgreSQL kernel, on top of the SP-GiST platform. Along with this new tree, two new data types were defined, as well as new algorithms for five point-data retrieval queries.
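To make the locality idea concrete, here is a minimal Python sketch, not taken from the thesis: atoms are bucketed into a uniform grid per time frame so that a spatial range query only visits nearby cells. The cell size, the grid scheme, and all names are illustrative assumptions; the actual TPS tree lives inside PostgreSQL on SP-GiST.

```python
from collections import defaultdict
from math import floor, sqrt

CELL = 5.0  # grid cell size (illustrative assumption, e.g. angstroms)

def build_frame_index(frames):
    """frames: {frame_id: [(atom_id, x, y, z), ...]} -> per-frame spatial grid."""
    index = {}
    for frame_id, atoms in frames.items():
        grid = defaultdict(list)
        for atom_id, x, y, z in atoms:
            key = (floor(x / CELL), floor(y / CELL), floor(z / CELL))
            grid[key].append((atom_id, x, y, z))
        index[frame_id] = grid
    return index

def range_query(index, frame_id, center, radius):
    """Return atoms within `radius` of `center` at a given frame, visiting only nearby cells."""
    cx, cy, cz = center
    reach = int(radius // CELL) + 1
    base = (floor(cx / CELL), floor(cy / CELL), floor(cz / CELL))
    grid = index[frame_id]
    hits = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                cell = (base[0] + dx, base[1] + dy, base[2] + dz)
                for atom_id, x, y, z in grid.get(cell, []):
                    if sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) <= radius:
                        hits.append(atom_id)
    return hits

# Toy data: one frame, three atoms; atom 3 is far away and its cell is never visited.
frames = {0: [(1, 0.0, 0.0, 0.0), (2, 3.0, 4.0, 0.0), (3, 40.0, 0.0, 0.0)]}
idx = build_frame_index(frames)
print(range_query(idx, 0, (0.0, 0.0, 0.0), 6.0))  # -> [1, 2]
```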
452

An Analysis of (Bad) Behavior in Online Video Games

Blackburn, Jeremy 04 September 2014 (has links)
This dissertation studies bad behavior at large scale using data traces from online video games. Video games provide a natural laboratory for exploring bad behavior due to their popularity, explicitly defined (programmed) rules, and a competitive nature that provides motivation for bad behavior. More specifically, we look at two forms of bad behavior: cheating and toxic behavior. Cheating is most simply defined as breaking the rules of the game to give one player an edge over another. In video games, cheating is most often accomplished using programs, or "hacks," that circumvent the rules implemented by game code. Cheating is a threat to the gaming industry in that it diminishes the enjoyment of fair players, siphons off money that is paid to cheat creators, and requires investment in anti-cheat technologies. Toxic behavior is a more nebulously defined term, but can be thought of as actions that violate social norms, especially those that harm other members of the society. Toxic behavior ranges from insults or harassment of players (which has clear parallels to the real world) to domain-specific instances such as repeatedly "suiciding" to help an enemy team. While toxic behavior has clear parallels to bad behavior in other online domains, e.g., cyberbullying, if left unchecked it has the potential to "kill" a game by driving away its players. We first present a distributed architecture and reference implementation for the collection and analysis of large-scale social data. Using this implementation we then study the social structure of over 10 million gamers collected from a planetary-scale online social network, about 720 thousand of whom have been labeled cheaters, finding a significant correlation between social structure and the probability of partaking in cheating behavior. We additionally collect over half a billion daily observations of the cheating status of these gamers. Using about 10 months of detailed server logs from a community-owned and -operated game server, we next analyze how relationships in the aforementioned online social network are backed by in-game interactions. Next, we use the insights gained to find evidence for a contagion process underlying the spread of cheating behavior and perform a data-driven simulation using mathematical models for contagion. Finally, we build a model using millions of crowdsourced decisions for predicting toxic behavior in online games. To the best of our knowledge, this dissertation presents the largest study of bad behavior to date. Our findings confirm theories about cheating and unethical behavior that have previously remained untested outside of controlled laboratory experiments or only with small, survey-based studies. We find that the intensity of interactions between players is a predictor of a future relationship forming. We provide statistically significant evidence for cheating as a contagion. Finally, we extract insights from our model for detecting toxic behavior on how human reviewers perceive the presence and severity of bad behavior.
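The abstract mentions a data-driven simulation with mathematical contagion models. The sketch below is not the dissertation's model; it is a minimal SI-style spread simulation on an invented friendship graph, with the adoption probability, round count, and graph all chosen purely for illustration of how a cheating label could propagate along social ties.

```python
import random

def simulate_contagion(neighbors, seeds, p_adopt=0.2, rounds=10, rng=None):
    """SI-style spread: each round, every labeled cheater independently
    converts each non-cheating friend with probability p_adopt."""
    rng = rng or random.Random(42)
    cheaters = set(seeds)
    for _ in range(rounds):
        newly = set()
        for u in cheaters:
            for v in neighbors.get(u, ()):
                if v not in cheaters and rng.random() < p_adopt:
                    newly.add(v)
        if not newly:          # spread has died out
            break
        cheaters |= newly
    return cheaters

# Toy friendship graph (adjacency lists); node 0 starts as a labeled cheater.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 5], 4: [2], 5: [3]}
print(sorted(simulate_contagion(graph, seeds=[0])))
```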
453

Ontology Driven Model for an Engineered Agile Healthcare System

Ramadoss, Balaji 14 February 2014 (has links)
Healthcare is in urgent need of an effective way to manage the complexity of its systems and to prepare quickly for immense changes in the economics of healthcare delivery and reimbursement. The Centers for Medicare & Medicaid Services (CMS) releases policies affecting inpatient and long-term care hospitals that directly affect reimbursement and payment rates. One of these policy changes, a quality-reporting program called Hospital Inpatient Quality Reporting (IQR), will affect approximately 3,400 acute-care and 440 long-term care hospitals. IQR sets guidelines and measures that carry financial incentives and penalties based on the quality of care provided. CMS, the largest healthcare payer, is aggressively promoting high quality of care by linking payment incentives to outcomes. With CMS assessing each hospital's performance by comparing its Quality Achievement and Quality Improvement scores, there is a growing need and demand to understand these quality measures in the context of patient care, data management and system integration. This focus on patient-centered quality care is difficult for healthcare systems due to the lack of a systemic view of the patient and patient care. This research uniquely addresses the hospital's need to meet these challenges by presenting a healthcare-specific framework and methodology for translating data on quality metrics into actionable processes and feedback to produce the desired quality outcome. The solution is based on a patient-care-level process ontology, rather than the technology itself, and creates a bridge that applies systems engineering principles to permit observation and control of the system. This is a transformative framework conceived to meet the needs of the rapidly changing healthcare landscape. Without this framework, healthcare is dealing with outcomes that are six to seven months old, meaning patients may not have been cared for effectively. In this research a framework and methodology called the Healthcare Ontology Based Systems Engineering Model (HOB-SEM) is developed to allow for observability and controllability of compartmental healthcare systems. HOB-SEM applies systems and controls engineering principles to healthcare using ontology as the method and the data lifecycle as the framework. The ontology view of patient-level system interaction and the framework to deliver data management and quality lifecycles enable the development of an agile systemic healthcare view for observability and controllability.
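HOB-SEM itself is described only at the architecture level in the abstract. The toy Python sketch below is entirely an assumed illustration, not the thesis framework: it shows the flavor of the observability idea by linking a (fictional) quality measure to the patient-care process events that feed it, so the measure can be computed from current data rather than six or seven months later.

```python
from dataclasses import dataclass

@dataclass
class QualityMeasure:
    name: str
    numerator_event: str    # care-process event that satisfies the measure
    denominator_event: str  # event that makes a patient eligible for it
    target: float           # desired compliance rate

# Illustrative, invented measure and event names (not an actual IQR measure definition).
measure = QualityMeasure("timely_discharge_instructions",
                         numerator_event="discharge_instructions_given",
                         denominator_event="heart_failure_discharge",
                         target=0.95)

def observe(measure, event_log):
    """Compute the current measure value from (patient_id, event) records."""
    eligible = {p for p, e in event_log if e == measure.denominator_event}
    satisfied = {p for p, e in event_log
                 if e == measure.numerator_event and p in eligible}
    rate = len(satisfied) / len(eligible) if eligible else 1.0
    return rate, rate >= measure.target   # feedback signal for corrective control

log = [("p1", "heart_failure_discharge"), ("p1", "discharge_instructions_given"),
       ("p2", "heart_failure_discharge")]
print(observe(measure, log))  # -> (0.5, False): the process needs correction now
```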
454

Performance Optimization Techniques and Tools for Data-Intensive Computation Platforms : An Overview of Performance Limitations in Big Data Systems and Proposed Optimizations

Kalavri, Vasiliki January 2014 (has links)
Big data processing has recently gained a lot of attention both from academia and industry. The term refers to tools, methods, techniques and frameworks built to collect, store, process and analyze massive amounts of data. Big data can be structured, unstructured or semi-structured. Data is generated from various different sources and can arrive in the system at various rates. In order to process these large amounts of heterogeneous data in an inexpensive and efficient way, massive parallelism is often used. The common architecture of a big data processing system consists of a shared-nothing cluster of commodity machines. However, even in such a highly parallel setting, processing is often very time-consuming. Applications may take up to hours or even days to produce useful results, making interactive analysis and debugging cumbersome. One of the main problems is that good performance requires both good data locality and good resource utilization. A characteristic of big data analytics is that the amount of data that is processed is typically large in comparison with the amount of computation done on it. In this case, processing can benefit from data locality, which can be achieved by moving the computation close to the data, rather than vice versa. Good utilization of resources means that the data processing is done with maximal parallelization. Both locality and resource utilization are aspects of the programming framework's runtime system. Requiring the programmer to work explicitly with parallel process creation and process placement is not desirable. Thus, providing optimizations that relieve the programmer of low-level, error-prone instrumentation while still achieving good performance is essential. The main goal of this thesis is to study, design and implement performance optimizations for big data frameworks. This work contributes methods and techniques to build tools for easy and efficient processing of very large data sets. It describes ways to make systems faster, by inventing ways to shorten job completion times. Another major goal is to facilitate application development in distributed data-intensive computation platforms and make big-data analytics accessible to non-experts, so that users with limited programming experience can benefit from analyzing enormous datasets. The thesis provides results from a study of existing optimizations in MapReduce and Hadoop-related systems. The study presents a comparison and classification of existing systems, based on their main contribution. It then summarizes the current state of the research field and identifies trends and open issues, while also providing our vision on future directions. Next, this thesis presents a set of performance optimization techniques and corresponding tools for data-intensive computing platforms. PonIC is a project that ports the high-level dataflow framework Pig on top of the data-parallel computing framework Stratosphere. The results of this work show that Pig can benefit greatly from using Stratosphere as the backend system and gain performance, without any loss of expressiveness. The work also identifies the features of Pig that negatively impact execution time and presents a way of integrating Pig with different backends. HOP-S is a system that uses in-memory random sampling to return approximate, yet accurate query answers. It uses a simple, yet efficient random sampling technique implementation, which significantly improves the accuracy of online aggregation.
The thesis also presents an optimization that exploits computation redundancy in analysis programs, and m2r2, a system that stores intermediate results and uses plan matching and rewriting in order to reuse results in future queries. Our prototype on top of the Pig framework demonstrates significantly reduced query response times. Finally, it presents an optimization framework for iterative fixed points, which exploits asymmetry in large-scale graph analysis. The framework uses a mathematical model to explain several optimizations and to formally specify the conditions under which optimized iterative algorithms are equivalent to the general solution.
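HOP-S is described as answering queries approximately from an in-memory random sample. The sketch below is not HOP-S; it is a minimal illustration of the underlying idea: estimate an aggregate from a uniform sample and attach a normal-approximation confidence interval, which tightens as the sample fraction grows. Sample fraction, confidence level, and the synthetic data are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

def approximate_sum(values, sample_fraction=0.05, z=1.96, rng=None):
    """Estimate sum(values) from a uniform random sample, with a ~95% CI."""
    rng = rng or random.Random(0)
    n = len(values)
    k = max(2, int(n * sample_fraction))
    sample = rng.sample(values, k)
    estimate = n * mean(sample)
    # Normal-approximation half-width, scaled up to the full population.
    half_width = z * n * stdev(sample) / sqrt(k)
    return estimate, (estimate - half_width, estimate + half_width)

data_rng = random.Random(1)
data = [data_rng.uniform(0, 100) for _ in range(100_000)]
estimate, (lo, hi) = approximate_sum(data)
print(f"estimate={estimate:.0f}, 95% CI=({lo:.0f}, {hi:.0f}), true={sum(data):.0f}")
```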
455

Efficient and Private Processing of Analytical Queries in Scientific Datasets

Kumar, Anand 01 January 2013 (has links)
Large amounts of data are generated by applications used in basic-science research and development. The size of the data introduces great challenges in storage, analysis and preserving privacy. This dissertation proposes novel techniques to efficiently analyze the data and to reduce storage space requirements through a data compression technique, while preserving privacy and providing data security. We present an efficient technique to compute an analytical query called the spatial distance histogram (SDH) using spatiotemporal properties of the data. Special spatiotemporal properties present in the data are exploited to process SDH efficiently on the fly. General-purpose graphics processing units (GPGPU, or just GPU) are employed to further boost the performance of the algorithm. The size of the data generated in scientific applications poses problems of disk space requirements, input/output (I/O) delays and data transfer bandwidth requirements. These problems are addressed by applying the proposed compression technique. We also address the issue of preserving privacy and security in scientific data by proposing a security model. The security model monitors user queries input to the database that stores and manages scientific data. Outputs of user queries are also inspected to detect privacy breaches. Privacy policies are enforced by the monitor to allow only those queries and results that satisfy data-owner-specified policies.
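For readers unfamiliar with the query itself, here is a brute-force reference sketch of a spatial distance histogram in plain Python, written for this listing rather than taken from the dissertation: it counts atom pairs per distance bucket. The dissertation's contribution is precisely to avoid this O(n^2) enumeration by exploiting spatiotemporal properties and GPUs; the bucket width and the toy atom positions below are illustrative assumptions.

```python
from itertools import combinations
from math import dist

def spatial_distance_histogram(points, bucket_width, max_distance):
    """Brute-force SDH: count point pairs falling into each distance bucket.
    Reference implementation only; not the thesis algorithm."""
    n_buckets = int(max_distance / bucket_width) + 1
    histogram = [0] * n_buckets
    for p, q in combinations(points, 2):
        d = dist(p, q)
        if d <= max_distance:
            histogram[int(d / bucket_width)] += 1
    return histogram

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
print(spatial_distance_histogram(atoms, bucket_width=1.0, max_distance=10.0))
```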
456

Modeling Large Social Networks in Context

Ho, Qirong 01 July 2014 (has links)
Today’s social and internet networks contain millions or even billions of nodes, and copious amounts of side information (context) such as text, attribute, temporal, image and video data. A thorough analysis of a social network should consider both the graph and the associated side information, yet we also expect the algorithm to execute in a reasonable amount of time on even the largest networks. Towards the goal of rich analysis on societal-scale networks, this thesis provides (1) modeling and algorithmic techniques for incorporating network context into existing network analysis algorithms based on statistical models, and (2) strategies for network data representation, model design, algorithm design and distributed multi-machine programming that, together, ensure scalability to very large networks. The methods presented herein combine the flexibility of statistical models with key ideas and empirical observations from the data mining and social networks communities, and are supported by software libraries for cluster computing based on original distributed systems research. These efforts culminate in a novel mixed-membership triangle motif model that easily scales to large networks with over 100 million nodes on just a few cluster machines, and can be readily extended to accommodate network context using the other techniques presented in this thesis.
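The mixed-membership triangle motif model works from counts of triangle motifs among node triples. As a point of reference only (this is not the thesis code or model), the short sketch below enumerates closed triangles in a small undirected graph, i.e. the raw statistic such a model consumes; the example edge list is invented.

```python
from itertools import combinations

def count_triangles(edges):
    """Count closed triangles in an undirected graph given as an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    triangles = 0
    for node, neighbors in adjacency.items():
        for a, b in combinations(neighbors, 2):
            if b in adjacency[a]:   # the two neighbors are themselves connected
                triangles += 1
    return triangles // 3           # each triangle is counted once per corner

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
print(count_triangles(edges))  # -> 2
```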
457

Efficient Skyline Computation with MapReduce

陳家慶, Chen, Chia Ching Unknown Date (has links)
With the big data issue being taken seriously today, more and more big data analyses are processed with MapReduce. In database querying, the skyline query is a common decision-analysis method whose purpose is to help users find the records whose values in each dimension are close to the user's query conditions. In the past, when the data set was large or the query involved many dimensions, query processing was often inefficient. Therefore, this study presents an efficient method for processing skyline queries over large data sets with MapReduce. Experimental results show that our method is more efficient than previous methods.
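A skyline query keeps only the points that no other point dominates in every dimension. The sketch below is not the thesis implementation; it only mimics the MapReduce pattern in plain Python by computing local skylines per partition ("map") and merging them ("reduce"), under the assumption that smaller values are preferred in every dimension. The hotel example data is invented.

```python
def dominates(p, q):
    """p dominates q if p is <= q in every dimension and strictly < in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def local_skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def skyline_mapreduce(points, n_partitions=3):
    # "Map": compute a skyline within each partition independently.
    partitions = [points[i::n_partitions] for i in range(n_partitions)]
    candidates = [p for part in partitions for p in local_skyline(part)]
    # "Reduce": merge the candidate skylines into the global skyline.
    return local_skyline(candidates)

hotels = [(50, 8), (60, 2), (70, 10), (40, 9), (45, 7)]  # (price, distance to beach)
print(skyline_mapreduce(hotels))  # -> [(60, 2), (40, 9), (45, 7)]
```

This works because dominance is transitive: any point dominated globally is removed either in its own partition or when its (undominated) dominator meets it in the merge step.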
458

Late radiation effects in radiotherapy : changes in the biomechanical properties of normal skin, and surgically produced lesions after X irradiation measured in vivo and in vitro

Baker, Mark Ralph January 1993 (has links)
No description available.
459

PRIVACY PRESERVING DATA MINING FOR NUMERICAL MATRICES, SOCIAL NETWORKS, AND BIG DATA

Liu, Lian 01 January 2015 (has links)
Motivated by increasing public awareness of possible abuse of confidential information, which is considered a significant hindrance to the development of the e-society and of the medical and financial markets, a privacy-preserving data mining framework is presented so that data owners can carefully process data in order to preserve confidential information and guarantee information functionality within an acceptable boundary. First, among many privacy-preserving methodologies, a class of data perturbation methods, a group of popular techniques for achieving a balance between data utility and information privacy, add a noise signal that follows a statistical distribution to an original numerical matrix. With the help of analysis in the eigenspace of the perturbed data, the potential privacy vulnerability of a popular data perturbation method is analyzed in the presence of very little information leakage in privacy-preserving databases. The vulnerability to very little data leakage is theoretically proved and experimentally illustrated. Second, in addition to numerical matrices, social networks have played a critical role in the modern e-society. Security and privacy in social networks receive a lot of attention because of recent security scandals among some popular social network service providers. The need to protect confidential information from being disclosed therefore motivates us to develop multiple privacy-preserving techniques for social networks. Affinities (or weights) attached to edges are private and can lead to personal security leakage. To protect the privacy of social networks, several algorithms are proposed, including Gaussian perturbation, a greedy algorithm, and a probability random walking algorithm. They can quickly modify original data in a large-scale setting to satisfy different privacy requirements. Third, the era of big data is approaching in both industry and academia, as the quantity of collected data increases exponentially. Three issues are studied in the age of big data with privacy preservation: obtaining high confidence in the accuracy of specific differentially private queries; speedily and accurately updating a private summary of a binary stream with I/O-awareness; and launching a mutual private information retrieval for big data. All three issues are handled by two core backbones, differential privacy and the Chernoff bound.
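The first family of techniques above perturbs a numerical matrix with statistically distributed noise. The following minimal sketch shows plain Gaussian perturbation of a small matrix; the noise scale and the matrix values are invented for illustration, and this is the generic scheme whose eigenspace vulnerability the dissertation analyzes, not the dissertation's own code.

```python
import random

def gaussian_perturb(matrix, sigma=1.0, seed=0):
    """Return a copy of `matrix` with i.i.d. Gaussian noise added to every entry."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in matrix]

original = [[12.0, 7.5, 3.2],
            [ 4.1, 9.9, 8.8]]
released = gaussian_perturb(original, sigma=0.5)  # published in place of the original
for row in released:
    print([round(v, 2) for v in row])
```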
460

Multidimensional Visualization of News Articles

Åklint, Richard, Khan, Muhammad Farhan January 2015 (has links)
Large data sets are difficult to visualize. For a human to find structures in and understand the data, good visualization tools are required. In this project, a technique is developed that makes it possible for a user to look at complex data at different scales. This approach is familiar from geographical data, where zooming in and out gives a good feeling for the spatial relationships in map data or satellite images. However, for other types of data it is not obvious how much scaling should be done. In this project, an experimental application is developed that visualizes data in multiple dimensions from a large news-article database. Using this application, the user can select multiple keywords on different axes and create a visualization containing the news articles matching those keywords. The user is able to move around the visualization. If the camera is far away from the document icons, they are clustered using red coloured spheres. If the user moves the camera closer to the clusters, they pop apart into single document icons. If the camera is very close to the document icons, it is possible to read the news articles.
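The clustering-by-distance behaviour can be illustrated with a small level-of-detail rule. The sketch below is an assumed illustration rather than the thesis application: nearby document icons are grouped, and each group is drawn as a single cluster sphere when the camera is far away or as individual icons when it is close. All thresholds and positions are invented.

```python
from math import dist

def level_of_detail(doc_positions, camera, cluster_radius=5.0, far_threshold=50.0):
    """Group nearby documents; draw a cluster sphere when the camera is far, icons when near."""
    clusters, assigned = [], set()
    docs = list(doc_positions.items())
    for i, (doc_a, pos_a) in enumerate(docs):
        if doc_a in assigned:
            continue
        group = [doc_a]
        assigned.add(doc_a)
        for doc_b, pos_b in docs[i + 1:]:
            if doc_b not in assigned and dist(pos_a, pos_b) <= cluster_radius:
                group.append(doc_b)
                assigned.add(doc_b)
        clusters.append((pos_a, group))
    drawing = []
    for center, group in clusters:
        if dist(camera, center) > far_threshold and len(group) > 1:
            drawing.append(("red_sphere", len(group), center))  # far view: cluster
        else:
            drawing.extend(("icon", doc) for doc in group)       # near view: icons
    return drawing

docs = {"a1": (0, 0, 0), "a2": (1, 1, 0), "a3": (2, 0, 1), "b1": (100, 0, 0)}
print(level_of_detail(docs, camera=(200, 0, 0)))  # far: one cluster of 3 plus a lone icon
print(level_of_detail(docs, camera=(3, 0, 0)))    # near: individual icons
```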
