About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
91

A paging scheme for pointer-based quadtrees

Brown, Patrick R. 06 October 2009 (has links)
The quadtree is a family of data structures that organize spatial data using recursive subdivision. A pointer-based quadtree uses an explicit tree structure to represent the subdivision, while a linear quadtree holds a sorted list of records corresponding to the leaves of the tree. Small quadtrees are typically represented with pointers, since this leads to simpler algorithms; historically, however, linear quadtrees have been used to represent larger data sets. The primary reason is that linear quadtrees are easily organized on pages in disk files. In addition, linear quadtrees were thought to require less space than pointer-based quadtrees. Though pointer-based quadtrees have many other advantages, there has still been much interest in the linear quadtree. This thesis presents a pointer-based representation for quadtrees called the paged-pointer quadtree, which overcomes both of the historical advantages of the linear quadtree. It partitions the nodes of a pointer-based quadtree into pages, stores these nodes in order, and manages pages using B-tree techniques. In addition, a paged-pointer quadtree always requires less space than the corresponding linear quadtree. Our representation overcomes the performance problems associated with storing traditional pointer-based quadtrees on disk; as a result, our implementation outperforms highly optimized systems based on linear quadtrees. / Master of Science
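To make the paging idea concrete, here is a minimal sketch of how a pointer-based quadtree's nodes might be grouped onto fixed-capacity disk pages in preorder, so that nearby nodes tend to share a page; the class, constant, and function names are hypothetical, not the thesis's implementation:

```python
# Minimal sketch of a paged pointer-based quadtree (illustrative only).

PAGE_CAPACITY = 4  # nodes per page; real systems size this to a disk block

class QuadNode:
    """One node of a region quadtree: either a leaf with a value or
    an internal node with four children (NW, NE, SW, SE)."""
    def __init__(self, value=None):
        self.value = value          # leaf payload, None for internal nodes
        self.children = [None] * 4  # NW, NE, SW, SE
        self.page_id = None         # disk page this node is assigned to

def assign_pages(root):
    """Partition nodes into pages in preorder, so nodes that are close
    in the tree tend to share a page -- the key idea behind keeping a
    pointer-based quadtree efficient on disk."""
    pages, current = [], []
    def visit(node):
        nonlocal current
        if node is None:
            return
        if len(current) == PAGE_CAPACITY:   # page full: start a new one,
            pages.append(current)           # analogous to a B-tree page split
            current = []
        node.page_id = len(pages)
        current.append(node)
        for child in node.children:
            visit(child)
    visit(root)
    if current:
        pages.append(current)
    return pages

# Example: a root with four leaf children spans two pages of capacity 4.
root = QuadNode()
root.children = [QuadNode(value=v) for v in "ABCD"]
print(len(assign_pages(root)))  # -> 2
```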
92

A study of data

Tiao, Hsao-Ying Jennifer January 2010 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries
93

Avoiding data inconsistency problems in the conceptual design of data bases: a semantic approach

Leasure, David Elden. January 1984 (has links)
Call number: LD2668 .T4 1984 L42 / Master of Science
94

Enhanced font services for X Window system

Tsang, Pong-fan, Dex, 曾邦勳 January 2000 (has links)
Published or final version / Computer Science and Information Systems / Master / Master of Philosophy
95

Development and testing of data structures for the CPM/MRP methodology.

Ardalan, Alireza January 1983 (has links)
A major purpose of this dissertation is to design and develop data structures for the Critical Path Method-Material Requirements Planning (CPM/MRP) methodology. The data structures developed consider the trade-off between the processing time required to perform operations on the data structures and the computer memory needed to store the data. The CPM/MRP technique combines the capabilities of the critical path method and material requirements planning. The critical path method is a project planning and control technique that schedules projects subject to technological sequence constraints and activity durations. When combined with material requirements planning, the methodology explicitly considers both the resources required by the activities comprising the project and the lead time to acquire those resources. CPM/MRP contains algorithms for project scheduling subject to technological sequence and resource constraints. The early start and late start algorithms find feasible early start and late start schedules for both activity start times and resource order release times. The major drawback of the FORTRAN IV computer program that incorporated the CPM/MRP algorithms was its tremendous memory requirement, which prohibited application of CPM/MRP to large projects. The data structures developed in this dissertation are efficient with respect to both memory utilization and processing time. To design the data structures, the characteristics of storable and non-storable resources and the necessary operations within each resource category are studied. Another purpose of this dissertation is to develop an algorithm that schedules hospital operating rooms for surgical procedures subject to resource constraints, in order to increase operating suite utilization. Since the major reason for low operating suite utilization is the lack of required resources when and where they are needed, the CPM/MRP concept is applied to scheduling surgeries. The late start algorithm outlined in this dissertation schedules surgeries and the resources required for each surgery. The data structures and the surgery scheduling algorithm are incorporated into a FORTRAN IV computer program, which has been tested with actual data gathered from a hospital. The results met the objectives of both low memory utilization and low computation time.
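As an illustration of the scheduling side of CPM/MRP, the following sketch computes early start times from precedence constraints and derives material order release times by subtracting procurement lead times; the data and function names are invented for the example, and the dissertation's FORTRAN IV data structures are not reproduced here:

```python
# Hedged sketch of the CPM early-start pass extended with MRP-style
# order releases (illustrative; activities and durations are made up).

from collections import defaultdict

def early_start_schedule(durations, predecessors, lead_times):
    """durations: activity -> duration; predecessors: activity -> list of
    activities that must finish first; lead_times: activity -> procurement
    lead time for its material. Returns (start times, order release times)."""
    # CPM pass: an activity can start once all its predecessors finish.
    start = {}
    def es(a):
        if a not in start:
            start[a] = max((es(p) + durations[p] for p in predecessors[a]),
                           default=0)
        return start[a]
    for a in durations:
        es(a)
    # MRP link: release each material order one lead time before the start.
    release = {a: start[a] - lead_times[a] for a in durations}
    return start, release

durations = {"A": 3, "B": 2, "C": 4}
preds = defaultdict(list, {"B": ["A"], "C": ["A", "B"]})
leads = {"A": 1, "B": 5, "C": 2}
print(early_start_schedule(durations, preds, leads))
# starts: A=0, B=3, C=5; releases: A=-1, B=-2, C=3
# (a negative release means the order must go out before project day 0)
```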
96

Distributed Frameworks Towards Building an Open Data Architecture

Venumuddala, Ramu Reddy 05 1900 (has links)
Data is everywhere. Advances in digital technology and social media, and the ease with which application services interact with a variety of systems, generate tremendous volumes of data. Because the services are so varied, data is no longer restricted to structured formats such as text; it includes unstructured content such as social media posts, videos, and images. Generated data is of no use unless it is stored and analyzed to derive value. Traditional database systems come with limitations on data format and schema, access rates, and storage sizes. Hadoop is an Apache open-source distributed framework that reliably stores huge datasets of differently formatted data on its file system, the Hadoop Distributed File System (HDFS), and processes the data stored on HDFS using the MapReduce programming model. This thesis is about building a data architecture using Hadoop and its related open-source distributed frameworks to support a data-flow pipeline on low-cost commodity hardware. The data-flow components are data sourcing, storage management on HDFS, and a data access layer. The study also discusses a use case that exercises the architecture components: Sqoop, a framework for ingesting structured data from a database onto Hadoop, and Flume, used to ingest semi-structured streaming Twitter JSON data onto HDFS for analysis. The data sourced with Sqoop and Flume is analyzed using Hive for SQL-like analytics, and at the data access layer Hadoop is compared with Spark, an in-memory computing system. Significant differences in query execution performance are observed between the Hadoop and Spark frameworks. This integration supports ingesting huge volumes of streaming JSON data of varied structure and deriving better value-based analytics with Hive and Spark.
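As a hedged sketch of the data access layer described above, the following PySpark snippet reads Flume-style Twitter JSON from HDFS and runs a Hive-style SQL aggregation; the HDFS path and field names are hypothetical:

```python
# Sketch of the data-access layer: querying Flume-ingested Twitter JSON
# on HDFS with Spark SQL (paths and fields are assumptions, not the
# thesis's actual pipeline).

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tweet-analytics")
         .enableHiveSupport()     # lets the same query run against Hive tables
         .getOrCreate())

# Semi-structured JSON lands on HDFS via Flume; Spark infers the schema.
tweets = spark.read.json("hdfs:///data/flume/tweets/")
tweets.createOrReplaceTempView("tweets")

# The same SQL-like analytics could run in Hive; Spark keeps the working
# set in memory, which is where the query-time differences come from.
top_langs = spark.sql("""
    SELECT lang, COUNT(*) AS n
    FROM tweets
    GROUP BY lang
    ORDER BY n DESC
    LIMIT 10
""")
top_langs.show()
```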
97

Data mining heuristic-based malware detection for Android applications.

Unknown Date (has links)
The Google Android mobile phone platform is one of the dominant smartphone operating systems on the market. The open-source Android platform allows developers to take full advantage of the mobile operating system, but it also raises significant issues related to malicious applications (Apps). The popularity of the Android platform draws the attention of many developers, and it likewise attracts cybercriminals, who develop different kinds of malware to be inserted into the Google Android Market or other third-party markets disguised as safe applications. In this thesis, we propose to combine permissions, API (Application Program Interface) calls, and function calls to build a heuristic-based framework for detecting malicious Android Apps. In our design, permissions are extracted from each App's profile information, and APIs are extracted from the packed App file, using packages and classes to represent API calls. By using permissions, API calls, and function calls as features to characterize each App, we can build a classifier with data mining techniques to identify whether an App is potentially malicious. An inherent advantage of our method is that it does not require dynamic tracking of system calls; it uses only simple static analysis to find system functions in each App. In addition, our method generalizes to all mobile applications, because APIs and function calls are always present in mobile Apps. Experiments on real-world Apps with more than 1200 malware and 1200 benign samples validate the algorithm's performance. A research paper was published based on the work reported in this thesis: Naser Peiravian, Xingquan Zhu, Machine Learning for Android Malware Detection Using Permission and API Calls, in Proc. of the 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Washington, D.C., November 4-6, 2013. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2013.
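A minimal sketch of the static-feature pipeline this abstract describes might look as follows: permissions and API calls become binary features that feed a classifier. The feature vocabulary, sample apps, and choice of classifier below are illustrative assumptions, not the thesis's exact setup:

```python
# Hedged sketch: permissions and API calls as binary features for a
# malware classifier (toy data; not the thesis's actual feature set).

from sklearn.ensemble import RandomForestClassifier

VOCAB = ["android.permission.SEND_SMS",
         "android.permission.READ_CONTACTS",
         "android/telephony/SmsManager.sendTextMessage",
         "java/net/URL.openConnection"]

def to_vector(app_features):
    """1 if the app requests the permission / calls the API, else 0."""
    return [1 if f in app_features else 0 for f in VOCAB]

apps = [  # (statically extracted features, label: 1 = malicious, 0 = benign)
    ({"android.permission.SEND_SMS",
      "android/telephony/SmsManager.sendTextMessage"}, 1),
    ({"java/net/URL.openConnection"}, 0),
    ({"android.permission.READ_CONTACTS",
      "java/net/URL.openConnection"}, 0),
    ({"android.permission.SEND_SMS",
      "android.permission.READ_CONTACTS",
      "android/telephony/SmsManager.sendTextMessage"}, 1),
]

X = [to_vector(feats) for feats, _ in apps]
y = [label for _, label in apps]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([to_vector({"android.permission.SEND_SMS"})]))
```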
98

Summarizing static graphs and mining dynamic graphs. / CUHK electronic theses & dissertations collection

January 2011 (has links)
Besides finding changing areas based on counts of node and edge evolutions, a more interesting problem is to analyze the impact of these evolutions on graphs and find the regions that exhibit significant change when the evolutions happen. The more the relationships between nodes in a region differ, the more significant the region is. The problem is challenging because it is hard to define the extent of a changing region in a way that is closely tied to the actual evolutions. We formalize the problem using a similarity measure based on neighborhood random walks, and design an efficient algorithm that identifies the significant changing regions without recomputing all similarities. Meaningful examples in our experiments demonstrate the effectiveness of the algorithms. / Graph patterns can represent the complex structural relations among objects in applications across many domains. Managing and mining graph data, the subject of this thesis, are among the most important such tasks. We focus on two challenging problems: graph summarization and graph change detection. / In summarizing a collection of graphs, we study the problem of summarizing frequent subgraphs, since there is little need to summarize a collection of arbitrary graphs. The bottleneck in exploring and understanding frequent subgraphs is that they are numerous. A summary addresses this issue, so the goal of frequent subgraph summarization is to minimize the restoration errors of the structure and frequency information. The unique challenge comes from the fact that a subgraph can have multiple embeddings in a summarization template graph. We handle this by introducing a partial order between edges, which allows accurate structure and frequency estimation based on an independence probabilistic model. The proposed algorithm discovers k summarization templates in a top-down fashion to control the restoration error of frequencies within a threshold sigma; there is no restoration error in structure. Experiments on both real and synthetic graph datasets show that our framework can keep the frequency restoration error within 10% with a compact summarization model. / The objective of graph change detection is to discover the changing areas of graphs that evolve at high speed. The most-changing areas, called burst areas, are those with the highest number of evolutions (additions and deletions) of nodes and edges. We study finding the most burst areas in a stream of fast graph evolutions. We propose using a Haar wavelet tree to monitor an upper bound on the number of evolutions. Our approach monitors all potential changing areas of different sizes and incrementally computes the number of evolutions in those areas. The top-k burst areas are returned as soon as they are detected. Our solution handles a large number of evolutions in a short time, as the experimental results confirm. / The objective of graph summarization is to obtain a concise representation of a single large graph or a collection of graphs that is interpretable and suitable for analysis. A good summary can reveal hidden relationships between nodes in a graph. The key issue in summarizing a single graph is how to construct a high-quality, representative summary in the form of a super-graph. We propose an entropy-based unified model for measuring the homogeneity of the super-graph. The best summary in terms of homogeneity can be too large to explore, so using the unified model we relax three summarization criteria to obtain an approximately homogeneous summary of appropriate size. We propose both agglomerative and divisive algorithms for approximate summarization, as well as pruning techniques and heuristics for both algorithms to save computation. Experimental results confirm that our approaches efficiently generate high-quality summaries. / Liu, Zheng. / Advisers: Wai Lam; Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 133-141). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
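To illustrate the burst-detection idea, the sketch below maintains dyadic window sums (the quantities a Haar wavelet tree aggregates) over a stream of per-timestep evolution counts and reports the top-k windows; it is a simplified illustration of the idea, not the thesis algorithm:

```python
# Minimal sketch of burst detection over graph-evolution counts using
# dyadic windows, the building blocks of a Haar wavelet tree
# (illustrative only; the thesis monitors upper bounds incrementally).

import heapq

def dyadic_window_sums(counts, max_level):
    """counts[t] = number of node/edge evolutions at time t.
    Returns (window_length, start, total) for windows of length 1, 2, 4, ..."""
    sums = []
    for level in range(max_level + 1):
        w = 1 << level                      # window length 2^level
        for start in range(0, len(counts) - w + 1, w):
            sums.append((w, start, sum(counts[start:start + w])))
    return sums

def top_k_bursts(counts, k=3, max_level=3):
    """Windows with the most evolutions are the burst areas."""
    return heapq.nlargest(k, dyadic_window_sums(counts, max_level),
                          key=lambda s: s[2])

stream = [1, 0, 2, 9, 8, 1, 0, 1]           # evolutions per timestep
print(top_k_bursts(stream))                  # the spike around t=3..4 surfaces
```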
99

Superseding neighbor search on uncertain data.

January 2009 (has links)
Yuen, Sze Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves [44]-46). / Abstract also in Chinese. / Contents: Thesis Committee (p.i); Abstract (p.ii); Acknowledgement (p.iv); Chapter 1: Introduction (p.1); Chapter 2: Related Work (p.6), with 2.1 Nearest Neighbor Search on Precise Data (p.6) and 2.2 NN Search on Uncertain Data (p.8); Chapter 3: Problem Definitions and Basic Characteristics (p.11); Chapter 4: The Full-Graph Approach (p.16); Chapter 5: The Pipeline Approach (p.19), with 5.1 The Algorithm (p.20), 5.2 Edge Phase (p.24), 5.3 Pruning Phase (p.27), 5.4 Validating Phase (p.28), and 5.5 Discussion (p.29); Chapter 6: Extension (p.31); Chapter 7: Experiment (p.34), with 7.1 Properties of the SNN-core (p.34) and 7.2 Efficiency of Our Algorithms (p.38); Chapter 8: Conclusions and Future Work (p.42); Appendix A: List of Publications (p.43); Bibliography (p.44).
100

Bayesian Modeling Strategies for Complex Data Structures, with Applications to Neuroscience and Medicine

Lu, Feihan January 2018 (has links)
Bayesian statistical procedures use probabilistic models and probability distributions to summarize data, estimate unknown quantities of interest, and predict future observations. The procedures borrow strength from other observations in the dataset through prior distributions and/or hierarchical model specifications, and their posterior sampling techniques can handle issues, e.g., missing data, imputation, and the extraction of parameters (and their functional forms), that would otherwise be difficult to address with conventional methods. In this dissertation, we propose Bayesian modeling strategies to address various challenges arising in neuroscience and medicine. Specifically, we propose a sparse Bayesian hierarchical Vector Autoregressive (VAR) model to map human brain connectivity using multi-subject, multi-session functional magnetic resonance imaging (fMRI) data. We apply the same model to patient diary databases, focusing on patient-level prediction of medical conditions using posterior predictive samples. We also propose a Bayesian model with an augmented Markov chain Monte Carlo (MCMC) algorithm on repeat Electrical Stimulation Mappings (ESM) to evaluate the variability of localization in brain sites responsible for language function. We close by using Bayesian disproportionality analyses on spontaneous reporting system (SRS) databases for post-market drug safety surveillance, illustrating the caution required in real-world analysis and decision making.
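As a minimal illustration of the VAR component, the sketch below fits a VAR(1) coefficient matrix under a Gaussian shrinkage prior via its conjugate posterior mean; real fMRI models add sparsity and subject-level hierarchy, and nothing here reproduces the dissertation's model:

```python
# Minimal Bayesian VAR(1) sketch: posterior mean of the coefficient
# matrix under a Gaussian (ridge-like) prior, on simulated data.

import numpy as np

rng = np.random.default_rng(0)
T, d = 200, 3                        # timepoints, brain regions (toy sizes)
A_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.4, 0.0],
                   [0.0, 0.0, 0.3]])

# Simulate y_t = A y_{t-1} + noise
Y = np.zeros((T, d))
for t in range(1, T):
    Y[t] = A_true @ Y[t - 1] + rng.normal(scale=0.5, size=d)

X, Z = Y[:-1], Y[1:]                 # lagged predictors and targets
tau2, sigma2 = 1.0, 0.25             # prior variance, noise variance

# Conjugate posterior mean of the regression coefficients:
# B_post = (X'X + (sigma2/tau2) I)^{-1} X'Z, transposed so y_t = A y_{t-1}.
A_post = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(d), X.T @ Z).T
print(np.round(A_post, 2))           # shrunken estimate close to A_true
```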
