About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Management, visualisation & mining of quantitative proteomics data

Ahmad, Yasmeen January 2012 (has links)
Exponential data growth in the life sciences demands cross-discipline work that brings together computing and life sciences in a usable manner that can enhance knowledge and understanding in both fields. High-throughput approaches, advances in instrumentation, and the overall complexity of mass spectrometry data have made it impossible for researchers to manually analyse data using existing market tools. By applying a user-centred approach to effectively capture the domain knowledge and experience of biologists, this thesis has bridged the gap between computation and biology through software, PepTracker (http://www.peptracker.com). This software provides a framework for the systematic detection and analysis of proteins that can be correlated with biological properties to expand the functional annotation of the genome. The tools created in this study aim to place analysis capabilities back in the hands of biologists, who are the experts in evaluating their data. Another major advantage of the PepTracker suite is the implementation of a data warehouse, which manages and collates highly annotated experimental data from numerous experiments carried out by many researchers. This repository captures the collective experience of a laboratory, which can be accessed via user-friendly interfaces. Rather than viewing datasets as isolated components, this thesis explores the potential gained from collating datasets in a "super-experiment" ideology, leading to the formation of broad-ranging questions and promoting biology-driven lines of questioning. This has been uniquely implemented by integrating tools and techniques from the field of Business Intelligence with the life sciences, and successfully shown to aid in the analysis of proteomic interaction experiments. Having established a means of documenting a static proteomics snapshot of cells, the proteomics field is progressing towards understanding the extremely complex nature of cell dynamics. PepTracker facilitates this by providing the means to gather and analyse many protein properties to generate new biological insight, as demonstrated by the identification of novel protein isoforms.
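The "super-experiment" idea above is easy to picture as a data-collation step. Below is a minimal sketch, assuming pandas and using invented column and experiment names (the abstract does not show PepTracker's actual schema), of pooling per-experiment quantifications into one warehouse-style table and asking a cross-experiment question.

```python
# A hedged sketch of collating datasets into a "super-experiment" table.
# Column names (protein, ratio, experiment, researcher) are assumptions.
import pandas as pd

exp1 = pd.DataFrame({"protein": ["P01", "P02"], "ratio": [1.8, 0.4]})
exp2 = pd.DataFrame({"protein": ["P01", "P03"], "ratio": [2.1, 1.1]})

# Tag each dataset with its experiment metadata, then stack them.
exp1["experiment"], exp1["researcher"] = "exp1", "alice"
exp2["experiment"], exp2["researcher"] = "exp2", "bob"
warehouse = pd.concat([exp1, exp2], ignore_index=True)

# A cross-experiment question: which proteins recur across experiments?
hits = warehouse.groupby("protein")["ratio"].agg(["mean", "count"])
print(hits[hits["count"] > 1])
```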
92

Towards developing a goal-driven data integration framework for counter-terrorism analytics

Liu, Dapeng 01 January 2019 (has links)
Terrorist attacks can cause massive casualties and severe property damage, resulting in terrorism crises surging across the world; accordingly, counter-terrorism analytics that take advantage of big data have been attracting increasing attention. The knowledge and clues essential for analyzing terrorist activities are often spread across heterogeneous data sources, which calls for an effective data integration solution. In this study, employing the goal definition template in the Goal-Question-Metric approach, we design and implement an automated goal-driven data integration framework for counter-terrorism analytics. The proposed design elicits and ontologizes an input user goal of counter-terrorism analytics; recognizes goal-relevant datasets; and addresses semantic heterogeneity in the recognized datasets. Our proposed design, following the design science methodology, presents a theoretical framing for on-demand data integration designs that can accommodate diverse and dynamic user goals of counter-terrorism analytics and output integrated data tailored to these goals.
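As a rough illustration of the goal definition template mentioned above, the sketch below encodes the standard Goal-Question-Metric template fields as a data structure and performs a toy tag-overlap check for goal-relevant datasets. The field names follow the generic GQM template; the relevance check and all dataset names are invented stand-ins for the ontology-based matching the thesis actually proposes.

```python
# A hedged sketch of a GQM goal definition as a data structure.
from dataclasses import dataclass

@dataclass
class GQMGoal:
    analyze: str            # object of study
    for_purpose_of: str     # purpose
    with_respect_to: str    # quality focus
    from_viewpoint_of: str  # viewpoint
    in_context_of: str      # environment

goal = GQMGoal(
    analyze="financing networks",
    for_purpose_of="identification",
    with_respect_to="links between suspects",
    from_viewpoint_of="intelligence analyst",
    in_context_of="counter-terrorism analytics",
)

# Toy relevance check: a dataset is goal-relevant if its tags overlap the
# goal's focus terms (the actual design ontologizes the goal instead).
datasets = {"incident_reports": {"attacks", "casualties"},
            "finance_records": {"transactions", "links", "suspects"}}
focus = set(goal.with_respect_to.split())
print([name for name, tags in datasets.items() if tags & focus])
```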
93

Filecules: A New Granularity for Resource Management in Grids

Doraimani, Shyamala 26 March 2007 (has links)
Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments are run in production mode. In the most active Grid community, the scientific community, jobs are data and compute intensive. Scientific Grid deployments offer the opportunity for revisiting and perhaps updating traditional beliefs related to workload models, and hence for reevaluating traditional resource management techniques. In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. We perform a detailed workload characterization which led us to propose a new data abstraction, the filecule, that groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management. In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), uses a function that assigns a relative value to a job. We wrote a cache simulator that uses the same technique of combining cache replacement with job reordering to evaluate and compare quantitatively a set of alternative solutions. These solutions are combinations of Least Recently Used (LRU) and GRV from the cache replacement space with First-Come First-Served (FCFS) and the GRV-specific job reordering from the scheduling space. Using a real workload from the DZero Experiment at Fermi National Accelerator Laboratory, we measure and compare performance based on byte hit rate, cache change, job waiting time, job waiting queue length, and scheduling overhead. Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement with job scheduling based on the relative request value. This technique incurs lower data transfer costs than the GRV algorithm and shorter job processing delays than FCFS. We also propose using filecules for data management to further improve the results obtained from the above LRU and GRV combination. We show that filecules can be identified in practical situations and demonstrate how the accuracy of filecule identification influences caching performance.
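A minimal sketch of the two ingredients the abstract combines: grouping files that are requested by the same jobs into filecules, and applying LRU replacement at filecule rather than file granularity. The grouping rule here (identical access sets over a toy trace) is a simplification of the thesis's workload-driven characterization.

```python
# A hedged sketch: filecule grouping plus LRU eviction at filecule granularity.
from collections import OrderedDict, defaultdict

# Toy trace: job -> set of files it touched
trace = {"job1": {"a", "b"}, "job2": {"a", "b"}, "job3": {"c"}}

# Files accessed by exactly the same set of jobs form one filecule.
by_jobs = defaultdict(set)
for job, files in trace.items():
    for f in files:
        by_jobs[f].add(job)
filecules = defaultdict(list)
for f, jobs in by_jobs.items():
    filecules[frozenset(jobs)].append(f)
print(list(filecules.values()))  # e.g. [['a', 'b'], ['c']]

# LRU over filecules: evict the least recently used group as one unit.
cache = OrderedDict()
def touch(filecule_id, capacity=2):
    """Record a use of a filecule; evict LRU filecules wholesale if needed."""
    cache.pop(filecule_id, None)
    cache[filecule_id] = True
    while len(cache) > capacity:
        cache.popitem(last=False)

for fc in filecules:
    touch(fc)
```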
94

Benefits of Simulation Models in Product Data Management System : A pilot study with cooling system simulation models

Köhn, Elvira January 2019 (has links)
Product development today handles increasingly complex products, and to be able to compete on the current market companies need an effective PLM/PDM system to manage the lifecycle, models, and data connected to their products. Three of the factors for success in product development are time, cost, and quality, which need to be supported by the processes and tools used in a project. Product development often uses both physical and analytical prototypes. The analytical method of simulation is an important element in product development that has started to shift from being a validation and verification tool at the last stages of the development process to being more included in the early stages. Simulation models often generate a large amount of data, and because of this, storing and managing them can be troublesome. Therefore, there is a need for closer integration between design and simulation. The purpose of the thesis is to investigate what a PLM system contributes to a company's product development, and why and how simulation models can be connected to the company's PDM system. The methods used during the study were literature reviews, interviews, workshops, and a survey. The results show that in the literature the benefits of using a PLM and PDM system are connected to the factors for successful product development, which are time, quality, and cost, while the employees consider traceability, reuse of data, and storage to be the most important benefits. Simulation models are beneficial to the product development process and should therefore be stored in a way that preserves the connection between the simulation model and the design model. For the employees, the highest-ranking benefits of adding simulation models to the system are traceability, reuse of simulation models, and control over simulation models. A manual for how the simulation engineer can utilize the system and add simulation models to it is presented.
95

Long term preservation of textual information in the AEC sector

Bader, Refad, University of Western Sydney, College of Health and Science, School of Computing and Mathematics January 2007 (has links)
As we are living in a fast-changing technological era, the hardware and software required to read electronic documents continue to evolve, and the technology may be so different in the near future that it no longer works on older documents. Preserving information over the long term is a well-known problem. This research investigates the potential of using XML to improve the long-term preservation of textual information resulting from AEC (Architectural, Engineering and Construction) projects. It identifies and analyses the issues involved in handling information over a long period of time in this sector and maps out a strategy to solve those issues. The main focus is not the centralized preservation of documents, but rather the preservation of segments of information scattered between different decision makers in the AEC sector. Finally, a methodology based on the use of XML is presented for exchanging information between different decision makers, collecting related information from them, and preserving such information in the AEC sector for the long term. / Master of Science (Hons)
96

Management of Time Series Data

Matus Castillejos, Abel, n/a January 2006 (has links)
Every day large volumes of data are collected in the form of time series. Time series are collections of events or observations, predominantly numeric in nature, sequentially recorded on a regular or irregular time basis. Time series are becoming increasingly important in nearly every organisation and industry, including banking, finance, telecommunication, and transportation. Banking institutions, for instance, rely on the analysis of time series for forecasting economic indices, elaborating financial market models, and registering international trade operations. More and more time series are being used in this type of investigation and are becoming a valuable resource in today's organisations. This thesis investigates and proposes solutions to some current and important issues in time series data management (TSDM), using Design Science Research Methodology. The thesis presents new models for mapping time series data to relational databases which optimise the use of disk space, handle different time granularities and status attributes, and facilitate time series data manipulation in a commercial Relational Database Management System (RDBMS). These new models provide a good solution for current time series database applications with RDBMS and are tested with a case study and prototype using financial time series information. Also included is a temporal data model for illustrating time series data lifetime behaviour, based on a new set of time dimensions (confidentiality, definitiveness, validity, and maturity times) specially targeted at managing time series data, which are introduced to correctly represent the different statuses of time series data in a timeline. The proposed temporal data model gives a clear and accurate picture of the time series data lifecycle. Formal definitions of these time series dimensions are also presented. In addition, a time series grouping mechanism in an extensible commercial relational database system is defined, illustrated, and justified. The extension consists of a new data type and its corresponding rich set of routines that support modelling and operating on time series information at a higher level of abstraction. It extends the capability of the database server to organise and manipulate time series into groups. Thus, this thesis presents a new data type referred to as GroupTimeSeries, together with its corresponding architecture and support functions and operations. Implementation options for the GroupTimeSeries data type in relational technologies are also presented. Finally, a framework for TSDM expressive enough to capture the main requirements of time series applications and the management of that data is defined. The framework aims at providing initial domain know-how and requirements for time series data management, avoiding the impracticability of designing a TSDM system on paper from scratch. Many aspects of time series applications, including the way time series data are organised at the conceptual level, are addressed. The central abstractions for the proposed domain-specific framework are the notions of business sections, groups of time series, and time series themselves. The framework integrates comprehensive specifications regarding structural and functional aspects of time series data management. A formal framework specification using conceptual graphs is also explored.
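As a rough sketch of what mapping time series into a relational database can look like, the schema below separates groups, series (with granularity and status attributes), and observations, and closes with a group-level query of the kind a GroupTimeSeries type might expose. The table and column names are assumptions for illustration, not the thesis's actual models.

```python
# A hedged sketch of a relational mapping for grouped time series.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE series_group (group_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE series (series_id INTEGER PRIMARY KEY,
                     group_id INTEGER REFERENCES series_group,
                     granularity TEXT,          -- e.g. 'daily', 'irregular'
                     status TEXT);              -- status attribute
CREATE TABLE observation (series_id INTEGER REFERENCES series,
                          ts TEXT, value REAL,
                          PRIMARY KEY (series_id, ts));
""")
con.execute("INSERT INTO series_group VALUES (1, 'exchange_rates')")
con.execute("INSERT INTO series VALUES (1, 1, 'daily', 'active')")
con.execute("INSERT INTO observation VALUES (1, '2006-01-02', 1.18)")

# Group-level operation: latest observation per series in the group,
# roughly what a GroupTimeSeries routine might provide.
for row in con.execute("""SELECT s.series_id, MAX(o.ts), o.value
                          FROM series s JOIN observation o USING (series_id)
                          WHERE s.group_id = 1 GROUP BY s.series_id"""):
    print(row)
```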
97

Distributed data management with access control : social Networks and Data of the Web

Galland, Alban 28 September 2011 (has links) (PDF)
The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Management of all this information is rapidly becoming beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem, such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base, and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation to overcome the problems of Web data management, in particular with respect to access control.
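To give a feel for the rule-based flavor described above, here is a toy sketch with entirely invented syntax and semantics: facts are annotated with the peer that holds them, and one deduction step derives a fact at another peer, in a crude access-control flavor. WebdamLog's actual language, with negation and updates, is far richer than this.

```python
# A toy, invented sketch of peer-annotated facts and one deduction step.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    peer: str        # the peer where the fact lives
    relation: str
    args: tuple

facts = {Fact("alice", "photo", ("p1",)),
         Fact("alice", "friend", ("bob",))}

# One deductive rule: if alice has a photo and bob is her friend,
# derive an 'allowed' fact located at bob's peer.
def step(facts):
    derived = set()
    photos = [f for f in facts if f.relation == "photo"]
    friends = [f for f in facts if f.relation == "friend"]
    for p in photos:
        for fr in friends:
            derived.add(Fact(fr.args[0], "allowed", p.args))
    return facts | derived

print(sorted(step(facts) - facts, key=str))
# [Fact(peer='bob', relation='allowed', args=('p1',))]
```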
98

Information Centric Development of Component-Based Embedded Real-Time Systems

Hjertström, Andreas January 2009 (has links)
This thesis presents new techniques for data management of run-time data objects in component-based embedded real-time systems. These techniques enable data to be modeled, analyzed and structured to achieve data management during development, maintenance and execution. The evolution of real-time embedded systems has resulted in an increased system complexity beyond what was thought possible just a few years ago. Over the years, new techniques and tools have been developed to manage software and communication complexity. However, as this thesis shows, current techniques and tools for data management are not sufficient. Today, development of real-time embedded systems focuses on the functional aspects of the system, in most cases disregarding data management. The lack of proper design-time data management often results in ineffective documentation routines and poor overall system knowledge. Contemporary techniques to manage run-time data do not satisfy demands on flexibility, maintainability and extensibility. Based on an industrial case study that identifies a number of problems within current data management techniques, both during design-time and run-time, it is clear that data management needs to be incorporated as an integral part of the development of the entire system architecture. As a remedy to the identified problems, we propose a design-time data entity approach, where the importance of data in the system is elevated to be included in the entire design phase with proper documentation, properties, dependencies and analysis methods to increase the overall system knowledge. Furthermore, to efficiently manage data during run-time, we introduce database proxies to enable the fusion between two existing techniques: Component-Based Software Engineering (CBSE) and Real-Time Database Management Systems (RTDBMS). A database proxy allows components to be decoupled from the underlying data management strategy without violating the component encapsulation and communication interface. / INCENSE
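A minimal sketch of the database-proxy idea as the abstract describes it: the component keeps its ordinary interface, and the proxy translates between that interface and whatever storage backend is wired in, so the data management strategy can be swapped without touching the component. All names and interfaces here are illustrative assumptions.

```python
# A hedged sketch of a database proxy decoupling a component from storage.
class SpeedSensorComponent:
    """Knows nothing about storage; just exposes a value through its port."""
    def output(self):
        return 42.0  # would be a real reading in the target system

class DatabaseProxy:
    """Sits between a component port and the data management backend."""
    def __init__(self, component, backend, key):
        self.component, self.backend, self.key = component, backend, key

    def propagate(self):
        # Pull from the component port, push to whatever backend is wired in.
        self.backend[self.key] = self.component.output()

db = {}  # stand-in for an RTDBMS; swapping it out doesn't touch the component
proxy = DatabaseProxy(SpeedSensorComponent(), db, "vehicle.speed")
proxy.propagate()
print(db)  # {'vehicle.speed': 42.0}
```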
99

Energy-Efficient Data Management in Wireless Sensor Networks

Ai, Chunyu 13 July 2010 (has links)
Wireless Sensor Networks (WSNs) are deployed widely for various applications, and a variety of useful data are generated by these deployments. Since WSNs have limited resources and unreliable communication links, traditional data management techniques are not suitable; therefore, designing effective data management techniques for WSNs becomes important. In this dissertation, we address three key issues of data management in WSNs. For data collection, a scheme of making some nodes sleep and estimating their values from the other, active nodes' readings has been proven energy-efficient. To improve the precision of estimation, we propose two powerful estimation models, Data Estimation using a Physical Model (DEPM) and Data Estimation using a Statistical Model (DESM). Most existing data processing approaches for WSNs are real-time; however, historical data of WSNs are also significant for various applications, and no previous study has specifically addressed distributed historical data query processing. We propose an Index-based Historical Data Query Processing scheme which stores historical data locally and processes queries energy-efficiently by using a distributed index tree. Area query processing is also significant for various applications of WSNs, and no previous study has specifically addressed this issue. We propose an energy-efficient in-network area query processing scheme. In our scheme, we use an intelligent method (Grid lists) to describe an area, thus reducing the communication cost and dropping useless data as early as possible. A thorough simulation study shows that our schemes are effective and energy-efficient. Based on the area query processing algorithm, an Intelligent Monitoring System is designed to detect various events and provide real-time and accurate information for escape, rescue, and evacuation when a dangerous event happens.
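The data-collection idea, letting some nodes sleep and estimating their readings from the active nodes, can be sketched with a simple stand-in model. The example below uses inverse-distance weighting, which is an assumption for illustration only; the thesis's DEPM and DESM models are more elaborate.

```python
# A hedged sketch of estimating a sleeping node's reading from active nodes.
import math

# (x, y) positions and current readings of active nodes
active = {"n1": ((0.0, 0.0), 20.5), "n2": ((4.0, 0.0), 22.0),
          "n3": ((0.0, 3.0), 21.0)}

def estimate(pos, active):
    """Inverse-distance weighted estimate at a sleeping node's position."""
    num = den = 0.0
    for xy, reading in active.values():
        d = math.dist(pos, xy) or 1e-9  # guard against exact overlap
        w = 1.0 / d
        num += w * reading
        den += w
    return num / den

print(round(estimate((1.0, 1.0), active), 2))  # sleeping node at (1, 1)
```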
100

Flexible techniques for heterogeneous XML data retrieval

Sanz Blasco, Ismael 31 October 2007 (has links)
The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is not only reflected in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, mainly focused at the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship. The main objective of this thesis is devising new approaches for querying heterogeneous XML collections. This general objective has several implications. First, a collection can present different levels of heterogeneity at different granularity levels; this fact has a significant impact on the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and for clustering together those collections which present similar characteristics. Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. Within the thesis, we present a formal framework for the creation of similarity measures which is based on a study of the literature showing that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in Software Engineering. This approach also allows us to integrate seamlessly highly specific measures, such as protein-oriented matching functions. Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements, basically an XML subtree that is intended to be found in the database, but with a flexible semantics that is strongly dependent on a particular similarity measure. For example, depending on the particular measure, the hierarchy of elements, or the ordering of siblings, may or may not be deemed relevant when searching for occurrences in the database. Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches.
In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do. We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful for managing complex result sets. All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project.
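The composite-pattern construction of similarity measures lends itself to a short sketch. Below, basic measures are wrapped as components and combined into a weighted composite that is itself a measure. The specific measures, weights, and dictionary-based items are invented for illustration and are not ArHeX's actual API.

```python
# A hedged sketch of composing similarity measures via the composite pattern.
class Measure:
    def score(self, query, item):
        raise NotImplementedError

class TagSimilarity(Measure):
    """Basic measure: exact comparison of element tags."""
    def score(self, query, item):
        return 1.0 if query["tag"] == item["tag"] else 0.0

class TextOverlap(Measure):
    """Basic measure: crude token overlap of textual content."""
    def score(self, query, item):
        q, i = set(query["text"].split()), set(item["text"].split())
        return len(q & i) / max(len(q | i), 1)

class WeightedSum(Measure):
    """Composite: combines child measures and is itself usable as a Measure."""
    def __init__(self, children):
        self.children = children  # list of (weight, Measure) pairs
    def score(self, query, item):
        return sum(w * m.score(query, item) for w, m in self.children)

measure = WeightedSum([(0.4, TagSimilarity()), (0.6, TextOverlap())])
query = {"tag": "protein", "text": "kinase binding domain"}
item = {"tag": "protein", "text": "kinase domain"}
print(round(measure.score(query, item), 2))  # 0.4 + 0.6 * (2/3) = 0.8
```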
