Global ETD Search

1	Grid-based semantic integration of heterogeneous data resources : implementation on a HealthGrid Naseer, Aisha January 2007 (has links) The semantic integration of geographically distributed and heterogeneous data resources still remains a key challenge in Grid infrastructures. Today's mainstream Grid technologies hold the promise to meet this challenge in a systematic manner, making data applications more scalable and manageable. The thesis conducts a thorough investigation of the problem, the state of the art, and the related technologies, and proposes an Architecture for Semantic Integration of Data Sources (ASIDS) addressing the semantic heterogeneity issue. It defines a simple mechanism for the interoperability of heterogeneous data sources in order to extract or discover information regardless of their different semantics. The constituent technologies of this architecture include Globus Toolkit (GT4) and OGSA-DAI (Open Grid Service Architecture Data Integration and Access) alongside other web services technologies such as XML (Extensive Markup Language). To show this, the ASIDS architecture was implemented and tested in a realistic setting by building an exemplar application prototype on a HealthGrid (pilot implementation). The study followed an empirical research methodology and was informed by extensive literature surveys and a critical analysis of the relevant technologies and their synergies. The two literature reviews, together with the analysis of the technology background, have provided a good overview of the current Grid and HealthGrid landscape, produced some valuable taxonomies, explored new paths by integrating technologies, and more importantly illuminated the problem and guided the research process towards a promising solution. Yet the primary contribution of this research is an approach that uses contemporary Grid technologies for integrating heterogeneous data resources that have semantically different. data fields (attributes). It has been practically demonstrated (using a prototype HealthGrid) that discovery in semantically integrated distributed data sources can be feasible by using mainstream Grid technologies, which have been shown to have some Significant advantages over non-Grid based approaches. 005.758
2	Retrieving information from heterogeneous freight data sources to answer natural language queries Seedah, Dan Paapanyin Kofi 09 February 2015 (has links) The ability to retrieve accurate information from databases without an extensive knowledge of the contents and organization of each database is extremely beneficial to the dissemination and utilization of freight data. The challenges, however, are: 1) correctly identifying only the relevant information and keywords from questions when dealing with multiple sentence structures, and 2) automatically retrieving, preprocessing, and understanding multiple data sources to determine the best answer to user’s query. Current named entity recognition systems have the ability to identify entities but require an annotated corpus for training which in the field of transportation planning does not currently exist. A hybrid approach which combines multiple models to classify specific named entities was therefore proposed as an alternative. The retrieval and classification of freight related keywords facilitated the process of finding which databases are capable of answering a question. Values in data dictionaries can be queried by mapping keywords to data element fields in various freight databases using ontologies. A number of challenges still arise as a result of different entities sharing the same names, the same entity having multiple names, and differences in classification systems. Dealing with ambiguities is required to accurately determine which database provides the best answer from the list of applicable sources. This dissertation 1) develops an approach to identify and classifying keywords from freight related natural language queries, 2) develops a standardized knowledge representation of freight data sources using an ontology that both computer systems and domain experts can utilize to identify relevant freight data sources, and 3) provides recommendations for addressing ambiguities in freight related named entities. Finally, the use of knowledge base expert systems to intelligently sift through data sources to determine which ones provide the best answer to a user’s question is proposed. / text Freight data Heterogeneous data sources Freight ontology Natural language processing Ambiguity Disambiguation Knowledge systems
3	Contextual Outlier Detection from Heterogeneous Data Sources Yan, Yizhou 17 May 2020 (has links) The dissertation focuses on detecting contextual outliers from heterogeneous data sources. Modern sensor-based applications such as Internet of Things (IoT) applications and autonomous vehicles are generating a huge amount of heterogeneous data including not only the structured multi-variate data points, but also other complex types of data such as time-stamped sequence data and image data. Detecting outliers from such data sources is critical to diagnose and fix malfunctioning systems, prevent cyber attacks, and save human lives. The outlier detection techniques in the literature typically are unsupervised algorithms with a pre-defined logic, such as, to leverage the probability density at each point to detect outliers. Our analysis of the modern applications reveals that this rigid probability density-based methodology has severe drawbacks. That is, low probability density objects are not necessarily outliers, while the objects with relatively high probability densities might in fact be abnormal. In many cases, the determination of the outlierness of an object has to take the context in which this object occurs into consideration. Within this scope, my dissertation focuses on four research innovations, namely techniques and system for scalable contextual outlier detection from multi-dimensional data points, contextual outlier pattern detection from sequence data, contextual outlier image detection from image data sets, and lastly an integrative end-to-end outlier detection system capable of doing automatic outlier detection, outlier summarization and outlier explanation. 1. Scalable Contextual Outlier Detection from Multi-dimensional Data. Mining contextual outliers from big datasets is a computational expensive process because of the complex recursive kNN search used to define the context of each point. In this research, leveraging the power of distributed compute clusters, we design distributed contextual outlier detection strategies that optimize the key factors determining the efficiency of local outlier detection, namely, to localize the kNN search while still ensuring the load balancing. 2. Contextual Outlier Detection from Sequence Data. For big sequence data, such as messages exchanged between devices and servers and log files measuring complex system behaviors over time, outliers typically occur as a subsequence of symbolic values (or sequential pattern), in which each individual value itself may be completely normal. However, existing sequential pattern mining semantics tend to mis-classify outlier patterns as typical patterns due to ignoring the context in which the pattern occurs. In this dissertation, we present new context-aware pattern mining semantics and then design efficient mining strategies to support these new semantics. In addition, methodologies that continuously extract these outlier patterns from sequence streams are also developed. 3. Contextual Outlier Detection from Image Data. An image classification system not only needs to accurately classify objects from target classes, but also should safely reject unknown objects that belong to classes not present in the training data. Here, the training data defines the context of the classifier and unknown objects then correspond to contextual image outliers. Although the existing Convolutional Neural Network (CNN) achieves high accuracy when classifying known objects, the sum operation on multiple features produced by the convolutional layers causes an unknown object being classified to a target class with high confidence even if it matches some key features of a target class only by chance. In this research, we design an Unknown-aware Deep Neural Network (UDN for short) to detect contextual image outliers. The key idea of UDN is to enhance existing Convolutional Neural Network (CNN) to support a product operation that models the product relationship among the features produced by convolutional layers. This way, missing a single key feature of a target class will greatly reduce the probability of assigning an object to this class. To further improve the performance of our UDN at detecting contextual outliers, we propose an information-theoretic regularization strategy that incorporates the objective of rejecting unknowns into the learning process of UDN. 4. An End-to-end Integrated Outlier Detection System. Although numerous detection algorithms proposed in the literature, there is no one approach that brings the wealth of these alternate algorithms to bear in an integrated infrastructure to support versatile outlier discovery. In this work, we design the first end-to-end outlier detection service that integrates outlier-related services including automatic outlier detection, outlier summarization and explanation, human guided outlier detector refinement within one integrated outlier discovery paradigm. Experimental studies including performance evaluation and user studies conducted on benchmark outlier detection datasets and real world datasets including Geolocation, Lighting, MNIST, CIFAR and the Log file datasets confirm both the effectiveness and efficiency of the proposed approaches and systems. Outlier Detection Context Scalable End-to-end System Heterogeneous Data Sources
4	DATAWAREHOUSE APPROACH TO DECISION SUPPORT SYSTEM FROM DISTRIBUTED, HETEROGENEOUS SOURCES Sannellappanavar, Vijaya Laxmankumar 05 October 2006 (has links) No description available. Decision making Several sources of data Distributed, heterogeneous data sources extract information querying and analysis Data Integration Information Integration Datawarehousing Datawarehouse

1

Page generated in 0.1061 seconds