111

MACHINE LEARNING ALGORITHM PERFORMANCE OPTIMIZATION: SOLVING ISSUES OF BIG DATA ANALYSIS

Sohangir, Soroosh 01 December 2015 (has links) (PDF)
Because of the high time and space complexity involved, generating machine learning models for big data is difficult. This research introduces a novel approach to optimizing the performance of learning algorithms, with a particular focus on big data manipulation. To implement this method, a machine learning platform incorporating eighteen machine learning algorithms is built. The platform is tested on four different use cases, and the results are illustrated and analyzed.
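
The abstract above describes a platform that benchmarks many learning algorithms across several use cases. A minimal sketch of that general pattern, assuming scikit-learn and entirely hypothetical algorithm and use-case choices (the thesis's eighteen algorithms and four use cases are not reproduced here):

```python
# Illustrative sketch (not the thesis's platform): benchmark several
# scikit-learn classifiers over a set of stand-in use-case datasets.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for the platform's algorithms and use cases.
algorithms = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "gradient_boosting": GradientBoostingClassifier(),
    "knn": KNeighborsClassifier(),
}
use_cases = {
    f"use_case_{i}": make_classification(n_samples=2000, n_features=20, random_state=i)
    for i in range(4)
}

for case_name, (X, y) in use_cases.items():
    for algo_name, model in algorithms.items():
        score = cross_val_score(model, X, y, cv=3).mean()
        print(f"{case_name} | {algo_name}: accuracy={score:.3f}")
```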
112

Graph Neural Networks for Improved Interpretability and Efficiency

Pho, Patrick 01 January 2022 (has links) (PDF)
Attributed graphs are a powerful tool to model real-life systems which exist in many domains such as social science, biology, e-commerce, etc. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms and traditional network science tools have limited capability to make use of both network topology and node features. Graph Neural Networks (GNNs) provide an efficient framework combining both sources of information to produce accurate predictions for a wide range of tasks including node classification, link prediction, etc. The exponential growth of graph datasets drives the development of complex GNN models, causing concerns about processing time and interpretability of the results. Another issue arises from the cost and limitations of collecting a large amount of annotated data for training deep-learning GNN models. Apart from the sampling issue, the existence of anomalous entities in the data might degrade the quality of the fitted models. In this dissertation, we propose novel techniques and strategies to overcome the above challenges. First, we present a flexible regularization scheme applied to the Simple Graph Convolution (SGC). The proposed framework inherits the fast and efficient properties of SGC while rendering a sparse set of fitted parameter vectors, facilitating the identification of important input features. Next, we examine efficient procedures for collecting training samples and develop indicative measures as well as quantitative guidelines to assist practitioners in choosing the optimal sampling strategy to obtain data. We then improve upon an existing GNN model for the anomaly detection task; our proposed framework achieves better accuracy and reliability. Lastly, we experiment with adapting the flexible regularization mechanism to the link prediction task.
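
As one illustration of the SGC-plus-sparsity idea described above, the sketch below propagates node features with the normalised adjacency matrix and then fits an L1-penalised classifier. The lasso penalty is a stand-in assumption, not the thesis's actual flexible regularization scheme, and the toy graph and labels are hypothetical:

```python
# Illustrative sketch only: SGC feature propagation followed by a sparse
# (L1-penalised) linear classifier; zero coefficients mark pruned input features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sgc_features(adj, X, k=2):
    """Propagate node features k times with the symmetrically normalised
    adjacency matrix (with self-loops), as in Simple Graph Convolution."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_tilde.sum(axis=1)))
    s = d_inv_sqrt @ a_tilde @ d_inv_sqrt
    for _ in range(k):
        X = s @ X
    return X

# Tiny synthetic graph: two 3-node cliques joined by a single edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
X = np.random.RandomState(0).randn(6, 8)   # node features
y = np.array([0, 0, 0, 1, 1, 1])           # node labels

H = sgc_features(adj, X, k=2)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(H, y)
print("sparse coefficient vector:", clf.coef_)
```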
113

Change Point Detection for Streaming Data Using Support Vector Methods

Harrison, Charles 01 January 2022 (has links) (PDF)
Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for the LS-SVDD Lagrange multipliers. The update formulae handle adding or removing a block of observations from an existing LS-SVDD description, so the LS-SVDD can be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios including online anomaly detection and sequential multiple change point detection.
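
As a rough illustration of the hypersphere idea, the sketch below solves one common least-squares formulation of SVDD as a single linear system and scores new points by their kernel distance to the centre. The equations are my own reconstruction of a standard LS-SVDD derivation, the kernel and constants are arbitrary, and the thesis's closed-form block add/remove updates are not reproduced:

```python
# Minimal numpy sketch of LS-SVDD under a standard least-squares formulation.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_ls_svdd(X, C=10.0, gamma=0.5):
    """Solve the stationarity conditions as one linear system:
    (2K + I/(2C)) alpha + rho*1 = diag(K), with sum(alpha) = 1."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = 2 * K + np.eye(n) / (2 * C)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.concatenate([np.diag(K), [1.0]])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]          # Lagrange multipliers alpha, offset rho

def anomaly_score(Xtrain, alpha, rho, z, gamma=0.5):
    """Positive values indicate points outside the fitted description."""
    kz = rbf_kernel(z[None, :], Xtrain, gamma)[0]
    return 1.0 - 2.0 * kz @ alpha - rho   # k(z, z) = 1 for the RBF kernel

rng = np.random.RandomState(0)
X = rng.randn(100, 2)                      # in-control observations
alpha, rho = fit_ls_svdd(X)
print(anomaly_score(X, alpha, rho, np.array([0.0, 0.0])))   # typical point
print(anomaly_score(X, alpha, rho, np.array([6.0, 6.0])))   # far-away point
```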
114

A fully reversible data transform technique enhancing data compression of SMILES data

Scanlon, Shagufta A., Ridley, Mick J. January 2013 (has links)
The requirement to efficiently store and process SMILES data used in Chemoinformatics creates a demand for efficient techniques to compress this data. General-purpose transforms and compressors can transform and compress this type of data to a certain extent; however, these techniques are not specific to SMILES data. We develop a transform specific to SMILES data that can be used alongside general-purpose compressors, as a pre-processor and post-processor, to improve the compression of SMILES data. We test our transform with six general-purpose compressors and compare our results both with another transform applied to our SMILES data corpus and with untransformed data.
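
A toy sketch of the general pre-/post-processing pattern described above, assuming a hypothetical token table and zlib as the general-purpose compressor; it is not the authors' transform:

```python
# Illustrative sketch: a trivially reversible substitution that maps common
# multi-character SMILES tokens to single unused bytes before compression.
import zlib

# Hypothetical token table; a real SMILES-aware transform would be far richer.
TOKENS = {"Cl": "\x01", "Br": "\x02", "c1ccccc1": "\x03", "C(=O)O": "\x04"}

def transform(smiles: str) -> str:
    for token, code in TOKENS.items():
        smiles = smiles.replace(token, code)
    return smiles

def inverse_transform(text: str) -> str:
    for token, code in TOKENS.items():
        text = text.replace(code, token)
    return text

data = "CC(=O)Oc1ccccc1C(=O)O"          # aspirin
packed = zlib.compress(transform(data).encode("latin-1"))
restored = inverse_transform(zlib.decompress(packed).decode("latin-1"))
assert restored == data                  # the transform is fully reversible
print(len(packed), "bytes after transform + zlib")
```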
115

Urban Data Center: An Architectural Celebration of Data

Talarico, Gui 23 June 2011 (has links)
Throughout the last century, the popularization of the automobile and the development of roads and highways have changed the way we live and how cities develop. Bridges, aqueducts, and power plants had comparable impact in the past. I consider each of these examples to be "icons" of the infrastructures that we humans build to improve our living environments and to fulfill our urge to become better. Fast forward to now. The last decades showed us the development of new, sophisticated networks that connect people and continents. Communication grids, satellite communication, high-speed fiber optics, and many other technologies have made possible the existence of the ultimate human network - the internet. A network created by us to satisfy our needs to connect, to share, to socialize, and to communicate over distances never before imagined. The data center is the icon of this network. Through modern digitization methods, text, sounds, images, and knowledge can be converted into zeros and ones and distributed almost instantly to all corners of the world. The data center is the centerpiece in the storage, processing, and distribution of this data. The Urban Data Center hopes to bring this icon closer to its creators and users. Let us celebrate its existence and shed some light on the inner workings of the world's largest network. Let the users who inhabit this critical network come inside it and understand where it lives. This thesis explores the expressive potential of networks and data through the design of a data center in Washington, DC. / Master of Architecture
116

A Framework for Data Quality for Synthetic Information

Gupta, Ragini 24 July 2014 (has links)
Data quality has been an area of increasing interest for researchers in recent years due to the rapid emergence of 'big data' processes and applications. In this work, the data quality problem is viewed from the standpoint of synthetic information. Based on the structure and complexity of synthetic data, a need for a data quality framework specific to it was identified. This thesis presents this framework along with implementation details and results from a large synthetic dataset to which the developed testing framework is applied. A formal conceptual framework was designed for assessing the data quality of synthetic information. This framework involves developing analytical methods and software for assessing data quality for synthetic information. It includes dimensions of data quality that check the inherent properties of the data as well as evaluate it in the context of its use. The framework developed here is a software framework designed with principles such as scalability, generality, integrability and modularity in mind. A data abstraction layer has been introduced between the synthetic data and the tests. This abstraction layer has multiple benefits over direct access to the data by the tests: it decouples the tests from the data so that the details of storage and implementation are kept hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The particular tests and quality measures implemented span a range from low-level syntactic checks to high-level semantic quality measures. In each case, in addition to the results of the quality measure itself, we also present results on the computational performance (scalability) of the measure. / Master of Science
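
A small sketch of how a data abstraction layer and pluggable quality measures might fit together, with hypothetical class and function names rather than the thesis's actual framework:

```python
# Illustrative sketch: tests go through an abstraction layer rather than
# touching files or databases directly, and each quality dimension is a
# small measure that consumes records from that layer.
from abc import ABC, abstractmethod
from typing import Any, Iterable, Mapping

class DataSource(ABC):
    """Abstraction layer: quality tests see records, never storage details."""
    @abstractmethod
    def records(self) -> Iterable[Mapping[str, Any]]: ...

class InMemorySource(DataSource):
    def __init__(self, rows):
        self._rows = rows
    def records(self):
        return iter(self._rows)

def completeness(source: DataSource, field: str) -> float:
    """Fraction of records with a non-missing value for `field`."""
    rows = list(source.records())
    return sum(r.get(field) is not None for r in rows) / len(rows)

def validity(source: DataSource, field: str, allowed: set) -> float:
    """Fraction of records whose value for `field` lies in the allowed domain."""
    rows = list(source.records())
    return sum(r.get(field) in allowed for r in rows) / len(rows)

people = InMemorySource([
    {"age": 34, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 29, "sex": "X"},
])
print(completeness(people, "age"))           # 2 of 3 records have an age
print(validity(people, "sex", {"F", "M"}))   # 2 of 3 values are in the domain
```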
117

Drowning in Data, Starving for Knowledge OMEGA Data Environment

Coble, Keith 10 1900 (has links)
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The quantity of T&E data has grown in step with the increase in computing power and digital storage. T&E data management and exploitation technologies have not kept pace with this exponential growth. New approaches to the challenges posed by this data explosion must provide for continued growth while providing seamless integration with the existing body of work. Object-oriented data management provides a framework for handling the continued rapid growth in computer speed and in the amount of data gathered, as well as legacy integration. The OMEGA Data Environment is one of the first commercially available examples of this emerging class of OODM applications.
118

Investigating pluralistic data architectures in data warehousing

Oladele, Kazeem Ayinde January 2015 (has links)
Understanding and managing change is a strategic objective for many organisations seeking to compete successfully in the marketplace; as a result, organisations are leveraging their data assets and implementing data warehouses to gain the business intelligence necessary to improve their businesses. Data warehouses are expensive initiatives, and one-half to two-thirds of data warehousing efforts end in failure. In the absence of a well-formalised design methodology in the industry, and in the context of the debate on data architecture in data warehousing, this thesis examines why multidimensional and relational data models define the data architecture landscape in the industry. The study develops a number of propositions from the literature and empirical data to understand the factors impacting the choice of logical data model in data warehousing. Using a comparative case study method as the means of collecting empirical data from the case organisations, the research proposes a conceptual model for logical data model adoption. The model provides a framework that guides decision making when adopting a logical data model for a data warehouse. The research conceptual model identifies the characteristics of business requirements and decision pathways for multidimensional and relational data warehouses. The conceptual model adds value by identifying the business requirements for which a multidimensional or relational logical data model is empirically applicable.
119

Uncovering Nuances in Complex Data Through Focus and Context Visualizations

Rzeszotarski, Jeffrey M. 01 May 2017 (has links)
Across a wide variety of digital devices, users create, consume, and disseminate large quantities of information. While data sometimes look like a spreadsheet or network diagram, more often for everyday users their data look more like an Amazon search page, the line-up for a fantasy football team, or a set of Yelp reviews. However, interpreting these kinds of data remains a difficult task even for experts, since they often feature soft or unknown constraints (e.g. "I want some Thai food, but I also want a good bargain") across highly multidimensional data (i.e. ratings, reviews, popularity, proximity). Existing technology is largely optimized for users with hard criteria and satisfiable constraints, and consumer systems often use representations better suited for browsing than sensemaking. In this thesis I explore ways to support soft-constraint decision-making and exploratory data analysis by giving users tools that show fine-grained features of the data while at the same time displaying useful contextual information. I describe approaches for representing collaborative content history and working behavior that reveal both individual and group/dataset-level features. Using these approaches, I investigate general visualizations that utilize physics to help even inexperienced users find small and large trends in multivariate data. I describe the transition of physics-based visualization from the research space into the commercial space through a startup company, and the insights that emerged both from interviews with experts in a wide variety of industries during commercialization and from a comparative lab study. Taking one core use case from commercialization, consumer search, I develop a prototype, Fractal, which helps users explore and apply constraints to Yelp data at a variety of scales by curating and representing individual-, group-, and dataset-level features. Through a user study and a theoretical model I consider how the prototype can best aid users throughout the sensemaking process. My dissertation further investigates physics-based approaches for representing multivariate data, and explores how the user's exploration process itself can help dynamically refine the search process and visual representation. I demonstrate that selectively representing points using clusters can extend physics-based visualizations across a variety of data scales, and help users make sense of data at scales that might otherwise overload them. My model provides a framework for stitching together a model of user interest and data features, unsupervised clustering, and visual representations for exploratory data visualization. The implications from commercialization are broader, giving insight into why research in the visualization space is or is not adopted by industry, a variety of real-world use cases for multivariate exploratory data analysis, and an index of common data visualization needs in industry.
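
As a rough illustration of the cluster-based scaling idea mentioned above (not the Fractal prototype itself), the sketch below summarises a large point cloud with k-means and renders one glyph per cluster, sized by how many points it stands in for:

```python
# Illustrative sketch: reduce overplotting by drawing cluster glyphs instead
# of every raw point; glyph area encodes the number of summarised points.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
points = np.vstack([rng.normal(c, 0.4, size=(3000, 2))
                    for c in ((0, 0), (3, 1), (1, 4))])

km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(points)
counts = np.bincount(km.labels_)

plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            s=counts, alpha=0.6)
plt.title("9000 points summarised by 30 cluster glyphs")
plt.show()
```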
120

Data blending in health care : Evaluation of data blending

Chen, Qian January 2016 (has links)
This report is aimed at those who are interested in data analysis and data blending. Decision making is crucial for an organization to succeed in today's market. Data analysis is an important support activity in decision making and is applied in many industries, for example healthcare. For many years data analysts have worked on structured data in small volumes, with traditional methods such as spreadsheets. As new data sources such as social media emerged, data came to be generated in higher volume, velocity and variety [1]. The traditional methods data analysts apply are no longer capable of handling this situation. Hence scientists and engineers have developed a new technology called data blending. Data blending is the process of merging, sorting, joining and combining all the useful data into a functional dataset [2]. Some of the well-known data blending platforms include Datawatch, Microsoft Power Query for Excel, IBM DataWorks and Alteryx [3]. Synergus AB is a consulting company engaged in health economics, market access and Health Technology Assessment (HTA) [4]. The company does analysis for its clients. Unfortunately, the way they work is not efficient, so new tools and methods need to be applied in the company. The company has decided to apply data blending in its daily work. My task in this project was to build datasets for analysis and create workflows for future use with a data blending platform. Out of personal interest, I also studied data blending to understand how this new technology works. During the project I worked with four data sources: a Microsoft Excel worksheet, a CSV file, an MS Access database and a JSON file. I built the datasets the company needs, and I also carried out a case study of the data blending process, focusing on the three steps of data handling, namely input, processing and output. After the project, I reached the conclusion that data blending offers better performance and functionality. It is also easy to learn and use.
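
A minimal sketch of a data blending step of the kind described above, assuming pandas and entirely hypothetical file names and columns:

```python
# Illustrative sketch: read heterogeneous sources, harmonise the key column,
# and join them into one functional dataset for analysis.
import pandas as pd

excel_df = pd.read_excel("patients.xlsx")    # structured worksheet
csv_df = pd.read_csv("lab_results.csv")      # flat export
json_df = pd.read_json("costs.json")         # semi-structured feed

# Harmonise column names, then blend with successive merges on a shared key.
for df in (excel_df, csv_df, json_df):
    df.rename(columns=lambda c: c.strip().lower(), inplace=True)

blended = (excel_df
           .merge(csv_df, on="patient_id", how="left")
           .merge(json_df, on="patient_id", how="left")
           .sort_values("patient_id"))
blended.to_csv("blended_dataset.csv", index=False)
```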
