111 |
A computer-aided design scheme for drainage and runoff systems
Battle, Timothy P. January 1985
A computer-aided design scheme for both man-made and natural runoff systems is presented. The model uses linear programming to solve Muskingum routing equations through a drainage system, and provides design information through post-optimality (sensitivity) analysis. With the objective of minimizing the peak outflow from the system and using hydrograph ordinates as the decision variables, the output of the linear programming analysis shows the extent to which each flow ordinate at every node in the network influences the peak flow at some downstream location. This information can help the user speed up the design process and arrive at an efficient design - i.e., one which either minimizes construction costs or reduces the potential risk of flood damage. / Applied Science, Faculty of / Civil Engineering, Department of / Graduate
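As a hedged illustration of the routing relation this abstract refers to, the sketch below implements the standard Muskingum recurrence for a single reach. The function names, the parameter values (k, x, dt) and the inflow hydrograph are assumptions chosen for illustration and are not taken from the thesis; the linear programming and sensitivity analysis steps are not reproduced. Each outflow ordinate is a linear combination of neighbouring ordinates, which is what allows relations of this form to be written as linear constraints.

```python
def muskingum_coefficients(k, x, dt):
    """Return (c0, c1, c2) for storage constant k, weighting factor x and time step dt."""
    denom = k - k * x + 0.5 * dt
    c0 = (-k * x + 0.5 * dt) / denom
    c1 = (k * x + 0.5 * dt) / denom
    c2 = (k - k * x - 0.5 * dt) / denom
    return c0, c1, c2


def route(inflow, k, x, dt):
    """Route an inflow hydrograph through one reach.

    Assumes the reach starts in steady state, so the first outflow
    ordinate equals the first inflow ordinate.
    """
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0]]
    for t in range(1, len(inflow)):
        # O[t] = c0*I[t] + c1*I[t-1] + c2*O[t-1]
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[t - 1])
    return outflow


# Illustrative hydrograph ordinates (made-up values, consistent units assumed).
inflow = [10.0, 30.0, 68.0, 50.0, 40.0, 31.0, 23.0, 10.0, 10.0, 10.0]
outflow = route(inflow, k=2.0, x=0.2, dt=1.0)
print("peak inflow:", max(inflow), "peak outflow:", round(max(outflow), 2))
```

With these assumed parameters all three coefficients are non-negative and sum to one, so the routed peak cannot exceed the inflow peak.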
|
112 |
MACHINE LEARNING ALGORITHM PERFORMANCE OPTIMIZATION: SOLVING ISSUES OF BIG DATA ANALYSIS
Sohangir, Soroosh 01 December 2015
Because of the high time and space complexity involved, generating machine learning models for big data is difficult. This research introduces a novel approach to optimizing the performance of learning algorithms, with a particular focus on big data manipulation. To implement this method, a machine learning platform comprising eighteen machine learning algorithms was built. The platform is tested on four different use cases, and the results are illustrated and analyzed.
|
113 |
Graph Neural Networks for Improved Interpretability and Efficiency
Pho, Patrick 01 January 2022
Attributed graphs are a powerful tool for modeling real-life systems that exist in many domains, such as social science, biology, and e-commerce. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms or traditional network science tools are limited in their ability to make use of both network topology and node features. Graph Neural Networks (GNNs) provide an efficient framework combining both sources of information to produce accurate predictions for a wide range of tasks, including node classification and link prediction. The exponential growth of graph datasets drives the development of complex GNN models, raising concerns about processing time and the interpretability of the results. Another issue arises from the cost and limitations of collecting a large amount of annotated data for training deep learning GNN models. Apart from the sampling issue, the presence of anomalous entities in the data might degrade the quality of the fitted models. In this dissertation, we propose novel techniques and strategies to overcome the above challenges. First, we present a flexible regularization scheme applied to the Simple Graph Convolution (SGC). The proposed framework inherits the fast and efficient properties of SGC while rendering a sparse set of fitted parameter vectors, facilitating the identification of important input features. Next, we examine efficient procedures for collecting training samples and develop indicative measures as well as quantitative guidelines to assist practitioners in choosing the optimal sampling strategy to obtain data. We then improve upon an existing GNN model for the anomaly detection task. Our proposed framework achieves better accuracy and reliability. Lastly, we experiment with adapting the flexible regularization mechanism to the link prediction task.
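For readers unfamiliar with the Simple Graph Convolution named in this abstract, the following is a minimal sketch of its feature propagation step: node features are multiplied a fixed number of times by the symmetrically normalized adjacency matrix with self-loops, and a linear classifier is then fitted on the result. The helper names, the toy graph and the dense list-of-lists representation are assumptions for illustration; the dissertation's regularization scheme and classifier are not shown.

```python
import math


def normalized_adjacency(adj):
    """Return S = D^(-1/2) (A + I) D^(-1/2) for a dense symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sgc_features(adj, features, hops):
    """Propagate node features `hops` times: S^hops X."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(hops):
        out = matmul(s, out)
    return out


# Toy path graph 0 - 1 - 2 with a single feature that is non-zero on node 0 only.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
print(sgc_features(adj, features, hops=2))
```

Because the propagation has no trainable weights, it can be computed once before fitting, which is the source of the speed the abstract attributes to SGC.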
|
114 |
Change Point Detection for Streaming Data Using Support Vector Methods
Harrison, Charles 01 January 2022
Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for the LS-SVDD Lagrange multipliers. The update formulae cover adding a block of observations to, or removing a block from, an existing LS-SVDD description; LS-SVDD can thus be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios including online anomaly detection and sequential multiple change point detection.
|
115 |
A fully reversible data transform technique enhancing data compression of SMILES data
Scanlon, Shagufta A., Ridley, Mick J. January 2013
The requirement to efficiently store and process SMILES data used in Chemoinformatics creates a demand for efficient techniques to compress this data. General-purpose transforms and compressors are available to transform and compress this type of data to a certain extent; however, these techniques are not specific to SMILES data. We develop a transform specific to SMILES data that can be used alongside other general-purpose compressors as a preprocessor and post-processor to improve the compression of SMILES data. We test our transform with six general-purpose compressors and compare our results, on our SMILES data corpus, with another transform and with untransformed data.
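The preprocessor/post-processor arrangement described in this abstract can be sketched as follows. This is not the authors' transform: the substitution table, the placeholder characters and the choice of zlib as the general-purpose compressor are all hypothetical stand-ins, used only to show how a reversible transform wraps around an existing compressor.

```python
import zlib

# Hypothetical table mapping frequent SMILES substrings to single control
# characters that do not occur in ordinary SMILES text.
TOKENS = {"c1ccccc1": "\x01", "C(=O)O": "\x02", "Cl": "\x03", "Br": "\x04"}


def forward(smiles):
    """Preprocessor: replace frequent substrings with one-character placeholders."""
    if any(placeholder in smiles for placeholder in TOKENS.values()):
        raise ValueError("input already contains a placeholder character")
    for token, placeholder in TOKENS.items():
        smiles = smiles.replace(token, placeholder)
    return smiles


def inverse(encoded):
    """Post-processor: expand every placeholder back to its substring."""
    for token, placeholder in TOKENS.items():
        encoded = encoded.replace(placeholder, token)
    return encoded


def compress(smiles):
    """Transform, then hand the result to a general-purpose compressor."""
    return zlib.compress(forward(smiles).encode("ascii"))


def decompress(blob):
    """Decompress, then undo the transform."""
    return inverse(zlib.decompress(blob).decode("ascii"))


corpus = "\n".join(["CC(=O)Oc1ccccc1C(=O)O", "Clc1ccccc1", "BrCCBr"])
assert decompress(compress(corpus)) == corpus
print(len(corpus), len(forward(corpus)))
```

The round trip is exact because the placeholders never appear in the input and the expansions never contain placeholders. Whether such a transform actually improves the final compressed size depends on the corpus and the compressor, which is what the paper evaluates.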
|
116 |
Urban Data Center: An Architectural Celebration of Data
Talarico, Gui 23 June 2011
Throughout the last century, the popularization of the automobile and the development of roads and highways have changed the way we live and how cities develop. Bridges, aqueducts, and power plants had comparable impact in the past. I consider each of these examples to be "icons" of the infrastructures that we humans build to improve our living environments and to fulfill our urge to become better. Fast forward to now. The last decades showed us the development of new sophisticated networks that connect people and continents. Communication grids, satellite communication, high speed fiber optics and many other technologies have made possible the existence of the ultimate human network - the internet. A network created by us to satisfy our needs to connect, to share, to socialize and communicate over distances never before imagined. The data center is the icon of this network. Through modern digitalization methods, text, sounds, images, and knowledge can be converted into zeros and ones and distributed almost instantly to all corners of the world. The data center is the centerpiece in the storage, processing, and distribution of this data. The Urban Data Center hopes to bring this icon closer to its creators and users. Let us celebrate its existence and shed some light on the inner workings of the world's largest network. Let the users that inhabit this critical network come inside it and understand where it lives. This thesis explores the expressive potential of networks and data through the design of a data center in Washington, DC. / Master of Architecture
|
117 |
A Framework for Data Quality for Synthetic Information
Gupta, Ragini 24 July 2014
Data quality has been an area of increasing interest for researchers in recent years due to the rapid emergence of 'big data' processes and applications. In this work, the data quality problem is viewed from the standpoint of synthetic information. The structure and complexity of synthetic data call for a data quality framework specific to it. This thesis presents such a framework, along with implementation details and the results of applying it to a large synthetic dataset. A formal conceptual framework was designed for assessing the data quality of synthetic information; it involves developing analytical methods and software for that assessment. It includes dimensions of data quality that check the inherent properties of the data as well as evaluate it in the context of its use. The framework developed here is a software framework designed with goals such as scalability, generality, integrability and modularity in mind. A data abstraction layer has been introduced between the synthetic data and the tests. This abstraction layer has multiple benefits over direct access to the data by the tests: it decouples the tests from the data so that the details of storage and implementation are kept hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The particular tests and quality measures implemented span a range from low-level syntactic checks to high-level semantic quality measures. In each case, in addition to the results of the quality measure itself, we also present results on the computational performance (scalability) of the measure. / Master of Science
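The decoupling this abstract describes, in which quality tests talk to an abstraction layer instead of the stored data, might look roughly like the sketch below. All class, function and column names are hypothetical and the two measures are deliberately simple; the thesis's actual interfaces and measures are not reproduced here.

```python
from typing import Callable, Iterable, Optional, Protocol


class RecordSource(Protocol):
    """Abstraction layer: tests only ever see columns, never the storage details."""

    def column(self, name: str) -> Iterable[Optional[object]]:
        ...


class InMemorySource:
    """One possible backend; a database-backed source could expose the same method."""

    def __init__(self, rows):
        self._rows = list(rows)

    def column(self, name):
        return [row.get(name) for row in self._rows]


def completeness(source: RecordSource, name: str) -> float:
    """Fraction of records whose value for `name` is present."""
    values = list(source.column(name))
    if not values:
        return 1.0
    return sum(1 for v in values if v is not None) / len(values)


def validity(source: RecordSource, name: str, is_valid: Callable[[object], bool]) -> float:
    """Fraction of present values that satisfy a caller-supplied rule."""
    values = [v for v in source.column(name) if v is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Made-up synthetic records with one missing and one implausible age.
people = InMemorySource([{"age": 34}, {"age": None}, {"age": -2}, {}])
print(completeness(people, "age"), validity(people, "age", lambda a: 0 <= a <= 120))
```

Swapping the backend then requires no change to the measures, which is the benefit the abstract attributes to the abstraction layer.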
|
118 |
Drowning in Data, Starving for Knowledge OMEGA Data Environment
Coble, Keith 10 1900
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The quantity of T&E data has grown in step with the increase in computing power and digital storage. T&E data management and exploitation technologies have not kept pace with this exponential growth. New approaches to the challenges posed by this data explosion must provide for continued growth while providing seamless integration with the existing body of work. Object Oriented Data Management (OODM) provides the framework to handle the continued rapid growth in computer speed and in the amount of data gathered, as well as legacy integration. The OMEGA Data Environment is one of the first commercially available examples of this emerging class of OODM applications.
|
119 |
Investigating pluralistic data architectures in data warehousing
Oladele, Kazeem Ayinde January 2015
Understanding and managing change is a strategic objective for many organisations seeking to compete successfully in a marketplace; as a result, organisations are leveraging their data assets and implementing data warehouses to gain the business intelligence necessary to improve their businesses. Data warehouses are expensive initiatives, and one-half to two-thirds of data warehousing efforts end in failure. In the absence of a well-formalised design methodology in the industry and in the context of the debate on data architecture in data warehousing, this thesis examines why multidimensional and relational data models define the data architecture landscape in the industry. The study develops a number of propositions from the literature and empirical data to understand the factors impacting the choice of logical data model in data warehousing. Using a comparative case study method as the means of collecting empirical data from the case organisations, the research proposes a conceptual model for logical data model adoption. The model provides a framework that guides decision making for adopting a logical data model for a data warehouse. The research conceptual model identifies the characteristics of business requirements and decision pathways for multidimensional and relational data warehouses. The conceptual model adds value by identifying the business requirements to which multidimensional and relational logical data models are empirically applicable.
|
120 |
Uncovering Nuances in Complex Data Through Focus and Context Visualizations
Rzeszotarski, Jeffrey M. 01 May 2017
Across a wide variety of digital devices, users create, consume, and disseminate large quantities of information. While data sometimes look like a spreadsheet or network diagram, more often for everyday users their data look more like an Amazon search page, the line-up for a fantasy football team, or a set of Yelp reviews. However, interpreting these kinds of data remains a difficult task even for experts since they often feature soft or unknown constraints (e.g. “I want some Thai food, but I also want a good bargain”) across highly multidimensional data (i.e. rating, reviews, popularity, proximity). Existing technology is largely optimized for users with hard criteria and satisfiable constraints, and consumer systems often use representations better suited for browsing than sensemaking. In this thesis I explore ways to support soft constraint decision-making and exploratory data analysis by giving users tools that show fine-grained features of the data while at the same time displaying useful contextual information. I describe approaches for representing collaborative content history and working behavior that reveal both individual and group/dataset level features. Using these approaches, I investigate general visualizations that utilize physics to help even inexperienced users find small and large trends in multivariate data. I describe the transition of physics-based visualization from the research space into the commercial space through a startup company, and the insights that emerged both from interviews with experts in a wide variety of industries during commercialization and from a comparative lab study. Taking one core use case from commercialization, consumer search, I develop a prototype, Fractal, which helps users explore and apply constraints to Yelp data at a variety of scales by curating and representing individual-, group-, and dataset-level features. Through a user study and theoretical model I consider how the prototype can best aid users throughout the sensemaking process. My dissertation further investigates physics-based approaches for representing multivariate data, and explores how the user’s exploration process itself can help to dynamically refine the search process and visual representation. I demonstrate that selectively representing points using clusters can extend physics-based visualizations across a variety of data scales, and help users make sense of data at scales that might otherwise overload them. My model provides a framework for stitching together a model of user interest and data features, unsupervised clustering, and visual representations for exploratory data visualization. The implications from commercialization are broader, giving insight into why research in the visualization space is/isn’t adopted by industry, a variety of real-world use cases for multivariate exploratory data analysis, and an index of common data visualization needs in industry.
|