121

Project-based Multi-tenant Container Registry For Hopsworks

Kashyap, Pradyumna Krishna January 2020 (has links)
There has been substantial growth in the usage of data in the past decade; cloud technologies and big data platforms have gained popularity as they help process such data at scale. Hopsworks is one such managed platform for scale-out data science. It is an open-source platform for the development and operation of Machine Learning models, available on-premise and as a managed platform in the cloud. As most of these platforms provide data science environments that collate the required libraries, Hopsworks provides users with Anaconda environments. Hopsworks provides multi-tenancy, ensuring a secure model for managing sensitive data on the shared platform. Most Hopsworks features are built around projects: each project includes an Anaconda environment that provides users with a number of libraries capable of processing data. Each project creation triggers the creation of a base Anaconda environment, and each added library updates this environment. For an on-premise application, as data science teams are diverse and work towards building repeatable and scalable models, it becomes increasingly important to manage these environments in a central local location. The purpose of this thesis is to provide secure storage for these Anaconda environments. As Hopsworks uses a Kubernetes cluster to serve models, these environments can be containerized and stored in a secure container registry on the Kubernetes cluster. The provided solution also aims to extend the multi-tenancy feature of Hopsworks to the hosted local storage. The implementation comprises two parts: the first is to host a compatible open-source container registry that stores the container images on a local Kubernetes cluster with fault tolerance, avoiding a single point of failure; the second is to leverage the multi-tenancy feature of Hopsworks by storing the images on the self-sufficient secure registry with project-level isolation.
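A minimal sketch of the project-level isolation idea described above, with hypothetical names throughout (the registry host, helper functions, and authorization rule are illustrative assumptions, not Hopsworks APIs): each project's environment images live under a project-scoped repository prefix, and access is granted only to members of that project.

```python
# Illustrative sketch of project-scoped multi-tenancy in a container registry.
# All names here (registry host, functions) are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    projects: set  # projects the user is a member of

def image_repository(project: str, env_id: str) -> str:
    # Each project's Anaconda environment images live under a
    # project-scoped prefix, so tenants never share a repository.
    return f"registry.example.com/{project}/environments/{env_id}"

def authorize(user: User, repository: str) -> bool:
    # Project-level isolation: push/pull is allowed only when the
    # repository's project prefix matches one of the user's projects.
    project = repository.split("/")[1]
    return project in user.projects

alice = User("alice", {"demo_project"})
repo = image_repository("demo_project", "env-1.0")
assert authorize(alice, repo)
assert not authorize(alice, "registry.example.com/other_project/environments/env-2.0")
```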
122

Data Driven High Performance Data Access

Ramljak, Dusan January 2018 (has links)
Low-latency, high-throughput mechanisms to retrieve data become increasingly crucial as cyber and cyber-physical systems pour out growing amounts of data that often must be analyzed online. Generally, as data volume increases, the marginal utility of an "average" data item tends to decline, which requires greater effort in identifying the most valuable data items and making them available with minimal overhead. We believe that data-analytics-driven mechanisms have a big role to play in solving this needle-in-the-haystack problem. We rely on the claim that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many applications, offers significant potential to enable many I/O optimizations. Our research covers exploitation of the storage hierarchy for data-driven caching and tiering, reduction of the distance between data and computations, removal of redundancy in data, use of sparse representations of data, the impact of data access mechanisms on resilience, energy consumption, and storage usage, and the enablement of new classes of data-driven applications. For caching and prefetching, we offer a powerful model that separates the process of access prediction from the data retrieval mechanism. Predictions are made on a per-data-entity basis and use the notion of "context" and its aspects, such as "belief", to uncover and leverage future data needs. This approach allows truly opportunistic utilization of predictive information. We elaborate on which aspects of the context we use in areas other than caching and prefetching, and why each is appropriate in its situation. We present in more detail the methods we have developed: BeliefCache for data-driven caching and prefetching, and AVSC for pattern-mining-based compression of data. In BeliefCache, using a belief, an aspect of context representing an estimate of the probability that a storage element will be needed, we developed a modular framework to make unified, informed decisions about that element or a group of elements. For the workloads we examined, we were able to capture complex non-sequential access patterns better than a state-of-the-art framework for optimizing cloud storage gateways. Moreover, our framework is also able to adjust to variations in the workload faster, and it does not require a static workload to be effective, since its modular design allows for discovering and adapting to changes in the workload. In AVSC, using an aspect of context to gauge the similarity of events, we perform compression by keeping relevant events intact and approximating other events. We do this in two stages: we first generate a summarization of the data, then approximately match the remaining events with the existing patterns where possible, or add new patterns to the summary otherwise. We show gains over plain lossless compression for a specified amount of accuracy for purposes of identifying the state of the system, and a clear tradeoff between compressibility and fidelity. In the other research areas mentioned, we present challenges and opportunities in the hope that they will spur researchers to further examine those issues in the space of rapidly emerging data-intensive applications. We also discuss how our research in other domains could be applied in our attempts to provide high-performance data access. / Computer and Information Science
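As a concrete illustration of the caching idea, the sketch below separates access prediction from retrieval and keeps a per-item "belief", an estimated probability that an element will be needed. It is a simplified toy, not the BeliefCache implementation: the transition-count update and decay rule are invented for illustration.

```python
# Toy belief-driven prefetcher in the spirit of the abstract; the update and
# decay rules here are assumptions, not the thesis's actual algorithm.

from collections import defaultdict

class BeliefPrefetcher:
    def __init__(self, threshold=0.6, decay=0.9):
        self.threshold = threshold                        # prefetch when belief exceeds this
        self.decay = decay                                # beliefs fade when unsupported
        self.belief = defaultdict(float)                  # item -> estimated P(needed soon)
        self.follows = defaultdict(lambda: defaultdict(int))  # observed transitions
        self.last = None

    def observe(self, item):
        # Learn (possibly non-sequential) access patterns as transition counts.
        if self.last is not None:
            self.follows[self.last][item] += 1
        self.last = item
        # Re-estimate beliefs for items that historically follow `item`.
        total = sum(self.follows[item].values()) or 1
        for nxt, cnt in self.follows[item].items():
            self.belief[nxt] = cnt / total
        # Decay everything else so decisions adapt to workload drift.
        for k in self.belief:
            if k not in self.follows[item]:
                self.belief[k] *= self.decay

    def prefetch_candidates(self):
        # The decision is decoupled from retrieval: any backend can act on it.
        return [k for k, b in self.belief.items() if b >= self.threshold]

p = BeliefPrefetcher()
for block in ["a", "b", "a", "b", "a", "b"]:
    p.observe(block)
print(p.prefetch_candidates())  # -> ['b', 'a']: both items of the alternating trace exceed the threshold
```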
123

Restoring Consistency in Ontological Multidimensional Data Models via Weighted Repairs

Haque, Enamul January 2020 (has links)
This can be considered multidisciplinary research, where ideas from Operations Research, Data Science, and Logic come together to solve an inconsistency-handling problem in a special type of ontology. / High data quality is a prerequisite for accurate data analysis. However, data inconsistencies often arise in real data, leading to untrusted decision making downstream in the data analysis pipeline. In this research, we study the problem of inconsistency detection and repair for the Ontological Multi-dimensional Data Model (OMD). We propose a framework of data quality assessment and repair for the OMD. We formally define weight-based repair-by-deletion semantics and present an automatic weight-generation mechanism that considers multiple input criteria. Our methods are rooted in multi-criteria decision making, which considers the correlation, contrast, and conflict that may exist among multiple criteria and is often needed in the data cleaning domain. After weight generation, we present a dynamic-programming-based Min-Sum algorithm to identify a minimal-weight solution. We then apply evolutionary optimization techniques and demonstrate improved performance on medical datasets, making the approach realizable in practice. / Thesis / Master of Computer Science (MCS) / Accurate data analysis requires high-quality data as input. In this research, we study inconsistency in an ontology known as the Ontological Multi-dimensional Data (OMD) Model and propose algorithms to repair inconsistencies based on automatically generated relative weights. We propose two techniques to restore consistency: one provides optimal results but takes longer, while the other produces sub-optimal results fast enough for practical purposes, as shown by experiments on datasets.
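The following sketch illustrates the weight-based repair-by-deletion objective on a toy instance: choose the minimum-total-weight set of tuples to delete so that every detected conflict loses at least one member. The weights and conflicts are made up, and exhaustive search stands in for the thesis's dynamic-programming Min-Sum algorithm.

```python
# Naive stand-in for weighted repair-by-deletion: find the cheapest deletion
# set that breaks every conflict. Weights and conflicts are assumed values.

from itertools import combinations

tuples = {"t1": 0.9, "t2": 0.4, "t3": 0.7, "t4": 0.2}   # tuple -> weight (assumed)
conflicts = [{"t1", "t2"}, {"t2", "t3"}, {"t3", "t4"}]  # each set is jointly inconsistent

def min_weight_repair(weights, conflicts):
    names = list(weights)
    best, best_cost = None, float("inf")
    for r in range(len(names) + 1):
        for deleted in combinations(names, r):
            d = set(deleted)
            if all(c & d for c in conflicts):           # every conflict loses a member
                cost = sum(weights[t] for t in d)
                if cost < best_cost:
                    best, best_cost = d, cost
    return best, best_cost

print(min_weight_repair(tuples, conflicts))  # deletes t2 and t4 (total weight 0.6)
```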
124

Interpreting Shift Encoders as State Space models for Stationary Time Series

Donkoh, Patrick 01 May 2024 (has links) (PDF)
Time series analysis is a statistical technique used to analyze sequential data points collected or recorded over time. While traditional models such as autoregressive and moving-average models have performed adequately for time series analysis, the advent of artificial neural networks has provided models with improved performance. In this research, we provide a custom neural network, a shift encoder, that can capture the intricate temporal patterns of time series data. We then compare the sparse matrix of the shift encoder to the parameters of the autoregressive model and observe the similarities. We further explore how the state matrix in a state-space model can be replaced with the sparse matrix of the shift encoder.
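The state-space connection can be made concrete with the companion-matrix form of an AR(p) model, where the state matrix has exactly the sparse "shift" structure the abstract alludes to; the sketch below uses assumed AR(2) coefficients.

```python
# An AR(p) model rewritten as a linear state-space system x_{t+1} = A x_t,
# where A is a companion ("shift") matrix; the shift encoder's sparse weight
# matrix plays the role of A. Coefficients below are assumptions.

import numpy as np

phi = np.array([0.6, 0.3])          # assumed AR(2) coefficients
p = len(phi)

# Companion form: the first row holds the AR coefficients, and the
# sub-diagonal shifts past observations down one slot - a "shift" structure.
A = np.zeros((p, p))
A[0, :] = phi
A[1:, :-1] = np.eye(p - 1)

x = np.array([1.0, 0.5])            # state: [y_t, y_{t-1}]
for _ in range(3):
    x = A @ x                       # one-step-ahead forecast, then shift
    print(x[0])                     # forecasted y: 0.75, 0.75, 0.675
```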
125

Establishing “The Fossil Record”: A Database of Vertebrate Paleontological Sites Across the State of Tennessee

Mclaurine, Sarah 01 May 2024 (has links) (PDF)
Fossil localities across the state of Tennessee, and the data related to those sites, were compiled from Tennessee Division of Geology Bulletin 84, titled "Tennessee's Prehistoric Vertebrates," and stored in a Microsoft Access geodatabase housed by the Department of Collections at the East Tennessee State University Museum of Natural History at the Gray Fossil Site. The database includes forms to enter new site localities, view information about those already entered, view and add data to a master faunal list for the state, view sites' repository information, and store documents that are keyword-searchable from the main menu. This database was compiled to give researchers a straightforward, easy-to-use means of analyzing known information about paleontological sites across the state, with the potential to be expanded worldwide. Conservation of data is crucial, as data can be lost over time unless preservation efforts are made.
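For illustration only, the sketch below mocks up the entities the abstract names (localities, a master faunal list, keyword-searchable documents) in SQLite; the real system is an Access geodatabase, and all field names and values here are guesses.

```python
# Hypothetical schema sketch of the described database; the thesis uses a
# Microsoft Access geodatabase, and SQLite stands in here for illustration.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE locality (
    locality_id INTEGER PRIMARY KEY,
    name TEXT, county TEXT, latitude REAL, longitude REAL,
    source TEXT                        -- e.g. 'TN Div. of Geology Bulletin 84'
);
CREATE TABLE faunal_list (             -- statewide master faunal list
    taxon_id INTEGER PRIMARY KEY,
    taxon TEXT,
    locality_id INTEGER REFERENCES locality(locality_id)
);
CREATE TABLE document (                -- keyword-searchable supporting documents
    doc_id INTEGER PRIMARY KEY,
    locality_id INTEGER REFERENCES locality(locality_id),
    title TEXT, keywords TEXT
);
""")
# Illustrative row only; coordinates and source label are placeholders.
con.execute("INSERT INTO locality VALUES (1, 'Gray Fossil Site', 'Washington', 36.4, -82.5, 'Bulletin 84')")
print(con.execute("SELECT name FROM locality WHERE county = 'Washington'").fetchone())
```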
126

Using ICU Admission as a Predictor for Maternal Mortality: Identifying Essential Features for Accurate Classification

Dairian Haulani Ly Balai (18415224) 20 April 2024 (has links)
Maternal mortality (MM) is a pressing global health issue that results in thousands of mothers dying annually from pregnancy-related complications. Despite spending trillions of dollars on healthcare, the U.S. continues to experience one of the highest rates of maternal death (MD) among developed countries. This ongoing public health crisis highlights the urgent need for innovative strategies to detect and mitigate adverse maternal outcomes. This study introduces a novel approach, utilizing admission to the ICU as a proxy for MM. By analyzing 14 years of natality birth data, it explores the complex web of factors that elevate the chances of MD. The primary goal is to identify the features that are most influential in predicting ICU admission. These factors hold the potential to be applied to MM, as they can serve as early warning signs that complications may arise, allowing healthcare professionals to intervene before adverse maternal outcomes occur. Two supervised machine learning models were employed, Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost). The models were executed twice for each dataset: once incorporating all available features and again using only the most significant features. Following model training, XGBoost's feature selection technique was employed to identify the 10 features most influential in the classification process. Our analysis revealed a diverse range of factors important for predicting ICU admission: maternal transfusion, labor and delivery characteristics, delivery methods, gestational age, maternal attributes, and newborn conditions. In terms of model performance, XGBoost consistently outperformed LR across the datasets, demonstrating higher accuracy, precision, and F1 scores; for recall, however, LR maintained higher scores. Moreover, the models consistently achieved higher scores when trained with all available features than when trained solely with the top features. Although the models demonstrated satisfactory performance on some evaluation metrics, there were notable deficiencies in recall and precision, suggesting that further model refinement is needed to predict these cases effectively.
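The sketch below mirrors the modeling pattern described, not the study's data or settings: train XGBoost and logistic regression on tabular features for a binary ICU-admission label, then rank the ten most influential features by XGBoost's importance scores. The synthetic data and hyperparameters are placeholders.

```python
# Sketch of the described pipeline on synthetic stand-in data: LR vs. XGBoost,
# plus XGBoost feature importances for selecting the top 10 features.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 30))     # stand-in for natality features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=5000) > 1.5).astype(int)  # "ICU admission" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

xgb = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
xgb.fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

top10 = np.argsort(xgb.feature_importances_)[::-1][:10]
print("top features:", top10)       # indices of the most influential features
print("xgb acc:", xgb.score(X_te, y_te), "lr acc:", lr.score(X_te, y_te))
```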
127

Enhancing NFL Game Insights: Leveraging XGBoost For Advanced Football Data Analytics To Quantify Multifaceted Aspects Of Gameplay

Schoborg, Christopher P 01 January 2024 (has links) (PDF)
XGBoost, renowned for its efficacy across various statistical domains, offers enhanced precision and efficiency. Its versatility extends to both regression and classification tasks, rendering it a valuable asset in predictive modeling. In this dissertation, I harness XGBoost to forecast and rank performances within the National Football League (NFL). Specifically, my research focuses on predicting the next play in NFL games from pre-snap data; optimizing the draft ranking process by integrating NFL Combine data with collegiate statistics; creating a player rating system that can be compared across all positions; and evaluating strategic decisions for NFL teams once they cross the 50-yard line, including the feasibility of attempting a first-down conversion versus opting for a field goal attempt.
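One of these questions, whether to attempt a conversion or a field goal past midfield, reduces to an expected-value comparison; the toy sketch below uses assumed probabilities and point values where the dissertation's XGBoost models would supply data-driven estimates.

```python
# Toy expected-points framing of the go-for-it vs. field-goal decision.
# All probabilities and point values are assumptions, not model outputs.

def expected_points(p_success: float, pts_success: float, pts_fail: float) -> float:
    return p_success * pts_success + (1 - p_success) * pts_fail

# Hypothetical 4th-and-3 at the opponent 35-yard line:
go = expected_points(p_success=0.55, pts_success=2.8, pts_fail=-1.5)    # convert vs. turnover on downs
kick = expected_points(p_success=0.45, pts_success=3.0, pts_fail=-2.0)  # long attempt vs. short field for opponent

print(f"go for it: {go:.2f}, field goal: {kick:.2f}")
# In the dissertation, XGBoost would estimate these quantities from
# pre-snap and historical data rather than fixed constants.
```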
128

High-variance multivariate time series forecasting using machine learning

Katardjiev, Nikola January 2018 (has links)
There are several tools and models in machine learning that can be used to forecast a time series; however, it is not always clear which model is appropriate, as different models are suited to different types of data, and domain-specific transformations and considerations are usually required. This research examines the issue by modeling four types of machine- and deep-learning algorithms - a support vector machine, a random forest, a feed-forward neural network, and an LSTM neural network - on a high-variance, multivariate time series to forecast trend changes one time step into the future, accounting for lag. The models were trained on clinical trial data of patients in an alcohol addiction treatment plan provided by an Uppsala-based company. The results showed moderate performance differences, with a concern that the models were performing a random walk or naive forecast. Further analysis proved that at least one model, the feed-forward neural network, was not doing so and was able to make meaningful forecasts one time step into the future. In addition, the research examined the effect of optimization processes by comparing grid search, random search, and Bayesian optimization. In all cases, grid search found the lowest minima, though its slow runtimes were consistently beaten by Bayesian optimization, which achieved only slightly lower performance than grid search.
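The random-walk concern can be tested by benchmarking a model against a naive persistence forecast; a minimal sketch with placeholder data and a stand-in model (a random forest, one of the four examined) is shown below.

```python
# Sketch of the "random walk" check: compare a model's one-step-ahead error
# against a naive persistence forecast (predict y_t for y_{t+1}).
# The series and model settings are placeholders.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=500)) + np.sin(np.arange(500) / 10)  # high-variance series

lag = 5
X = np.column_stack([y[i:len(y) - lag + i] for i in range(lag)])   # lagged features
target = y[lag:]
split = 400

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:split], target[:split])

pred = model.predict(X[split:])
naive = X[split:, -1]               # persistence: repeat the last observed value
mae_model = np.mean(np.abs(pred - target[split:]))
mae_naive = np.mean(np.abs(naive - target[split:]))
print(f"model MAE {mae_model:.3f} vs naive MAE {mae_naive:.3f}")
# A model that only matches the naive baseline has learned little
# beyond a random walk.
```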
129

Kompendium der Online-Forschung (DGOF)

Deutsche Gesellschaft für Online-Forschung e. V. (DGOF) 24 November 2021 (has links)
Here the DGOF publishes digital compendia on current topics in online research, with contributions from experts in the field.
130

Machine Learning Modeling of Polymer Coating Formulations: Benchmark of Feature Representation Schemes

Evbarunegbe, Nelson I 14 November 2023 (has links) (PDF)
Polymer coatings offer a wide range of benefits across various industries, playing a crucial role in product protection and extension of shelf life. However, formulating them can be a non-trivial task given the multitude of variables and factors involved in the production process, rendering it a complex, high-dimensional problem. To tackle this problem, machine learning (ML) has emerged as a promising tool, showing considerable potential for enhancing various polymer- and chemistry-based applications, particularly those dealing with high-dimensional complexities. Our research aims to develop a physics-guided ML approach to facilitate the formulation of polymer coatings. As a first step, this project focuses on finding the machine-readable feature representation techniques most suitable for encoding formulation ingredients. Using two polymer-informatics datasets - one encompassing a large set of 700,000 common homopolymers, including epoxies and polyurethanes as coating base materials, and the other a relatively small set of 1,000 epoxy-diluent formulations - we benchmarked four featurization schemes for representing polymer coating molecules: the molecular access system, the extended-connectivity fingerprint, a molecular-graph-based chemical graph network, and graph convolutional network (MG-GCN) embeddings. These representation schemes were used with ensemble models to predict molecular properties, including topological surface area and viscosity. The results show that the combination of MG-GCN and ensemble models such as extreme gradient boosting and random forest achieved the best overall performance, with coefficient of determination (r2) values of 0.74 for topological surface area and 0.84 for viscosity, which compare favorably with existing techniques. These results lay the foundation for using ML with physical modeling to expedite the development of polymer coating formulations.
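A sketch of one benchmarked combination, an extended-connectivity (Morgan) fingerprint representation fed to a random forest, is shown below; the SMILES strings and the RDKit-computed TPSA target are placeholders for the thesis's datasets and measured properties.

```python
# Sketch of an ECFP + ensemble pipeline; molecules and the TPSA target are
# placeholders, not the thesis's polymer datasets or measured properties.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
from sklearn.ensemble import RandomForestRegressor

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC", "CC(C)CO", "O=C(N)c1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Extended-connectivity fingerprint: radius-2 circular substructures, 1024 bits.
X = np.array([list(AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024)) for m in mols])
y = np.array([Descriptors.TPSA(m) for m in mols])   # topological polar surface area

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(X[:2]), y[:2])                  # sanity check on training data
```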
