Global ETD Search

61	Manifold Learning with Tensorial Network Laplacians Sanders, Scott 01 August 2021 The interdisciplinary field of machine learning studies algorithms in which functionality is dependent on data sets. This data is often treated as a matrix, and a variety of mathematical methods have been developed to glean information from this data structure such as matrix decomposition. The Laplacian matrix, for example, is commonly used to reconstruct networks, and the eigenpairs of this matrix are used in matrix decomposition. Moreover, concepts such as SVD matrix factorization are closely connected to manifold learning, a subfield of machine learning that assumes the observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. Since many data sets have natural higher dimensions, tensor methods are being developed to deal with big data more efficiently. This thesis builds on these ideas by exploring how matrix methods can be extended to data presented as tensors rather than simply as ordinary vectors. tensors network laplacian image blending poisson equation Data Science Geometry and Topology Other Applied Mathematics
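As a rough illustration of the Laplacian matrix mentioned in entry 61, the sketch below builds the ordinary (matrix, not tensorial) combinatorial Laplacian L = D - A of a small undirected graph. The function name and the three-node path graph are hypothetical and are not taken from the thesis.

```python
def graph_laplacian(n, edges):
    """Return the combinatorial Laplacian L = D - A of an undirected, unweighted graph.

    n     -- number of nodes, labeled 0..n-1
    edges -- iterable of (u, v) pairs; assumed to contain no self-loops
    """
    # Build the symmetric adjacency matrix A.
    adjacency = [[0] * n for _ in range(n)]
    for u, v in edges:
        adjacency[u][v] = 1
        adjacency[v][u] = 1

    # L has node degrees on the diagonal and -1 wherever an edge exists.
    laplacian = [[0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])
        for j in range(n):
            laplacian[i][j] = (degree if i == j else 0) - adjacency[i][j]
    return laplacian


# Hypothetical example: the path graph 0 - 1 - 2.
path_laplacian = graph_laplacian(3, [(0, 1), (1, 2)])
for row in path_laplacian:
    print(row)
```

Every row of L sums to zero, so the constant vector is an eigenvector with eigenvalue 0; this is the simplest of the eigenpairs the abstract refers to.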
62	Neural Network Pruning for ECG Arrhythmia Classification Labarge, Isaac E 01 April 2020 Convolutional Neural Networks (CNNs) are a widely accepted means of solving complex classification and detection problems in imaging and speech. However, problem complexity often leads to considerable increases in computation and parameter storage costs. Many successful attempts have been made in effectively reducing these overheads by pruning and compressing large CNNs with only a slight decline in model accuracy. In this study, two pruning methods are implemented and compared on the CIFAR-10 database and an ECG arrhythmia classification task. Each pruning method employs a pruning phase interleaved with a finetuning phase. It is shown that when performing the scale-factor pruning algorithm on ECG, finetuning time can be expedited by 1.4 times over the traditional approach with only 10% of expensive floating-point operations retained, while experiencing no significant impact on accuracy. Convolutional Neural Network Pruning ECG Artificial Intelligence and Robotics Biomedical Data Science Software Engineering Theory and Algorithms
63	Řízení expanze e-commerce / E-commerce Expansion Management Sojka, David January 2021 This master's thesis examines a business model for an online store's international expansion in Europe. Drawing on the theoretical background, it evaluates the current state of the methods used in today's expansion strategy. It then proposes a management plan for systematic expansion, along with risk mitigation, stabilization of current international operations, and a proposed information technology solution for data science.
64	Assessing Influential Users in Live Streaming Social Networks January 2019 Live streaming has risen to significant popularity in the recent past, largely as a feature of existing social networks like Facebook, Instagram, and Snapchat. However, there does exist at least one social network entirely devoted to live streaming, and specifically the live streaming of video games: Twitch. This social network is unique for a number of reasons, not least because of its hyper-focus on live content, and this uniqueness poses challenges for social media researchers. Despite this uniqueness, almost no scientific work has been performed on this public social network. Thus, it is unclear what user interaction features present on other social networks exist on Twitch. Investigating the interactions between users and identifying which, if any, of the common user behaviors on social networks exist on Twitch is an important step in understanding how Twitch fits into the social media ecosystem. For example, there are users that have large followings on Twitch and amass a large number of viewers, but do those users exert influence over the behavior of other users the way that popular users on Twitter do? This task, however, will not be trivial. The same hyper-focus on live content that makes Twitch unique in the social network space invalidates many of the traditional approaches to social network analysis. Thus, new algorithms and techniques must be developed in order to tap this data source. In this thesis, a novel algorithm for finding games whose releases have made a significant impact on the network is described, as well as a novel algorithm for detecting and identifying influential players of games. In addition, the Twitch network is described in detail along with the data that was collected in order to power the two previously described algorithms.
/ Dissertation/Thesis / Doctoral Dissertation Computer Science 2019 Computer science Data Mining Data Science Influence Machine Learning Social Media Twitch
65	Data analytics and optimization methods in biomedical systems: from microbes to humans Wang, Taiyao 19 May 2020 Data analytics and optimization theory are well-developed techniques to describe, predict and optimize real-world systems, and they have been widely used in engineering and science. This dissertation focuses on applications in biomedical systems, ranging from the scale of microbial communities to problems relating to human disease and health care. Starting from the microbial level, the first problem considered is to design metabolic division of labor in microbial communities. Given a number of microbial species living in a community, the starting point of the analysis is a list of all metabolic reactions present in the community, expressed in terms of the metabolite proportions involved in each reaction. Leveraging tools from Flux Balance Analysis (FBA), the problem is formulated as a Mixed Integer Program (MIP) and new methods are developed to solve large-scale instances. The strategies found reveal a large space of nuanced and non-intuitive metabolic division of labor opportunities, including, for example, splitting the Tricarboxylic Acid (TCA) cycle into two separate halves. More broadly, the landscape of possible 1-, 2-, and 3-strain solutions is systematically mapped at increasingly tight constraints on the number of allowed reactions. The second problem addressed involves the prediction and prevention of short-term (30-day) hospital re-admissions. To develop predictive models, a variety of classification algorithms are adapted and coupled with robust (regularized) learning and heuristic feature selection approaches. Using real, large datasets, these methods are shown to reliably predict re-admissions of patients undergoing general surgery within 30 days of discharge. Beyond predictions, a novel prescriptive method is developed that computes specific control actions with the effect of altering the outcome.
This method, termed Prescriptive Support Vector Machines (PSVM), is based on an underlying SVM classifier. Applied to the hospital re-admission data, it is shown to reduce 30-day re-admissions after surgery through better control of the patient’s pre-operative condition. Specifically, using the new method the patient’s pre-operative hematocrit is regulated through limited blood transfusion. In the last problem in this dissertation, a framework for parameter estimation in Regularized Mixed Linear Regression (MLR) problems is developed. In the specific MLR setting considered, training data are generated from a mixture of distinct linear models (or clusters) and the task is to identify the corresponding coefficient vectors. The problem is formulated as a Mixed Integer Program (MIP) subject to regularization constraints on the coefficient vectors. A number of results on the convergence of parameter estimates for MLR are established. In addition, experimental prediction results are presented comparing the prediction algorithm with mean absolute error regression and random forest regression, in terms of both accuracy and interpretability. Statistics Data science Flux balance analysis Machine learning Mixed linear regression Optimization Predictive and prescriptive models
66	Predicting Delays In Delivery Process Using Machine Learning-Based Approach Shehryar Shahid (9745388) 16 December 2020 There has been a great interest in applying Data Science, Machine Learning, and AI-related technologies in recent years. Industries are adopting these technologies very rapidly, which has enabled them to gather valuable data about their businesses. One such industry that can leverage this data to improve its business output and quality is the logistics and transport industry. This phenomenon provides an excellent opportunity for companies who rely heavily on air transportation to leverage this data to gain valuable insights and improve their business operations. This thesis aims to leverage this data to develop techniques to model complex business processes and design a machine learning-based predictive analytical approach to predict process violations. This thesis focused on solving delays in shipment delivery by modeling a prediction technique to predict these delays. The approach presented here was based on real airfreight shipping data, which follows the International Air Transport Association industry standard for airfreight transportation, to identify shipments at risk of being delayed. By leveraging the shipment process structure, this research presented a new approach that solved the complex event-driven structure of airfreight data that made it difficult to model for predictive analytics. By applying different data mining and machine learning techniques, prediction techniques were developed to predict delays in delivering airfreight shipments. The prediction techniques were based on random forest and gradient boosting algorithms. To compare and select the best model, the prediction results were interpreted in the form of six confusion matrix-based performance metrics. The results showed that all the predictors had a high specificity of over 90%, but the sensitivity was low, under 44%.
Accuracy was observed to be over 75%, and the geometric mean was between 58% and 64%. The performance metrics results provided evidence that our approach could be implemented to develop a prediction technique to model complex business processes. Additionally, an early prediction method was designed to test predictors' performance if complete process information was not available. This proposed method delivered compelling evidence suggesting that early prediction can be achieved without compromising the predictor’s performance. Applied Computer Science Predictive Modeling Machine Learning Decision Making Logistics information system Data Science
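For readers unfamiliar with the confusion matrix-based metrics named in entry 66, the following minimal sketch shows how sensitivity, specificity, accuracy, and the geometric mean are commonly computed. The counts are made up for illustration and are not the thesis data; the geometric mean is assumed here to be the square root of sensitivity times specificity, which the abstract does not state explicitly.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute four common metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of truly delayed shipments that were flagged
    specificity = tn / (tn + fp)   # share of on-time shipments correctly left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=40, fp=10, tn=90, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, a high specificity alongside a low sensitivity pulls the geometric mean well below the specificity, which is the same pattern the abstract reports.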
67	Aplicación de Data Science Specialist Ccora Camarena, Yuli; Jeri De La Cruz, Nélida; Enriquez Yance, Rosario Grace 14 January 2020 The research work presented below analyzes a problem at the company Travico Perú S.A.C, which has reported a decline in sales across the services it offers. The data science methodology was applied to identify the variables that influenced sales of all services from 2016 to 2018. The data set was obtained from the platforms the company works with and from internal control reports, yielding 12 variables with 6429 data points. In addition, an unsupervised, partition-based machine learning technique, K-means, was used to segment and group the selected variables.
Finally, for the analysis, various charts of the company's sales results were presented and compared with the results of the clusters. / Research paper Data science Machine learning K-means Data analysis
68	Improving Recommendation Systems Using Image Data Åslin, Filip January 2022 Recommendation systems typically use historical interactions between users and items to predict what other items can be of interest to a user. The recommendations are based on patterns in how users interact similarly with items. This thesis investigates if it is possible to improve the quality of the recommendations by including more information about the items in the model that predicts the recommendations. More specifically, the use of deep learning to extract information from item images is investigated. To do this, two types of collaborative filtering models, based on historic interactions, are implemented. These models are then compared to different collaborative filtering models that either make use of user and item attributes, or images of the items. Three pre-trained image classification models are used to extract useful item features from the item images. The models are trained and evaluated using a dataset of historic transactions and item images from the online sports shop Stadium, given by the thesis supervisor. The results show no noticeable improvement in performance for the models using the images compared to the models without images. The model using the user and item attributes performs the best, indicating that the collaborative filtering models can be improved by giving them more information than just the historic interactions. Possible ways to further investigate using the image feature vectors in collaborative filtering models, as well as using them to create better item attributes, are discussed and suggested for future work. Recommendation Systems Image Analysis Machine Learning Data Science Computer Sciences Datavetenskap (datalogi)
69	Topological Hierarchies and Decomposition: From Clustering to Persistence Brown, Kyle A. 27 May 2022 No description available. Computer Science topological data analysis hierarchical clustering exploratory data analysis topology clustering data science
70	GENERATIVE, PREDICTIVE, AND REACTIVE MODELS FOR DATA SCARCE PROBLEMS IN CHEMICAL ENGINEERING Nicolae Christophe Iovanac (11167785) 22 July 2021 Data scarcity is intrinsic to many problems in chemical engineering due to physical constraints or cost. This challenge is acute in chemical and materials design applications, where a lack of data is the norm when trying to develop something new for an emerging application. Addressing novel chemical design under these scarcity constraints takes one of two routes: the traditional forward approach, where properties are predicted based on chemical structure, and the recent inverse approach, where structures are predicted based on required properties. Statistical methods such as machine learning (ML) could greatly accelerate chemical design under both frameworks; however, in contrast to the modeling of continuous data types, molecular prediction has many unique obstacles (e.g., spatial and causal relationships, featurization difficulties) that require further ML methods development. Despite these challenges, this work demonstrates how transfer learning and active learning strategies can be used to create successful chemical ML models in data scarce situations. Transfer learning is a domain of machine learning under which information learned in solving one task is transferred to help in another, more difficult task. Consider the case of a forward design problem involving the search for a molecule with a particular property target with limited existing data, a situation not typically amenable to ML. In these situations, there are often correlated properties that are computationally accessible. As all chemical properties are fundamentally tied to the underlying chemical topology, and because related properties arise due to related moieties, the information contained in the correlated property can be leveraged during model training to help improve the prediction of the data scarce property.
Transfer learning is thus a favorable strategy for facilitating high throughput characterization of low-data design spaces. Generative chemical models invert the structure-function paradigm, and instead directly suggest new chemical structures that should display the desired application properties. This inversion process is fraught with difficulties but can be improved by training these models with strategically selected chemical information. Structural information contained within this chemical property data is thus transferred to support the generation of new, feasible compounds. Moreover, the transfer learning approach helps ensure that the proposed structures exhibit the specified property targets. Recent extensions also utilize thermodynamic reaction data to help promote the synthesizability of suggested compounds. These transfer learning strategies are well-suited for explorative scenarios where the property values being sought are well outside the range of available training data. There are situations where property data is so limited that obtaining additional training data is unavoidable. By improving both the predictive and generative qualities of chemical ML models, a fully closed-loop computational search can be conducted using active learning. New molecules in underrepresented property spaces may be iteratively generated by the network, characterized by the network, and used for retraining the network. This allows the model to gradually learn the unknown chemistries required to explore the target regions of chemical space by actively suggesting the new training data it needs. By utilizing active learning, the create-test-refine pathway can be addressed purely in silico.
This approach is particularly suitable for multi-target chemical design, where the high dimensionality of the desired property targets exacerbates data scarcity concerns. The techniques presented herein can be used to improve both predictive and generative performance of chemical ML models. Transfer learning is demonstrated as a powerful technique for improving the predictive performance of chemical models in situations where a correlated property can be leveraged alongside scarce experimental or computational properties. Inverse design may also be facilitated through the use of transfer learning, where property values can be connected with stable structural features to generate new compounds with targeted properties beyond those observed in the training data. Thus, when the necessary chemical structures are not known, generative networks can directly propose them based on function-structure relationships learned from domain data, and this domain data can even be generated and characterized by the model itself for closed-loop chemical searches in an active learning framework. With recent extensions, these models are compelling techniques for looking at chemical reactions and other data types beyond the individual molecule. Furthermore, the approaches are not limited by choice of model architecture or chemical representation and are expected to be helpful in a variety of data scarce chemical applications. Machine Learning Inverse Design Data Science Chemical Engineering Computational Chemistry

Search results