121

Integrated Process Modeling and Data Analytics for Optimizing Polyolefin Manufacturing

Sharma, Niket 19 November 2021 (has links)
Polyolefins are among the most widely used commodity polymers, with applications in films, packaging, and the automotive industry. The modeling of polymerization processes producing polyolefins, including high-density polyethylene (HDPE), polypropylene (PP), and linear low-density polyethylene (LLDPE) using Ziegler-Natta catalysts with multiple active sites, is a complex and challenging task. In our study, we integrate process modeling and data analytics for improving and optimizing polyolefin manufacturing processes. Most of the current literature on polyolefin modeling does not consider all of the commercially important production targets when quantifying the relevant polymerization reactions and their kinetic parameters based on measurable plant data. We develop an effective methodology to estimate the kinetic parameters that have the most significant impacts on specific production targets, and to develop the kinetics using all commercially important production targets, validated over industrial polyolefin processes. We showcase the utility of dynamic models for efficient grade transition in polyolefin processes. We also use the dynamic models for inferential control of polymer processes. Thus, we showcase a methodology for building first-principles polyolefin process models that are scientifically consistent, but that tend to be less accurate due to the many modeling assumptions required in a complex system. Data analytics and machine learning (ML) have been applied in the chemical process industry for accurate predictions in data-based soft sensors and process monitoring/control. They are particularly useful for polymer processes, since polymer quality measurements such as melt index and molecular weight are usually much less frequent than the continuous process variable measurements. We showcase the use of predictive machine learning models like neural networks for predicting polymer quality indicators and demonstrate the utility of causal models like partial least squares to study the causal effect of the process parameters on the polymer quality variables. ML models produce accurate results but can over-fit the data and also produce scientifically inconsistent results beyond the operating data range. Thus, it is increasingly important to develop hybrid models combining data-based ML models and first-principles models. We present a broad perspective on hybrid process modeling and optimization that combines scientific knowledge and data analytics in bioprocessing and chemical engineering through a science-guided machine learning (SGML) approach, rather than just direct combinations of first-principles and ML models. We present a detailed review of the scientific literature relating to the hybrid SGML approach and propose a systematic classification of hybrid SGML models according to their methodology and objective. We identify themes and methodologies that have not been explored much in chemical engineering applications, such as the use of scientific knowledge to help improve the ML model architecture and learning process for more scientifically consistent solutions. We apply hybrid SGML techniques such as inverse modeling and science-guided loss functions, which have not previously been applied to such polymer applications, to industrial polyolefin processes. / Doctor of Philosophy / Almost everything we see around us, from furniture and electronics to bottles and cars, is made fully or partially from plastic polymers.
The two most popular polymers, which together comprise almost two-thirds of global polymer production, are polyethylene (PE) and polypropylene (PP), collectively known as polyolefins. Hence, the optimization of polyolefin manufacturing processes with the aid of simulation models is critical and profitable for the chemical industry. Modeling of a chemical/polymer process is helpful for process scale-up, product quality estimation/monitoring, and new process development. To make a good simulation model, we need to validate its predictions with actual industrial data. The polyolefin process has complex reaction kinetics with multiple parameters that need to be estimated to accurately match the industrial process. We have developed a novel strategy for estimating the kinetics for the model, including the reaction chemistry and the polymer quality information, validated against the industrial process. Thus, we have developed a science-based model which includes knowledge of the reaction kinetics, thermodynamics, and heat and mass balances for the polyolefin process. The science-based model is scientifically consistent, but may not be very accurate due to its many model assumptions. Therefore, for applications requiring very high accuracy in predicting polymer quality targets such as melt index (MI) and density, data-based techniques might be more appropriate. We hear a lot lately about artificial intelligence (AI) and machine learning (ML); the basic principle behind these methods is to make the model learn from data for prediction. The process data measured in a chemical/polymer plant can be utilized for data analysis. We can build ML models to predict polymer targets like MI as a function of the input process variables. The ML model predictions are very accurate within the operating range of the dataset on which the model is trained, but outside that range they tend to give scientifically inconsistent results. Thus, there is a need to combine data-based models and scientific models. In our research, we showcase novel approaches to integrate science-based models and data-based ML methodology, which we term hybrid science-guided machine learning (SGML) methods. The hybrid SGML methods applied to polyolefin processes yield not only accurate but also scientifically consistent predictions, which can be used for polyolefin process optimization in applications like process development and quality monitoring.
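To make the data-analytics side of this abstract concrete, the sketch below fits a partial least squares model to predict a melt index from a few process variables. The data are synthetic, and the variable names, model choice, and parameters are illustrative assumptions, not the dissertation's actual models or plant data.

```python
# Illustrative soft-sensor sketch: predicting a polymer melt index (MI)
# from continuous process variables. The data are synthetic and the model
# choice (partial least squares) is only one of the approaches the abstract
# mentions; it is not the dissertation's actual model.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical process variables: reactor temperature, pressure, H2/C2 ratio, catalyst feed
X = rng.normal(size=(n, 4))
# Synthetic melt index with a known linear structure plus noise
mi = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + 0.1 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(X, mi, random_state=0)
pls = PLSRegression(n_components=2)
pls.fit(X_train, y_train)
print("R^2 on held-out data:", pls.score(X_test, y_test))
```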
122

Analysis of Information Diffusion through Social Media

Khalili, Nastaran 16 June 2021 (has links)
The changes in the course of communication have changed the world from different perspectives. Public participation on social media means the generation, diffusion, and exposure to a tremendous amount of user-generated content without supervision. This four-essay dissertation analyzes information diffusion through social media and its opportunities and challenges using management systems engineering and data analytics. First, we evaluate how information can be shared to reach maximum exposure for the case of online petitions. We use system dynamics modeling and propose policies for campaign managers to schedule the reminders they send so as to obtain the highest number of petition signatures. We find that sending reminders is more effective when the signature rate is increasing. In the second essay, we investigate how people build trust or mistrust in science during an emergency. We use data analytics methods on more than 700,000 tweets containing the keywords hydroxychloroquine and chloroquine, two candidate medicines to prevent and treat COVID-19 infection. We show that people's opinions are concentrated in the case of polarity and spread out in the case of subjectivity. Also, people tend to share subjective tweets more than objective ones. In the third essay, building on the same dataset as essay two, we study the changes in science communication during the coronavirus pandemic. We used topic modeling and clustered the tweets into seven different groups. Our analysis suggests that a highly scientific and health-related subject can become political in the case of an emergency. We found that the medical information and research-and-study groups have fewer tweets than the political one. Fourth, we investigated fake news diffusion as one of the main challenges of user-generated content. We built a system dynamics model and analyzed the effects of competition and correction in combating fake news. We show that correction of misinformation and competition in fake news need a high percentage of participation to be effective enough to deal with fake news. / Doctor of Philosophy / The prevalence of social media has changed information diffusion in several ways. This change has given rise to a variety of opportunities and challenges. We discuss instances of these in this dissertation in four main essays. In the first essay, we study online social and political campaigns. Considering that the main goal of campaign managers is to gain the highest reach and number of signatures, we generate a model to show the effects of sending reminders after the initial announcement, and of their schedule, on the final total number of signatures. We found that the best policy for online petition success is sending reminders when people are increasingly signing it rather than when people lose interest in it. In the second essay, we investigated how people build trust or mistrust in scientific information in emergency cases. We used public tweets about two candidate medicines to prevent and treat COVID-19 and analyzed them. Our results suggest that people trust and retweet information based on emotions and judgments more than information containing facts. We evaluated science communication during this emergency by further investigating the same dataset in the third essay. We clustered all the tweets based on the words they used into seven different groups and labeled each of them. Then, we focused on three groups: medical, research and study, and political.
Our analysis suggests that although the subject is a health-related scientific one, the number of tweets in the political group is greater than in the other clusters. In the fourth essay, we analyzed fake news diffusion through social media and the effects of correction and competition on it. In this context, correction means a reaction to misinformation that states its falsity or provides counter-facts based on truth. We created a model and simulated the competition, considering novelty as one influential factor in sharing. The results of this study reveal that active participation in correction and competition is needed to combat fake news effectively.
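As an illustration of the polarity/subjectivity analysis described above, the sketch below scores two made-up tweets with TextBlob. The library choice and the example texts are assumptions for demonstration only; they are not the dissertation's actual pipeline or data.

```python
# Illustrative sketch of tweet polarity/subjectivity scoring. TextBlob is one
# common library for this; the tooling and the example tweets below are
# assumptions, not taken from the study.
from textblob import TextBlob

tweets = [
    "Hydroxychloroquine is a miracle cure, everyone should take it!",
    "A randomized trial found no benefit of hydroxychloroquine for COVID-19.",
]

for text in tweets:
    sentiment = TextBlob(text).sentiment
    # polarity in [-1, 1] (negative to positive), subjectivity in [0, 1] (factual to opinionated)
    print(f"polarity={sentiment.polarity:+.2f}  subjectivity={sentiment.subjectivity:.2f}  | {text}")
```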
123

The Impact of Sleep Disorders on Driving Safety - Findings from the SHRP2 Naturalistic Driving Study

Liu, Shuyuan 15 June 2017 (has links)
This study is the first examination of the association between seven types of sleep disorders and driving risk using large-scale naturalistic driving study data involving more than 3,400 participants. Regression analyses revealed that females with restless legs syndrome or sleep apnea and drivers with insomnia, shift work sleep disorder, or periodic limb movement disorder are associated with significantly higher driving risk than other drivers without those conditions. Furthermore, despite a small number of observations, there is a strong indication of increased risk for narcoleptic drivers. The findings confirmed results from simulator and epidemiological studies that driving risk increases amongst people with certain types of sleep disorders. However, this study did not yield evidence in naturalistic driving settings to confirm the significantly increased driving risk associated with migraine reported in prior research. The inconsistency may be an indication that the significant decline in cognitive performance among drivers with sleep disorders observed in laboratory settings may not necessarily translate to an increase in actual driving risk. Further research is necessary to define how to incentivize drivers with specific sleep disorders to balance road safety and personal mobility. / Master of Science / This study is the first examination of the association between seven types of sleep disorders and driving risk using large-scale naturalistic driving study data involving more than 3,400 participants. The study identified seven sleep disorders - narcolepsy, sleep apnea, insomnia, shift work sleep disorder, restless legs syndrome, periodic limb movement disorder, and migraine - among the participants and revealed that females with restless legs syndrome or sleep apnea and drivers with insomnia, shift work sleep disorder, or periodic limb movement disorder are associated with significantly higher driving risk than other drivers without those conditions. Furthermore, despite a small number of observations, there is a strong indication of increased risk for narcoleptic drivers. The findings confirmed most results from previous simulator and epidemiological studies that driving risk increases amongst people with certain types of sleep disorders, except for those with migraines: there is no evidence showing increased driving risk associated with migraine. The inconsistency may be an indication that the significant decline in cognitive performance among drivers with sleep disorders observed in laboratory settings may not necessarily translate to an increase in actual driving risk. The public and private sectors can use the results to target their investments in supporting high-risk individuals. Physicians now have more representative data on the level of risk in real-world driving and thus are better able to practice evidence-based medicine when advising their patients with sleep disorders regarding driving safety and personal mobility.
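The sketch below illustrates the type of regression analysis the abstract describes, using synthetic data. The variables, effect sizes, and outcome definition are invented for illustration and are not the SHRP2 data or results.

```python
# Minimal sketch of a logistic regression relating sleep-disorder indicators
# to a crash/near-crash outcome. All variable names and effect sizes here are
# synthetic placeholders, not the SHRP2 findings.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 3400
df = pd.DataFrame({
    "insomnia": rng.integers(0, 2, n),
    "sleep_apnea": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
})
# Synthetic outcome with assumed elevated risk for insomnia and sleep apnea
logit = -2.0 + 0.6 * df["insomnia"] + 0.3 * df["sleep_apnea"] + 0.2 * df["female"]
df["crash"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = sm.Logit(df["crash"], sm.add_constant(df[["insomnia", "sleep_apnea", "female"]])).fit(disp=0)
print(np.exp(model.params))  # odds ratios for each predictor
```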
124

Advanced Process Design and Modeling Methods for Sustainable and Energy Efficient Processes

McNeeley, Adam M. 06 January 2025 (has links)
Chemical engineering, as a discipline, uses knowledge of chemistry, thermodynamics, and transport to process and refine resources on a global scale. The chemical processing industry has an enormous impact on global energy consumption and contributes to climate change. Chemical engineers play a major role in transitioning the chemical industry away from fossil fuels and in developing more sustainable and efficient methods to produce commodities. To achieve this goal, new chemical and processing technologies must be developed. It is critical in these early stages of development to identify chemical and processing pathways that are both practical and economically competitive with existing technologies. With the goal of increasing the speed of developing and implementing new chemical and processing technologies, screening and early-stage evaluation are essential to guiding research towards the most promising new processes and chemical pathways. This work focuses on the investigation of new chemical processing technologies that have received academic attention but have not been evaluated in the context of practical implementation, process design, or energy consumption. We investigate the background of these new technologies and compare them to their conventional counterparts. We present chemical and operational insights gained from industrial patents to develop feasible process designs that inform operation and demonstrate the drastic improvements possible with established heat integration and process intensification techniques. One technology we investigate is aromatics separation from petroleum feedstocks using new ionic liquid (IL) solvents. ILs are widely proposed in the literature as replacements for conventional organic solvents, their main novelty being non-volatility. A practically limitless number of ILs with different properties can be synthesized, introducing the potential to develop IL solvents tailored to specific applications. We investigate the potential of ILs for aromatic extraction by first developing a methodology to model the process and capture molecular interactions between the solvent and typical hydrocarbons. We then develop an IL-specific process design that overcomes the challenges related to the target feedstock. We finally determine the ideal IL solvent properties for the target application investigated. We simulate and optimize designs considering 16 different ILs and use the data to correlate solvent properties to key process variables and total process energy demand. We demonstrate that 11 of the 16 ILs require less energy than the conventional solvent, with the best-performing IL reducing energy demand by 43%. Another technology we investigate is chemical recycling of poly(ethylene terephthalate) (PET), commonly used in bottles, textiles, and packaging. Chemical recycling converts waste PET into monomers that can be reprocessed into PET polymer. The monomer products are easier to purify, and chemical recycling expands the scope of recyclable waste material. There are three PET chemical recycling pathways considered by industry and academia: glycolysis, methanolysis, and hydrolysis. We investigate the fundamental differences between these chemical pathways and highlight how differences in the physical and chemical properties of reactants and products lead to processing differences. We use a combination of industrial literature review and design knowledge to develop the first complete process configurations for each depolymerization pathway.
We demonstrate heat integration and process intensifications that drastically reduce energy demand. We use the combination of process design and literature to compare the designs and discuss uncertainties, advantages, and disadvantages. Heat-integrated continuous PET chemical recycling processes can be expected to consume between 6,000 and 10,000 kJ/kg PET regardless of the depolymerization route. Continuing the theme of chemical recycling of polymers, we consider nylon 6, the most widely produced polyamide, used for electronics, automotive parts, and textiles. Nylon 6 polymer is readily converted to its monomer, caprolactam, with or without the use of water as a solvent. While the recycling of post-consumer nylon 6 waste has been limited, the recovery and recycling of nylon 6 scrap and oligomers is well known. We identify the three processing routes commonly used to produce caprolactam from nylon 6: liquid-phase hydrolysis, steam stripping, and solvent-free depolymerization. We identify decomposition reactions and use experimental data to develop a kinetic model for nylon 6 depolymerization. We incorporate the kinetic model into process models for the different processing routes and demonstrate novel process intensifications to drastically reduce energy demand. We compare and discuss potential applications for each process configuration processing different types of post-consumer waste. Concluding the topic of chemical recycling of polymers, we investigate nylon 66 depolymerization, which, despite chemical similarities to nylon 6, is hardly considered for chemical recycling. We provide an overview of the different chemical recycling pathways proposed in the literature, including acid hydrolysis, alkaline hydrolysis, and ammonolysis. We use experimental data to develop a novel activity-coefficient-based kinetic model for nylon 66 hydrolysis and add degradation reactions to present the first alkaline hydrolysis process design for nylon 66. We investigate different sections of the process and the sensitivity of operation to design assumptions, and provide a comparison to the similar PET alkaline hydrolysis process. We find that the nylon 66 alkaline hydrolysis process has favorable energy demand and deserves further evaluation for commercial implementation. Overall, this work has advanced aromatic extraction technology and the chemical recycling of step-growth polymers. We demonstrate broad and systematic methods of incorporating data from academic and industrial evaluations to produce practical and thermodynamically consistent process models. We use these models to describe the reactions, separations, and purifications of new technologies, to quantify energy demands, and to identify where operational or data uncertainties exist in order to focus future research. We use the defined process flows and separations to demonstrate process intensifications that drastically reduce process energy demand, by as much as 70%, which can alter conclusions about the favorability of certain process configurations. / Doctor of Philosophy / Chemical engineering plays a critical role in the global effort to transition from fossil fuels to renewable and sustainable resources. This includes improving the energy efficiency of existing chemical processes, improving processes to consume fewer raw materials, and developing new pathways to produce chemicals traditionally derived from fossil fuels. Academic chemical engineering research focuses on developing new chemicals and chemical processes to aid in this effort.
A vast number of new chemicals and processes are investigated in academia, but it is extremely rare that these advance beyond a conceptual or lab scale, which limits the contribution of the research towards solving the problems it aims to address. We draw on our expertise in process design and modeling, and on a general understanding of how technology advances from concept to implementation. We take new chemicals or reaction pathways and conceptualize practical designs or implementations of the technology at commercial scale. We use the development of the designs to rank and screen the favorability of new technologies against other new or conventional technologies, approximate their relative complexity and resource consumption, and identify important parts of the process where data is critical for continued development or a more accurate assessment of technological viability. In this way, we guide research for new technologies to increase the speed and likelihood of real-world implementation and impact. In this dissertation, we consider the application of a new type of solvent, claimed to be 'green', used to separate petroleum products, and recycling processes for plastics that convert the plastic to chemicals, which are purified and converted back to the original plastic. The results of our work demonstrate that the new solvents we investigated have properties that can reduce the energy demand of the process for which they are proposed by almost 50%, using a novel design concept we developed. Despite the potential of these solvents, we raise concerns about uncertainties related to their practical implementation that require resolution. For the chemical recycling of plastics, we demonstrate a disconnect between academic focus and industrial practice. We develop some of the first models for several waste plastic chemical recycling processes to demonstrate how the plastics are chemically converted and purified to be suitable for consumer use. We compare different methods to recycle specific types of plastic, providing insight into the advantages and disadvantages of each method, considering the applications for which they are most suitable, and indicating where further research is best applied. We demonstrate that these processes, using advanced processing techniques, can drastically reduce energy demand, in some cases by as much as 70%.
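As a toy illustration of the kinetic modeling referenced in this work, the sketch below integrates a single first-order Arrhenius rate for polymer-to-monomer conversion. The rate parameters, temperature, and first-order assumption are placeholders, not the fitted PET or nylon kinetics developed in the dissertation.

```python
# Toy depolymerization kinetics: one first-order Arrhenius rate integrated
# over time. All parameters below are assumed values for illustration only.
import numpy as np
from scipy.integrate import solve_ivp

A = 1.0e6          # assumed pre-exponential factor, 1/s
Ea = 80_000.0      # assumed activation energy, J/mol
R = 8.314          # gas constant, J/(mol K)
T = 473.15         # assumed reaction temperature, K (200 C)

def rate(t, x):
    # x[0] = fractional conversion of polymer to monomer
    k = A * np.exp(-Ea / (R * T))
    return [k * (1.0 - x[0])]

sol = solve_ivp(rate, (0.0, 3600.0), [0.0])
print(f"conversion after 1 h at {T - 273.15:.0f} C: {sol.y[0, -1]:.2%}")
```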
125

Data Integration Methodologies and Services for Evaluation and Forecasting of Epidemics

Deodhar, Suruchi 31 May 2016 (has links)
Most epidemiological systems described in the literature are built for the evaluation and analysis of specific diseases, such as influenza-like illness. The modeling environments that support these systems are implemented for specific diseases and epidemiological models. Hence, they are not reusable or extendable. This thesis focuses on the design and development of an integrated analytical environment with flexible data integration methodologies and multi-level web services for evaluation and forecasting of various epidemics in different regions of the world. The environment supports analysis of epidemics based on any combination of disease, surveillance sources, epidemiological models, geographic regions, and demographic factors. The environment also supports evaluation and forecasting of epidemics when various policy-level and behavioral interventions that may inhibit the spread of an epidemic are applied. First, we describe data integration methodologies and schema design for flexible experiment design, storage, and query retrieval mechanisms related to large-scale epidemic data. We describe novel techniques for data transformation, optimization, pre-computation, and automation that enable the flexibility, extendibility, and efficiency required in different categories of query processing. Second, we describe the design and engineering of adaptable middleware platforms based on service-oriented paradigms for interactive workflow, communication, and decoupled integration. This supports large-scale multi-user applications with provision for online analysis of interventions as well as analytical processing of forecast computations. Using a service-oriented architecture, we have provided a platform-as-a-service representation for evaluation and forecasting of epidemics. We demonstrate the applicability of our integrated environment through the development of two applications, DISIMS and EpiCaster. DISIMS is an interactive web-based system for evaluating the effects of dynamic intervention strategies on epidemic propagation. EpiCaster is a situation assessment and forecasting tool for projecting the state of evolving epidemics such as flu and Ebola in different regions of the world. We discuss how our platform uses existing technologies to solve a novel problem in epidemiology, and provides a unique solution on which different applications can be built for analyzing epidemic containment strategies. / Ph. D.
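To give a flavor of the multi-level web services mentioned above, the sketch below exposes a hypothetical forecast-retrieval endpoint with Flask. The route, parameters, and in-memory data are invented for illustration and do not reflect the actual DISIMS or EpiCaster APIs.

```python
# Hypothetical, minimal forecast-retrieval service in the spirit of a
# platform-as-a-service for epidemics. Routes and data are placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the integrated epidemic data store described in the text
FORECASTS = {
    ("flu", "US-VA"): [120, 135, 150],   # hypothetical weekly case forecasts
    ("ebola", "LR"): [4, 3, 2],
}

@app.route("/forecast")
def forecast():
    disease = request.args.get("disease", "flu")
    region = request.args.get("region", "US-VA")
    values = FORECASTS.get((disease, region))
    if values is None:
        return jsonify(error="no forecast available"), 404
    return jsonify(disease=disease, region=region, weekly_forecast=values)

if __name__ == "__main__":
    app.run(port=5000)
```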
126

Theoretical framework and case studies for improving the optimization of the branch network of a banking company in Metropolitan Lima

Briones Gallegos, Fernando David 15 June 2021 (has links)
This research is motivated by the major digital transformation process that banks are facing, which implies a new channel strategy and educating their customers to use more digital applications. This is key if these organizations wish to survive in the medium term, since new competitors are now entering the market. The objective of the research is to identify the theoretical sources that help propose the best solution to the problem identified during a diagnosis of the processes at Banco ABC: improving the physical-channel optimization process using marketing analytics and data mining. As theoretical foundations, it draws on machine learning clustering algorithms related to k-means models and multivariate regression. The procedure consists of researching, across different academic sources, process diagnosis tools and tools for the improvement proposal, such as marketing analytics and data mining concepts, or algorithms such as regression and clustering. Finally, three cases from different industries that pose problems similar to the one to be addressed are analyzed in order to compare methodologies to follow. As results, a complete list of solid theoretical framework concepts was consolidated to help support the proposed solution; in addition, the three cases showed that there is a clear procedure for approaching a clustering problem. The main conclusion is that today there is abundant information on these topics, and practical cases like the ones addressed here, to support any marketing analytics proposal for a specific problem. Readers are advised to have prior theoretical knowledge of applied statistics and of simpler algorithms such as linear regression so that the theory covered can be easily understood when searching for this type of information.
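As a small illustration of the k-means clustering approach the abstract refers to, the sketch below segments a set of synthetic bank branches by a few assumed features. The features, data, and number of clusters are illustrative assumptions, not results from the Banco ABC case.

```python
# Illustrative k-means segmentation of bank branches. The features and data
# are invented for this sketch; they are not the bank's actual variables.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_branches = 120
X = np.column_stack([
    rng.normal(50_000, 15_000, n_branches),  # monthly transactions
    rng.uniform(0.1, 0.9, n_branches),       # share of clients using digital channels
    rng.normal(12, 4, n_branches),           # average queue time, minutes
])

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
for label in range(4):
    print(f"cluster {label}: {np.sum(kmeans.labels_ == label)} branches")
```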
127

WiSDM: a platform for crowd-sourced data acquisition, analytics, and synthetic data generation

Choudhury, Ananya 15 August 2016 (has links)
Human behavior is a key factor influencing the spread of infectious diseases. Individuals adapt their daily routine and typical behavior during the course of an epidemic -- the adaptation is based on their perception of the risk of contracting the disease and its impact. As a result, it is desirable to collect behavioral data before and during a disease outbreak. Such data can help in creating better computer models that can, in turn, be used by epidemiologists and policy makers to better plan for and respond to infectious disease outbreaks. However, traditional data collection methods are not well suited to support the task of acquiring human-behavior-related information, especially as it pertains to epidemic planning and response. Internet-based methods are an attractive complementary mechanism for collecting behavioral information. Systems such as Amazon Mechanical Turk (MTurk) and online survey tools provide simple ways to collect such information. This thesis explores new methods for information acquisition, especially of behavioral information, that leverage this recent technology. Here, we present the design and implementation of a crowd-sourced surveillance data acquisition system -- WiSDM. WiSDM is a web-based application and can be used by anyone with access to the Internet and a browser. Furthermore, it is designed to leverage online survey tools and MTurk; WiSDM can be embedded within MTurk in an iFrame. WiSDM has a number of novel features, including: (i) the ability to support a model-based abductive reasoning loop, a flexible and adaptive information acquisition scheme driven by causal models of epidemic processes; (ii) question routing, an important feature to increase data acquisition efficacy and reduce survey fatigue; and (iii) integrated surveys, interactive surveys that provide additional information on the survey topic and improve user motivation. We evaluate the framework's performance using Apache JMeter and present our results. We also discuss three extensions of WiSDM: the API Adapter, the Synthetic Data Generator, and WiSDM Analytics. The API Adapter is an ETL extension of WiSDM that enables extracting data from disparate data sources and loading it into the WiSDM database. The Synthetic Data Generator allows epidemiologists to build synthetic survey data using NDSSL's Synthetic Population as agents. WiSDM Analytics empowers users to perform analysis on the data by writing simple Python code using Versa APIs. We also propose a data model that is conducive to survey data analysis. / Master of Science
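The sketch below illustrates the idea behind the question-routing feature described above: follow-up questions are chosen based on answers already given, so respondents see only relevant items. The questions and routing rules are hypothetical and are not WiSDM's actual survey logic.

```python
# Toy question-routing sketch: select follow-up survey questions based on
# earlier answers. The questions and rules are invented for illustration.
def route_questions(answers):
    """Return the next survey questions given answers collected so far."""
    questions = []
    if answers.get("felt_ill_this_week") == "yes":
        questions.append("Which symptoms did you experience?")
        questions.append("Did you change your daily routine because of illness?")
    else:
        questions.append("Did you avoid public gatherings this week?")
    if answers.get("household_size", 0) > 1:
        questions.append("Was anyone else in your household ill?")
    return questions

# Example: a respondent who reported illness in a two-person household
print(route_questions({"felt_ill_this_week": "yes", "household_size": 2}))
```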
128

Exploring the Landscape of Big Data Analytics Through Domain-Aware Algorithm Design

Dash, Sajal 20 August 2020 (has links)
Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. High volume and velocity of the data warrant a large amount of storage, memory, and compute power, while a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. In this thesis, we present our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric and domain-specific properties of high-dimensional data for succinct representation, which addresses the volume property; (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property; and (3) leverage incremental arrival of data through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We present Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool, to demonstrate the application of the first guideline. It combines algorithmic concepts extended from stochastic force-based multi-dimensional scaling (SF-MDS) and Glimmer. Claret computes approximate weighted Euclidean distances by combining a novel data mapping called stretching with the Johnson-Lindenstrauss lemma to reduce the complexity of WMDS from O(f(n)d) to O(f(n) log d). In demonstrating the second guideline, we map the problem of identifying multi-hit combinations of genetic mutations responsible for cancers to the weighted set cover (WSC) problem by leveraging the semantics of cancer genomic data obtained from cancer biology. Solving the mapped WSC with an approximate algorithm, we identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples. To identify three- and four-hit combinations, which require orders of magnitude more computational power, we scaled out the WSC algorithm on a hundred nodes of the Summit supercomputer. In demonstrating the third guideline, we developed a tool, iBLAST, to perform incremental sequence similarity search. Developing new statistics to combine search results over time makes incremental analysis feasible. iBLAST performs (1+δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We also explored various approaches to mitigate catastrophic forgetting in incremental training of deep learning models. / Doctor of Philosophy / Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. Here, volume represents the data's size, variety represents the various sources and formats of the data, and velocity represents the data arrival rate. High volume and velocity of the data warrant a large amount of storage, memory, and computational power.
In contrast, a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. This thesis presents our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric (pair-wise distance and distribution-related) and domain-specific properties of high-dimensional data for succinct representation, which addresses the volume property; (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property; and (3) leverage incremental data arrival through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We demonstrate the application of the first guideline through the design and development of Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool that can reduce the dimension of high-dimensional data points. In demonstrating the second guideline, we identify combinations of cancer-causing gene mutations by mapping the problem to a well-known computational problem, the weighted set cover (WSC) problem. We scaled out the WSC algorithm on a hundred nodes of the Summit supercomputer to solve the problem in less than two hours instead of an estimated hundred years. In demonstrating the third guideline, we developed a tool, iBLAST, to perform incremental sequence similarity search. This analysis was made possible by developing new statistics to combine search results over time. We also explored various approaches to mitigate the catastrophic forgetting of deep learning models, where a model forgets how to perform machine learning tasks efficiently on older data in a streaming setting.
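As a concrete illustration of the weighted set cover mapping described above, the sketch below runs the standard greedy approximation on a toy instance. The gene names, sample sets, and weights are invented for illustration; the dissertation's actual algorithm and data are far larger and more involved.

```python
# Greedy approximation for weighted set cover on a toy instance. The
# "genes", "samples", and weights below are placeholders for illustration.
def greedy_weighted_set_cover(universe, sets, weights):
    """Pick sets covering `universe`, greedily by weight per newly covered element."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # best ratio: weight divided by number of still-uncovered elements it covers
        best = min(
            (name for name in sets if sets[name] & uncovered),
            key=lambda name: weights[name] / len(sets[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

samples = {1, 2, 3, 4, 5, 6}                  # e.g. tumor samples to "cover"
gene_sets = {                                  # samples carrying a mutation in each gene
    "TP53": {1, 2, 3}, "KRAS": {3, 4}, "EGFR": {4, 5, 6}, "BRCA1": {1, 6},
}
weights = {"TP53": 1.0, "KRAS": 0.5, "EGFR": 1.0, "BRCA1": 0.8}
print(greedy_weighted_set_cover(samples, gene_sets, weights))
```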
129

Building and Evaluating a Learning Environment for Data Structures and Algorithms Courses

Fouh Mbindi, Eric Noel 29 April 2015 (has links)
Learning technologies in computer science education have been most closely associated with the teaching of programming, including automatic assessment of programming exercises. However, when it comes to teaching computer science content and concepts, learning technologies have not been heavily used. Perhaps the best-known application today is Algorithm Visualization (AV), of which there are hundreds of examples. AVs tend to focus on presenting the procedural aspects of how a given algorithm works, rather than more conceptual content. There are also new electronic textbooks (eTextbooks) that incorporate the ability to edit and execute program examples. For many traditional courses, a longstanding problem is the lack of sufficient practice exercises with feedback to the student. Automated assessment provides a way to increase the number of exercises on which students can receive feedback. Interactive eTextbooks have the potential to make it easy for instructors to introduce both visualizations and practice exercises into their courses. OpenDSA is an interactive eTextbook for data structures and algorithms (DSA) courses. It integrates tutorial content with AVs and automatically assessed interactive exercises. Since Spring 2013, OpenDSA has been regularly used to teach a fundamental data structures and algorithms course (CS2), and also a more advanced data structures, algorithms, and analysis course (CS3) at various institutions of higher education. In this thesis, I report on findings from early adoption of the OpenDSA system. I describe how OpenDSA's design addresses obstacles in the use of AV systems. I identify a wide variety of uses for OpenDSA in the classroom. I found that instructors used OpenDSA exercises as graded assignments in all the courses where it was used. Some instructors assigned an OpenDSA assignment before lectures and started spending more time teaching higher-level concepts. OpenDSA also supported some instructors in implementing a "flipped classroom". I found that students are enthusiastic about OpenDSA and voluntarily used the AVs embedded within OpenDSA. Students found OpenDSA beneficial and expressed a preference for a class format that included using OpenDSA as part of the assigned graded work. The relationship between OpenDSA and students' performance was inconclusive, but I found that students with higher grades tend to complete more exercises. / Ph. D.
130

Forecasting Large-scale Time Series Data

Hartmann, Claudio 03 December 2018 (has links)
The forecasting of time series data is an integral component of management, planning, and decision making in many domains. The prediction of electricity demand and supply in the energy domain or of sales figures in market research are just two of the many application scenarios that require thorough predictions. Many of these domains have in common that they are influenced by the Big Data trend, which also affects time series forecasting. Data sets consist of thousands of temporally fine-grained time series and have to be predicted in reasonable time. The time series may suffer from noisy behavior and missing values, which makes modeling them especially hard; nonetheless, accurate predictions are required. Furthermore, data sets from different domains exhibit various characteristics, so forecast techniques have to be flexible and adaptable to these characteristics. Long-established forecast techniques like ARIMA and Exponential Smoothing do not fulfill these new requirements. Most of the traditional models represent only one individual time series. This makes the prediction of thousands of time series very time consuming, as an equally large number of models has to be created. Furthermore, these models do not incorporate additional data sources and are therefore not capable of compensating for missing measurements or the noisy behavior of individual time series. In this thesis, we introduce CSAR (Cross-Sectional AutoRegression Model), a new forecast technique designed to address the new requirements of forecasting large-scale time series data. It is based on the novel concept of cross-sectional forecasting, which assumes that time series from the same domain follow similar behavior, and it represents many time series with one common model. CSAR combines this new approach with the modeling concept of ARIMA to make the model adaptable to the various properties of data sets from different domains. Furthermore, we introduce auto.CSAR, which helps to configure the model and to choose the right model components for a specific data set and forecast task. With CSAR, we present a new forecast technique that is suited to the prediction of large-scale time series data. By representing many time series with one model, large data sets can be predicted in a short time. Furthermore, using data from many time series in one model helps to compensate for missing values and the noisy behavior of individual series. An evaluation on three real-world data sets shows that CSAR outperforms long-established forecast techniques in accuracy and execution time. Finally, with auto.CSAR, we provide a way to apply CSAR to new data sets without requiring the user to have extensive knowledge of our new forecast technique and its configuration.
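The sketch below illustrates the cross-sectional idea behind CSAR: lagged windows from many related series are pooled and a single shared autoregressive model is fit, instead of one model per series. This is a simplified illustration under assumed AR(3) dynamics, not the CSAR implementation itself.

```python
# Pooled ("cross-sectional") autoregression sketch: many series, one shared
# AR model fit by least squares. Synthetic data with known coefficients.
import numpy as np

rng = np.random.default_rng(1)
n_series, length, p = 200, 300, 3            # 200 series, 300 points each, AR order 3

true_coef = np.array([0.5, 0.2, -0.1])       # coefficients on lags 1, 2, 3
series = np.zeros((n_series, length))
for t in range(p, length):
    # series[:, t-p:t] holds [x_{t-3}, x_{t-2}, x_{t-1}] for every series
    series[:, t] = series[:, t - p:t] @ true_coef[::-1] + rng.normal(0, 0.5, n_series)

# Build one pooled design matrix across all series
X = np.concatenate([np.stack([s[t - p:t] for t in range(p, length)]) for s in series])
y = np.concatenate([s[p:] for s in series])

shared_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("recovered coefficients (lag 3, lag 2, lag 1):", np.round(shared_coef, 3))
```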
