Global ETD Search

581	Anomaly detection methods for detecting cyber attacks in industrial control systems Liu, Jessamyn. January 2020 (has links) Thesis: S.M., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 119-123). / Industrial control systems (ICS) are pervasive in modern society and increasingly under threat of cyber attack. Due to the critical nature of these systems, which govern everything from power and wastewater plants to refineries and manufacturing, a successful ICS cyber attack can result in serious physical consequences. This thesis evaluates multiple anomaly detection methods to quickly and accurately detect ICS cyber attacks. Two fundamental challenges in developing ICS cyber attack detection methods are the lack of historical attack data and the ability of attackers to make their malicious activity appear normal. The goal of this thesis is to develop methods which generalize well to anomalies that are not included in the training data and to increase the sensitivity of detection methods without increasing the false alarm rate. The thesis presents and analyzes a baseline detection method, the multivariate Shewhart control chart, and four extensions to the Shewhart chart which use machine learning or optimization methods to improve detection performance. Two of these methods, stationary subspace analysis and maximized ratio divergence analysis, are based on dimensionality reduction techniques, and an additional model-based method is implemented using residuals from LASSO regression models. The thesis also develops an ensemble method which uses an optimization formulation to combine the output of multiple models in a way that minimizes detection delay. 
When evaluated on 380 samples from the Kaspersky Tennessee Eastman process dataset, a simulated chemical process that includes disruptions from cyber attacks, the ensemble method reduced detection delay on attack data by 12% (55 minutes) on average when compared to the baseline method and was 9% (42 minutes) faster on average than the method which performed best on training data. / by Jessamyn Liu. / S.M. / S.M. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
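The baseline named in the abstract above, the multivariate Shewhart control chart, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: it assumes the chart statistic is the usual Hotelling T² distance from an attack-free reference mean, and the function names (`fit_reference`, `t_squared`, `alarms`) and the control limit passed by the caller are hypothetical choices made for this example.

```python
from statistics import fmean


def fit_reference(train):
    """Estimate the mean vector and sample covariance from attack-free rows."""
    n, p = len(train), len(train[0])
    mean = [fmean(row[j] for row in train) for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in train) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def t_squared(x, mean, cov):
    """Hotelling T^2 statistic: (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * si for di, si in zip(d, solve(cov, d)))


def alarms(stream, mean, cov, limit):
    """Flag every observation whose T^2 exceeds the control limit.

    The limit is an assumption supplied by the caller, for example a
    chi-square quantile or an empirical quantile of training T^2 values.
    """
    return [t_squared(x, mean, cov) > limit for x in stream]
```

Raising the limit lowers the false alarm rate but delays detection, which is the tradeoff the abstract's extensions aim to improve.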
582	Dynamic node clustering in hierarchical optical data center network architectures Dimaki, Georgia. January 2020 (has links) Thesis: S.M., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 127-134). / During the past decade an increasing trend in the Data Center Network's traffic has been observed. This traffic is characterized mostly by many small bursty flows (mice) that last for less than a few milliseconds, as well as a few heavier, more persistent (elephant) flows between a certain number of nodes. As a result, many relatively underutilized network links momentarily become hotspots with an increased chance of packet loss. A potential solution could be given by Reconfigurable Optical Data Centers, due to higher traffic aggregation links and topology adaptation capabilities. An example is a novel two level hierarchical WDM-Based scalable Data Center Network architecture, RHODA, which is based on the interconnection of high speed equal sized clusters of Racks. We study the traffic based dynamic cluster membership reconfiguration of the Racks. The main goal is to maintain a near optimal network operation with respect to minimization of the inter cluster traffic, while emphasising better link utilization and network scalability. We present four algorithms, two deterministic greedy and two stochastic iterative, and discuss the tradeoffs of their use. Our results draw two main conclusions: 1) Stochastic iterative algorithms are more suitable for dynamic traffic based reconfiguration; 2) Fast algorithmic deployments come at a price of reduced optimality. / by Georgia Dimaki. / S.M. / S.M. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
583	Structure, dynamics, and inference in networks Chodrow, Philip S.(Philip Samuel) January 2020 (has links) Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from student-submitted PDF of thesis. / Includes bibliographical references (pages 187-203). / Networks offer a unified, conceptual formalism for reasoning about complex, relational systems. While pioneering work in network science focused primarily on the ability of "universal" models to explain the features of observed systems, contemporary research increasingly focuses on challenges and opportunities for data analysis in complex systems. In this thesis we study four problems, each of which is informed by the need for theory-informed modeling in network data science. The first chapter is a study of binary-state adaptive voter models (AVMs). AVMs model the emergence of global opinion-based network polarization from localized decision-making, doing so through a simple coupling of node and edge states. This coupling yields rich behavior, including phase transitions and low-dimensional quasistable manifolds. However, the coupling also makes these models extremely difficult to analyze. / Exploiting a novel asymmetry in the local dynamics, we provide low-dimensional approximations of unprecedented accuracy for one AVM variant, and of competitive accuracy for another. In the second chapter, we continue our focus on fragmentation in social systems with a study of spatial segregation. While the question of how to measure and quantify segregation has received extensive treatment in the sociological literature, this treatment tends to be mathematically disjoint. This results in scholars often re-proving the same results for special cases of measures, and grappling with incomparable methods for incorporating the role of space in their analyses. We provide contributions to address each of these issues. 
With respect to the first, we unify a large body of extant segregation measures through the calculus of Bregman divergences, showing that the most popular measures are instantiations of generalized mutual informations. / We then formulate a microscopic measure of spatial structure - the local information density - and prove a novel information-geometric result in order to measure it on real data in the common case in which the data is embedded in a planar network. Using these tools, we are then able to formulate and evaluate several network-based regionalization algorithms for multiscale spatial analysis. We then take up two questions in null random graph modeling. The first of these develops a family of null random models for hypergraphs, the natural mathematical representation of polyadic networks in which multiple entities interact simultaneously. We formulate two distributions over spaces of hypergraphs subject to fixed node degree and edge dimension sequences, and provide Markov Chain Monte Carlo algorithms for sampling from them. We then conduct a sequence of experiments to highlight the role of hypergraph configuration models in the data science of polyadic networks. / We show that (a) the use of hypergraph nulls can lead to directionally different hypothesis-testing than the use of traditional nulls and that (b) polyadic nulls support richer and more complex measurements of graph structure. We close with a formulation of a novel measure of correlation in hypergraphs, as well as an asymptotic formula for estimating its expectations under one of our configuration models. In the final chapter, we study the expected adjacency matrix of a uniformly random multigraph with a fixed degree sequence. This matrix is an input into several common network analyses, including community-detection and mean-field theories of spreading properties on contact networks. The actual structure of this matrix, however, is not well understood. 
The main issues are (a) the combinatorial complexity of the space on which this random graph is defined and (b) an erroneous folk-theorem among network scientists which stems from confusion with related models. / By studying the dynamics of a Markov chain sampler, we prove a sequence of approximations that allow us to estimate the expected adjacency matrix - and other elementwise moments - using a fast numerical scheme with qualified uniqueness guarantees. We illustrate using a series of experiments on primary and secondary school contact networks, showing order-of-magnitude improvements over extant methods. We conclude with a description of several directions of future work. / by Philip S. Chodrow. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
584	Real-Time Calibration of Large-Scale Traffic Simulators: Achieving Efficiency Through the Use of Analytical Mode Zhang, Kevin,Ph. D.Massachusetts Institute of Technology. January 2020 (has links) Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 197-203). / Stochastic traffic simulators are widely used in the transportation community to model real-world urban road networks in applications ranging from real-time congestion routing and control to traffic state prediction. Online calibration of these simulators plays a crucial role in achieving high accuracy in the replication and prediction of streaming traffic data (i.e., link flows, densities). In order to be relevant in a real-time context, the problem must also be solved within a strict computational budget. The primary goal of this thesis is to develop an algorithm that adequately solves the online calibration problem for high-dimensional cases and on large-scale networks. In the first half, a new online calibration algorithm is proposed that incorporates structural information from an analytical metamodel into a general-purpose extended Kalman filter framework. / The metamodel is built around a macroscopic network model that relates calibration parameters to field measurements in an analytical, computationally tractable, and differentiable way. Using the metamodel as an analytical approximation of the traffic simulator improves the computational efficiency of the linearization step of the extended Kalman filter, making it suitable for use in large-scale calibration problems. The embedded analytical network model provides a secondary benefit of making the algorithm more robust to simulator stochasticity compared with traditional black-box calibration methods. 
In the second half, the proposed algorithm is adapted for the case study of online calibration of travel demand as defined by a set of time-dependent origin-destination matrices. First, an analytical network model relating origin-destination demand to link measurements is formulated and validated on the Singapore expressway network. / Next, the proposed algorithm is validated on a synthetic toy network, where its flexibility in calibrating to multiple sources of field data is demonstrated. The empirical results show marked improvement over the baseline of offline calibration and comparable performance to multiple benchmark algorithms from the literature. Finally, the proposed algorithm is applied to a problem of dimension 4,050 on the Singapore expressway network to evaluate its feasibility for large-scale problems. Empirical results confirm the real-time performance of the algorithm in a real-world setting, with strong accuracy in the estimation of sensor counts. / by Kevin Zhang. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
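The role the abstract above gives to the analytical metamodel, supplying a cheap derivative for the linearization step of the extended Kalman filter, can be illustrated with a scalar measurement update. This is only a sketch under strong simplifying assumptions: one calibration parameter, one measurement, and a hypothetical differentiable function `h` with derivative `h_prime` standing in for the metamodel. None of these names come from the thesis.

```python
def ekf_update(x, p, z, h, h_prime, r):
    """One scalar extended Kalman filter measurement update.

    x, p    : prior estimate of the calibration parameter and its variance
    z       : observed field measurement (for example, a link flow)
    h       : analytical measurement model, a stand-in for the metamodel
    h_prime : derivative of h, evaluated analytically instead of by
              repeated simulation runs
    r       : measurement noise variance
    """
    H = h_prime(x)                  # linearize the model at the prior estimate
    k = p * H / (H * p * H + r)     # Kalman gain
    x_new = x + k * (z - h(x))      # correct the estimate with the innovation
    p_new = (1.0 - k * H) * p       # shrink the variance accordingly
    return x_new, p_new
```

In a black-box setting the derivative would typically be approximated by finite differences over simulator runs; an analytical `h_prime` avoids those runs, which is the efficiency argument the abstract makes.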
585	Online and offline learning in operations Wang, Li,Ph D.Massachusetts Institute of Technology. January 2020 (has links) Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 213-219). / With the rapid advancement of information technology and accelerated development of data science, the importance of integrating data into decision-making has never been stronger. In this thesis, we propose data-driven algorithms to incorporate learning from data in three operations problems, concerning both online learning and offline learning settings. First, we study a single product pricing problem with demand censoring in an offline data-driven setting. In this problem, a retailer is given a finite level of inventory, and faces a random demand that is price sensitive in a linear fashion with unknown parameters and distribution. Any unsatisfied demand is lost and unobservable. The retailer's objective is to use offline censored demand data to find an optimal price, maximizing her expected revenue with finite inventories. / We characterize an exact condition for the identifiability of near-optimal algorithms, and propose a data-driven algorithm that guarantees near-optimality in the identifiable case and approaches best-achievable optimality gap in the unidentifiable case. Next, we study the classic multi-period joint pricing and inventory control problem in an offline data-driven setting. We assume the demand functions and noise distributions are unknown, and propose a data-driven approximation algorithm, which uses offline demand data to solve the joint pricing and inventory control problem. We establish a polynomial sample complexity bound, the number of data samples needed to guarantee a near-optimal profit. A simulation study suggests that the data-driven algorithm solves the dynamic program effectively. 
Finally, we study an online learning problem for product selection in urban warehouses managed by fast-delivery retailers. We distill the problem into a semi-bandit model with linear generalization. / There are n products, each with a feature vector of dimension d. In each of the T periods, a retailer selects K products to offer, where T is much greater than d or K. We propose an online learning algorithm that iteratively shrinks the upper confidence bounds within each period. Compared to the standard UCB algorithm, we prove the new algorithm reduces the most dominant regret term by a factor of d, and experiments on datasets from Alibaba Group suggest it lowers the total regret by at least 10%. / by Li Wang. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
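For orientation, the kind of standard UCB baseline the abstract above compares against can be sketched for the top-K selection setting. This is a simplified illustration, not the thesis's algorithm: it uses a UCB1-style index per product and ignores the product feature vectors (the "linear generalization"). The function names and the exploration constant are assumptions made for this example.

```python
import math


def select_top_k(counts, means, t, k):
    """Pick the k products with the highest UCB1-style index in period t."""
    def index(i):
        if counts[i] == 0:
            return math.inf  # offer every product at least once
        return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=index, reverse=True)[:k]


def update(counts, means, offered, rewards):
    """Semi-bandit feedback: one reward is observed for each offered product."""
    for i, r in zip(offered, rewards):
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # running average
```

Each period, the retailer would call `select_top_k`, observe one reward per offered product, and pass those to `update`. The confidence width shrinks only as a product is offered more often, so rarely offered products keep being explored.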
586	Improving farmers' and consumers' welfare in agricultural supply chains via data-driven analytics and modeling : from theory to practice Singhvi, Somya. January 2020 (has links) Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Page 236 blank. Cataloged from PDF version of thesis. / Includes bibliographical references (pages 223-235). / The upstream parts of the agricultural supply chain consist of millions of smallholder farmers who continue to suffer from extreme poverty. The first stream of research in this thesis focuses on online agri-platforms which have been launched to connect geographically isolated markets in many developing countries. This work is in close collaboration with the state government of Karnataka in India which launched the Unified Market Platform (UMP). Leveraging both public data and platform data, a difference-in-differences analysis in Chapter 2 suggests that the implementation of the UMP has significantly increased the modal price of certain commodities (3.5%-5.1%), while prices for other commodities have not changed. The analysis provides evidence that logistical challenges, bidding efficiency, market concentration, and the price discovery process are important factors explaining the variable impact of UMP on prices. / Based on the insights, Chapter 3 describes the design, analysis and field implementation of a new two-stage auction mechanism. From February to May 2019, commodities worth more than $6 million (USD) had been traded under the new auction. Our empirical analysis suggests that the implementation has yielded a significant 4.7% price increase with an impact on farmer profitability ranging 60%-158%, affecting over 10,000 farmers who traded in the treatment market. 
The second stream of research work in the thesis turns to consumer welfare and identifies effective policies to tackle structural challenges of food safety and food security that arise in traditional agricultural markets. In Chapter 4, we develop a new modeling framework to investigate how quality uncertainty, supply chain dispersion, and imperfect testing capabilities jointly engender suppliers' adulteration behavior. / The results highlight the limitations of only relying on end-product inspection to deter economically motivated adulteration (EMA) and advocate a more proactive approach that addresses fundamental structural problems in the supply chain. In Chapter 5, we analyze the issue of artificial shortage, the phenomenon that leads to food security risks where powerful traders strategically withhold inventory of essential commodities to create a price surge in the market. The behavioral game-theoretic models developed allow us to examine the effectiveness of common government interventions. The analysis demonstrates the disparate effects of different interventions on artificial shortage; while supply allocation schemes often mitigate shortage, cash subsidy can inadvertently aggravate shortage in the market. Further, using field data from onion markets of India, we structurally estimate that 10% of the total supply is being hoarded by the traders during the lean season. / by Somya Singhvi. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
587	Data-driven decision making in online and offline retail Singhvi, Divya. January 2020 (has links) Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (pages 228-238). / Retail operations have experienced a transformational change in the past decade with the advent and adoption of data-driven approaches to drive decision making. Granular data collection has enabled firms to make personalized decisions that improve customer experience and maintain long-term engagement. In this thesis we discuss important problems that retailers face in practice, before, while and after a product is introduced in the market. In Chapter 2, we consider the problem of estimating sales for a new product before retailers release the product to the customer. We introduce a joint clustering and regression method that jointly clusters existing products based on their features as well as their sales patterns while estimating their demand. Further, we use this information to predict demand for new products. Analytically, we show an out-of-sample prediction error bound. / Numerically, we perform an extensive study on real world data sets from Johnson & Johnson and a large fashion retailer and find that the proposed method outperforms state-of-the-art prediction methods and improves the WMAPE forecasting metric by 5%-15%. Even after the product is released in the market, a customer's decision of purchasing the product depends on the right recommendation personalized for her. In Chapter 3, we consider the problem of personalized product recommendations when customer preferences are unknown and the retailer risks losing customers because of irrelevant recommendations. We present empirical evidence of customer disengagement through real-world data. We formulate this problem as a user preference learning problem. 
We show that customer disengagement can cause almost all state-of-the-art learning algorithms to fail in this setting. / We propose modifying bandit learning strategies by constraining the action space upfront using an integer optimization model. We prove that this modification can keep significantly more customers engaged on the platform. Numerical experiments demonstrate that our algorithm can improve customer engagement with the platform by up to 80%. Another important decision a retailer needs to make for a new product is its pricing. In Chapter 4, we consider the dynamic pricing problem of a retailer who does not have any information on the underlying demand for the product. An important feature we incorporate is the fact that the retailer also seeks to reduce the amount of price experimentation. / We consider the pricing problem when demand is non-parametric and construct a pricing algorithm that uses piecewise linear approximations of the unknown demand function and establish when the proposed policy achieves a near-optimal rate of regret, Õ(√T), while making O(log log T) price changes. Our algorithm allows for a considerable reduction in price changes from the previously known O(log T) rate of price change guarantee found in the literature. Finally, once a purchase is made, a customer's decision to return to the same retailer depends on the product return policies and after-sales services of the retailer. As a result, in Chapter 5, we focus on the problem of reducing product returns. Closely working with one of India's largest online fashion retailers, we focus on identifying the effect of delivery gaps (total time that customers have to wait for the product they ordered to arrive) and customer promise dates on product returns. / We perform an extensive empirical analysis and run a large scale Randomized Control Trial (RCT) to estimate these effects. 
Based on the insights from this empirical analysis, we then develop an integer optimization model to optimize delivery speed targets. / by Divya Singhvi. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
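The WMAPE forecasting metric cited in the abstract above can be computed as in the sketch below. It assumes one common definition, total absolute error divided by total absolute actual sales. The abstract does not state which weighting the thesis uses, so this formula and the function name `wmape` are assumptions.

```python
def wmape(actual, forecast):
    """Weighted mean absolute percentage error.

    Assumed definition: sum(|actual - forecast|) / sum(|actual|).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total
```

Unlike a plain mean of per-item percentage errors, this form weights each item by its sales volume, so low-volume products with large relative errors do not dominate the score.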
588	Multi-objective Resource Constrained Parallel Machine Scheduling Model with Setups, Machine Eligibility Restrictions, Release and Due Dates with User Interaction January 2020 (has links) abstract: This dissertation explores the use of deterministic scheduling theory for the design and development of practical manufacturing scheduling strategies as alternatives to current scheduling methods, particularly those used to minimize completion times and increase system capacity utilization. The efficient scheduling of production systems can make the difference between a thriving and a failing enterprise, especially when expanding capacity is limited by the lead time or the high cost of acquiring additional manufacturing resources. A multi-objective optimization (MOO) resource constrained parallel machine scheduling model with setups, machine eligibility restrictions, release and due dates with user interaction is developed for the scheduling of complex manufacturing systems encountered in the semiconductor and plastic injection molding industries, among others. Two mathematical formulations using the time-indexed Integer Programming (IP) model and the Diversity Maximization Approach (DMA) were developed to solve resource constrained problems found in the semiconductor industry. A heuristic was developed to find fast feasible solutions to prime the IP models. The resulting models are applied in two different ways: constructing schedules for tactical decision making and constructing Pareto efficient schedules with user interaction for strategic decision making aiming to provide insight to decision makers on multiple competing objectives. Optimal solutions were found by the time-indexed IP model for 45 out of 45 scenarios in less than one hour for all the problem instance combinations where setups were not considered. 
Optimal solutions were found for 18 out of 45 scenarios in less than one hour for several combinations of problem instances with 10 and 25 jobs for the hybrid (IP and heuristic) model considering setups. Regarding the DMA MOO scheduling model, the complete efficient frontier (9 points) was found for a small size problem instance in 8 minutes, and a partial efficient frontier (29 points) was found for a medium sized problem instance in 183 hrs. / Dissertation/Thesis / Doctoral Dissertation Systems Engineering 2020 Engineering Operations research
589	Machine learning for problems with missing and uncertain data with applications to personalized medicine Pawlowski, Colin. January 2019 (has links) This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019 / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (pages 205-215). / When we try to apply statistical learning in real-world applications, we frequently encounter data which include missing and uncertain values. This thesis explores the problem of learning from missing and uncertain data with a focus on applications in personalized medicine. In the first chapter, we present a framework for classification when data is uncertain that is based upon robust optimization. We show that adding robustness in both the features and labels results in tractable optimization problems for three widely used classification methods: support vector machines, logistic regression, and decision trees. Through experiments on 75 benchmark data sets, we characterize the learning tasks for which adding robustness provides the most value. In the second chapter, we develop a family of methods for missing data imputation based upon predictive methods and formal optimization. / We present formulations for models based on K-nearest neighbors, support vector machines, and decision trees, and we develop an algorithm OptImpute to find high quality solutions which scales to large data sets. In experiments on 84 benchmark data sets, we show that OptImpute outperforms state-of-the-art methods in both imputation accuracy and performance on downstream tasks. In the third chapter, we develop MedImpute, an extension of OptImpute specialized for imputing missing values in multivariate panel data. 
This method is tailored for data sets that have multiple observations of the same individual at different points in time. In experiments on the Framingham Heart Study and Dana Farber Cancer Institute electronic health record data, we demonstrate that MedImpute improves the accuracy of models predicting 10-year risk of stroke and 60-day risk of mortality for late-stage cancer patients. / In the fourth chapter, we develop a method for tensor completion which leverages noisy side information available on the rows and/or columns of the tensor. We apply this method to the task of predicting anti-cancer drug response at particular dosages. We demonstrate significant gains in out-of-sample accuracy filling in missing values on two large-scale anticancer drug screening data sets with genomic side information. / by Colin Pawlowski. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.
590	Advances in data-driven models for transportation Ng, Yee Sian. January 2019 (has links) This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019 / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (pages 163-176). / With the rising popularity of ride-sharing and alternative modes of transportation, there has been a renewed interest in transit planning to improve service quality and stem declining ridership. However, it often takes months of manual planning for operators to redesign and reschedule services in response to changing needs. To this end, we provide four models of transportation planning that are based on data and driven by optimization. A key aspect is the ability to provide certificates of optimality, while being practical in generating high-quality solutions in a short amount of time. We provide approaches to combinatorial problems in transit planning that scale up to city-sized networks. In transit network design, current tractable approaches only consider edges that exist, resulting in proposals that are closely tethered to the original network. We allow new transit links to be proposed and account for commuters transferring between different services. In integrated transit scheduling, we provide a way for transit providers to synchronize the timing of services in multimodal networks while ensuring regularity in the timetables of the individual services. This is made possible by taking the characteristics of transit demand patterns into account when designing tractable formulations. We also advance the state of the art in demand models for transportation optimization. 
In emergency medical services, we provide data-driven formulations that outperform their probabilistic counterparts in ensuring coverage. This is achieved by replacing independence assumptions in probabilistic models and capturing the interactions of services in overlapping regions. In transit planning, we provide a unified framework that allows us to optimize frequencies and prices jointly in transit networks for minimizing total waiting time. / by Yee Sian Ng. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center Operations Research Center.