Global ETD Search

221	Big-Data Driven Optimization Methods with Applications to LTL Freight Routing Tamvada, Srinivas January 2020 We propose solution strategies for hard Mixed Integer Programming (MIP) problems, with a focus on distributed parallel MIP optimization. Although our proposals are inspired by the Less-than-truckload (LTL) freight routing problem, they are more generally applicable to hard MIPs from other domains. We start by developing an Integer Programming model for the LTL freight routing problem and present a novel heuristic for solving the model in a reasonable amount of time on large LTL networks. Next, we identify some adaptations to MIP branching strategies that improve scaling upon distribution when the LTL routing problem (or another hard MIP) is solved using parallel MIP optimization. Recognizing that our model represents a pseudo-Boolean optimization (PBO) problem, we leverage solution techniques used by PBO solvers to develop a CPLEX-based look-ahead solver for LTL routing and other PBO problems. Our focus once again is on achieving improved scaling upon distribution. We also analyze a technique for implementing subtree parallelism during distributed MIP optimization. We believe that our proposals represent a significant step towards solving big-data driven optimization problems (such as the LTL routing problem) more efficiently. / Thesis / Doctor of Philosophy (PhD) / Less-than-truckload (LTL) freight transportation is a vital part of Canada's economy, with revenues running into billions of dollars and a cascading impact on many other industries. LTL operators often have to deal with large volumes of shipments, unexpected changes in traffic conditions, and uncertainty in demand patterns. In an industry that already has low profit margins, it is therefore vitally important to make good routing decisions quickly.
The optimization of such LTL freight networks often results in complex big-data driven optimization problems. In addition to the challenge of finding optimal solutions for these problems, analysts often have to deal with the complexities of big-data driven inputs. In this thesis we develop several strategies for solving the LTL freight routing problem, including an exact model, novel heuristics, and techniques for solving the problem efficiently on a cluster of computers. Although the techniques we develop are inspired by LTL routing, they are more generally applicable to big-data driven optimization problems from other domains. Experiments conducted over the years in consultation with industry experts indicate that our proposals can significantly improve solution quality and reduce time to solution. Furthermore, our proposals open up interesting avenues for future research. Big-data driven optimization methods Less-than-truckload freight routing
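222	A drug repurposing study based on clinical big data for the protective role of vitamin D in olanzapine-induced dyslipidemia / Elucidation of the preventive effect of vitamin D on olanzapine-induced dyslipidemia based on clinical big data ZHOU, ZIJIAN 23 March 2023 Kyoto University / New system, doctorate by coursework / Doctor of Pharmaceutical Sciences / Kō No. 24551 / Yakkahaku No. 168 / 新制||薬科||18 (University Library) / Department of Pharmaceutical Sciences, Graduate School of Pharmaceutical Sciences, Kyoto University / (Chief examiner) Professor Shuji Kaneko, Professor Hiroshi Takeshima, Professor Motonari Uesugi / Falls under Article 4, Paragraph 1 of the Degree Regulations / DFAM Clinical big data Dyslipidemia Olanzapine Vitamin D Cholesterol biosynthesis 499.3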
223	The security of big data in fog-enabled IoT applications including blockchain: a survey Tariq, N., Asim, M., Al-Obeidat, F., Farooqi, M.Z., Baker, T., Hammoudeh, M., Ghafir, Ibrahim 24 January 2020 Yes / The proliferation of interconnected devices in critical industries, such as healthcare and the power grid, is changing the perception of what constitutes critical infrastructure. The rising interconnectedness of new critical industries is driven by the growing demand for seamless access to information as the world becomes more mobile and connected and as the Internet of Things (IoT) grows. Critical industries are essential to the foundation of today’s society, and interruption of service in any of these sectors can reverberate through other sectors and even around the globe. In today’s hyper-connected world, critical infrastructure is more vulnerable than ever to cyber threats, whether from state-sponsored actors, criminal groups, or individuals. As the number of interconnected devices increases, so does the number of potential access points for hackers to disrupt critical infrastructure. This new attack surface emerges from fundamental changes in the critical infrastructure of organizations' technology systems. This paper aims to improve understanding of the challenges of securing future digital infrastructure while it is still evolving. After introducing the infrastructure that generates big data, the paper defines a functionality-based fog architecture. In addition, it presents a comprehensive review of security requirements in fog-enabled IoT systems, followed by an in-depth analysis of fog computing security challenges and of big data privacy and trust concerns in relation to fog-enabled IoT. We also discuss blockchain as a key enabler for addressing many security-related issues in IoT and closely consider the complementary interrelationships between blockchain and fog computing.
In this context, this work formalizes the task of securing big data and its scope, provides a taxonomy to categorize threats to fog-based IoT systems, presents a comprehensive comparison of state-of-the-art contributions in the field according to their security services, and recommends promising research directions for future investigations. Security Big data Internet of Things Fog computing Edge computing Blockchain
224	Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative models for confidence intervals and Free-Knot changepoint models Sheng Zhang (18468615) 11 June 2024 We studied the relationship between subsample size and the accuracy of the resulting estimates in a big data setting. We also proposed a novel approach to the construction of confidence intervals based on improved concentration inequalities. Lastly, we studied irregular change-point models using free-knot splines. Applied statistics Subsampling Big Data Analytics Analyzing Changepoint model
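As an illustration of how subsample size governs accuracy, the Python sketch below uses the textbook simple-random-sampling result (Cochran's sample-size formula with a finite-population correction) rather than the thesis's own method, which the abstract does not describe. It assumes the population standard deviation `sigma` is known or estimated from a pilot sample, and that a normal approximation is adequate; the function names are hypothetical.

```python
import math
import random
import statistics


def half_width(sigma: float, n: int, population: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for the mean
    of a simple random subsample of size n drawn without replacement."""
    return z * sigma * math.sqrt((1 - n / population) / n)


def required_subsample_size(sigma: float, margin: float, population: int,
                            z: float = 1.96) -> int:
    """Smallest n whose half-width is at most `margin` (Cochran's formula).

    Setting half_width == margin gives 1/n = 1/n0 + 1/N, i.e.
    n = n0 / (1 + n0 / N), where n0 is the infinite-population size.
    """
    n0 = (z * sigma / margin) ** 2     # size needed for an infinite population
    n = n0 / (1 + n0 / population)     # finite-population correction
    return math.ceil(n)


def subsample_mean(data, n: int, seed: int = 0):
    """Estimate the mean of `data` from a uniform subsample of size n and
    return (estimate, estimated standard error)."""
    rng = random.Random(seed)
    sample = rng.sample(data, n)       # sampling without replacement
    se = statistics.stdev(sample) * math.sqrt((1 - n / len(data)) / n)
    return statistics.fmean(sample), se
```

For example, with `sigma = 10`, a target half-width of 1 and a population of one billion rows, `required_subsample_size` returns 385, and the answer barely changes as the population grows further; this insensitivity to N is what makes subsampling attractive for very large data.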
225	Efficient computer experiment designs for Gaussian process surrogates Cole, David Austin 28 June 2021 Due to advancements in supercomputing and algorithms for finite element analysis, today's computer simulation models often contain complex calculations that can result in a wealth of knowledge. Gaussian processes (GPs) are highly desirable models for computer experiments because of their predictive accuracy and uncertainty quantification. This dissertation addresses GP modeling when data abounds as well as GP adaptive design when simulator expense severely limits the amount of collected data. For data-rich problems, I introduce a localized sparse covariance GP that preserves the flexibility and predictive accuracy of a GP's predictive surface while saving computational time. This locally induced Gaussian process (LIGP) incorporates latent design points, called inducing points, into a local Gaussian process built from a subset of the data. Various methods are introduced for the design of the inducing points. LIGP is then extended to adapt to stochastic data with replicates, estimating noise while relying upon the unique design locations for computation. I also address the goal of identifying a contour, through entropy-based adaptive design, when data collection resources are limited. Unlike existing methods, the entropy-based contour locator (ECL) adaptive design promotes exploration in the design space, performing well in higher dimensions and when the contour corresponds to a high/low quantile. ECL adaptive design can be combined with importance sampling to reduce uncertainty in reliability estimation. / Doctor of Philosophy / Due to advancements in supercomputing and physics-based algorithms, today's computer simulation models often contain complex calculations that can produce larger amounts of data than physical experiments can.
Computer experiments conducted with simulation models are sought-after ways to gather knowledge about physical problems, but they come with design and modeling challenges. In this dissertation, I address both data size extremes: building prediction models with large data sets and designing computer experiments when scarce resources limit the amount of data. For the former, I introduce a strategy of constructing a series of models, each built from a small subset of the observed data along with a set of unobserved data locations (inducing points). This methodology can also perform calculations using only the unique data locations when replicates exist in the data. The locally induced model produces accurate predictions while saving computing time. Various methods are introduced to decide the locations of these inducing points. The focus then shifts to designing an experiment for the purpose of accurate prediction around a particular output quantity of interest (contour). An experimental design approach is detailed that selects new sample locations one at a time using a criterion that maximizes the information gained about the contour region. This work is combined with an existing method to estimate the true volume of the contour. inducing points active learning big data kriging reliability
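The core idea of a local GP, fitting a Gaussian process only to the training points nearest the prediction site, can be sketched in dependency-free Python. This is a simplified nearest-neighbour local GP under stated assumptions (zero prior mean, unit prior variance, a fixed squared-exponential lengthscale, a small nugget for numerical stability); it deliberately omits the inducing points that distinguish LIGP, and the function names and default values are hypothetical.

```python
import math


def sq_exp(a, b, lengthscale):
    """Squared-exponential covariance with unit prior variance."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * lengthscale ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def back_sub_transpose(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def local_gp_predict(X, y, x_star, k=10, lengthscale=0.2, nugget=1e-6):
    """Posterior mean and variance at x_star from a GP fitted only to the
    k training points nearest x_star."""
    idx = sorted(range(len(X)),
                 key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x_star)))[:k]
    Xn = [X[i] for i in idx]
    yn = [y[i] for i in idx]
    # Local covariance matrix with a nugget on the diagonal.
    K = [[sq_exp(a, b, lengthscale) + (nugget if i == j else 0.0)
          for j, b in enumerate(Xn)] for i, a in enumerate(Xn)]
    L = cholesky(K)
    alpha = back_sub_transpose(L, forward_sub(L, yn))   # K^{-1} y
    k_star = [sq_exp(a, x_star, lengthscale) for a in Xn]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = forward_sub(L, k_star)                           # L^{-1} k_*
    var = 1.0 - sum(vi * vi for vi in v)                 # prior variance is 1
    return mean, max(var, 0.0)
```

Because each prediction factorizes only a k-by-k matrix instead of the full n-by-n covariance, the cost per prediction site stays small as the training set grows, which is the computational motivation for local GPs on data-rich problems.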
226	SensAnalysis: A Big Data Platform for Vibration-Sensor Data Analysis Kumar, Abhinav 26 June 2019 The Goodwin Hall building on the Virginia Tech campus is the most instrumented building for vibration monitoring. It houses 225 hard-wired accelerometers that record vibrations arising from internal as well as external activities. The recorded vibration data can be used to develop real-time applications for monitoring the health of the building or detecting human activity in the building. However, the lack of infrastructure to handle the massive scale of the data, and the steep learning curve of the tools required to store and process the data, are major obstacles to researchers performing their experiments. Additionally, researchers want to explore the data to determine the type of experiments they can perform. This work addresses these problems by providing a system to store and process the data using existing big data technologies. The system simplifies big data analysis by supporting code reusability and multiple programming languages. The effectiveness of the system was demonstrated through four case studies. Additionally, three visualizations were developed to help researchers with initial data exploration. / Master of Science / The Goodwin Hall building on the Virginia Tech campus is an example of a ‘smart building.’ It uses sensors to record the response of the building to various internal and external activities. The recorded data can be used by algorithms to facilitate understanding of the properties of the building or to detect human activity. Accordingly, researchers in the Virginia Tech Smart Infrastructure Lab (VTSIL) run experiments using a portion of the complete data. Ideally, they want to run their experiments continuously as new data is collected. However, the massive scale of the data makes it difficult to process new data as soon as it arrives and to make it available immediately to the researchers.
The technologies that can handle data at this scale have a steep learning curve, and starting to use them requires much time and effort. This project involved building a system to handle these challenges so that researchers can focus on their core area of research. The system provides visualizations depicting various properties of the data to help researchers explore the data before running an experiment. The effectiveness of this work was demonstrated using four case studies based on actual experiments conducted by VTSIL researchers in the past. The first three case studies help in understanding the properties of the building, whereas the final case study deals with detecting and locating human footsteps in real time on one of the floors. big data data analysis sensor data Goodwin hall
227	Mining Security Risks from Massive Datasets Liu, Fang 09 August 2017 Cyber security risk has been a problem ever since the appearance of telecommunication and electronic computers. Over the past 30 years, researchers have developed various tools to protect the confidentiality, integrity, and availability of data and programs. However, new challenges are emerging as the amount of data grows rapidly in the big data era. On one hand, attacks are becoming stealthier by concealing their behaviors in massive datasets. On the other hand, it is becoming increasingly difficult for existing tools to handle massive datasets with various data types. This thesis presents attempts to address these challenges and solve different security problems by mining security risks from massive datasets. The attempts cover three aspects: detecting security risks in the enterprise environment, prioritizing security risks of mobile apps, and measuring the impact of security risks between websites and mobile apps. First, the thesis presents a framework to detect data leakage in very large volumes of content. The framework can be deployed in the cloud for enterprises while preserving the privacy of sensitive data. Second, the thesis prioritizes the inter-app communication risks in large-scale Android apps by designing a new distributed inter-app communication linking algorithm and performing nearest-neighbor risk analysis. Third, the thesis measures the impact of deep link hijacking risk, one type of inter-app communication risk, on 1 million websites and 160 thousand mobile apps. The measurement reveals the failure of Google's attempts to improve the security of deep links. / Ph. D. Cyber Security Big Data Security Mobile Security Data Leakage Detection
228	Sequential learning, large-scale calibration, and uncertainty quantification Huang, Jiangeng 23 July 2019 With remarkable advances in computing power, computer experiments continue to expand the boundaries and drive down the cost of various scientific discoveries. New challenges keep arising in designing, analyzing, modeling, calibrating, optimizing, and predicting in computer experiments. This dissertation consists of six chapters, exploring statistical methodologies in sequential learning, model calibration, and uncertainty quantification for heteroskedastic computer experiments and large-scale computer experiments. For heteroskedastic computer experiments, an optimal lookahead-based sequential learning strategy is presented, balancing replication and exploration to facilitate separating signal from input-dependent noise. Motivated by challenges in both large data size and model fidelity arising from ever larger modern computer experiments, highly accurate and computationally efficient divide-and-conquer calibration methods based on on-site experimental design and surrogate modeling for large-scale computer models are developed in this dissertation. The proposed methodology is applied to calibrate a real computer experiment from the gas and oil industry. This on-site surrogate calibration method is further extended to multiple-output calibration problems. / Doctor of Philosophy / With remarkable advances in computing power, complex physical systems today can be simulated comparatively cheaply and to high accuracy through computer experiments. Computer experiments continue to expand the boundaries and drive down the cost of various scientific investigations, including biological, business, engineering, industrial, management, health-related, physical, and social sciences.
This dissertation consists of six chapters, exploring statistical methodologies in sequential learning, model calibration, and uncertainty quantification for heteroskedastic computer experiments and large-scale computer experiments. For computer experiments with a changing signal-to-noise ratio, an optimal lookahead-based sequential learning strategy is presented, balancing replication and exploration to facilitate separating signal from complex noise structure. To effectively extract key information from massive amounts of simulation output and make better predictions for the real world, highly accurate and computationally efficient divide-and-conquer calibration methods for large-scale computer models are developed in this dissertation, addressing challenges in both large data size and model fidelity arising from ever larger modern computer experiments. The proposed methodology is applied to calibrate a real computer experiment from the gas and oil industry. This large-scale calibration method is further extended to solve multiple-output calibration problems. sequential learning computer experiments uncertainty quantification big data hierarchical modeling
229	Parallel Mining and Analysis of Triangles and Communities in Big Networks Arifuzzaman, S M. 19 August 2016 A network (graph) is a powerful abstraction for interactions among entities in a system. Examples include various social, biological, collaboration, citation, and co-purchase networks. Real-world networks are often characterized by an abundance of triangles and the existence of well-structured communities. Thus, counting triangles and detecting communities in networks have become important algorithmic problems in network mining and analysis. In the era of big data, network data emerging from numerous scientific disciplines are very large. Online social networks such as Twitter and Facebook have millions to billions of users. Such massive networks often do not fit in the main memory of a single machine, and existing sequential methods might take a prohibitively long time. This motivates the need for scalable parallel algorithms for mining and analysis. We design MPI-based distributed-memory parallel algorithms for counting triangles and detecting communities in big networks and present related analysis. The dissertation consists of four parts. In Part I, we devise parallel algorithms for counting and enumerating triangles. The first algorithm employs an overlapping partitioning scheme and novel load-balancing schemes, leading to a fast algorithm. We also design a space-efficient algorithm using non-overlapping partitioning and an efficient communication scheme. This space efficiency allows the algorithm to work on even larger networks. We then present a third parallel algorithm based on dynamic load balancing. All these algorithms work on big networks, scale to a large number of processors, and demonstrate very good speedups. An important property of many real-world networks, closely related to triangles, is high transitivity: two nodes that have common neighbors tend to become neighbors themselves.
In Part II, we characterize networks by quantifying the number of common neighbors and demonstrate its relationship to the community structure of networks. In Part III, we design parallel algorithms for detecting communities in big networks. We propose efficient load-balancing and communication approaches, which lead to fast and scalable algorithms. Finally, in Part IV, we present scalable parallel algorithms for a useful graph preprocessing problem: converting an edge list to an adjacency list. We present non-trivial parallelization with efficient HPC-based techniques, leading to fast and space-efficient algorithms. / Ph. D. Network Mining Parallel Algorithm Triangle Counting Community Detection Big Data
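A sequential Python sketch of three operations named in this abstract: converting an edge list to an adjacency structure, counting triangles by orienting each edge from its lower-ranked to its higher-ranked endpoint (ranked by degree, ties broken by node id) so that every triangle is counted exactly once, and computing transitivity as the standard global clustering coefficient, 3 × triangles / connected triples. This is a single-machine illustration of well-known techniques, not the dissertation's MPI algorithms; the function names are hypothetical, and node ids are assumed to be mutually comparable (for example, integers).

```python
from collections import defaultdict


def edge_list_to_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v)
    pairs, dropping self-loops and duplicate edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def count_triangles(adj):
    """Count triangles by orienting each edge from lower to higher
    (degree, id) rank and intersecting out-neighbour sets."""
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: {u for u in nbrs if rank[u] > rank[v]} for v, nbrs in adj.items()}
    # A triangle with ranks a < b < c is found only at v = a, u = b (w = c).
    return sum(len(out[v] & out[u]) for v in out for u in out[v])


def transitivity(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * count_triangles(adj) / triples if triples else 0.0
```

On the complete graph K4, every pair of a node's neighbours is linked, so `count_triangles` returns 4 and `transitivity` returns 1.0. Orienting edges toward higher-degree nodes keeps the out-neighbour sets of high-degree hubs small, which is why degree-based ordering is a common choice for triangle counting on skewed real-world networks.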
230	Surveillance Technology and the Neoliberal State: Expanding the Power to Criminalize in a Data-Unlimited World Hurley, Emily Elizabeth 23 June 2017 For the past several decades, the neoliberal school of economics has dominated public policy, encouraging American politicians to reduce the size of the government. Despite this trend, the power of the state to surveil, criminalize, and detain has become more extensive, even as the state appears to be growing less powerful. By allowing information technology corporations such as Google to collect location data from users with or without their knowledge, the state can tap into a vast surveillance network at any time, retroactively surveilling and criminalizing at its discretion. Furthermore, neoliberal political theory has eroded the classical liberal conception of freedom so that these surveillance tactics do not appear to restrict individuals' freedom or privacy so long as individuals consent to be surveilled by a private corporation. Neoliberalism also encourages the proliferation of information technologies by making individuals responsible for their economic success and wellbeing in an increasingly competitive world, thus pushing more individuals to use information technologies to enter the gig economy. The individuating logic of neoliberalism, combined with the rapid economic potentialities of information technology, turns individuals into mere sources of human capital. Even though the American state's commitment to neoliberalism precludes it from covertly managing the labor economy, it can still manage a population through criminalization and incarceration. Access to users' data by way of information technology makes the process of criminalization more manageable and allows the state to more easily incarcerate indiscriminately. / Master of Arts Neoliberal governance surveillance technology big data criminality freedom
