11

Essays on Demand Estimation, Financial Economics and Machine Learning

He, Pu January 2019 (has links)
In this era of big data, we often rely on techniques ranging from simple linear regression and structural estimation to state-of-the-art machine learning algorithms to make operational and financial decisions based on data. This calls for a deep understanding of practical and theoretical aspects of methods and models from statistics, econometrics, and computer science, combined with relevant domain knowledge. In this thesis, we study several practical, data-related problems in the particular domains of the sharing economy and financial economics/financial engineering, using appropriate approaches from an arsenal of data-analysis tools. On the methodological front, we propose a new estimator for the classic demand estimation problem in economics, which is important for pricing and revenue management. In the first part of this thesis, we study customer preferences for the bike share system in London, in order to provide policy recommendations on bike share system design and expansion. We estimate a structural demand model on the station network to learn the preference parameters, and use the estimated model to provide insights on the design and expansion of the system. We highlight the importance of network effects in understanding customer demand and evaluating expansion strategies of transportation networks. In the particular example of the London bike share system, we find that allocating resources to some areas of the station network can be 10 times more beneficial than others in terms of system usage, and that the currently implemented station-density rule is far from optimal. We develop a new method to deal with the endogeneity of the choice set in estimating demand for network products. Our method can be applied to other settings in which the available set of products or services depends on demand. In the second part of this thesis, we study demand estimation methodology when data has a long-tail pattern, that is, when a significant portion of products have zero or very few sales. Long-tail distributions in sales or market share data have long been an issue in empirical studies in areas such as economics, operations, and marketing, and they are increasingly common as more granular data become available and many more products are offered by online retailers and platforms. The classic demand estimation framework cannot handle zero sales, which yields inconsistent estimates. More importantly, biased demand estimates, if used as an input to subsequent tasks such as pricing, lead to managerial decisions that are far from optimal. We introduce two new two-stage estimators to solve the problem: our solutions apply machine learning algorithms to estimate market shares in the first stage, and in the second stage we use the first-stage results to correct for the selection bias in demand estimates. In simulations, we find that our approach outperforms traditional methods. In the third part of this thesis, we study how to extract a signal from option pricing models to form a profitable stock trading strategy. Recent work has documented roughness in the time series of stock market volatility and investigated its implications for option pricing. We study a strategy for trading stocks based on measures of their implied and realized roughness. A strategy that goes long the roughest-volatility stocks and short the smoothest-volatility stocks earns statistically significant excess annual returns of 6% or more, depending on the time period and strategy details.
Standard factors do not explain the profitability of the strategy. We compare alternative measures of roughness in volatility and find that the profitability of the strategy is greater when we sort stocks based on implied rather than realized roughness. We interpret the profitability of the strategy as compensation for near-term idiosyncratic event risk. Lastly, we apply a heterogeneous treatment effect (HTE) estimator from statistics and machine learning to financial asset pricing. Recent progress in the interdisciplinary area of causal inference and machine learning has produced various promising estimators for HTE. We take the R-learner algorithm of [73] and adapt it to empirical asset pricing. We study the characteristics associated with standard factors (size, value, and momentum) through the lens of HTE. Our goal is to identify sub-universes of stocks, "characteristic responders", in which size, value, or momentum trading strategies perform best relative to their performance on the entire universe. Conversely, we identify subsets of "characteristic traps" in which the strategies perform worst. In our test period, the differences in average monthly returns between long-short strategies restricted to characteristic responders and characteristic traps range from 0.77% to 1.54%, depending on the treatment characteristic. The differences are statistically significant and cannot be explained by standard factors: a long-short of long-short strategy generates significant alpha of 0.98% to 1.80% monthly with respect to the standard Fama-French factors plus momentum. Simple interaction terms between standard factors and ex-post important features do not explain the alphas either. We also characterize and interpret the characteristic traps and responders identified by our algorithm. Our study can be viewed as a systematic, data-driven way to investigate interaction effects between features and the treatment characteristic, and to identify characteristic traps and responders.
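As a rough illustration of the sorting strategy described in this abstract, the sketch below forms a monthly long-short portfolio from a roughness measure. The column names (`roughness`, `ret_fwd`) and the decile construction are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch: decile sort on volatility roughness, long the roughest decile
# and short the smoothest, returning the monthly long-short return series.
import pandas as pd

def roughness_long_short(panel: pd.DataFrame, n_bins: int = 10) -> pd.Series:
    """panel: one row per (month, stock) with columns 'month',
    'roughness' (implied or realized) and 'ret_fwd' (next-month return)."""
    def one_month(df: pd.DataFrame) -> float:
        # Rank stocks into deciles by roughness within the month.
        bins = pd.qcut(df["roughness"], n_bins, labels=False, duplicates="drop")
        long_leg = df.loc[bins == bins.max(), "ret_fwd"].mean()   # roughest decile
        short_leg = df.loc[bins == bins.min(), "ret_fwd"].mean()  # smoothest decile
        return long_leg - short_leg
    # Equal-weighted long-short return per month.
    return panel.groupby("month").apply(one_month)
```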
12

A Mathematical Study of Learning Dynamics

Keller, Rachael Tara January 2021 (has links)
Data-driven discovery of dynamics, in which data are used to learn unknown dynamics, is witnessing a resurgence of interest as data and computational tools have become widespread and increasingly accessible. Advances in machine learning, data science, and neural networks are fueling new data-driven studies and rapidly changing the landscape in almost every field. Meanwhile, classical numerical analysis remains a steady tool for analyzing these new problems. This thesis situates emerging work coupling machine learning, neural networks, and data-driven discovery of dynamics within classical numerical theory. We begin by formulating a universal learning framework grounded in optimization theory. We discuss how three paradigms of machine learning -- supervised, unsupervised, and reinforcement learning -- are encapsulated by this framework and form a general learning problem for discovery of dynamics. Using this formulation, we distill data-driven discovery of dynamics via the classical technique of linear multistep methods with neural networks to its most basic form for numerical analysis. We establish for the first time a rigorous mathematical theory for using linear multistep methods in discovery of dynamics, assuming exact data. We present refined notions of consistency, stability, and convergence for discovery, and show convergence results for the popular Adams-Bashforth, Adams-Moulton, and backward differentiation formula schemes. Extending the study to noisy data, we propose and analyze the recovery of a smooth approximation to the state using splines and prove new results on discrete differentiation error estimates.
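A minimal sketch of discovery of dynamics with a linear multistep method, assuming a 2-step Adams-Bashforth scheme, uniformly sampled exact data, and an illustrative network architecture; this is not the thesis code.

```python
# Hedged sketch: learn dx/dt = f(x) from a sampled trajectory by minimizing
# the residual of a 2-step Adams-Bashforth scheme, with f a small neural net.
import torch
import torch.nn as nn

def lmm_discovery(x: torch.Tensor, h: float, epochs: int = 2000) -> nn.Module:
    """x: (N, d) snapshots of the state at uniform time step h."""
    d = x.shape[1]
    f = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))
    opt = torch.optim.Adam(f.parameters(), lr=1e-3)
    for _ in range(epochs):
        # Adams-Bashforth(2): x_{n+1} - x_n ≈ h * (3/2 f(x_n) - 1/2 f(x_{n-1}))
        residual = x[2:] - x[1:-1] - h * (1.5 * f(x[1:-1]) - 0.5 * f(x[:-2]))
        loss = residual.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return f
```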
13

Salience Estimation and Faithful Generation: Modeling Methods for Text Summarization and Generation

Kedzie, Christopher January 2021 (has links)
This thesis is focused on a particular text-to-text generation problem, automatic summarization, where the goal is to map a large input text to a much shorter summary text. The research presented aims to both understand and tame existing machine learning models, hopefully paving the way for more reliable text-to-text generation algorithms. Somewhat against the prevailing trends, we eschew end-to-end training of an abstractive summarization model, and instead break down the text summarization problem into its constituent tasks. At a high level, we divide these tasks into two categories: content selection, or “what to say,” and content realization, or “how to say it” (McKeown, 1985). Within these categories we propose models and learning algorithms for the problems of salience estimation and faithful generation. Salience estimation, that is, determining the importance of a piece of text relative to some context, is a problem of the former category: determining what should be selected for a summary. In particular, we experiment with a variety of popular and novel deep learning models for salience estimation in a single document summarization setting, and design several ablation experiments to gain insight into which input signals are most important for making predictions. Understanding these signals is critical for designing reliable summarization models. We then consider the more difficult problem of estimating salience in a large document stream, and propose two alternative approaches using classical machine learning techniques from both unsupervised clustering and structured prediction. These models incorporate salience estimates into larger text extraction algorithms that also consider redundancy and previous extraction decisions. Overall, we find that when simple, position-based heuristics are available, as in single document news or research summarization, deep learning models of salience often exploit them to make predictions while ignoring the arguably more important content features of the input. In more demanding environments, like stream summarization, where heuristics are unreliable, more semantically relevant features become key to identifying salient content. In part two, content realization, we assume content selection has already been performed and focus on methods for faithful generation (i.e., ensuring that output text utterances respect the semantics of the input content). Since they can generate very fluent and natural text, deep learning-based natural language generation models are a popular approach to this problem. However, they often omit, misconstrue, or otherwise generate text that is not semantically correct given the input content. In this section, we develop a data augmentation and self-training technique to mitigate this problem. Additionally, we propose a training method for making deep learning-based natural language generation models capable of following a content plan, allowing for more control over the output utterances generated by the model. Under a stress test evaluation protocol, we demonstrate some empirical limits on several neural natural language generation models’ ability to encode and properly realize a content plan. Finally, we conclude with some remarks on future directions for abstractive summarization outside of the end-to-end deep learning paradigm.
Our aim here is to suggest avenues for constructing abstractive summarization systems with transparent, controllable, and reliable behavior when it comes to text understanding, compression, and generation. Our hope is that this thesis inspires more research in this direction, and, ultimately, real tools that are broadly useful outside of the natural language processing community.
14

Examination of Bandwidth Enhancement and Circulant Filter Frequency Cutoff Robustification in Iterative Learning Control

Zhang, Tianyi January 2021 (has links)
The iterative learning control (ILC) problem considers control tasks that perform a specific tracking command, where the command is to be performed many times. The system returns to the same initial conditions on the desired trajectory for each repetition, also called a run or iteration. The learning law adjusts the command to a feedback system based on the error observed in the previous run, and aims to converge to zero tracking error at sampled times as the iterations progress. The ILC problem is an inverse problem: it seeks to converge to the command that produces the desired output. Mathematically, that command is given by the inverse of the transfer function of the feedback system times the desired output. However, in many applications that unique command is an unstable function of time. A discrete-time system, converted from a continuous-time system fed by a zero-order hold, often has non-minimum-phase zeros which become unstable poles in the inverse problem. An inverse discrete-time system will have at least one unstable pole if the pole-zero excess of the original continuous-time counterpart is three or larger and the sample rate is fast enough. The corresponding difference equation has roots larger than one in magnitude, and the homogeneous solution has components that are the values of these poles raised to the power k, with k being the time step. This creates an unstable command growing in magnitude with each time step. If the ILC law aims at zero tracking error for such systems, the command produced by the ILC iterations will grow exponentially in magnitude with time step. This thesis examines several ways to circumvent this difficulty, designing filters that prevent this growth in ILC. The sister field of ILC, repetitive control (RC), aims at zero error at sample times when tracking a periodic command, eliminating a periodic disturbance of known period, or both. Instead of learning from a previous run that always starts from the same initial condition, RC learns from the error in the previous period of the periodic command or disturbance. Unlike ILC, the system in RC eventually reaches steady state as time progresses. As a result, one can use frequency-response thinking. In ILC, frequency-response thinking is not applicable since the output of the system has transients in every run. RC is also an inverse problem, and the periodic command to the system converges to the inverse of the system times the desired output. Because RC only needs zero error after reaching steady state, one can aim to invert the steady-state frequency response of the system instead of the system transfer function in order to obtain a stable solution to the inverse problem. This can be accomplished by designing a Finite Impulse Response (FIR) filter that mimics the steady-state frequency response and can be used in real time. This dissertation discusses how the configuration of the digital feedback control system affects the locations of the sampling zeros, and discusses the effectiveness of RC design methods for these possible sampling zeros. Sampling zeros are zeros introduced by the discretization from a continuous-time system to a discrete-time system. In the RC problem, the feedback control system can have sampling zeros outside the unit circle, which pose challenges for RC law design.
Previous research concentrated on the situation where the sampling zeros of the feedback control system come from a zero-order hold on the input of a continuous-time feedback system, and studied the influence of these sampling zeros as the sampling rate changes, including the asymptotic limit of the sample time interval approaching zero. Effective RC design methods have been developed and tested for this configuration. In the real world, however, the feedback control system may not be a continuous-time system. Here we investigate the possible sampling zero locations that can be encountered in digital control systems where the zero-order hold can appear in various places in the control loop. We show that various new situations can occur. We discuss the sampling zero locations for different feedback system structures and show that the RC design methods still work. Moreover, we compare the learning rates of different RC design methods and show that the design method based on a quadratic fit of the reciprocal of the steady-state frequency response has the desired learning rate features, balancing robustness with efficiency. This dissertation also discusses the steady-state response filter of the finite-time signal used in ILC. The ILC problem is sensitive to model errors and unmodelled high-frequency dynamics, so it needs a zero-phase low-pass filter to cut off learning at frequencies where there is too much model inaccuracy for convergence. But typical zero-phase low-pass filters, like MATLAB's filtfilt, give filtered results with transients that can destabilize ILC. The associated issues are examined from several points of view. First, the dissertation discusses the use of a partial inverse of the feedback system as both the learning gain matrix and a low-pass filter to address this problem. The approach produces a partial system inverse for frequencies where the model is accurate, eliminating the robustness issue. The concept is also used as a way to improve the performance of a feedback control system whose bandwidth is not as high as desired. When the feedback control system design is unable to achieve the desired bandwidth, the partial system inverse for frequencies in a range above the bandwidth can boost the bandwidth. If needed, ILC can be used to further correct the response up to the new bandwidth. The dissertation then discusses Discrete Fourier Transform (DFT) based filters to cut off the learning at high frequencies where model uncertainty is too large for convergence. The concept of a low-pass filter is based on steady-state frequency response, but ILC is always a finite-time problem. This forms a mismatch in the design process, which we seek to address. A mathematical proof is given showing that the DFT-based filters directly give the steady-state response of the filter for the finite-time signal, which eliminates the possibility of instability in ILC. However, such filters suffer from frequency leakage and the Gibbs phenomenon in applications, produced by the mismatch between the value of the filtered signal at the start time and at the final time. This mismatch is present in the signal being filtered for nearly all iterations in ILC. This dissertation discusses the use of a single reflection that produces a signal whose start and end values match, after which only the original-signal portion of the filtered result is retained.
In addition, a double reflection of the signal is studied that aims not only to eliminate the discontinuity that produces the Gibbs phenomenon, but also to ensure continuity of the first derivative. It applies a specific kind of double reflection. It is shown mathematically that both reflection methods reduce the Gibbs phenomenon. A criterion is given to determine when one should consider using such reflection methods on a given signal. Numerical simulations demonstrate the benefits of these reflection methods in reducing the tracking error of the system.
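A minimal sketch of the single-reflection idea, assuming an ideal DFT cutoff and a uniformly sampled signal; it is not the dissertation's implementation.

```python
# Hedged sketch: zero-phase low-pass filtering of a finite-time ILC signal via
# the DFT, with a single reflection appended so that the periodic extension has
# no end-point discontinuity and the Gibbs ripple is reduced.
import numpy as np

def dft_lowpass_with_reflection(u: np.ndarray, cutoff: float, dt: float) -> np.ndarray:
    """u: 1-D command/error signal sampled at interval dt; cutoff in Hz."""
    n = len(u)
    # Single reflection: concatenate the signal with its reverse so the start
    # and end values of the extended signal match.
    u_ext = np.concatenate([u, u[::-1]])
    freqs = np.fft.rfftfreq(len(u_ext), d=dt)
    U = np.fft.rfft(u_ext)
    U[freqs > cutoff] = 0.0          # ideal zero-phase cutoff of the learning
    u_filt = np.fft.irfft(U, n=len(u_ext))
    return u_filt[:n]                # keep only the original-signal portion
```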
15

Application of Support Vector Machine in Predicting the Market's Monthly Trend Direction

Alali, Ali 10 December 2013 (has links)
In this work, we investigate different techniques to predict the monthly trend direction of the S&P 500 market index. The techniques use a machine learning classifier with technical and macroeconomic indicators as input features. The Support Vector Machine (SVM) classifier was explored in depth in order to optimize performance using four different kernels: linear, radial basis function (RBF), polynomial, and quadratic. One finding was that classifier performance can be optimized by reducing the number of macroeconomic features needed by 30% using sequential feature selection. Further performance enhancement was achieved by optimizing the RBF kernel and SVM parameters through grid search. This resulted in final classification accuracy rates of 62% using technical features alone with grid search and 60.4% using macroeconomic features alone with Rankfeatures.
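A hedged sketch of the pipeline described above, written with scikit-learn rather than the MATLAB tools the thesis uses (e.g. Rankfeatures); the grid values and time-series cross-validation are illustrative assumptions.

```python
# Hedged sketch: RBF-kernel SVM for monthly trend-direction classification,
# with sequential feature selection and a grid search over C and gamma.
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def fit_trend_classifier(X, y):
    """X: feature matrix of technical/macroeconomic indicators;
    y: binary labels for next month's trend direction (up/down)."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SequentialFeatureSelector(SVC(kernel="rbf"), direction="forward")),
        ("svm", SVC(kernel="rbf")),
    ])
    grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1, 1]}
    # Time-ordered splits avoid look-ahead bias when tuning on market data.
    search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=5))
    return search.fit(X, y)
```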
16

Towards Trustworthy Geometric Deep Learning for Elastoplasticity

Vlassis, Nikolaos Napoleon January 2021 (has links)
Recent advances in machine learning have unlocked new potential for innovation in engineering science. Neural networks are used as universal function approximators that harness high-dimensional data with excellent learning capacity. While this is an opportunity to accelerate computational mechanics research, application to constitutive modeling is not trivial. Machine learning material response predictions that do not enforce physical constraints may lack interpretability and could be detrimental in high-risk engineering applications. This dissertation presents a meta-modeling framework for automating the discovery of elastoplasticity models across material scales, with emphasis on establishing interpretable and, hence, trustworthy machine learning modeling tools. Our objective is to introduce a workflow that leverages computational mechanics domain expertise to enforce, or post hoc validate, physical properties of the data-driven constitutive laws. First, we introduce a deep learning framework designed to train and validate neural networks that predict the hyperelastic response of materials. We adopt the Sobolev training method and adapt it to mechanics modeling to gain control over the higher-order derivatives of the learned functions. We generate machine learning models that are thermodynamically consistent and interpretable and that demonstrate enhanced learning capacity. The Sobolev training framework is shown, through numerical experiments on different material data sets (e.g., β-HMX crystal, polycrystals, soil), to generate hyperelastic energy functionals that predict the elastic energy, stress, and stiffness measures more accurately than classical training methods that minimize L2 norms. To model path-dependent phenomena, we depart from the common approach of lumping the elastic and plastic response into one black-box neural network prediction. We decompose the elastoplastic behavior into its interpretable theoretical components by separately training a stored elastic energy function, a yield surface, and a plastic flow that evolve based on a set of deep neural network predictions. We interpret the yield function as a level set and control its evolution as the neural-network-approximated solution of a Hamilton-Jacobi equation that governs the hardening/softening mechanism. Our framework can recover classical yield functions and hardening rules from the literature, as well as discover new mechanisms that are either unknown or difficult to express in closed form. Through numerical experiments on a 3D FFT-generated polycrystal material response database, we demonstrate that our approach provides more robust and accurate forward predictions of cyclic stress paths than black-box deep neural network models. We demonstrate the framework's capacity to readily extend to more complex plasticity phenomena, such as pressure sensitivity, rate dependence, and anisotropy. Finally, we integrate geometric deep learning and Sobolev training to generate constitutive models for the homogenized responses of anisotropic microstructures (e.g., polycrystals, granular materials). Commonly used hand-crafted homogenized microstructural descriptors (e.g., porosity or the averaged orientation of constituents) may not adequately capture the topological structure of a material. We overcome this by introducing weighted graphs as new high-dimensional descriptors that represent topological information, such as the connectivity of anisotropic grains in an assembly.
Through graph convolutional deep neural networks and graph embedding techniques, our neural networks extract low-dimensional features from the weighted graphs and, subsequently, learn the influence of these low-dimensional features on the resultant stored elastic energy functionals and plasticity models.
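A minimal sketch of a Sobolev training step for a hyperelastic energy network, assuming Voigt-notation strain inputs and an illustrative loss weighting; it is not the dissertation code.

```python
# Hedged sketch: the network predicts energy psi(E) from a strain measure E,
# and the loss penalizes errors in both the energy and its gradient (the
# stress), obtained by automatic differentiation.
import torch
import torch.nn as nn

def sobolev_step(psi_net: nn.Module, opt, E: torch.Tensor,
                 energy_true: torch.Tensor, stress_true: torch.Tensor,
                 w_grad: float = 1.0) -> float:
    """E: (batch, 6) strain in Voigt notation; stress_true: (batch, 6)."""
    E = E.clone().requires_grad_(True)
    energy_pred = psi_net(E).squeeze(-1)
    # Stress as the derivative of the learned energy with respect to strain.
    stress_pred, = torch.autograd.grad(energy_pred.sum(), E, create_graph=True)
    loss = ((energy_pred - energy_true) ** 2).mean() \
         + w_grad * ((stress_pred - stress_true) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```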
17

Beyond Worst-Case Analysis of Optimization in the Era of Machine Learning

Vlatakis Gkaragkounis, Emmanouil Vasileios January 2022 (has links)
Worst-case analysis (WCA) has been the dominant tool for understanding the performance of the lion's share of the algorithmic arsenal of theoretical computer science. While WCA has provided us a thorough picture for a variety of problems over the last few decades, the advent of the machine learning era has renewed our interest in several important optimization problems whose actual complexity has been elusive for empirical and real-world instances. More interestingly, while state-of-the-art ML models become deeper, larger in scale, sequential, and highly nonconvex, the backbone of modern learning algorithms consists of simple methods such as local search, gradient descent, and Follow-the-Leader variations (in the case of multi-agent tasks). Thus, a basic question endures: Why do simple algorithms work so well even in these challenging settings? A rapidly developing recent line of research, the so-called beyond worst-case analysis of algorithms (BWCA), aims to bridge this gap between theory and practice by considering the design and analysis of algorithms using more realistic models or natural structural properties. In this thesis, we continue the line of work of BWCA by making contributions in several areas. Specifically, we focus on four main problems and models: 1. In combinatorial optimization, can simple flip local search methods compute local optima efficiently? 2. In continuous optimization, are gradients necessary to compute local optima efficiently in the general nonconvex setting? 3. In multi-agent optimization, what is the computational complexity of generalized Nash equilibria, and how do simple dynamics behave around them? 4. In the special case of nonconvex-nonconcave minmax optimization, are there rich classes of well-motivated ML games with an effectively unique game-theoretic solution that is selected by standard optimization techniques (e.g., gradient descent)? Leveraging machinery like the celebrated smoothed analysis of Spielman and Teng and the widely studied Lyapunov stability analysis, we show in this thesis that although the standard versions of these classical algorithms do not enjoy good theoretical properties in the worst case, simple modifications are sufficient to grant them desirable behaviors, which explains the underlying mechanisms behind their favorable performance in practice.
18

Multiscaling and Machine Learning Approaches to Physics Simulation

Chen, Peter Yichen January 2022 (has links)
Physics simulation computationally models physical phenomena. It is the bread and butter of modern-day scientific discovery and engineering design, from plasma theory to digital twins. However, viable efficiency remains a long-standing challenge for physics simulation. Accurate, real-world-scale simulations are often computationally too expensive (e.g., excessive wall-clock time) to be of practical use. In this thesis, we explore two general solutions to this problem. Our first proposed method is a multiscaling approach. Simulating physics at its fundamental discrete scale, e.g., the atomic level, provides unmatched levels of detail and generality, but proves excessively costly when applied to large-scale systems. Alternatively, simulating physics at the continuum scale governed by partial differential equations (PDEs) is computationally tractable, but limited in applicability due to built-in modeling assumptions. We propose a multiscaling simulation technique that exploits the dual strengths of discrete and continuum treatments. In particular, we design a hybrid discrete-continuum framework for granular media. In this adaptive framework, we define an oracle to dynamically partition the domain into continuum regions where it is safe to do so and discrete regions where it is necessary. We couple the dynamics of the discrete and continuum regions via overlapping transition zones to form one coherent simulation. Enrichment and homogenization operations convert between discrete and continuum representations, allowing the partitions to evolve over time. This approach saves computation cost by partially employing continuum simulation and obtains up to 116X speedup over discrete-only simulation while maintaining the same level of accuracy. To further accelerate PDE-governed continuum simulations, we propose a machine-learning-based reduced-order modeling (ROM) method. Whereas prior ROM approaches reduce the dimensionality of discretized vector fields, our continuous reduced-order modeling (CROM) approach builds a smooth, low-dimensional manifold of the continuous vector fields themselves, not their discretization. We represent this reduced manifold using neural fields, relying on their continuous and differentiable nature to efficiently solve the PDEs. CROM may train on any and all available numerical solutions of the continuous system, even when they are obtained using diverse methods or discretizations. Indeed, CROM is the first model reduction framework that can simultaneously handle data from voxels, meshes, and point clouds. Once the low-dimensional manifolds are established, solving PDEs requires significantly less computational resources. Since CROM is discretization-agnostic, CROM-based PDE solvers may optimally adapt discretization resolution over time to economize computation. We validate our approach on an extensive range of PDEs from thermodynamics, image processing, solid mechanics, and fluid dynamics. Selected large-scale experiments demonstrate that our approach obtains speed, memory, and accuracy advantages over prior ROM approaches while gaining 109X wall-clock speedup over full-order models on CPUs and 89X speedup on GPUs.
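A minimal sketch of the neural-field decoder at the heart of a continuous reduced-order model, with illustrative dimensions and architecture; it is not the thesis implementation.

```python
# Hedged sketch: a neural field that maps a spatial coordinate x and a
# low-dimensional latent code z to the continuous field value u(x).
import torch
import torch.nn as nn

class NeuralFieldDecoder(nn.Module):
    def __init__(self, space_dim: int = 3, latent_dim: int = 16, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(space_dim + latent_dim, 128), nn.ELU(),
            nn.Linear(128, 128), nn.ELU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        """x: (batch, space_dim) query points; z: (latent_dim,) reduced state.
        Because the field is continuous in x, spatial derivatives needed by a
        PDE solver can be taken by automatic differentiation at any resolution."""
        z_tiled = z.expand(x.shape[0], -1)
        return self.net(torch.cat([x, z_tiled], dim=-1))
```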
19

A General Framework for Model Adaptation to Meet Practical Constraints in Computer Vision

Huang, Shiyuan January 2024 (has links)
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challenges, as models need to meet complex real-world requirements. This thesis is dedicated to building practical deep learning models, and we focus on two main challenges in vision systems: data efficiency and variability. We address these issues by providing a general model adaptation framework that extends models with practical capabilities. In the first part of the thesis, we explore model adaptation approaches for efficient representation. We illustrate the benefits of different types of efficient data representations, including compressed video modalities from video codecs, low-bit features, and sparsified frames and texts. By using such efficient representations, system complexity in terms of data storage, processing, and computation can be greatly reduced. We systematically study various methods to extract, learn, and utilize these representations, presenting new methods to adapt machine learning models for them. The proposed methods include a compressed-domain video recognition model with a coarse-to-fine distillation training strategy, a task-specific feature compression framework for low-bit video-and-language understanding, and a learnable token sparsification approach for sparsifying human-interpretable video inputs. We demonstrate new perspectives on representing vision data more practically and efficiently across various applications. The second part of the thesis focuses on open environment challenges, where we explore model adaptation for new, unseen classes and domains. We examine the practical limitations of current recognition models and introduce various methods to empower models to address open recognition scenarios. These include a negative envisioning framework for managing new classes and outliers, and a multi-domain translation approach for dealing with unseen domain data. Our study shows a promising trajectory towards models capable of navigating diverse data environments in real-world applications.
20

Interpretable Machine Learning and Sparse Coding for Computer Vision

Landecker, Will 01 August 2014 (has links)
Machine learning offers many powerful tools for prediction. One of these tools, the binary classifier, is often considered a black box. Although its predictions may be accurate, we might never know why the classifier made a particular prediction. In the first half of this dissertation, I review the state of the art of interpretable methods (methods for explaining why); after noting where the existing methods fall short, I propose a new method for a particular type of black box called additive networks. I offer a proof of trustworthiness for this new method (meaning a proof that my method does not "make up" the logic of the black box when generating an explanation), and verify empirically that its explanations are sound. Sparse coding is part of a family of methods that are believed, by many researchers, not to be black boxes. In the second half of this dissertation, I review sparse coding and its application to the binary classifier. Despite the fact that the goal of sparse coding is to reconstruct data (an entirely different goal from classification), many researchers note that it improves classification accuracy. I investigate this phenomenon, challenging a common assumption in the literature. I show empirically that sparse reconstruction is not necessarily the right intermediate goal when our ultimate goal is classification. Along the way, I introduce a new sparse coding algorithm that outperforms competing, state-of-the-art algorithms for a variety of important tasks.
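A minimal sketch of the sparse-coding-then-classify pipeline examined in the second half of this abstract, written with scikit-learn; the dictionary size and sparsity penalty are illustrative assumptions, not the dissertation's algorithm.

```python
# Hedged sketch: learn a dictionary, encode each sample sparsely, then train a
# binary classifier on the sparse codes.
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def sparse_coding_classifier(n_atoms: int = 256, alpha: float = 1.0):
    # transform_algorithm="lasso_lars" yields sparse codes at transform time.
    coder = DictionaryLearning(n_components=n_atoms, alpha=alpha,
                               transform_algorithm="lasso_lars")
    return make_pipeline(coder, LinearSVC())

# Usage: clf = sparse_coding_classifier(); clf.fit(X_train, y_train)
```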
