1

Building A More Efficient Mobile Vision System Through Adaptive Video Analytics

Junpeng Guo (20349582) 17 December 2024
Mobile vision is becoming the norm, transforming our daily lives. It powers numerous applications, enabling seamless interactions between the digital and physical worlds, such as augmented reality, real-time object detection, and many others. The popularity of mobile vision has spurred advancements from both the computer vision (CV) and mobile edge computing (MEC) communities. The former focuses on improving analytics accuracy through the use of proper deep neural networks (DNNs), while the latter addresses the resource limitations of mobile environments by coordinating tasks between mobile and edge devices, determining which data to transmit and process to enable real-time performance.

Despite recent advancements, existing approaches typically integrate the functionalities of the two camps at a basic task level. They rely on a uniform on-device processing scheme that streams the same type of data and uses the same DNN model for identical CV tasks, regardless of the analytical complexity of the current input, the input size, or the latency requirements. This lack of adaptability to dynamic contexts limits their ability to achieve optimal efficiency in scenarios involving diverse source data, varying computational resources, and differing application requirements.

Our approach seeks to move beyond task-level adaptation by emphasizing customized optimizations tailored to dynamic use scenarios. This involves three key adaptive strategies: dynamically compressing source data based on contextual information, selecting the appropriate computing model (e.g., DNN or sub-DNN) for the vision task, and establishing a feedback mechanism for context-aware runtime tuning. Additionally, for scenarios involving movable cameras, the feedback mechanism guides the data capture process to further enhance performance. These innovations are explored across three use cases categorized by the capture device: a stationary camera, a moving camera, and cross-camera analytics.

My dissertation begins with a stationary-camera scenario, where we improve efficiency by adapting to the use context on both the device and edge sides. On the device side, we explore a broader compression space and implement adaptive compression based on data context. Specifically, we leverage changes in confidence scores as feedback to guide on-device compression, progressively reducing data volume while preserving the accuracy of visual analytics. On the edge side, instead of training a specialized DNN for each deployment scenario, we adaptively select the best-fit sub-network for the given context. A shallow sub-network is used to “test the waters”, accelerating the search for a deep sub-network that maximizes analytical accuracy while meeting latency requirements.

Next, we explore scenarios involving a moving camera, such as one mounted on a drone. These introduce new challenges, including increased data-encoding demands due to camera movement and degraded analytics performance (e.g., tracking) caused by changing perspectives. To address these issues, we leverage drone-specific domain knowledge to optimize compression for object detection, applying global motion compensation and assigning different resolutions at tile granularity based on the far-near effect. Furthermore, we tackle the more complex task of object tracking and following, where the analytics results directly influence the drone's navigation. To enable effective target following with minimal processing overhead, we design an adaptive frame-rate tracking mechanism that dynamically adjusts based on changing contexts.

Last but not least, we extend the work to cross-camera analytics, focusing on coordination between a stationary ground-based camera and a moving aerial camera. The primary challenge lies in addressing significant misalignments (e.g., scale, rotation, and lighting variations) between the two perspectives. To overcome these issues, we propose a multi-exit matching mechanism that prioritizes local feature matching while incorporating global features and additional cues, such as color and location, to refine matches as needed. This approach ensures accurate identification of the same target across viewpoints while minimizing computational overhead by dynamically adapting to the complexity of the matching task.

While the current work primarily addresses ideal conditions, assuming favorable weather, optimal lighting, and reliable network performance, it establishes a solid foundation for future innovations in adaptive video processing under more challenging conditions. Future efforts will focus on enhancing robustness against adversarial factors, such as sensing-data drift and transmission losses. Additionally, we plan to explore multi-camera coordination and multimodal data integration, leveraging the growing potential of large language models to further advance this field.
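To make the stationary-camera idea concrete, the confidence-guided compression can be pictured as a small control loop: encode, check the detector's confidence, and keep shrinking the data until accuracy would suffer. The sketch below is a minimal illustration under assumptions, not the dissertation's implementation: `detector_confidence` is a stub for whatever DNN the edge actually runs, the loop uses an absolute confidence floor rather than the confidence-change signal the abstract describes, and the JPEG quality ladder and thresholds are invented for the example.

```python
import io

import numpy as np
from PIL import Image

def detector_confidence(jpeg_bytes: bytes) -> float:
    """Stand-in for the edge-side DNN detector: decode the frame and
    return the mean confidence of its detections. Stubbed here because
    the real models are not part of this sketch."""
    return 0.9  # placeholder value

def adaptive_compress(frame: np.ndarray,
                      conf_floor: float = 0.85,
                      q_max: int = 90, q_min: int = 20,
                      q_step: int = 10) -> bytes:
    """Lower JPEG quality step by step while detection confidence stays
    above conf_floor; keep the smallest encoding that was still accurate.
    frame is an HxWx3 uint8 array."""
    best = None
    for q in range(q_max, q_min - 1, -q_step):
        buf = io.BytesIO()
        Image.fromarray(frame).save(buf, format="JPEG", quality=q)
        data = buf.getvalue()
        if detector_confidence(data) < conf_floor:
            break            # confidence dropped: stop shrinking
        best = data          # still accurate enough; try a lower quality
    return best if best is not None else data
```

The same skeleton extends naturally to other knobs, such as resolution or per-tile quality, which is the broader compression space the abstract refers to.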
2

Two New Applications of Tensors to Machine Learning for Wireless Communications

Bhogi, Keerthana 09 September 2021
With the increasing number of wireless devices and the phenomenal amount of data they generate, there is growing interest in the wireless communications community in complementing traditional model-driven design approaches with data-driven machine learning (ML)-based solutions. However, managing large-scale multi-dimensional data while maintaining the efficiency and scalability of ML algorithms has been a challenge. Tensors provide a useful framework for representing multi-dimensional data in an integrated manner, preserving relationships across different dimensions. This thesis studies two new applications of tensors to ML for wireless communications in which the tensor structure of the data is exploited in novel ways.

The first contribution of this thesis is a tensor learning-based, low-complexity precoder codebook design technique for a full-dimension multiple-input multiple-output (FD-MIMO) system with a uniform planar antenna (UPA) array at the transmitter (Tx), whose channel distribution is available through a dataset. Represented as a tensor, the FD-MIMO channel is decomposed using a tensor decomposition technique to obtain an optimal precoder that is a function of the Kronecker product (KP) of two low-dimensional precoders, one for the horizontal and one for the vertical dimension of the channel. On the design side, we derive a criterion for optimal product precoder codebooks using the obtained low-dimensional precoders. We show that this product codebook design problem is an unsupervised clustering problem on a Cartesian Product Grassmann Manifold (CPM), where the optimal cluster centroids form the desired codebook. We further reduce this clustering problem to K-means on the low-dimensional factor Grassmann manifolds (GMs) of the CPM, corresponding to the horizontal and vertical dimensions of the UPA, thus significantly reducing the complexity of precoder codebook construction compared to existing codebook learning techniques.

The second contribution of this thesis is a tensor-based, bandwidth-efficient gradient communication technique for federated learning (FL) with convolutional neural networks (CNNs). Concisely, FL is a decentralized ML approach in which distributed users, coordinated by a server, jointly train an ML model by sharing only their local gradients with the server, never the raw data. Here, we focus on efficient compression and reconstruction of convolutional gradients at the users and the server, respectively. To reduce the gradient communication overhead, we compress the sparse gradients at the users into low-dimensional estimates using a compressive sensing (CS)-based technique and transmit these to the server for joint training of the CNN. We exploit the natural tensor structure of convolutional gradients to demonstrate the correlation of a gradient element with its neighbors, and we propose a novel prior for the convolutional gradients that captures this spatial consistency along with their sparse nature. We further propose a novel Bayesian reconstruction algorithm, based on the Generalized Approximate Message Passing (GAMP) framework, that exploits this prior information about the gradients. Through numerical simulations, we demonstrate that the developed gradient reconstruction method improves the convergence of the CNN model.
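As a concrete illustration of the Kronecker-product structure the first contribution builds on (though not of the thesis's CPM clustering itself), the sketch below factors a UPA precoder into its horizontal and vertical components with a rank-1 SVD; the 8x4 array size and the function name are assumptions made for the example.

```python
import numpy as np

def nearest_kp_factors(p: np.ndarray, n_h: int, n_v: int):
    """Factor a length n_h*n_v precoder into p_h kron p_v via a rank-1
    SVD of its (n_h, n_v) reshaping: np.kron(p_h, p_v).reshape(n_h, n_v)
    equals np.outer(p_h, p_v), so the best KP factors are the leading
    singular vectors (the classic nearest-KP idea)."""
    M = p.reshape(n_h, n_v)
    U, s, Vh = np.linalg.svd(M)
    scale = np.sqrt(s[0])
    return scale * U[:, 0], scale * Vh[0, :]

# Sanity check on a synthetic KP precoder (an 8x4 UPA is assumed):
rng = np.random.default_rng(1)
n_h, n_v = 8, 4
a = np.exp(1j * rng.uniform(0, 2 * np.pi, n_h)) / np.sqrt(n_h)
b = np.exp(1j * rng.uniform(0, 2 * np.pi, n_v)) / np.sqrt(n_v)
p_h, p_v = nearest_kp_factors(np.kron(a, b), n_h, n_v)
assert np.allclose(np.kron(p_h, p_v), np.kron(a, b))
```

Working with the two short factors instead of the full-length precoder is what lets the codebook construction run on the low-dimensional factor manifolds rather than the full space.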
/ Master of Science / The increase in the number of wireless and mobile devices has led to the generation of massive amounts of multi-modal data at the users in various real-world applications, including wireless communications. This has spurred increasing interest in machine learning (ML)-based, data-driven techniques for communication system design. The native setting of ML is centralized, where all the data is available on a single device; however, the distributed nature of the users and their data has also motivated the development of distributed ML techniques. Since the success of ML techniques is grounded in their data-driven nature, the algorithms must remain efficient and scalable when managing large-scale data. Tensors are multi-dimensional arrays that provide an integrated way of representing multi-modal data. Tensor algebra and tensor decompositions have enabled the extension of several classical ML techniques to tensor-based ML techniques in various application domains, such as computer vision, data mining, image processing, and wireless communications. Tensor-based ML techniques have been shown to improve the performance of ML models because of their ability to leverage the underlying structural information in the data. In this thesis, we present two new applications of tensors to ML for wireless applications and show how the tensor structure of the data can be exploited and incorporated in different ways. The first contribution is a tensor learning-based precoder codebook design technique for full-dimension multiple-input multiple-output (FD-MIMO) systems, where we develop a scheme for designing low-complexity product precoder codebooks by identifying and leveraging a tensor representation of the FD-MIMO channel. The second contribution is a tensor-based gradient communication scheme for a decentralized ML technique known as federated learning (FL) with convolutional neural networks (CNNs), where we design a novel bandwidth-efficient gradient compression-reconstruction algorithm that leverages the tensor structure of convolutional gradients. The numerical simulations in both applications demonstrate that exploiting the underlying tensor structure in the data provides significant gains in the respective performance criteria.
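For a rough feel of the compress-and-reconstruct skeleton in the second contribution, the sketch below projects a sparse gradient onto a few random measurements at the user and recovers it at the server. It is a simplification under stated assumptions: plain ISTA with an l1 penalty stands in for the thesis's GAMP-based Bayesian algorithm, the spatial-consistency prior is omitted entirely, and all sizes and names are invented for the example.

```python
import numpy as np

def compress_gradient(g: np.ndarray, m: int, rng) -> tuple:
    """User side: project the flattened sparse gradient onto m random
    measurements (in practice the sensing matrix is shared via a seed)."""
    A = rng.normal(size=(m, g.size)) / np.sqrt(m)
    return A @ g.ravel(), A

def ista_reconstruct(y: np.ndarray, A: np.ndarray,
                     lam: float = 0.01, iters: int = 300) -> np.ndarray:
    """Server side: plain ISTA (l1 sparse recovery) as a simple stand-in
    for the GAMP-based Bayesian reconstruction; it uses only sparsity."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, L = Lipschitz constant
    for _ in range(iters):
        r = x - step * (A.T @ (A @ x - y))    # gradient step on the data fit
        x = np.sign(r) * np.maximum(np.abs(r) - step * lam, 0.0)  # shrink
    return x

# Toy run with assumed sizes: 1024-dim gradient, 4x compression, 20 nonzeros.
rng = np.random.default_rng(0)
n, m, k = 1024, 256, 20
g = np.zeros(n)
g[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y, A = compress_gradient(g, m, rng)
g_hat = ista_reconstruct(y, A)   # approximate recovery of g from m << n values
```

The bandwidth saving comes from transmitting the m measurements instead of the n gradient entries; the thesis's prior and GAMP solver then buy a better accuracy/compression trade-off than this sparsity-only baseline.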
