Global ETD Search

101	Product Defect Discovery and Summarization from Online User Reviews Zhang, Xuan 29 October 2018 Product defects concern various groups of people, such as customers, manufacturers, and government officials. Thus, defect-related knowledge and information are essential. In keeping with the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, considering the vast volume of online reviews, how to automatically identify critical product defects and summarize the related information from the huge number of user reviews is challenging, even when we target only the negative reviews. As a kind of opinion mining research, existing defect discovery methods mainly focus on how to classify the type of product issues, which is not enough for users. People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. These challenges cannot be solved by existing aspect-oriented opinion mining models, which seldom consider the defect entities mentioned above. Furthermore, users also want to better capture the semantics of review text, and to summarize product defects more accurately in the form of natural language sentences. However, existing text summarization models, including neural networks, can hardly generalize to user review summarization due to the lack of labeled data. In this research, we explore topic models and neural network models for product defect discovery and summarization from user reviews. Firstly, a generative Probabilistic Defect Model (PDM) is proposed, which models the generation process of user reviews from key defect entities including product Model, Component, Symptom, and Incident Date. 
Using the joint topics in these aspects, which are produced by PDM, people can discover defects which are represented by those entities. Secondly, we devise a Product Defect Latent Dirichlet Allocation (PDLDA) model, which describes how negative reviews are generated from defect elements like Component, Symptom, and Resolution. The interdependency between these entities is modeled by PDLDA as well. PDLDA answers not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, the problem of how to summarize user reviews more accurately, and better capture the semantics in them, is studied using deep neural networks, especially Hierarchical Encoder-Decoder Models. For each of the research topics, comprehensive evaluations are conducted to justify the effectiveness and accuracy of the proposed models, on heterogeneous datasets. Further, on the theoretical side, this research contributes to the research stream on product defect discovery, opinion mining, probabilistic graphical models, and deep neural network models. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials. / Ph. D. / Product defects concern various groups of people, such as customers, manufacturers, and government officials. Thus, defect-related knowledge and information are essential. In keeping with the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, considering the vast volume of online reviews, how to automatically identify critical product defects and summarize the related information from the huge number of user reviews is challenging, even when we target only the negative reviews. 
People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. Furthermore, users want product defects summarized more accurately in the form of natural language sentences. These requirements cannot be satisfied by existing methods, which seldom consider the defect entities mentioned above and can hardly generalize to user review summarization. In this research, we develop novel Machine Learning (ML) algorithms for product defect discovery and summarization. Firstly, we study how to identify product defects and their related attributes, such as Product Model, Component, Symptom, and Incident Date. Secondly, we devise a novel algorithm, which can discover product defects and the related Component, Symptom, and Resolution, from online user reviews. This method reveals not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, we address the problem of how to summarize user reviews in the form of natural language sentences using a paraphrase-style method. On the theoretical side, this research contributes to multiple research areas in Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials. Opinion Mining Product Defect Discovery Opinion Summarization Topic Model Deep learning (Machine learning)
102	Securing Cloud Containers through Intrusion Detection and Remediation Abed, Amr Sayed Omar 29 August 2017 Linux containers are gaining increasing traction in both individual and industrial use. As these containers get integrated into mission-critical systems, real-time detection of malicious cyber attacks becomes a critical operational requirement. However, little research has been conducted in this area. This research introduces an anomaly-based intrusion detection and remediation system for container-based clouds. The introduced system monitors system calls between the container and the host server to passively detect malfeasance against applications running in cloud containers. We started by applying a basic memory-based machine learning technique to model the container behavior. The same technique was also extended to learn the behavior of a distributed application running in a number of cloud-based containers. In addition to monitoring the behavior of each container independently, the system used prior knowledge for a more informed detection system. We then studied the feasibility and effectiveness of applying a more sophisticated deep learning technique to the same problem. We used a recurrent neural network to model the container behavior. We evaluated the system using a typical web application hosted in two containers, one for the front-end web server, and one for the back-end database server. The system has shown promising results for both of the machine learning techniques used. Finally, we describe a number of incident handling and remediation techniques to be applied upon attack detection. / Ph. D. Security in Cloud Computing Deep learning (Machine learning) Intrusion Detection Container Security Behavior Modeling Anomaly Detection
103	Detecting and Mitigating Rumors in Social Media Islam, Mohammad Raihanul 19 June 2020 The penetration of social media today enables the rapid spread of breaking news and other developments to millions of people across the globe within hours. However, such pervasive use of social media by the general masses to receive and consume news is not without its attendant negative consequences as it also opens opportunities for nefarious elements to spread rumors or misinformation. A rumor generally refers to an interesting piece of information that is widely disseminated through a social network and whose credibility cannot be easily substantiated. A rumor can later turn out to be true or false or remain unverified. The spread of misinformation and fake news can lead to deleterious effects on users and society. The objective of the proposed research is to develop a range of machine learning methods that will effectively detect and characterize rumor veracity in social media. Since users are the primary protagonists on social media, analyzing the characteristics of information spread w.r.t. users can be effective for our purpose. For our first problem, we propose a method of computing user embeddings from underlying social networks. For our second problem, we propose a long short-term memory (LSTM) based model that can classify whether a story discussed in a thread can be categorized as a false, true, or unverified rumor. We demonstrate the utility of user features computed from the first problem to address the second problem. For our third problem, we propose a method that uses user profile information to detect rumor veracity. This method has the advantage of not requiring the underlying social network, which can be tedious to compute. For the last problem, we investigate a rumor mitigation technique that recommends fact-checking URLs to rumor debunkers, i.e., social network users who are very passionate about disseminating true news. 
Here, we incorporate the influence of other users on rumor debunkers in addition to their previous URL sharing history to recommend relevant fact-checking URLs. / Doctor of Philosophy / A rumor is generally defined as an interesting piece of a story that cannot be authenticated easily. On social networks, a user can generally find an interesting piece of news or story and may share (retweet) it. A story that initially appears plausible can later turn out to be false or remain unverified. The propagation of false rumors on social networks has a detrimental effect on user experience. Therefore, rumor veracity detection is important and is drawing interest in social network research. In this thesis, we develop various machine learning models that detect rumor veracity. For this purpose, we exploit different types of information regarding users, such as profile details and connectivity with other users. Moreover, we propose a rumor mitigation technique that recommends fact-checking URLs to social network users who are passionate about debunking rumors. To solve this problem, we leverage techniques similar to those used by e-commerce sites for recommending products. Rumor Veracity Detection Rumor Mitigation Social Network Analysis Deep learning (Machine learning)
104	A Profit-Neutral Double-price-signal Retail Electricity Market Solution for Incentivizing Price-responsive DERs Considering Network Constraints Cai, Mengmeng 23 June 2020 Emerging technologies, including distributed energy resources (DERs), internet-of-things and advanced distribution management systems, are revolutionizing the power industry. They provide benefits like higher operation flexibility and lower bulk grid dependency, and are moving the modern power grid in a decentralized, interconnected and intelligent direction. Consequently, the emphasis of system operation management has shifted from the supply side to the demand side. This calls for a reconsideration of the business model for future retail market operators. To address this need, this dissertation proposes an innovative retail market solution tailored to market environments penetrated with price-responsive DERs. The work is presented from aspects of theoretical study, test-bed platform development, and experimental analysis, within which two topics relevant to the retail market operation are investigated in depth. The first topic covers the modeling of key retail market participants. With regard to price-insensitive participants, fixed loads are treated as the representative. Deep learning-based day-ahead load forecasting models are developed in this study, utilizing both recurrent and convolutional neural networks, to predict the part of demand that remains fixed regardless of the market price. With regard to price-sensitive participants, battery storage units are selected as the representative. An optimization-based battery arbitrage model is developed in this study to represent their behavior in response to a dynamic price. The second topic further investigates how the retail market model and pricing strategy should be designed to incentivize these market participants. 
Different from existing works, this study innovatively proposes a profit-neutral double-price-signal retail market model. Such a design differentiates elastic prosumers, who actively offer flexibilities to the system operation, from normal inelastic consumers/generators, based on their sensitivities to the market price. Two price signals, namely retail grid service price and retail energy price, are then introduced to separately quantify values of the flexibility, provided by elastic participants, and the electricity commodity, sold/bought to/from inelastic participants. Within the proposed retail market, a non-profit retail market operator (RMO) manages and settles the market through determining the price signals and supplementary subsidy to minimize the overall system cost. In response to the announced retail grid service price, elastic prosumers adjust their day-ahead operating schedules to maximize their payoffs. Given the interdependency between decisions made by the RMO and elastic participants, a retail pricing scheme, formulated based on a bi-level optimization framework, is proposed. Additional efforts are made on merging and linearizing the original non-convex bi-level problem into a single-level mixed-integer linear programming problem to ensure the computational efficiency of the retail pricing tool. Case studies are conducted on a modified IEEE 34-bus test-bed system, simulating both physical operations of the power grid and financial interactions inside the retail market. Experimental results demonstrate promising properties of the proposed retail market solution: First of all, it is able to provide cost-saving benefits to inelastic customers and create revenues for elastic customers at the same time, justifying the rationalities of these participants to join the market. 
Second, the addition of the grid service subsidy not only strengthens the profitability of the elastic customer, but also ensures that the benefit enjoyed per customer will not be compromised by the competition brought about by a growing number of participants. Furthermore, it is able to properly capture impacts from line losses and voltage constraints on the system efficiency and stability, so as to derive practical pricing solutions that respect the system operating rules. Last but not least, it encourages the technology improvement of elastic assets, as elastic assets in better condition are more profitable and yield larger savings on the electricity bills of inelastic customers. Above all, the superiority of the proposed retail market solution is proven. It can serve as a promising start for the retail electricity market reconstruction. / Doctor of Philosophy / The electricity market plays a critical role in ensuring the economic and secure operation of the power system. The progress made by distributed energy resources (DERs) has reshaped the modern power industry, bringing a larger proportion of price-responsive behaviors to the demand side. It challenges the traditional wholesale-only electricity market and calls for the addition of retail markets to better utilize distributed and elastic assets. Therefore, this dissertation aims to offer a reliable and computationally affordable retail market solution to bridge this knowledge gap. Different from existing works, this study assumes that the retail market is managed by a profit-neutral retail market operator (RMO), who oversees and facilitates the system operation to maximize the system efficiency rather than to make profits. Market participants are categorized into two groups: inelastic participants and elastic participants, based on their sensitivity to the market price. 
The motivation behind this design is that instead of treating elastic participants as normal customers, it is more reasonable to treat them as grid service providers who offer operational flexibilities that benefit the system efficiency. Correspondingly, a double-signal pricing scheme is proposed, such that the flexibility, provided by elastic participants, and the electricity commodity, generated/consumed by inelastic participants, are separately valued by two distinct prices, namely retail grid service price and retail energy price. A grid service subsidy is also introduced in the pricing system to provide supplementary incentives to elastic customers. These two price signals in addition to the subsidy are determined by the RMO via solving a bi-level optimization problem given the interdependency between the prices and reaction of elastic participants. Experimental results indicate that the proposed retail market model and pricing scheme are beneficial for both types of market participants, practical for the network-constrained real-world implementation, and supportive for the technology improvement of elastic assets. Retail Electricity Market Load Forecasting Battery Arbitrage Bi-level Optimization Deep learning (Machine learning)
105	A Deep Learning Approach to Predict Accident Occurrence Based on Traffic Dynamics Khaghani, Farnaz 05 1900 Traffic accidents are of concern for traffic safety; 1.25 million deaths are reported each year. Hence, it is crucial to have access to real-time data and rapidly detect or predict accidents. Predicting the occurrence of a highway car accident accurately any significant length of time into the future is not feasible since the vast majority of crashes occur due to unpredictable human negligence and/or error. However, rapid traffic incident detection could reduce incident-related congestion and secondary crashes, alleviate the waste of vehicles’ fuel and passengers’ time, and provide appropriate information for emergency response and field operation. While the focus of most previously proposed techniques is predicting the number of accidents in a certain region, the problem of predicting the accident occurrence or fast detection of the accident has been little studied. To address this gap, we propose a deep learning approach and build a deep neural network model based on long short-term memory (LSTM). We apply it to forecast the expected speed values on freeway links and identify the anomalies as potential accident occurrences. Several detailed features such as weather, traffic speed, and traffic flow of upstream and downstream points are extracted from big datasets. We assess the proposed approach on a traffic dataset from Sacramento, California. The experimental results demonstrate the potential of the proposed approach in identifying the anomalies in speed value and matching them with accidents in the same area. We show that this approach can achieve a high rate of rapid accident detection and be implemented in real-time travelers’ information or emergency management systems. / M.S. 
/ Rapid traffic accident detection/prediction is essential for scaling down non-recurrent congestion caused by traffic accidents, avoiding secondary accidents, and accelerating emergency system responses. In this study, we propose a framework that uses large-scale historical traffic speed and traffic flow data along with the relevant weather information to obtain robust traffic patterns. The predicted traffic patterns can be coupled with the real traffic data to detect anomalous behavior that often results in traffic incidents on the roadways. Our framework consists of two major steps. First, we estimate the speed values of traffic at each point based on the historical speed and flow values of locations before and after each point on the roadway. Second, we compare the estimated values with the actual ones and flag those that differ significantly as anomalies. The anomaly points are the potential points and times at which an accident occurs and causes a change in the normal behavior of the roadways. Our study shows the potential of the approach in detecting accidents, with promising performance in detecting the accident occurrence at a time close to the actual time of occurrence. Deep learning (Machine learning) LSTM Bi-directional LSTM Anomaly Detection Database management
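The two-step framework described in entry 105 (forecast the speed, then compare the forecast with the observed value) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the forecasts are assumed to come from some already-trained model, and the function name `flag_speed_anomalies`, the threshold `k`, and the median-absolute-deviation rule are all assumptions introduced here.

```python
import numpy as np

def flag_speed_anomalies(predicted, actual, k=3.0):
    """Return indices where observed speed departs strongly from the forecast.

    Hypothetical helper: the robust k-sigma rule below is an assumption of
    this sketch, not a rule stated in the abstract.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    residuals = actual - predicted

    # Robust centre and spread of the residuals, so that a single large
    # deviation does not inflate the scale and hide itself.
    centre = np.median(residuals)
    deviation = np.abs(residuals - centre)
    scale = 1.4826 * np.median(deviation)  # MAD scaled to a Gaussian std

    # Flag time steps whose deviation exceeds k robust standard deviations.
    return np.flatnonzero(deviation > k * scale).tolist()

# Made-up speeds (mph): the forecast is flat, one observation drops sharply.
forecast = [60] * 10
observed = [61, 59, 60, 62, 58, 60, 61, 25, 59, 60]
print(flag_speed_anomalies(forecast, observed))  # prints [7]
```

A median-based spread is used instead of the ordinary standard deviation because a single sharp speed drop would otherwise widen the threshold enough to mask itself.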
106	Modified Kernel Principal Component Analysis and Autoencoder Approaches to Unsupervised Anomaly Detection Merrill, Nicholas Swede 01 June 2020 Unsupervised anomaly detection is the task of identifying examples that differ from the normal or expected pattern without the use of labeled training data. Our research addresses shortcomings in two existing anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes novel solutions to improve the performance of both in the unsupervised setting. Anomaly detection has several useful applications, such as intrusion detection, fault monitoring, and vision processing. More specifically, anomaly detection can be used in autonomous driving to identify obscured signage or to monitor intersections. Kernel techniques are desirable because of their ability to model highly non-linear patterns, but they are limited in the unsupervised setting due to their sensitivity to parameter choices and the absence of a validation step. Additionally, conventional KPCA suffers from a quadratic time and memory complexity in the construction of the Gram matrix and a cubic time complexity in its eigendecomposition. The problem of tuning the Gaussian kernel parameter, $\sigma$, is solved using the mini-batch stochastic gradient descent (SGD) optimization of a loss function that maximizes the dispersion of the kernel matrix entries. Secondly, the computational time is greatly reduced, while still maintaining high accuracy, by using an ensemble of small, skeleton models and combining their scores. The performance of traditional machine learning approaches to anomaly detection plateaus as the volume and complexity of data increase. Deep anomaly detection (DAD) involves the application of multilayer artificial neural networks to identify anomalous examples. AEs are fundamental to most DAD approaches. 
Conventional AEs rely on the assumption that a trained network will learn to reconstruct normal examples better than anomalous ones. In practice, however, given sufficient capacity and training time, an AE will generalize to reconstruct even very rare examples. Three methods are introduced to more reliably train AEs for unsupervised anomaly detection: Cumulative Error Scoring (CES) leverages the entire history of training errors to minimize the importance of early stopping; Percentile Loss (PL) training aims to prevent anomalous examples from contributing to parameter updates; and early stopping via knee detection aims to limit the risk of overtraining. Ultimately, the two modified methods proposed in this research, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets. / Master of Science / Anomaly detection is the task of identifying examples that differ from the normal or expected pattern. The challenge of unsupervised anomaly detection is distinguishing normal and anomalous data without the use of labeled examples to demonstrate their differences. This thesis addresses shortcomings in two anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes new solutions to apply them in the unsupervised setting. Ultimately, the two modified methods, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets. Machine learning Deep learning (Machine learning) Anomaly Detection Autoencoder Kernel Principal Component Analysis
107	Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification Jude, Palakh Mignonne 07 July 2020 Great progress has been made to leverage the improvements made in natural language processing and machine learning to better mine data from journals, conference proceedings, and other digital library documents. However, these advances do not extend well to book-length documents such as electronic theses and dissertations (ETDs). ETDs contain extensive research data; stakeholders -- including researchers, librarians, students, and educators -- can benefit from increased access to this corpus. Challenges arise while working with this corpus owing to the varied nature of disciplines covered as well as the use of domain-specific language. Prior systems are not tuned to this corpus. This research aims to increase the accessibility of ETDs by the automatic classification of chapters of an ETD using machine learning and deep learning techniques. This work utilizes an ETD-centric target classification system. It demonstrates the use of custom trained word and document embeddings to generate better vector representations of this corpus. It also describes a methodology to leverage extractive summaries of chapters of an ETD to aid in the classification process. Our findings indicate that custom embeddings and the use of summarization techniques can increase the performance of the classifiers. The chapter-level labels generated by this research help to identify the level of interdisciplinarity in the corpus. The automatic classifiers can also be further used in a search engine interface that would help users to find the most appropriate chapters. / Master of Science / Electronic Theses and Dissertations (ETDs) are submitted by students at the end of their academic study. These works contain research information pertinent to a given field. 
Increasing the accessibility of such documents will be beneficial to many stakeholders including students, researchers, librarians, and educators. In recent years, a great deal of research has been conducted to better extract information from textual documents with the use of machine learning and natural language processing. However, these advances have not been applied to increase the accessibility of ETDs. This research aims to perform the automatic classification of chapters extracted from ETDs, which will reduce the human effort required to label the key parts of these book-length documents. Additionally, when considered by search engines, such categorization can help users more easily find the chapters that are most relevant to their research. Electronic Theses and Dissertations Classification Machine learning Deep learning (Machine learning) Natural Language Processing
108	Land Cover Quantification using Autoencoder based Unsupervised Deep Learning Manjunatha Bharadwaj, Sandhya 27 August 2020 This work aims to develop a deep learning model for land cover quantification through hyperspectral unmixing using an unsupervised autoencoder. Land cover identification and classification are instrumental in urban planning, environmental monitoring and land management. With the technological advancements in remote sensing, hyperspectral imagery, which captures high resolution images of the earth's surface across hundreds of wavelength bands, is becoming increasingly popular. The high spectral information in these images can be analyzed to identify the various target materials present in the image scene based on their unique reflectance patterns. An autoencoder is a deep learning model that can perform spectral unmixing by decomposing the complex image spectra into its constituent materials and estimating their abundance compositions. The advantage of using this technique for land cover quantification is that it is completely unsupervised and eliminates the need for labelled data, which generally requires years of field survey and formulation of detailed maps. We evaluate the performance of the autoencoder on various synthetic and real hyperspectral images consisting of different land covers using similarity metrics and abundance maps. The scalability of the technique with respect to landscapes is assessed by evaluating its performance on hyperspectral images spanning 100m x 100m, 200m x 200m, 1000m x 1000m, 4000m x 4000m and 5000m x 5000m regions. Finally, we analyze the performance of this technique by comparing it to several supervised learning methods like Support Vector Machine (SVM), Random Forest (RF) and multilayer perceptron using F1-score, Precision and Recall metrics and other unsupervised techniques like K-Means, N-Findr, and VCA using cosine similarity, mean square error and estimated abundances. 
The land cover classification obtained using this technique is compared to the existing United States National Land Cover Database (NLCD) classification standard. / Master of Science / This work aims to develop an automated deep learning model for identifying and estimating the composition of the different land covers in a region using hyperspectral remote sensing imagery. With the technological advancements in remote sensing, hyperspectral imagery, which captures high resolution images of the earth's surface across hundreds of wavelength bands, is becoming increasingly popular. As every surface has a unique reflectance pattern, the high spectral information contained in these images can be analyzed to identify the various target materials present in the image scene. An autoencoder is a deep learning model that can perform spectral unmixing by decomposing the complex image spectra into its constituent materials and estimating their percent compositions. The advantage of this method in land cover quantification is that it is an unsupervised technique that does not require labelled data, which generally requires years of field survey and formulation of detailed maps. The performance of this technique is evaluated on various synthetic and real hyperspectral datasets consisting of different land covers. We assess the scalability of the model by evaluating its performance on images of different sizes, spanning from a few hundred square meters to thousands of square meters. Finally, we compare the performance of the autoencoder based approach with other supervised and unsupervised deep learning techniques and with the current land cover classification standard. Deep learning (Machine learning) Autoencoder Land Cover Hyperspectral Imagery Spectral Unmixing Reflectance Spectra
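To make the idea of spectral unmixing in entry 108 concrete, the sketch below decomposes one pixel spectrum into material abundances. It deliberately swaps in plain least squares under a linear mixing assumption in place of the autoencoder the thesis uses, and it assumes the material spectra are already known. The names `unmix_pixel` and `cosine_similarity` and all numbers are hypothetical.

```python
import numpy as np

def unmix_pixel(endmembers, pixel):
    """Estimate material abundances for one pixel under a linear mixing model.

    endmembers: (bands, materials) array of reference spectra (assumed known).
    pixel:      (bands,) observed spectrum.
    """
    endmembers = np.asarray(endmembers, dtype=float)
    pixel = np.asarray(pixel, dtype=float)

    # Unconstrained least squares: pixel is approximated by endmembers @ abundances.
    abundances, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)

    # Crude projection onto valid abundances: no negatives, fractions sum to one.
    abundances = np.clip(abundances, 0.0, None)
    total = abundances.sum()
    return abundances / total if total > 0 else abundances

def cosine_similarity(a, b):
    """Cosine similarity between two spectra (one of the metrics the abstract names)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up reflectances: 4 wavelength bands, 2 materials.
E = np.array([[0.1, 0.8],
              [0.4, 0.6],
              [0.9, 0.2],
              [0.7, 0.3]])
true_abundances = np.array([0.3, 0.7])
pixel = E @ true_abundances           # noiseless synthetic mixture

estimated = unmix_pixel(E, pixel)
reconstructed = E @ estimated
print(estimated, cosine_similarity(reconstructed, pixel))
```

On noiseless synthetic data like this the estimate should match the true fractions closely. Real imagery adds noise and unknown material spectra, which is the setting the unsupervised approach in the abstract targets.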
109	Distributed Intelligence for Multi-Agent Systems in Search and Rescue Patnayak, Chinmaya 05 November 2020 Unfavorable environmental and/or human displacement may engender the need for Search and Rescue (SAR). Challenges such as inaccessibility, large search areas, and heavy reliance on available responder count, limited equipment and training make SAR a challenging problem. Additionally, SAR operations also pose significant risk to involved responders. This opens a remarkable opportunity for robotic systems to assist and augment human understanding of the harsh environments. A large body of work exists on the introduction of ground and aerial robots in visual and temporal inspection of search areas with varying levels of autonomy. Unfortunately, limited autonomy is the norm in such systems, due to the limitations presented by on-board UAV resources and networking capabilities. In this work we propose a new multi-agent approach to SAR and introduce a wearable compute cluster in the form factor of a backpack. The backpack allows offloading compute intensive tasks such as Lost Person Behavior Modelling, Path Planning and Deep Neural Network based computer vision applications away from the UAVs and offers significantly higher-performance computers to execute them. The backpack also provides a strong networking backbone and task orchestrators which allow for enhanced coordination and resource sharing among all the agents in the system. On the basis of our benchmarking experiments, we observe that the backpack can significantly boost capabilities and success in modern SAR responses. / Master of Science Distributed Computing Multi-Agent Systems Search and Rescue Unmanned Aerial Vehicles Deep learning (Machine learning) Inference
110	Representational Capabilities of Feed-forward and Sequential Neural Architectures Sanford, Clayton Hendrick January 2024 Despite the widespread empirical success of deep neural networks over the past decade, a comprehensive understanding of their mathematical properties remains elusive, which limits the abilities of practitioners to train neural networks in a principled manner. This dissertation provides a representational characterization of a variety of neural network architectures, including fully-connected feed-forward networks and sequential models like transformers. The representational capabilities of neural networks are most famously characterized by the universal approximation theorem, which states that sufficiently large neural networks can closely approximate any well-behaved target function. However, the universal approximation theorem applies exclusively to two-layer neural networks of unbounded size and fails to capture the comparative strengths and weaknesses of different architectures. The thesis addresses these limitations by quantifying the representational consequences of random features, weight regularization, and model depth on feed-forward architectures. It further investigates and contrasts the expressive powers of transformers and other sequential neural architectures. Taken together, these results apply a wide range of theoretical tools, including approximation theory, discrete dynamical systems, and communication complexity, to prove rigorous separations between different neural architectures and scaling regimes. Computer science Neural networks (Computer science) Deep learning (Machine learning) Computer networks--Scalability