• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Quantifying Information Leakage via Adversarial Loss Functions: Theory and Practice

January 2020 (has links)
abstract: Modern digital applications have significantly increased the leakage of private and sensitive personal data. While worst-case measures of leakage such as Differential Privacy (DP) provide the strongest guarantees, when utility matters, average-case information-theoretic measures can be more relevant. However, most such information-theoretic measures do not have clear operational meanings. This dissertation addresses this challenge. This work introduces a tunable leakage measure called maximal $\alpha$-leakage which quantifies the maximal gain of an adversary in inferring any function of a data set. The inferential capability of the adversary is modeled by a class of loss functions, namely, $\alpha$-loss. The choice of $\alpha$ determines specific adversarial actions ranging from refining a belief for $\alpha =1$ to guessing the best posterior for $\alpha = \infty$, and for the two specific values maximal $\alpha$-leakage simplifies to mutual information and maximal leakage, respectively. Maximal $\alpha$-leakage is proved to have a composition property and be robust to side information. There is a fundamental disjoint between theoretical measures of information leakages and their applications in practice. This issue is addressed in the second part of this dissertation by proposing a data-driven framework for learning Censored and Fair Universal Representations (CFUR) of data. This framework is formulated as a constrained minimax optimization of the expected $\alpha$-loss where the constraint ensures a measure of the usefulness of the representation. The performance of the CFUR framework with $\alpha=1$ is evaluated on publicly accessible data sets; it is shown that multiple sensitive features can be effectively censored to achieve group fairness via demographic parity while ensuring accuracy for several \textit{a priori} unknown downstream tasks. Finally, focusing on worst-case measures, novel information-theoretic tools are used to refine the existing relationship between two such measures, $(\epsilon,\delta)$-DP and R\'enyi-DP. Applying these tools to the moments accountant framework, one can track the privacy guarantee achieved by adding Gaussian noise to Stochastic Gradient Descent (SGD) algorithms. Relative to state-of-the-art, for the same privacy budget, this method allows about 100 more SGD rounds for training deep learning models. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
2

<b>EXPLORING ENSEMBLE MODELS AND GAN-BASED </b><b>APPROACHES FOR AUTOMATED DETECTION OF </b><b>MACHINE-GENERATED TEXT</b>

Surbhi Sharma (18437877) 29 April 2024 (has links)
<p dir="ltr">Automated detection of machine-generated text has become increasingly crucial in various fields such as cybersecurity, journalism, and content moderation due to the proliferation of generated content, including fake news, spam, and bot-generated comments. Traditional methods for detecting such content often rely on rule-based systems or supervised learning approaches, which may struggle to adapt to evolving generation techniques and sophisticated manipulations. In this thesis, we explore the use of ensemble models and Generative Adversarial Networks (GANs) for the automated detection of machine-generated text. </p><p dir="ltr">Ensemble models combine the strengths of different approaches, such as utilizing both rule-based systems and machine learning algorithms, to enhance detection accuracy and robustness. We investigate the integration of linguistic features, syntactic patterns, and semantic cues into machine learning pipelines, leveraging the power of Natural Language Processing (NLP) techniques. By combining multiple modalities of information, Ensemble models can effectively capture the subtle characteristics and nuances inherent in machine-generated text, improving detection performance. </p><p dir="ltr">In my latest experiments, I examined the performance of a Random Forest classifier trained on TF-IDF representations in combination with RoBERTa embeddings to calculate probabilities for machine-generated text detection. Test1 results showed promising accuracy rates, indicating the effectiveness of combining TF-IDF with RoBERTa probabilities. Test2 further validated these findings, demonstrating improved detection performance compared to standalone approaches.<br></p><p dir="ltr">These results suggest that leveraging Random Forest TF-IDF representation with RoBERTa embeddings to calculate probabilities can enhance the detection accuracy of machine-generated text.<br></p><p dir="ltr">Furthermore, we delve into the application of GAN-RoBERTa, a class of deep learning models comprising a generator and a discriminator trained adversarially, for generating and detecting machine-generated text. GANs have demonstrated remarkable capabilities in generating realistic text, making them a potential tool for adversaries to produce deceptive content. However, this same adversarial nature can be harnessed for detection purposes,<br>where the discriminator is trained to distinguish between genuine and machine-generated text.<br></p><p dir="ltr">Overall, our findings suggest that the use of Ensemble models and GAN-RoBERTa architectures holds significant promise for the automated detection of machine-generated text. Through a combination of diverse approaches and adversarial training techniques, we have demonstrated improved detection accuracy and robustness, thereby addressing the challenges posed by the proliferation of generated content across various domains. Further research and refinement of these approaches will be essential to stay ahead of evolving generation techniques and ensure the integrity and trustworthiness of textual content in the digital landscape.</p>

Page generated in 0.0603 seconds