About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. It is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want it to be added, details can be found on the NDLTD website.
1

Data-driven Algorithms for Critical Detection Problems: From Healthcare to Cybersecurity Defenses

Song, Wenjia. 16 January 2025
Machine learning and data-driven approaches have been widely applied to critical detection problems, but their performance is often hindered by data-related challenges. Through studies in healthcare and cybersecurity, this dissertation addresses three key challenges: data imbalance, scarcity of high-quality labels, and excessive data-processing requirements. In healthcare, imbalanced clinical datasets lead to performance disparities across prediction classes and demographic groups. We systematically evaluate these disparities and propose a Double Prioritized (DP) bias correction method that significantly improves model performance for underrepresented groups and reduces bias. In cybersecurity, threats such as ransomware and advanced persistent threats (APTs) have grown in recent years. Existing ransomware defenses often rely on black-box models trained on unverified traces, which offer limited interpretability. To address the scarcity of reliably labeled training data, we experimentally profile the runtime behavior of real-world ransomware samples and identify core patterns, enabling explainable and trustworthy detection. For APT detection, the large size of system audit logs hinders real-time detection. We introduce Madeline, a lightweight system that efficiently processes voluminous logs using compact representations, overcoming this real-time detection bottleneck. These contributions provide deployable and effective solutions and offer insights for future research within and beyond healthcare and cybersecurity.

Doctor of Philosophy

Machine learning and data-driven methods have been widely used to solve important detection problems, but their effectiveness is often limited by challenges related to the data they rely on. This dissertation focuses on three key challenges: imbalanced data, a lack of high-quality information, and the need to process large amounts of data quickly. We address these issues through studies in healthcare and cybersecurity. Data from clinical studies is often imbalanced, with certain patient groups or outcomes underrepresented. This imbalance leads to inconsistent prediction accuracy across groups. We address it by developing a method called Double Prioritized (DP) bias correction, which significantly improves accuracy for minority groups and reduces bias. Cyber threats pose increasingly serious risks. One prevalent type of malware is ransomware, which encrypts the victim's data and demands payment for its recovery. Current ransomware defenses often learn from unverified data and make decisions without clear explanations. To improve on this, we analyze how real-world ransomware behaves and identify patterns that allow more explainable and reliable detection. Another type of threat is the advanced persistent threat (APT), which aims to remain undetected in the victim's system for a long time while gradually exfiltrating data. For APT detection, the challenge lies in analyzing the vast amount of activity data the system generates, which slows down detection. We introduce Madeline, a system designed to process large logs efficiently, enabling fast and accurate threat detection. These contributions provide practical solutions to pressing problems in healthcare and cybersecurity and offer ideas for future improvements within and beyond these fields.
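The abstract does not describe how the DP bias correction works, but the disparity evaluation it starts from can be illustrated. The following is a minimal Python sketch, not the dissertation's method. It assumes binary labels (1 = positive outcome) and a single categorical group attribute, then computes recall separately for each group along with the largest gap between groups. The function names `per_group_recall` and `recall_gap` and the toy data are hypothetical.

```python
from collections import defaultdict


def per_group_recall(y_true, y_pred, groups):
    """Recall of the positive class (label 1), computed separately per group.

    Groups with no positive examples are omitted because recall is undefined there.
    """
    true_pos = defaultdict(int)    # correctly predicted positives per group
    actual_pos = defaultdict(int)  # all actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def recall_gap(recalls):
    """Largest recall difference between any two groups (0.0 means parity)."""
    return max(recalls.values()) - min(recalls.values())


if __name__ == "__main__":
    # Hypothetical toy data: group "B" is both smaller and served worse.
    y_true = [1, 1, 1, 1, 0, 1, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
    groups = ["A", "A", "A", "A", "A", "B", "B", "B"]
    recalls = per_group_recall(y_true, y_pred, groups)
    print(recalls)              # {'A': 0.75, 'B': 0.5}
    print(recall_gap(recalls))  # 0.25
```

A bias correction method would be judged by whether it shrinks such a gap without unduly lowering overall performance. Recall is used here only as an example metric.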
2

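Faktorer som påverkar ett rättvist beslutsfattande : En undersökning av begränsningar och möjligheter inom datainsamling för maskininlärning (Factors Affecting Fair Decision-Making: An Investigation of Limitations and Opportunities in Data Collection for Machine Learning)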

Westerberg, Erik. January 2023
Artificial intelligence (AI) is widely acknowledged to have a transformative impact on various industries. However, the technology is not without limitations. One is that machine learning systems can reinforce human biases; after all, these systems rely on data generated by humans. To address this issue, the European Union (EU) is implementing regulations governing the development of AI systems, both to promote ethical decision-making and to curb market oligopolies. Fair decision-making depends on high-quality data, so a model's performance is closely tied to data quality, which encompasses breadth, accurate annotation, and relevance. Previous research highlights the lack of processes and methods to guide the effort of ensuring high-quality training data. In response, this study investigates the limitations and opportunities that bear on data quality within the data collection research field. To that end, it poses the research question: What factors constrain and enable the creation of a high-quality dataset in the context of AI fairness? The study uses semi-structured interviews with industry experts, allowing them to describe their personal experiences and the challenges they have encountered. It identifies multiple factors that restrict the ability to create a high-quality dataset and, ultimately, a fair decision-making system. It also identifies several opportunities for achieving high-quality data that methods from the research field offer.
3

Enhancing Fairness in Facial Recognition: Balancing Datasets and Leveraging AI-Generated Imagery for Bias Mitigation : A Study on Mitigating Ethnic and Gender Bias in Public Surveillance Systems

Abbas, Rashad; Tesfagiorgish, William Issac. January 2024
Facial recognition technology has become a ubiquitous tool in security and personal identification. However, its rise has been accompanied by concerns over inherent biases, particularly regarding ethnicity and gender. This thesis examines the extent of these biases by focusing on the influence of dataset imbalances on facial recognition algorithms. We employ a structured methodological approach that integrates AI-generated images to enhance dataset diversity, with the intent of balancing representation across ethnicities and genders. Using ResNet and VGG models, we conduct a series of controlled experiments comparing the performance impact of balanced versus imbalanced datasets. Our analysis uses confusion matrices together with accuracy, precision, recall, and F1-score metrics to critically assess the models' performance. The results demonstrate how tailored augmentation of training datasets can mitigate bias, leading to more equitable outcomes in facial recognition technology. We present our findings with the aim of contributing to the ongoing dialogue on AI fairness and propose a framework for future research in the field.
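The accuracy, precision, recall, and F1-score metrics listed above can all be derived from a confusion matrix. Below is a minimal Python sketch. It assumes a square matrix whose rows are true classes and whose columns are predicted classes; the thesis does not specify its layout or tooling. The function name `metrics_from_confusion` is hypothetical.

```python
def metrics_from_confusion(cm):
    """Overall accuracy plus per-class precision, recall, and F1.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Classes never predicted (or never present) get 0.0 instead of dividing by zero.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))  # diagonal = correct predictions
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        actual_k = sum(cm[k])                          # row sum: truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return correct / total, per_class


if __name__ == "__main__":
    # Hypothetical two-class example: 20 samples, 17 classified correctly.
    accuracy, per_class = metrics_from_confusion([[8, 2], [1, 9]])
    print(accuracy)  # 0.85
    for k, m in enumerate(per_class):
        print(k, m)
```

To compare balanced and imbalanced training sets in the spirit of the thesis, one would build a separate confusion matrix for each ethnicity and gender subgroup and compare the resulting metrics. That grouping step is left out here.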
