Global ETD Search

81	Understanding the Impact of Cloud-Based Shadow IT on Employee and IT-Manager Perceptions in the Swedish Tech Industry
Fager, Adam. January 2023.
This study focuses on the impact of Cloud-Based Shadow IT on data privacy in the tech sector of Sweden. It explores the use of unapproved applications by employees without the knowledge and control of the IT department. The objective is to understand how Cloud-Based Shadow IT affects employees' compliance with cloud services and to examine IT managers' understanding of this phenomenon. The research problem addresses the challenges of ensuring regulatory compliance and effective use of cloud technology. By identifying the strengths, weaknesses, opportunities, and risks associated with Cloud-Based Shadow IT, the study aims to provide insights that help companies and IT managers make informed decisions. It explores the relationship between Shadow IT and cloud services and investigates employees' and IT managers' adherence to and understanding of these issues. The findings indicate that employees have varying levels of understanding, with limited knowledge of approved cloud services. Managers prioritize security concerns, including data compliance and ownership, but lack strategies to address knowledge gaps. The use of Cloud-Based Shadow IT has both positive and negative consequences: increased productivity and collaboration, but also data loss and non-compliance risks. Factors such as education and awareness of security risks are important for employees to understand and comply with policies. Overall, the study highlights the need for continuous education and awareness programs to improve understanding and decision-making regarding cloud services and Shadow IT.
Keywords: Cloud-Based Shadow IT; Shadow IT; Swedish Tech Sector; Data Privacy; Managers; Employees; End User Computation; Workaround; Personal IT; Third-Party Access; Information Systems, Social aspects
82	Preventing Health Data from Leaking in a Machine Learning System : Implementing code analysis with LLM and model privacy evaluation testing / Förhindra att Hälsodata Läcker ut i ett Maskininlärnings System : Implementering av kod analys med stor språk-modell och modell integritets testning
Janryd, Balder; Johansson, Tim. January 2024.
Sensitive data leaking from a system can have tremendous negative consequences, such as discrimination, social stigma, and fraudulent economic consequences for those whose data has been leaked. It is therefore of utmost importance that sensitive data is not leaked from a system. This thesis investigated methods to prevent sensitive patient data from leaking in a machine learning system. Various methods were investigated and evaluated based on previous research; the methods used in this thesis are a large language model (LLM) for code analysis and a membership inference attack on models to test their privacy level. The LLM code analysis results show that the Llama 3 model (an LLM) had an accuracy of 90% in identifying malicious code that attempts to steal sensitive patient data. The model analysis can evaluate and determine membership inference of sensitive patient data used for training machine learning models, which is essential for determining the data leakage risk that a machine learning model can pose in machine learning systems. Further work on increasing the determinism and consistent formatting of the LLM's responses is needed to ensure the robustness of a security system that uses LLMs before it can be deployed in a production environment. Further studies of the model analysis can apply a wider variety of evaluations, such as a larger set of machine learning model types and a broader range of attack tests, which can be implemented in machine learning systems.
Keywords: Sensitive Data; Machine Learning (ML); Large Language Model (LLM); Code Analysis; Llama 3; Data Privacy; Membership Inference Attack (MIA); Computer Sciences
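Entry 82 evaluates model privacy with a membership inference attack but the abstract does not say which attack variant was used. As a rough illustration only, the sketch below shows the simplest confidence-threshold baseline: a record is guessed to be a training member when the model's confidence on its true label is at or above a threshold. The function names, the threshold, and the confidence values are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def threshold_attack(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' (True) when the model's confidence on the true label
    is at or above the threshold. Hypothetical baseline, not the thesis's method."""
    return [c >= threshold for c in confidences]


def attack_rates(
    member_conf: Sequence[float],
    nonmember_conf: Sequence[float],
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (true positive rate, false positive rate, advantage = TPR - FPR)."""
    if not member_conf or not nonmember_conf:
        raise ValueError("both confidence lists must be non-empty")
    # True counts as 1 when summed, so the sums are the numbers of 'member' guesses.
    tpr = sum(threshold_attack(member_conf, threshold)) / len(member_conf)
    fpr = sum(threshold_attack(nonmember_conf, threshold)) / len(nonmember_conf)
    return tpr, fpr, tpr - fpr


if __name__ == "__main__":
    # Made-up confidences: records seen in training vs. held-out records.
    members = [0.99, 0.95, 0.90]
    nonmembers = [0.60, 0.70, 0.92]
    tpr, fpr, advantage = attack_rates(members, nonmembers, threshold=0.85)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f} advantage={advantage:.2f}")
```

An advantage near zero would suggest the attacker cannot tell members from non-members at that threshold; a larger value points to more leakage about the training set.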
83	[en] A NEW LAYERED APPROACH TO BIOLOGICAL DATA REPRESENTATION AND ITS APPLICATIONS COMPARING SEQUENCES / [pt] UMA NOVA ABORDAGEM EM CAMADAS PARA REPRESENTAÇÃO DE DADOS BIOLÓGICOS E SUAS APLICAÇÕES EM COMPARAÇÃO DE SEQUÊNCIAS
DIOGO MUNARO VIEIRA. 09 December 2024.
The identification and categorization of homologous proteins are fundamental tasks in the field of biology, relying on tools that analyze nucleotide or amino acid sequences. However, automated detection of evolutionary patterns and additional attributes using traditional methods still presents research challenges.
In this study, we propose a novel layered data representation approach that allows us to explore evolutionary patterns and other sequence features in similarity searching, classification, and clustering. It employs an alignment-free process, and we introduce new similarity algorithms to enhance the effectiveness of this approach. These algorithms leverage techniques inspired by human perception to capture subtle similarities within representations of biological molecules. Experimental evaluations demonstrate good performance and high accuracy compared to previously proposed approaches. This layered representation shows promise in identifying similar proteins, especially those with the characteristics of distant homologs. Furthermore, it also suggests the development of new methods and machine learning (ML) algorithms in bioinformatics that address the privacy and security of biological data.
Keywords: COMPUTER VISION; FEATURES REPRESENTATION; HOMOLOGOUS PROTEINS; MACHINE LEARNING; DATA PRIVACY; MOLECULAR BIOLOGY; DATA MODELING
84	Lite-Agro: Integrating Federated Learning and TinyML on IoAT-Edge for Plant Disease Classification
Dockendorf, Catherine. April 05 1900.
Lite-Agro studies applications of TinyML in pear (Pyrus communis) tree disease identification and explores hardware implementations with an ESP32 microcontroller. The study uses the DiaMOS Pear Dataset to learn, through image analysis, whether a leaf is healthy or not, and classifies it into one of four categories: curl, healthy, spot, or slug. The system is designed as a low-cost, light-duty edge computing solution for detection, and the study compares models such as InceptionV3, XceptionV3, EfficientNetB0, and MobileNetV2. This work also researches integration with federated learning frameworks and provides an introduction to federated averaging algorithms.
Subjects: Computer Science; Engineering, System Science; Agriculture, Plant Pathology
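Entry 84 mentions federated averaging without describing it. As a minimal sketch of the aggregation step only, assuming each client reports its locally trained parameters together with its number of training samples, the server can take a sample-weighted mean. The names `federated_average` and `Weights`, the parameter layout, and the example values are illustrative assumptions, not details from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Sample-weighted mean of client parameters (the server-side averaging step).

    Each update is (weights, number_of_local_samples). All clients are assumed
    to share the same parameter names and shapes.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = client_updates[0]
    averaged: Weights = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n in client_updates:
        share = n / total_samples  # clients with more data contribute more
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                averaged[name][i] += share * value
    return averaged


if __name__ == "__main__":
    # Two made-up edge devices with 1 and 3 local samples respectively.
    updates = [
        ({"w": [1.0, 2.0]}, 1),
        ({"w": [3.0, 4.0]}, 3),
    ]
    print(federated_average(updates))
```

In a full training loop this step would alternate with local training on each device; only the averaging is shown here.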
85	A Mixed-Method Study on Barriers to the Publication of Research Data in Learning Analytics
Biernacka, Katarzyna. 07 November 2024.
This study investigates barriers to research data publication in Learning Analytics (LA) through a mixed-method approach encompassing a Systematic Literature Review (SLR), semi-structured interviews, a global survey, and adaptive workshops.
The SLR establishes a foundation by identifying legal, ethical, and resource-related barriers to data publication across disciplines. Findings from the SLR are integrated into the subsequent interviews, which reveal cultural and institutional nuances affecting researchers' motivations and capabilities for data sharing. A global survey uncovers a discrepancy between researchers' willingness to share data and their perceived benefits from accessing others' data, highlighting trust issues within the scientific community despite growing support for open data. Adaptive workshops underscore the gap between researchers' recognition of the importance of data sharing and their practical ability to implement it, with data protection concerns, particularly related to GDPR compliance, emerging as major barriers alongside fears of losing control over data. The findings illustrate how barriers to data publication vary by discipline and region and are deeply embedded within cultural and institutional frameworks.
Keywords: Research Data; Data Publication; Research Data Management; Learning Analytics; Open Data; GDPR; Data Privacy
AN 73800 DF 2000 ST 670 ddc:000
86	Privacy preserving software engineering for data driven development
Tongay, Karan Naresh. 14 December 2020.
The exponential rise in the generation of data has introduced many new areas of research, including data science, data engineering, machine learning, and artificial intelligence, to name a few. It has become important for any industry or organization to precisely understand and analyze data in order to extract value from it. The value of data can only be realized when it is put into practice in the real world, and the most common approach to doing this in the technology industry is through software engineering. This brings into the picture the area of privacy-oriented software engineering, and with it the rise of data protection regulations such as GDPR (General Data Protection Regulation) and PDPA (Personal Data Protection Act). Many organizations, governments, and companies that have accumulated huge amounts of data over time may conveniently use it to increase business value, but the privacy aspects tied to the sensitivity of the data, especially people's personal information, can easily be circumvented when designing a software engineering model for these types of applications. Even before the software engineering phase of any data processing application, there are often one or more data sharing agreements or privacy policies in place. Every organization may have its own way of maintaining data privacy practices for data-driven development. There is a need to generalize or categorize these approaches into tactics that other practitioners can refer to when integrating data privacy practices into their development.
This qualitative study provides an understanding of various approaches and tactics that are being practised within the industry for privacy preserving data science in software engineering, and discusses a tool for data usage monitoring to identify unethical data access. Finally, we studied strategies for secure data publishing and conducted experiments using sample data to demonstrate how these techniques can be helpful for securing private data before publishing.
Graduate
Keywords: Data Privacy; Privacy; Data Engineering; Software Engineering; Data Driven Developers; Data Science; Privacy Preserving; Data Driven Development; Machine Learning; One class SVM; Data Usage Monitoring; Health data; k-anonymity; l-diversity; differential privacy; Information management; Secure data sharing; Survey; Audits and access control; Data Privacy Tactics
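Entry 86 lists k-anonymity and l-diversity among its secure data publishing techniques but gives no detail on how they were applied. As a hedged illustration of what these two properties measure, the sketch below computes both levels for a small table whose quasi-identifiers have already been generalized. The column names, the sample rows, and the function names are hypothetical and are not drawn from the thesis.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Record = Dict[str, str]


def k_anonymity_level(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    A table is k-anonymous when this value is at least k.
    """
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def l_diversity_level(
    records: List[Record], quasi_identifiers: Sequence[str], sensitive: str
) -> int:
    """Smallest number of distinct sensitive values within any quasi-identifier group
    (the 'distinct l-diversity' variant)."""
    if not records:
        return 0
    values_per_group = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values_per_group[key].add(r[sensitive])
    return min(len(values) for values in values_per_group.values())


if __name__ == "__main__":
    # Made-up rows with generalized age ranges and truncated postal codes.
    table = [
        {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
        {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
        {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
        {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    ]
    print("k =", k_anonymity_level(table, ["age", "zip"]))
    print("l =", l_diversity_level(table, ["age", "zip"], "diagnosis"))
```

In this made-up table every group has two rows, yet one group carries a single diagnosis, which is the kind of gap l-diversity is meant to expose on top of k-anonymity.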
87	Measuring the Utility of Synthetic Data : An Empirical Evaluation of Population Fidelity Measures as Indicators of Synthetic Data Utility in Classification Tasks / Mätning av Användbarheten hos Syntetiska Data : En Empirisk Utvärdering av Population Fidelity mätvärden som Indikatorer på Syntetiska Datas Användbarhet i Klassifikationsuppgifter
Florean, Alexander. January 2024.
In the era of data-driven decision-making and innovation, synthetic data serves as a promising tool that bridges the need for vast datasets in machine learning (ML) and the imperative of data privacy. By simulating real-world data while preserving privacy, synthetic data generators have become increasingly prevalent instruments in AI and ML development. A key challenge with synthetic data lies in accurately estimating its utility. For this purpose, Population Fidelity (PF) measures, a category of metrics that evaluates how well the synthetic data mimics the general distribution of the original data, have been shown to be good candidates. In this setting, we aim to answer: "How well are different population fidelity measures able to indicate the utility of synthetic data for machine learning based classification models?" We designed a reusable six-step experiment framework to examine the correlation between nine PF measures and the performance of four ML classification models over five datasets. The six-step approach includes data preparation, training, testing on original and synthetic datasets, and PF measures computation. The study reveals non-linear relationships between the PF measures and synthetic data utility. The general analysis, meaning the monotonic relationship between the PF measure and performance over all models, yielded at most moderate correlations, where the Cluster measure showed the strongest correlation. In the more granular model-specific analysis, Random Forest showed strong correlations with three PF measures.
The findings show that no PF measure correlates consistently highly enough across all models to be considered a universal estimator of model performance. This highlights the importance of context-aware application of PF measures and sets the stage for future research to expand the scope, including support for a wider range of data types and the integration of privacy evaluations into synthetic data assessment. Ultimately, this study contributes to the effective and reliable use of synthetic data, particularly in sensitive fields where data quality is vital.
Keywords: Synthetic Data; Machine Learning; Population Fidelity Measures; Utility Metrics; Synthetic Data Quality Evaluation; Classification Algorithms; Utility Estimation; Data Privacy; Artificial Intelligence; Experiment Framework; Model Performance Assessment; Computer Sciences
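Entry 87 reports monotonic (not necessarily linear) relationships between PF measures and model performance. The abstract does not name the correlation coefficient used; Spearman's rank correlation is one common choice for monotonic association and is assumed here purely for illustration. The sketch computes it from scratch with average ranks for ties; the function names and the example scores are made up.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up numbers: a PF score per synthetic dataset and the accuracy of a
    # classifier trained on that dataset.
    pf_scores = [0.10, 0.20, 0.30, 0.40]
    accuracies = [0.52, 0.61, 0.60, 0.75]
    print(f"rho = {spearman(pf_scores, accuracies):.2f}")
```

Because it works on ranks, the coefficient stays at 1 for any strictly increasing relationship, which is why it suits the non-linear patterns the abstract describes.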
88	Improving Deep Learning-based Object Detection Algorithms for Omnidirectional Images by Simulated Data
Scheck, Tobias. 08 August 2024.
Perception, primarily through vision, is a vital human ability that informs decision-making and interactions with the world. Computer Vision, the field dedicated to emulating this human capability in computers, has witnessed transformative progress with the advent of artificial intelligence, particularly neural networks and deep learning. These technologies enable automatic feature learning, eliminating the need for laborious hand-crafted features. The increasing global demand for artificial intelligence applications across various industries, however, raises concerns about data privacy and access. This dissertation addresses these challenges by proposing solutions that leverage synthetic data to preserve privacy and enhance the robustness of computer vision algorithms. The primary objective of this dissertation is to reduce the dependence on real data for modern image processing algorithms by utilizing synthetic data generated through computer simulations. Synthetic data serves as a privacy-preserving alternative, enabling the generation of data in scenarios that are difficult or unsafe to replicate in the real world. While purely simulated data falls short of capturing the full complexity of reality, the dissertation explores methods to bridge the gap between synthetic and real data. The dissertation encompasses a comprehensive evaluation of the synthetic THEODORE dataset, focusing on object detection using Convolutional Neural Networks. Fine-tuning CNN architectures with synthetic data demonstrates remarkable performance improvements over relying solely on real-world data. Extending beyond person recognition, these architectures exhibit the ability to recognize various objects in real-world settings.
This work also investigates real-time performance and the impact of barrel distortion in omnidirectional images, underlining the potential of using synthetic data. Furthermore, the dissertation introduces two unsupervised domain adaptation methods tailored for anchorless object detection within the CenterNet architecture. The methods effectively reduce the domain gap when synthetic omnidirectional images serve as the source domain and real images act as the target domain. Qualitative assessments highlight the advantages of these methods in reducing noise and enhancing detection accuracy. The dissertation concludes by creating an application within the Ambient Assisted Living context to realize the concepts. This application encompasses indoor localization heatmaps, human pose estimation, and activity recognition. The methodology leverages synthetically generated data, unique object identifiers, and rotated bounding boxes to enhance tracking in omnidirectional images. Importantly, the system is designed to operate without compromising privacy or using sensitive images, aligning with the growing concerns about data privacy and access in artificial intelligence applications.
ddc:621.3
