101

Image-Text context relation using Machine Learning: Research on performance of different datasets

Sun, Yuqi January 2022 (has links)
Building on progress in the Computer Vision and Natural Language Processing fields, Vision-Language (VL) models are designed to process information from both images and texts. This thesis focuses on the performance of one such model, Oscar, across different datasets. Oscar is a state-of-the-art VL representation learning model built on a pre-trained object detection model and a pre-trained BERT model. By comparing performance across datasets, we can understand the relationship between the properties of datasets and the performance of models, and the conclusions can guide future work on VL datasets and models. In this thesis, I collected five VL datasets that differ from each other in at least one main respect and generated eight subsets from them. I trained the same model on the different subsets to classify whether an image is related to a text. Intuitively, clean datasets perform better because their images depict everyday scenes and are annotated by humans; for the same reason, the size of clean datasets is always limited. However, an interesting finding of the thesis is that a dataset generated by models trained on other datasets achieved performance as good as the clean datasets, which should encourage research on models for data collection. The experimental results also indicate that future work on VL models could focus on improving feature extraction from images, as the images have a great influence on the performance of VL models.
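As a toy illustration of the classification task the thesis describes (deciding whether an image is related to a text), the following PyTorch sketch trains a binary classifier over pre-extracted image and text features. The feature dimensions and the two-layer head are illustrative assumptions; Oscar itself fuses the object detector's region features and BERT inside a single transformer rather than a small MLP.

```python
# Minimal sketch of the image-text relation task: a binary classifier over
# pre-extracted features. The feature extractors (an object detector and
# BERT, as in Oscar) are abstracted away as random tensors; dimensions and
# architecture here are illustrative assumptions, not Oscar's configuration.
import torch
import torch.nn as nn

class ImageTextRelationClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: image and text are related or not
        )

    def forward(self, img_feat, txt_feat):
        return self.mlp(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)

# Stand-ins for detector region features (pooled) and BERT [CLS] vectors.
img_feat = torch.randn(32, 2048)
txt_feat = torch.randn(32, 768)
labels = torch.randint(0, 2, (32,)).float()  # 1 = related, 0 = unrelated

model = ImageTextRelationClassifier()
loss = nn.BCEWithLogitsLoss()(model(img_feat, txt_feat), labels)
loss.backward()  # gradients for one training step
print(f"loss: {loss.item():.4f}")
```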
102

Predicting Customer Satisfaction in the Context of Last-Mile Delivery using Supervised and Automatic Machine Learning

Höggren, Carl January 2022 (has links)
The prevalence of online shopping has risen steadily in the last few years. In response to these changes, last-mile delivery services have emerged that enable goods to reach customers within a shorter timeframe than traditional logistics providers. However, with decreased lead times comes greater exposure to risks that directly influence customer satisfaction. More specifically, this report investigates the extent to which supervised and automatic machine learning can be leveraged to extract the features with the highest explanatory power for customer ratings. The implementation suggests that a Random Forest classifier outperforms both a Multi-Layer Perceptron and a Support Vector Machine in predicting customer ratings on a highly imbalanced version of the dataset, while AutoML performs best when the dataset is undersampled. Using Permutation Feature Importance and Shapley Additive Explanations, it was further concluded that whether the delivery is on time, whether it is executed within the stated time window, and whether it is executed during the morning, afternoon, or evening are the paramount drivers of customer ratings.
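A minimal sketch of the modelling pipeline the report describes: a Random Forest on an imbalanced dataset, followed by Permutation Feature Importance to rank features by explanatory power. The synthetic data and feature names are stand-ins for the delivery dataset, which is not public.

```python
# Random Forest + Permutation Feature Importance on imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the imbalanced ratings data (10% "low rating").
X, y = make_classification(n_samples=5000, n_features=6, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" is one common answer to class imbalance; the
# thesis instead compares the raw imbalanced data against undersampling.
model = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                               random_state=0).fit(X_tr, y_tr)

# Permutation Feature Importance: the drop in validation score when one
# feature's values are shuffled, i.e. that feature's explanatory power.
result = permutation_importance(model, X_val, y_val, n_repeats=10,
                                random_state=0)
feature_names = ["on_time", "within_window", "morning", "afternoon",
                 "evening", "distance_km"]  # hypothetical names
for name, imp in sorted(zip(feature_names, result.importances_mean),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.4f}")
```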
103

Méthodes d'évaluation en extraction d'information ouverte [Evaluation methods in open information extraction]

Lamarche, Fabrice 08 1900 (has links)
Open Information Extraction (OIE) is a field of natural language processing whose aim is to present the information contained in a text in a regular format that allows that information to be organized, analyzed and reflected upon. Numerous OIE systems exist, claiming ever-increasing levels of performance. In order to establish their performance and compare them, it is necessary to use a benchmark. These have also evolved over time, and are intended to be precise and objective, making it possible to identify the best-performing systems. In this thesis, we identify some of the limitations of current evaluation methods and propose a new benchmark to remedy them. This new benchmark comprises two main components: a manual annotation of candidate sentences and a function to establish syntactic concordance between different extracted and annotated facts. In addition, we propose new guidelines to frame and better define the open information extraction task itself, enabling us to better quantify and measure the amount of relevant information extracted by OIE systems. Our experiments show that our benchmark follows these guidelines more closely than previous benchmarks, is better at judging the match between extracted and annotated facts, and is more flexible than the current state-of-the-art benchmarks. Our new benchmark allows us to draw some interesting conclusions about the actual performance of open information extraction systems. We show that the latest systems are not necessarily the best.
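To make the matching component concrete, here is a hedged sketch of the kind of concordance scoring such a benchmark performs between extracted and annotated facts. Token-level F1 per slot is a deliberate simplification; the thesis's syntactic concordance function is its own contribution and is not reproduced here.

```python
# Score each extracted (subject, relation, object) fact against a gold
# annotation using token-overlap F1 per slot, averaged over the triple.
def slot_f1(pred: str, gold: str) -> float:
    p, g = pred.lower().split(), gold.lower().split()
    overlap = len(set(p) & set(g))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def fact_score(pred_fact, gold_fact):
    # Average the per-slot scores over subject, relation, object.
    return sum(slot_f1(p, g) for p, g in zip(pred_fact, gold_fact)) / 3

gold = ("Marie Curie", "was awarded", "the Nobel Prize in Physics")
pred = ("Curie", "was awarded", "the Nobel Prize")
print(f"concordance: {fact_score(pred, gold):.2f}")
```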
104

A game engine based application for visualising and analysing environmental spatiotemporal mobile sensor data in an urban context

Helbig, Carolin, Becker, Anna Maria, Masson, Torsten, Mohamdeen, Abdelrhman, Sen, Özgür Ozan, Schlink, Uwe 14 March 2024 (has links)
Climate change and the high proportion of private motorised transport lead to high exposure of the urban population to environmental stressors such as particulate matter, nitrogen oxides, noise, and heat. The few fixed measuring stations for these stressors do not provide information on how they are distributed throughout the urban area or on what influence the local urban structure has on hot and cold spots of pollution. In the measurement campaign “UmweltTracker”, with 95 participants (cyclists, pedestrians), data on the stressors were collected via mobile sensors. The aim was to design and implement an application to analyse these heterogeneous data sets. In this paper we present a prototype of a visualisation and analysis application based on the Unity Game Engine, which allowed us to explore and analyse the collected data sets and to present them on a PC as well as in a VR environment. With the application we were able to show the influence of local urban structures as well as the impact of the time of day on the measured values. With its help, outliers could be identified and their underlying causes investigated. The application was used in analysis sessions as well as in a workshop with stakeholders.
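As a rough illustration of the outlier analysis mentioned above (outside the Unity application itself), the following sketch flags sensor readings that deviate strongly from the typical level for their hour of day; the column names and data are assumptions, not the campaign's actual schema.

```python
# Flag per-hour outliers in mobile sensor readings via within-group z-scores.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2022-07-01 06:00", periods=500, freq="min"),
    "pm10": np.random.gamma(shape=2.0, scale=10.0, size=500),  # stand-in data
})
df["hour"] = df["timestamp"].dt.hour

# z-score within each hour: readings far from that hour's typical level.
grouped = df.groupby("hour")["pm10"]
df["z"] = (df["pm10"] - grouped.transform("mean")) / grouped.transform("std")
outliers = df[df["z"].abs() > 3]  # candidates whose causes merit a closer look
print(outliers[["timestamp", "pm10", "z"]])
```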
105

Graph Partitioning Algorithms for Minimizing Inter-node Communication on a Distributed System

Gadde, Srimanth January 2013 (has links)
No description available.
106

Prediction of Travel Time and Development of Flood Inundation Maps for Flood Warning System Including Ice Jam Scenario. A Case Study of the Grand River, Ohio

Lamichhane, Niraj 23 May 2016 (has links)
No description available.
107

Analysis of MRI and CT-based radiomics features for personalized treatment in locally advanced rectal cancer and external validation of published radiomics models

Shahzadi, Iram, Zwanenburg, Alex, Lattermann, Annika, Linge, Annett, Baldus, Christian, Peeken, Jan C., Combs, Stephanie E., Diefenhardt, Markus, Rödel, Claus, Kirste, Simon, Grosu, Anca-Ligia, Baumann, Michael, Krause, Mechthild, Troost, Esther G. C., Löck, Steffen 05 April 2024 (has links)
Radiomics analyses commonly apply imaging features of different complexity for the prediction of the endpoint of interest. However, the prognostic value of each feature class is generally unclear. Furthermore, many radiomics models lack the independent external validation that is decisive for their clinical application. Therefore, in this manuscript we present two complementary studies. In our modelling study, we developed and validated different radiomics signatures for outcome prediction after neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC), based on computed tomography (CT) and T2-weighted (T2w) magnetic resonance (MR) imaging datasets of 4 independent institutions (training: 122 patients, validation: 68 patients). We compared different feature classes extracted from the gross tumour volume for the prognosis of tumour response and freedom from distant metastases (FFDM): morphological and first-order (MFO) features, second-order texture (SOT) features, and Laplacian of Gaussian (LoG) transformed intensity features. Analyses were performed for CT and MRI separately and combined. Model performance was assessed by the area under the curve (AUC) for tumour response and the concordance index (CI) for FFDM. Overall, intensity features of LoG-transformed CT and MR imaging combined with clinical T stage (cT) showed the best performance for tumour response prediction, while SOT features showed good performance for FFDM in independent validation (AUC = 0.70, CI = 0.69). In our external validation study, we aimed to validate previously published radiomics signatures on our multicentre cohort. We identified relevant publications on comparable patient datasets through a literature search and applied the reported radiomics models to our dataset. Only one of the identified studies could be validated, indicating an overall lack of reproducibility and the need for further standardization of radiomics before clinical application.
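A minimal sketch of how LoG-transformed intensity features can be computed, using scipy's Laplacian-of-Gaussian filter over a masked tumour volume. The sigma values and summary statistics are illustrative assumptions, not the study's exact feature configuration.

```python
# LoG-transformed intensity features: filter the volume at several scales,
# then compute first-order statistics inside the tumour mask.
import numpy as np
from scipy import ndimage

image = np.random.rand(64, 64, 64)          # stand-in for a CT/MR volume
mask = np.zeros_like(image, dtype=bool)     # stand-in gross tumour volume
mask[20:40, 20:40, 20:40] = True

features = {}
for sigma in (1.0, 2.0, 4.0):               # fine-to-coarse edge scales
    log_img = ndimage.gaussian_laplace(image, sigma=sigma)
    roi = log_img[mask]
    features[f"log_sigma{sigma}_mean"] = roi.mean()
    features[f"log_sigma{sigma}_std"] = roi.std()
    features[f"log_sigma{sigma}_p90"] = np.percentile(roi, 90)

print(features)
```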
108

Decision Support Elements and Enabling Techniques to Achieve a Cyber Defence Situational Awareness Capability

Llopis Sánchez, Salvador 15 June 2023 (has links)
This doctoral thesis performs a detailed analysis of the decision elements necessary to improve cyber defence situational awareness, with special emphasis on the perception and understanding of the analyst in a cybersecurity operations centre (SOC). Two different architectures based on the network flow forensics of data streams (NF3) are proposed. The first uses ensemble machine learning techniques, while the second is a machine learning variant of greater algorithmic complexity (lambda-NF3) that offers a more robust defence framework against adversarial attacks. Both proposals seek to effectively automate malware detection and subsequent incident management, showing satisfactory results in approximating what has been called a next-generation cognitive computing SOC (NGC2SOC). The supervision and monitoring of events for the protection of an organisation's computer networks must be accompanied by visualisation techniques. Here, the thesis addresses the generation of three-dimensional representations based on mission-oriented metrics and procedures that use an expert system based on fuzzy logic. Indeed, the state of the art shows serious deficiencies when it comes to implementing cyber defence solutions that consider the relevance of the mission, resources and tasks of an organisation for a better-informed decision. The research finally provides two key areas to improve decision-making in cyber defence: a solid and complete verification and validation framework to evaluate solution parameters, and a synthetic dataset that unambiguously maps the phases of a cyber-attack to the Cyber Kill Chain and MITRE ATT&CK standards. / Llopis Sánchez, S. (2023). Decision Support Elements and Enabling Techniques to Achieve a Cyber Defence Situational Awareness Capability [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/194242
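A hedged sketch of the ensemble machine learning idea behind the first architecture: several heterogeneous base learners vote on whether a network flow is malicious. The synthetic flow features and choice of base models are illustrative assumptions, not the thesis's actual NF3 pipeline.

```python
# Soft-voting ensemble over synthetic "network flow" records.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for flow features (duration, bytes, packets, ports, ...),
# with 5% of flows labelled malicious.
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.95, 0.05], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities across learners
)
ensemble.fit(X_tr, y_tr)
print(f"accuracy on held-out flows: {ensemble.score(X_te, y_te):.3f}")
```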
109

Apprentissage supervisé de données déséquilibrées par forêt aléatoire / Supervised learning of imbalanced datasets using random forest

Thomas, Julien 12 February 2009 (has links)
The problem of imbalanced datasets in supervised learning has emerged relatively recently, as data mining has become a technology widely used in industry. Assisted medical diagnosis and the detection of fraud, abnormal phenomena, or specific elements in satellite imagery are examples of industrial applications based on supervised learning from imbalanced datasets. The goal of our work is to adapt different elements of supervised learning to this problem. We also address the specific performance requirements often associated with imbalanced datasets, such as a high recall rate for the minority class. This need is reflected in our main application, the development of software to help radiologists detect breast cancer. To this end, we propose new methods modifying three different stages of a learning process. First, at the sampling stage, we propose, in the case of bagging, to replace classic bootstrap sampling with guided sampling; our techniques, FUNSS and LARSS, use neighbourhood properties to select instances. Second, for the representation space, our contribution is a method of feature construction adapted to imbalanced datasets; this method, the FuFeFa algorithm, is based on the discovery of predictive association rules. Finally, at the stage of aggregating the base classifiers of a bagging ensemble, we propose to optimize the majority vote by weighting it. For this, we introduce a new quantitative measure of model performance, PRAGMA, which allows user-specific requirements on the recall and precision rates of each class to be taken into account.
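The third contribution, weighting the majority vote of a bagging ensemble, can be sketched as follows. Here each tree's vote is weighted by its minority-class recall, an assumption standing in for the thesis's PRAGMA measure, which balances user-specified recall and precision requirements per class.

```python
# Bagging with a weighted (rather than uniform) majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

rng = np.random.default_rng(2)
trees, weights = [], []
for _ in range(25):  # bagging: train each tree on a bootstrap sample
    idx = rng.integers(0, len(X_tr), len(X_tr))
    tree = DecisionTreeClassifier(max_depth=5).fit(X_tr[idx], y_tr[idx])
    # Weight = minority-class recall on the training set (a stand-in for
    # an out-of-bag or validation estimate, or for PRAGMA itself).
    weights.append(recall_score(y_tr, tree.predict(X_tr), pos_label=1))
    trees.append(tree)

votes = np.array([t.predict(X_te) for t in trees])    # shape (25, n_test)
w = np.array(weights)[:, None]
y_pred = (votes * w).sum(axis=0) / w.sum() > 0.5      # weighted majority vote
print(f"minority recall: {recall_score(y_te, y_pred, pos_label=1):.3f}")
```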
110

Un modèle rétroactif de réconciliation utilité-confidentialité sur les données d’assurance [A retroactive utility-privacy reconciliation model for insurance data]

Rioux, Jonathan 04 1900 (has links)
Privacy-preserving data sharing is a challenge for almost any enterprise nowadays, no matter its field of expertise. Research is evolving at a rapid pace, but there is still a lack of adapted and adaptable solutions for best business practices regarding the management and sharing of privacy-aware datasets. To address this problem, we offer PEPS, a modular, upgradeable, end-to-end system tailored to the needs of insurance companies and researchers. We take into account the entire data-sharing cycle, from data management to publication, while negotiating with external forces and policies. Our system distinguishes itself by taking advantage of domain-specific and problem-specific knowledge to tailor itself to the situation and increase the utility of the resulting dataset. To this end, we also present a strongly contextualised privacy algorithm and adapted utility measures to evaluate the performance of a successful disclosure of experience analysis.
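As one concrete example of a pre-disclosure check a system like PEPS could run, the sketch below verifies k-anonymity over a set of quasi-identifiers. The columns and threshold are illustrative assumptions; the thesis's own algorithm is strongly contextualised rather than this generic test.

```python
# k-anonymity check: every record must share its quasi-identifier values
# with at least k-1 other records before the dataset is shared.
import pandas as pd

df = pd.DataFrame({
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "region":   ["QC",    "QC",    "QC",    "ON",    "ON"],
    "premium":  [820,     910,     760,     1030,    990],
})

def k_anonymity(data: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    # k = size of the smallest group sharing the same quasi-identifier values.
    return int(data.groupby(quasi_identifiers).size().min())

k = k_anonymity(df, ["age_band", "region"])
print(f"k = {k}")          # here k = 2: the ON group has only two records
if k < 3:                  # example policy threshold
    print("generalise or suppress before sharing")
```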
