Global ETD Search

1	Object detection using convolutional neural networks together with Random Forest and Support Vector Machines Campanini García, Diego Alejandro January 2018 Electrical Civil Engineer / This thesis develops an object detection system (localization and classification) based on convolutional neural networks (CNNs) and two classical machine learning methods, Random Forest (RF) and Support Vector Machines (SVMs). The idea is to use these classifiers to improve the performance of the detection system known as Faster R-CNN (Regions with CNN features). Faster R-CNN relies on the concept of region proposals to generate candidate object samples and then produces two outputs: a regression that characterizes the location of the objects, and the confidence scores associated with the predicted bounding boxes. Both outputs are generated by fully connected layers. This work modifies the output that produces the confidence scores: at that point a classifier (RF or SVM) is attached, and it generates the output scores of the system. The aim is to improve the performance of Faster R-CNN in this way. The classifiers are trained on feature vectors extracted from one of the fully connected layers of Faster R-CNN; each of the three such layers in the architecture is tested to determine which gives the best results. To define, among other things, the number of convolutional layers and the size of the filters in the first layers of Faster R-CNN, the ZF and VGG16 convolutional network models are used; these are classification-only networks and are the same ones used originally.
The proposed systems are built on several open-source implementations and libraries. For the Faster R-CNN detector, a Python implementation is used; for RF, two libraries are compared: randomForest, written in R, and scikit-learn, in Python. For SVM, the LIBSVM library, written in C, is used. The main programming tasks are: developing the algorithms that label the feature vectors extracted from the fully connected layers; integrating the classifiers with the base system for online analysis of images at test time; and implementing a time- and memory-efficient training algorithm for SVM (known as hard negative mining). Evaluation shows that, among the proposed systems, the best results are obtained with the VGG16 network, specifically with Faster R-CNN+SVM using an RBF (radial basis function) kernel, which achieves a mean Average Precision (mAP) of 68.9%. The second-best result, 67.8%, is obtained with Faster R-CNN+RF with 180 trees. The original Faster R-CNN system achieves an mAP of 69.3%. Machine learning Image analysis Faster R-CNN
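The hard negative mining step mentioned above can be sketched as follows. This is a minimal, hypothetical illustration rather than the thesis's code: the thesis trained LIBSVM with an RBF kernel on Faster R-CNN fully connected features, whereas this sketch swaps in a hand-written linear SVM (Pegasos-style stochastic subgradient descent) on toy one-dimensional features. All function names, the margin threshold of -1, and the initial cache size are assumptions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style steps.

    A constant 1.0 feature is appended to every vector so the bias is learned
    as an ordinary, regularised weight. Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1.0  # margin test on the current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # L2 shrinkage
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def score(w, x):
    """Signed SVM score of a raw (non-augmented) feature vector."""
    return dot(w, list(x) + [1.0])


def hard_negative_mining(positives, neg_pool, init_negs=2, rounds=3):
    """Grow a negative cache with examples the current model scores above -1
    (misclassified or inside the margin), retraining after each round."""
    cache = [list(x) for x in neg_pool[:init_negs]]
    for _ in range(rounds):
        labels = [1] * len(positives) + [-1] * len(cache)
        w = train_linear_svm(positives + cache, labels)
        hard = [list(x) for x in neg_pool if list(x) not in cache and score(w, x) > -1.0]
        if not hard:
            return w, cache
        cache.extend(hard)
    # Final retrain so the returned model has seen every mined negative.
    labels = [1] * len(positives) + [-1] * len(cache)
    return train_linear_svm(positives + cache, labels), cache


# Toy 1-D features: two easy negatives seed the cache, two hard ones sit near the positives.
w, cache = hard_negative_mining([[2.0], [2.5], [3.0]], [[-5.0], [-4.0], [0.5], [1.0]])
print(cache)
```

In a detector setting, `neg_pool` would hold features of background region proposals; only the mined subset is kept for training, which is where the time and memory savings come from.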
2	Faster-X Evolution in the Speciation of Heliconius Butterflies Baquero, Margarita 12 August 2016 The X and Z chromosomes have unique characteristics that lead to unique evolutionary consequences. Lepidopterans show a well-known, disproportionately large-Z effect for the behavioral and morphological traits that distinguish closely related species. A potential explanation for this Large-X effect is faster evolution of the sex chromosome (Faster-X evolution). We use whole-genome re-sequencing of Heliconius erato races and of the incipient species H. himera to test for Faster-Z evolution between hybridizing populations with different levels of reproductive isolation, by calculating divergence and nucleotide diversity. We show evidence for Faster-Z evolution in Heliconius butterflies at the early stages of speciation and along the speciation continuum. Evidence of higher divergence and lower nucleotide diversity suggests that not only selection but also nonadaptive processes, such as demographic changes, may be driving Faster-Z evolution, especially in the incipient species. faster-X sex chromosome speciation Heliconius
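Divergence and nucleotide diversity are standard population-genetic summaries. A minimal sketch of per-site nucleotide diversity (π, mean pairwise differences within a population) and absolute divergence (d_XY, mean pairwise differences between populations) follows. It assumes aligned, equal-length sequences with no missing data; the thesis's actual pipeline (windowing, filtering, and how Z-linked and autosomal estimates were compared) is not described in the abstract, and the function names are illustrative.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise differences within one population / length."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def dxy(pop1, pop2):
    """Per-site absolute divergence between two populations."""
    total = sum(pairwise_differences(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2) * len(pop1[0]))


pop_a = ["AAAA", "AAAT"]
pop_b = ["AAGA", "AAGA"]
print(nucleotide_diversity(pop_a))  # 0.25
print(nucleotide_diversity(pop_b))  # 0.0
print(dxy(pop_a, pop_b))            # 0.375
```

A Faster-Z comparison would typically compute these statistics separately for Z-linked and autosomal regions and compare them.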
3	The Scharnhorst Effect: Superluminality and Causality in Effective Field Theories de Clark, Sybil Gertrude January 2016 We present two re-derivations of the Scharnhorst effect. The effect was first obtained in 1990 by Klaus Scharnhorst, soon followed by Gabriel Barton, and consists of the theoretical prediction that the phase velocity of photons propagating in a Casimir vacuum normal to the plates would be larger than c. The first derivation given in the present work is relevant to the debates in the physics literature about a possible greater-than-c value of the signal velocity. Because the phase velocity result also held for the group velocity, the question soon arose whether the same could be said of the signal velocity. Several arguments were presented against this notion, notably that measurement uncertainties would preclude such a measurement. These arguments relied on the fact that the known phase velocity result is valid only within a certain frequency regime. Scharnhorst and Barton responded that, given their previous result, the Kramers-Kronig relations imply one of two options: either the greater-than-c result holds for the signal velocity as well, or the Casimir vacuum behaves like an amplifying medium at some frequencies. Furthermore, the effect was later re-derived and generalized within an effective metric approach, which has been argued to obviate the worries about causal paradoxes often associated with the possibility of faster-than-c signalling. However, concerns related to theory errors, as well as the measurement uncertainties raised in the earlier debate, have remained salient. Re-deriving the phase velocity using Soft-Collinear Effective Theory (SCET) addresses some of these concerns.
With regard to theory errors, SCET provides a framework in which higher-order corrections are known to be power-suppressed, because SCET ensures that the expansion parameters are multiplied by factors of order 1. As a result, with the qualifications inherent to effective field theory, the result obtained within SCET cannot be invalidated by higher-order corrections. Furthermore, the SCET description bears on the objection that measurement uncertainties would prevent the signal speed from being measured as faster than c: SCET implies that the interaction between the Casimir vacuum and the propagating photon is such that the photon has the same phase velocity irrespective of its frequency. This in turn entails that its signal velocity equals this phase velocity, which is faster than c. The second calculation concerns the physical interpretation of the Scharnhorst effect and attempts to re-derive it within source theory. Existing derivations imply that the Scharnhorst effect can be attributed to vacuum fluctuations. Other physical effects with this feature have also been derived without any reference to the vacuum, as due to source fields instead. We attempt a similar derivation for the Scharnhorst effect. Effective Faster-than-light QFT Scharnhorst Superluminal Physics Causality
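The Kramers-Kronig dichotomy invoked by Scharnhorst and Barton can be made explicit with the standard subtracted dispersion relation for a causal refractive index n(ω) = n_R(ω) + i n_I(ω), with n_I > 0 for absorption. This is a textbook sketch, not the derivation in the thesis, and it assumes the signal (front) velocity is c/n(∞):

```latex
n_R(\omega) = n(\infty) + \frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\omega'\, n_I(\omega')}{\omega'^2 - \omega^2}\, d\omega'
\quad\Longrightarrow\quad
n_R(0) - n(\infty) = \frac{2}{\pi}\int_0^\infty \frac{n_I(\omega')}{\omega'}\, d\omega' .
```

The Scharnhorst result gives n_R(0) < 1 at low frequencies. If the Casimir vacuum were purely absorbing (n_I ≥ 0 everywhere), the right-hand side would be non-negative, forcing n(∞) ≤ n_R(0) < 1, i.e. a signal velocity c/n(∞) > c. Conversely, keeping n(∞) ≥ 1 requires the integral to be negative, so n_I < 0 (gain) over some frequency range.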
4	Neuron-adaptive neural network models and applications Xu, Shuxiang, University of Western Sydney, Faculty of Informatics, Science and Technology January 1999 Artificial Neural Networks have been widely studied by researchers worldwide for problems such as function approximation and data simulation. This thesis deals with Feed-forward Neural Networks (FNNs) with a new neuron activation function called the Neuron-adaptive Activation Function (NAF), and with Feed-forward Higher Order Neural Networks (HONNs) using this new activation function. We have designed a new neural network model, the Neuron-Adaptive Neural Network (NANN), and mathematically proved that a single NANN can approximate any piecewise continuous function to any desired accuracy. In the neural network literature, only Zhang had proved the universal approximation ability of an FNN group for any piecewise continuous function. Next, we have investigated the ability of Neuron-Adaptive Higher Order Neural Networks (NAHONNs), a combination of HONNs and NAF, to approximate any continuous function, functional, and operator. Finally, we have created a software program called MASFinance, which runs on the Solaris system, for the approximation of continuous or discontinuous functions and for the simulation of any continuous or discontinuous data (especially financial data). Our work differs from previous work in the following ways: we use a new neuron-adaptive activation function, whereas the activation functions in most existing work are fixed and cannot be tuned to different approximation problems; we use only one NANN to approximate any piecewise continuous function, whereas previous research required a neural network group; we combine HONNs with NAF and investigate the resulting approximation properties for any continuous function, functional, and operator; and we present a new software program, MASFinance, for function approximation and data simulation.
Experiments with MASFinance indicate that the proposed NANNs offer several advantages over traditional networks with fixed activation functions (such as greatly reduced network size, faster learning, and lower simulation errors), and that NANNs approximate piecewise continuous functions more effectively than neural network groups. Experiments also indicate that NANNs are especially suitable for data simulation / Doctor of Philosophy (PhD) neural networks MASFinance network size faster learning simulation errors
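The abstract does not give the functional form of the NAF. As a purely hypothetical illustration of the general idea (an activation function whose shape parameters are trained rather than fixed), the sketch below uses a sigmoid with a trainable amplitude `a` and slope `b` and fits them by gradient descent on a mean-squared error. It is not the thesis's NAF or MASFinance; the functional form, learning rate, and names are assumptions.

```python
import math


def adaptive_sigmoid(x, a, b):
    """Hypothetical adaptive activation a / (1 + exp(-b x)) with trainable a, b."""
    return a / (1.0 + math.exp(-b * x))


def grads(x, a, b):
    """Partial derivatives of the activation with respect to a and b."""
    s = 1.0 / (1.0 + math.exp(-b * x))
    return s, a * s * (1.0 - s) * x


def mse(xs, ys, a, b):
    """Mean-squared error of the activation against target values."""
    return sum((adaptive_sigmoid(x, a, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, a=1.0, b=1.0, lr=0.1, steps=2000):
    """Plain gradient descent on the MSE over the activation's own parameters."""
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            err = adaptive_sigmoid(x, a, b) - y
            da, db = grads(x, a, b)
            ga += 2.0 * err * da / n
            gb += 2.0 * err * db / n
        a -= lr * ga
        b -= lr * gb
    return a, b


# Targets generated with a=2, b=3; training starts from the fixed-sigmoid shape a=b=1.
xs = [i / 10 for i in range(-20, 21)]
ys = [adaptive_sigmoid(x, 2.0, 3.0) for x in xs]
a, b = fit(xs, ys)
print(a, b, mse(xs, ys, a, b))
```

A fixed sigmoid (a = b = 1) cannot change its shape; letting the activation parameters adapt is what allows the same neuron to fit targets of different amplitude and steepness.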
5	POTHOLE DETECTION USING DEEP LEARNING AND AREA ASSESSMENT USING IMAGE MANIPULATION Kharel, Subash 01 June 2021 Every year, drivers spend over 3 billion repairing vehicle damage caused by potholes. Beyond the financial cost, potholes frustrate drivers. Moreover, with the emerging development of automated vehicles, road safety designed with automation in mind is becoming a necessity. Deep learning techniques offer intelligent ways to reduce the losses caused by potholes by spotting them. Today's connectivity allows information to be shared almost instantly, so pothole information can be communicated to other vehicles and to the Department of Transportation for necessary action. Considerable research has been done to help detect potholes in pavements. In this thesis, we compare two object detection algorithms from the two major classes, single-shot detectors and two-stage detectors, on our dataset. Comparing Faster R-CNN and YOLOv5, we conclude that because potholes occupy a small portion of the image, YOLOv5 detects them less accurately than Faster R-CNN; however, considering detection speed, we suggest that YOLOv5 is the better solution for this task. Using the YOLOv5 model and an image processing technique, we calculated the approximate area of potholes and visualized their shape. This information can be used by the Department of Transportation to plan the necessary construction work. It can also be used to warn drivers about the severity of potholes depending on their shape and area. Deep Learning Faster RCNN Potholes Potholes Area Potholes Detection YOLOv5
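The abstract does not detail the image processing used to estimate pothole area. One simple, hypothetical approach is to threshold dark pixels inside a detected bounding box and scale the pixel count by a known ground area per pixel. The threshold, the `m2_per_pixel` calibration factor, and the function name below are assumptions; in practice both values depend on camera geometry and lighting.

```python
def pothole_area(gray, box, threshold=60, m2_per_pixel=1e-4):
    """Approximate pothole area from a grayscale image and a detection box.

    gray: 2-D list of 0-255 intensities (rows of pixels).
    box: (x0, y0, x1, y1) pixel corners of a detection, x1/y1 exclusive.
    Pixels darker than `threshold` inside the box are treated as pothole.
    Returns (area in square metres, binary mask of the box region).
    """
    x0, y0, x1, y1 = box
    mask = [[1 if v < threshold else 0 for v in row[x0:x1]] for row in gray[y0:y1]]
    n_pixels = sum(map(sum, mask))
    return n_pixels * m2_per_pixel, mask


# A 6x6 bright road patch with a 2x2 dark region inside the detection box.
gray = [[200] * 6 for _ in range(6)]
for r, c in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    gray[r][c] = 30
area, mask = pothole_area(gray, (1, 1, 5, 5), threshold=60, m2_per_pixel=0.01)
print(area)
for row in mask:
    print(row)
```

The returned mask doubles as a crude shape outline, which is the kind of information the abstract suggests could be used to grade pothole severity.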
6	Faster R-CNN based CubeSat Close Proximity Detection and Attitude Estimation Sujeewa Samarawickrama, N G I 09 August 2019 Automatic detection of space objects in optical images is important for close proximity operations, relative navigation, and situational awareness. To better protect space assets, it is important to know not only where a space object is, but also what the object is. In this dissertation, a method for detecting multiple 1U, 2U, 3U, and 6U CubeSats based on the faster region-based convolutional neural network (Faster R-CNN) is described. CubeSat detection models are developed using Web-searched and computer-aided design images. In addition, a two-step method is presented for detecting a rotating CubeSat in close proximity from a sequence of images without the use of intrinsic or extrinsic camera parameters. First, a Faster R-CNN trained on synthetic images of 1U, 2U, 3U, and 6U CubeSats locates the CubeSat in each image and assigns a weight to each CubeSat class. Then, these classification results are combined using Dempster's rule. The method is tested on simulated scenarios in which the rotating 3U and 6U CubeSats are in unfavorable views or in dark environments. Faster R-CNN detection results contain useful information for tracking, navigation, pose estimation, and simultaneous localization and mapping. A coarse single-point attitude estimation method is proposed that uses the centroids of the bounding boxes surrounding the CubeSats in the image. The centroids define the line-of-sight (LOS) vectors to the detected CubeSats in the camera frame, and the LOS vectors in the reference frame are assumed to be obtained from the Global Positioning System (GPS). The three-axis attitude is determined from the vector observations by solving Wahba's problem. The attitude estimation concept is tested on simulated scenarios using Autodesk Maya. Attitude Estimation Close Proximity Detection CubeSats Faster R-CNN
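Dempster's rule of combination, used above to fuse per-image classification results, can be sketched as follows. How the thesis converts Faster R-CNN class weights into mass functions is not stated; here each mass function is simply assumed to be a dict mapping frozensets of class labels to masses that sum to 1, and the per-frame values are made up for illustration.

```python
from functools import reduce


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


U3, U6 = frozenset({"3U"}), frozenset({"6U"})
frame1 = {U3: 0.6, U6: 0.4}  # hypothetical per-image class weights
frame2 = {U3: 0.7, U6: 0.3}
fused = reduce(dempster_combine, [frame1, frame2])
print(round(fused[U3], 4), round(fused[U6], 4))  # 0.7778 0.2222
```

Folding the rule over an image sequence with `reduce` sharpens the belief in the class that the frames agree on, while conflicting evidence (here 0.46 of the product mass) is discarded and renormalised.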
7	USING ADVANCED DEEP LEARNING TECHNIQUES TO IDENTIFY DRAINAGE CROSSING FEATURES Edidem, Michael Isaiah 01 August 2024 High-resolution digital elevation models (HRDEMs) enable precise mapping of hydrographic features. However, the absence of drainage crossings underpassing roads or bridges hinders accurate delineation of stream networks. Traditional methods for locating these crossings, such as on-screen digitization and field surveys, are time-consuming and expensive over extensive areas. This study investigates the effectiveness of deep learning models for automated drainage crossing detection using HRDEMs. It also explores the performance of an advanced classification model, EfficientNetV2, on various co-registered HRDEM-derived geomorphological features, such as positive openness, geometric curvature, and topographic position index (TPI) variants, for drainage crossing classification. The results show that individual layers, particularly HRDEM and TPI21, achieve the best performance, while combining all five layers does not improve accuracy. Hence, effective feature screening is crucial, as eliminating less informative features improves the F1 score. For drainage crossing detection, this study develops and trains deep learning object detectors, Faster R-CNN and YOLOv5, using HRDEM tiles and ground truth labels. These models achieve an average F1 score of 0.78 in a Nebraska watershed and transfer successfully to other watersheds. This spatial object detection approach offers a promising avenue for automated, large-scale drainage crossing detection, facilitating the integration of these features into HRDEMs and improving the accuracy of hydrographic network delineation. Drainage crossing EfficientNet Faster R-CNN GeoAI Hydrography YOLO
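Object-detection F1 scores like the 0.78 reported above are usually computed by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold. The sketch below assumes greedy, score-ordered matching at IoU ≥ 0.5 on a single image; the thesis's exact matching protocol is not stated, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds, gts, iou_thr=0.5):
    """F1 for one image: preds is a list of (score, box), gts a list of boxes."""
    matched = set()
    tp = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, g in enumerate(gts):
            if j not in matched:
                v = iou(box, g)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 0, 10, 10)), (0.8, (50, 50, 60, 60))]
print(detection_f1(preds, gts))  # 0.5
```

Here one prediction matches (IoU 0.9), one is a false positive, and one ground-truth crossing is missed, so precision and recall are both 0.5.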
8	Advanced receivers and waveforms for UAV/Aircraft aeronautical communications Raddadi, Bilel 03 July 2018 Nowadays, several studies are under way to design reliable and safe communication systems that include Unmanned Aerial Vehicles (UAVs), paving the way for UAV communication systems to play an important role in many applications in non-segregated military and civil airspace. Rules for integrating commercial UAVs into airspace still need to be defined, and the design of secure, highly reliable, and cost-effective communication systems remains a challenging task. This thesis is set in this communication context. Motivated by the rapid growth in the number of UAVs and by the new generations of satellite-controlled UAVs, the thesis studies the various possible links that connect the UAV/aircraft to other communication system components (satellite, terrestrial networks, etc.). Three main links are considered: the Forward link, the Return link, and the Mission link.
Because of spectrum scarcity and the increasing density of aircraft, spectral efficiency becomes a crucial parameter for the large-scale deployment of UAVs. Setting up a spectrally efficient UAV communication system requires a good understanding of the transmission channel for each link, as well as a judicious choice of waveform. The thesis begins by studying the propagation channels for each link: multipath channels with a radio line-of-sight (LOS) component, in the context of Medium Altitude Long Endurance (MALE) UAVs. The objective of the thesis is to propose new receiver algorithms for channel estimation and channel equalization of these multipath propagation channels. The proposed methods depend on the chosen waveform. Because of the satellite link, the thesis considers two linear waveforms with a low peak-to-average power ratio (PAPR): the classical Single-Carrier (SC) waveform and the Extended Weighted Single-Carrier Orthogonal Frequency-Division Multiplexing (EW-SC-OFDM) waveform. Channel estimation and channel equalization are performed in the time domain (SC) or in the frequency domain (EW-SC-OFDM). The UAV architecture provides for two antennas mounted on the wings, which can be used to increase diversity gain (channel matrix gain). To reduce channel equalization complexity, the EW-SC-OFDM waveform is proposed and studied in a multi-antenna context. To enhance UAV endurance and increase spectral efficiency, a new modulation technique is also considered: Spatial Modulation (SM). In SM, the transmit antennas are activated alternately. Combining the EW-SC-OFDM waveform with SM allows us to propose new modified structures that exploit the excess bandwidth to better protect the antenna-selection bits and thus improve system performance.
Aeronautical Communications Faster-Than-Nyquist (FTN) Channel Equalization Channel Estimation Spatial Modulation
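The spatial modulation (SM) principle described above, in which the index of the active transmit antenna itself carries information, can be illustrated with a minimal sketch: two transmit antennas, QPSK symbols, one receive antenna, a flat noiseless channel, and maximum-likelihood detection. It deliberately omits the EW-SC-OFDM waveform and the thesis's modified structures; the channel gains, the Gray-mapped QPSK table, and the function names are illustrative assumptions.

```python
import math

# Gray-mapped QPSK constellation with unit average energy (an illustrative choice).
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(1, -1) / math.sqrt(2),
}


def sm_modulate(bits):
    """Map each group of 3 bits to (active antenna, QPSK symbol).

    With two transmit antennas, the first bit selects the antenna (0 or 1)
    and the remaining two bits select the symbol sent from it.
    """
    if len(bits) % 3:
        raise ValueError("bit count must be a multiple of 3")
    return [(bits[i], QPSK[(bits[i + 1], bits[i + 2])]) for i in range(0, len(bits), 3)]


def sm_detect(y, h):
    """ML detection: search all (antenna, symbol) pairs for the closest h[ant] * s."""
    best = min(
        ((ant, sym_bits) for ant in range(len(h)) for sym_bits in QPSK),
        key=lambda c: abs(y - h[c[0]] * QPSK[c[1]]) ** 2,
    )
    return [best[0], *best[1]]


h = [complex(0.9, 0.1), complex(0.3, -0.8)]  # assumed flat channel gain per Tx antenna
bits = [1, 0, 1, 0, 1, 1]
rx = [h[ant] * s for ant, s in sm_modulate(bits)]  # noiseless received samples
decoded = [b for y in rx for b in sm_detect(y, h)]
print(decoded == bits)  # True
```

Only one antenna radiates per symbol period, so one RF chain suffices, while the antenna index adds one bit per symbol; the antenna-selection bit is only recoverable if the two channel gains are distinguishable, which is why protecting those bits matters.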
9	Faster than Nyquist transceiver design: algorithms for a global transmission-reception enhancement Lahbabi, Naila 22 June 2017 The exponential growth of wireless data traffic driven by the mobile Internet and smart devices constrains future radio systems to include advanced modulations/waveforms offering higher data rates with more efficient bandwidth usage. One possibility is to violate the well-known Nyquist criterion by transmitting faster than the Nyquist rate, a technique known as Faster-Than-Nyquist (FTN) signaling. Nyquist-based systems have the advantage of simple transmitter and receiver architectures at the expense of bandwidth efficiency.
The idea of signaling beyond the Nyquist rate, trading interference-free transmission for higher throughput, goes back to 1975. In this dissertation, we investigate FTN signaling over the Additive White Gaussian Noise (AWGN) channel in the context of Orthogonal Frequency Division Multiplexing with Offset Quadrature Amplitude Modulation (OFDM/OQAM). The main objective of our work is to present an OFDM/OQAM system that signals faster than the Nyquist rate and to explore its potential rate improvement while taking the overall system complexity into account. First, we propose a new, efficient FTN implementation of OFDM/OQAM systems, denoted FTN-OQAM, that has the same complexity as OFDM/OQAM systems while very closely approaching the theoretical FTN rate improvement. Because the Nyquist condition is no longer respected, the transmitted signal suffers severe interference. To deal with this interference, we propose a turbo-like receiver based on Minimum Mean Square Error Linear Equalization and Interference Cancellation, named MMSE LE-IC. Since the aim of our system is to boost the transmission rate, high constellation orders are targeted; in this respect, the MMSE LE-IC, whose complexity is independent of the constellation, offers a good trade-off between efficiency and complexity. Since OFDM/OQAM modulation can use different types of pulse shapes, we propose, for several of them, an algorithm that finds, for each constellation order, the minimum FTN packing factor that yields a spectral efficiency gain while keeping the same performance as Nyquist-rate systems at a fixed SNR. Next, we aim to improve the iterative processing of the transceiver. The proposed method combines a precoder with the FTN-OQAM system in order to remove FTN-induced interference at the transmitter. Because it is difficult to jointly precode all the transmitted symbols, we present a sparse precoding pattern, and we introduce three families of precoders along with the corresponding receivers.
Furthermore, we propose several modifications of the FTN-OQAM transmitter concerning blocks such as channel coding, bit mapping, and symbol mapping to further enhance the FTN-OQAM transceiver design. The presented results reveal the significant potential of the proposed methods. Faster-Than-Nyquist OFDM/OQAM Channel coding Precoding Turbo equalization Interference cancellation 004
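The trade-off behind FTN signaling can be seen with a simple calculation. When symbols are sent every τT with a packing factor τ < 1, samples of an ideal sinc pulse at neighbouring symbol instants are no longer zero, which is the interference the receiver must handle, while the symbol rate grows by 1/τ. The sketch uses a sinc pulse purely for illustration; OFDM/OQAM systems use other prototype pulse shapes, and the function names are hypothetical.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def isi_taps(tau, span=3):
    """Samples of a sinc pulse (period T = 1) at multiples of tau * T.

    Tap k is the contribution, at the current sampling instant, of the symbol
    k positions away. For tau = 1 (Nyquist signalling) every tap k != 0 is 0.
    """
    return [sinc(k * tau) for k in range(-span, span + 1)]


for tau in (1.0, 0.8):
    taps = isi_taps(tau)
    rate_gain = 1.0 / tau
    print(tau, [round(t, 3) for t in taps], "rate gain", rate_gain)
```

With τ = 0.8 the symbol rate rises by 25%, but each symbol now leaks about 0.234 of its amplitude into its immediate neighbours; equalization and interference cancellation must recover what the Nyquist system got for free.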
10	Image-Text context relation using Machine Learning: Research on performance of different datasets Sun, Yuqi January 2022 Building on progress in the Computer Vision and Natural Language Processing fields, Vision-Language (VL) models are designed to process information from both images and texts. This thesis focuses on the performance of one such model, Oscar, on different datasets. Oscar is a state-of-the-art VL representation learning model built on a pre-trained object detection model and a pre-trained BERT model. Comparing performance across datasets helps clarify the relationship between dataset properties and model performance, and the conclusions can guide future work on VL datasets and models. In this thesis, I collected five VL datasets, each differing from the others in at least one main respect, and generated 8 subsets from them. I trained the same model on the different subsets to classify whether an image is related to a text. Conventionally, clean datasets are expected to perform better because their images show everyday scenes and are annotated by humans; as a result, clean datasets are always limited in size. However, an interesting finding of this thesis is that a dataset generated by models trained on other datasets achieved performance as good as that of the clean datasets, which encourages research on models for data collection. The experimental results also indicate that future work on VL models could focus on improving feature extraction from images, since the images strongly influence the performance of VL models. Vision-Language Representation learning BERT Faster R-CNN Oscar Datasets Computer and Information Sciences
