161

Alternative Solution to Catastrophical Forgetting on FewShot Instance Segmentation

Álvarez Fernández Del Vallado, Juan January 2021 (has links)
Video instance segmentation is a rapidly growing research area within computer vision. Segmentation models require annotated data, which can be daunting to produce when starting from scratch. Although some publicly available datasets for image instance segmentation exist, they are limited to the applications they target. This work proposes a new approach to training an instance segmentation model using transfer learning, notably reducing the need for annotated data. Transferring knowledge from domain A to domain B can result in catastrophic forgetting, leaving an algorithm unable to generalize properly while retaining the knowledge acquired in the initial domain. This problem is studied, and a solution is proposed based on data transformations applied precisely when knowledge is transferred to the target domain, following the empirical research method and using publicly available video instance segmentation datasets as resources for the experiments. The conclusions show a relationship between the data transformations and the ability to generalize across both domains.
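A minimal sketch of the kind of transfer-learning setup the abstract describes, assuming PyTorch/torchvision; the class count and the particular transformations are illustrative placeholders, not the thesis's actual choices:

```python
import torchvision
import torchvision.transforms.v2 as T
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 3  # hypothetical: background + 2 target-domain classes

# Source domain A: COCO-pretrained Mask R-CNN.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the prediction heads for target domain B before fine-tuning.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, NUM_CLASSES)
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, NUM_CLASSES)

# Data transformations applied at transfer time -- the knob the thesis studies.
transfer_transforms = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomResize(min_size=480, max_size=800),
])
```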
162

Formative Research on Component Display Theory

Antwi, Samuel January 2017 (has links)
No description available.
163

Development of a Methodology for Phase Characterization in Pellet Feed Using Digital Microscopy and Deep Learning

THALITA DIAS PINHEIRO CALDAS 09 November 2023 (has links)
Iron ore is found in nature as an aggregate of minerals; among the main minerals in its composition are hematite, magnetite, goethite, and quartz. Given the importance of iron ore to industry, there is growing interest in characterizing it to assess the material's quality. With advances in image analysis and microscopy, characterization routines were developed using Digital Microscopy and Digital Image Processing and Analysis tools capable of automating a large part of the process. These routines still faced difficulties, however, such as identifying and classifying the different textures of hematite particles and the different shapes of their crystals, or discriminating between quartz and resin in reflected-light optical microscopy images. The need for systems capable of learning and adapting to variations in images of this material motivated the study of Deep Learning tools for the task. This work proposes a new mineral characterization methodology based on Deep Learning using the Mask R-CNN algorithm, which performs instance segmentation, i.e., identifies, classifies, and segments objects in images. Two models were developed: Model 1 performs instance segmentation for the compact, porous, martite, and goethite classes in bright-field images, and Model 2 uses images acquired in circularly polarized light to segment the monocrystalline, polycrystalline, and martite classes. Model 1 obtained an F1-score of around 80 percent, and Model 2 around 90 percent. From the class segmentation it was possible to extract important attributes of each particle, such as quantity distribution, shape measurements, size, and area fraction. The results are very promising and indicate that the developed methodology is viable for this characterization.
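For reference, a minimal sketch of the F1-score reported above, computed from instance-level matches; the IoU threshold is a common convention, not a value restated from the thesis:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from instance matches, where a predicted instance counts as a true
    positive when its mask IoU with a same-class ground-truth instance
    exceeds a threshold (0.5 is a common choice)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```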
164

Strawberry Monitoring: Detection, Classification, and Visual Servoing

GABRIEL LINS TENORIO 27 August 2024 (has links)
The present work begins with an investigation into the use of 3D Deep Learning models for enhanced strawberry detection in polytunnels, focusing on two main tasks. The first is fruit detection, comparing the standard Mask R-CNN with an adapted version that integrates depth information (MaskRCNN-D); both models classify strawberries by maturity (ripe, unripe) and health status (affected by disease or fungus). The second is identifying the widest region of each strawberry, a requirement for a spectrometer system capable of measuring sugar content; here a contour-based algorithm is compared with an enhanced version of the VGG-16 model. The findings show that integrating depth data into MaskRCNN-D yields up to a 13.7 percent improvement in mAP across various strawberry test sets, including simulated ones, emphasizing the model's effectiveness in both real-world and simulated agricultural scenarios. Furthermore, the end-to-end pipeline combining fruit detection (MaskRCNN-D) with widest-region identification (enhanced VGG-16) shows remarkably low localization error, down to an RMSE of 11.3 pixels in a 224 × 224 cropped strawberry image. Finally, the work explores improving the quality of spectrometer readings through automatic sensor positioning. To this end, a Deep Learning model was designed and trained on simulated data to predict sensor accuracy from a given strawberry image and a candidate displacement of the sensor's position. Using this model, the gradient of the accuracy output with respect to the displacement input is computed, yielding a vector that indicates the direction and magnitude in which the sensor should be moved to improve signal accuracy. A Visual Servoing solution based on this vector provided a significant increase in average sensor accuracy and improved consistency across new simulated iterations.
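The gradient-based servoing step described above maps directly onto automatic differentiation. A minimal sketch assuming PyTorch; the predictor's interface and the step size are assumptions, not the thesis's actual implementation:

```python
import torch

def servo_step(model, image, displacement, step_size=0.1):
    """One visual-servoing update: differentiate the predicted sensor
    accuracy with respect to the displacement input and move the sensor
    along that gradient. `model(image, disp)` -> scalar predicted accuracy."""
    disp = displacement.clone().requires_grad_(True)
    acc = model(image, disp)
    (grad,) = torch.autograd.grad(acc, disp)   # d(accuracy) / d(displacement)
    return disp.detach() + step_size * grad    # move uphill in predicted accuracy
```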
165

Multimodal Data for Image Analysis (Données multimodales pour l'analyse d'image)

Guillaumin, Matthieu 27 September 2010 (has links) (PDF)
This thesis addresses the use of textual metadata for image analysis. We seek to use this additional information as weak supervision for learning visual recognition models. There has been recent and growing interest in methods able to exploit this type of data, since they can potentially remove the need for manual annotation, which is costly in time and resources. We concentrate on two types of visual data associated with textual information. First, we use news images accompanied by descriptive captions to tackle several problems related to face recognition. Among these, face verification is the task of deciding whether two images depict the same person, and face naming seeks to associate the faces in a database with their correct names. We then explore models for automatically predicting relevant labels for images, a problem known as automatic image annotation; these models can also be used to search for images from keyword queries. Finally, we study a semi-supervised multimodal learning scenario for image categorization, in which labels are assumed present for the training data, whether manually annotated or not, and absent from the test data. Our work builds on the observation that most of these problems can be solved if perfectly suited similarity measures are used. We therefore propose new approaches combining metric learning, nearest-neighbor models, and graph-based methods to learn, from visual and textual data, problem-specific visual similarities. For faces, our similarities focus on the identity of individuals, while for images they concern more general semantic concepts. Experimentally, our approaches achieve state-of-the-art performance on several challenging datasets. For both types of data considered, we clearly show that learning benefits from the additional textual information, resulting in improved performance of visual recognition systems.
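A minimal sketch of weighted nearest-neighbor label transfer, in the spirit of the annotation models described above; the weighting scheme and all names are illustrative, not the thesis's exact formulation:

```python
import numpy as np

def knn_tag_scores(dist, train_labels, k=20, tau=1.0):
    """Predict tag relevance for one test image from its k nearest training
    images, weighting neighbors by a softmax over (learned) distances.
    dist: (N,) distances to training images; train_labels: (N, T) binary."""
    nn = np.argsort(dist)[:k]
    w = np.exp(-dist[nn] / tau)
    w /= w.sum()
    return w @ train_labels[nn]   # (T,) soft score per tag
```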
166

RUL Estimation Using Experience-Based Approaches: From Data to Knowledge

Khelif, Racha 14 December 2015 (has links)
This thesis develops experience-based approaches for critical-component prognostics and Remaining Useful Life (RUL) estimation. This choice avoids the problematic issue of setting a failure threshold. The work builds on Case-Based Reasoning (CBR) to track the health status of a new component and predict its RUL. An Instance-Based Learning (IBL) approach was first developed, offering two experience formalizations: a supervised one that takes the component's status into account and produces health indicators, and an unsupervised one that fuses the sensory data into one-dimensional degradation trajectories. The approach was then extended by integrating knowledge extracted from the sensory data, of two types: temporal knowledge, which completes the modeling of instances, and frequential knowledge, which, together with the similarity measure, refines the retrieval phase. Retrieval relies on two similarity measures: a weighted similarity between fixed parallel windows, and a weighted similarity with temporal projection through sliding windows, which allows the current degradation state of a new component to be identified and localized. Another data-driven technique was also tested, based on features extracted from the experiences, mono-dimensional in the first case and multi-dimensional otherwise; these features are modeled by a Support Vector Regression (SVR) algorithm. The approaches were assessed on two types of critical components: turbofans and Li-ion batteries. The results obtained are interesting but depend on the type of data treated.
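A minimal sketch of the sliding-window, instance-based RUL retrieval idea described above, assuming a library of run-to-failure trajectories; the distance, k, and data layout are illustrative assumptions:

```python
import numpy as np

def rul_from_neighbors(query, library, k=3):
    """Slide the query window over each stored degradation trajectory, find
    the best-matching position, and average the remaining lifetimes of the
    k closest experiences. library: list of (traj, total_life) pairs, where
    traj is a 1-D health-indicator series."""
    scores = []
    for traj, total_life in library:
        w = len(query)
        d = [np.linalg.norm(traj[i:i + w] - query)
             for i in range(len(traj) - w + 1)]
        i_best = int(np.argmin(d))
        rul = total_life - (i_best + w)   # life remaining past matched window
        scores.append((d[i_best], rul))
    scores.sort()                         # closest experiences first
    return float(np.mean([r for _, r in scores[:k]]))
```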
167

Procedural Aspects of Patent Infringement

Hubert, Olivier 01 December 2015 (has links)
The procedural law governing patent infringement actions is not an autonomous body of law. While the patent infringement action largely depends on procedural rules of its own, it also relies on a multitude of rules belonging to more general normative systems, such as private judicial law, property law, contract law, and fundamental rights. Patent infringement proceedings, which essentially fall under general private judicial law, incorporate a number of specific rules that give them a distinctive character. Only the study of the relationships between these different normative systems, at each stage of both the action and the proceedings, clarifies the procedural aspects of the patent infringement action and secures litigants in the exercise of their rights.
168

Pattern Recognition in Temporal Data

Hovanec, Stanislav January 2009 (has links)
This diploma thesis begins with a survey of methods for describing and analyzing time series. It then introduces the problems of technical analysis of price charts, covering indicators, price patterns, and the Pure Price Action method. Pure Price Action is demonstrated on two practical examples of its application to real trades, with a view to discovering and analyzing price patterns and to analyzing and predicting future price evolution. Building on this introduction to successful trading, the thesis discusses pattern recognition and the Instance-Based Learning method. The practical part is carried out in MATLAB with an algorithm for analyzing the "Correction" price pattern for sale and purchase in dynamic time segments, specifically in trading price charts such as those used for commodity or stock trading. The Pure Price Action method is used for time-series analysis, and the Instance-Based Learning method is used by the algorithm to recognize price patterns. The algorithm is verified on real data: 5-minute time series of US Dow Jones price charts for the years 2006, 2007, and 2008. The achieved accuracy is evaluated with the aid of equity curves.
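A minimal sketch of instance-based price-pattern matching over a sliding window (shown in Python rather than the thesis's MATLAB); the window length, normalization, and threshold are illustrative assumptions:

```python
import numpy as np

def znorm(w):
    """Z-normalize a window so patterns match on shape, not price level."""
    return (w - w.mean()) / (w.std() + 1e-9)

def find_pattern(prices, exemplars, win=24, thresh=0.5):
    """Instance-based detection: compare each z-normalized sliding window of
    the price series against stored pattern exemplars; report start indices
    where the nearest exemplar is within the threshold."""
    hits = []
    for i in range(len(prices) - win + 1):
        w = znorm(prices[i:i + win])
        d = min(np.linalg.norm(w - znorm(e)) / win for e in exemplars)
        if d < thresh:
            hits.append(i)
    return hits
```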
169

Instance Segmentation of Multiclass Litter and Imbalanced Dataset Handling: A Deep Learning Model Comparison

Sievert, Rolf January 2021 (has links)
Instance segmentation has great potential for improving the current state of littering by autonomously detecting and segmenting different categories of litter. With this information, litter could, for example, be geotagged to aid litter pickers or to give precise location information to unmanned vehicles for autonomous litter collection. Land-based litter instance segmentation is a relatively unexplored field, and this study compares the instance segmentation models Mask R-CNN and DetectoRS on the multiclass litter dataset Trash Annotations in Context (TACO), evaluated with the Common Objects in Context (COCO) precision and recall scores. TACO is an imbalanced dataset, so imbalanced-data handling is addressed through a second-order-relation iterative stratified split and, additionally, oversampling when training Mask R-CNN. Mask R-CNN without oversampling resulted in a segmentation mAP of 0.127, and with oversampling 0.163. DetectoRS achieved a segmentation mAP of 0.167 and most noticeably improves the segmentation mAP of small objects, by a factor of at least 2, which matters in the litter domain since small objects such as cigarettes are overrepresented. In contrast, oversampling with Mask R-CNN does not seem to improve the general precision of small and medium objects, only the detection of large objects. It is concluded that DetectoRS improves results compared to Mask R-CNN, as does oversampling. However, using a dataset that cannot have all-class representation across train, validation, and test splits, together with an iterative stratification that does not guarantee all-class representation, makes exact comparisons to this study difficult for future work. Results are therefore approximate when considering all categories, since 12 categories are missing from the test set, 4 of which were impossible to split into train, validation, and test sets. Further image collection and annotation to mitigate the imbalance would improve results most noticeably, since the results depend on class-averaged values. Oversampling with DetectoRS would also help. There is also the option of combining the TACO and MJU-Waste datasets to enable training on more categories.
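One common way to implement the oversampling mentioned above is repeat-factor sampling (in the spirit of the LVIS approach of Gupta et al.), where images containing rare classes are repeated more often during training. A minimal sketch; the threshold and data layout are illustrative, not the study's exact settings:

```python
import math
from collections import Counter

def repeat_factors(image_categories, threshold=0.1):
    """Per-image repeat factors for class-balanced oversampling.
    image_categories: dict mapping image_id -> set of category ids."""
    n_images = len(image_categories)
    freq = Counter(c for cats in image_categories.values() for c in cats)
    # Rare categories (frequency below threshold) get repeat factor > 1.
    cat_rf = {c: max(1.0, math.sqrt(threshold / (n / n_images)))
              for c, n in freq.items()}
    # An image is repeated as often as its rarest category demands.
    return {img: max((cat_rf[c] for c in cats), default=1.0)
            for img, cats in image_categories.items()}
```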
170

Towards meaningful and data-efficient learning : exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning

Huang, Gabriel 09 1900 (has links)
In recent years, the field of deep learning has seen tremendous progress in applications ranging from image generation, object detection, and language modeling to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in few-shot learning in particular, is largely driven by the relevant benchmarks, evaluation metrics, and datasets: they are used to test and compare different methods on a given task and to determine the state of the art. However, because they are idealized versions of the task to solve, benchmarks are rarely equivalent to the original task and can have several limitations that hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially for high-dimensional and structured outputs such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling (Generative Adversarial Networks, GANs, in particular) and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, namely that they may not require class-semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations of, and promising future research directions for, the field. Finally, the fourth contribution is a data-efficient method for video captioning that leverages unsupervised text and video datasets and explores several multimodal pretraining strategies.
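For background on the GAN loss analyzed in the first contribution, the original minimax objective of Goodfellow et al. is (shown as standard reference material, not as the thesis's specific formulation):

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] +
\mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```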
