Global ETD Search

1	Exploring the capabilities of foundation models in the time series forecasting problem : master's thesis Зайцев, А. А., Zaytsev, A. A. January 2024 The object of research is the standard tasks of forecasting univariate and multivariate time series. The aim of the study is to investigate the capabilities of modern foundation models in point forecasting of univariate and multivariate time series on benchmark datasets. Results of the study: independent investigations of the capabilities of foundation models in forecasting univariate and multivariate time series have been conducted. Two new datasets, formed from a public dataset package, have been proposed, and a possible categorization of foundation models has been introduced. Issues encountered during the investigation of foundation models have been identified. Recommendations for conducting new research have been provided. MASTER'S THESIS TIME SERIES FOUNDATION MODELS FOUNDATION MODELS FORECAST UNIVARIATE MULTIVARIATE SOTA STATE-OF-THE-ART
2	Toward Transformer-based Large Energy Models for Smart Energy Management Gu, Yueyan 01 November 2024 Buildings contribute significantly to global energy demand and emissions, highlighting the need for precise energy forecasting for effective management. Existing research tends to focus on specific target problems, such as individual buildings or small groups of buildings, leading to current challenges in data-driven energy forecasting, including dependence on data quality and quantity, limited generalizability, and computational inefficiency. To address these challenges, Generalized Energy Models (GEMs) for energy forecasting can potentially be developed using large-scale datasets. Transformer architectures, known for their scalability, ability to capture long-term dependencies, and efficiency in parallel processing of large datasets, are considered good candidates for GEMs. In this study, we tested the hypothesis that GEMs can be efficiently developed to outperform in-situ models trained on individual buildings. To this end, we investigated and compared three candidate multivariate Transformer architectures, using both zero-shot and fine-tuning strategies, with data from 1,014 buildings. The results, evaluated across three prediction horizons (24, 72, and 168 hours), confirm that GEMs significantly outperform Transformer-based in-situ (i.e., building-specific) models. Fine-tuned GEMs showed performance improvements of up to 28% and reduced training time by 55%. Besides Transformer-based in-situ models, GEMs outperformed several state-of-the-art non-Transformer deep learning baseline models in performance and efficiency. We further explored several questions, including the data size required for effective fine-tuning and the impact of input sub-sequence length and pre-training dataset size on GEM performance.
The findings show a significant performance boost from using larger pre-training datasets, highlighting the potential for larger GEMs using web-scale global data to move toward Large Energy Models (LEMs). / Master of Science / Buildings account for a large share of global energy use and emissions, which makes predicting their energy needs critical for better management. However, most research focuses on creating energy models for specific buildings or small groups, which limits their usefulness for larger-scale applications. Additionally, these models often face challenges such as reliance on high-quality data, limited adaptability to different buildings, and inefficiencies when dealing with large amounts of data. This study aims to address these issues by developing Generalized Energy Models (GEMs), which use data from a large number of buildings to create more versatile and efficient energy forecasting tools. To achieve this, we used Transformer models, a type of machine learning approach known for handling large datasets efficiently and recognizing long-term patterns. We tested whether GEMs could provide better predictions than traditional models designed for individual buildings. Our analysis included data from over 1,000 buildings and used two strategies: zero-shot (using the model without further adjustments) and fine-tuning (adapting the model to specific data). The results showed that GEMs were more accurate than traditional models, improving prediction accuracy by up to 28% and reducing the time needed for training by over 50%. Additionally, GEMs outperformed other advanced methods of energy forecasting. We also examined how different factors, such as the amount of data and the length of the data sequences, influenced the model’s performance. The findings suggest that using even larger datasets could lead to further improvements, opening the possibility of creating Large Energy Models (LEMs) that can make predictions on a global scale.
Building Energy Management Time Series Forecasting Transformers Scalable AI Foundation Models
3	Leveraging foundation models towards semantic world representations for robotics Kuwajerwala, Alihusein 06 1900 A central challenge in robotics is building actionable world representations. To perform complex tasks, robots need to build a 3D representation of their environment that captures the geometric, visual, and semantic information of the scene and is efficient to use. Existing approaches encode semantic information using a (finite) set of semantic class labels, such as “person” and “chair”. However, for ambiguous instructions to a robot, such as “get me a healthy snack”, this approach is insufficient. As a result, recent works have leveraged large pre-trained neural networks called “foundation models”, whose learned latent representations offer more flexibility than class labels, but these approaches can be inefficient. For example, they may require prohibitive amounts of video memory or make the map impossible to edit. In this work, we construct 3D scene representations that leverage foundation models to encode semantics, allowing for open-vocabulary and multimodal queries, while still being scalable and efficient. We first present ConceptFusion, which builds open-vocabulary 3D maps by assigning each 3D point a feature vector that encodes semantics, enabling nuanced and multimodal queries, but at high memory cost. We then present ConceptGraphs, which builds upon the previous approach with a scene graph structure that assigns semantic feature vectors to objects instead of points, increasing efficiency while also enabling planning over the constructed scene graph. Neither system requires any additional training or fine-tuning of models, yet both enable robots to perform novel search and navigation tasks, as shown by our real-world experiments.
Robotics Scene Representation 3D Mapping Robotics Deep Learning Foundation Models Semantic Information Open-Vocabulary Methods
4	The role of continual learning and adaptive computation in improving computational efficiency of deep learning Gupta, Kshitij 01 1900 Over the past decade, significant progress has been made in the field of AI, primarily due to advances in machine learning, deep learning, and the use of large-scale models. However, as these models scale, they present new challenges with respect to handling large datasets and being computationally efficient. This thesis proposes approaches to reducing the computational costs of training and inference in artificial intelligence (AI) systems. Specifically, this work investigates how Continual Learning and Adaptive Computation techniques can be used to reduce training and inference costs while preserving the performance levels of these systems. The findings of the first article show that foundation models can be continually pre-trained through a method of warm-up and replay, which significantly decreases training computational costs while preserving performance compared to training from scratch. Subsequently, the thesis investigates how adaptive computation strategies, when combined with memory, can be used to create more computationally efficient AI agents at inference time for complex reasoning tasks, such as the strategic game of Sokoban. Our results show that models can deliver similar or improved performance while using significantly fewer computational resources. Findings from this study have broad implications for improving the computational efficiency of AI systems, ultimately supporting the development of more affordable, accessible, and efficient AI technologies.
Computational Efficiency Foundation Models Continual Learning Catastrophic Forgetting Adaptive computation Memory Planning Offline Reinforcement Learning
5	Toward trustworthy deep learning : out-of-distribution generalization and few-shot learning Gagnon-Audet, Jean-Christophe 04 1900 Artificial Intelligence (AI) is a rapidly advancing field, with data-driven approaches known as machine learning at the forefront of many recent breakthroughs. However, while machine learning models have shown remarkable performance in tasks such as image recognition and generation, text generation and translation, and speech processing, they are known to fail silently under common conditions. This is because modern AI algorithms inherit biases from the data used to train them, leading to incorrect predictions when they encounter new data that differs from the training data. This problem is known as distribution shift or out-of-distribution (OOD) failure. It makes modern AI untrustworthy and is a significant barrier to the safe, widespread deployment of AI. Failing to address the OOD generalization failure of machine learning could result in situations that put lives in danger or make it impossible to deploy AI in any significant manner. This thesis aims to tackle this issue and proposes solutions to ensure the safe and reliable deployment of modern deep learning models. We present three papers that cover different directions in solving the OOD generalization failure of machine learning. The first paper proposes a direct approach that demonstrates improved performance over the state of the art. The second paper lays the groundwork for future research in OOD generalization in time series, while the third paper provides a straightforward solution for fixing generalization failures of large pretrained models when they are fine-tuned on downstream tasks. These papers make valuable contributions to the field and provide promising avenues for future research in OOD generalization.
machine learning deep learning neural networks representation learning domain generalization distribution shift out-of-distribution generalization foundation models few-shot learning
