
Enhanced 3D Object Detection And Tracking In Autonomous Vehicles: An Efficient Multi-modal Deep Fusion Approach

Priyank Kalgaonkar (10911822) 03 September 2024
This dissertation addresses a central challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Camera-only systems struggle with visibility over long distances, while radar-only systems cannot recognize semantic features such as stop signs that are crucial for safe navigation.

To overcome this limitation, the research introduces a novel deep camera-radar fusion approach using neural networks, ensuring reliable AV perception regardless of weather or lighting. Cameras, like human vision, capture rich semantic information, whereas radar penetrates obstacles such as fog and darkness.

The thesis presents NeXtFusion, an efficient camera-radar fusion network designed for robust AV perception. Building on the efficient single-sensor NeXtDet network, NeXtFusion significantly improves object detection accuracy and tracking. A notable feature is its attention module, which refines critical feature representations for object detection and minimizes information loss when fusing camera and radar data.

Extensive experiments on large-scale datasets (Argoverse, Microsoft COCO, and nuScenes) evaluate the capabilities of NeXtDet and NeXtFusion. NeXtFusion excels at detecting small and distant objects compared with existing methods, achieving a state-of-the-art mAP of 0.473 on the nuScenes validation set and outperforming competitors such as OFT by 35.1% and MonoDIS by 9.5%.

Beyond mAP, NeXtFusion also performs well on other key metrics, including mATE (0.449) and mAOE (0.534), underscoring its overall effectiveness in 3D object detection. Visualizations of real-world nuScenes scenarios processed by NeXtFusion demonstrate its ability to handle diverse and challenging environments.
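The attention-gated fusion described above can be sketched minimally. The exact form of NeXtFusion's attention module is not given in the abstract, so the scalar relevance scores here are placeholder inputs rather than the network's learned quantities:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_fuse(cam_feat, rad_feat, score_cam, score_rad):
    """Convex, attention-weighted blend of one camera and one radar feature
    vector. score_* stand in for the per-cell relevance logits a learned
    attention module would produce (hypothetical: the abstract does not
    specify NeXtFusion's attention form)."""
    w_cam, w_rad = softmax([score_cam, score_rad])
    return [w_cam * c + w_rad * r for c, r in zip(cam_feat, rad_feat)]
```

Because the gate weights sum to one, the fused feature always lies between the two modality features, so neither sensor's information is discarded outright.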

LiDAR-Based 3D Object Detection Using YOLOv8

Swetha Suresh Menon (18813667) 03 September 2024
Autonomous vehicles have gained substantial traction as the future of transportation, necessitating continuous research and innovation. While 2D object detection and instance segmentation methods have made significant strides, 3D object detection offers far greater spatial precision. Deep-network-based 3D object detection, coupled with sensor fusion, has become indispensable for self-driving vehicles, enabling a comprehensive grasp of the spatial geometry of physical objects. In our study of LiDAR-based 3D object detection on point clouds, we propose a novel architecture based on the You Only Look Once (YOLO) framework. The model combines the efficiency and accuracy of YOLOv8, a fast, state-of-the-art 2D object detector, with the real-time 3D detection capability of the Complex-YOLO model. By using YOLOv8 as the backbone network and employing the Euler Region Proposal (ERP) method, our approach achieves rapid inference speeds, surpassing other object detection models while upholding high accuracy. Experiments on the KITTI dataset demonstrate the superior efficiency of the new architecture: it outperforms its predecessors, advancing the field of 3D object detection for autonomous vehicles.
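The Euler Region Proposal mentioned above regresses a box heading as the real and imaginary parts of a unit complex number rather than the raw angle, which avoids the ±π wrap-around discontinuity. A minimal sketch of that encoding (ERP additionally ties these outputs into the region-proposal loss, which is omitted here):

```python
import math

def encode_yaw(theta):
    """Encode a box heading as (cos, sin), i.e. the real and imaginary
    parts of e^{i*theta}. Regressing this pair keeps nearby angles close
    in the output space even across the ±pi boundary."""
    return math.cos(theta), math.sin(theta)

def decode_yaw(re, im):
    """Recover the heading from (possibly unnormalized) network outputs;
    atan2 handles all four quadrants and scale-invariance of (re, im)."""
    return math.atan2(im, re)
```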

Population Distribution Mapping through the Detection of Built-up Areas in Google Earth Images of Heterogeneous Regions Using Deep Learning

CASSIO FREITAS PEREIRA DE ALMEIDA 08 February 2018
The importance of precise information about population distribution is widely acknowledged. The census is the most reliable and complete source of this information, and its data are released aggregated by census sector. These sectors are operational units with irregular shapes and sizes, which hinders spatial analysis of the associated data. Transforming sector data onto a regular grid, with suitable estimates per cell, would facilitate such analysis. One methodology for this transformation is remote-sensing image classification to identify the buildings where people live. Building detection is a complex task because building characteristics and image quality vary greatly. Most existing methods are complex and highly dependent on specialists; automatic methods require large annotated training datasets and are sensitive to image quality, building characteristics, and environment. In this thesis we propose an automatic building-detection method based on a deep learning architecture that uses a relatively small, highly variable image set, overcoming the limitations of existing processes, and shows good results compared with the state of the art. An annotated dataset of built-up areas was constructed covering 12 regions of Brazil; the images differ in quality and show great variability in building characteristics and geographic environment. As a proof of concept, the building-area classification was used in dasymetric methods to estimate population on a grid. Compared with the usual method, it showed promising results, enabling improved estimate quality.
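The dasymetric step described above reallocates each census sector's population to grid cells in proportion to detected built-up area. A minimal sketch, assuming simple proportional weighting (real pipelines add land-use masks and further rules):

```python
def dasymetric_allocate(sector_pop, built_area_by_cell):
    """Distribute one census sector's population over grid cells in
    proportion to each cell's detected built-up area. A minimal sketch
    of the dasymetric reallocation; the thesis's pipeline handles more
    cases than this illustration."""
    total = sum(built_area_by_cell.values())
    if total == 0:
        # Fallback assumption: uniform split when no buildings were detected.
        n = len(built_area_by_cell)
        return {cell: sector_pop / n for cell in built_area_by_cell}
    return {cell: sector_pop * a / total
            for cell, a in built_area_by_cell.items()}
```

By construction the allocation conserves the sector total, so grid estimates can still be aggregated back to match the census counts exactly.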

A MULTI-HEAD ATTENTION APPROACH WITH COMPLEMENTARY MULTIMODAL FUSION FOR VEHICLE DETECTION

Nujhat Tabassum (18010969) 03 June 2024
In autonomous vehicle technology, the Multimodal Vehicle Detection Network (MVDNet) represents a significant advance, particularly under challenging weather conditions. This work enhances MVDNet by integrating a multi-head attention layer aimed at refining its performance. The added layer is a pivotal modification, improving the network's ability to process and fuse multimodal sensor information efficiently. Comprehensive testing on a training dataset derived from the Oxford Radar RobotCar validates the improvement: the Multi-Head MVDNet outperforms related conventional models, particularly in Average Precision (AP), under challenging environmental conditions. The proposed model contributes to the field of autonomous vehicle detection and underscores the potential of sophisticated sensor fusion techniques to overcome environmental limitations.
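A multi-head attention layer of the kind added to MVDNet splits the feature dimension into heads, attends per head, and concatenates the results. A bare-bones sketch for a single query, with the learned projection matrices omitted for brevity:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

def multi_head(q, keys, values, num_heads):
    """Split the feature dimension into num_heads chunks, attend per head,
    and concatenate -- the core of a multi-head layer. A real layer also
    learns per-head query/key/value and output projections."""
    d = len(q)
    assert d % num_heads == 0
    h = d // num_heads
    out = []
    for i in range(num_heads):
        sl = slice(i * h, (i + 1) * h)
        out += attention(q[sl], [k[sl] for k in keys],
                         [v[sl] for v in values])
    return out
```

Each head can thus specialize on a different subspace of the fused lidar/radar features, which is the intuition behind adding the layer to a fusion network.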

Multilevel Datenfusion konkurrierender Sensoren in der Fahrzeugumfelderfassung / Multilevel data fusion of competing sensors for vehicle environment perception

Haberjahn, Mathias 21 November 2013
This dissertation contributes to increasing the accuracy and reliability of sensor-based object recognition and tracking in a vehicle's surroundings. Based on a detection system consisting of a stereo camera and a multi-layer laser scanner, partly novel procedures are introduced for the whole sensor-data processing chain. In addition, a new framework for fusing heterogeneous sensor data is introduced which improves object determination by combining fusion results from the different processing levels. After a description of the sensor setup, the procedures developed for calibrating and mutually orienting the sensor pair are presented. For the segmentation of spatial point data, existing procedures are extended to take the sensor's measurement accuracy and characteristics into account. In the subsequent object tracking, a new computation-optimized approach for associating object hypotheses is presented, together with a model for adaptive determination and tracking of an object reference point that surpasses classical tracking of the object center in track accuracy. The fusion framework merges sensor data at any of three processing levels: point, object, and track. A sensor-independent approach for fusing point data is presented which, compared with the other fusion levels and the single sensors, yields the most precise object description. For the higher fusion levels, new procedures were developed that exploit the competing sensor information to detect and reduce detection and processing errors. Finally, it is shown how the error-reducing procedures of the upper fusion levels can be combined with the optimal object description of the lower fusion level for an optimal overall object determination. The effectiveness of the developed methods was verified in simulation and in real measurement scenarios.
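At the object and track levels, fusing competing estimates of the same state is commonly done by inverse-variance weighting; the thesis's multilevel scheme is more elaborate, but this standard building block illustrates the idea:

```python
def fuse_estimates(x1, var1, x2, var2):
    """Inverse-variance (Kalman-style) fusion of two competing sensor
    estimates of the same scalar state. The more certain estimate gets
    the larger weight, and the fused variance is smaller than either
    input variance. A textbook building block, not the thesis's own
    fusion formula."""
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)
    return fused, fused_var
```

Applied per state component (position, velocity), this is how competing stereo-camera and laser-scanner tracks can be merged into one more accurate track.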

Generation and Detection of Objects in Documents by Deep Learning Neural Network Models (DeepDocGen)

LOICK GEOFFREY HODONOU 06 February 2025
The effectiveness of human-machine conversation systems, such as chatbots and virtual assistants, is directly related to the amount and quality of knowledge available to them. In the digital age, the diversity and quality of data have increased significantly across many formats. Among these, PDF (Portable Document Format) stands out as one of the most widely used, serving sectors such as business, education, and research. PDF files contain considerable structured content: text, headings, lists, tables, images, and more. That content can be extracted with dedicated tools such as OCR (Optical Character Recognition), PdfMiner, and Tabula, which have proven suitable for the task. These tools, however, struggle with the complex and varied presentation of PDF documents: extraction accuracy is compromised by diverse layouts, non-standardized formats, and embedded graphic elements, often forcing manual post-processing. Object detection, a branch of computer vision that locates and classifies instances in images with dedicated models, is proving a viable way to accelerate the work of tools like OCR, PdfMiner, and Tabula and to improve their accuracy. Being based on deep learning, detection models require not only substantial training data but, above all, high-quality annotations, which directly determine accuracy and robustness. The diversity of layouts and graphic elements in PDFs adds a further layer of complexity, demanding representatively annotated data so the models can learn to handle all possible variations.

Given the volume of data needed for training, annotation quickly becomes a tedious, time-consuming task requiring humans to identify and label each relevant element manually; it is also prone to human error, often requiring additional checks and corrections. To balance data quantity, annotation time, and annotation quality, this work proposes a pipeline that, given a limited number of PDF documents annotated with the categories text, title, list, table, and image, creates as many new, similar document layouts as the user requests. The pipeline then fills the new layouts with content, producing synthetic document images together with their annotations. With its simple, intuitive, and scalable structure, the pipeline can support active learning, allowing detection models to be trained continuously and become more effective and robust on real documents. In our experiments comparing three detection models, RT-DETR (Real-Time DEtection TRansformer) achieved the best results, with a mean Average Precision (mAP) of 96.30 percent, surpassing Mask R-CNN (Region-based Convolutional Neural Networks) and Mask DINO (Mask DETR with Improved Denoising Anchor Boxes). RT-DETR's superiority indicates its potential to become a reference solution for detecting features in PDF documents. These promising results pave the way for more efficient and reliable automatic document processing.
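The generation pipeline's output, synthetic layouts plus annotations, can be illustrated with a toy sampler. Real layouts come from a model trained on the annotated seed documents, not from uniform random boxes; the COCO-style annotation structure is the point of this sketch:

```python
import random

# The five layout classes named in the abstract.
CATEGORIES = ["text", "title", "list", "table", "image"]

def sample_layout(page_w, page_h, n_boxes, rng):
    """Sample a synthetic page layout as a list of COCO-style annotations
    (bbox in [x, y, w, h] pixel form plus a category label). A toy
    stand-in for the learned generation step described above."""
    anns = []
    for i in range(n_boxes):
        w = rng.randint(50, page_w // 2)
        h = rng.randint(20, page_h // 4)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        anns.append({"id": i, "bbox": [x, y, w, h],
                     "category": rng.choice(CATEGORIES)})
    return anns
```

Because the generator emits the annotations alongside the rendered page, every synthetic image arrives pre-labeled, which is what removes the manual annotation bottleneck.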

Study of Semantic Segmentation Methods for Vein-Type Objects (master's thesis)

Melnikov, V. A. January 2024
The objects of study are digital images of stones in an open-pit mine. The aim of the work is to develop and implement an algorithm for detecting and segmenting asbestos veins using artificial intelligence. The study presents an analytical review of methods and of existing technical and software systems that apply AI segmentation methods to the main benchmark datasets. Existing models were analyzed; new models based on convolutional networks (UNet and Attention UNet) and on transformers (SegFormer) were tested, and the best algorithm for the asbestos-vein segmentation task was proposed. The resulting model solves the vein-segmentation problem effectively and achieves acceptable accuracy with low computing power. The developed algorithm is applicable beyond analyzing asbestos content in quarry images: the trained models can also be used to identify defects in manufactured products and in medicine.
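The abstract reports "acceptable accuracy" without naming a metric; segmentation quality is conventionally measured with Intersection-over-Union and the Dice score, which for binary masks can be computed as:

```python
def iou_and_dice(pred, target):
    """Intersection-over-Union and Dice score for binary masks given as
    nested lists of 0/1 values. The conventional segmentation metrics;
    the thesis does not state which one it uses."""
    inter = fp = fn = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p and t:
                inter += 1
            elif p:
                fp += 1   # false positive pixel
            elif t:
                fn += 1   # false negative pixel
    union = inter + fp + fn
    iou = inter / union if union else 1.0
    dice = 2 * inter / (2 * inter + fp + fn) if (inter + fp + fn) else 1.0
    return iou, dice
```

Dice weights the intersection twice, so it is more forgiving than IoU on thin structures such as veins, which is why both are usually reported together.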

Study of Methods for Assessing the Output of the Ural-Asbest Enterprise Using a Computer Vision System (master's thesis)

Chilingaryan, D. G. January 2024
This thesis by David Grayrovich Chilingaryan is devoted to estimating the output of the Ural-Asbest enterprise using a computer vision system. It surveys modern methods for semantic segmentation and object detection in images, including the UNet, YOLOv9, and Swin neural networks. Particular attention is paid to data preprocessing, model selection and tuning, and performance analysis on real production data. The results demonstrate the high accuracy and efficiency of the proposed methods, which make it possible to automate the assessment of asbestos content in rock, reduce time costs, and minimize workers' contact with the harmful material. The practical significance lies in integrating the developed solutions into the enterprise's production processes, improving quality control, and protecting workers' health.

Development of a Camera-Based System for Detecting Vehicle Departure Using Computer Vision Technologies (master's thesis)

Orlov, A. A. January 2024
This work investigates and develops a system that automatically registers the start and end of car test drives using computer vision. The main research methods include deep learning, in particular single-stage detectors, together with video analysis and timestamping. The work has both theoretical and practical significance: it contributes to applying computer vision to automate processes in car dealerships and provides a practical tool for improving the efficiency of dealership operations.
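Turning per-frame detections into a departure event requires debouncing against single-frame detector misses. A sketch of such event logic, with an assumed absence threshold (the thesis does not specify its rule):

```python
def detect_departure(present_flags, min_absent=5):
    """Return the index of the frame where a departure began: the first
    frame of a run of min_absent consecutive frames in which the vehicle
    was not detected. The run-length requirement debounces single-frame
    detector misses. min_absent is an assumed parameter, not a value
    from the thesis."""
    absent_run = 0
    for i, present in enumerate(present_flags):
        if present:
            absent_run = 0
        else:
            absent_run += 1
            if absent_run == min_absent:
                return i - min_absent + 1  # frame where absence began
    return None  # vehicle never stayed absent long enough
```

Pairing the departure timestamp with the symmetric arrival event yields the start and end times of a test drive.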

Development of a Computer Vision System for Determining the Type of Crushed Stone Fraction (master's thesis)

Akhmetov, V. M. January 2024
The main objective of this thesis is to develop a computer vision system for determining the type of crushed stone fraction, and to identify the most effective approach by comparing two computer vision tasks: object detection and classification. The first part analyzes existing neural-network methods and algorithms for image classification and object detection. For the classification task, the comparison covered ResNet, EfficientNet, DeiT, and TinyViT; for object detection, YOLO, Faster R-CNN, and SSD. In the second part, an object detection model and several classification models were trained, and their performance on the target task, determining the crushed stone fraction, was compared. The third part develops the computer vision system itself: two Docker containers and a Uvicorn server running a FastAPI application were deployed to operate the system.
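Mapping the detector's per-stone size estimates to a single fraction label can be done by majority vote over size bins. The bins below are illustrative assumptions, not the enterprise's actual product classes:

```python
# Assumed fraction bins in millimetres; real crushed-stone products follow
# standardized size classes that the abstract does not enumerate.
FRACTIONS = [(5, 20, "5-20 mm"), (20, 40, "20-40 mm"), (40, 70, "40-70 mm")]

def classify_fraction(diameters_mm):
    """Vote each detected stone's estimated diameter into a fraction bin
    and return the majority bin. A post-processing sketch for turning
    per-stone detections into one fraction label; out-of-range sizes
    are simply ignored."""
    votes = {}
    for d in diameters_mm:
        for lo, hi, label in FRACTIONS:
            if lo <= d < hi:
                votes[label] = votes.get(label, 0) + 1
                break
    if not votes:
        return None
    return max(votes, key=votes.get)
```

A detection model supplies the per-stone diameters here, whereas a whole-image classifier would predict the fraction label directly, which is exactly the trade-off the thesis compares.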
