  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Návrh algoritmu pro anonymizaci ultrazvukových dat na úrovni snímku / Design of algorithm for anonymization of ultrasound data

Bugnerová, Pavla January 2017 (has links)
This master’s thesis focuses on the anonymization of ultrasound data in DICOM format. A Haar wavelet, a member of the Daubechies wavelet family, is used to detect text areas in the image. Text is extracted from the image using a free tool, the Tesseract OCR engine. Finally, the detected text is compared to sensitive data from the DICOM metadata using the Levenshtein edit-distance algorithm.
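The final matching step described above can be sketched with the standard dynamic-programming edit distance; the word list, metadata values, and distance threshold below are illustrative assumptions, not values from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def flag_sensitive(ocr_words, dicom_values, max_dist=2):
    """Return OCR words whose edit distance to any DICOM metadata value
    is small enough to count as a match (illustrative threshold), which
    is useful because OCR output often contains small recognition errors."""
    return [w for w in ocr_words
            if any(levenshtein(w.lower(), v.lower()) <= max_dist
                   for v in dicom_values)]

# "Smlth" is one substitution away from the metadata value "Smith",
# so it is flagged even though the OCR misread a character.
flagged = flag_sensitive(["Smlth", "probe"], ["Smith"])
```

The tolerance-based comparison is the point of using edit distance here: exact string matching would miss OCR misreads of patient names.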
12

Reinforcement Learning for Parameter Control of Image-Based Applications

Taylor, Graham January 2004 (has links)
The significant amount of data contained in digital images presents barriers to methods of learning from the information they hold. Noise and the subjectivity of image evaluation further complicate such automated processes. In this thesis, we examine a particular area in which these difficulties are experienced. We attempt to control the parameters of a multi-step algorithm that processes visual information. A framework for approaching the parameter selection problem using reinforcement learning agents is presented as the main contribution of this research. We focus on the generation of state and action spaces, as well as task-dependent reward. We first discuss the automatic determination of fuzzy membership functions as a specific case of the above problem. The entropy of a fuzzy event is used as a reinforcement signal. Membership functions representing brightness have been automatically generated for several images. The results show that the reinforcement learning approach is superior to an existing simulated-annealing-based approach. The framework has also been evaluated by optimizing ten parameters of the text detection for semantic indexing algorithm proposed by Wolf et al. Image features are defined and extracted to construct the state space. Generalization to reduce the state space is performed with the fuzzy ARTMAP neural network, offering much faster learning than the previous tabular implementation, despite a much larger state and action space. Difficulties in using a continuous action space are overcome by employing the DIRECT method for global optimization without derivatives. The chosen parameters are evaluated using recall and precision metrics and are shown to be superior to the parameters previously recommended. We further discuss the interplay between intermediate and terminal reinforcement.
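The core idea of reward-driven parameter selection can be sketched as a tiny tabular value-learning agent; the single-state (bandit) setting, the candidate parameter values, and the mock reward below are illustrative simplifications, not the thesis's actual framework.

```python
import random

def run_bandit(param_values, reward_fn, episodes=500, eps=0.1, seed=0):
    """Epsilon-greedy action-value learning over a discrete set of
    parameter values: a minimal, single-state illustration of using
    a reinforcement signal to drive parameter selection."""
    rng = random.Random(seed)
    q = {p: 0.0 for p in param_values}   # estimated value per parameter
    n = {p: 0 for p in param_values}     # visit counts
    for _ in range(episodes):
        if rng.random() < eps:           # explore a random parameter
            p = rng.choice(param_values)
        else:                            # exploit the current best estimate
            p = max(q, key=q.get)
        r = reward_fn(p)
        n[p] += 1
        q[p] += (r - q[p]) / n[p]        # incremental mean update
    return max(q, key=q.get)

# Mock deterministic reward peaking at 0.5 (a stand-in for e.g. the
# entropy of a fuzzy event, or a recall/precision score).
best = run_bandit([0.1, 0.3, 0.5, 0.7, 0.9],
                  reward_fn=lambda p: 1.0 - abs(p - 0.5))
```

The thesis's framework is far richer (image-feature state spaces, fuzzy ARTMAP generalization, continuous actions via DIRECT); this sketch only shows the reward-feedback loop common to all of it.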
13

Detection of Frozen Video Subtitles Using Machine Learning

Sjölund, Jonathan January 2019 (has links)
When subtitles are burned into a video, an encoder error can sometimes cause the same subtitle to be burned into several frames, so that the subtitles become frozen. This thesis provides a way to detect frozen video subtitles with the help of an implemented text detector and classifier. Two types of classifiers, naïve classifiers and machine learning classifiers, are tested and compared on a variety of videos to see how much a machine learning approach can improve performance. The naïve classifiers are evaluated using ground-truth data to gain an understanding of the importance of good text detection. To understand the difficulty of the problem, two different machine learning classifiers are tested: logistic regression and random forests. The results show that machine learning improves performance over the naïve classifiers, raising specificity from approximately 87.3% to 95.8% and accuracy from 93.3% to 95.5%. Random forests achieve the best overall performance, but the difference compared to logistic regression is small enough that more computationally complex machine learning classifiers are not necessary. Evaluation against the ground truth shows that the weaker naïve classifiers would gain at least 4.2% in accuracy from better detections; thus a better text detector is warranted. This thesis shows that machine learning is a viable option for detecting frozen video subtitles.
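The reported gains can be made concrete with the standard confusion-matrix definitions of specificity and accuracy; the counts below are made-up illustrations, not the thesis's data.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Specificity and accuracy from raw confusion-matrix counts,
    the two metrics used above to compare naive and learned classifiers."""
    specificity = tn / (tn + fp)                    # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)      # overall correctness
    return specificity, accuracy

# Made-up counts for a frame-level frozen/not-frozen decision.
spec, acc = confusion_metrics(tp=40, fp=5, tn=115, fn=3)
```

Specificity matters here because falsely flagging a healthy subtitle stream as frozen (a false positive) is the failure mode the thesis measures most directly.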
15

Reconhecimento de texto e rastreamento de objetos 2D/3D / Text recognition and 2D/3D object tracking

Minetto, Rodrigo, 1983- 20 August 2018 (has links)
Orientadores: Jorge Stolfi, Neucimar Jerônimo Leite / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: In this thesis we address three computer vision problems: (1) the detection and recognition of flat text objects in images of real scenes; (2) the tracking of such text objects in a digital video; and (3) the tracking of an arbitrary three-dimensional rigid object with known markings in a digital video. For each problem we developed innovative algorithms, which are at least as accurate and robust as other state-of-the-art algorithms. Specifically, for text classification we developed (and extensively evaluated) a new HOG-based descriptor specialized for Roman script, which we call T-HOG, and showed its value as a post-filter for an existing text detector (SNOOPERTEXT). We also improved the SNOOPERTEXT algorithm by using a multi-scale technique to handle widely different letter sizes while limiting the sensitivity of the algorithm to various artifacts. For text tracking, we describe four basic ways of combining a text detector and a text tracker, and we developed a specific tracker based on a particle filter which exploits the T-HOG recognizer. For rigid object tracking we developed a new accurate and robust algorithm (AFFTRACK) that combines the KLT feature tracker with an improved camera calibration procedure. We extensively tested our algorithms on several benchmarks well known in the literature. We also created publicly available benchmarks for the evaluation of text detection, text tracking, and rigid object tracking algorithms. / Doutorado / Ciência da Computação / Doutor em Ciência da Computação
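The idea behind a HOG-style descriptor such as T-HOG can be sketched on a tiny grayscale patch; this is a generic gradient-orientation histogram for a single cell, not the thesis's actual T-HOG layout or normalization.

```python
import math

def orientation_histogram(patch, bins=8):
    """Histogram of gradient orientations over a 2D grayscale patch
    (a list of equal-length rows), weighted by gradient magnitude.
    A minimal stand-in for one cell of a HOG-style descriptor."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi       # unsigned orientation
            hist[min(int(ang / math.pi * bins), bins - 1)] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]                 # L1-normalized

# A vertical edge: all gradients point horizontally, so the whole
# histogram mass lands in the bin around orientation 0.
patch = [[0, 0, 255, 255]] * 4
hist = orientation_histogram(patch)
```

Text regions produce characteristic orientation distributions (strong, alternating stroke edges), which is what makes such histograms useful as a text/non-text post-filter.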
16

Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms / Amélioration d'une chaîne de détection de texte et proposition d'un nouveau protocole d'évaluation d'algorithmes de détection de texte

Calarasanu, Stefania Ana 11 December 2015 (has links)
The growing number of text detection approaches proposed in the literature requires a rigorous performance evaluation and ranking. An evaluation protocol relies on three elements: a reliable text reference, a matching strategy, and finally a set of metrics. The few existing evaluation protocols often lack accuracy, either due to inconsistent matching or due to unrepresentative metrics. In this thesis we propose a new evaluation protocol that tackles most of the drawbacks faced by currently used evaluation methods. This work is focused on three main contributions: firstly, we introduce a complex text-reference representation that does not constrain text detectors to adopt a specific detection granularity level or annotation representation; secondly, we propose a set of matching rules capable of evaluating any type of scenario that can occur between a text reference and a detection; and finally, we show how we can analyze a set of detection results, not only through a set of metrics, but also through an intuitive visual representation. A frequent challenge for many text understanding systems is to tackle the variety of text characteristics in born-digital and natural-scene images, for which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images because the camera capture angle is not normal to the plane containing the text regions. Despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. In this thesis we also propose a rectification procedure capable of correcting highly distorted texts, evaluated on a very challenging dataset.
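A minimal version of the matching-plus-metrics pipeline such a protocol builds on is one-to-one box matching by intersection-over-union followed by precision and recall; the 0.5 threshold and greedy matching below are common conventions, not this thesis's actual (richer) matching rules.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(detections, references, thr=0.5):
    """Greedily match each detection to its best-overlapping unused
    reference box, then compute standard precision and recall."""
    unmatched = list(references)
    tp = 0
    for d in detections:
        best = max(unmatched, key=lambda r: iou(d, r), default=None)
        if best is not None and iou(d, best) >= thr:
            unmatched.remove(best)
            tp += 1
    prec = tp / len(detections) if detections else 0.0
    rec = tp / len(references) if references else 0.0
    return prec, rec

# One detection overlaps a reference well; the other is a false positive.
refs = [(0, 0, 10, 10), (20, 0, 30, 10)]
dets = [(1, 0, 11, 10), (50, 50, 60, 60)]
prec, rec = precision_recall(dets, refs)
```

The thesis's point is precisely that this naive one-to-one scheme breaks down for split and merged detections, which its matching rules handle explicitly.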
17

DETECTING UNSTRUCTURED TEXT IN STRUCTURAL DRAWINGS USING MACHINE VISION

Jean Herfina Kwannandar (13171761) 29 July 2022 (has links)
The focus of this thesis is the application of text detection, a field within computer vision, to structural drawings. Understanding a structural system and conducting a rapid assessment of an existing structure would both benefit from the ability to read the information contained within the drawing or related engineering documents. Extracting engineering data manually from structural drawings is incredibly time-consuming and expensive. In addition, the variation in human engineers' experience makes the output prone to errors and false evaluations. In this study, the latest developments in computer vision, especially for text detection, are explored and evaluated using large volumes of words in structural drawings. The goal is to read text in structural drawings, which usually contains feature noise due to the high complexity of structural annotations and lines. The dataset consists of computer-generated structural drawings with different word shapes and font types in various text orientations. The utilized structural drawings are floor plans, and thus contain structural details filled with various structural element labels and dimensions. Fine-tuning of the pre-trained model yields significant performance in unstructured text detection, especially in the model's recall. The results demonstrate that the developed predictive modeling workflow and its computational requirements are sufficient for unstructured text detection in structural drawings.
18

多語言的場景文字偵測 / Multilingual Scene Text Detection

梁苡萱, Liang, Yi Hsuan Unknown Date (has links)
Text in an image usually carries important information related to the scene, such as location, name, direction, and warnings. As such, robust and efficient scene text detection has gained increasing attention in computer vision recently. However, most existing scene text detection methods are devised to process Latin-based languages, and the few studies that investigated Chinese text reported detection rates inferior to the results for English. In this thesis, we propose a multilingual scene text detection algorithm for both Chinese and English. The method comprises four stages: 1. Preprocessing with a bilateral filter to make text regions more stable. 2. Extracting candidate text edges and regions using the Canny edge detector and Maximally Stable Extremal Regions (MSER), respectively, then combining the two features for more robust results. 3. Linking candidate characters: considering both horizontal and vertical writing directions, character candidates are clustered into text candidates using geometric constraints. 4. Classifying candidate texts with a support vector machine (SVM) on text features to separate text from non-text areas. Experimental results show that the proposed method detects both Chinese and English text and achieves satisfactory performance compared to approaches designed only for English.
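The character-linking stage (step 3 above) can be sketched as greedy geometric grouping of candidate boxes along one writing direction; the alignment and gap thresholds below are illustrative assumptions, not the thesis's tuned constraints.

```python
def link_characters(boxes, axis=0, align_tol=0.5, gap_tol=1.5):
    """Greedily chain candidate character boxes (x, y, w, h) into text
    lines along one axis (0 = horizontal, 1 = vertical): centers must
    align on the perpendicular axis and gaps must stay small relative
    to character size."""
    def center(b):
        return (b[0] + b[2] / 2, b[1] + b[3] / 2)
    other = 1 - axis
    boxes = sorted(boxes, key=lambda b: b[axis])
    lines, used = [], set()
    for i, b in enumerate(boxes):
        if i in used:
            continue
        line = [b]
        used.add(i)
        size = max(b[2], b[3])                 # seed character size
        for j in range(i + 1, len(boxes)):
            if j in used:
                continue
            c, last = boxes[j], line[-1]
            aligned = abs(center(c)[other] - center(last)[other]) < align_tol * size
            gap = c[axis] - (last[axis] + last[2 + axis])
            if aligned and gap < gap_tol * size:
                line.append(c)
                used.add(j)
        lines.append(line)
    return lines

# Three boxes on one horizontal line plus one stray box elsewhere:
# the three are chained, the stray forms its own singleton group.
boxes = [(0, 0, 10, 10), (12, 1, 10, 10), (25, 0, 10, 10), (0, 100, 10, 10)]
lines = link_characters(boxes, axis=0)
```

Running the same function with `axis=1` handles vertical writing, which is the reason the thesis treats both directions.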
19

Rozpoznávání topologických informací z plánu křižovatky / Topology Recognition from Crossroad Plan

Huták, Petr January 2016 (has links)
This master's thesis describes the research, design, and development of a system for topology recognition from a crossroad plan. It explains the methods used for image processing, image segmentation, and object recognition. It describes approaches to processing maps represented by raster images, and the target software into which the final product of the practical part of the project will be integrated. The thesis focuses mainly on comparing different approaches to feature extraction from raster maps and determining their semantic meaning. The practical part of the project is implemented in C# with the OpenCV library.
20

Detecting and comparing Kanban boards using Computer Vision / Detektering och jämförelse av Kanbantavlor med hjälp av datorseende

Behnam, Humam January 2022 (has links)
This thesis investigates the problem of detecting and tracking sticky notes on Kanban boards using classical computer vision techniques. Some alternatives for digitizing sticky notes currently exist, but none keeps track of notes that have already been digitized, so duplicate notes can be created when scanning multiple images of the same Kanban board. Kanban boards are widely used in various industries, and being able to recognize, and possibly in the future even digitize, entire Kanban boards could provide users with extended functionality. The implementation presented in this thesis is able to, given two images, detect the Kanban boards in each image and rectify them. The rectified images are then sent to the Google Cloud Vision API for text detection, and are also used to detect all the sticky notes. The positional information of the notes and columns of the Kanban boards is then used to filter the text detection, finding the text inside each note as well as the header text of each column. Between the two images, the columns are compared and matched, as are notes of the same color. If columns or notes in one image have no match in the second image, the boards are concluded to be different, and the user is informed why. If all columns and notes in one image have matches in the second image but some notes have moved, the user is informed which notes have moved and how. The experiments conducted in this thesis show that the implementation works well but is confined to strict requirements, making it unsuitable for commercial use. The biggest remaining problem is to make the implementation more general, i.e., with respect to the Kanban board layout, the sticky notes' shapes and colors, and their actual content.
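The board-comparison step can be sketched as a diff of two board snapshots where each note is identified by its color and recognized text; this data model is an illustrative assumption, not the thesis's actual representation.

```python
def compare_boards(before, after):
    """Compare two snapshots of a Kanban board, each a dict mapping a
    (color, note_text) pair to its column name. Reports notes that
    moved between columns, appeared, or disappeared."""
    moved, removed = [], []
    for note, col in before.items():
        if note not in after:
            removed.append(note)              # note vanished between scans
        elif after[note] != col:
            moved.append((note, col, after[note]))  # (note, from, to)
    added = [n for n in after if n not in before]
    return {"moved": moved, "added": added, "removed": removed}

before = {("yellow", "fix login"): "To Do",
          ("pink", "write docs"): "To Do"}
after = {("yellow", "fix login"): "Done",
         ("pink", "write docs"): "To Do"}
report = compare_boards(before, after)
```

This is what lets the system report *how* a board changed (note X moved from "To Do" to "Done") rather than merely that the two images differ.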
