  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Extraction hybride et description structurelle de caractères pour une reconnaissance efficace de texte dans les documents hétérogènes scannés : Méthodes et Algorithmes parallèles / Hybrid extraction and structural description of characters for effective text recognition in heterogeneous scanned documents : Methods and Parallel Algorithms

Soua, Mahmoud 08 November 2016 (has links)
Optical Character Recognition (OCR) is a process that converts text images into editable text documents. Today, these systems are widely used in dematerialization applications such as mail sorting, bill management, etc. In this context, the aim of this thesis is to propose an OCR system that achieves a better compromise between recognition rate and processing speed, enabling reliable, real-time document dematerialization. To be recognized, the text is first extracted from the background. It is then segmented into disjoint characters, which are described on the basis of their structural characteristics. Finally, the characters are recognized by matching their descriptors against those of a predefined base. Text extraction remains difficult in heterogeneous scanned documents with a complex, noisy background, where the text may be confused with a textured or colour-varying background, or distorted by scanning noise. On the other hand, the description of the extracted and segmented characters is often complex (computing geometric transformations, using a large number of features) or poorly discriminating when the chosen character features are sensitive to variations in scale, font, style, etc. To address this, we adapt binarization to the type of heterogeneous scanned document, and we provide a highly discriminating character description based on the study of character structure through its horizontal and vertical projections. To ensure real-time processing, we parallelize the developed algorithms on the graphics processing unit (GPU). The main contributions of our proposed OCR system are as follows. First, a new method for extracting text from heterogeneous scanned documents containing text regions with complex or homogeneous backgrounds. In this method, an image analysis process is followed by a classification of the document regions into image regions (text over a complex background) and text regions (text over a homogeneous background). For text regions, the textual information is extracted using a hybrid classification method based on the K-means algorithm (CHK) that we developed; image regions are first enhanced with Gamma Correction (GC) before applying CHK. Experimental results show that our text extraction method achieves a character recognition rate of 98.5% on heterogeneous scanned documents. Second, a Unified Character Descriptor based on the study of character structure. It uses a sufficient number of features, obtained by unifying the descriptors of the horizontal and vertical projections of the characters, to achieve more effective discrimination. The advantage of this descriptor is both its high performance and its computational simplicity; it supports the recognition of alphanumeric and multi-scale characters, and achieves a character recognition rate of 100% for a given font and size. Third, a parallelization of the character recognition system. The GPU was used as the parallelization platform; flexible and powerful, this architecture offers an effective solution for accelerating compute-intensive image processing algorithms. Our implementation combines fine- and coarse-grained parallelization strategies to speed up the stages of the OCR pipeline; CPU-GPU communication overheads are avoided, and careful memory management is ensured. The effectiveness of our implementation is validated through extensive experiments.
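The Gamma Correction (GC) step mentioned above maps naturally onto the GPU, since every pixel is independent. The following is a minimal CUDA sketch of such a preprocessing kernel, not the thesis's implementation; the gamma value, the 8-bit grayscale layout, and all names are assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: gamma correction of an 8-bit grayscale image,
// one thread per pixel. Gamma value and image layout are assumptions.
__global__ void gammaCorrect(const unsigned char* in, unsigned char* out,
                             int numPixels, float gamma) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPixels) {
        float v = in[i] / 255.0f;                          // normalize to [0,1]
        out[i] = (unsigned char)(255.0f * powf(v, gamma) + 0.5f);
    }
}

int main() {
    const int n = 1024 * 1024;                 // e.g. a 1024x1024 page region
    unsigned char *dIn, *dOut;
    cudaMalloc(&dIn, n);
    cudaMalloc(&dOut, n);
    // ... upload the image with cudaMemcpy, then launch:
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    gammaCorrect<<<blocks, threads>>>(dIn, dOut, n, 0.5f);  // gamma < 1 brightens
    cudaDeviceSynchronize();
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```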
102

Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code

Zhang, Yuhua 12 1900 (has links)
Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects are faced with the challenge of finding the best way to translate these resources into high performance. The challenge in the design of the next-generation CPU (central processing unit) lies not in trying to use up the silicon area, but in finding smart ways to make use of the wealth of transistors now available. In addition, the next-generation architecture should offer high throughput, scalability, modularity, and low energy consumption, instead of being suitable for only one class of applications or users, or only emphasizing a faster clock rate. A program exhibits different types of parallelism: instruction-level parallelism (ILP), thread-level parallelism (TLP), or data-level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types of parallelism. It is generally not possible to design architectures that can take advantage of all three types of parallelism without using very complex hardware structures and complex compiler optimizations. We present the state-of-the-art SDF (scheduled dataflow) architecture, which exploits as much TLP as the application supplies. We implement an SDF single-chip multiprocessor constructed from simpler processors and execute automatically parallelized applications on it. SDF has many desirable features, such as high throughput, scalability, and low power consumption, which meet the requirements of next-generation CPU design. Compared with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading) architectures, the experimental results show that for applications with very little parallelism SDF is comparable to the other architectures, while for applications with large amounts of parallelism SDF outperforms them.
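To make the kinds of parallelism named above concrete, here is a small CUDA sketch, unrelated to the SDF hardware itself: the kernel expresses data-level parallelism (one thread per element), while two independent streams express thread-level parallelism that the hardware may overlap.

```cuda
#include <cuda_runtime.h>

// Data-level parallelism (DLP): every thread applies the same
// operation to a different element.
__global__ void scale(float* a, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Thread-level parallelism (TLP): two independent streams of work
    // that the hardware may schedule concurrently.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    scale<<<n / 256, 256, 0, s0>>>(a, 2.0f, n);
    scale<<<n / 256, 256, 0, s1>>>(b, 0.5f, n);
    cudaDeviceSynchronize();

    cudaStreamDestroy(s0); cudaStreamDestroy(s1);
    cudaFree(a); cudaFree(b);
    return 0;
}
```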
103

Zpracování obrazu s velkými datovými toky - využití CUDA/OpenCL / High data rate image processing using CUDA/OpenCL

Sedláček, Filip January 2018 (has links)
The main objective of this research is to propose an optimization of the defect detection algorithm used in the production of nonwoven textile. The algorithm was developed by CAMEA spol. s.r.o. As a consequence of upgrading the current camera system to a more powerful one, it will be necessary to optimize the current algorithm and to choose hardware with an appropriate architecture on which the calculations will be performed. This work describes useful programming techniques of the CUDA software architecture and the OpenCL framework in detail. Using these tools, we implement a parallel equivalent of the current algorithm, describe various optimization methods, and design a GUI to test these methods.
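As a hedged illustration of the kind of per-pixel work such a detector parallelizes well, the sketch below flags pixels that deviate from an expected fabric intensity. The actual CAMEA algorithm is not public, so the threshold test, names, and frame size are all assumptions.

```cuda
#include <cuda_runtime.h>

// Hedged sketch of one plausible building block of a defect detector:
// flag pixels whose intensity deviates from the expected fabric level
// by more than a tolerance. Layout and thresholds are assumptions.
__global__ void defectMask(const unsigned char* img, unsigned char* mask,
                           int numPixels, int expected, int tolerance) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPixels)
        mask[i] = (abs((int)img[i] - expected) > tolerance) ? 255 : 0;
}

int main() {
    const int n = 2048 * 1024;          // one line-scan frame (assumed size)
    unsigned char *dImg, *dMask;
    cudaMalloc(&dImg, n);
    cudaMalloc(&dMask, n);
    // ... upload the frame, then launch one thread per pixel:
    defectMask<<<(n + 255) / 256, 256>>>(dImg, dMask, n, 128, 40);
    cudaDeviceSynchronize();
    cudaFree(dImg); cudaFree(dMask);
    return 0;
}
```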
104

Modul pro výuku výslovnosti cizích jazyků / Module for Pronunciation Training and Foreign Language Learning

Kudláč, Vladan January 2021 (has links)
The aim of this thesis is to improve the implementation of a pronunciation-training module for mobile applications, to identify places suitable for optimization, and to carry out optimizations aimed at increasing accuracy, reducing processing time, and lowering the memory footprint of processing.
105

Konstrukce kD stromu na GPU / Building kD Tree on GPU

Bajza, Jakub January 2016 (has links)
This term project addresses the construction of kD-tree acceleration structures and the parallelization of this construction on the GPU. At the beginning, the reader is introduced to the CUDA platform for parallel programming, with a description of its generic principles as well as the specific features used in this thesis. The reader is then introduced to acceleration structures for ray tracing: these structures are described, and the kD-tree acceleration structure and its variants are portrayed in detail. After that, the chosen kD-tree variant is analyzed and the problems of its parallel implementation are addressed. The implementation description includes a short account of the CPU variant and detailed specifications of the CUDA kernels. The testing section presents the results of the implementation in the form of a CPU vs. GPU comparison, together with an evaluation of how well the metrics set out in the design were fulfilled. Finally, the achieved goals and results are summarized, followed by possible future improvements to the implementation.
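One core step that parallel kD-tree builders typically run per node is classifying every primitive against a candidate split plane. The sketch below shows that step in CUDA as a hedged illustration, not the thesis's code; the data layout and flag encoding are assumptions.

```cuda
#include <cuda_runtime.h>

// Hedged sketch: classify each primitive's bounding-box extent along
// the split axis against the split plane. A primitive can fall in the
// left child, the right child, or straddle both.
__global__ void classify(const float* minA, const float* maxA,
                         float splitPos, int n, int* flags) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int f = 0;
        if (minA[i] < splitPos) f |= 1;   // overlaps left child
        if (maxA[i] > splitPos) f |= 2;   // overlaps right child
        flags[i] = f;
    }
}

int main() {
    const int n = 1 << 16;
    float *minA, *maxA;
    int *flags;
    cudaMalloc(&minA, n * sizeof(float));
    cudaMalloc(&maxA, n * sizeof(float));
    cudaMalloc(&flags, n * sizeof(int));
    // ... upload per-primitive bounds, then one thread per primitive:
    classify<<<(n + 255) / 256, 256>>>(minA, maxA, 0.5f, n, flags);
    cudaDeviceSynchronize();
    cudaFree(minA); cudaFree(maxA); cudaFree(flags);
    return 0;
}
```

A stream compaction over the flags (for instance with thrust::copy_if) would then produce the left and right primitive lists for the two child nodes.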
106

Paralelizace ultrazvukových simulací na svazku grafických karet / Parallelisation of Ultrasound Simulations on Multi-GPU Clusters

Dujíček, Aleš January 2015 (has links)
This work is part of the k-Wave project, a toolbox designed for time-domain ultrasound simulations in complex and heterogeneous media. The simulation functions are based on the k-space pseudospectral method. The goal of this work is to compute these simulations on graphics cards using local domain decomposition. Thanks to the decomposition, these simulations can be computed faster and on larger data grids. The overall aim is to achieve efficiency and scalability.
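Local domain decomposition splits the simulation grid across GPUs, and each time step ends with an exchange of boundary ("halo") layers between neighbouring subdomains. The sketch below shows that exchange for two GPUs using peer-to-peer copies; the sizes, memory layout, and use of cudaMemcpyPeer are assumptions, not the k-Wave code.

```cuda
#include <cuda_runtime.h>

// Hedged sketch of a two-GPU halo exchange: after each time step, each
// device sends the boundary slab of its subdomain to its neighbour.
int main() {
    const size_t halo = 256 * 256;           // one boundary slab (elements)
    float *d0, *d1;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);        // allow direct 0 -> 1 copies
    cudaMalloc(&d0, 2 * halo * sizeof(float)); // [ghost layer | interior edge]

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&d1, 2 * halo * sizeof(float));

    // Copy GPU 0's interior edge into GPU 1's ghost layer, and vice versa.
    cudaMemcpyPeer(d1, 1, d0 + halo, 0, halo * sizeof(float));
    cudaMemcpyPeer(d0, 0, d1 + halo, 1, halo * sizeof(float));
    cudaDeviceSynchronize();

    cudaSetDevice(0); cudaFree(d0);
    cudaSetDevice(1); cudaFree(d1);
    return 0;
}
```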
107

LIMES M/R: Parallelization of the LInk discovery framework for MEtric Spaces using the Map/Reduce paradigm

Hillner, Stanley 26 February 2018 (has links)
The World Wide Web is the most important information space in the world. With the change of the web during the last decade, today's Web 2.0 offers everybody the possibility to publish information on the web easily. For instance, everyone can have their own blog, write Wikipedia articles, publish photos on Flickr, or post status messages via Twitter. All these services offer users around the world the opportunity to exchange information and interconnect themselves with other users. However, information as it is usually published today does not carry enough semantics to be machine-processable. As an example, Wikipedia articles are created using the lightweight Wiki markup language and then published as HyperText Markup Language (HTML) files, whose semantics can easily be captured by humans, but not by machines.
108

Adaptive Finite Elements for Systems of PDEs: Software Concepts, Multi-level Techniques and Parallelization

Vey, Simon 21 February 2008 (has links)
In the recent past, the field of scientific computing has become more and more important for scientific as well as industrial research, playing a role comparable to that of experiment and theory. This success of computational methods in scientific and engineering research is, next to the enormous improvement of computer hardware, to a large extent due to contributions from applied mathematicians, who have developed algorithms that make real-life applications feasible. Examples are adaptive methods, high-order discretizations, fast linear and non-linear solvers, and multi-level methods. The application of these methods to a large class of problems demands suitable and robust tools for a flexible and efficient implementation. In order to play a crucial role in scientific and engineering research, besides efficiency in the numerical solution, efficiency in problem setup and in the interpretation of simulation results is of utmost importance. As modeling and computing come closer together, efficient computational methods need to be applied to new sets of equations. The problems to be addressed by simulation methods become more and more complicated, ranging over different scales, interacting across different dimensions, and combining different physics. Such problems need to be implemented in a short period of time, solved on complicated domains, and visualized according to the demands of the user. Only a modular, abstract simulation environment will fulfill these requirements and allow users to set up, solve, and visualize real-world problems appropriately. In this work, the concepts and the design of the C++ finite element toolbox AMDiS (adaptive multidimensional simulations) are described. It is shown how abstract data structures and modern software concepts can help to design user-friendly finite element software that provides great flexibility in problem definition while efficiently solving these problems. Systems of coupled problems can also be solved in an intuitive way. To demonstrate its possibilities, AMDiS has been applied to several non-standard problems. The most time-consuming part of most simulations is the solution of linear systems of equations. Multi-level methods use discretization hierarchies to solve these systems very efficiently. In AMDiS, such multi-level techniques are implemented in the context of adaptive finite elements. Several numerical results are given which compare this multigrid solver with classical iterative methods. Besides the development of more efficient algorithms, growing hardware capabilities also improve the possibilities of simulation. Modern computing clusters contain more and more processors, and personal computers today are often equipped with multi-core processors. In this work, a new parallelization approach has been developed which allows the parallelization of sequential code in a very easy way and reduces the communication overhead compared to classical parallelization concepts.
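As a toy illustration of one multigrid ingredient named above, the sketch below implements a weighted Jacobi smoothing sweep for the 1D Poisson problem -u'' = f on a uniform grid. This is an assumption-laden stand-in, not AMDiS code; a V-cycle would apply a few such sweeps on every level of the mesh hierarchy.

```cuda
#include <cuda_runtime.h>

// Weighted Jacobi sweep for -u'' = f on a uniform grid with spacing h:
// u[i] = (u[i-1] + u[i+1] + h*h*f[i]) / 2 is the unweighted update.
__global__ void jacobiSweep(const float* u, float* uNew, const float* f,
                            float h, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1) {                   // interior points only
        float jac = 0.5f * (u[i - 1] + u[i + 1] + h * h * f[i]);
        uNew[i] = u[i] + 0.8f * (jac - u[i]);   // damping factor 0.8
    }
}

int main() {
    const int n = 1024;
    float *u, *uNew, *f;
    cudaMalloc(&u, n * sizeof(float));
    cudaMalloc(&uNew, n * sizeof(float));
    cudaMalloc(&f, n * sizeof(float));
    cudaMemset(u, 0, n * sizeof(float));
    cudaMemset(uNew, 0, n * sizeof(float));
    cudaMemset(f, 0, n * sizeof(float));
    for (int sweep = 0; sweep < 3; ++sweep) {   // a few smoothing sweeps
        jacobiSweep<<<(n + 255) / 256, 256>>>(u, uNew, f, 1.0f / n, n);
        float* tmp = u; u = uNew; uNew = tmp;   // ping-pong buffers
    }
    cudaDeviceSynchronize();
    cudaFree(u); cudaFree(uNew); cudaFree(f);
    return 0;
}
```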
109

Parallelization of the HIROMB ocean model

Wilhelmsson, Tomas January 2002 (has links)
<p>NR 20140805</p>
110

Parallelization of Online Random Forest

Lindroth, Leonard January 2021 (has links)
Context. Random Forests (RFs) is a very popular machine learning algorithm for mining large-scale data. RFs is mainly known as an algorithm that operates in offline mode; however, in recent years implementations of online random forests (ORFs) have been introduced. With multicore processors, a successful implementation of parallelism may increase the performance of an algorithm relative to its sequential implementation. Objectives. In this paper we develop and investigate the performance of a parallel implementation of ORFs and compare the experimental results with its sequential counterpart. Methods. Using profiling tools on ORFs we located its bottlenecks, and with this knowledge we used the implementation/experiment methodology to develop parallel online random forests (PORFs). Evaluation is done by comparing the performance of ORFs and PORFs. Results. Experiments on common machine learning data sets show that PORFs achieve classification equal to our execution of ORFs. However, there is a difference in classification on some data sets when compared to results from another study. Furthermore, PORFs did not achieve any speedup compared to ORFs; in fact, with the added overhead from pthreads, PORFs take longer to finish than ORFs. Conclusions. We conclude that our parallelization of ORFs achieves classification performance equal to that of sequential ORFs. However, speedup was not achieved with our chosen approach to parallelism. Possible solutions to achieve speedup are presented and suggested as future work.
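Since the trees of a forest are independent, the natural pthreads strategy is to give each thread a disjoint subset of trees to update per incoming sample, with no locking between trees. The host-side sketch below illustrates that layout under stated assumptions: Tree, updateTree, and the counts are illustrative stand-ins, not the study's code.

```cuda
#include <pthread.h>
#include <cstdio>

struct Tree { int nodeCount; };            // stand-in for a real online tree

struct Task {
    Tree* trees;
    int first, last;                       // half-open range of trees
    const float* sample;                   // one incoming training example
};

void updateTree(Tree* t, const float* sample) {
    t->nodeCount += 1;                     // stand-in for the online update
    (void)sample;
}

void* worker(void* arg) {
    Task* task = (Task*)arg;
    for (int i = task->first; i < task->last; ++i)
        updateTree(&task->trees[i], task->sample);
    return nullptr;
}

int main() {
    const int numTrees = 100, numThreads = 4;
    Tree trees[numTrees] = {};
    float sample[3] = {0.1f, 0.2f, 0.3f};

    pthread_t threads[numThreads];
    Task tasks[numThreads];
    int per = numTrees / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        tasks[t] = {trees, t * per, (t + 1) * per, sample};
        pthread_create(&threads[t], nullptr, worker, &tasks[t]);
    }
    for (int t = 0; t < numThreads; ++t)
        pthread_join(threads[t], nullptr);
    printf("updated %d trees\n", numTrees);
    return 0;
}
```

With only a few cheap updates per sample, thread creation and joining can dominate the useful work, which is consistent with the slowdown the study reports.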
