71 |
Prospecção de componentes bioativos em resíduos do processamento do pescado visando a sustentabilidade da cadeia produtiva / Prospecting of bioactive components in fish processing waste for the sustainability of the production chain
Lika Anbe 13 October 2011 (has links)
O resíduo gerado nas indústrias de processamento de pescado representa sérios problemas de poluição ambiental pela falta de destino adequado a este material. As espécies que alcançaram melhor rendimento produzem cerca de 30 a 40% da fração comestível na forma de filés. O ideal seria utilizar a matéria-prima em toda a sua extensão para obtenção de co-produtos, evitando a própria formação do resíduo. Elaborou-se a silagem ácida de pescado através do resíduo do processamento de sardinha (Sardinella brasiliensis) como forma de aproveitamento integral da matéria-prima, constituído por brânquias, vísceras, cabeças, escamas, espinhas dorsais e descartes de tecidos musculares, incentivando a sustentabilidade desde a escolha do ácido até o aproveitamento de resíduos. A utilização do ácido cítrico (T1) como agente acidificador apresentou bons resultados em relação à mistura fórmico:propiônico (T2). Avaliou-se a estabilização das silagens e a aplicação da centrifugação (modelo 5810R, Eppendorf), sob rotação de 4840 x g; 0 ºC; 20 min, para obtenção das frações e o rendimento de cada parcela. Para T1 foram obtidos 17,1% de fração lipídica, 27,2% de fração aquosa (F1) e 55,7% de fração sedimentada. Para T2 foram obtidos 15,1% de fração lipídica, 31,8% de fração aquosa (F2) e 53,1% de fração sedimentada. A silagem ao ser fracionada se torna uma alternativa tecnológica com possível utilização em diferentes áreas de atuação, pois em sua porção aquosa (F) há presença de todos os aminoácidos essenciais. O aminoácido em maior concentração foi o ácido glutâmico em T1 e F1, sendo 12,3 e 11,53 g/100g de proteína, respectivamente. Para T2 o maior valor encontrado foi para glicina, da ordem de 11,94 g/100g de proteína; e para F2 o ácido glutâmico, 11,25 g/100g de proteína. Os resultados indicaram a possibilidade de as frações aquosas serem empregadas como peptonas, devido aos teores de aminoácidos existentes serem semelhantes e/ou superiores aos presentes em produtos comerciais.
Buscou-se quantificar o resíduo gerado em um dia de processamento em uma unidade de beneficiamento de tilápias (Oreochromis niloticus), bem como verificar o custo para o possível aproveitamento deste. Obteve-se 61,15% de resíduo, sendo que 28,23%; 17,12%; 7,97% e 7,83% eram constituídos de carcaças, cabeças, vísceras e peles. Sugeriu-se para a unidade de processamento o encaminhamento dos resíduos para produção de co-produtos, como forma de aumentar a sustentabilidade sócio-econômica e ambiental da unidade de processamento. / The waste generated in fish processing industries poses serious environmental pollution problems due to the lack of a suitable destination for this material. The species with the best yields produce about 30 to 40% of the edible fraction in the form of fillets. The ideal would be to use the raw material in its entirety to obtain co-products, avoiding the very formation of the residue. We developed acid silage of fish from the waste of sardine (Sardinella brasiliensis) processing as a way to use the raw material integrally, consisting of gills, viscera, heads, scales, backbones and discards of muscle tissue, encouraging sustainability from the choice of acid to waste recovery. The use of citric acid (T1) as the acidifying agent showed good results compared to the formic:propionic mixture (T2). We evaluated the stabilization of the silages and the application of centrifugation (model 5810R, Eppendorf) at 4840 x g, 0 ºC, 20 min to obtain the fractions and the yield of each one. For T1, 17.1% lipid fraction, 27.2% aqueous fraction (F1) and 55.7% sedimented fraction were obtained. For T2, 15.1% lipid fraction, 31.8% aqueous fraction (F2) and 53.1% sedimented fraction were obtained. Once fractionated, the silage becomes a technological alternative with potential use in different areas, since its aqueous portion (F) contains all the essential amino acids.
The amino acid at the highest concentration was glutamic acid in T1 and F1, at 12.30 and 11.53 g.100g-1 of protein, respectively. For T2 the highest value was found for glycine, on the order of 11.94 g.100g-1 of protein; for F2 it was glutamic acid, at 11.25 g.100g-1 of protein. The results indicate that the aqueous fractions could be employed as peptones, since their amino acid contents are similar to and/or higher than those present in commercial products. We sought to quantify the waste generated in one day of processing at a tilapia (Oreochromis niloticus) processing unit, and to estimate the cost of its possible utilization. We obtained 61.15% of waste, of which 28.23%, 17.12%, 7.97% and 7.83% consisted of carcasses, heads, viscera and skins. It was suggested that the processing unit route the waste to co-product production, as a way to increase its socio-economic and environmental sustainability.
|
72 |
Detekce objektů na GPU / Object Detection on GPU
Macenauer, Pavel January 2015 (has links)
This thesis addresses the topic of object detection on graphics processing units. As part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing for real-time video object detection and bulk processing. Its contribution is mainly to study the options offered by NVIDIA CUDA technology and current graphics processing units for accelerating object detection. Parallel algorithms for object detection are also discussed and suggested.
|
73 |
Parallellisering av Sliding Extensive Cancellation Algorithm (ECA-S) för passiv radar med OpenMP / Parallelization of the Sliding Extensive Cancellation Algorithm (ECA-S) for Passive Radar with OpenMP
Johansson Hultberg, Andreas January 2021 (has links)
Software parallelization has gained increasing interest since the manufacturing of smaller transistors within integrated circuits has begun to stagnate. This has led to the development of new processing units with an increasing number of cores. Parallelization is an optimization technique that allows the user to utilize parallel processes in order to streamline algorithm flows. This study examines the performance benefits that a passive bistatic radar system can obtain through parallelization and code refactoring. The study focuses mainly on investigating the use of parallel instructions within a shared-memory model on a Central Processing Unit (CPU), using an application programming interface, namely OpenMP. Quantitative data is collected to compare the runtime of the most central algorithm in the passive radar system, the Extensive Cancellation Algorithm (ECA). ECA can be used to suppress unwanted clutter in the surveillance signal, the purpose of which is to create clear target detections of airborne objects. The algorithm, however, is computationally demanding, which has led to the development of faster versions such as the Sliding ECA (ECA-S). Despite this development, the algorithm is still relatively computationally demanding and can lead to long execution times within the radar system. In this study, a MATLAB implementation of ECA-S is transformed to C in order to take advantage of the fast execution time of the procedural programming language. Parallelism is introduced within the converted algorithm by the use of Intel's thread methodology and then applied within two different operating systems. The study shows that a speedup by a factor of 24 can be obtained in the programming language C while still ensuring the correctness of the results. The results also showed that code refactoring of a MATLAB algorithm could result in 73% faster code and that C-MEX implementations are twice as slow as a C implementation.
Finally, the study pointed out that real-time operation can be achieved for a passive bistatic radar system with the use of the programming language C and parallel instructions within a shared-memory model on a CPU. / Parallellisering av mjukvara har fått ett ökat intresse sedan transistortillverkningen av mindre chip inom en integrerad krets har börjat att stagnera. Detta har lett till utveckling av moderna processorer med ett ökande antal kärnor. Parallellisering är en optimeringsteknik vilken tillåter användaren att utnyttja parallella processer till att effektivisera algoritmflöden. Denna studie undersöker de tidsmässiga fördelar ett passivt bistatiskt radarsystem kan erhålla genom att bland annat tillämpa parallellisering och omformning. Studien fokuserar främst på att undersöka användandet av parallella trådar inom det delade minnesutrymmet på en centralprocessor (CPU), detta med hjälp av applikationsprogrammeringsgränssnittet OpenMP. Kvantitativa jämförelser tas fram med hjälp av en av de mest centrala algoritmerna inom det passiva radarsystemet, nämligen Extensive Cancellation Algorithm (ECA). ECA kan användas till att undertrycka oönskat klotter i övervakningssignalen, vilket har till syfte att skapa klara måldetektioner av luftföremål. Algoritmen är däremot beräkningstung, vilket har medfört utveckling av snabbare versioner som exempelvis Sliding ECA (ECA-S). Trots utvecklingen är algoritmen fortfarande relativt beräkningstung och kan medföra en lång exekveringstid inom hela radarsystemet. I denna studie transformeras en MATLAB-implementation av ECA-S till C för att kunna dra nytta av den snabba exekveringstiden i det procedurella programmeringsspråket. Parallellism införs inom den transformerade algoritmen med hjälp av Intels trådmetodik och appliceras sedan inom två olika operativsystem. Studien visar på en tidsmässig optimering i C med upp till 24 gånger snabbare exekveringstid och bibehållen noggrannhet.
Resultaten visade även på att en enklare omformning av en MATLAB-algoritm kunde resultera i 73% snabbare kod och att en C-MEX-implementation är dubbelt så långsam i jämförelse med en C-implementering. Slutligen pekade studien på att realtid kan uppnås för ett passivt bistatiskt radarsystem vid användandet av programmeringsspråket C och med utnyttjandet av parallella instruktioner inom det delade minnet på en CPU.
|
74 |
Numerical solution of the two-phase incompressible Navier-Stokes equations using a GPU-accelerated meshless method
Kelly, Jesse 01 January 2009 (has links)
This project presents the development and implementation of a GPU-accelerated meshless two-phase incompressible fluid flow solver. The solver uses a variant of the Generalized Finite Difference Meshless Method presented by Gerace et al. [1]. The Level Set Method [2] is used for capturing the fluid interface. The Compute Unified Device Architecture (CUDA) language for general-purpose computing on the graphics-processing-unit is used to implement the GPU-accelerated portions of the solver. CUDA allows the programmer to take advantage of the massive parallelism offered by the GPU at a cost that is significantly lower than other parallel computing options. Through the combined use of GPU-acceleration and a radial-basis function (RBF) collocation meshless method, this project seeks to address the issue of speed in computational fluid dynamics. Traditional mesh-based methods require a large amount of user input in the generation and verification of a computational mesh, which is quite time consuming. The RBF meshless method seeks to rectify this issue through the use of a grid of data centers that need not meet stringent geometric requirements like those required by finite-volume and finite-element methods. Further, the use of the GPU to accelerate the method has been shown to provide a 16-fold increase in speed for the solver subroutines that have been accelerated.
|
75 |
Development of Parallel Architectures for Radar/Video Signal Processing Applications
Jarrah, Amin January 2014 (has links)
No description available.
|
76 |
Evaluating the OpenACC API for Parallelization of CFD Applications
Pickering, Brent Phillip 06 September 2014 (has links)
Directive-based programming of graphics processing units (GPUs) has recently appeared as a viable alternative to using specialized low-level languages such as CUDA C and OpenCL for general-purpose GPU programming. This technique, which uses directive or pragma statements to annotate source codes written in traditional high-level languages, is designed to permit a unified code base to serve multiple computational platforms and to simplify the transition of legacy codes to new architectures. This work analyzes the popular OpenACC programming standard, as implemented by the PGI compiler suite, in order to evaluate its utility and performance potential in computational fluid dynamics (CFD) applications. Of particular interest is the handling of stencil algorithms, which are an important component of finite-difference and finite-volume numerical methods. To this end, the process of applying the OpenACC Fortran API to a preexisting finite-difference CFD code is examined in detail, and all modifications that must be made to the original source in order to run efficiently on the GPU are noted. Optimization techniques for OpenACC are also explored, and it is demonstrated that tuning the code for a particular accelerator architecture can result in performance increases of over 30%. There are also some limitations and programming restrictions imposed by the API: it is observed that certain useful features of modern Fortran (2003/8) are effectively disabled within OpenACC regions. Finally, a combination of OpenACC and OpenMP directives is used to create a truly cross-platform Fortran code that can be compiled for either CPU or GPU hardware. The performance of the OpenACC code is measured on several contemporary NVIDIA GPU architectures, and a comparison is made between double and single precision arithmetic showing that if reduced precision can be tolerated, it can lead to significant speedups. 
To assess the performance gains relative to a typical CPU implementation, the execution time for a standard benchmark case (lid-driven cavity) is used as a reference. The OpenACC version is compared against the identical Fortran code recompiled to use OpenMP on multicore CPUs, as well as a highly-optimized C++ version of the code that utilizes hardware aware programming techniques to attain higher performance on the Intel Xeon platforms being tested. Low-level optimizations specific to these architectures are analyzed and it is observed that the stencil access pattern required by the structured-grid CFD code sometimes leads to performance degrading conflict misses in the hardware managed CPU caches. The GPU code, which primarily uses software managed caching, is found to be free from these issues. Overall, it is observed that the OpenACC GPU code compares favorably against even the best optimized CPU version: using a single NVIDIA K20x GPU, the Fortran+OpenACC code is seen to outperform the optimized C++ version by 20% and the Fortran+OpenMP version by more than 100% with both CPU codes running on a 16-core Xeon workstation. / Master of Science
|
77 |
Efficient Betweenness Centrality Computations on Hybrid CPU-GPU Systems
Mishra, Ashirbad January 2016 (has links) (PDF)
Analysis of networks is quite interesting because networks can be interpreted for several purposes. Various features require different metrics to measure and interpret them. Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Betweenness Centrality (BC) is one such metric that plays a key role in many real-world applications. BC is an important graph analytics application for large-scale graphs. However, it is one of the most computationally intensive kernels to execute, and measuring centrality in billion-scale graphs is quite challenging.
While there are several existing efforts towards parallelizing BC algorithms on multi-core CPUs and many-core GPUs, in this work, we propose a novel fine-grained CPU-GPU hybrid algorithm that partitions a graph into two partitions, one each for CPU and GPU. Our method performs BC computations for the graph on both the CPU and GPU resources simultaneously, resulting in a very small number of CPU-GPU synchronizations, hence taking less time for communications. The BC algorithm consists of two phases, the forward phase and the backward phase. In the forward phase, we initially find the paths that are needed by either partition, after which each partition is executed on each processor in an asynchronous manner. We initially compute border matrices for each partition, which store the relative distances between each pair of border vertices in a partition. The matrices are used in the forward-phase calculations of all the sources. In this way, our hybrid BC algorithm leverages the multi-source property inherent in the BC problem. We present a proof of correctness and the bounds for the number of iterations for each source. We also perform a novel hybrid and asynchronous backward phase, in which each partition communicates with the other only when there is a path that crosses the partition, hence performing minimal CPU-GPU synchronizations.
We use a variety of implementations for our work, like node-based and edge-based parallelism, including data-driven and topology-based techniques. In the implementation we show that our method also works with a variable partitioning technique. The technique partitions the graph into unequal parts accounting for the processing power of each processor. Due to this technique, our implementations achieve almost equal percentages of utilization on both processors. For large-scale graphs, the size of the border matrix also becomes large, hence we present various techniques to accommodate the matrix. The techniques use the properties inherent in the shortest path problem for reduction. We discuss the drawbacks of performing shortest path computations at large scale and also provide various solutions to them.
Evaluations using a large number of graphs with different characteristics show that our hybrid approach, without variable partitioning and border matrix reduction, gives a 67% improvement in performance and 64-98.5% fewer CPU-GPU communications than the state-of-the-art hybrid algorithm based on the popular Bulk Synchronous Parallel (BSP) approach implemented in TOTEM. This shows our algorithm's strength in reducing the need for large synchronizations. Implementing variable partitioning, border matrix reduction and backward-phase optimizations on our hybrid algorithm provides up to 10x speedup. We compare our optimized implementation with CPU-only and GPU-only codes based on our forward-phase and backward-phase kernels, and show around 2-8x speedup over the CPU-only code; our method can also accommodate large graphs that do not fit in the GPU-only code. We also show that our method's performance is competitive with state-of-the-art multi-core CPU implementations and 40-52% better than GPU implementations on large graphs. We discuss the drawbacks of CPU-only and GPU-only implementations and try to motivate the reader about the challenges that graph algorithms face in large-scale computing, suggesting that a hybrid or distributed way of approaching the problem is a better way of overcoming the hurdles.
|
78 |
Eismo dalyvių kelyje atpažinimas naudojant dirbtinius neuroninius tinklus ir grafikos procesorių / On-road vehicle recognition using neural networks and graphics processing unit
Kinderis, Povilas 27 June 2014 (has links)
Kasmet daugybė žmonių būna sužalojami autoįvykiuose, iš kurių dalis sužalojimų būna rimti arba pasibaigia mirtimi. Dedama vis daugiau pastangų kuriant įvairias sistemas, kurios padėtų mažinti nelaimių skaičių kelyje. Tokios sistemos gebėtų perspėti vairuotojus apie galimus pavojus, atpažindamos eismo dalyvius ir sekdamos jų padėtį kelyje. Eismo dalyvių kelyje atpažinimas iš vaizdo yra pakankamai sudėtinga, daug skaičiavimų reikalaujanti problema. Šiame darbe šiai problemai spręsti pasitelkti stereo vaizdai, nesugretinamumo žemėlapis bei konvoliuciniai neuroniniai tinklai. Konvoliuciniai neuroniniai tinklai reikalauja daug skaičiavimų, todėl jie optimizuoti pasitelkus grafikos procesorių ir OpenCL. Gautas iki 33,4% spartos pagerėjimas lyginant su centriniu procesoriumi. Stereo vaizdai ir nesugretinamumo žemėlapis leidžia atmesti didelius kadro regionus, kurių nereikia klasifikuoti su konvoliuciniu neuroniniu tinklu. Priklausomai nuo scenos vaizde, reikalingų klasifikavimo operacijų skaičius sumažėja vidutiniškai apie 70-95% ir tai leidžia kadrą apdoroti atitinkamai greičiau. / Many people are injured in auto accidents each year; some of the injuries are serious or end in death. Increasing effort is being put into developing various systems that could help reduce the number of accidents on the road. Such systems could warn drivers of potential danger by recognizing on-road vehicles and tracking their position on the road. On-road vehicle recognition from images is a complex and computationally very intensive problem. In this work, stereo images, a disparity map and convolutional neural networks are used to solve this problem. Convolutional neural networks are computationally very intensive, so they were optimized using the GPU and OpenCL. A speed improvement of up to 33.4% was achieved compared to the central processor. The stereo images and disparity map allow large regions of the frame that do not need to be classified by the convolutional neural network to be discarded. Depending on the scene in the image, the number of required classification operations decreases on average by about 70-95%, which allows the frame to be processed correspondingly faster.
|
79 |
Medical Image Processing on the GPU: Past, Present and Future
Eklund, Anders, Dufort, Paul, Forsberg, Daniel, LaConte, Stephen January 2013 (has links)
Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing, are affordable and energy efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges.
|
80 |
Método automático para descoberta de funções de ordenação utilizando programação genética paralela em GPU / Automatic ranking function discovery method using parallel genetic programming on GPU
Coimbra, Andre Rodrigues 28 March 2014 (has links)
Ranking functions have a vital role in the performance of information retrieval systems
ensuring that documents more related to the user’s search need – represented as a query
– are shown in the top results, preventing the user from having to examine a range of
documents that are not really relevant.
Therefore, this work uses Genetic Programming (GP), an Evolutionary Computation
technique, to find ranking functions automatically and systematically. Moreover, in this
project the technique of GP was developed following a strategy that exploits parallelism
through graphics processing units.
Other methods known in the context of information retrieval, such as classification committees
and the Lazy strategy, were combined with the proposed approach – called Finch. These
combinations were only feasible due to the GP nature and the use of parallelism.
The experimental results with Finch, regarding the quality of the ranking functions, surpassed
the results of several strategies known in the literature. Considering time performance,
significant gains were also achieved: the solution developed exploiting
parallelism spends around twenty times less time than the solution using only the central
processing unit. / Funções de ordenação têm um papel vital no desempenho de sistemas de recuperação de
informação garantindo que os documentos mais relacionados com o desejo do usuário –
representado através de uma consulta – sejam trazidos no topo dos resultados, evitando
que o usuário tenha que analisar uma série de documentos que não sejam realmente
relevantes.
Assim, utiliza-se a Programação Genética (PG), uma técnica da Computação Evolucionária,
para descobrir de forma automática e sistemática funções de ordenação. Além disso,
neste trabalho a técnica de PG foi desenvolvida seguindo uma estratégia que explora o
paralelismo através de unidades gráficas de processamento.
Foram agregados ainda na abordagem proposta – denominada Finch – outros métodos
conhecidos no contexto de recuperação de informação como os comitês de classificação e
a estratégia Lazy. Essa complementação só foi viável devido à natureza da PG
e em virtude da utilização do paralelismo.
Os resultados experimentais encontrados com a Finch, em relação à qualidade das funções
de ordenação descobertas, superaram os resultados de diversas estratégias conhecidas
na literatura. Considerando o desempenho da abordagem em função do tempo, também
foram alcançados ganhos significativos. A solução desenvolvida explorando o paralelismo
gasta, em média, vinte vezes menos tempo que a solução utilizando somente a unidade
central de processamento.
|