Global ETD Search

111	Scalable parallel architecture for biological neural simulation on hardware platforms Pourhaj, Peyman 04 October 2010 Difficulties and dangers in doing experiments on living systems and providing a testbed for theorists make the biologically detailed neural simulation an essential part of neurobiology. Due to the complexity of the neural systems and dynamic properties of the neurons simulation of biologically realistic models is very challenging area. Currently all general purpose simulator are software based. Limitation on the available processing power provides a huge gap between the maximum practical simulation size and human brain simulation as the most complex neural system. This thesis aimed at providing a hardware friendly parallel architecture in order to accelerate the simulation process.<p> This thesis presents a scalable hierarchical architecture for accelerating simulations of large-scale biological neural systems on field-programmable gate arrays (FPGAs). The architecture provides a high degree of flexibility to optimize the parallelization ratio based on available hardware resources and model specifications such as complexity of dendritic trees. The whole design is based on three types of customized processors and a switching module. An addressing scheme is developed which allows flexible integration of various combination of processors. The proposed addressing scheme, design modularity and data process localization allow the whole system to extend over multiple FPGA platforms to simulate a very large biological neural system.<p> In this research Hodgkin-Huxley model is adopted for cell excitability. Passive compartmental approach is used to model dendritic tree with any level of complexity. The whole architecture is verified in MATLAB and all processor modules and the switching unit implemented in Verilog HDL and Schematic Capture. A prototype simulator is integrated and synthesized for Xilinx V5-330t-1 as the target FPGA. While not dependent on particular IP (Intellectual Property) cores, the whole implementation is based on Xilinx IP cores including IEEE-754 64-bit floating-point adder and multiplier cores. The synthesize results and performance analyses are provided. Biological Neuron Simulation FPGA Parallel processing
112	Scalable parallel architecture for biological neural simulation on hardware platforms Pourhaj, Peyman 04 October 2010 (has links) Difficulties and dangers in doing experiments on living systems and providing a testbed for theorists make the biologically detailed neural simulation an essential part of neurobiology. Due to the complexity of the neural systems and dynamic properties of the neurons simulation of biologically realistic models is very challenging area. Currently all general purpose simulator are software based. Limitation on the available processing power provides a huge gap between the maximum practical simulation size and human brain simulation as the most complex neural system. This thesis aimed at providing a hardware friendly parallel architecture in order to accelerate the simulation process.<p> This thesis presents a scalable hierarchical architecture for accelerating simulations of large-scale biological neural systems on field-programmable gate arrays (FPGAs). The architecture provides a high degree of flexibility to optimize the parallelization ratio based on available hardware resources and model specifications such as complexity of dendritic trees. The whole design is based on three types of customized processors and a switching module. An addressing scheme is developed which allows flexible integration of various combination of processors. The proposed addressing scheme, design modularity and data process localization allow the whole system to extend over multiple FPGA platforms to simulate a very large biological neural system.<p> In this research Hodgkin-Huxley model is adopted for cell excitability. Passive compartmental approach is used to model dendritic tree with any level of complexity. The whole architecture is verified in MATLAB and all processor modules and the switching unit implemented in Verilog HDL and Schematic Capture. A prototype simulator is integrated and synthesized for Xilinx V5-330t-1 as the target FPGA. While not dependent on particular IP (Intellectual Property) cores, the whole implementation is based on Xilinx IP cores including IEEE-754 64-bit floating-point adder and multiplier cores. The synthesize results and performance analyses are provided. Biological Neuron Simulation FPGA Parallel processing
113	Integration and Evaluation of Cache Coherence Protocols for Multiprocessor SoCs Suh, Taeweon 20 November 2006 (has links) System-on-a-chip (SoC) designs is characterized by heavy reuse of IP blocks to satisfy specific computing needs for target applications, reduce overall design cost, and expedite time-to-market. To meet their performance goal and cost constraint, SoC designers integrate multiple, sometimes heterogeneous, processor IPs to perform particular functions. This design approach is called Multiprocessor SoC (MPSoC). In this thesis, I investigated generic methodologies for enabling efficient communication among heterogeneous processors and quantified the efficiency of coherence traffic. Hardware techniques for two main MPSoC architectures were studied: Integration of cache coherence protocols for shared-bus-based MPSoCs and Cache coherence support for non-shared-bus-based MPSoCs. In the shared-bus-based MPSoCs, the integration techniques guarantee data consistency among incompatible coherence protocols. An integrated protocol will contain common states from these coherence protocols. A snoop-hit buffer and region-based cache coherence were also proposed to further enhance the coherence performance. For the non-shared-bus-based MPSoCs, bypass and bookkeeping approaches were proposed to maintain coherence in a new cache coherence-enforced memory controller. The simulations based on micro-benchmark and RTOS kernel showed the benefits of my methodologies over a generic software solution. This thesis also evaluated and quantified the efficiency of coherence traffic based on a novel emulation platform using FPGA. The proposed technique can completely isolate the intrinsic delay of the coherence traffic to demonstrate the impact of coherence traffic on system performance. Unlike previous evaluation methods, this technique eliminated non-deterministic factors in measurements such as bus arbitration delay and stall in the pipelined bus. The experimental results showed that the cache-to-cache transfer in the Intel server system is less efficient than the main memory access. MPSoC Multiprocessors
114	Comparison of Realization Methods for the Morphological Filter with Their Applications Chiu, Yun-ming 31 August 2006 (has links) The morphological image processing can modify the shapes of objects very efficiently by structure elements. Thus, the morphology processing has recently been applied to industry auto-inspection and medical image processing successfully. In this thesis, we incestigate the efficient processing of morphological image processing by two approaches: quadtree approach and paralell approach. By the quadtree decomposition, any binary image can be decomposed into black and white square blocks with some fixed size of power of 2. Thus, dilation of the whole image can be accomplished by dilating individual decomposed square blocks. On the other hand, any binary image can be presented by bit per pixel basis. Thus, we can exploit the parallel on a personal computer to speed up the set oriented morphological image processing. Experiments have revealed that both approach are much faster than the direct method. The quadtree approach are most advantageous for large structure elements. Whereas, the parralel approach are the fastest for the usual applications. Morphological Image Processing quadtree Dilation Parallel processing
115	Efficient Implementation of Morphological Image Processing on Pentium Machines Chen, Jau-Liang 06 August 2001 (has links) Morphological image processing is especially useful in the applications of medical image processing, pattern recognition, and industry auto-inspection. Special hardware for morphological image processing are very expensive. On the other hand, the speed of software are too slow. The purpose of this paper is to speed up the software computations of morphological image processing by parallel processing on Pentium machine. The morphological operation is similar to digital convolution. We can realize our parallel morphological operation on the Pentium machine by two different methods. They are output-decomposition and input-decomposition methods, similar to the procedure of overlap-and-save and overlap-and-add respectively. The above methods implemented on Pentium machine are proved very efficient with 64-bits parallelism. Our experimental results demonstrated they are twice faster than the 32-bits parallelism method. In addition to the simulation and the real time experiments, a set of theoretical formulas are derived to analyze our methods and are checking the actual measured time quite well. Morphological Image Processing MMX Instruction Parallel processing
116	Multi-area power system state estimation utilizing boundary measurements and phasor measurement units ( PMUs) Freeman, Matthew A 30 October 2006 (has links) The objective of this thesis is to prove the validity of a multi-area state estimator and investigate the advantages it provides over a serial state estimator. This is done utilizing the IEEE 118 Bus Test System as a sample system. This thesis investigates the benefits that stem from utilizing a multi-area state estimator instead of a serial state estimator. These benefits are largely in the form of increased accuracy and decreased processing time. First, the theory behind power system state estimation is explained for a simple serial estimator. Then the thesis shows how conventional measurements and newer, more accurate PMU measurements work within the framework of weighted least squares estimation. Next, the multi-area state estimator is examined closely and the additional measurements provided by PMUs are used to increase accuracy and computational efficiency. Finally, the multi-area state estimator is tested for accuracy, its ability to detect bad data, and computation time. Multi-Area State Estimation Parallel Processing
117	Task Scheduling Technlques for Distrlbuted/Parallel Processing Systems Sreenivasan, C R 04 1900 (has links) Indian Institute of Science / This dissertation discusses the principles, techniques and approaches adopted in the design of task scheduling algorithms for Distributed Parallel Processing Computer Systems (DPCSs) connected with network of front-end systems (FSs), The primary goal in the design of scheduling algorithms is to minimise the total turnaround time of the jobs to be scheduled by maximizing the utilisation of the resources of the DPCS with minimum data communication overhead, The users present their jobs to be scheduled at the FS, The FS receives a job and generates a finite set of independent tasks based on mutually independent sections having inherent parallelism, Each task could be scheduled to different available processors of DPCS for concurrent execution, The tasks are of three groups viz,, compute intensive tasks, input. output intensive tasks and the combination of compute and input-output intensive tasks. They may have the execution time almost the same. Some of the tasks may have the execution time larger due to precedence constraints than that of other tasks and they are provided with logical breakpoints which can be utilised to further break the tasks into subtasks during scheduling, The technique of using breakpoint of the tasks is more appropriate when the number of available processors is more than the number of tasks to be scheduled. The tasks of a job thus generated are sent to the front-end processor (FEP or the host processor) of the DPCS in the form of data flow graph (DFG), The DFG is used to model the tasks and represent the precedence (or data dependencies) among the tasks, In order to preserve the constraints among the tasks during scheduling and realise efficient utilisation of the resources of DPCS, the DFG is structured in the form of levels, The FBP of DPCS has a resident Task Manager (TM). The key function of the TM is to schedule the tasks to the appropriate processors of DPCS either statically or dynamically based on the required resources. To realise efficient scheduling and utilisation of the processors of DPCS, the TM uses a set of buffers known as Task Forwarding Buffer (TFB), Task Output Buffer (TOB) and Task Status Buffer (TSB) maintained by the FEP of DPCS. The tasks of a job from the FS are received at the TFB. The TM picks up a set of tasks pertaining to a level for scheduling into a temporary buffer C and obtains the status of the processors of DPCS. In order to realise both static and dynamic approaches of allocation, task to processor relation is considered in the scheduling algorithm. If the number of tasks in C is equal to or greater than the number of processors available, one task per processor is allocated, the remaining tasks of C are scheduled subsequently as and when the processors become available. This method of allocation is called static approach. If the number of tasks in C is less than the number of processors available, the TM makes use of the logical breakpoints of the tasks to generate subtasks equal to the number of available processors. Each subtask is scheduled to a processor. This method of scheduling is called the dynamic approach. In all the case the precedence constraints among the tasks are preserved by scheduling the successor task to the parent processor or near neighbouring processor, maintaining minimum data communication between them. Various examples of Computational Fluid Dynamics problems' were tested and the objective of reduced total turnaround time and maximum utilisation of the processors was achieved. The total turnaround time achieved for different jobs varies between 51% and 86% with static approach and 16% and 89% with dynamic approach. The utilisation of the processors varies between the 50% and 92.5%. Hence a speed-up of 5 to 8 folds is realised. Electrical Communications Distributed Parallel Processing Systems
118	Jole a library for dynamic job-level parallel workloads / Patterson, Jordan Dacey Lee. January 2009 (has links) Thesis (M. Sc.)--University of Alberta, 2009. / Title from PDF file main screen (viewed on Nov. 27, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta." Includes bibliographical references.
119	Evidential reasoning in semantic networks : a formal theory and its parallel implementation / Shastri, Lokendra. January 1900 (has links) Thesis (Ph. D.)--University of Rochester, 1985. / "September 1985."
120	A practical realization of parallel disks for a distributed parallel computing system Jin, Xiaoming. January 2000 (has links) (PDF) Thesis (M.S.)--University of Florida, 2000. / Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).

Search results