281 |
A framework for managing the evolving web service protocols in service-oriented architectures / Ryu, Seung Hwan, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
In Service-Oriented Architectures, everything is a service, and services can interact with each other when needed. Web services (or simply services) are loosely coupled software components that are published, discovered, and invoked across the Web. As the use of Web services grows, correctly interacting with them requires understanding the business protocols that tell clients how to interact with services. In dynamic Web services environments, service providers must constantly adapt their business protocols to reflect the restrictions and requirements imposed by new applications, new business strategies, and new laws, or to fix problems found in the protocol definition. However, the effective management of such protocol evolution raises critical problems, one of the most critical being how to handle instances running under the old protocol when that protocol changes. Simple solutions, such as aborting them or allowing them to continue under the old protocol, can be considered, but they are inapplicable for many reasons (e.g., the loss of work already done and the critical nature of the work). We present a framework that supports service administrators in managing business protocol evolution through several features: a set of change operators for modifying protocols, a variety of protocol change impact analyses that automatically determine which ongoing instances can be migrated to the new version of the protocol, and data mining techniques that induce a model for classifying ongoing instances as migratable to the new protocol. To support the protocol evolution process, we have also developed database-backed GUI tools on top of our existing system. The proposed approach and tools help service administrators manage the evolution of ongoing instances when the business protocols of the services they interact with have changed.
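As a rough illustration of the kind of change impact analysis described above, the sketch below models a business protocol as a finite state machine and treats an ongoing instance as migratable when its past message history can be replayed on the new protocol version. All class, state, and function names are invented for the example; the thesis's actual operators and analyses are richer than this.

```python
# Sketch of a "replayability" style of change-impact analysis: an ongoing
# instance is migratable if the messages it has already exchanged form a
# valid execution prefix of the new protocol definition.

class Protocol:
    def __init__(self, initial, transitions):
        # transitions: {(state, message): next_state}
        self.initial = initial
        self.transitions = transitions

    def replay(self, history):
        """Return the state reached by replaying history, or None if some
        message has no matching transition in this protocol version."""
        state = self.initial
        for message in history:
            state = self.transitions.get((state, message))
            if state is None:
                return None
        return state

def migratable(instance_history, new_protocol):
    return new_protocol.replay(instance_history) is not None

# Hypothetical example: the new protocol adds a mandatory 'verify' step
# between logging in and shopping.
new_p = Protocol("start", {
    ("start", "login"): "loggedIn",
    ("loggedIn", "verify"): "shopping",
    ("shopping", "addItem"): "shopping",
    ("shopping", "checkout"): "done",
})
print(migratable(["login", "addItem", "addItem"], new_p))  # False: skipped 'verify'
print(migratable(["login"], new_p))                        # True: prefix still valid
```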
|
282 |
Architectural support for security and reliability in embedded processors / Ragel, Roshan Gabriel, Computer Science & Engineering, Faculty of Engineering, UNSW January 2006 (has links)
Security and reliability in processor-based systems are concerns requiring adroit solutions. Security is often compromised by code injection attacks, which jeopardize even "trusted software". Reliability is a concern where unintended code is executed: modern processors with ever smaller feature sizes and low voltage swings are increasingly susceptible to bit flips. Software-only countermeasures increase code size and therefore significantly reduce performance. Hardware-assisted approaches use additional hardware monitors and thus incur considerable hardware cost and suffer scalability problems. Considering reliability and security during the design of an embedded system overcomes the limitations of existing solutions. The research presented in this thesis combines two elements: one, defining a hardware/software design framework for reliability and security monitoring at the granularity of micro-instructions, and two, applying this framework to real-world problems. At any given time, a processor executes only a few instructions, and a large part of the processor is idle. Sharing these idling hardware components with the monitoring hardware reduces the monitors' impact on hardware cost. Using micro-instruction routines within the machine instructions allows us to share most of the monitoring hardware; our technique therefore requires little hardware overhead compared to placing additional hardware blocks outside the processor. This reduction in overhead is due to maximal sharing of the processor's hardware resources. Our framework is also superior to software-only techniques: because the monitoring routines are formed from micro-instructions and run in parallel with machine instructions, they reduce both code size and execution time overheads. This dissertation makes four significant contributions to research on security and reliability in embedded processors: (i) it proposes a security and reliability framework for embedded processors that can be incorporated into the design phase; (ii) it shows that inline (machine instruction level) monitoring detects common security attacks (four inline monitors against common attacks cost 9.21% area and 0.67% performance, as opposed to previous work where an external monitor with two monitoring modules costs 15% area overhead); (iii) it illustrates that basic block check-summing for code integrity is much simpler and more efficient than previously proposed integrity violation detectors addressing code injection attacks (it costs a 5.03% area increase and a 3.67% performance penalty with single-level control flow checking, as opposed to previous work with a 5.59% area overhead that needed three levels of control flow integrity checking); and (iv) it shows that hardware-assisted control flow checking implemented during processor design is much cheaper and more effective than software-only approaches (this approach costs 0.24-1.47% performance and 3.59% area overheads, as opposed to previous work costing 53.5-99.5% performance).
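To make the check-summing idea in contribution (iii) concrete, here is a minimal software model: a reference checksum is recorded for each basic block at load time and recomputed whenever the block is about to execute. The thesis performs this check in hardware at the micro-instruction level; the instruction encodings and block addresses below are invented for the demonstration.

```python
# Minimal sketch of basic-block check-summing for code integrity. A mismatch
# between the load-time reference checksum and the run-time recomputation
# indicates injected or corrupted code.

def checksum(words):
    # Simple XOR checksum over a basic block's instruction words.
    c = 0
    for w in words:
        c ^= w
    return c

# Hypothetical program image: basic blocks keyed by entry address.
program = {
    0x1000: [0x8B450C, 0x03D8, 0xC3],
    0x2000: [0x55, 0x89E5, 0x5D, 0xC3],
}
# At load time: record a reference checksum per basic block.
reference = {addr: checksum(body) for addr, body in program.items()}

def execute_block(addr, memory_image):
    # At run time: recompute the checksum before executing the block.
    if checksum(memory_image[addr]) != reference[addr]:
        raise RuntimeError(f"integrity violation in block {addr:#x}")
    # ... execute the block ...

# An attacker overwriting a block is caught on its next execution:
tampered = dict(program)
tampered[0x1000] = [0x90, 0x90, 0xCC]  # injected code
execute_block(0x2000, tampered)        # untouched block passes
try:
    execute_block(0x1000, tampered)
except RuntimeError as e:
    print(e)                           # integrity violation in block 0x1000
```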
|
283 |
Optimization of instruction memory for embedded systems / Janapsatya, Andhi, Computer Science & Engineering, Faculty of Engineering, UNSW January 2005 (has links)
This thesis presents methodologies for improving system performance and energy consumption by optimizing memory hierarchy performance. The processor-memory performance gap is a well-known problem that is predicted to worsen as the gap continues to widen. The author describes a method to estimate the best L1 cache configuration for a given application. In addition, three methods are presented to improve performance and reduce energy in embedded systems by optimizing the instruction memory. Performance estimation is an important procedure for assessing the performance of a system and the effectiveness of any applied optimizations. A cache memory performance estimation methodology is presented in this thesis, designed to quickly and accurately estimate the performance of multiple cache memory configurations. Experimental results show that the methodology is on average 45 times faster than a widely used tool (Dinero IV). The first optimization method, a software-only technique called code placement, improves the performance of the instruction cache. The method carefully places code within memory to ensure a high hit rate when code is brought into the cache, aiming to improve cache hit rates and thereby cache memory performance. Experimental results show that applying code placement reduces the cache miss rate by up to 71% and energy consumption by up to 63% compared to the application without code placement. The second method involves a novel architecture for utilizing scratchpad memory, designed as a replacement for the instruction cache. A hardware modification allows data to be written into the scratchpad memory during program execution, enabling dynamic control of its content. Scratchpad memory has a faster access time and lower energy consumption per access than cache memory, so its use aims to improve performance and lower energy consumption compared to a cache-based system. Experimental results show an average energy reduction of 26.59% and an average performance improvement of 25.63% compared to a system with cache memory. The third method is an application profiling technique that uses statistical information to identify an application's hot-spots. Application profiling is important for identifying sections of the application where performance degradation might occur and/or where maximum performance gain can be obtained through optimization. The method was applied and tested on the scratchpad-based system described in this thesis. Experimental results show the effectiveness of the analysis method in reducing energy and improving performance compared to a previous method for utilizing the scratchpad-based system (an average performance improvement of 23.6% and an average energy reduction of 27.1% are observed).
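For a sense of what estimating the best L1 configuration involves, the sketch below brute-forces the miss rate of a few cache configurations over an address trace. This is deliberately naive, the opposite of the thesis's fast estimation methodology, and the trace and configurations are invented; it only makes the search space concrete.

```python
# Naive trace-driven model of L1 cache configuration search. Each candidate
# (size, line size) pair is simulated over the same instruction-address
# trace and compared by miss rate.

def miss_rate(trace, cache_size, line_size, assoc=1):
    """Simulate a set-associative cache with LRU replacement (assoc=1 is
    direct-mapped) and return the miss rate over an address trace."""
    n_sets = cache_size // (line_size * assoc)
    sets = [[] for _ in range(n_sets)]  # each set holds LRU-ordered tags
    misses = 0
    for addr in trace:
        block = addr // line_size
        idx, tag = block % n_sets, block // n_sets
        s = sets[idx]
        if tag in s:
            s.remove(tag)   # hit: move tag to MRU position
        else:
            misses += 1
            if len(s) >= assoc:
                s.pop(0)    # evict the LRU tag
        s.append(tag)
    return misses / len(trace)

# Toy trace: a 64-instruction loop body executed 10 times, then a
# cold-started helper routine.
trace = ([0x100 + 4 * i for i in range(64)] * 10
         + [0x4000 + 4 * i for i in range(32)])

for size in (1024, 2048, 4096):
    for line in (16, 32):
        print(f"{size:5}B cache / {line}B lines: "
              f"miss rate {miss_rate(trace, size, line):.3f}")
```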
|
284 |
Executable system architecting using systems modeling language in conjunction with Colored Petri Nets - a demonstration using the GEOSS network centric system / Wang, Renzhong, January 2007 (has links) (PDF)
Thesis (M.S.)--University of Missouri--Rolla, 2007. / Vita. The entire thesis text is included in file. Title from title screen of thesis/dissertation PDF file (viewed November 30, 2007). Includes bibliographical references (p. 199-209).
|
287 |
CounterDataFlow architecture : design and performance / Miller, Michael F. (Michael Frederic), 17 July 1997 (has links)
The counterflow pipeline concept was originated by Sproull and Sutherland to demonstrate the concept of asynchronous circuits. The architecture relies on distributed decision making and localized clocking and data movement. We have taken these ideas and reformulated them into a substantially faster, more scalable architecture that retains the same distributed decision making and locality of clocking and data, but adds very aggressive speculation, the absence of stalls, and other desirable characteristics. A high-level Java simulator has been built to explore the design tradeoffs and evaluate performance. / Graduation date: 1998
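The toy model below illustrates the central counterflow matching rule: instructions flow up a pipe while results flow down, and an instruction copies any result it meets whose register it still needs. Instruction and result moves alternate here so the two streams can never cross without meeting; the asynchronous handshakes, arbitration, and the no-stall speculation of this thesis's counterdataflow design are not modeled, and all names are invented.

```python
# Toy synchronous model of counterflow pipeline operand matching.
STAGES = 4

def make_insn(dst, srcs, fn):
    return {"dst": dst, "need": dict.fromkeys(srcs), "fn": fn}

def run(program, regs, max_steps=100):
    up = [None] * STAGES    # instruction pipe: index 0 = entry, -1 = top
    down = [None] * STAGES  # result pipe: index -1 = entry (top)
    pending, retired = list(program), []
    for t in range(max_steps):
        if t % 2 == 0:      # instructions advance; a new one may issue
            for i in range(STAGES - 1, 0, -1):
                if up[i] is None:
                    up[i], up[i - 1] = up[i - 1], None
            if up[0] is None and pending:
                insn = pending.pop(0)
                for r in insn["need"]:           # seed operands already
                    insn["need"][r] = regs.get(r)  # in the register file
                up[0] = insn
        else:               # results sink one stage; oldest falls off
            down = down[1:] + [None]
        # Matching: wherever an instruction and a result share a stage.
        for i in range(STAGES):
            insn, res = up[i], down[i]
            if insn and res and res[0] in insn["need"] \
                    and insn["need"][res[0]] is None:
                insn["need"][res[0]] = res[1]
        # Execute at the top stage once every operand has been garnered.
        top = up[-1]
        if top and down[-1] is None \
                and all(v is not None for v in top["need"].values()):
            value = top["fn"](*top["need"].values())
            regs[top["dst"]] = value
            retired.append((top["dst"], value))
            up[-1], down[-1] = None, (top["dst"], value)
        if not pending and all(s is None for s in up):
            break
    return retired

# r3 = r1 + r2, then r4 = 2 * r3: the second instruction picks up r3 from
# the result flowing down the pipe rather than from the register file.
prog = [make_insn("r3", ["r1", "r2"], lambda a, b: a + b),
        make_insn("r4", ["r3"], lambda a: 2 * a)]
print(run(prog, {"r1": 2, "r2": 5}))   # [('r3', 7), ('r4', 14)]
```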
|
288 |
Allocation of SISAL program graphs to processors using BLAS / Raisinghani, Manoj H., 07 April 1994 (has links)
There are a number of well-known techniques for extracting parallelism from a given program, ranging from hardware implementations to restructuring compilers to reorganizing programs so as to expose all the available parallelism. The success rate of any of the known techniques is rather poor across all types of programs. This has pushed the research community to explore new languages and to design different architectures to exploit program parallelism.

Dataflow architectures address the problem of exploiting parallelism by executing dataflow graphs. These graphs represent the data dependencies among instructions, and execution of a graph proceeds in a data-driven manner: an instruction executes as soon as all its operands are available, without waiting for a program counter to sequence it, as is the case in conventional von Neumann architectures.

In this thesis, dataflow graphs are generated during the intermediate compilation of a functional language called SISAL (Streams and Iterations in a Single Assignment Language). The Intermediate Form (IF1) is a graphical language consisting of multiple acyclic function graphs that represent a given program. Each graph consists of a sequence of nodes and edges: the nodes specify operations, and the edges indicate the dependencies between nodes. The graphs are further connected to each other by means of implicit dependencies.

The Automator package developed in this project preprocesses these multiple IF1 graphs and translates them into a single connected graph, converting all implicit dependencies into actual ones. Complex language constructs such as ForAll, loops, and if-then-else, together with their nesting, receive special treatment; there is virtually no limit to the number of nested levels the package can translate. The Automator's prime contribution is in translating real programs written in SISAL into the format required by an allocation algorithm called the Balanced Layered Allocation Scheme (BLAS).

BLAS partitions a connected graph into independent tasks and assigns them to processors in a multicomputer system. The problem of program allocation lies in maximizing parallelism while minimizing interprocessor communication costs; hence, allocation is based on the best choice of the communication-to-execution ratio for each task. BLAS uses heuristic rules to find a balance between computation and communication costs in the target system, here a simulated nCUBE 3E computer with a hypercube topology.

Simulations show that BLAS is effective in reducing the overall execution time of a program by weighing communication costs against execution times. The results will help in understanding the effects of packing nodes (grain-packing), routing issues in the network, and the allocation problem in general. In addition, tasks have also been assigned to adjacent processors only, instead of to any processor on the hypercube network; adjacent allocation helps determine the trade-offs between achieved speed-ups and the time needed to fully allocate large graphs at compilation. / Graduation date: 1994
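The greedy scheduler below is a generic stand-in, not the thesis's BLAS algorithm; it only illustrates the communication-to-execution trade-off described above, choosing for each task the processor that minimizes finish time after charging a transfer cost for dependency edges that cross processors. All task times and costs are invented.

```python
# Greedy list scheduler illustrating the computation/communication trade-off
# in allocating a task graph to processors.

def schedule(tasks, edges, n_procs, comm_cost):
    """tasks: {name: execution_time}; edges: (src, dst) data dependencies.
    Returns ({task: processor}, makespan), placing each task by earliest
    finish time and charging comm_cost for cross-processor edges."""
    preds = {t: [s for s, d in edges if d == t] for t in tasks}
    finish, place = {}, {}
    proc_free = [0.0] * n_procs
    # Topological order: repeatedly take tasks whose predecessors are done.
    order, remaining = [], dict(preds)
    while remaining:
        ready = [t for t, p in remaining.items() if all(x in order for x in p)]
        order += ready
        for t in ready:
            del remaining[t]
    for t in order:
        best = None
        for p in range(n_procs):
            # Data arrives immediately from co-located predecessors and
            # after a transfer delay from remote ones.
            data_ready = max([finish[s] + (0 if place[s] == p else comm_cost)
                              for s in preds[t]] or [0.0])
            start = max(proc_free[p], data_ready)
            if best is None or start + tasks[t] < best[0]:
                best = (start + tasks[t], p)
        finish[t], place[t] = best
        proc_free[best[1]] = best[0]
    return place, max(finish.values())

# Fork-join graph: cheap communication favors spreading b and c across
# processors; expensive communication keeps the whole graph on one.
tasks = {"a": 2, "b": 4, "c": 4, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
for cc in (0.5, 10.0):
    place, makespan = schedule(tasks, edges, 2, cc)
    print(f"comm={cc}: {place}, makespan={makespan}")
```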
|
289 |
ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction / Huang, Andrew "bunnie", 01 June 2002 (has links)
The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles required to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative is to reduce latency by migrating threads and data, but the overhead of existing implementations has made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform, fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine using a set of high-performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
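A back-of-the-envelope model of when migration pays off, using the migration figures quoted in the abstract (66 cycles for a null thread, 4-5x that for a typical thread); the remote- and local-access latencies below are invented assumptions, not numbers from the thesis.

```python
# When is migrating a thread next to its data cheaper than accessing the
# data remotely? Only the migration costs come from the abstract.

NULL_THREAD_MIGRATION = 66          # cycles (from the abstract)
TYPICAL_THREAD_MIGRATION = 5 * 66   # upper end of the quoted 4-5x range

def breakeven_accesses(migrate_cost, remote_latency, local_latency):
    """Number of accesses to one remote datum beyond which migrating the
    thread is cheaper than continuing to access the datum remotely."""
    saved_per_access = remote_latency - local_latency
    return migrate_cost / saved_per_access

# Hypothetical latencies: 100-cycle remote accesses vs 5-cycle local ones.
n = breakeven_accesses(TYPICAL_THREAD_MIGRATION, 100, 5)
print(f"migration wins after ~{n:.1f} accesses")   # ~3.5 accesses
```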
|
290 |
CDP: a multithreaded implementation of a network communication protocol on the Cyclops-64 multithreaded architecture / Gan, Ge. January 2007 (has links)
Thesis (M.S.)--University of Delaware, 2006. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
|