Java’s cross-platform virtual machine arrangement and the special features that make
it ideal for writing network applications also have a tremendous negative impact on its
performance. Despite this relatively weak performance, Java’s success has motivated the
search for techniques to enhance its execution.
This work presents the JAFARDD (a Java Architecture based on a Folding Algorithm,
with Reservation stations, Dynamic translation, and Dual processing) processor designed
to accelerate Java processing. JAFARDD dynamically translates Java bytecodes to RISC
instructions to facilitate the use of a typical general-purpose RISC core. This enables the
exploitation of the instruction level parallelism among the translated instructions using well
established techniques, and facilitates the migration to Java-enabled hardware.
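As a rough illustration of what translating stack bytecodes into register-style RISC instructions involves, the Python sketch below maps each operand-stack slot to a hypothetical register (s0, s1, ...) and each local variable to a hypothetical register (v0, v1, ...), emitting one three-address instruction per bytecode. The RISC mnemonics, the register names, and the `translate` function are assumptions made for illustration only; they are not JAFARDD's actual translation scheme.

```python
# Hypothetical sketch: naive one-to-one translation of a few JVM integer
# bytecodes into register-style instructions. Stack slot i is mapped to an
# assumed register "s<i>"; local variable n is mapped to an assumed "v<n>".

BINARY_OPS = {"iadd": "add", "isub": "sub", "imul": "mul"}

def translate(bytecodes):
    """Translate a list of (opcode, operand) pairs; operand may be None."""
    out = []
    depth = 0  # operand-stack depth, tracked at translation time
    for opcode, operand in bytecodes:
        if opcode == "iload":            # push a local variable
            out.append(f"mov s{depth}, v{operand}")
            depth += 1
        elif opcode == "istore":         # pop into a local variable
            depth -= 1
            out.append(f"mov v{operand}, s{depth}")
        elif opcode in BINARY_OPS:       # pop two operands, push one result
            depth -= 2
            out.append(f"{BINARY_OPS[opcode]} s{depth}, s{depth}, s{depth + 1}")
            depth += 1
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return out

# c = a + b, with a, b, c held in local variables 1, 2, 3
program = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3)]
risc = translate(program)
for line in risc:
    print(line)
```

In this sketch, four bytecodes become four register instructions, three of which are plain moves. Those moves are the kind of stack overhead that a folding step would aim to remove.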
Designing hardware for Java requires extensive knowledge and understanding of
its instruction set architecture, which were acquired through a comprehensive behavioral
analysis by benchmarking. Many aspects of Java workload behavior were measured, and
the resulting statistics were analyzed. This helped identify performance-critical aspects that
are candidates for hardware support. Our analysis surpasses other similar ones in terms of
the number of aspects studied and the coverage of the recommendations made.
Next, a global analysis of the design space of Java processors was carried out. Different
hardware design options and alternatives that are suitable for Java were explored and their
trade-offs were examined. We especially focused on the design methodology, execution
engine organization, parallelism exploitation, and support for high-level language features.
This analysis helped identify innovative design ideas such as the use of a modified version of
Tomasulo’s algorithm. This, in turn, motivated the development of a bytecode folding algorithm
that integrates with the reservation station concept in JAFARDD.
As the results of the behavioral analysis and the design space exploration were examined, a list of
global architectural design principles started to emerge. These principles ensure that JAFARDD
can execute Java efficiently, and they were taken into consideration when the various instruction
pipeline modules were designed.
Results from the behavioral analysis also confirmed that Java’s stack architecture creates
virtual data dependencies that limit performance and prohibit instruction level parallelism. To overcome this drawback, stack operation folding has been suggested in the
literature to enhance performance by grouping contiguous instructions that have true data
dependencies into a compound instruction. We have developed a folding algorithm that,
unlike existing ones, does not require the folded instructions to be consecutive. To the best
of our knowledge, our folding algorithm is the only one that permits nested pattern folding,
tolerates variations in folding groups, and detects and resolves folding hazards completely.
By incorporating this algorithm into a Java processor, the need for, and therefore the limitations
of, a stack are eliminated.
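The basic idea of stack operation folding, in its simplest contiguous form, can be sketched in Python as follows. This is not the non-consecutive, nested folding algorithm developed in this work. It only recognizes the single pattern "load, load, operate, store" on adjacent bytecodes and rewrites it as one three-address instruction. The function name `fold`, the output mnemonics, and the `v<n>` naming of local variables are assumptions made for illustration.

```python
# Hypothetical sketch: contiguous stack operation folding for one pattern,
#   iload a; iload b; <binary op>; istore c   ->   <op> vc, va, vb
# Bytecodes that do not match the pattern are passed through unchanged.

BINARY_OPS = {"iadd": "add", "isub": "sub", "imul": "mul"}

def fold(bytecodes):
    """Fold a list of (opcode, operand) pairs; operand may be None."""
    out = []
    i = 0
    while i < len(bytecodes):
        window = bytecodes[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "iload"       # first producer
                and window[1][0] == "iload"       # second producer
                and window[2][0] in BINARY_OPS    # operator
                and window[3][0] == "istore"):    # consumer
            op = BINARY_OPS[window[2][0]]
            dst, src1, src2 = window[3][1], window[0][1], window[1][1]
            out.append(f"{op} v{dst}, v{src1}, v{src2}")
            i += 4
        else:
            opcode, operand = bytecodes[i]
            out.append(opcode if operand is None else f"{opcode} {operand}")
            i += 1
    return out

program = [
    ("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3),
    ("iload", 3), ("ireturn", None),
]
folded = fold(program)
for line in folded:
    print(line)
```

Here the first four bytecodes collapse into a single instruction that reads and writes local variables directly, so the operand stack is never touched for that group. The trailing `iload 3; ireturn` pair is left unfolded because this sketch knows only one pattern.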
In addition to an efficient dual processing configuration (i.e., Java and RISC), JAFARDD
is empowered with a number of innovative design features, including: an adaptive
feedback fetch policy that copes with the variation in Java instruction size, a smart bytecode
queue that compensates for the lack of a stack, an on-chip local variable file to facilitate
operand access, an early tag assignment to dispatched instructions to reduce processing
delay, and a specialized load/store unit that preprocesses object-oriented instructions.
The functionality of JAFARDD has been successfully demonstrated through VHDL
modeling and simulation. Furthermore, benchmarking using SPECjvm98 showed that the
introduced techniques indeed speed up Java execution. Our bytecode folding algorithm
speeds up execution by an average factor of about 1.29, eliminating an average of 97% of the
stack instructions and 50% of the overall instructions.
Compared to other proposals, JAFARDD combines Java bytecode folding with dynamic
hardware translation, while maintaining the RISC nature of the processor, making this a
much more flexible and general approach.
Graduate
Identifier | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/10264 |
Date | 07 November 2018 |
Creators | El-Kharashi, Mohamed Watheq Ali Kamel |
Contributors | Gebali, Fayez, Li, K. F. |
Source Sets | University of Victoria |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Available to the World Wide Web |