
The use of Java in large scientific applications in HPC environments

Java is a widely used programming language, but its use in the scientific and High Performance Computing (HPC) communities remains relatively low. This thesis investigates the option of using Java to develop scientific applications intended for execution in HPC environments.

The data reduction pipeline for the Gaia space astronomy mission is an example of a large software project that has been written in Java, and will run in HPC environments. The efficient execution of the Gaia data reduction pipeline was one of the main motivations behind this thesis, although this thesis largely remains a general investigation into the use of Java in HPC.

HPC is a fast-changing field in terms of hardware, software, and the scale of the problems being tackled. Among the most significant trends in HPC in recent years have been the increase in the number of cores per computing node and the increase in the size of the datasets that must be processed.

A significant challenge in HPC is ensuring that data is available on a particular node when a core is ready to process it, thereby avoiding dead time and providing high throughput. One danger to throughput is that the performance of shared storage devices decreases as the number of concurrent processes accessing those devices increases.

Given the trends mentioned above, efficient data communication is very important for many applications running in HPC environments. In this thesis, we present an investigation into the current options for providing efficient data communication to Java applications in HPC environments. We investigate a number of implementations of Message Passing in Java (MPJ) and compare their performance.

We present a new communication middleware application, called MPJ-Cache. This middleware makes use of an underlying implementation of Message Passing in Java (MPJ), and adds prefetching, caching, and file-splitting functionality. It presents application developers with a high-level API, thus providing high performance while also enabling high productivity amongst application developers. We compare the aggregate data rate that can be achieved through the use of this middleware against that which can be achieved through direct access to a high-performance shared storage device (GPFS), while distributing data amongst the nodes of a computer cluster. The use of MPJ-Cache has been shown to provide an aggregate data rate of up to 103 Gbps.
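The combination of caching and prefetching described above can be illustrated with a minimal sketch. It is not the MPJ-Cache API: the class `PrefetchingCache`, its `loader` function (which stands in for a fetch over MPJ or from shared storage), and the `lookahead` parameter are all hypothetical names, and the code assumes that a file has already been split into numbered chunks.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

/** Hypothetical illustration only; this is not the MPJ-Cache API. */
final class PrefetchingCache implements AutoCloseable {
    // One entry per chunk index; the future completes when the chunk has arrived.
    private final Map<Integer, CompletableFuture<byte[]>> cache = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final IntFunction<byte[]> loader; // assumed stand-in for a remote fetch
    private final int chunkCount;             // number of chunks the file was split into
    private final int lookahead;              // how many chunks to request ahead of the reader

    PrefetchingCache(IntFunction<byte[]> loader, int chunkCount, int lookahead) {
        this.loader = loader;
        this.chunkCount = chunkCount;
        this.lookahead = lookahead;
    }

    // Start loading a chunk in the background unless it was already requested.
    private CompletableFuture<byte[]> request(int index) {
        return cache.computeIfAbsent(index,
                i -> CompletableFuture.supplyAsync(() -> loader.apply(i), pool));
    }

    /** Returns the chunk at index and schedules the next chunks in the background. */
    public byte[] get(int index) {
        CompletableFuture<byte[]> current = request(index);
        for (int next = index + 1; next <= index + lookahead && next < chunkCount; next++) {
            request(next);
        }
        return current.join(); // blocks only if the chunk has not arrived yet
    }

    /** True if the chunk has been requested (it may still be loading). */
    public boolean isCached(int index) {
        return cache.containsKey(index);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

A caller that reads chunks in order would block on the first `get` and, if the loader keeps pace, find later chunks already loading or loaded. This sketch never evicts entries, so a real cache would also need to bound its memory use.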

Java applications are executed within a Java Virtual Machine (JVM), which is a managed runtime environment. The execution of applications within such a runtime environment is very different from the execution of native code that was compiled ahead of time. The Java runtime environment consists of several sophisticated components, including the core runtime system, a garbage collector, and a Just-In-Time (JIT) compiler. Modern JVMs strive to provide high performance out of the box; however, in some situations users may want to tune the JVM to better suit the behaviour and needs of a particular application. In order to do this, a profile of the target application should first be obtained.
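As a rough illustration of what such a profile draws on, the standard `java.lang.management` API exposes counters for the garbage collector, the JIT compiler, and the heap. The sketch below is not the profiling approach used in the thesis; the class name `JvmSnapshot` is hypothetical, and a coarse snapshot like this is only a starting point compared with a dedicated profiler.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that prints a coarse snapshot of JVM runtime activity. */
final class JvmSnapshot {
    static String describe() {
        StringBuilder sb = new StringBuilder();

        // One bean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("gc %s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }

        // The compilation bean is null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            sb.append(String.format("jit %s: time=%dms%n",
                    jit.getName(), jit.getTotalCompilationTime()));
        }

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(String.format("heap: used=%d committed=%d%n",
                heap.getUsed(), heap.getCommitted()));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Calling `describe()` before and after a representative workload gives a first indication of whether garbage collection or JIT compilation time is significant enough to justify tuning.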

Identifier: oai:union.ndltd.org:TDX_UB/oai:www.tdx.cat:10803/98405
Date: 21 January 2013
Creators: Fries, Aidan
Contributors: Portell de Mora, Jordi; Sirvent Pardell, Raül; Luri Carrascoso, Xavier; Universitat de Barcelona. Departament d'Astronomia i Meteorologia
Publisher: Universitat de Barcelona
Source Sets: Universitat de Barcelona
Language: English
Detected Language: English
Type: info:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion
Format: 240 p., application/pdf
Source: TDX (Tesis Doctorals en Xarxa)
Rights: info:eu-repo/semantics/openAccess. WARNING. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Other uses require the prior and express authorization of the author. In any case, when its contents are used, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site outside the TDX service. Presenting its content in a window or frame outside TDX (framing) is also not authorized. This reservation of rights applies both to the contents of the thesis and to its abstracts and indexes.
