1 |
Scalable and Productive Data Management for High-Performance Analytics. Youssef, Karim Yasser Mohamed Yousri. 07 November 2023
Advancements in data acquisition technologies across different domains, from genome sequencing to satellite and telescope imaging to large-scale physics simulations, are leading to exponential growth in dataset sizes. Extracting knowledge from this wealth of data enables scientific discoveries at unprecedented scales. However, the sheer volume of the gathered datasets is a bottleneck for knowledge discovery. High-performance computing (HPC) provides a scalable infrastructure to extract knowledge from these massive datasets, yet several data management performance gaps exist between big data analytics software and HPC systems. These gaps arise from multiple factors, including the tradeoff between performance and programming productivity, data growth that outpaces memory capacity, and the high storage footprints of data analytics workflows. This dissertation bridges these gaps by combining productive data management interfaces with application-specific optimizations of data parallelism, memory operation, and storage management. First, we address the performance-productivity tradeoff by leveraging Spark and optimizing input data partitioning. Our solution preserves high programming productivity while achieving performance comparable to the Message Passing Interface (MPI) for scalable bioinformatics. Second, we address the limitations of the operating system kernel for out-of-core data processing by autotuning memory management parameters in userspace. Finally, we address I/O and storage efficiency bottlenecks in data analytics workflows that iteratively and incrementally create and reuse persistent data structures such as graphs, data frames, and key-value datastores. / Doctor of Philosophy / Advancements in various fields, like genetics, satellite imaging, and physics simulations, are generating massive amounts of data. Analyzing this data can lead to groundbreaking scientific discoveries. However, the sheer size of these datasets presents a challenge. 
High-performance computing (HPC) offers a way to process and understand this data efficiently. Still, several issues hinder the performance of big data analytics software on HPC systems. These include balancing performance against ease of programming, handling data that grows faster than memory capacity, and optimizing storage usage. This dissertation focuses on three areas to improve high-performance data analytics (HPDA). First, it demonstrates how using Spark with optimized data partitioning can keep programming productive while achieving scalability similar to that of the Message Passing Interface (MPI) for scalable bioinformatics. Second, it addresses the limitations of the operating system's memory management for processing data that is too large to fit entirely in memory. Finally, it tackles input/output (I/O) and storage efficiency issues that arise when iterative and incremental workflows work with data structures such as graphs, data frames, and key-value datastores.
|
2 |
The significance of heterogeneity for the spreading of geologically stored carbon dioxide. Olofsson, Christofer. January 2011
The demand for large-scale storage of carbon dioxide (CO2) grows as incentives to reduce greenhouse gas emissions are introduced. Depleted oil and gas reservoirs, unminable coal seams, and deep saline water-saturated aquifers are among the many possible geological storage sites. Geological formations offer large-scale storage potential and concealed storage locations, and they occur naturally worldwide. A disadvantage is the difficulty of investigating the properties of the storage material over large areas. Reservoir simulation studies addressing heterogeneous reservoirs are growing in number, but much remains to be investigated. This study adds to the field by investigating the significance of heterogeneity in hydraulic conductivity, based on core sample data. The data were obtained from Heletz, Israel, the main CO2 injection site of MUSTANG, a project within the European Union Seventh Framework Programme for research and technological development (EU FP7) (CO2MUSTANG, 2011-03-13). Using models developed in iTOUGH2/ECO2N, this study aims to contribute to a better understanding of how the average permeability, the variance in permeability, and the spatial correlation of the reservoir properties affect the distribution of CO2 within the target layer of the deep saline aquifer. The study applies a stochastic simulation approach known as the Monte Carlo method. Geostatistical properties are determined from the core sample data and used to create equally probable realizations, in which the properties follow a probability distribution characterized by a mean and variance, with spatial correlation described by a constructed semivariogram. The results suggest that deep saline aquifers are less storage-effective for higher values of average permeability, variance in permeability, and spatial correlation. 
The results also indicate that the Heletz aquifer, with its highly heterogeneous characteristics, can in some extreme cases be just as storage-effective as a homogeneous sandstone deep saline aquifer that is ten times as permeable. / Incentives to reduce greenhouse gas emissions have recently increased the demand for large-scale storage of carbon dioxide (CO2). Geological sites such as exploited oil and gas reservoirs, unminable coal seams, and deep saline aquifers are examples of potential storage sites. Such geological formations offer large-scale, concealed storage and occur naturally worldwide. However, it is very difficult to investigate the material properties of entire storage areas. Simulation studies addressing questions of reservoir heterogeneity are growing in number. Much remains to be investigated, and this study contributes to the field by examining, with measured well data, the significance of heterogeneity in hydraulic conductivity for the spreading of carbon dioxide. The data were obtained from the Heletz storage site in Israel, the main storage site of the MUSTANG project, which is part of the European Union's Seventh Framework Programme for research and technological development (EU FP7) (CO2MUSTANG, 2011/3/13). By developing models with the iTOUGH2/ECO2N software, this study aims to contribute to a better understanding of how the average permeability, the variance in permeability, and the spatial correlation of reservoir properties affect the distribution of CO2 in the deep saline aquifer at Heletz. The study used stochastic simulation by applying the Monte Carlo method. Using previously measured well data, geostatistical properties were determined in order to create equally probable realizations. The geostatistical properties were described by a probability distribution with a mean and variance, together with a constructed semivariogram.
The results suggest that deep saline aquifers are less storage-effective at higher values of average permeability, variance in permeability, and horizontal spatial correlation. The results also show that the Heletz aquifer, with its highly heterogeneous properties, can in extreme cases be just as storage-ineffective as a deep saline aquifer with ten times higher average permeability.
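The Monte Carlo workflow described above (fit a mean, variance, and semivariogram to core data, then draw equally probable realizations) can be illustrated with an unconditional Gaussian simulation by Cholesky decomposition. This is a minimal 1-D sketch, not the thesis's own generator: it assumes that ln(permeability) is Gaussian and that the semivariogram is exponential, γ(h) = σ²(1 − exp(−h/λ)). The function names and parameter values are hypothetical.

```python
import math
import random

def exponential_covariance(h, variance, corr_len):
    # Covariance paired with the (assumed) exponential semivariogram
    # gamma(h) = variance * (1 - exp(-h / corr_len)), i.e. C(h) = variance - gamma(h).
    return variance * math.exp(-h / corr_len)

def cholesky(a):
    # Lower-triangular L such that L * L^T == a (a must be symmetric positive definite).
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def log_k_realizations(xs, mean, variance, corr_len, n_real, seed=0):
    """Draw n_real equally probable realizations of ln(k) at 1-D positions xs."""
    n = len(xs)
    cov = [[exponential_covariance(abs(xs[i] - xs[j]), variance, corr_len)
            for j in range(n)] for i in range(n)]
    L = cholesky(cov)  # factor once, reuse for every realization
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_real):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
        # mean + L z has covariance L L^T = cov, so the field honors the semivariogram.
        realizations.append([mean + sum(L[i][k] * z[k] for k in range(i + 1))
                             for i in range(n)])
    return realizations
```

Each realization would then be exponentiated (k = exp(ln k)) and passed to the flow simulator, one run per realization. Raising `variance` or `corr_len` corresponds to the two heterogeneity controls varied in the study. Cholesky factorization is simple but scales as O(n³), so large 3-D grids usually call for other generators, such as sequential Gaussian simulation.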
|
3 |
Shallow aquifer storage and recovery (SASR): Regional management of underground water storage in hydraulically connected aquifer-stream systems. Neumann, Philip E. 08 November 2012
A novel mode of shallow aquifer management, shallow aquifer storage and recovery (SASR), could increase the volumetric potential and distribution of underground freshwater storage. In this mode, water is efficiently stored in basin-fill aquifers with a strong hydraulic connection to surface water. Regional numerical modeling can link storage efficiency to local hydrogeologic parameters, which in turn may yield useful rules for how and where water can be stored. This study: (1) uses a calibrated model of the central Willamette Basin (CWB), Oregon, to correlate SASR storage efficiency with basic hydrogeologic parameters through the stream depletion factor (SDF); (2) uses the SDF to identify regions of high storage efficiency; and (3) estimates potential volumetric storage and injection rates for storage-efficient regions. Potential storage for the CWB is estimated at 2.40 million m³. Given areally averaged hydrogeologic parameters, 8 wells, each roughly 35 m deep and 0.3 m in diameter, would be capable of managing this storage on an annual basis. Under otherwise similar conditions, greater depth to groundwater would yield greater volumetric potential, greater injection rates, and unchanged or increased efficiency. / Graduation date: 2013
|
4 |
Performance of the sprinkler irrigation system, towable center-pivot type. SILVA, Jonas Carlos Santino. 24 May 2018
Previous issue date: 2002-08 / The objective of this work was to evaluate the performance of a towable center pivot under field conditions at Fazenda Capim, located in the municipality of Capim-PB. The results of the evaluation of the equipment at the three bases studied led to the following conclusions: the equipment performed well when analyzed as a whole at each base; uniformity and efficiency problems appeared in some sectors when each radius was analyzed individually, indicating sectors with a water deficit and others with excess water; the equipment performed worst when evaluated at base 5; the applied water depths were dispersed around the mean value at all bases studied; and the pivot inlet pressures at bases 4 and 5 were far below the recommended pressure, which resulted in low flow rates at these bases. / The objective of this work was to evaluate the field performance of a towable center pivot at Fazenda Capim in the municipal district of Capim-PB. The evaluation of the equipment at the three studied bases showed that it performed well when analyzed as a whole at each base. However, uniformity and efficiency problems were found in some sections when each radius was analyzed individually, indicating sections with a water deficit and others with excess water. The equipment performed worst when evaluated at base 5. The applied water depths were dispersed around the mean value at all studied bases. The inlet pressures of the pivot at bases 4 and 5 were far below the recommended pressure, which resulted in low flow rates at these bases.
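Center-pivot uniformity is often summarized with the Heermann and Hein coefficient, which weights each catch can by its distance from the pivot because outer cans represent larger irrigated areas: CU = 100[1 − Σ Sᵢ|Dᵢ − D̄w| / Σ SᵢDᵢ], with D̄w = Σ SᵢDᵢ / Σ Sᵢ. The abstract does not state which uniformity index the study used, so this is a hedged sketch of one standard choice; the function names and sample values are hypothetical.

```python
def heermann_hein_cu(depths, distances):
    """Heermann-Hein uniformity coefficient (%) for one radius of catch cans.

    depths: water depth caught in each can (e.g. mm)
    distances: distance of each can from the pivot point (e.g. m), used as weights
    """
    if not depths or len(depths) != len(distances):
        raise ValueError("need a non-empty list of depths with one distance per depth")
    weighted_sum = sum(s * d for s, d in zip(distances, depths))
    if weighted_sum <= 0:
        raise ValueError("total weighted depth must be positive")
    # Area-weighted mean depth D_w.
    weighted_mean = weighted_sum / sum(distances)
    # Area-weighted absolute deviation from D_w.
    weighted_abs_dev = sum(s * abs(d - weighted_mean) for s, d in zip(distances, depths))
    return 100.0 * (1.0 - weighted_abs_dev / weighted_sum)

def deficit_and_excess(depths, target):
    # Indices of cans below / above a target depth, to flag under- and over-watered sectors.
    deficit = [i for i, d in enumerate(depths) if d < target]
    excess = [i for i, d in enumerate(depths) if d > target]
    return deficit, excess
```

Computing the coefficient separately for each radius and for all radii pooled mirrors the study's observation that a machine can perform well as a whole while individual radii show sectors with a water deficit or excess.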
|