  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
271

Optimization of the Photovoltaic Time-series Analysis Process Through Hybrid Distributed Computing

Hwang, Suk Hyun 01 June 2020 (has links)
No description available.
272

Mining Formal Concepts in Large Binary Datasets using Apache Spark

Rayabarapu, Varun Raj 29 September 2021 (has links)
No description available.
273

Development of an Apache Spark-Based Framework for Processing and Analyzing Neuroscience Big Data: Application in Epilepsy Using EEG Signal Data

Zhang, Jianzhe 07 September 2020 (has links)
No description available.
274

Ablation Programming for Machine Learning

Sheikholeslami, Sina January 2019 (has links)
As machine learning systems are used in an increasing number of applications, from analysis of satellite sensor data and health-care analytics to smart virtual assistants and self-driving cars, they are also becoming more and more complex. This means that more time and computing resources are needed to train the models, and the number of design choices and hyperparameters increases as well. Due to this complexity, it is usually hard to explain the effect of each design choice or component of the machine learning system on its performance. A simple approach for addressing this problem is to perform an ablation study: a scientific examination of a machine learning system in order to gain insight into the effects of its building blocks on its overall performance. However, ablation studies are currently not part of standard machine learning practice. One of the key reasons is that performing an ablation study currently requires major modifications to the code as well as extra compute and time resources. Meanwhile, experimentation with a machine learning system is an iterative process that consists of several trials. A popular approach is to run these trials in parallel on an Apache Spark cluster. Since Apache Spark follows the Bulk Synchronous Parallel model, parallel execution of trials proceeds in stages separated by barriers: before a new set of trials can start, all trials from the previous stage must have finished. As a result, a lot of time and computing resources are usually wasted on unpromising trials that could have been stopped soon after their start. We address these challenges by introducing Maggy, an open-source framework for asynchronous and parallel hyperparameter optimization and ablation studies with Apache Spark and TensorFlow. The framework allows for better resource utilization, as well as ablation studies and hyperparameter optimization, in a unified and extendable API.
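The ablation-study idea above can be sketched in a few lines: score a model with one feature group held out at a time and compare against the full feature set. The synthetic data and the trivial linear rule below are illustrative assumptions, not Maggy's actual API; in the synthetic data, features 0 and 1 determine the label, while 2 and 3 are pure noise.

```python
import random

# Minimal sketch of an ablation study: measure how accuracy changes
# when each component (here, a feature) is removed from the system.
random.seed(0)

def make_data(n=400):
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(4)]
        label = 1 if x[0] + x[1] > 0 else 0  # only features 0, 1 matter
        data.append((x, label))
    return data

def accuracy(data, keep):
    # Trivial "model": predict 1 when the sum of the kept features is > 0.
    correct = 0
    for x, label in data:
        pred = 1 if sum(x[i] for i in keep) > 0 else 0
        correct += (pred == label)
    return correct / len(data)

data = make_data()
baseline = accuracy(data, [0, 1, 2, 3])
for i in range(4):
    kept = [j for j in range(4) if j != i]
    print(f"without feature {i}: accuracy {accuracy(data, kept):.3f} "
          f"(baseline {baseline:.3f})")
```

Dropping feature 0 or 1 lowers accuracy while dropping a noise feature does not, which is exactly the per-component insight an ablation study provides; Maggy's contribution is scheduling such trials asynchronously and in parallel on Spark executors instead of one barrier-synchronized stage at a time.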
275

Matrix Multiplications on Apache Spark through GPUs

Safari, Arash January 2017 (has links)
In this report, we consider the distribution of large-scale matrix multiplications across a group of systems through Apache Spark, where each individual system uses graphics processing units (GPUs) to perform the matrix multiplication. The purpose of this thesis is to investigate whether the GPU's advantage in parallel work carries over to a distributed environment, and whether it scales noticeably better than a CPU implementation in such an environment. The question was resolved by benchmarking the different implementations at their peak. Based on these benchmarks, it was concluded that GPUs do indeed perform better as long as single-precision support is available in the distributed environment. When single-precision operations are not supported, GPUs perform much worse due to the low double-precision performance of most GPU devices.
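To make the distribution scheme concrete, here is a pure-Python sketch (an illustration, not the thesis code) of block-partitioned matrix multiplication: each matrix is cut into a grid of square blocks, and each block product is an independent task that a distributed engine such as Spark can ship to a worker, where it could in turn be offloaded to a GPU.

```python
def matmul(a, b):
    # Plain triple-loop multiplication; stand-in for the per-worker kernel.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def blocks(mat, size):
    # Split a square matrix (side divisible by size) into a block grid.
    g = len(mat) // size
    return {(bi, bj): [[mat[bi*size + i][bj*size + j] for j in range(size)]
                       for i in range(size)]
            for bi in range(g) for bj in range(g)}

def block_matmul(a, b, size):
    ga, gb = blocks(a, size), blocks(b, size)
    g = len(a) // size
    out = [[0.0] * len(a) for _ in range(len(a))]
    for bi in range(g):
        for bj in range(g):
            # Each block product ga[bi,t] @ gb[t,bj] is an independent
            # task that could run on a separate executor/GPU; only the
            # final accumulation needs the partial results together.
            acc = [[0.0] * size for _ in range(size)]
            for t in range(g):
                p = matmul(ga[(bi, t)], gb[(t, bj)])
                for i in range(size):
                    for j in range(size):
                        acc[i][j] += p[i][j]
            for i in range(size):
                for j in range(size):
                    out[bi*size + i][bj*size + j] = acc[i][j]
    return out
```

This is the same partitioning idea behind Spark's distributed BlockMatrix abstraction; the inner `matmul` is where a GPU kernel (single precision where supported, per the conclusion above) would replace the Python loops.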
276

MECHANISTIC UNDERSTANDING OF PHASE STABILITY, TRANSFORMATION, AND STRENGTHENING MECHANISMS IN LIGHTWEIGHT HIGH ENTROPY ALLOYS AND HIGH ENTROPY CERAMICS

Walunj, Ganesh Shankar 01 September 2022 (has links)
No description available.
277

Investigation of the structural and mechanical properties of micro-/nano-sized Al2O3 and cBN composites prepared by spark plasma sintering

Irshad, H.M., Ahmed, B.A., Ehsan, M.A., Khan, Tahir I., Laoui, T., Yousaf, M.R., Ibrahim, A., Hakeem, A.S. 27 May 2017 (has links)
Alumina-cubic boron nitride (cBN) composites were prepared using the spark plasma sintering (SPS) technique. Alpha-alumina powders with particle sizes of ∼15 µm and ∼150 nm were used as the matrix, while cBN particles with and without a nickel coating were used as reinforcement agents. The amount of coated and uncoated cBN reinforcement for each type of matrix was varied between 10 and 30 wt%. The powders were sintered at 1400 °C under a constant uniaxial pressure of 50 MPa. We studied the effect of the starting alumina particle size, as well as of the nickel coating, on the phase transformation from cBN to hBN (hexagonal boron nitride) and on the thermo-mechanical properties of the composites. In contrast to micro-sized alumina, using nano-sized alumina as the starting powder played a pivotal role in preventing the cBN-to-hBN transformation. The composites prepared from nano-sized alumina reinforced with 30 wt% nickel-coated cBN showed the highest relative density, 99%, along with the highest Vickers hardness (Hv2) value, 29 GPa. Because the compositions made with micro-sized alumina underwent the phase transformation from cBN to hBN, their relative densification and hardness values were lower (20.9–22.8 GPa). However, the nickel coating on the cBN reinforcement particles hindered the cBN-to-hBN transformation in the micro-sized alumina matrix, improving hardness values to up to 24.64 GPa.
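For readers unfamiliar with the notation, Hv2 denotes Vickers hardness measured at a 2 kgf load. The helper below is not taken from the paper; it is the standard Vickers definition, converting an indentation measurement (load and mean indent diagonal) into hardness in GPa.

```python
# Standard Vickers hardness relation (not from the paper):
#   HV [kgf/mm^2] = 1.8544 * F / d^2, F in kgf, d = mean diagonal in mm,
# then 1 kgf/mm^2 = 0.009807 GPa.

def vickers_hardness_gpa(load_kgf: float, diagonal_mm: float) -> float:
    hv = 1.8544 * load_kgf / diagonal_mm ** 2  # kgf/mm^2
    return hv * 0.009807                        # convert to GPa
```

At the 2 kgf load used here, the quoted 29 GPa corresponds to an indent diagonal of roughly 35 µm, which gives a sense of the scale of the measurement.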
278

Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)

Kakkepalya Puttaswamy, Manjunath January 2023 (has links)
The exponential growth of social media usage has led to massive data sharing, posing challenges for traditional systems in managing and analyzing such vast amounts of data. This surge in data exchange has also resulted in an increase in cyber threats from individuals and criminal groups. Traditional forensic methods, such as evidence collection and data backup, become impractical when dealing with petabytes or terabytes of data. To address this, Big Data Analytics has emerged as a powerful solution for handling and analyzing structured and unstructured data. This thesis explores the use of Apache Flink, an open-source tool by the Apache Software Foundation, to enhance cybercrime forensic research. Unlike batch processing engines like Apache Spark, Apache Flink offers real-time processing capabilities, making it well-suited for analyzing dynamic and time-sensitive data streams. The study compares Apache Flink's performance against Apache Spark in handling various workloads on a single node. The literature review reveals a growing interest in utilizing Big Data Analytics, including platforms like Apache Flink, for cybercrime detection and investigation, especially on social media platforms like X (formerly known as Twitter). Sentiment analysis is a vital technique, but challenges arise due to the unique nature of social data. X (formerly known as Twitter), as a valuable source for cybercrime forensics, enables the study of fraudulent, extremist, and other criminal activities. This research explores various data mining techniques and emphasizes the need for real-time analytics to combat cybercrime effectively. The methodology involves data collection from X, preprocessing to remove noise, and sentiment analysis to identify cybercrime-related tweets. The comparative analysis between Apache Flink and Apache Spark demonstrates Flink's efficiency in handling larger datasets and real-time processing. Parallelism and scalability are evaluated to optimize performance. 
The results indicate that Apache Flink outperforms Apache Spark in terms of response time, making it a valuable tool for cybercrime forensics. Despite this progress, challenges such as data privacy, accuracy improvement, and cross-platform analysis remain. Future research should focus on refining algorithms, enhancing scalability, and addressing these challenges to further advance cybercrime forensics using Big Data Analytics platforms such as Apache Flink.
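The collect-preprocess-filter step described in the methodology can be sketched as follows. The regexes and the tiny keyword lexicon are illustrative assumptions, not the thesis's actual pipeline; a real system would apply a trained sentiment or classification model inside a Flink or Spark streaming job.

```python
import re

# Hypothetical noise removal and filtering for raw tweet text; the
# lexicon is a made-up illustration of cybercrime-related terms.
CYBERCRIME_TERMS = {"phishing", "ransomware", "scam", "breach", "malware"}

def preprocess(tweet: str) -> list[str]:
    tweet = re.sub(r"https?://\S+", " ", tweet)  # strip URLs
    tweet = re.sub(r"@\w+", " ", tweet)          # strip user mentions
    tweet = tweet.replace("#", " ")              # keep hashtag words
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet)   # strip punctuation/emoji
    return tweet.lower().split()

def is_cybercrime_related(tweet: str) -> bool:
    return any(token in CYBERCRIME_TERMS for token in preprocess(tweet))
```

In a streaming setting, `is_cybercrime_related` would be the filter function applied per event, which is where Flink's record-at-a-time processing gives it the latency advantage over Spark's batch-oriented execution reported above.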
279

Biocompatibility evaluation of sintered biomedical Ti-24Nb-4Zr-8Sn (Ti2448) alloy produced using spark plasma sintering (SPS).

Madonsela, Jerman S. January 2018 (has links)
M. Tech. (Department of Metallurgical Engineering, Faculty of Engineering Technology), Vaal University of Technology. / Solid titanium (Ti), Ti-6Al-4V (wt.%), and Ti-24Nb-4Zr-8Sn (wt.%) materials were fabricated from powders using spark plasma sintering (SPS). The starting materials comprised elemental powders of ASTM Grade 4 titanium (Ti), aluminium (Al), vanadium (V), niobium (Nb), zirconium (Zr), and tin (Sn). The powders were characterised and milled prior to sintering; the micron-sized powders were milled in an attempt to produce materials with nanostructured grains and, as a result, improved hardness and wear resistance. The produced solid Ti-24Nb-4Zr-8Sn alloy was compared to solid titanium (Ti) and Ti-6Al-4V (Ti64) on the basis of density, microstructure, hardness, corrosion, and biocompatibility. Relative densities above 99.0% were achieved for all three systems: CP-Ti and Ti64 both reached 100% relative density, while Ti2448 showed a slightly lower 99.8%. Corrosion results showed that all three materials exhibited good corrosion resistance due to the formation of a protective passive film. In 0.9% NaCl, Ti2448 had the highest current density (9.05 nA/cm²), implying that its corrosion resistance is relatively poor in comparison to Ti (6.41 nA/cm²) and Ti64 (5.43 nA/cm²); the same behaviour was observed in Hank's solution. In cell culture medium, Ti2448 showed better corrosion resistance, with the lowest current density of 2.96 nA/cm² compared to 4.86 nA/cm² and 5.62 nA/cm² for Ti and Ti64, respectively. However, the observed current densities are all so low that they lie within acceptable ranges for Ti2448 to qualify as a biomaterial. A cell proliferation test was performed using murine osteoblastic cells (MC3T3-E1) at two cell densities, 400 and 4000 cells/mL, over 7 days of incubation. Pure titanium showed better cell attachment and proliferation under both conditions, suggesting that the presence of other oxide layers influences cell proliferation. No significant difference in cell proliferation was observed between Ti64 and Ti2448.
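As a rough sanity check on the claim that these current densities are negligible, the standard Faraday-based estimate (the ASTM G102 form) converts a corrosion current density into a penetration rate. The equivalent weight and density below are approximate values for pure titanium, assumed here for illustration; they are not taken from the thesis.

```python
# ASTM G102-style corrosion-rate estimate:
#   CR [mm/yr] = 3.27e-3 * i_corr [uA/cm^2] * EW / density [g/cm^3]
# EW ~ 11.98 and density ~ 4.51 g/cm^3 are approximate values for Ti.

def corrosion_rate_mm_per_year(i_corr_na_cm2: float,
                               equivalent_weight: float = 11.98,
                               density_g_cm3: float = 4.51) -> float:
    i_ua = i_corr_na_cm2 / 1000.0  # convert nA/cm^2 -> uA/cm^2
    return 3.27e-3 * i_ua * equivalent_weight / density_g_cm3
```

Even the worst value reported above, 9.05 nA/cm² for Ti2448 in saline, corresponds to well under a tenth of a micrometre of material loss per year, which supports the conclusion that all three materials fall within acceptable ranges.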
280

Auto-Tuning Apache Spark Parameters for Processing Large Datasets

Zhou, Shidi January 2023 (has links)
Apache Spark is a popular open-source distributed processing framework that enables efficient processing of large amounts of data. Apache Spark has a large number of configuration parameters that are strongly related to performance. Selecting an optimal configuration for an Apache Spark application deployed in a cloud environment is a complex task: a poor choice may not only result in poor performance but also increase costs. Manually adjusting the Apache Spark configuration parameters can take a lot of time and may not lead to the best outcomes, particularly in a cloud environment where computing resources are allocated dynamically and workloads can fluctuate significantly. The focus of this thesis project is the development of an auto-tuning approach for Apache Spark configuration parameters. Four machine learning models are formulated and evaluated to predict Apache Spark's performance. Additionally, two models for Apache Spark configuration parameter search are created and evaluated to identify the most suitable parameters, resulting in the shortest execution time. The obtained results demonstrate that, with the developed auto-tuning approach, Apache Spark applications can achieve shorter execution times than when using the default parameters. The developed approach gives improved cluster utilization and shorter job execution time, with average performance improvements of 49.98%, 53.84%, and 64.16% for the three different types of Apache Spark applications benchmarked.
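The parameter-search half of such an approach can be sketched as below. The search space, parameter names' ranges, and the hand-written cost function are toy assumptions standing in for the thesis's learned performance models; only the overall shape (sample configurations, predict runtime, keep the best) is what the abstract describes.

```python
import random

# Illustrative random search over a few Spark configuration parameters,
# guided by a stand-in for a learned execution-time predictor.
SEARCH_SPACE = {
    "spark.executor.memory_gb": [2, 4, 8, 16],
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}

def predicted_runtime(cfg):
    # Toy cost model: more memory/cores help with diminishing returns;
    # straying from a sweet spot in partition count adds overhead.
    mem = cfg["spark.executor.memory_gb"]
    cores = cfg["spark.executor.cores"]
    parts = cfg["spark.sql.shuffle.partitions"]
    return 600 / (mem * cores) ** 0.5 + abs(parts - 200) * 0.1

def random_search(n_trials=100, seed=7):
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        t = predicted_runtime(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best_cfg, best_time = random_search()
print(best_cfg, f"{best_time:.1f}s (predicted)")
```

In the thesis's setting, `predicted_runtime` would be replaced by one of the four trained ML models, so candidate configurations can be ranked without paying for a full cluster run of each one.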
