  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Performance evaluation of Raspberry pi 3B as a web server : Evaluating the performance of Raspberry pi 3B as a web server using nginx and apache2

Basha, Israel Tekahun, Istifanos, Meron January 2020 (has links)
Context. During the usage of a product, evaluating its performance quality is a crucial procedure. Web servers are among the most used technological products in today's modern world [1]. In this thesis we therefore evaluate and compare the performance of two web servers. The servers tested in the experiment are a Raspberry Pi 3B and a personal HP laptop. Objectives. The main objective of the study is to evaluate the performance of a Raspberry Pi 3B as a web server. To give a clearer image of how the Raspberry Pi performs, the laptop is also evaluated and its performance is used as a point of contrast throughout the study. Realization. To fulfill this objective, an experiment was conducted with the performance-testing tool ApacheBench. To provide comprehensive performance results, the served content and the server software were varied throughout the experiment, as was the number of simulated users sending the requests. Results. The results were gathered by sending more than 1000 HTTP requests to the two servers while they served static and dynamic websites. The number of served requests per second and the CPU consumption of the servers were recorded. The Raspberry Pi reached a throughput as high as 1164 requests per second, with CPU consumption varying between ≈6% and ≈40%. In one case it even showed better processor utilization than the laptop, when serving the HTTP requests of a single user. Conclusions. Regardless of the server software used, the laptop performed slightly better overall, but the Raspberry Pi came close to the laptop's request rate when both served a static website. When both served dynamic content, by contrast, the Raspberry Pi's request rate was far lower. Of the two server software packages, nginx gave the Raspberry Pi the better CPU consumption, in contrast to the laptop, which had the better processor. That holds irrespective of the served content type.
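The benchmarking workflow this abstract describes, firing batches of concurrent HTTP requests at a server and recording requests per second, can be sketched with Python's standard library alone. This is a hedged stand-in for ApacheBench, not the thesis setup: the handler, request counts, and concurrency level below are illustrative assumptions.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A minimal static page, standing in for the nginx/apache2 content in the study.
class StaticHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging during the benchmark

def benchmark(url, total_requests, concurrency):
    """Send total_requests GETs with the given concurrency and
    return the achieved requests per second (ab's key metric)."""
    def one_request(_):
        with urllib.request.urlopen(url) as resp:
            return resp.status

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(one_request, range(total_requests)))
    elapsed = time.perf_counter() - start
    assert all(s == 200 for s in statuses)
    return total_requests / elapsed

# Serve on an ephemeral localhost port, then measure throughput.
server = ThreadingHTTPServer(("127.0.0.1", 0), StaticHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

rps = benchmark(f"http://127.0.0.1:{port}/", total_requests=200, concurrency=10)
server.shutdown()
```

Varying `concurrency` and swapping the handler between static and dynamically generated content mirrors, in miniature, the parameters the thesis altered across its experiment runs.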
52

Tolkning av handskrivna siffror i formulär : Betydelsen av datauppsättningens storlek vid maskininlärning

Kirik, Engin January 2021 (has links)
The aim of this study was to determine how much the size of the dataset affects results in object recognition. The research was carried out by training a computer-vision model to identify handwritten digits on physical forms and convert them to a digitized format. Two frameworks, TensorFlow and PyTorch, were used for this process, and training was run in two environments: one model in a CPU environment and the other in Google Cloud's GPU environment. The idea of the study is to improve on the results of previous degree projects and to extend the development by creating a model that identifies and digitizes several handwritten digits simultaneously on a complete form, so that it can later be used in applications that, for example, sum up the points on a form using a mobile camera for recognition. The project achieved error-free recognition of several digits at once when the dataset was steadily expanded, and for individual digits the model identified all digits from 0 to 9 with both TensorFlow and PyTorch.
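The dataset-size question the study investigates can be illustrated with a toy experiment. This is entirely hypothetical: a nearest-centroid classifier on synthetic 2-D clusters stands in for the TensorFlow/PyTorch digit models, purely to show how accuracy is tracked as the training set grows.

```python
import random
import statistics

random.seed(0)

def make_samples(n_per_class, classes=10):
    """Synthetic 'digit' samples: well-separated Gaussian clusters."""
    data = []
    for label in range(classes):
        cx, cy = label * 3.0, (label % 3) * 3.0
        for _ in range(n_per_class):
            data.append(((cx + random.gauss(0, 0.5),
                          cy + random.gauss(0, 0.5)), label))
    return data

def train(samples):
    """Fit one centroid per class from the training samples."""
    by_label = {}
    for (x, y), label in samples:
        by_label.setdefault(label, []).append((x, y))
    return {label: (statistics.mean(p[0] for p in pts),
                    statistics.mean(p[1] for p in pts))
            for label, pts in by_label.items()}

def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid has the right label."""
    def predict(point):
        return min(centroids, key=lambda l:
                   (point[0] - centroids[l][0]) ** 2 +
                   (point[1] - centroids[l][1]) ** 2)
    return sum(predict(p) == label for p, label in samples) / len(samples)

# Evaluate on a fixed test set while the training set size grows.
test_set = make_samples(50)
accuracies = {n: accuracy(train(make_samples(n)), test_set)
              for n in (1, 5, 50)}
```

The pattern, a fixed test set scored against models trained on progressively larger subsets, is the experimental shape the thesis describes, independent of the framework used.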
53

Acceleration of FreeRTOS with Sierra RTOS accelerator : Implementation of a FreeRTOS software layer on Sierra RTOS accelerator / Accelerering av FreeRTOS med Sierra RTOS accelerator : Implementering av ett FreeRTOS mjukvarulager på Sierra RTOS accelerator

Borgström, Fredrik January 2016 (has links)
Today, the effect of the most common ways to improve the performance of embedded systems and real-time operating systems is stagnating. It is therefore interesting to examine new ways to push the performance boundaries of embedded systems and real-time operating systems even further. It has previously been demonstrated that the hardware-based real-time operating system Sierra has better performance than the software-based real-time operating system FreeRTOS. These real-time operating systems have also been shown to be similar in many respects, which means that it is possible for Sierra to accelerate FreeRTOS. In this thesis such an acceleration has been implemented. Because existing real-time operating systems are in constant development, and because several years have passed since an earlier comparison of the two real-time operating systems, FreeRTOS and Sierra were also compared in terms of functionality and architecture in this thesis. This comparison showed that FreeRTOS and Sierra share the most fundamental functions of a real-time operating system, which can thus be accelerated by Sierra, but that FreeRTOS also has a number of exclusive functions that facilitate the use of that real-time operating system. The information obtained from this comparison formed the basis of how the acceleration was implemented. A number of performance tests showed that all of the implemented functions, with the exception of a few, had shorter execution times than the corresponding functions in the original version of FreeRTOS.
54

Network Implementation with TCP Protocol : A server on FPGA handling multiple connections / Nätverks implementering med TCP protokoll : En server på FPGA som hanterar flera anslutningar

Li, Ruobing January 2022 (has links)
The growing number of players in Massively Multiplayer Online games puts a heavy load on the network infrastructure and the general-purpose CPU of the game servers. A game server's network-stack processing needs equal treatment to its game-related processing. The network communication tasks on the CPU reach the same order of magnitude as the game-related tasks, and the computing capability of the CPU can be a factor that limits the maximum number of players; CPU offloading is therefore becoming vital. FPGAs play an essential role in dedicated computation and network communication due to their flexibility and computation-oriented efficiency, so an FPGA can be a good hardware platform on which to implement a network stack that replaces the CPU for network processing. However, most commercial and open-source network-stack IPs support only one or a few connections. This thesis project explores a network server on an FPGA, implemented in RTL, that can handle multiple connections and is specialized for the TCP protocol. The design adds a cached memory hierarchy, which filters on the port numbers of multiple connections from the same application, and an Application Layer Controller, based on an open-source Ethernet design, to further increase the number of TCP connections. A proof of concept was built and its performance tested. The TCP server on the FPGA was designed to handle a maximum of 40 configurable connections, but only 25 connections could be maintained during operation due to operational latency constraints. The FPGA server solution provides a latency of 1 ms in a LAN. Babbling-idiot and out-of-order packet-transfer tests from clients were also performed to verify robustness. During testing, poor performance in packet loss and packet error handling was noted; this issue needs to be addressed in future work, and methods for expanding the cache need further investigation to allow handling more clients.
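The fixed-capacity connection handling the abstract describes, 40 configured connections of which a bounded number can be active, can be sketched as a small admission table. This is an assumption-laden illustration of the idea, not the RTL design: the class name, state fields, and rejection policy are invented for clarity.

```python
# Hypothetical sketch of a fixed-capacity TCP connection table keyed by
# (remote IP, remote port), admitting new connections only while room remains.
class ConnectionTable:
    def __init__(self, capacity=40):
        self.capacity = capacity  # the thesis design allowed 40 configurable connections
        self.table = {}           # (ip, port) -> per-connection state

    def admit(self, ip, port):
        """Accept a new connection if the table has room; an already
        known (ip, port) pair is passed through unchanged."""
        key = (ip, port)
        if key in self.table:
            return True
        if len(self.table) >= self.capacity:
            return False  # reject: table full
        self.table[key] = {"state": "ESTABLISHED"}
        return True

    def close(self, ip, port):
        """Free a slot so a later client can be admitted."""
        self.table.pop((ip, port), None)

# Demonstrate admission and rejection with a tiny capacity.
tbl = ConnectionTable(capacity=3)
results = [tbl.admit("10.0.0.1", p) for p in (1000, 1001, 1002, 1003)]
# results -> [True, True, True, False]
```

In the hardware design the equivalent structure is a cached memory hierarchy; the point of the sketch is only the capacity-bounded admit/reject behaviour.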
55

SAP HANA: The Evolution from a Modern Main-Memory Data Platform to an Enterprise Application Platform

Sikka, Vishal, Färber, Franz, Goel, Anil, Lehner, Wolfgang 10 January 2023 (has links)
SAP HANA is a pioneering, and one of the best performing, data platforms, designed from the ground up to heavily exploit modern hardware capabilities, including SIMD and large memory and CPU footprints. As a comprehensive data management solution, SAP HANA supports the complete data life cycle, encompassing modeling, provisioning, and consumption. This extended abstract outlines the vision and planned next step of the SAP HANA evolution: growing from a core data platform into an innovative enterprise application platform that serves as the foundation for current as well as novel business applications, in both on-premise and on-demand scenarios. We argue that only a holistic system design that rigorously applies co-design at different levels can yield a highly optimized and sustainable platform for modern enterprise applications.
56

Prediction of 5G system latency contribution for 5GC network functions / Förutsägelse av 5G-systemets latensbidrag för 5GC-nätverksfunktioner

Cheng, Ziyu January 2023 (has links)
End-to-end delay measurement is crucial for network models, as it acts as a pivotal metric of a model's effectiveness, helps delineate its performance ceiling, and stimulates further refinement and enhancement. This holds for 5G Core Network (5GC) models as well. Commercial 5G networks, with their intricate topological structures and requirement for reduced latencies, call for an effective model for anticipating each server's current latency and load level, so a model for estimating the present latency and load level of each network-element server would be advantageous. The central contribution of this thesis is to record and analyze the packet data and CPU-load data of network functions running at different user counts as operational data, with the data from each successful operation of a service used as model data for analyzing the relationship between latency and CPU load. Particular emphasis is placed on the end-to-end latency of the PDU session establishment scenario for two core functions: the Access and Mobility Management Function (AMF) and the Session Management Function (SMF). Through this methodology, a more accurate model has been developed to review the latency of servers and nodes when used by up to 650,000 end users. The approach has provided new insights for network-level testing, paving the way for a comprehensive understanding of network performance under various conditions. These conditions include flow-control strategies such as TCP "slow start" and "delayed acknowledgment", as well as overload situations where the load of network functions exceeds 80%. The model also identifies the optimal performance range.
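The latency-versus-load relationship the thesis models can be illustrated with a textbook queueing curve. This is not the thesis model, only a hedged M/M/1-style sketch (the base service time and load points are invented) showing why loads above roughly 80% are singled out as an overload regime:

```python
# Illustrative M/M/1-style curve: latency grows as base/(1 - load)
# and diverges as CPU load approaches 100%.
def predicted_latency_ms(base_service_ms, cpu_load):
    """Predicted latency for a server at the given utilization."""
    if not 0.0 <= cpu_load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return base_service_ms / (1.0 - cpu_load)

# Latency at a few load points, for a hypothetical 2 ms base service time.
curve = {load: round(predicted_latency_ms(2.0, load), 2)
         for load in (0.2, 0.5, 0.8, 0.95)}
# curve -> {0.2: 2.5, 0.5: 4.0, 0.8: 10.0, 0.95: 40.0}
```

The steep knee between 80% and 95% load is the qualitative behaviour that makes per-function load estimation valuable for network-level testing.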
57

Suitability of FPGA-based computing for cyber-physical systems

Lauzon, Thomas Charles 18 August 2010 (has links)
Cyber-Physical Systems theory is a new concept that is about to revolutionize the way computers interact with the physical world, by integrating physical knowledge into computing systems and tailoring those systems so that they are more compatible with the way processes happen in the physical world. In this master's thesis, Field Programmable Gate Arrays (FPGAs) are studied as a potential technological asset that may contribute to enabling the Cyber-Physical paradigm. As an example application that may benefit from cyber-physical system support, the Electro-Slag Remelting process, a process for remelting metals into better alloys, has been chosen due to the maturity of its physical models and controller designs. In particular, the Particle Filter that estimates the state of the process is studied as a candidate for FPGA-based computing enhancements. Through the designs and experiments carried out in this study, the FPGA reveals itself as a serious contender in the arsenal of computing means for Cyber-Physical Systems, in comparison with CPUs, due to its capacity to mimic the ubiquitous parallelism of physical processes.
58

True random number generation using genetic algorithms on high performance architectures

MIJARES CHAN, JOSE JUAN 01 September 2016 (has links)
Many real-world applications use random numbers generated by pseudo-random number generators and true random number generators (TRNGs). Unlike pseudo-random number generators, which rely on an input seed to generate random numbers, a TRNG relies on a non-deterministic source to generate aperiodic random numbers. In this research, we develop a novel and generic software-based TRNG using a random source extracted from today's compute architectures. We show that non-deterministic events such as race conditions between compute threads follow a near-Gamma distribution, independent of the architecture, multi-cores, or co-processors. Our design improves the distribution towards a uniform distribution, ensuring the stationarity of the sequence of random variables. We address the statistical deficiencies of the random numbers with a post-processing stage based on a heuristic evolutionary algorithm. Our post-processing algorithm is composed of two phases: (i) Histogram Specification and (ii) Stationarity Enforcement. We propose two techniques for histogram equalization, Exact Histogram Equalization (EHE) and Adaptive EHE (AEHE), that map the random-number distribution to a user-specified distribution. EHE is an offline algorithm with O(N log N) complexity. AEHE is an online algorithm that improves performance using a sliding window and achieves O(N). Both algorithms ensure a normalized entropy in (0.95, 1.0]. The stationarity-enforcement phase uses genetic algorithms to mitigate the statistical deficiencies in the output of histogram equalization by permuting the random numbers until wide-sense stationarity is achieved. By measuring the standard deviation of the power spectral density, we ensure that the quality of the numbers generated by the genetic algorithms is within the level of error specified by the user. We develop two algorithms: a naive algorithm with an expected exponential complexity of E[O(e^N)], and an accelerated FFT-based algorithm with an expected quadratic complexity of E[O(N^2)]. The accelerated FFT-based algorithm exploits the parallelism found in genetic algorithms on a homogeneous multi-core cluster. We evaluate the effects of its scalability and data size on a standardized battery of tests, TestU01, finding the tuning parameters that ensure wide-sense stationarity on long runs.
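The Exact Histogram Equalization idea above can be sketched in a few lines: sort the samples (the source of the O(N log N) cost) and reassign values so the output matches a user-specified target histogram exactly. This is an assumed minimal reading of the technique, not the thesis code:

```python
# Minimal sketch of exact histogram specification: the i-th smallest
# input sample is mapped to the i-th slot of the target histogram,
# so the output counts match target_counts exactly.
def exact_histogram_specification(samples, target_counts):
    """target_counts[v] gives how many output samples should equal v."""
    assert sum(target_counts.values()) == len(samples)
    order = sorted(range(len(samples)), key=lambda i: samples[i])  # O(N log N)
    out = [None] * len(samples)
    it = iter(order)
    for value in sorted(target_counts):       # walk target slots in order
        for _ in range(target_counts[value]):
            out[next(it)] = value             # assign to next-smallest sample
    return out

raw = [9.2, 0.1, 5.5, 3.3, 7.7, 1.2]          # skewed "raw" random draws
flat = exact_histogram_specification(raw, {0: 2, 1: 2, 2: 2})
# flat -> [2, 0, 1, 1, 2, 0]: each target value appears exactly twice
```

Because the mapping is rank-preserving, the transform fixes the marginal distribution but leaves ordering artifacts untouched, which is why the thesis needs the separate stationarity-enforcement phase.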
59

CPU Performance Evaluation for 2D Voronoi Tessellation

Olsson, Victor, Eklund, Viktor January 2019 (has links)
Voronoi tessellation is used within a number of different fields, including healthcare, construction, and urban planning. Since Voronoi tessellations are used in multiple fields, it is worthwhile to know the strengths and weaknesses of the algorithms used to generate them in terms of efficiency. The objective of this thesis is to compare two CPU implementations of Voronoi tessellation, the Bowyer-Watson algorithm and Fortune's algorithm, with regard to execution time, and to see which of the two is more efficient. The Fortune's-algorithm implementation used in the research is based on a pre-existing implementation, while the Bowyer-Watson implementation was written specifically for this research. Their differences in efficiency were determined by measuring and comparing execution times in an iterative manner, where the amount of data to be computed was increased with each iteration. The results show that Fortune's algorithm is more efficient on the CPU when no acceleration techniques are used for either algorithm: the Bowyer-Watson method took 70 milliseconds to process 3000 input points, while Fortune's method took 12 milliseconds under the same conditions. In conclusion, Fortune's algorithm was more efficient because the Bowyer-Watson algorithm performs unnecessary calculations, namely checking all the triangles for every new point added. A suggestion for improving the speed of this algorithm would be to use a nearest-neighbour search technique when searching through the triangles.
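The iterative timing methodology described here, running each algorithm on steadily larger point sets and recording wall-clock time, can be sketched as a small harness. The algorithm bodies below are placeholders standing in for the real Voronoi implementations; only the measurement loop reflects the described method.

```python
import random
import time

def bowyer_watson_stub(points):
    # Placeholder work loop; the real algorithm checks existing triangles
    # for every inserted point, giving it superlinear growth.
    return sum(1 for _ in range(len(points) * len(points) // 100))

def fortunes_stub(points):
    # Placeholder work loop; the real sweep-line algorithm is O(N log N).
    return sum(1 for _ in range(len(points) * 10))

def benchmark(algorithms, sizes, seed=1):
    """Time each algorithm on random point sets of each size."""
    random.seed(seed)
    results = {}
    for n in sizes:
        pts = [(random.random(), random.random()) for _ in range(n)]
        for name, algo in algorithms.items():
            start = time.perf_counter()
            algo(pts)
            results[(name, n)] = time.perf_counter() - start
    return results

timings = benchmark({"bowyer_watson": bowyer_watson_stub,
                     "fortunes": fortunes_stub},
                    sizes=(500, 1000, 3000))
```

Feeding both functions the identical point set per size, as done here, is what makes the 70 ms versus 12 ms comparison at 3000 points meaningful in the thesis.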
60

Plateforme de calcul parallèle « Design for Demise » / Parallel computing platform « Design for Demise »

Plazolles, Bastien 10 January 2017 (has links)
Les risques liés aux débris spatiaux sont à présent considérés comme critiques par les gouvernements et les agences spa-tiales internationales. Durant la dernière décennie les agences spatiales ont développé des logiciels pour simuler la rentrée atmosphérique des satellites et des stations orbitales afin de déterminer les risques et possibles dommages au sol. Néan-moins les outils actuels fournissent des résultats déterministes alors que les modèles employés utilisent des valeurs de paramètres qui sont mal connues. De plus les résultats obtenus dépendent fortement des hypothèses qui sont faites. Une solution pour obtenir des résultats pertinents et exploitables est de prendre en considération les incertitudes que l’on a sur les différents paramètres de la modélisation afin d’effectuer des analyses de type Monte-Carlo. Mais une telle étude est particulièrement gourmande en temps de calcul à cause du grand espace des paramètres à explorer (ce qui nécessite des centaines de milliers de simulations numériques). Dans le cadre de ces travaux de thèse nous proposons un nouveau logiciel de simulation numérique de rentrée atmosphérique de satellite, permettant de façon native de prendre en consi-dération les incertitudes sur les différents paramètres de modélisations pour effectuer des analyses statistiques. Afin de maitriser les temps de calculs cet outil tire avantage de la méthode de Taguchi pour réduire le nombre de paramètres à étudier et aussi des accélérateurs de calculs de type Graphics Processing Units (GPUs) et Intel Xeon Phi. / The risk of space debris is now perceived as primordial by government and international space agencies. Since the last decade, international space agencies have developed tools to simulate the re-entry of satellites and orbital stations in order to assess casualty risk on the ground. Nevertheless , all current tools provide deterministic solutions, though models include various parameters that are not well known. 
Therefore, the provided results are strongly dependent on the as-sumptions made. One solution to obtain relevant and exploitable results is to include uncertainties around those parame-ters in order to perform Monte-Carlo analysis. But such a study is very time consuming due to the large parameter space to explore (that necessitate hundreds of thousands simulations). As part of this thesis work we propose a new satellite atmospheric reentry simulation to perform statistical analysis. To master computing time this tool takes advantage of Taguchi method to restrain the amount of parameter to study and also takes advantage of computing accelerators like Graphic Processing Units (GPUs) and Intel Xeon Phi.
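The Monte-Carlo analysis described above can be illustrated on a toy quantity. This is a hedged sketch, not the thesis software: the ballistic-coefficient formula, parameter values, and uncertainty ranges are invented for illustration of how parameter uncertainty propagates into a spread of outcomes.

```python
import random
import statistics

random.seed(42)

def ballistic_coefficient(mass_kg, drag_coeff, area_m2):
    """Toy re-entry quantity: mass over drag area, in kg/m^2."""
    return mass_kg / (drag_coeff * area_m2)

def monte_carlo(n_samples):
    """Sample uncertain inputs and return the mean and spread
    of the resulting ballistic coefficient."""
    samples = []
    for _ in range(n_samples):
        mass = random.gauss(250.0, 10.0)   # satellite mass, uncertain
        cd = random.uniform(1.8, 2.4)      # drag coefficient, poorly known
        area = random.gauss(1.5, 0.1)      # reference area, uncertain
        samples.append(ballistic_coefficient(mass, cd, area))
    return statistics.mean(samples), statistics.stdev(samples)

mean_bc, std_bc = monte_carlo(10_000)
```

Each sample here is trivial, but in the thesis setting one sample is a full re-entry simulation, which is exactly why the parameter space must be pruned (Taguchi) and the per-sample cost pushed onto GPUs or Xeon Phi.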
