31

Optimizacinių ekonominės konkurencijos modelių tyrimas ir jų realizavimas interneto aplinkoje / Research of economical competition models and internet based realization of them

Skruodys, Nerijus 31 May 2004 (has links)
Future enterprises will be characterized by a focus on total quality, globalization, an object-oriented approach, and a business process-oriented approach. Globalization will lead to the 'virtual enterprise', which can obtain a competitive position by defining and reengineering its business processes. Such reengineering, however, requires an enterprise model. Moreover, as manufacturing grows more difficult and competition intensifies, practical realizations of theoretical models become more useful. Traditional market theory is based on the Walras model, and many mathematical results explain particular processes of market competition. A software realization would be a useful tool for simulating real economic processes, and an internet-based realization would make it possible to demonstrate the algorithm in distance studies. The quality of service is defined as the average time customers lose while waiting for service; formal expressions of other quality measures are difficult. A customer prefers the server with the lower total service cost, where the total cost is the service price plus waiting losses. A customer leaves if the total cost exceeds a certain critical level. The flow of customers is stochastic, and service times are stochastic as well. No analytical solution is known for this model, so the results are obtained by Monte-Carlo simulation. The first market model considers a simple case in which the server rate depends on a single resource that... [to full text]
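The cost rule at the heart of this model lends itself to a short Monte-Carlo sketch. The sketch below is not the thesis's implementation: the prices, service rates, waiting-cost weight, and balking level are illustrative assumptions. Each arriving customer estimates price plus weighted waiting loss for every server, picks the cheapest, and leaves if even that exceeds the critical level.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define SERVERS 2
#define CUSTOMERS 100000

/* Inverse-transform sample of an exponential variate with the given rate. */
static double exp_rand(double rate) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return -log(u) / rate;
}

int main(void) {
    double price[SERVERS] = {1.0, 1.5};      /* assumed service prices     */
    double rate[SERVERS]  = {1.0, 2.0};      /* assumed service rates      */
    double busy_until[SERVERS] = {0.0, 0.0};
    double w = 0.5, critical = 5.0, t = 0.0; /* waiting weight, balk level */
    long served = 0, balked = 0;
    srand(42);
    for (long i = 0; i < CUSTOMERS; i++) {
        t += exp_rand(1.0);                          /* stochastic arrivals */
        int best = -1;
        double best_cost = critical;                 /* balk above this     */
        for (int s = 0; s < SERVERS; s++) {
            double wait = busy_until[s] > t ? busy_until[s] - t : 0.0;
            double cost = price[s] + w * wait;       /* price + waiting loss */
            if (cost < best_cost) { best_cost = cost; best = s; }
        }
        if (best < 0) { balked++; continue; }        /* total cost too high */
        double start = busy_until[best] > t ? busy_until[best] : t;
        busy_until[best] = start + exp_rand(rate[best]); /* stochastic service */
        served++;
    }
    printf("served %ld, balked %ld\n", served, balked);
    return 0;
}
```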
32

A Cross-Layer Perspective on Transport Protocol Performance in Wireless Networks

Alfredsson, Stefan January 2012 (has links)
Communication by wireless technologies has seen tremendous growth in recent decades. Mobile phone technology and wireless broadband solutions are rapidly replacing last-hop wireline connectivity for telephones and Internet access. Research has shown, however, that Internet traffic can experience performance degradation over wireless networks compared to wired ones. The inherent properties of radio communication lead to a higher degree of unreliability than communication by wire or fiber. This can result in increased transmission errors, packet loss, delay, and delay variation, which in turn affect the performance of the main Internet transport protocols, TCP and UDP. This dissertation examines the cross-layer relationship between wireless transmission and the resulting performance at the transport layer. To this end, experimental evaluations of TCP and UDP over a proposed wireless 4G downlink system are performed. The results show, in a holistic scenario, that link-level adaptive modulation, channel prediction, fast persistent link retransmissions, and channel scheduling enable TCP and UDP to perform well and utilize the wireless link efficiently. Further, a novel approach is proposed in which a modified TCP receiver can choose to accept packets corrupted by bit errors. Results from network emulation experiments indicate that by accepting and acknowledging even small amounts of corrupted data, a much higher throughput can be maintained compared to standard TCP.
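The corruption-tolerant receiver idea can be sketched as a single accept decision. This is a minimal illustration, not the dissertation's protocol: the packet layout, field names, and toy checksum are assumptions. The key point is that a packet with an intact header can still be acknowledged when only its payload checksum fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rx_packet {
    uint32_t seq;            /* sequence number (illustrative layout) */
    uint16_t header_csum;    /* covers the fields before this one     */
    uint16_t payload_csum;   /* covers the payload bytes              */
    const uint8_t *payload;
    size_t payload_len;
};

/* Toy byte-sum checksum, standing in for the real one. */
static uint16_t csum16(const uint8_t *p, size_t n) {
    uint32_t sum = 0;
    while (n--) sum += *p++;
    return (uint16_t)(sum + (sum >> 16));
}

/* Returns true if the packet should be ACKed and delivered up the stack. */
bool accept_packet(const struct rx_packet *pkt, bool tolerate_corruption) {
    if (csum16((const uint8_t *)pkt, offsetof(struct rx_packet, header_csum))
            != pkt->header_csum)
        return false;                /* header damaged: must drop      */
    if (csum16(pkt->payload, pkt->payload_len) == pkt->payload_csum)
        return true;                 /* intact packet: normal path     */
    return tolerate_corruption;      /* corrupt payload: the modified  */
                                     /* receiver accepts it anyway     */
}
```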
33

Source code optimizations to reduce multi core and many core performance bottlenecks / Otimizações de código fonte para reduzir gargalos de desempenho em multi core e many core

Serpa, Matheus da Silva January 2018 (has links)
Nowadays, several different architectures are available not only to industry but also to final consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, and even energy-efficiency-driven processors such as the ARM family present very different architectural characteristics. This wide range of characteristics is a challenge for application developers, who must deal with different instruction sets, memory hierarchies, and even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work offers a wide variety of solutions: most of it focuses only on improving memory performance, while other work addresses load balancing, vectorization, and thread and data mapping, but performs them separately, losing optimization opportunities. In this master's thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. Our experiments show that loop interchange is a useful technique for improving the performance of different cache levels, improving performance by up to 5.3× and 3.9× on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4× and 6.5×. Load balancing improved performance by up to 1.1× on Knights Landing. Thread and data mapping techniques were also evaluated, with a performance improvement of up to 1.6× and 4.4×. Comparing the best version on each architecture, we improved the performance of Broadwell by 22.7× and of Knights Landing by 56.7× over a naive version; in the end, however, Broadwell was 1.2× faster than Knights Landing.
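Loop interchange, the first technique evaluated above, is easy to illustrate (the array size below is arbitrary): traversing a row-major C array column-first touches a new cache line on almost every access, and swapping the loop order restores unit-stride traversal.

```c
/* Row-major C array: a[i][j] and a[i][j+1] are adjacent in memory. */
#define N 1024
static double a[N][N], b[N][N];

void scale_bad(void) {                 /* column-first: stride-N accesses, */
    for (int j = 0; j < N; j++)        /* roughly one cache miss per load  */
        for (int i = 0; i < N; i++)
            b[i][j] = 2.0 * a[i][j];
}

void scale_good(void) {                /* interchanged: unit-stride inner  */
    for (int i = 0; i < N; i++)        /* loop reuses each cache line      */
        for (int j = 0; j < N; j++)
            b[i][j] = 2.0 * a[i][j];
}
```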
34

Low-cost memory analyses for efficient compilers / Analyses de mémoire à bas cout pour des compilateurs efficaces

Maalej Kammoun, Maroua 26 September 2017 (has links)
This thesis was motivated by the emergence of massively parallel processing and supercomputing, which make computer programming extremely performance-oriented. Speedup, power consumption, and the efficiency of both software and hardware are nowadays the main concerns of the information systems community. Handling memory in a correct and efficient way is a step toward less complex and better-performing programs and architectures. This thesis falls into this context and contributes to the fields of memory analysis and compilation in both theoretical and experimental aspects. Besides a deep study of the current state of the art of memory analyses and their limitations, our theoretical results consist in designing new algorithms to recover part of the imprecision that published techniques still show. Among the present limitations, we focus our research on pointer arithmetic, in order to disambiguate pointers within the same data structure. We develop our analyses in the abstract interpretation framework. The key idea behind this choice is correctness and scalability: two requisite criteria for analyses to be embedded in compiler construction. The first alias analysis we design is based on the range lattice of integer variables: given a pair of pointers defined from a common base pointer, they are disjoint if their offsets cannot have values that intersect at runtime. The second pointer analysis we develop is inspired by the Pentagon abstract domain. We conclude that two pointers do not alias whenever we are able to build a strict relation between them, valid at program points where the two variables are simultaneously alive. In a third algorithm, we combine the first and second analyses and enhance them with a coarse-grained but efficient analysis to deal with unrelated pointers. We implement these analyses on top of the LLVM compiler. We experiment with and evaluate their performance based on two metrics: the number of disambiguated pairs of pointers compared to common analyses of the compiler, and the optimizations further enabled thanks to the extra precision they introduce.
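The first analysis rests on an observation small enough to state as code. The `range` type below is an illustrative stand-in for the range lattice of integer variables: two offsets from a common base pointer cannot alias when their value intervals are disjoint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abstract interval of values an integer offset may take at runtime,
 * standing in for the analysis's range lattice. */
struct range { int64_t lo, hi; };

/* For p = base + i and q = base + j: the two pointers cannot alias
 * if the possible values of i and j never intersect. */
bool no_alias_same_base(struct range i, struct range j) {
    return i.hi < j.lo || j.hi < i.lo;
}
```

For example, if the analysis infers i ∈ [0, 7] and j ∈ [8, 15], then base + i and base + j are proven disjoint at every program point where both intervals hold.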
35

Improving Storage Performance Through Layout Optimizations

Bhadkamkar, Medha 28 July 2009 (has links)
Disk drives are the bottleneck in the processing of the large amounts of data used in almost all common applications. File systems attempt to mitigate this by storing data sequentially on the disk, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch degrades storage I/O performance. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk in the way it is most often accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short durations, and we propose, implement, and evaluate layout strategies for each. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our static layout strategies using tree-structured XML data, where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file-system layout policy performs better by 5-54X; for non-deep-focused queries, our native layout mechanism shows an improvement of 3-127X. To improve the performance of dynamic access patterns, we implement a self-optimizing storage system that rearranges popular blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.
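The dynamic-layout policy can be sketched as a simple periodic remapping pass; the fixed table sizes below are illustrative assumptions, and the actual data copy is elided. Per-block access counts are gathered online, and the hottest blocks are assigned contiguous slots on the dedicated partition so that popular requests become near-sequential reads.

```c
#include <stdint.h>
#include <string.h>

#define BLOCKS    4096   /* blocks on the main partition (assumed size) */
#define HOT_SLOTS  256   /* capacity of the reorganized partition       */

static uint32_t access_count[BLOCKS];
static int32_t  remap[BLOCKS];       /* block -> hot slot, or -1        */

void record_access(uint32_t block) { access_count[block]++; }

/* Periodically assign the most-accessed blocks contiguous slots so
 * that requests for popular data turn into near-sequential reads. */
void reorganize(void) {
    memset(remap, -1, sizeof remap);
    for (int slot = 0; slot < HOT_SLOTS; slot++) {
        int best = -1;
        for (int b = 0; b < BLOCKS; b++)
            if (remap[b] < 0 && access_count[b] > 0 &&
                (best < 0 || access_count[b] > access_count[best]))
                best = b;
        if (best < 0) break;         /* no hot blocks left               */
        remap[best] = slot;          /* data copy to the partition elided */
    }
}
```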
36

Unstructured Computations on Emerging Architectures

Al Farhan, Mohammed 05 May 2019 (has links)
This dissertation describes detailed performance engineering and optimization of an unstructured computational aerodynamics software system with irregular memory accesses on various multi- and many-core emerging high-performance computing architectures, which are expected to be the building blocks of energy-austere exascale systems and on which algorithmic and architecture-oriented optimizations are essential for achieving worthy performance. We investigate several state-of-the-practice shared-memory optimization techniques applied to key kernels for the important problem class of unstructured meshes. We illustrate these techniques on a broad spectrum of emerging microprocessor architectures, representative of the compute units in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating-point numerics of the original code. While the linear algebraic kernels are bottlenecked by memory bandwidth for even modest numbers of hardware cores sharing a common address space, the edge-based loop kernels, which arise in the control-volume discretization of the conservation-law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation-law residuals, are compute-intensive and effectively exploit contemporary multi- and many-core processing hardware. We therefore employ low- and high-level algorithmic and architecture-specific code optimizations and tuning in light of thread- and data-level parallelism, with a focus on strong thread scaling at the node level. Our approaches are based upon novel multi-level hierarchical workload-distribution mechanisms for data across different compute units (from the address space down to the registers) within every hardware core. We analyze the demonstrated aerodynamics application on specific computing architectures to develop performance metrics and models that bespeak the upper and lower bounds of performance. We present significant full-application speedups relative to the baseline code on a succession of many-core processor architectures, i.e., Intel Xeon Phi Knights Corner (5.0x) and Knights Landing (2.9x). In addition, Knights Landing outperforms Intel Xeon Skylake with nearly twofold speedup at significantly lower power consumption. These optimizations are expected to be of value for many other unstructured-mesh partial-differential-equation-based scientific applications as multi- and many-core architectures evolve.
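The edge-based kernels mentioned above have a characteristic read-two, write-two shape, sketched below with an illustrative flux expression (not the application's actual discretization). The atomics are the naive way to make the scatter safe; the hierarchical workload-distribution mechanisms described in the dissertation aim precisely at replacing such serialization with conflict-free partitions of edges and data.

```c
/* Build with -fopenmp. Each edge reads its two endpoint states and
 * scatters a residual contribution back to both endpoints; the
 * scatter, not the arithmetic, is the tuning challenge. */
typedef struct { int n0, n1; } edge_t;

void edge_residual(int nedges, const edge_t *edges,
                   const double *state, double *residual) {
    #pragma omp parallel for schedule(static)
    for (int e = 0; e < nedges; e++) {
        double flux = 0.5 * (state[edges[e].n0] - state[edges[e].n1]);
        #pragma omp atomic
        residual[edges[e].n0] -= flux;   /* conflicting writes made   */
        #pragma omp atomic
        residual[edges[e].n1] += flux;   /* safe, slowly, by atomics  */
    }
}
```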
37

Onboard Video Stabilization for Unmanned Air Vehicles

Cross, Nicholas Stewart 01 June 2011 (has links)
Unmanned Air Vehicles (UAVs) enable the observation of hazardous areas without endangering a pilot. Observational capabilities are provided by on-board video cameras, and images are relayed to remote operators for analysis. However, vibration and wind cause video camera mounts to move and can introduce unintended motion that makes video analysis more difficult. Video stabilization is a process that attempts to remove unwanted movement from a video input to provide a clearer picture. This thesis presents an onboard video stabilization solution that removes high-frequency jitter, displays output at 20 frames per second (FPS), and runs on a Blackfin embedded processor. Any video stabilization algorithm must contend with the limited space, weight, and power available for embedded-systems hardware on a UAV. This thesis demonstrates how architecture-specific optimizations improve algorithm performance on embedded systems and allow an algorithm that was designed with more powerful computing systems in mind to perform on a system that is limited in both size and resources. These optimizations reduce the total clock cycles per frame by 157 million, to 30 million, which yields a frame-rate increase from 3.2 to 20 FPS.
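As a consistency check on these figures, the cycle counts imply the frame rates directly once a clock rate is fixed; the 600 MHz Blackfin clock assumed below is not stated in the abstract, but it reproduces both numbers.

```c
#include <stdio.h>

int main(void) {
    double clock_hz = 600e6;   /* assumed Blackfin clock rate           */
    double before   = 187e6;   /* cycles/frame before: 30M + 157M saved */
    double after    = 30e6;    /* cycles/frame after optimization       */
    printf("before: %.1f FPS\n", clock_hz / before);   /* ~3.2 FPS */
    printf("after:  %.1f FPS\n", clock_hz / after);    /* 20.0 FPS */
    return 0;
}
```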
38

Generované peephole optimalizace v překladači LLVM / Generated Peephole Optimizations in LLVM Compiler

Melo, Stanislav January 2016 (has links)
One of the important features of application-specific processors is performance. To maximize it, the compiler must adapt to the needs of the processor it compiles for and generate the most efficient code possible. One way to do this is to search for groups of instructions that can be implemented as a single instruction with multiple outputs. The generated code can then be passed through peephole optimizations, which search for instruction patterns and replace them with other instructions to make the code more efficient. This paper describes the problem of finding and selecting suitable candidates for multiple-output instructions. It also provides a brief overview of the best-known algorithms that solve this problem. Finally, it examines the possibilities of incorporating these optimizations into the LLVM compiler.
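A peephole pass of the kind described can be sketched over a toy three-address IR; the instruction set here is an illustrative assumption, not LLVM's. The pass scans a two-instruction window, matches a multiply feeding an add, and fuses them into a single multiply-add — one simple instance of replacing a pattern with a richer instruction.

```c
typedef enum { OP_NOP, OP_MUL, OP_ADD, OP_MADD } opcode_t;
typedef struct { opcode_t op; int dst, src0, src1, src2; } insn_t;

/* Rewrite  t = a * b; d = t + c  into a single fused  d = a*b + c.
 * A real pass would also verify that t has no other users. */
int peephole_madd(insn_t *code, int n) {
    int matched = 0;
    for (int i = 0; i + 1 < n; i++) {
        if (code[i].op == OP_MUL && code[i + 1].op == OP_ADD &&
            code[i + 1].src0 == code[i].dst) {
            insn_t fused = { OP_MADD, code[i + 1].dst,
                             code[i].src0, code[i].src1, code[i + 1].src1 };
            code[i + 1] = fused;
            code[i].op = OP_NOP;     /* dead mul; removed by a later pass */
            matched++;
        }
    }
    return matched;
}
```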
39

Predictive Modeling of Novel Mutations to DNA-Editing Metalloenzymes and Development of Improved QM/MM Methods

Hix, Mark Alan 12 1900 (has links)
Molecular dynamics simulations and QM/MM calculations can provide insights into the structure and function of enzymes as well as changes due to mutations of the protein sequence.
40

Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs

Nisa, Israt 02 October 2019 (has links)
No description available.
