391 |
Software Transactional Memory Building Blocks. Riegel, Torvald. 15 August 2013 (has links) (PDF)
Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to declare only which pieces of a program synchronize, without requiring them to actually implement synchronization and tune its performance, which in turn typically makes TM easier to use than other abstractions such as locks.
I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. These building blocks comprise several novel algorithms and optimizations for TM implementations, both for current hardware and for future hardware extensions for TM, and are being used in, or have influenced, commercial TM implementations such as the TM support in GCC.
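The thesis targets C/C++ and GCC internals, which the abstract cannot reproduce; as a language-neutral illustration of one classic TM building block (a sketch, not code from the thesis, with all names hypothetical), a toy version-validating transaction with per-transaction read/write sets looks like this:

```python
import threading

class ToySTM:
    """Toy software transactional memory: a global commit lock, versioned
    values, and per-transaction read/write sets validated at commit time."""

    def __init__(self):
        self.commit_lock = threading.Lock()
        self.clock = 0
        self.store = {}  # name -> (value, version)

    def atomic(self, txn):
        while True:  # retry until the transaction commits cleanly
            reads, writes = {}, {}

            def read(name):
                if name in writes:          # read-your-own-writes
                    return writes[name]
                value, version = self.store.get(name, (0, 0))
                reads[name] = version       # remember what we depended on
                return value

            def write(name, value):
                writes[name] = value

            result = txn(read, write)
            with self.commit_lock:          # validate and publish atomically
                if all(self.store.get(n, (0, 0))[1] == v
                       for n, v in reads.items()):
                    self.clock += 1
                    for n, v in writes.items():
                        self.store[n] = (v, self.clock)
                    return result
            # a concurrent commit changed something we read: retry

stm = ToySTM()
stm.atomic(lambda read, write: write("x", 1))
stm.atomic(lambda read, write: write("y", read("x") + 1))
print(stm.store["y"][0])  # 2
```

A production TM such as GCC's adds contention management, privatization safety, and hardware-TM fast paths, none of which appear in this sketch.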
|
392 |
Improving Cache Behavior in CMP Architectures through Cache Partitioning Techniques. Moretó Planas, Miquel. 19 March 2010 (has links)
The evolution of microprocessor design in the last few decades has changed significantly, moving from simple in-order single-core architectures to superscalar and vector architectures in order to extract the maximum available instruction-level parallelism. Executing several instructions from the same thread in parallel significantly improves the performance of an application. However, there is only a limited amount of parallelism available in each thread because of data and control dependences. Furthermore, designing a high-performance, single, monolithic processor has become very complex due to power and chip latency constraints. These limitations have motivated the use of thread-level parallelism (TLP) as a common strategy for improving processor performance. Multithreaded processors allow executing different threads at the same time, sharing some hardware resources. There are several flavors of multithreaded processors that exploit TLP, such as chip multiprocessors (CMP), coarse-grain multithreading, fine-grain multithreading, simultaneous multithreading (SMT), and combinations of them.

To improve cost and power efficiency, the computer industry has adopted multicore chips. In particular, CMP architectures have become the most common design decision (sometimes combined with multithreaded cores). Firstly, CMPs reduce design costs and average power consumption by promoting design re-use and simpler processor cores. For example, it is less complex to design a chip with many small, simple cores than a chip with fewer, larger, monolithic cores. Furthermore, simpler cores have fewer power-hungry centralized hardware structures. Secondly, CMPs reduce costs by improving hardware resource utilization. On a multicore chip, co-scheduled threads can share costly microarchitecture resources that would otherwise be underutilized.
Higher resource utilization improves aggregate performance and enables lower-cost design alternatives. One of the resources with the greatest impact on the final performance of an application is the cache hierarchy. Caches store data recently used by applications in order to take advantage of their temporal and spatial locality. Caches provide fast access to data, improving the performance of applications. Caches with low latencies have to be small, which prompts the design of a cache hierarchy organized into several levels of cache.

In CMPs, the cache hierarchy is normally organized in a first level (L1) of instruction and data caches private to each core. A last level of cache (LLC) is normally shared among the different cores in the processor (L2, L3 or both). Shared caches increase resource utilization and system performance. Large caches improve performance and efficiency by increasing the probability that each application can access data from a closer level of the cache hierarchy. They also allow an application to make use of the entire cache if needed.

A second advantage of having a shared cache in a CMP design has to do with cache coherence. In parallel applications, different threads share the same data and keep a local copy of this data in their caches. With multiple processors, it is possible for one processor to change the data, leaving another processor's cache with outdated data. The cache coherence protocol monitors changes to data and ensures that all processor caches have the most recent data. When the parallel application executes on the same physical chip, the cache coherence circuitry can operate at the speed of on-chip communication, rather than having to use the much slower chip-to-chip communication that is required with discrete processors on separate chips.
These coherence protocols are simpler to design with a unified and shared level of cache on chip. Due to the advantages that multicore architectures offer, chip vendors use CMP architectures in current high-performance, network, real-time and embedded systems. Several of these commercial processors have a level of the cache hierarchy shared by different cores. For example, the Sun UltraSPARC T2 has a 16-way 4MB L2 cache shared by 8 cores, each one up to 8-way SMT. Other processors like the Intel Core 2 family also share up to a 12MB 24-way L2 cache. In contrast, the AMD K10 family has a private L2 cache per core and a shared L3 cache, with up to a 6MB 64-way L3 cache.

As the long-term trend of increasing integration continues, the number of cores per chip is also projected to increase with each successive technology generation. Several significant studies have shown that processors with hundreds of cores per chip will appear in the market in the coming years. The manycore era has already begun. Although this era provides many opportunities, it also presents many challenges. In particular, higher hardware resource sharing among concurrently executing threads can cause individual threads' performance to become unpredictable and might lead to violations of individual applications' performance requirements. Current resource management mechanisms and policies are no longer adequate for future multicore systems.

Some applications, such as multimedia, communications or streaming applications, present low re-use of their data and pollute caches with data streams, or have many compulsory misses that cannot be solved by assigning more cache space to the application.
Traditional eviction policies such as Least Recently Used (LRU), pseudo-LRU or random are demand-driven, that is, they tend to give more space to the application that has more accesses to the cache hierarchy. When no direct control over shared resources is exercised (the last-level cache in this case), it is possible that a particular thread allocates most of the shared resources, degrading other threads' performance. As a consequence, high resource sharing and resource utilization can cause systems to become unstable and violate individual applications' requirements. If we want to provide Quality of Service (QoS) to applications, we need to enhance the control over shared resources and enrich the collaboration between the OS and the architecture.

In this thesis, we propose software and hardware mechanisms to improve cache sharing in CMP architectures. We take a holistic approach, coordinating the targets of software and hardware to improve system aggregate performance and provide QoS to applications. We make use of explicit resource allocation techniques to control the shared cache in a CMP architecture, with resource allocation targets driven by hardware and software mechanisms.

The main contributions of this thesis are the following:

- We have characterized different single- and multithreaded applications and classified workloads with a systematic method to better understand and explain the cache sharing effects on a CMP architecture. We have made a special effort in studying previous cache partitioning techniques for CMP architectures, in order to acquire the insight to propose improved mechanisms.

- In CMP architectures with out-of-order processors, cache misses can be served in parallel and share the miss penalty to access main memory. We take this fact into account to propose new cache partitioning algorithms guided by the memory-level parallelism (MLP) of each application. With these algorithms, system performance is improved (in terms of throughput and fairness) without significantly increasing the hardware required by previous proposals.

- Driving cache partition decisions with indirect indicators of performance such as misses, MLP or data re-use may lead to suboptimal cache partitions. Ideally, the appropriate metric to drive cache partitions should be the target metric to optimize, which is normally related to IPC. Thus, we have developed a hardware mechanism, OPACU, which is able to obtain at run-time accurate predictions of the performance of an application when running with different cache assignments.

- Using performance predictions, we have introduced a new framework to manage shared caches in CMP architectures, FlexDCP, which allows the OS to optimize different IPC-related target metrics like throughput or fairness and provide QoS to applications. FlexDCP allows an enhanced coordination between the hardware and the software layers, which leads to improved system performance and flexibility.

- Next, we have made use of performance estimations to reduce the load imbalance problem in parallel applications. We have built a run-time mechanism that detects parallel applications sensitive to cache allocation and, in these situations, reduces load imbalance by assigning more cache space to the slowest threads. This mechanism helps reduce the long optimization time, in terms of man-years of effort, devoted to large-scale parallel applications.

- Finally, we have stated the main characteristics that future multicore processors with thousands of cores should have. An enhanced coordination between the software and hardware layers has been proposed to better manage the shared resources in these architectures.
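As a rough illustration of the family of policies the thesis improves on (a sketch under assumed miss curves, not the thesis's MLP-guided or OPACU/FlexDCP algorithms), a greedy utility-based way-partitioning loop repeatedly gives the next cache way to whichever application gains the most from it:

```python
def partition(miss_curves, ways):
    """Greedy utility-based way partitioning: assign each cache way to the
    application whose miss count drops the most (optimal for convex curves).
    miss_curves[i][w] = misses of application i when given w ways."""
    n = len(miss_curves)
    alloc = [0] * n
    for _ in range(ways):
        gains = [miss_curves[i][alloc[i]] - miss_curves[i][alloc[i] + 1]
                 for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])
        alloc[best] += 1
    return alloc

# Hypothetical miss curves: app 0 reuses its data heavily, app 1 streams
# (extra ways barely help it), mirroring the pollution problem above.
reuse  = [100, 40, 20, 12, 8, 6, 5, 5, 5]
stream = [100, 95, 92, 90, 89, 88, 88, 88, 88]
print(partition([reuse, stream], 8))  # [5, 3]
```

A demand-driven policy like LRU would instead let the streaming application claim space proportional to its access rate, which is exactly the failure mode the abstract describes.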
|
393 |
Efficient OpenMP over Sequentially Consistent Distributed Shared Memory Systems. Costa Prats, Juan José. 20 July 2011 (has links)
Nowadays, clusters are one of the most used platforms in High Performance Computing, and most programmers use the Message Passing Interface (MPI) library to program their applications on these distributed platforms, obtaining maximum performance, although it is a complex task. On the other hand, OpenMP has been established as the de facto standard for programming applications on shared memory platforms, because it is easy to use and obtains good performance without too much effort.
So, could it be possible to join both worlds? Could programmers use the ease of OpenMP on distributed platforms? Many researchers think so. One of the ideas developed is distributed shared memory (DSM), a software layer on top of a distributed platform that gives an abstract shared memory view to the applications. Even though it seems a good solution, it also has some inconveniences. The memory coherence between the nodes in the platform is difficult to maintain (complex management, scalability issues, high overhead, among others), and the latency of remote memory accesses can be orders of magnitude greater than on a shared bus due to the interconnection network.
Therefore, this research improves the performance of OpenMP applications executed on distributed memory platforms using a DSM with sequential consistency, thoroughly evaluating the results on the NAS Parallel Benchmarks.
The vast majority of DSM designs use a relaxed consistency model because it avoids some major problems in the area. In contrast, we use a sequential consistency model because we think that exposing these potential problems, which are otherwise hidden, may allow solutions to be found and then applied to both models.
The main idea behind this work is that both runtimes, OpenMP and the DSM layer, should cooperate to achieve good performance; otherwise they interfere with each other, degrading the final performance of applications.
We develop three different contributions to improve the performance of these applications: (a) a technique to avoid false sharing at runtime, (b) a technique to mimic MPI behaviour, where produced data is forwarded to its consumers, and, finally, (c) a mechanism to avoid network congestion due to DSM coherence messages. The NAS Parallel Benchmarks are used to test the contributions.
The results of this work show that the impact of the false-sharing problem depends on the application. Another result is the importance of moving data movement out of the critical path: using techniques that forward data as early as possible, similar to MPI, benefits final application performance.
Additionally, this data movement is usually concentrated at single points and affects application performance due to the limited bandwidth of the network. Therefore, it is necessary to provide mechanisms that allow this data to be distributed over the computation time, using an otherwise idle network.
Finally, the results show that the proposed contributions improve the performance of OpenMP applications in this kind of environment.
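As a toy illustration of why contribution (a) matters (a simulation under assumed traces, not the thesis's actual mechanism), one can count coherence traffic in a write-invalidate, page-based DSM when two variables do or do not share a page:

```python
def invalidations(writes, page_of):
    """Count ownership transfers in a toy write-invalidate DSM.
    writes: sequence of (node, variable); page_of maps a variable to a page.
    A write by a node that does not own the page costs one round of
    invalidation/fetch traffic before the write can proceed."""
    owner = {}
    msgs = 0
    for node, var in writes:
        page = page_of(var)
        if owner.get(page) != node:
            msgs += 1
            owner[page] = node
    return msgs

# Two nodes alternately update their own private variable 1000 times each.
trace = [(i % 2, "x" if i % 2 == 0 else "y") for i in range(2000)]

same_page = invalidations(trace, lambda v: 0)   # x and y share one page
own_pages = invalidations(trace, lambda v: v)   # one page per variable
print(same_page, own_pages)  # 2000 2
```

Even though the two nodes never touch each other's data, co-locating the variables on one coherence unit turns every write into a remote message, which is the false-sharing effect the runtime technique avoids.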
|
394 |
Quantum Random Access Codes with Shared Randomness. Ozols, Maris. 05 1900 (has links)
We consider a communication method where the sender encodes n classical bits into 1 qubit and sends it to the receiver, who performs a certain measurement depending on which of the initial bits must be recovered. This procedure is called an (n,1,p) quantum random access code (QRAC), where p > 1/2 is its success probability. It is known that (2,1,0.85) and (3,1,0.79) QRACs (with no classical counterparts) exist and that a (4,1,p) QRAC with p > 1/2 is not possible.
We extend this model with shared randomness (SR) that is accessible to both parties. Then (n,1,p) QRAC with SR and p > 1/2 exists for any n > 0. We give an upper bound on its success probability (the known (2,1,0.85) and (3,1,0.79) QRACs match this upper bound). We discuss some particular constructions for several small values of n.
We also study the classical counterpart of this model where n bits are encoded into 1 bit instead of 1 qubit and SR is used. We give an optimal construction for such codes and find their success probability exactly---it is less than in the quantum case.
Interactive 3D quantum random access codes are available on-line at
http://home.lanet.lv/~sd20008/racs
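As a quick numerical check of the quoted numbers (a sketch, not code from the thesis), the standard (2,1) QRAC encodes the two bits in the X and Z components of a single qubit's Bloch vector; either bit is then recovered with probability 1/2 + 1/(2*sqrt(2)), matching the 0.85 figure:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def encode(a, b):
    # Bloch vector for bits (a, b): halfway between the X and Z axes
    rx, rz = (-1) ** a / np.sqrt(2), (-1) ** b / np.sqrt(2)
    return (I2 + rx * X + rz * Z) / 2  # density matrix of the single qubit

def p_correct(rho, obs, bit):
    # Measure the observable; guess the bit is 0 on outcome +1.
    plus = (I2 + obs) / 2  # projector onto the +1 eigenspace
    p_plus = np.real(np.trace(plus @ rho))
    return p_plus if bit == 0 else 1 - p_plus

# Success probability over all four messages and both decoding choices
ps = [p_correct(encode(a, b), obs, bit)
      for a in (0, 1) for b in (0, 1)
      for obs, bit in ((X, a), (Z, b))]
print(round(min(ps), 4))  # 0.8536 = 1/2 + 1/(2*sqrt(2))
```

Every case succeeds with exactly the same probability, which is why the code is described by a single parameter p.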
|
395 |
Realizing Shared Services - A Punctuated Process Analysis of a Public IT Department. Olsen, Tim. 06 December 2012 (has links)
IT services are increasingly being offered via a shared service model. This model promises the benefits of centralization and consolidation, as well as increased customer satisfaction. Adopting shared services is not easy, as it necessitates a major organizational change, with few documented exemplars to guide managers. This research explores a public IT unit's realization of shared services with the intent to improve the transparency of its value proposition to its stakeholders. An ethnographic field study enabled in-situ data collection over a 24-month period. We analyzed the resulting rich process data using the Punctuated Socio-Technical IS Change (PSIC) model. This resulted in several contributions: an explanatory account of shared services realization, an empirically grounded punctuated process model with seventeen critical incidents, and twelve key lessons for practitioners. Several extensions to extant process research methods are developed. These contributions combine to form a detailed and nuanced understanding of the process of realizing IT shared services at a large public university over a multi-year period.
|
396 |
Corporate Social Responsibility and Financial Performance: Does it Pay to Be Good? Palmer, Harmony J. 01 January 2012 (has links)
The prominence of corporate social responsibility (CSR) initiatives today suggests that the corporate perception of such policies has shifted from an unnecessary addition to a critical business function. Using a reliable source of data on corporate social performance (CSP), this study explores and tests the relationship between CSP and corporate financial performance (CFP). Unlike prior research, this study additionally tests the impact CSP has on sales and gross margin in hopes of providing insight on sales strategies that can be implemented to maximize the impact of the relationship. The dataset includes most of the S&P 500 firms and covers the years 2001-2005. The relationships are tested using time-series regressions. Results indicate that CSP and CFP have a significantly positive relationship in both directions, supporting the view that CSR programs have positive impacts on the bottom line. Results also indicate that increases in CSP lead to increases in gross margin, indicating that some customers are willing to pay a premium for the products and/or services of a company with CSR initiatives. Lastly, results also indicate that increases in CSP lead to a decrease in sales, which implies a decrease in customer base because fewer people are willing to buy the products at a premium. Despite the result on sales, I argue in this paper that firms can increase sales by increasing CSR investments, assuming increases in CSR investments lead to higher CSP, as long as the perception of the programs transforms from socially responsible, philanthropic actions to programs promoting corporate shared value (CSV).
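The abstract does not give enough detail to reproduce the paper's time-series regressions, but the general method can be sketched on synthetic firm-year data (all variable names and coefficients below are hypothetical, chosen only for illustration): regress CFP on a CSP score plus controls by ordinary least squares and read off the CSP coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400                                   # synthetic firm-year observations

csp = rng.normal(size=n)                  # stand-in social performance score
firm_size = rng.normal(size=n)            # stand-in control variable
noise = rng.normal(scale=0.5, size=n)
cfp = 0.3 * csp + 0.5 * firm_size + noise  # true CSP effect set to 0.3

# CFP ~ intercept + CSP + controls, estimated by ordinary least squares
X = np.column_stack([np.ones(n), csp, firm_size])
beta, *_ = np.linalg.lstsq(X, cfp, rcond=None)
print(beta[1])  # estimated CSP coefficient, close to the true 0.3
```

The same template, with sales or gross margin as the dependent variable, mirrors the study's additional tests.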
|
397 |
Changing Role of HR : A Comparative study of different organization structures in relation to HR & the motivation behind them. Paphavatana, Pisalvit; Mohiuddin, Md. Fazla. January 2011 (has links)
Since its big breakthrough in 1980, starting in America (Bredin, 2008), we have seen yet another shift from traditional to strategic Human Resources, which was basically about two normative models, "best fit" vs. "best practice", and their implications for business organizations (Boxall & Purcell, 2000). Scholars like Ulrich (1997) suggested ways in which Human Resources (HR) could contribute to the search for competitive advantage by advocating new organizational structures and roles such as the HRSSC (Human Resource Shared Service Center) or the new role of the HRBP (Human Resource Business Partner). These new roles and structures can be seen as an extension of "best fit" vs. "best practice" thinking and provide a tool to cope with the challenges faced by today's organizations. The first and foremost objective of this paper is to come up with a reasonable understanding of these different changes in the roles and structures of HR. To do this, it puts the whole change process under an "organizational evolution theory" lens and analyzes the whole phenomenon to figure out where these changes come from and what their implications are for practising managers. To be more precise, this paper applies the ecological perspective at the organizational and population levels suggested by Lovas & Ghoshal (2000) and provides a starting point for future research to apply what Lovas & Ghoshal (2000) called the "guided evolution" perspective. The next objective of this paper is to check whether it is possible to come up with a set of Key Success Factors (KSF) that would work across different business environments, and to derive implications for today's organizations accordingly. In addition to an extensive literature review, the thesis conducted four semi-structured interviews with three large companies in Sweden, applying the "qualitative research interview" technique, and then analyzed the data together with additional data from other secondary sources.
The findings of this work suggest that the whole change process corresponds to a "variation" cycle of the evolutionary process, which should eventually move to a "selection" cycle. The choice and success of these new structures and roles depend on factors such as corporate strategies, adequate knowledge of HR, and the presence or absence of competition. The findings finally suggest that success factors vary from environment to environment, and thus it is not possible to come up with a set of Key Success Factors (KSF) that would work across cultures and business environments.
|
398 |
Delat ledarskap : En ledningsform med potential - om stjärnorna står rätt / Shared leadership : A form of management with potential - with a bit of luck. Folger, Anna-Karin. January 2011 (has links)
This qualitative interview study focuses on the establishment of shared management in the public sector. The objective is to shed light on why shared management emerges as an alternative form of management. As well as the conscious choices and motives of decision makers, the study also attempts to capture underlying reasons and to identify any other factors that may influence the emergence of this form of management. A complex picture of factors (the actual situation, the organisation's norms, conditions and surroundings, and the individuals who populate it) influences the emergence of this form of management. There is great confidence in the potential of this form of management, and misgivings are downplayed. This form of management is thought to be able to bring synergy effects and symbolic added value to the organisation and is seen as a solution to various problems where it offers an alternative.
An open attitude to trying new forms of management, pragmatic attitudes and the vested work-related interests of individuals also influence the emergence of this form of management. (The thesis is part of the programme Master of Public Administration and Governance, 120 credits: Public Leadership, Governance and Collaboration.)
|
399 |
Wireless On-Board Diagnostics. Schirninger, Rene; Zeppetzauer, Stefan. January 2005 (has links)
Wireless on-board diagnostics functionality, a future direction for vehicle system parameter analysis, enables measurement and control without the limitation of a physical connector. Today, every vehicle must by law provide the possibility to analyze engine and emission-related parameters (OBD II). The wireless connection requires a high security level to prevent unauthorized communication with the truck's bus system. The aim of the project is to survey the available security mechanisms and to find the most promising solutions. Furthermore, several usage scenarios and access right levels are specified, and a risk analysis of the whole system is made. The greatest challenge is the specification and implementation of a proper key-exchange mechanism between the analyzing device and the truck's bus system, which is therefore carried out with the greatest possible care. Consequently, several different concepts have been formulated based on the different usage scenarios.
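The abstract does not say which key-exchange mechanism was chosen; as a generic, hedged sketch of the kind of mechanism such a survey would consider, a textbook Diffie-Hellman exchange between the analyzing device and the truck looks like this (toy parameters, far too small for real security):

```python
import secrets

# Toy group: a 64-bit prime, for illustration only. A real system would
# use a standardized 2048-bit group or elliptic-curve Diffie-Hellman.
p = 2 ** 64 - 59
g = 5

analyzer_secret = secrets.randbelow(p - 2) + 1   # kept on the device
truck_secret = secrets.randbelow(p - 2) + 1      # kept in the vehicle

analyzer_public = pow(g, analyzer_secret, p)     # exchanged over the air
truck_public = pow(g, truck_secret, p)

k_analyzer = pow(truck_public, analyzer_secret, p)
k_truck = pow(analyzer_public, truck_secret, p)
print(k_analyzer == k_truck)  # True: both ends derive the same session key
```

On its own, this exchange is vulnerable to a man-in-the-middle; authenticating the public values (certificates or pre-shared credentials) is exactly the kind of design decision the project's risk analysis and access-right levels would have to weigh.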
|