1. Exploiting language abstraction to optimize memory efficiency
Sartor, Jennifer Bedke, 13 December 2010
The programming language and underlying hardware determine application
performance, and both are undergoing revolutionary shifts. As
applications have become more sophisticated and capable, programmers
have chosen managed languages in many domains for ease of development.
These languages abstract memory management from the programmer, which
can introduce time and space overhead but also provide opportunities
for dynamic optimization. Optimizing memory performance is
paramount in part because hardware is reaching physical limits. Recent trends
towards chip multiprocessor machines exacerbate the memory system
bottleneck because they are adding cores without adding commensurate
bandwidth. Both language and architecture trends add stress to the
memory system and degrade application performance.
This dissertation exploits the language abstraction to analyze and
optimize memory efficiency on emerging hardware. We study the sources
of memory inefficiencies on two levels: heap data and hardware storage
traffic. We design and implement optimizations that change the heap
layout of arrays, and use program semantics to eliminate useless
memory traffic. These techniques improve memory system efficiency and
performance.
We first quantitatively characterize the problem by comparing many
data compression algorithms and their combinations in a limit study of
Java benchmarks. We find that arrays are a dominant source of heap
inefficiency. We introduce z-rays, a new array layout design, to
bridge the gap between fast access, space efficiency and
predictability. Z-rays facilitate compression and offer flexibility
together with time and space efficiency.
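The abstract does not spell out the layout, but the core idea behind z-rays in the published work is an arraylet design: an array is split into fixed-size chunks reached through a spine of references, which enables space optimizations such as leaving all-zero chunks unallocated. A minimal sketch of that idea, with illustrative names and parameters rather than the dissertation's tuned design:

```python
# Sketch of an arraylet-based array layout (z-ray style): elements live in
# fixed-size chunks ("arraylets") reached through a spine of references.
# Chunks that are entirely zero are never materialized, compressing sparse arrays.
ARRAYLET_SIZE = 1024  # illustrative chunk size, not the thesis's tuned value

class ZRayLikeArray:
    def __init__(self, length):
        self.length = length
        n_chunks = (length + ARRAYLET_SIZE - 1) // ARRAYLET_SIZE
        self.spine = [None] * n_chunks  # None = all-zero arraylet, unallocated

    def get(self, i):
        chunk = self.spine[i // ARRAYLET_SIZE]
        return 0 if chunk is None else chunk[i % ARRAYLET_SIZE]

    def set(self, i, value):
        idx, off = divmod(i, ARRAYLET_SIZE)
        if self.spine[idx] is None:
            if value == 0:
                return  # writing zero to a zero chunk: nothing to materialize
            self.spine[idx] = [0] * ARRAYLET_SIZE  # allocate on first non-zero write
        self.spine[idx][off] = value
```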
We find that there is a semantic mismatch between managed languages,
with their rapid allocation rates, and current hardware, causing
unnecessary and excessive traffic in the memory subsystem. We take
advantage of the garbage collector's identification of dead data
regions, communicating this information to the caches to eliminate useless
traffic to memory. By reducing traffic and bandwidth, we improve
performance.
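The mechanism itself is architectural, but the effect can be illustrated with a toy write-back cache model: once the collector declares a heap region dead, dirty lines in that region can be invalidated instead of written back, so they never consume memory bandwidth. The cache model and the scrub call below are hypothetical stand-ins for the proposed hardware support, sketched only to show the accounting:

```python
# Toy write-back cache: when the garbage collector declares a heap region
# dead, dirty cache lines inside it are dropped instead of written back,
# eliminating useless traffic to memory.
LINE = 64  # bytes per cache line (illustrative)

class ToyCache:
    def __init__(self):
        self.dirty = {}       # line address -> data (dirty lines only)
        self.writebacks = 0   # lines actually written back to memory

    def write(self, addr, data):
        self.dirty[addr - addr % LINE] = data

    def evict_all(self):
        self.writebacks += len(self.dirty)  # normally every dirty line costs bandwidth
        self.dirty.clear()

    def scrub_dead_region(self, start, end):
        # Hypothetical GC-to-cache hint: lines in [start, end) hold dead data,
        # so invalidate them without a write-back.
        for line in [a for a in self.dirty if start <= a < end]:
            del self.dirty[line]

cache = ToyCache()
for a in range(0, 4096, LINE):
    cache.write(a, b"x")
cache.scrub_dead_region(0, 2048)  # collector found the first half dead
cache.evict_all()
print(cache.writebacks)           # 32 write-backs instead of 64
```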
We show that the memory abstraction in managed languages is not just a
cost to be borne, but an opportunity to alleviate the memory
bottleneck. This thesis shows how to exploit this abstraction to
improve space and time efficiency and overcome the memory wall. We
enhance the productivity and performance of ubiquitous managed
languages on current and future architectures.
2. A tool for creating high-speed, memory efficient derivative codes for large scale applications
Stovboun, Alexei, January 2000
No description available.
3. Referencing Unlabelled World Data to Prevent Catastrophic Forgetting in Class-incremental Learning
Li, Xuan, 24 June 2022
This thesis presents a novel strategy to address the challenge of "catastrophic forgetting" in deep continual-learning systems. The term refers to severe performance degradation on older tasks as a system learns new tasks that are presented sequentially. Most previous techniques have emphasized preservation of existing knowledge while learning new tasks, in some cases advocating a memory buffer that grows in proportion to the number of tasks. However, we offer another perspective, which is that mitigating local-task fitness during learning is as important as attempting to preserve existing knowledge. We posit the existence of a consistent, unlabelled world environment that the system uses as an easily accessible reference to avoid favoring spurious properties over more generalizable ones. Based on this assumption, we have developed a novel method called Learning with Reference (LwR), which delivers substantial performance gains relative to its state-of-the-art counterparts. The approach does not involve a growing memory buffer, and therefore promotes better performance at scale. We present an extensive empirical evaluation on real-world datasets.

Master of Science

Rome was not built in a day, and in nature knowledge is acquired and consolidated gradually over time. Evolution has taught biological systems how to address emerging challenges by building on past experience, adapting quickly while retaining known skills. Modern artificial intelligence systems also seek to amortize the learning process over time. Specifically, one large learning task can be divided into many smaller non-overlapping tasks. For example, a classification task with two classes, tiger and horse, is divided into two tasks, where the classifier sees and learns from only tiger data in the first task and only horse data in the second. The system is expected to acquire knowledge from these smaller tasks sequentially. Such a learning strategy is known as continual learning and provides three meaningful benefits: higher resource efficiency, a progressively better knowledge base, and strong adaptability. In this thesis, we investigate the class-incremental learning problem, a subset of continual learning, which refers to learning a classification model from a sequence of tasks.
Different from transfer learning, which targets better performance in new domains, continual learning emphasizes knowledge preservation across both old and new tasks. In deep neural networks, one challenge to this preservation is "catastrophic forgetting", which refers to severe performance degradation on older tasks as a system learns new ones that are presented sequentially. An intuitive explanation is that, in the continual-learning setting, old task data is absent while new tasks are learned, so the model is optimized toward the new tasks without regard for the old ones. To overcome this, most previous techniques have emphasized the preservation of existing knowledge while learning new tasks, in some cases advocating old-data replay with a memory buffer that grows in proportion to the number of tasks.
In this thesis, we offer another perspective, which is that mitigating local-task fitness during learning is as important as attempting to preserve existing knowledge. We notice that local task data always carries strong biases because of its smaller size; optimizing on it alone leads the model to local optima and loses the holistic view that is crucial for the other tasks. To mitigate this, a reliable reference should be enforced across tasks, and the model should consistently learn all new knowledge against it. With this assumption, we have developed a novel method called Learning with Reference (LwR), which posits the existence of a consistent, unlabelled world environment that the system uses as an easily accessible reference to avoid favoring spurious properties over more generalizable ones. Our extensive empirical experiments show that it significantly outperforms state-of-the-art counterparts on real-world datasets.
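The abstracts do not give the LwR objective itself, but one plausible instantiation of "learn new tasks while staying consistent on an unlabelled reference set" is a distillation-style consistency term computed against a snapshot of the previous model. The sketch below shows that reading only; it is not the thesis's actual algorithm, and the loaders, weighting `lam`, and shared classifier head are all assumptions:

```python
# Sketch of a "learning with reference" style objective: alongside the
# supervised loss on the current task, penalize divergence from the previous
# model's predictions on a fixed pool of unlabelled "world" data.
# One plausible instantiation, NOT the exact LwR algorithm from the thesis.
import copy
import torch
import torch.nn.functional as F

def train_task(model, task_loader, reference_loader, lam=1.0, epochs=1):
    old_model = copy.deepcopy(model).eval()  # snapshot before the new task
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    ref_iter = iter(reference_loader)        # assumed to yield unlabelled batches
    for _ in range(epochs):
        for x, y in task_loader:             # labelled current-task batch
            try:
                xr = next(ref_iter)
            except StopIteration:
                ref_iter = iter(reference_loader)
                xr = next(ref_iter)
            task_loss = F.cross_entropy(model(x), y)
            with torch.no_grad():
                target = F.softmax(old_model(xr), dim=1)
            # Consistency on reference data: stay close to the old model's view.
            consistency = F.kl_div(F.log_softmax(model(xr), dim=1),
                                   target, reduction="batchmean")
            loss = task_loss + lam * consistency  # lam is an assumed weighting
            opt.zero_grad()
            loss.backward()
            opt.step()
```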
4. Algorithmic Engineering Towards More Efficient Key-Value Systems
Fan, Bin, 18 December 2013
Distributed key-value systems have been widely used as elemental components of many Internet-scale services at sites such as Amazon, Facebook and Twitter. This thesis examines a system design approach to scale existing key-value systems, both horizontally and vertically, by carefully engineering and integrating techniques that are grounded in recent theory but also informed by underlying architectures and expected workloads in practice. As a case study, we re-design FAWN-KV—a distributed key-value cluster consisting of “wimpy” key-value nodes—to use less memory but achieve higher throughput even in the worst case.
First, to improve the worst-case throughput of a FAWN-KV system, we propose a randomized load balancing scheme that can fully utilize all the nodes regardless of the query distribution. We analytically prove and empirically demonstrate that deploying a very small but extremely fast load balancer in front of FAWN-KV can effectively prevent uneven or dynamic workloads from creating hotspots on individual nodes. Moreover, our analysis gives service designers a mathematically tractable approach to estimating the worst-case throughput and avoiding drastic overprovisioning in similar distributed key-value systems.
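A small simulation sketch of why this can work, assuming (as in the authors' published "small cache, big effect" line of work) that the front-end load balancer caches the hottest keys: even a tiny, idealized cache absorbing the head of a skewed distribution leaves the hash-partitioned back ends with a near-uniform residual load. All parameters here are illustrative:

```python
# Sketch: a tiny front-end cache absorbs the hottest keys so the
# hash-partitioned back-end nodes see near-uniform residual load,
# even under a highly skewed (Zipf-like) query distribution.
import random
from collections import Counter

N_NODES = 16
CACHE_SIZE = 64  # illustrative; the point is that it can be small
keys = [f"key{i}" for i in range(10_000)]
weights = [1.0 / (i + 1) for i in range(len(keys))]  # Zipf-like popularity

requests = random.choices(keys, weights=weights, k=100_000)
hot = {k for k, _ in Counter(requests).most_common(CACHE_SIZE)}  # idealized cache

node_load = Counter()
for k in requests:
    if k in hot:
        continue                       # served by the front-end load balancer
    node_load[hash(k) % N_NODES] += 1  # cache miss goes to its back-end node

# Ratio of the most loaded node to the average: stays close to 1.0,
# i.e. no back-end hotspot despite the skewed request stream.
print(max(node_load.values()) / (sum(node_load.values()) / N_NODES))
```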
Second, to implement the high-speed load balancer and to improve the space efficiency of individual key-value nodes, we propose novel data structures and algorithms, including the cuckoo filter, a Bloom filter replacement that is high-speed, highly compact, and delete-supporting, and optimistic cuckoo hashing, a fast and space-efficient hashing scheme that scales across multiple CPUs. Both algorithms build on conventional cuckoo hashing but are optimized for our target architectures and workloads. Using them as building blocks, we design and implement MemC3 to serve transient data from DRAM with high-throughput, low-latency retrievals, and SILT to provide cost-effective access to persistent data on flash storage with an extremely small memory footprint (e.g., 0.7 bytes per entry).
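A minimal sketch of the cuckoo filter idea as published (partial-key cuckoo hashing): each item is stored as a short fingerprint in one of two buckets, and the alternate bucket is computable from the fingerprint alone, which is what makes relocation and deletion possible without the original key. Bucket count, tag width, and hash choices below are illustrative, not the tuned implementation:

```python
# Sketch of a cuckoo filter using partial-key cuckoo hashing.
import random

N_BUCKETS = 1 << 10   # power of two, so the XOR trick below is self-inverse
BUCKET_SLOTS = 4
MAX_KICKS = 500
buckets = [[] for _ in range(N_BUCKETS)]

def fingerprint(item):
    return hash(("fp", item)) & 0xFF  # short 8-bit tag (illustrative width)

def alt_index(i, fp):
    # The partner bucket is derived from the current index and the tag alone,
    # so a stored fingerprint can always be relocated without its key.
    return (i ^ hash(("bucket", fp))) % N_BUCKETS

def insert(item):
    fp = fingerprint(item)
    i1 = hash(item) % N_BUCKETS
    for i in (i1, alt_index(i1, fp)):
        if len(buckets[i]) < BUCKET_SLOTS:
            buckets[i].append(fp)
            return True
    i = random.choice((i1, alt_index(i1, fp)))  # both full: evict and relocate
    for _ in range(MAX_KICKS):
        victim = random.randrange(BUCKET_SLOTS)
        fp, buckets[i][victim] = buckets[i][victim], fp
        i = alt_index(i, fp)
        if len(buckets[i]) < BUCKET_SLOTS:
            buckets[i].append(fp)
            return True
    return False  # table considered too full

def contains(item):  # no false negatives for inserted items; rare false positives
    fp = fingerprint(item)
    i1 = hash(item) % N_BUCKETS
    return fp in buckets[i1] or fp in buckets[alt_index(i1, fp)]

def delete(item):    # only call for items known to have been inserted
    fp = fingerprint(item)
    i1 = hash(item) % N_BUCKETS
    for i in (i1, alt_index(i1, fp)):
        if fp in buckets[i]:
            buckets[i].remove(fp)
            return True
    return False
```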
5. Log data filtering in embedded sensor devices
Olsson, Jakob; Yberg, Viktor, January 2015
Data filtering is the disposal of unnecessary data in a data set, to save resources such as server capacity and bandwidth. The method is used to reduce the amount of stored data and thereby prevent valuable resources from being spent processing insignificant information. The purpose of this thesis is to find algorithms for data filtering and to determine which algorithm performs best in embedded devices with resource limitations. This means that the algorithm needs to be resource-efficient in terms of memory usage and performance, while keeping enough data points to avoid modification or loss of information. Once an algorithm has been found, it is also implemented to fit the Exqbe system. The study was conducted by reviewing previous studies of line simplification algorithms and their applications. A comparison between several well-known and well-studied algorithms was made to find which best suits the problem of this thesis. The comparison between the different line simplification algorithms resulted in an implementation of an extended version of the Ramer-Douglas-Peucker algorithm. The algorithm has been optimized, and a new filter has been implemented in addition to the algorithm.
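For reference, a compact sketch of the classic Ramer-Douglas-Peucker algorithm that the thesis extends (the extension and the additional filter are not shown): keep the point farthest from the chord between the endpoints whenever it deviates by more than a tolerance epsilon, and recurse on both halves:

```python
# Classic Ramer-Douglas-Peucker line simplification.
import math

def perpendicular_distance(p, a, b):
    # Distance from point p to the line through a and b.
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def rdp(points, epsilon):
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    dmax, index = max(
        (perpendicular_distance(p, a, b), i)
        for i, p in enumerate(points[1:-1], start=1)
    )
    if dmax <= epsilon:
        return [a, b]  # every intermediate point is within tolerance: drop them
    left = rdp(points[:index + 1], epsilon)
    return left[:-1] + rdp(points[index:], epsilon)  # merge without duplicating the pivot

print(rdp([(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)], 1.0))
```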