1 |
Performance Impact on Neural Network with Partitioned Convolution Implemented with GPU Programming / Partitioned Convolution in Neural Network. Lee, Bill. January 2021.
For input data of a homogeneous type, the standard form of convolutional neural network is normally constructed with universally applied filters to identify global patterns. However, for certain datasets there are identifiable trends and patterns within subgroups of the input data. This research proposes a convolutional neural network that deliberately partitions input data into groups, each processed by its own set of convolutional layers, thus identifying the underlying features of individual data groups. Training and testing data are built from historical stock market prices and preprocessed so that the generated datasets are suitable for both the standard and the proposed convolutional neural networks. The author also developed a software framework that can construct neural networks to perform the necessary testing. The calculation logic was implemented with parallel programming and executed on an Nvidia graphics processing unit, allowing tests to be executed without expensive hardware. Tests were executed on 134 datasets to benchmark the performance of the standard and the proposed convolutional neural networks. The results show that the partitioned convolution method is capable of performance that rivals its standard counterpart. Further analysis indicates that a more sophisticated method of building datasets, larger training sets, or more training epochs could further improve the performance of the partitioned network. For suitable datasets, the proposed method could be a viable replacement for, or supplement to, the standard convolutional neural network structure. / Thesis / Master of Applied Science (MASc) / A convolutional neural network is a machine learning tool that allows complex patterns in datasets to be identified and modelled. For datasets whose input consists of the same type of data, a convolutional neural network is often architected to identify global patterns. This research explores the viability of partitioning input data into groups and processing them with separate convolutional layers so that unique patterns associated with individual subgroups of the input data can be identified. The author built suitable test datasets and developed a parallel-computation-enabled framework that can construct both the standard and the proposed convolutional neural networks. The test results show that the proposed structure is capable of performance that matches its standard counterpart. Further analysis indicates that there are potential ways to further improve the performance of partitioned convolution, making it a viable replacement for, or supplement to, standard convolution.
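A minimal sketch of the partitioned-convolution structure described above, written with a PyTorch-style API purely for illustration; the thesis uses its own GPU framework, and the partition sizes, layer widths, sequence length, and class count below are invented placeholders.

```python
import torch
import torch.nn as nn

class PartitionedConvNet(nn.Module):
    """Each input partition (e.g., a subgroup of price features) gets its own
    convolutional branch; branch outputs are concatenated for the final prediction."""

    def __init__(self, partition_channels=(4, 4, 4), seq_len=32, n_classes=2):
        super().__init__()
        self.partition_channels = partition_channels
        # One independent set of convolutional layers per partition.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(ch, 8, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(8, 8, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            for ch in partition_channels
        )
        self.head = nn.Linear(8 * len(partition_channels) * seq_len, n_classes)

    def forward(self, x):
        # x: (batch, total_channels, seq_len); split channel-wise into partitions.
        parts = torch.split(x, list(self.partition_channels), dim=1)
        feats = [branch(p) for branch, p in zip(self.branches, parts)]
        merged = torch.cat(feats, dim=1).flatten(start_dim=1)
        return self.head(merged)

# Example: a batch of 16 samples, 12 input channels split into three groups of 4.
model = PartitionedConvNet()
out = model(torch.randn(16, 12, 32))   # -> shape (16, 2)
```

A standard (non-partitioned) network would instead apply one set of filters across all 12 channels; the only structural change here is the per-group branching.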
|
2 |
Optimal Implementation of Simulink Models on Multicore Architectures with Partitioned Fixed Priority Scheduling. Bansal, Shamit. 02 August 2018.
Model-based design based on the Simulink modeling formalism and its associated toolchain has gained popularity in the development of complex embedded control systems. However, current research on software synthesis for Simulink models leaves a critical gap: providing a deterministic, semantics-preserving implementation on multicore architectures with partitioned fixed-priority scheduling. In this thesis, we propose to judiciously assign task offsets, task priorities, and task communication mechanisms to avoid simultaneous access to shared memory by tasks on different cores, to preserve the model semantics, and to optimize control performance. We develop two approaches to solve the problem: (a) a mixed integer linear programming (MILP) formulation; and (b) a problem-specific exact algorithm that can run several orders of magnitude faster than the MILP. / Master of Science / To save development time and money, the automotive industry develops models in software before implementing them directly in hardware. For reliability, the model generated by the software tool should behave in a well-defined manner, consistent with the ideal design of the model. While current tools are able to generate such a reliable model for a single-processor system, they are not able to do so for a system with multiple processors. When two or more processors contend for the same resource at the same time, the existing tools are unable to provide a well-defined execution order in their model. Since modern embedded systems need multiple processors to meet their increasing performance demands, it is imperative that the software tools scale up to multiple processors as well. In this work, we seek to bridge this gap by presenting two solutions that generate a deterministic software implementation for a system with multiple processors. In our solutions, we generate a model with a well-defined execution order by ensuring that, at any given time, only one processor accesses a given resource. Beyond ensuring determinism, we also improve the performance of the generated model by minimizing its end-to-end latency.
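As a rough illustration of the offset-assignment constraint described above (not the MILP formulation or the exact algorithm from the thesis), the sketch below checks whether two periodic tasks on different cores can ever occupy shared memory at the same time; the periods, offsets, and execution times are invented, and each task is simplistically assumed to hold the shared memory for its whole worst-case execution time.

```python
from math import gcd

def windows(period, offset, wcet, hyperperiod):
    """Execution windows [start, end) of a periodic task within one hyperperiod."""
    return [(offset + k * period, offset + k * period + wcet)
            for k in range((hyperperiod - offset) // period + 1)]

def conflict_free(task_a, task_b):
    """True if two tasks (period, offset, wcet) on different cores never overlap."""
    hp = task_a[0] * task_b[0] // gcd(task_a[0], task_b[0])   # hyperperiod = lcm
    for s1, e1 in windows(*task_a, hp):
        for s2, e2 in windows(*task_b, hp):
            if s1 < e2 and s2 < e1:      # the two intervals intersect
                return False
    return True

# Hypothetical tasks on two cores: (period, offset, wcet) in milliseconds.
print(conflict_free((10, 0, 2), (20, 5, 3)))   # True: windows never overlap
print(conflict_free((10, 0, 2), (20, 1, 3)))   # False: simultaneous access possible
```

An offset assignment that makes every cross-core pair conflict-free removes the nondeterminism described in the lay summary; choosing such offsets while also optimizing control performance is the harder problem the thesis addresses.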
|
3 |
The Existence of Balanced Tournament Designs and Partitioned Balanced Tournament Designs. Bauman, Shane. January 2001.
A balanced tournament design of order n, BTD(n), defined on a 2n-set V, is an arrangement of all of the (2n choose 2) distinct unordered pairs of elements of V into an n × (2n - 1) array such that (1) every element of V occurs exactly once in each column and (2) every element of V occurs at most twice in each row. We will show that there exists a BTD(n) for every positive integer n not equal to 2. For n = 2, a BTD(n) does not exist. If the BTD(n) has the additional property that it is possible to permute the columns of the array so that, for every row, all the elements of V appear exactly once in the first n pairs of that row and exactly once in the last n pairs of that row, then we call the design a partitioned balanced tournament design, PBTD(n). We will show that there exists a PBTD(n) for every positive integer n greater than or equal to 5, except possibly for n an element of the set {9, 11, 15}. For n less than or equal to 4, a PBTD(n) does not exist.
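To make the two defining properties concrete, here is a small checker (a hypothetical helper, not part of the thesis) together with one BTD(3) on the point set {1, ..., 6}: a 3 × 5 array containing all 15 pairs, with every point exactly once per column and at most twice per row.

```python
from itertools import combinations

def is_btd(rows, n):
    """Check whether `rows` (an n x (2n-1) array of pairs) is a BTD(n) on {1,...,2n}."""
    points = set(range(1, 2 * n + 1))
    pairs = [p for row in rows for p in row]
    # Every one of the (2n choose 2) unordered pairs must appear exactly once.
    if sorted(tuple(sorted(p)) for p in pairs) != sorted(combinations(sorted(points), 2)):
        return False
    # (1) every element occurs exactly once in each column.
    for col in zip(*rows):
        if sorted(x for pair in col for x in pair) != sorted(points):
            return False
    # (2) every element occurs at most twice in each row.
    for row in rows:
        flat = [x for pair in row for x in pair]
        if any(flat.count(x) > 2 for x in points):
            return False
    return True

btd3 = [
    [(1, 2), (4, 6), (3, 5), (1, 5), (2, 3)],
    [(3, 4), (2, 5), (1, 4), (3, 6), (1, 6)],
    [(5, 6), (1, 3), (2, 6), (2, 4), (4, 5)],
]
print(is_btd(btd3, 3))   # True
```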
|
4 |
Modeling And Partitioning The Nucleotide Evolutionary Process For Phylogenetic And Comparative Genomic Inference. Castoe, Todd. 01 January 2007.
The transformation of genomic data into functionally relevant information about the composition of biological systems hinges critically on the field of computational genome biology, at the core of which lies comparative genomics. The aim of comparative genomics is to extract meaningful functional information from the differences and similarities observed across genomes of different organisms. We develop and test a novel framework for applying complex models of nucleotide evolution to solve phylogenetic and comparative genomic problems, and demonstrate that these techniques are crucial for accurate comparative evolutionary inferences. Additionally, we conduct an exploratory study using vertebrate mitochondrial genomes as a model to identify the reciprocal influences that genome structure, nucleotide evolution, and multi-level molecular function may have on one another. Collectively, this work represents a significant and novel contribution to accurately modeling and characterizing patterns of nucleotide evolution, a contribution that enables the enhanced detection of patterns of genealogical relationships, selection, and function in comparative genomic datasets. Our work with entire mitochondrial genomes highlights a coordinated evolutionary shift that simultaneously altered genome architecture, replication, nucleotide evolution, and molecular function (of proteins, RNAs, and the genome itself). Current research in computational biology, including the advances presented in this dissertation, continues to close the gap that impedes the transformation of genomic data into powerful tools for analyzing and understanding the function of biological systems.
|
5 |
Analytical Modelling and Optimization of Congestion Control for Prioritized Multi-Class Self-Similar Traffic. Min, Geyong; Jin, X. January 2013.
Traffic congestion in communication networks can dramatically deteriorate user-perceived Quality-of-Service (QoS). The integration of the Random Early Detection (RED) and priority scheduling mechanisms is a promising scheme for congestion control and for provisioning the differentiated QoS required by multimedia applications. Although analytical modelling of RED congestion control has received significant research effort, the performance models reported in the current literature were primarily restricted to the RED algorithm alone, without consideration of a traffic scheduling scheme for QoS differentiation. Moreover, for analytical tractability, these models were developed under the simplified assumption that the traffic follows Short-Range-Dependent (SRD) arrival processes (e.g., Poisson or Markov processes), which are unable to capture the self-similar nature (i.e., scale-invariant burstiness) of multimedia traffic in modern communication networks. To fill these gaps, this paper presents a new analytical model of RED congestion control for prioritized multi-class self-similar traffic. Closed-form expressions for the loss probability of individual traffic classes are derived. The effectiveness and accuracy of the model are validated through extensive comparison between analytical and simulation results. To illustrate its application, the model is adopted as a cost-effective tool to investigate the optimal threshold configuration and to minimize the required buffer space under congestion control.
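For reference, a minimal sketch of the classic RED drop decision that the model builds on; the paper's analytical model, priority scheduling, and self-similar traffic treatment are not reproduced here, and the thresholds, maximum drop probability, and averaging weight below are illustrative values only.

```python
import random

class RedQueue:
    """Simplified RED: drop probability grows linearly between the two thresholds.
    (The full algorithm also spaces drops using a packet counter, omitted here.)"""

    def __init__(self, min_th=20, max_th=60, max_p=0.1, weight=0.002):
        self.min_th, self.max_th, self.max_p, self.weight = min_th, max_th, max_p, weight
        self.avg = 0.0          # EWMA of the instantaneous queue length

    def should_drop(self, queue_len):
        # Update the average queue length on each arrival.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p

# Example: feed the queue a steadily growing backlog and count early drops.
red = RedQueue(weight=0.05)   # larger weight so the average tracks the ramp quickly
drops = sum(red.should_drop(q) for q in range(200))
print(f"{drops} of 200 arrivals dropped")
```

The paper's contribution is to predict, in closed form, the per-class loss probability that such a mechanism produces when the offered traffic is prioritized, multi-class, and self-similar.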
|
6 |
A Dynamically Partitionable Compressed Cache. Chen, David; Peserico, Enoch; Rudolph, Larry. 01 1900.
The effective size of an L2 cache can be increased by using a dictionary-based compression scheme. Naive application of this idea performs poorly, since the data values in a cache vary greatly in their “compressibility.” The novelty of this paper is a scheme that dynamically partitions the cache into sections of different compressibilities. While compression is often researched in the context of a large stream, in this work it is applied repeatedly to smaller, cache-line-sized blocks so as to preserve the random-access requirement of a cache. When a cache line is brought into the L2 cache or is about to be modified, the line is compressed using a dynamic LZW dictionary. Depending on how well it compresses, the line is placed into the relevant partition. The partitioning is dynamic in that the ratio of space allocated to compressed and uncompressed lines varies with the observed performance. Certain SPEC-2000 benchmarks using a compressed L2 cache show an 80% reduction in L2 miss rate when compared to an uncompressed L2 cache of the same area, taking into account all area overhead associated with the compression circuitry. For other SPEC-2000 benchmarks, the compressed cache performs as well, in terms of hit rate, as a traditional cache that is 4.3 times as large. The adaptivity ensures that, in terms of miss rates, the compressed cache never performs worse than a traditional cache. / Singapore-MIT Alliance (SMA)
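As a rough illustration of per-line dictionary compression in the spirit of the scheme above (not the authors' hardware design), the sketch below LZW-compresses a cache-line-sized block of bytes and uses the result to decide which partition the line belongs in; the 64-byte line size, 12-bit codes, and 50% threshold are invented for the example.

```python
def lzw_compress(data: bytes):
    """Plain LZW over a byte string; returns a list of dictionary codes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def choose_partition(line: bytes, code_bits=12, threshold=0.5):
    """Place a cache line in the compressed partition only if it shrinks enough."""
    compressed_bits = len(lzw_compress(line)) * code_bits
    ratio = compressed_bits / (len(line) * 8)
    return "compressed" if ratio <= threshold else "uncompressed"

# A 64-byte line of repeated values compresses well; varied data does not.
print(choose_partition(bytes(64)))          # "compressed"
print(choose_partition(bytes(range(64))))   # "uncompressed": no repetition to exploit
```

The per-line decision is what makes the varying “compressibility” of cache data visible, and the paper's dynamic partitioning adjusts how much cache area each outcome receives.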
|
7 |
High Performance Content Centric Networking on Virtual Infrastructure. Tang, Tang. 28 November 2013.
Content Centric Networking (CCN) is a novel networking architecture in which communication is resolved based on names, or descriptions, of the data transferred instead of the addresses of the end-hosts.
While CCN shows considerable promise, its current implementation suffers from severe performance limitations.
In this thesis we study the performance of the existing CCN prototype and analyze its bottlenecks. Based on this analysis, a variety of design alternatives are proposed for realizing high-performance content centric networking over virtual infrastructure.
Preliminary implementations of two of the approaches are developed and evaluated on the Smart Applications on Virtual Infrastructure (SAVI) testbed. The evaluation results demonstrate that our design can provide a scalable content centric routing solution with throughput beyond 1 Gbps under realistic traffic load.
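A tiny sketch of the name-based forwarding step at the heart of CCN, where a content name is matched against forwarding entries by longest prefix of name components rather than by IP address; the names and faces below are invented, and none of the thesis's performance optimizations are shown.

```python
def longest_prefix_match(fib, name):
    """Return the forwarding face whose name prefix shares the most leading components."""
    components = name.strip("/").split("/")
    best, best_len = None, -1
    for prefix, face in fib.items():
        p = prefix.strip("/").split("/")
        if len(p) <= len(components) and components[:len(p)] == p and len(p) > best_len:
            best, best_len = face, len(p)
    return best

# Hypothetical Forwarding Information Base: name prefix -> outgoing face.
fib = {
    "/uoft/videos": "face1",
    "/uoft/videos/lectures": "face2",
    "/uoft": "face3",
}
print(longest_prefix_match(fib, "/uoft/videos/lectures/ece1508/seg0"))  # face2
print(longest_prefix_match(fib, "/uoft/admissions/index.html"))         # face3
```

Performing this variable-length string matching (plus content-store and pending-interest lookups) at line rate is the kind of bottleneck the thesis analyzes and redesigns around.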
|
8 |
Improved Heuristics for Partitioned Multiprocessor Scheduling Based on Rate-Monotonic Small-Tasks. Müller, Dirk; Werner, Matthias. 01 November 2012.
Partitioned preemptive EDF scheduling is very similar to bin packing, but there is a subtle difference. Estimating the probability of schedulability under a given total utilization has previously been studied empirically. Here, we present an approach to deriving closed-form formulae for this probability, starting with n = 3 tasks on m = 2 processors.
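To illustrate the bin-packing analogy, the sketch below partitions task utilizations onto processors first-fit, treating each processor as a bin of capacity 1.0 (the uniprocessor EDF bound for implicit deadlines); this is a generic heuristic for illustration, not the closed-form probability analysis of the paper.

```python
def first_fit_partition(utilizations, m):
    """Assign task utilizations to m processors first-fit; each processor is a 'bin'
    of capacity 1.0. Returns the per-processor assignment, or None if some task fits nowhere."""
    processors = [[] for _ in range(m)]
    loads = [0.0] * m
    for u in utilizations:
        for i in range(m):
            if loads[i] + u <= 1.0 + 1e-9:   # tolerate floating-point rounding
                processors[i].append(u)
                loads[i] += u
                break
        else:
            return None                      # no processor can accept the task
    return processors

# n = 3 tasks on m = 2 processors, the smallest case treated analytically in the paper.
print(first_fit_partition([0.6, 0.5, 0.4], 2))   # [[0.6, 0.4], [0.5]]
print(first_fit_partition([0.8, 0.7, 0.6], 2))   # None: no feasible partition
```

The question the paper asks is how likely such a partition exists as a function of total utilization, answered in closed form rather than by simulation.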
|