Global ETD Search

11	ADVANCED TELEMETRY PROCESSING SYSTEM (ATPS) Finegan, Brian H., Singer, Gary 10 1900 (has links) International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California / The Advanced Telemetry Processing System (ATPS) is the result of a joint development project between Harris Corporation and Veda Systems, Incorporated. The mission of the development team was to produce a high-performance, cost-effective, supportable telemetry system; one that would utilize commercial-off-the-shelf (COTS) hardware and software, thereby eliminating costly customization typically required for range and telemetry applications. A critical element in the 'cost-effective, supportable' equation was the ability to easily incorporate system performance upgrades as well as future hardware and software technology advancements. The ATPS combines advanced hardware and software technology that includes a high-speed, top-down data management environment; a mature man-machine interface; a B1-level Trusted operating system and network; and stringent real-time multiprocessing capabilities into a single, fully integrated, 'open' platform. In addition, the system incorporates a unique direct memory transfer feature that allows incoming data to pass directly into local memory space where it can be displayed and analyzed, thereby reducing I/O bottleneck and freeing processors for other specialized tasks. B1-level Trusted operating system commercia-off-the-shelf real-time multiprocessing
12	Μεθοδολογία ανάπτυξης μεταγλωττιστών με εκμετάλλευση της δομής του λογισμικού και του μοντέλου του υλικού του Κελεφούρας, Βασίλειος 16 May 2014 (has links) Οι υπάρχοντες μεταγλωττιστές, έχουν τρία βασικά μειονεκτήματα i) όλα τα υπό-προβλήματα της μεταγλώττισης (π.χ. μετασχηματισμοί, εύρεση χρονοπρογραμματισμού, ανάθεση καταχωρητών) βελτιστοποιούνται ξεχωριστά (εκτός από μεμονωμένες περιπτώσεις όπου βελτιστοποιούνται κάποια στάδια μαζί - συνήθως 2), παρόλο που υπάρχει εξάρτηση μεταξύ τους, ii) δεν εκμεταλλεύονται αποδοτικά όλα τα χαρακτηριστικά του προγράμματος εισόδου (π.χ. δομή του εκάστοτε αλγορίθμου, επαναχρησιμοποίηση δεδομένων), iii) δεν εκμεταλλεύονται αποδοτικά τις παραμέτρους της αρχιτεκτονικής. Στη παρούσα διδακτορική διατριβή, αναπτύχθηκαν μεθοδολογίες οι οποίες αντιμετωπίζουν τα προβλήματα εύρεσης χρονοπρογραμματισμών με τον ελάχιστο αριθμό i) προσβάσεων στην κρυφή μνήμη δεδομένων L1, ii) προσβάσεων στην κρυφή μνήμη L2, iii) προσβάσεων στην κύρια μνήμη, iv) πράξεων διευθυνσιοδότησης, μαζί σαν ενιαίο πρόβλημα και όχι ξεχωριστά, για ένα kernel. Αυτό επιτυγχάνεται αντιμετωπίζοντας τα χαρακτηριστικά του λογισμικού και τις τις βασικές παραμέτρους της αρχιτεκτονικής μαζί σαν ενιαίο πρόβλημα. Είναι η πρώτη φορά που μια μεθοδολογία αντιμετωπίζει τα παραπάνω προβλήματα με αυτό τον τρόπο. Οι προτεινόμενες μεθοδολογίες εκμεταλλεύονται τα χαρακτηριστικά του προγράμματος εισόδου. Η δομή του εκάστοτε αλγορίθμου (π.χ. ο FFT αποτελείται από πράξεις πεταλούδων ενώ ο αλγόριθμος αφαίρεσης θορύβου - Gauss Blur αποτελείται από πράξεις μάσκας στοιχείων), τα ιδιαίτερα χαρακτηριστικά του (π.χ. συμμετρία Toeplitz πίνακα), η ύπαρξη προτύπων-patterns (π.χ. στοιχεία πινάκων πολλαπλασιάζονται με μάσκα), η επαναχρησιμοποίηση των δεδομένων, η παραγωγή-κατανάλωση ενδιάμεσων αποτελεσμάτων και η παραλληλία του αλγορίθμου, αντιμετωπίζονται μαζί σαν ενιαίο πρόβλημα. Οι προτεινόμενες μεθοδολογίες εκμεταλλεύονται τις βασικές παραμέτρους της αρχιτεκτονικής. Η αρχιτεκτονική της μνήμης (π.χ. κοινή L2, L3), το πλήθος των καταχωρητών, ο αριθμός των κρυφών μνημών δεδομένων, τα μεγέθη, οι συσχετιστικότητες (assosiativity) και τα μεγέθη των γραμμών των κρυφών μνημών, ο αριθμός των λειτουργικών μονάδων, ο αριθμός των λειτουργικών μονάδων που λειτουργούν παράλληλα και ο αριθμός των πυρήνων (cores) του επεξεργαστή, αντιμετωπίζονται μαζί σαν ενιαίο πρόβλημα. Με την αξιοποίηση των χαρακτηριστικών του εκάστοτε αλγορίθμου και των παραμέτρων της αρχιτεκτονικής, αποκλείονται πιθανές λύσεις και ο χώρος εξερεύνησης μειώνεται ραγδαία (τάξεις μεγέθους). Στη παρούσα διδακτορική διατριβή, αναπτύχθηκαν μεθοδολογίες αύξησης της ταχύτητας του λογισμικού α) του Πολλαπλασιασμού Πίνακα επί Πίνακα (ΠΠΠ), β) του Πολλαπλασιασμού Πίνακα επί διάνυσμα (ΠΠΔ), γ) του Fast Fourier Transform (FFT), δ) του αλγορίθμου Canny και του μετασχηματισμού του Hough (αλγόριθμοι ανίχνευσης ακμών και ευθειών αντίστοιχα). Επίσης, αναπτύχθηκε μεθοδολογία μεταγλώττισης η οποία εκμεταλλεύεται τα χαρακτηριστικά του λογισμικού και τις παραμέτρους της ιεραρχίας μνήμης. Η μεθοδολογία μπορεί να εφαρμοστεί σε πυρήνες λογισμικού, στους οποίους α) τα μονοπάτια εκτέλεσης είναι γνωστά κατά τη μεταγλώττιση και συνεπώς δεν εξαρτώνται από τα δεδομένα, β) οι δείκτες όλων των sub- scripts να είναι γραμμικές εξισώσεις των iterators (που ισχύει στις περισσότερες περιπτώσεις). Οι μεθοδολογίες αφορούν ενσωματωμένους και γενικού σκοπού επεξεργαστές (χρήση μονάδας SIMD για περαιτέρω αύξηση της ταχύτητας). Ακολουθεί σύντομη περίληψη αυτών. Μεθοδολογία αύξησης της ταχύτητας του Πολλαπλασιασμού Πίνακα επί Πίνακα (ΠΠΠ): Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του ΠΠΠ για α) μονοπύρηνους επεξεργαστές (1 core), β) επεξεργαστές με πολλούς πυρήνες οι οποίοι συνδέονται με κοινή μνήμη. Η προτεινόμενη μεθοδολογία χωρίζει του πίνακες του αλγορίθμου σε μικρότερους οι οποίοι χωράνε στις κρυφές μνήμες και στο αρχείο καταχωρητών. Είναι η πρώτη φορά για τον ΠΠΠ που εισάγονται εξισώσεις οι οποίες αξιοποιούν τα associativities των κρυφών μνημών. Για τη πλήρη αξιοποίηση της ιεραρχίας της μνήμης προτείνεται νέος τρόπος αποθήκευσης των στοιχείων στη κύρια μνήμη (data array layout). Επίσης, προτείνεται διαφορετικός χρονοπρογραμματισμός σε επίπεδο στοιχείων και σε επίπεδο υπό-πινάκων. Η προτεινόμενη μεθοδολογία επιτυγχάνει από 1.1 έως 3.5 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη του ATLAS, η οποία αποτελεί μια από τις ταχύτερες βιβλιοθήκες. Μεθοδολογία αύξησης της ταχύτητας του Fast Fourier Transform (FFT): Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του FFT αξιοποιώντας πλήρως τα ιδιαίτερα χαρακτηριστικά του αλγορίθμου και τις παραμέτρους της ιεραρχίας της μνήμης. Το διάγραμμα ροής δεδομένων (Data Flow Graph – DFG) του FFT, χωρίζεται σε πρότυπα (patterns) και σε υπό- FFTs. Κάθε πρότυπο, αποτελείται από πεταλούδες, σύμφωνα με το πλήθος των καταχωρητών του επεξεργαστή. Η επιλογή των πεταλούδων κάθε προτύπου έχει γίνει με τέτοιο τρόπο ώστε να μεγιστοποιείται η παραγωγή-κατανάλωση των ενδιάμεσων αποτελεσμάτων. Η σειρά εκτέλεσης των προτύπων είναι αυτή η οποία δίνει τη μέγιστη επαναχρησιμοποίηση των συντελεστών του FFT. Ο DFG του FFT χωρίζεται σε υπό-FFTs σύμφωνα με τον αριθμό και τα μεγέθη των κρυφών μνημών δεδομένων. Η προτεινόμενη μεθοδολογία δίνει από 1.1 μέχρι 1.8 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη του FFTW, η οποία παρέχει ταχύτατο χρόνο εκτέλεσης. Είναι η πρώτη φορά για τον FFT που μια μεθοδολογία λαμβάνει υπόψη τις παραμέτρους της ιεραρχίας μνήμης και του αρχείου καταχωρητών. Μεθοδολογία αύξησης της ταχύτητας του Πολλαπλασιασμού Πίνακα επί Διάνυσμα (ΠΠΔ) για Toeplitz, Bisymetric (BT), Toeplitz (Τ) και κανονικούς πίνακες: Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του ΠΠΔ. Οι παραπάνω πίνακες έχουν ιδιαίτερη δομή, μικρό αριθμό διαφορετικών στοιχείων και μεγάλη επαναχρησιμοποίηση, χαρακτηριστικά τα οποία αξιοποιούνται πλήρως. Η προτεινόμενη μεθοδολογία χωρίζει τους πίνακες του αλγορίθμου σε μικρότερους οι οποίοι χωράνε στις κρυφές μνήμες και στο αρχείο καταχωρητών σύμφωνα με τον αριθμό τα μεγέθη και τα associativities των κρυφών μνημών. Για τη πλήρη αξιοποίηση της ιεραρχίας μνήμης προτείνεται νέος τρόπος αποθήκευσης των στοιχείων του πίνακα (data array layout) στη κύρια μνήμη. Η προτεινόμενη μεθοδολογία χρησιμοποιεί τον κανονικό αλγόριθμο ΠΠΔ (γραμμή επί στήλη). Ωστόσο, για BT και T πίνακες, ο ΠΠΔ μπορεί να υλοποιηθεί με χρήση του FFT επιτυγχάνοντας μικρότερη πολυπλοκότητα για μεγάλα μεγέθη πινάκων (έγινε ανάλυση και σύγκριση των δύο αλγορίθμων θεωρητικά και πειραματικά). Η προτεινόμενη μεθοδολογία για κανονικούς πίνακες συγκρίνεται με τη βιβλιοθήκη του ATLAS, επιτυγχάνοντας από 1.2 μέχρι 4.4 φορές μικρότερο χρόνο εκτέλεσης. Μεθοδολογία αύξησης της ταχύτητας του αλγόριθμου ανίχνευσης ακμών και ευθειών (αλγόριθμος του Canny και μετασχηματισμός του Hough): Αναπτύχθηκε μεθοδολογία η οποία επιτυγχάνει i) μικρότερο αριθμό εντολών ανάγνωσης/εγγραφής και διευθυνσιοδότησης, ii) μικρότερο αριθμό προσβάσεων και αστοχιών στην ιεραρχία μνήμης και iii) μικρότερο μέγεθος απαιτούμενης μνήμης του αλγορίθμου, εν συγκρίσει με την βιβλιοθήκη OpenCV η οποία παρέχει ταχύτατο χρόνο εκτέλεσης στους αλγορίθμους επεξεργασίας εικόνων. Τα παραπάνω επιτυγχάνονται: α) αξιοποιώντας την παραγωγή-κατανάλωση των στοιχείων των πινάκων και την παραλληλία του αλγορίθμου - τα τέσσερα kernels του Canny συγχωνεύονται σε ένα, διασωληνώνοντας (pipelining) τους πυρήνες για να διατηρηθούν οι εξαρτήσεις των δεδομένων, β) μειώνοντας τον αριθμό και το μέγεθος των πινάκων, γ) γράφοντας τα δεδομένα σε νέους μειωμένων διαστάσεων πίνακες με κυκλικό τρόπο, δ) χωρίζοντας τους πίνακες σε μικρότερους οι οποίοι χωράνε στο αρχείο καταχωρητών και στη κρυφή μνήμη δεδομένων σύμφωνα με το μέγεθος των κρυφών μνημών και του associativity, ε) βρίσκοντας τον βέλτιστο τρόπο αποθήκευσης των πινάκων (data array layout) στην κύρια μνήμη σύμφωνα με τη συσχετιστικότητα (associativity) της κρυφής μνήμης. Η προτεινόμενη μεθοδολογία δίνει από 1.27 μέχρι 2.2 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη OpenCV (αναπτύχθηκε από την Intel και είναι γραμμένη σε χαμηλό επίπεδο), η οποία παρέχει ταχύτατο χρόνο εκτέλεσης. Μεθοδολογία μεταγλώττισης: Αναπτύχθηκε μεθοδολογία μεταγλώττισης η οποία αντιμετωπίζει τα προβλήματα εύρεσης χρονοπρογραμματισμών με τον ελάχιστο αριθμό i) προσβάσεων στην κρυφή μνήμη δεδομένων L1, ii) προσβάσεων στην κρυφή μνήμη L2, iii) προσβάσεων στην κύρια μνήμη, iv) πράξεων διευθυνσιοδότησης, μαζί σαν ενιαίο πρόβλημα και όχι ξεχωριστά, για ένα kernel. Η προτεινόμενη μεθοδολογία λαμβάνει ως είσοδο ker- nels σε C-κώδικα και παράγει νέα επιτυγχάνοντας είτε υψηλή απόδοση είτε τον ελάχιστο αριθμό προσβάσεων σε δεδομένη μνήμη. Αρχικά βρίσκεται ο χώρος εξερεύνησης με βάση τα χαρακτηριστικά του λογισμικού. Ο χώρος εξερεύνησης περιγράφεται από μαθηματικές εξισώσεις και ανισότητες οι οποίες προέρχονται από τα subscripts των πινάκων, τους iterators, τα όρια των βρόχων και τις εξαρτήσεις των δεδομένων. Αυτός ο χώρος εξερεύνησης δεν μπορεί να παραχθεί με την εφαρμογή υπαρχόντων μετασχηματισμών στον αρχικό C-κώδικα. Κατόπιν, ο χώρος εξερεύνησης μειώνεται τάξεις μεγέθους εφαρμόζοντας διάδοση περιορισμών (constraint propagation) των παραμέτρων του λογισμικού και αυτών της αρχιτεκτονικής της μνήμης. Το αρχείο καταχωρητών (register file) και τα μεγέθη των κρυφών μνημών αξιοποιούνται πλήρως παράγοντας ανισότητες για κάθε μνήμη οι οποίες περιέχουν α) τα μεγέθη των tiles που απαιτούνται για κάθε πίνακα, β) το σχήμα κάθε tile. Επίσης, βρίσκεται ο βέλτιστος τρόπος αποθήκευσης των στοιχείων των πινάκων στη κύρια μνήμη, σύμφωνα με τη συσχετιστικότητα (associativity) των κρυφών μνημών. Η προτεινόμενη μεθοδολογία εφαρμόστηκε σε 5 ευρέως διαδεδομένους αλγορίθμους και επιτυγχάνει αύξηση της ταχύτητας (speedup) από 2 έως 18 φορές (έγινε σύγκριση του αρχικού C κώδικα και του C κώδικα έπειτα από την εφαρμογή της προτεινόμενης μεθοδολογίας – η μεταγλώττιση έγινε με τον gcc compiler). / The existing state of the art (SOA) compilers, have 3 major disadvantages. Firstly, the back-end compiler phases - subproblems (e.g. transformations, scheduling, register allocation) are optimized separately; these subproblems depend on each other and they should be optimized together as one problem and not separately. Secondly, the existing SOA compilers do not effectively utilize the software characteristics (e.g. algorithm structure, data reuse). Thirdly, they do not effectively utilize the hardware parameters. In this PhD dissertation, new methodologies have been developed speeding up software kernels, by solving the sub-problems of finding the schedules with the minimum numbers of i) L1 data cache accesses, ii) L2 data cache accesses, iii) main memory accesses and iv) addressing instructions, as one problem and not separately. This is achieved by fully exploiting the software information and the memory hierarchy parameters. This is the first time a methodology optimizes the above sub-problems in this way. The proposed methodologies fully utilize the software characteristics. The algorithm structure (e.g. FFT data flow graph consists of butterfly operations while the gauss blur algorithm consists of array mask operations), the algorithm individual characteristics (e.g. symmetry of Toeplitz matrix), the data patterns (e.g. matrix elements are multiplied by a mask), data reuse, production-consumption of intermediate results and algorithm's parallelism, are utilized as one problem and not separately. The proposed methodologies fully utilize the major architecture parameters. The memory archi- tecture (e.g. shared L2/L3 cache), the size of the register file, the number of the levels of data cache hierarchy, the data cache sizes, the data cache associativities, the data cache line sizes, the number of the function units, the number of the function units can run in parallel and the number of the CPU cores are utilized as one problem and not separately. By utilizing the hardware and software constraints the exploration space is orders of magnitude decreased. In this PhD dissertation, new speeding-up methodologies are developed for i) Matrix Matrix Multi- plication (MMM) algorithm, ii) Matrix Vector Multiplication (MVM) algorithm, iii) Fast Fourier Trans- form (FFT), iv) Canny algorithm and Hough Transform. Also, a new compilation methodology which fully exploits the memory architecture and the software characteristics, is developed. This methodology can be applied in software kernels whose i) execution paths are known at compile time and thus they do not depend on the data, ii) all array subscripts are linear equations of the iterators (which in most cases do). The above methodologies refer to both embedded and general purpose processors (usage of the SIMD technology). The summary of the above methodologies is given below. A Methodology for speeding-up Matrix Matrix Multiplication (MMM) algorithm: A new methodol- ogy for Matrix Matrix Multiplication using SIMD (Single Instruction Multiple Data) unit and not, at one and more cores having a shared cache, is presented. The proposed methodology partitions the MMM matrices into smaller sub-matrices fitting in the data cache memories and into register file according to the memory hierarchy architecture parameters. This is the first time for MMM algorithm that equations containing the data cache associativity values, are given. To fully utilize the memory hierarchy, a new the data array layout is proposed. The proposed methodology is from 1.1 up to 3.5 times faster than one of the SOA software libraries for linear algebra, ATLAS. A Fast Fourier Transform (FFT) speeding-up methodology: A new Fast Fourier Transform method- ology is presented which fully utilizes the individual algorithm characteristics and the memory hierarchy architecture parameters. FFT data flow graph (DFG) is partitioned into patterns and into sub-FFTs. Each pattern consists of butterflies according to the number of the registers. The selection of the exact butter- flies each pattern contains, has been made by maximizing the production-consumption of the butterflies intermediate results. Also, the patterns are executed in that order, minimizing the data reuse of the FFT twiddle factors. The FFT data flow graph is partitioned into sub-FFTs according to the number of the levels and the sizes of data cache. The proposed methodology is faster from 1.1 up to 1.8 times in con- trast to the SOA FFT library, FFTW. This is the first time that an FFT methodology fully utilizes the memory hierarchy architecture parameters. A methodology for speeding-up Matrix Vector Multiplication (MVM) algorithm for regular, Toeplitz and Bisymmetric Toeplitz matrices: A new methodology for MVM including different types of matrices, is presented. The above matrices have a special structure, a small number of different elements and large data reuse. The proposed methodology partitions the MVM matrices into smaller sub-matrices fitting in the data cache memories and into register file according to the memory hierarchy architecture parameters. To fully utilize the memory hierarchy, a new data array layout is proposed. The proposed methodology uses the standard algorithm for matrix vector multiplication, i.e. each row of A is multiplied by X. However, for Bisymmetric Toeplitz (BT) and Toeplitz (T) matrices, MVM can also be implemented by using FFT; although in this paper we use the standard MVM algorithm, we show that for large input sizes, the MVM using FFT performs much better. The proposed methodology achieves speedup from 1.2 up to 4.4 over the SOA libraries, ATLAS. A Methodology for Speeding Up Edge and Line Detection Algorithms: A new Methodology for Speeding Up Edge and Line Detection Algorithms focusing on memory architecture utilization is pre- sented. This methodology achieves i) a smaller number of load/store and arithmetic instructions, ii) a smaller number of data cache accesses and data cache misses in memory hierarchy and iii) a smaller algorithm memory size, in contrast to the SOA library of OpenCV. This is achieved by: i) utilizing the production-consumption of intermediate results - merging all Canny kernels to one and pipelining the kernels to comply with the data dependences, ii) reducing the number and the size of the arrays, iii) writing the data into the new reduced size arrays in a circular way, iv) applying loop tiling for the register file and data cache, according to the size of the memories and associativity and v) finding the data arrays layout according to the data cache associativity. The proposed methodology achieves speedup from 1.27 up to 2.2 over the OpenCV SOA library. Compilation methodology: A new compilation methodology which fully exploits the memory archi- tecture and the software characteristics is presented. This is the first time that a methodology optimizes the subproblems explained above as one problem and not separately, for a loop-kernel. The proposed methodology takes as input C-code kernels and it produces new software kernels with a new iteration space, which may not be given by applying existing compiler transformations to original code. Firstly, the exploration space is found according to the s/w characteristics; it is described by mathematical equations and inequalities that are derived from the array subscripts, the combination of common array references, loop iterators, loop bounds and data dependences. Then, the exploration space is orders of magnitude decreased by applying constraint propagation of the h/w and s/w parameters. The register file and the data cache sizes are fully exploited by producing register file and data cache inequalities which contain i) the tiles sizes of each array, ii) the shape of each array tile. Also, new data array layouts are found, according to the data cache associativity. The final schedule is found by choosing the best combination of the number of i) L1 data cache accesses, ii) L2 data cache accesses, iii) main memory data accesses and iv) addressing instructions. The proposed methodology is evaluated to five well-known algorithms and speedups from 2 up to 18 over the target gcc compiler are obtained. Κρυφή μνήμη δεδομένων Συσχετιστικότητα Πολυεπεξεργασία 005.453 Data reuse Cache memory Correlativity Multiprocessing
13	Définition et implantation de la sémantique des langages de programmation Willis, Bruce 22 March 1974 (has links) (PDF) . programmation langages multiprocessing syntaxe ALGOL ALGOL68 ALOGOL 68 ALGOL 60 traduction
14	Evaluation of SMP Shared Memory Machines for Use with In-Memory and OpenMP Big Data Applications Younge, Andrew J., Reidy, Christopher, Henschel, Robert, Fox, Geoffrey C. 05 1900 (has links) While distributed memory systems have shaped the field of distributed systems for decades, the demand for many-core shared memory resources is increasing. Symmetric Multiprocessor Systems (SMPs) have become increasingly important recently among a wide array of disciplines, ranging from Bioinformatics to astrophysics, and beyond. With the increase in big data computing, the size and scope of traditional commodity server systems is often outpaced. While some big data applications can be mapped to distributed memory systems found through many cluster and cloud technologies today, this effort represents a large barrier of entry that some projects cannot cross. Shared memory SMP systems look to effectively and efficiently fill this niche within distributed systems by providing high throughput and performance with minimized development effort, as the computing environment often represents what many researchers are already familiar with. In this paper, we look at the use of two common shared memory systems, the ScaleMP vSMP virtualized SMP deployment at Indiana University, and the SGI UV architecture deployed at University of Arizona. While both systems are notably different in their design, their potential impact on computing is remarkably similar. As such, we look to compare each system first under a set of OpenMP threaded benchmarks via the SPEC group, and to follow up with our experience using each machine for Trinity de-novo assembly. We find both SMP systems are well suited to support various big data applications, with the newer vSMP deployment often slightly faster; however, certain caveats and performance considerations are necessary when considering such SMP systems. Symmetric Multiprocessing SMP ScaleMP vSMP SGI UV Large Memory Big Data Virtualization
15	Análise e implementação de suporte a SMP (multiprocessamento simétrico) para o sistema operacional eCos com aplicação em robótica móvel / Analysis and implementation of SMP support (symmetric multiprocessing) for eCos operating system with application in mobile robotics Bueno, Maikon Adiles Fernandez 26 April 2007 (has links) Technological development has significantly reduced the distance between the performance of systems designed using reconfigurable computing and dedicated hardware. The main sources of performance are the high density level of the FPGAs and the resources? improvement offered by manufacturers, who make more its use more attractive in a variety of applications, emphatically in systems that demand a high degree of flexibility. In this context, the objective of this work consists on the exploration of the resources offered by FPGAs for the development of a multiprocessed platform with the purpose of parallel execution of tasks. In this way, the eCos operating system was modified, with the addition of new characteristics to support of the Symmetric Multiprocessing model, using three soft-Core Altera Nios II processors. On this operating system, all parallelism is directly related to execution of the threads. This platform was analyzed and validated through the execution of parallel algorithms, emphasizing aspects of performance and flexibility compared to other architectures. This work contributes for reaching better results in the execution of tasks in robotics area, which belongs to a domain that demand great competition of tasks, mainly in modules that involve interaction with the external environment / Technological development has significantly reduced the distance between the performance of systems designed using reconfigurable computing and dedicated hardware. The main sources of performance are the high density level of the FPGAs and the resources? improvement offered by manufacturers, who make more its use more attractive in a variety of applications, emphatically in systems that demand a high degree of flexibility. In this context, the objective of this work consists on the exploration of the resources offered by FPGAs for the development of a multiprocessed platform with the purpose of parallel execution of tasks. In this way, the eCos operating system was modified, with the addition of new characteristics to support of the Symmetric Multiprocessing model, using three soft-Core Altera Nios II processors. On this operating system, all parallelism is directly related to execution of the threads. This platform was analyzed and validated through the execution of parallel algorithms, emphasizing aspects of performance and flexibility compared to other architectures. This work contributes for reaching better results in the execution of tasks in robotics area, which belongs to a domain that demand great competition of tasks, mainly in modules that involve interaction with the external environment eCos and Nios II eCos e Nios II Multiprocessamento simétrico Operating systems Sistemas operacionais Symmetric multiprocessing
16	Co-projeto hardware/software para cálculo de fluxo ótico / Software/hardware co-desing for the optical flow calculation Lobo, Tiago Mendonça 17 June 2013 (has links) O cálculo dos vetores de movimento é utilizado em vários processos na área de visão computacional. Problemas como estabelecer rotas de colisão e movimentação da câmera (egomotion) utilizam os vetores como entrada de algoritmos complexos e que demandam muitos recursos computacionais e consequentemente um consumo maior de energia. O fluxo ótico é uma aproximação do campo gerado pelos vetores de movimento. Porém, para aplicações móveis e de baixo consumo de energia se torna inviável o uso de computadores de uso geral. Um sistema embarcado é definido como um computador desenvolvido com um propósito específico referente à aplicação na qual está inserido. O objetivo principal deste trabalho foi elaborar um módulo em sistema embarcado que realiza o cálculo do fluxo ótico. Foi elaborado um co-projeto de hardware e software dedicado e implementados em FPGAs Cyclone II e Stratix IV para a prototipação do sistema. Desta forma, a implementação de um projeto que auxilia a detecção e medição do movimento é importante não só como aplicação isolada, mas para servir de base no desenvolvimento de outras aplicações como tracking, compressão de vídeos, predição de colisão, etc / The motion vectors calculation is used in many processes in the area of computer vision. Problems such as establishing collision routes and the movement of the camera (egomotion) use this vectors as input for complexes algorithms that require many computational and energy resources. The optical flow is an approximation of the field generated by the motion vectors. However, for mobile, low power consumption applications becomes infeasible to use general-purpose computers. An embedded system is defined as a computer designed with a specific purpose related to the application in which it is inserted. The main objective of this work is to implement a hardware and software co-design to assist the optical flow field calculation using the CycloneII and Stratix IV FPGAs. Sad that, it is easily to see that the implementation of a project to help the detection and measurement of the movement can be the base to the development of others applications like tracking, video compression and collision detection Co-desing Co-projeto Fluxo ótico FPGA FPGA multiprocessamento multiprocessing Nios II Nios II Optical flow
17	Co-projeto de hardware e software de um escalonador de processos para arquiteturas multicore heterogêneas baseadas em computação reconfigurável / Hardware and software co-design of a process scheduler for heterogeneous multicore architectures based on reconfigurable computing Bueno, Maikon Adiles Fernandez 05 November 2013 (has links) As arquiteturas multiprocessadas heterogêneas têm como objetivo principal a extração de maior desempenho da execução dos processos, por meio da utilização de núcleos apropriados às suas demandas. No entanto, a extração de maior desempenho é dependente de um mecanismo eficiente de escalonamento, capaz de identificar as demandas dos processos em tempo real e, a partir delas, designar o processador mais adequado, de acordo com seus recursos. Este trabalho tem como objetivo propor e implementar o modelo de um escalonador para arquiteturas multiprocessadas heterogêneas, baseado em software e hardware, aplicado ao sistema operacional Linux e ao processador SPARC Leon3, como prova de conceito. Nesse sentido, foram implementados monitores de desempenho dentro dos processadores, os quais identificam as demandas dos processos em tempo real. Para cada processo, sua demanda é projetada para os demais processadores da arquitetura e em seguida é realizado um balanceamento visando maximizar o desempenho total do sistema, distribuindo os processos entre processadores, de modo a diminuir o tempo total de processamento de todos os processos. O algoritmo de maximização Hungarian, utilizado no balanceamento do escalonador, foi desenvolvido em hardware, proporcionando paralelismo e maior desempenho na execução do algoritmo. O escalonador foi validado por meio da execução paralela de diversos benchmarks, resultando na diminuição dos tempos de execução em relação ao escalonador sem suporte à heterogeneidade / Heterogeneous multiprocessor architectures have as main objective the extraction of higher performance from processes through the use of appropriate cores to their demands. However, the extraction of higher performance is dependent on an efficient scheduling mechanism, able to identify in real-time the demands of processes and to designate the most appropriate processor according to their resources. This work aims at design and implementations of a model of a scheduler for heterogeneous multiprocessor architectures based on software and hardware, applied to the Linux operating system and the SPARC Leon3 processor as proof of concept. In this sense, performance monitors have been implemented within the processors, which in real-time identifies the demands of processes. For each process, its demand is projected for the other processors in the architecture and then it is performed a balancing to maximize the total system performance by distributing processes among processors. The Hungarian maximization algorithm, used in balancing scheduler was developed in hardware, providing greater parallelism and performance in the execution of the algorithm. The scheduler has been validated through the parallel execution of several benchmarks, resulting in decreased execution times compared to the scheduler without the heterogeneity support Computação reconfigurável Escalonador Heterogeneous multiprocessing Multi-core Multicore Multiprocessamento heterogêneo Reconfigurable computing Scheduler
18	Construction of a support tool for the design of the activity structures based computer system architectures Mohamad, Sabah Mohamad Amin January 1986 (has links) This thesis is a reapproachment of diverse design concepts, brought to bear upon the computer system engineering problem of identification and control of highly constrained multiprocessing (HCM) computer machines. It contributes to the area of meta/general systems methodology, and brings a new insight into the design formalisms, and results afforded by bringing together various design concepts that can be used for the construction of highly constrained computer system architectures. A unique point of view is taken by assuming the process of identification and control of HCM computer systems to be the process generated by the Activity Structures Methodology (ASM). The research in ASM has emerged from the Neuroscience research, aiming at providing the techniques for combining the diverse knowledge sources that capture the 'deep knowledge' of this application field in an effective formal and computer representable form. To apply the ASM design guidelines in the realm of the distributed computer system design, we provide new design definitions for the identification and control of such machines in terms of realisations. These realisation definitions characterise the various classes of the identification and control problem. The classes covered consist of: 1. the identification of the designer activities, 2. the identification and control of the machine's distributed structures of behaviour, 3. the identification and control of the conversational environment activities (i.e. the randomised/ adaptive activities and interactions of both the user and the machine environments), 4. the identification and control of the substrata needed for the realisation of the machine, and 5. the identification of the admissible design data, both user-oriented and machineoriented, that can force the conversational environment to act in a self-regulating manner. All extent results are considered in this context, allowing the development of both necessary conditions for machine identification in terms of their distributed behaviours as well as the substrata structures of the unknown machine and sufficient conditions in terms of experiments on the unknown machine to achieve the self-regulation behaviour. We provide a detailed description of the design and implementation of the support software tool which can be used for aiding the process of constructing effective, HCM computer systems, based on various classes of identification and control. The design data of a highly constrained system, the NUKE, are used to verify the tool logic as well as the various identification and control procedures. Possible extensions as well as future work implied by the results are considered. 621.39
19	Análise e implementação de suporte a SMP (multiprocessamento simétrico) para o sistema operacional eCos com aplicação em robótica móvel / Analysis and implementation of SMP support (symmetric multiprocessing) for eCos operating system with application in mobile robotics Maikon Adiles Fernandez Bueno 26 April 2007 (has links) Technological development has significantly reduced the distance between the performance of systems designed using reconfigurable computing and dedicated hardware. The main sources of performance are the high density level of the FPGAs and the resources? improvement offered by manufacturers, who make more its use more attractive in a variety of applications, emphatically in systems that demand a high degree of flexibility. In this context, the objective of this work consists on the exploration of the resources offered by FPGAs for the development of a multiprocessed platform with the purpose of parallel execution of tasks. In this way, the eCos operating system was modified, with the addition of new characteristics to support of the Symmetric Multiprocessing model, using three soft-Core Altera Nios II processors. On this operating system, all parallelism is directly related to execution of the threads. This platform was analyzed and validated through the execution of parallel algorithms, emphasizing aspects of performance and flexibility compared to other architectures. This work contributes for reaching better results in the execution of tasks in robotics area, which belongs to a domain that demand great competition of tasks, mainly in modules that involve interaction with the external environment / Technological development has significantly reduced the distance between the performance of systems designed using reconfigurable computing and dedicated hardware. The main sources of performance are the high density level of the FPGAs and the resources? improvement offered by manufacturers, who make more its use more attractive in a variety of applications, emphatically in systems that demand a high degree of flexibility. In this context, the objective of this work consists on the exploration of the resources offered by FPGAs for the development of a multiprocessed platform with the purpose of parallel execution of tasks. In this way, the eCos operating system was modified, with the addition of new characteristics to support of the Symmetric Multiprocessing model, using three soft-Core Altera Nios II processors. On this operating system, all parallelism is directly related to execution of the threads. This platform was analyzed and validated through the execution of parallel algorithms, emphasizing aspects of performance and flexibility compared to other architectures. This work contributes for reaching better results in the execution of tasks in robotics area, which belongs to a domain that demand great competition of tasks, mainly in modules that involve interaction with the external environment eCos e Nios II Multiprocessamento simétrico Sistemas operacionais eCos and Nios II Operating systems Symmetric multiprocessing
20	Co-projeto de hardware e software de um escalonador de processos para arquiteturas multicore heterogêneas baseadas em computação reconfigurável / Hardware and software co-design of a process scheduler for heterogeneous multicore architectures based on reconfigurable computing Maikon Adiles Fernandez Bueno 05 November 2013 (has links) As arquiteturas multiprocessadas heterogêneas têm como objetivo principal a extração de maior desempenho da execução dos processos, por meio da utilização de núcleos apropriados às suas demandas. No entanto, a extração de maior desempenho é dependente de um mecanismo eficiente de escalonamento, capaz de identificar as demandas dos processos em tempo real e, a partir delas, designar o processador mais adequado, de acordo com seus recursos. Este trabalho tem como objetivo propor e implementar o modelo de um escalonador para arquiteturas multiprocessadas heterogêneas, baseado em software e hardware, aplicado ao sistema operacional Linux e ao processador SPARC Leon3, como prova de conceito. Nesse sentido, foram implementados monitores de desempenho dentro dos processadores, os quais identificam as demandas dos processos em tempo real. Para cada processo, sua demanda é projetada para os demais processadores da arquitetura e em seguida é realizado um balanceamento visando maximizar o desempenho total do sistema, distribuindo os processos entre processadores, de modo a diminuir o tempo total de processamento de todos os processos. O algoritmo de maximização Hungarian, utilizado no balanceamento do escalonador, foi desenvolvido em hardware, proporcionando paralelismo e maior desempenho na execução do algoritmo. O escalonador foi validado por meio da execução paralela de diversos benchmarks, resultando na diminuição dos tempos de execução em relação ao escalonador sem suporte à heterogeneidade / Heterogeneous multiprocessor architectures have as main objective the extraction of higher performance from processes through the use of appropriate cores to their demands. However, the extraction of higher performance is dependent on an efficient scheduling mechanism, able to identify in real-time the demands of processes and to designate the most appropriate processor according to their resources. This work aims at design and implementations of a model of a scheduler for heterogeneous multiprocessor architectures based on software and hardware, applied to the Linux operating system and the SPARC Leon3 processor as proof of concept. In this sense, performance monitors have been implemented within the processors, which in real-time identifies the demands of processes. For each process, its demand is projected for the other processors in the architecture and then it is performed a balancing to maximize the total system performance by distributing processes among processors. The Hungarian maximization algorithm, used in balancing scheduler was developed in hardware, providing greater parallelism and performance in the execution of the algorithm. The scheduler has been validated through the parallel execution of several benchmarks, resulting in decreased execution times compared to the scheduler without the heterogeneity support Computação reconfigurável Escalonador Multi-core Multiprocessamento heterogêneo Heterogeneous multiprocessing Multicore Reconfigurable computing Scheduler

Search results