Global ETD Search

421	Μεθοδολογία ανάπτυξης μεταγλωττιστών με εκμετάλλευση της δομής του λογισμικού και του μοντέλου του υλικού του Κελεφούρας, Βασίλειος 16 May 2014 (has links) Οι υπάρχοντες μεταγλωττιστές, έχουν τρία βασικά μειονεκτήματα i) όλα τα υπό-προβλήματα της μεταγλώττισης (π.χ. μετασχηματισμοί, εύρεση χρονοπρογραμματισμού, ανάθεση καταχωρητών) βελτιστοποιούνται ξεχωριστά (εκτός από μεμονωμένες περιπτώσεις όπου βελτιστοποιούνται κάποια στάδια μαζί - συνήθως 2), παρόλο που υπάρχει εξάρτηση μεταξύ τους, ii) δεν εκμεταλλεύονται αποδοτικά όλα τα χαρακτηριστικά του προγράμματος εισόδου (π.χ. δομή του εκάστοτε αλγορίθμου, επαναχρησιμοποίηση δεδομένων), iii) δεν εκμεταλλεύονται αποδοτικά τις παραμέτρους της αρχιτεκτονικής. Στη παρούσα διδακτορική διατριβή, αναπτύχθηκαν μεθοδολογίες οι οποίες αντιμετωπίζουν τα προβλήματα εύρεσης χρονοπρογραμματισμών με τον ελάχιστο αριθμό i) προσβάσεων στην κρυφή μνήμη δεδομένων L1, ii) προσβάσεων στην κρυφή μνήμη L2, iii) προσβάσεων στην κύρια μνήμη, iv) πράξεων διευθυνσιοδότησης, μαζί σαν ενιαίο πρόβλημα και όχι ξεχωριστά, για ένα kernel. Αυτό επιτυγχάνεται αντιμετωπίζοντας τα χαρακτηριστικά του λογισμικού και τις τις βασικές παραμέτρους της αρχιτεκτονικής μαζί σαν ενιαίο πρόβλημα. Είναι η πρώτη φορά που μια μεθοδολογία αντιμετωπίζει τα παραπάνω προβλήματα με αυτό τον τρόπο. Οι προτεινόμενες μεθοδολογίες εκμεταλλεύονται τα χαρακτηριστικά του προγράμματος εισόδου. Η δομή του εκάστοτε αλγορίθμου (π.χ. ο FFT αποτελείται από πράξεις πεταλούδων ενώ ο αλγόριθμος αφαίρεσης θορύβου - Gauss Blur αποτελείται από πράξεις μάσκας στοιχείων), τα ιδιαίτερα χαρακτηριστικά του (π.χ. συμμετρία Toeplitz πίνακα), η ύπαρξη προτύπων-patterns (π.χ. στοιχεία πινάκων πολλαπλασιάζονται με μάσκα), η επαναχρησιμοποίηση των δεδομένων, η παραγωγή-κατανάλωση ενδιάμεσων αποτελεσμάτων και η παραλληλία του αλγορίθμου, αντιμετωπίζονται μαζί σαν ενιαίο πρόβλημα. Οι προτεινόμενες μεθοδολογίες εκμεταλλεύονται τις βασικές παραμέτρους της αρχιτεκτονικής. Η αρχιτεκτονική της μνήμης (π.χ. κοινή L2, L3), το πλήθος των καταχωρητών, ο αριθμός των κρυφών μνημών δεδομένων, τα μεγέθη, οι συσχετιστικότητες (assosiativity) και τα μεγέθη των γραμμών των κρυφών μνημών, ο αριθμός των λειτουργικών μονάδων, ο αριθμός των λειτουργικών μονάδων που λειτουργούν παράλληλα και ο αριθμός των πυρήνων (cores) του επεξεργαστή, αντιμετωπίζονται μαζί σαν ενιαίο πρόβλημα. Με την αξιοποίηση των χαρακτηριστικών του εκάστοτε αλγορίθμου και των παραμέτρων της αρχιτεκτονικής, αποκλείονται πιθανές λύσεις και ο χώρος εξερεύνησης μειώνεται ραγδαία (τάξεις μεγέθους). Στη παρούσα διδακτορική διατριβή, αναπτύχθηκαν μεθοδολογίες αύξησης της ταχύτητας του λογισμικού α) του Πολλαπλασιασμού Πίνακα επί Πίνακα (ΠΠΠ), β) του Πολλαπλασιασμού Πίνακα επί διάνυσμα (ΠΠΔ), γ) του Fast Fourier Transform (FFT), δ) του αλγορίθμου Canny και του μετασχηματισμού του Hough (αλγόριθμοι ανίχνευσης ακμών και ευθειών αντίστοιχα). Επίσης, αναπτύχθηκε μεθοδολογία μεταγλώττισης η οποία εκμεταλλεύεται τα χαρακτηριστικά του λογισμικού και τις παραμέτρους της ιεραρχίας μνήμης. Η μεθοδολογία μπορεί να εφαρμοστεί σε πυρήνες λογισμικού, στους οποίους α) τα μονοπάτια εκτέλεσης είναι γνωστά κατά τη μεταγλώττιση και συνεπώς δεν εξαρτώνται από τα δεδομένα, β) οι δείκτες όλων των sub- scripts να είναι γραμμικές εξισώσεις των iterators (που ισχύει στις περισσότερες περιπτώσεις). Οι μεθοδολογίες αφορούν ενσωματωμένους και γενικού σκοπού επεξεργαστές (χρήση μονάδας SIMD για περαιτέρω αύξηση της ταχύτητας). Ακολουθεί σύντομη περίληψη αυτών. Μεθοδολογία αύξησης της ταχύτητας του Πολλαπλασιασμού Πίνακα επί Πίνακα (ΠΠΠ): Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του ΠΠΠ για α) μονοπύρηνους επεξεργαστές (1 core), β) επεξεργαστές με πολλούς πυρήνες οι οποίοι συνδέονται με κοινή μνήμη. Η προτεινόμενη μεθοδολογία χωρίζει του πίνακες του αλγορίθμου σε μικρότερους οι οποίοι χωράνε στις κρυφές μνήμες και στο αρχείο καταχωρητών. Είναι η πρώτη φορά για τον ΠΠΠ που εισάγονται εξισώσεις οι οποίες αξιοποιούν τα associativities των κρυφών μνημών. Για τη πλήρη αξιοποίηση της ιεραρχίας της μνήμης προτείνεται νέος τρόπος αποθήκευσης των στοιχείων στη κύρια μνήμη (data array layout). Επίσης, προτείνεται διαφορετικός χρονοπρογραμματισμός σε επίπεδο στοιχείων και σε επίπεδο υπό-πινάκων. Η προτεινόμενη μεθοδολογία επιτυγχάνει από 1.1 έως 3.5 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη του ATLAS, η οποία αποτελεί μια από τις ταχύτερες βιβλιοθήκες. Μεθοδολογία αύξησης της ταχύτητας του Fast Fourier Transform (FFT): Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του FFT αξιοποιώντας πλήρως τα ιδιαίτερα χαρακτηριστικά του αλγορίθμου και τις παραμέτρους της ιεραρχίας της μνήμης. Το διάγραμμα ροής δεδομένων (Data Flow Graph – DFG) του FFT, χωρίζεται σε πρότυπα (patterns) και σε υπό- FFTs. Κάθε πρότυπο, αποτελείται από πεταλούδες, σύμφωνα με το πλήθος των καταχωρητών του επεξεργαστή. Η επιλογή των πεταλούδων κάθε προτύπου έχει γίνει με τέτοιο τρόπο ώστε να μεγιστοποιείται η παραγωγή-κατανάλωση των ενδιάμεσων αποτελεσμάτων. Η σειρά εκτέλεσης των προτύπων είναι αυτή η οποία δίνει τη μέγιστη επαναχρησιμοποίηση των συντελεστών του FFT. Ο DFG του FFT χωρίζεται σε υπό-FFTs σύμφωνα με τον αριθμό και τα μεγέθη των κρυφών μνημών δεδομένων. Η προτεινόμενη μεθοδολογία δίνει από 1.1 μέχρι 1.8 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη του FFTW, η οποία παρέχει ταχύτατο χρόνο εκτέλεσης. Είναι η πρώτη φορά για τον FFT που μια μεθοδολογία λαμβάνει υπόψη τις παραμέτρους της ιεραρχίας μνήμης και του αρχείου καταχωρητών. Μεθοδολογία αύξησης της ταχύτητας του Πολλαπλασιασμού Πίνακα επί Διάνυσμα (ΠΠΔ) για Toeplitz, Bisymetric (BT), Toeplitz (Τ) και κανονικούς πίνακες: Αναπτύχθηκε μεθοδολογία αύξησης της ταχύτητας του ΠΠΔ. Οι παραπάνω πίνακες έχουν ιδιαίτερη δομή, μικρό αριθμό διαφορετικών στοιχείων και μεγάλη επαναχρησιμοποίηση, χαρακτηριστικά τα οποία αξιοποιούνται πλήρως. Η προτεινόμενη μεθοδολογία χωρίζει τους πίνακες του αλγορίθμου σε μικρότερους οι οποίοι χωράνε στις κρυφές μνήμες και στο αρχείο καταχωρητών σύμφωνα με τον αριθμό τα μεγέθη και τα associativities των κρυφών μνημών. Για τη πλήρη αξιοποίηση της ιεραρχίας μνήμης προτείνεται νέος τρόπος αποθήκευσης των στοιχείων του πίνακα (data array layout) στη κύρια μνήμη. Η προτεινόμενη μεθοδολογία χρησιμοποιεί τον κανονικό αλγόριθμο ΠΠΔ (γραμμή επί στήλη). Ωστόσο, για BT και T πίνακες, ο ΠΠΔ μπορεί να υλοποιηθεί με χρήση του FFT επιτυγχάνοντας μικρότερη πολυπλοκότητα για μεγάλα μεγέθη πινάκων (έγινε ανάλυση και σύγκριση των δύο αλγορίθμων θεωρητικά και πειραματικά). Η προτεινόμενη μεθοδολογία για κανονικούς πίνακες συγκρίνεται με τη βιβλιοθήκη του ATLAS, επιτυγχάνοντας από 1.2 μέχρι 4.4 φορές μικρότερο χρόνο εκτέλεσης. Μεθοδολογία αύξησης της ταχύτητας του αλγόριθμου ανίχνευσης ακμών και ευθειών (αλγόριθμος του Canny και μετασχηματισμός του Hough): Αναπτύχθηκε μεθοδολογία η οποία επιτυγχάνει i) μικρότερο αριθμό εντολών ανάγνωσης/εγγραφής και διευθυνσιοδότησης, ii) μικρότερο αριθμό προσβάσεων και αστοχιών στην ιεραρχία μνήμης και iii) μικρότερο μέγεθος απαιτούμενης μνήμης του αλγορίθμου, εν συγκρίσει με την βιβλιοθήκη OpenCV η οποία παρέχει ταχύτατο χρόνο εκτέλεσης στους αλγορίθμους επεξεργασίας εικόνων. Τα παραπάνω επιτυγχάνονται: α) αξιοποιώντας την παραγωγή-κατανάλωση των στοιχείων των πινάκων και την παραλληλία του αλγορίθμου - τα τέσσερα kernels του Canny συγχωνεύονται σε ένα, διασωληνώνοντας (pipelining) τους πυρήνες για να διατηρηθούν οι εξαρτήσεις των δεδομένων, β) μειώνοντας τον αριθμό και το μέγεθος των πινάκων, γ) γράφοντας τα δεδομένα σε νέους μειωμένων διαστάσεων πίνακες με κυκλικό τρόπο, δ) χωρίζοντας τους πίνακες σε μικρότερους οι οποίοι χωράνε στο αρχείο καταχωρητών και στη κρυφή μνήμη δεδομένων σύμφωνα με το μέγεθος των κρυφών μνημών και του associativity, ε) βρίσκοντας τον βέλτιστο τρόπο αποθήκευσης των πινάκων (data array layout) στην κύρια μνήμη σύμφωνα με τη συσχετιστικότητα (associativity) της κρυφής μνήμης. Η προτεινόμενη μεθοδολογία δίνει από 1.27 μέχρι 2.2 φορές μικρότερο χρόνο εκτέλεσης από τη βιβλιοθήκη OpenCV (αναπτύχθηκε από την Intel και είναι γραμμένη σε χαμηλό επίπεδο), η οποία παρέχει ταχύτατο χρόνο εκτέλεσης. Μεθοδολογία μεταγλώττισης: Αναπτύχθηκε μεθοδολογία μεταγλώττισης η οποία αντιμετωπίζει τα προβλήματα εύρεσης χρονοπρογραμματισμών με τον ελάχιστο αριθμό i) προσβάσεων στην κρυφή μνήμη δεδομένων L1, ii) προσβάσεων στην κρυφή μνήμη L2, iii) προσβάσεων στην κύρια μνήμη, iv) πράξεων διευθυνσιοδότησης, μαζί σαν ενιαίο πρόβλημα και όχι ξεχωριστά, για ένα kernel. Η προτεινόμενη μεθοδολογία λαμβάνει ως είσοδο ker- nels σε C-κώδικα και παράγει νέα επιτυγχάνοντας είτε υψηλή απόδοση είτε τον ελάχιστο αριθμό προσβάσεων σε δεδομένη μνήμη. Αρχικά βρίσκεται ο χώρος εξερεύνησης με βάση τα χαρακτηριστικά του λογισμικού. Ο χώρος εξερεύνησης περιγράφεται από μαθηματικές εξισώσεις και ανισότητες οι οποίες προέρχονται από τα subscripts των πινάκων, τους iterators, τα όρια των βρόχων και τις εξαρτήσεις των δεδομένων. Αυτός ο χώρος εξερεύνησης δεν μπορεί να παραχθεί με την εφαρμογή υπαρχόντων μετασχηματισμών στον αρχικό C-κώδικα. Κατόπιν, ο χώρος εξερεύνησης μειώνεται τάξεις μεγέθους εφαρμόζοντας διάδοση περιορισμών (constraint propagation) των παραμέτρων του λογισμικού και αυτών της αρχιτεκτονικής της μνήμης. Το αρχείο καταχωρητών (register file) και τα μεγέθη των κρυφών μνημών αξιοποιούνται πλήρως παράγοντας ανισότητες για κάθε μνήμη οι οποίες περιέχουν α) τα μεγέθη των tiles που απαιτούνται για κάθε πίνακα, β) το σχήμα κάθε tile. Επίσης, βρίσκεται ο βέλτιστος τρόπος αποθήκευσης των στοιχείων των πινάκων στη κύρια μνήμη, σύμφωνα με τη συσχετιστικότητα (associativity) των κρυφών μνημών. Η προτεινόμενη μεθοδολογία εφαρμόστηκε σε 5 ευρέως διαδεδομένους αλγορίθμους και επιτυγχάνει αύξηση της ταχύτητας (speedup) από 2 έως 18 φορές (έγινε σύγκριση του αρχικού C κώδικα και του C κώδικα έπειτα από την εφαρμογή της προτεινόμενης μεθοδολογίας – η μεταγλώττιση έγινε με τον gcc compiler). / The existing state of the art (SOA) compilers, have 3 major disadvantages. Firstly, the back-end compiler phases - subproblems (e.g. transformations, scheduling, register allocation) are optimized separately; these subproblems depend on each other and they should be optimized together as one problem and not separately. Secondly, the existing SOA compilers do not effectively utilize the software characteristics (e.g. algorithm structure, data reuse). Thirdly, they do not effectively utilize the hardware parameters. In this PhD dissertation, new methodologies have been developed speeding up software kernels, by solving the sub-problems of finding the schedules with the minimum numbers of i) L1 data cache accesses, ii) L2 data cache accesses, iii) main memory accesses and iv) addressing instructions, as one problem and not separately. This is achieved by fully exploiting the software information and the memory hierarchy parameters. This is the first time a methodology optimizes the above sub-problems in this way. The proposed methodologies fully utilize the software characteristics. The algorithm structure (e.g. FFT data flow graph consists of butterfly operations while the gauss blur algorithm consists of array mask operations), the algorithm individual characteristics (e.g. symmetry of Toeplitz matrix), the data patterns (e.g. matrix elements are multiplied by a mask), data reuse, production-consumption of intermediate results and algorithm's parallelism, are utilized as one problem and not separately. The proposed methodologies fully utilize the major architecture parameters. The memory archi- tecture (e.g. shared L2/L3 cache), the size of the register file, the number of the levels of data cache hierarchy, the data cache sizes, the data cache associativities, the data cache line sizes, the number of the function units, the number of the function units can run in parallel and the number of the CPU cores are utilized as one problem and not separately. By utilizing the hardware and software constraints the exploration space is orders of magnitude decreased. In this PhD dissertation, new speeding-up methodologies are developed for i) Matrix Matrix Multi- plication (MMM) algorithm, ii) Matrix Vector Multiplication (MVM) algorithm, iii) Fast Fourier Trans- form (FFT), iv) Canny algorithm and Hough Transform. Also, a new compilation methodology which fully exploits the memory architecture and the software characteristics, is developed. This methodology can be applied in software kernels whose i) execution paths are known at compile time and thus they do not depend on the data, ii) all array subscripts are linear equations of the iterators (which in most cases do). The above methodologies refer to both embedded and general purpose processors (usage of the SIMD technology). The summary of the above methodologies is given below. A Methodology for speeding-up Matrix Matrix Multiplication (MMM) algorithm: A new methodol- ogy for Matrix Matrix Multiplication using SIMD (Single Instruction Multiple Data) unit and not, at one and more cores having a shared cache, is presented. The proposed methodology partitions the MMM matrices into smaller sub-matrices fitting in the data cache memories and into register file according to the memory hierarchy architecture parameters. This is the first time for MMM algorithm that equations containing the data cache associativity values, are given. To fully utilize the memory hierarchy, a new the data array layout is proposed. The proposed methodology is from 1.1 up to 3.5 times faster than one of the SOA software libraries for linear algebra, ATLAS. A Fast Fourier Transform (FFT) speeding-up methodology: A new Fast Fourier Transform method- ology is presented which fully utilizes the individual algorithm characteristics and the memory hierarchy architecture parameters. FFT data flow graph (DFG) is partitioned into patterns and into sub-FFTs. Each pattern consists of butterflies according to the number of the registers. The selection of the exact butter- flies each pattern contains, has been made by maximizing the production-consumption of the butterflies intermediate results. Also, the patterns are executed in that order, minimizing the data reuse of the FFT twiddle factors. The FFT data flow graph is partitioned into sub-FFTs according to the number of the levels and the sizes of data cache. The proposed methodology is faster from 1.1 up to 1.8 times in con- trast to the SOA FFT library, FFTW. This is the first time that an FFT methodology fully utilizes the memory hierarchy architecture parameters. A methodology for speeding-up Matrix Vector Multiplication (MVM) algorithm for regular, Toeplitz and Bisymmetric Toeplitz matrices: A new methodology for MVM including different types of matrices, is presented. The above matrices have a special structure, a small number of different elements and large data reuse. The proposed methodology partitions the MVM matrices into smaller sub-matrices fitting in the data cache memories and into register file according to the memory hierarchy architecture parameters. To fully utilize the memory hierarchy, a new data array layout is proposed. The proposed methodology uses the standard algorithm for matrix vector multiplication, i.e. each row of A is multiplied by X. However, for Bisymmetric Toeplitz (BT) and Toeplitz (T) matrices, MVM can also be implemented by using FFT; although in this paper we use the standard MVM algorithm, we show that for large input sizes, the MVM using FFT performs much better. The proposed methodology achieves speedup from 1.2 up to 4.4 over the SOA libraries, ATLAS. A Methodology for Speeding Up Edge and Line Detection Algorithms: A new Methodology for Speeding Up Edge and Line Detection Algorithms focusing on memory architecture utilization is pre- sented. This methodology achieves i) a smaller number of load/store and arithmetic instructions, ii) a smaller number of data cache accesses and data cache misses in memory hierarchy and iii) a smaller algorithm memory size, in contrast to the SOA library of OpenCV. This is achieved by: i) utilizing the production-consumption of intermediate results - merging all Canny kernels to one and pipelining the kernels to comply with the data dependences, ii) reducing the number and the size of the arrays, iii) writing the data into the new reduced size arrays in a circular way, iv) applying loop tiling for the register file and data cache, according to the size of the memories and associativity and v) finding the data arrays layout according to the data cache associativity. The proposed methodology achieves speedup from 1.27 up to 2.2 over the OpenCV SOA library. Compilation methodology: A new compilation methodology which fully exploits the memory archi- tecture and the software characteristics is presented. This is the first time that a methodology optimizes the subproblems explained above as one problem and not separately, for a loop-kernel. The proposed methodology takes as input C-code kernels and it produces new software kernels with a new iteration space, which may not be given by applying existing compiler transformations to original code. Firstly, the exploration space is found according to the s/w characteristics; it is described by mathematical equations and inequalities that are derived from the array subscripts, the combination of common array references, loop iterators, loop bounds and data dependences. Then, the exploration space is orders of magnitude decreased by applying constraint propagation of the h/w and s/w parameters. The register file and the data cache sizes are fully exploited by producing register file and data cache inequalities which contain i) the tiles sizes of each array, ii) the shape of each array tile. Also, new data array layouts are found, according to the data cache associativity. The final schedule is found by choosing the best combination of the number of i) L1 data cache accesses, ii) L2 data cache accesses, iii) main memory data accesses and iv) addressing instructions. The proposed methodology is evaluated to five well-known algorithms and speedups from 2 up to 18 over the target gcc compiler are obtained. Κρυφή μνήμη δεδομένων Συσχετιστικότητα Πολυεπεξεργασία 005.453 Data reuse Cache memory Correlativity Multiprocessing
422	Enhanced font services for X Window system Tsang, Pong-fan, Dex, 曾邦勳 January 2000 (has links) published_or_final_version / Computer Science and Information Systems / Master / Master of Philosophy X Window System (Computer system) Data structures (Computer science) Cache memory.
423	Simulateur pour l'étude de la visibilité dans les environnements enfumés. Ribardière, Mickaël 16 December 2010 (has links) (PDF) La simulation d'éclairage peut être utilisée pour l'étude et l'analyse du confort visuel ou de la performance de dispositifs d'éclairage. Pour répondre à de tels objectifs, les méthodes utilisées doivent résoudre de manière précise et réaliste la problématique de l'illumination globale. De plus, les logiciels de simulation d'éclairage doivent souvent manipuler des scènes géométriquement complexes mais aussi exploiter les propriétés photométriques réalistes des sources artificielles étendues, des sources naturelles et des matériaux. L'objectif du travail de thèse est d'étendre les possibilités de ces outils à la prise en compte d'environnements enfumés dans lesquels la densité et la répartition des fumées évoluent avec le temps, tout en considérant le déplacement d'un observateur virtuel dans la scène. De telles possibilités ouvriraient le champ des possibilités de la simulation d'éclairage à des cas d'étude de la vision dans les fumées pour la sécurité incendie par exemple. Suite à une analyse globale du problème (interaction lumière/matériaux, lumière/fumée, évolution de la fumée dans le temps), le travail de recherche est décomposé en trois parties. Nous présentons dans un premier temps une nouvelle méthode de résolution de l'illumination globale pour les objets surfaciques basée sur la méthode de cache d'éclairement avec des enregistrements dont les zones d'influence s'adaptent à la géométrie et aux variations d'éclairage. Nous appelons ces enregistrements des \textit{enregistrements adaptatifs} Cette technique permet de contrôler plus finement la densité du cache. Par la suite, les travaux s'intéressent en détail à la problématique des milieux participatifs statiques et de leur interaction avec la lumière. Une méthode de résolution, s'appuyant sur les travaux de la première partie, est alors proposée. Les enregistrements adaptatifs sont créés dans l'espace en fonction des caractéristiques de la fumée (coefficients de diffusion et d'absorption) et de son influence sur l'éclairage global. Enfin, l'aspect dynamique est étudié et une extension temporelle de la méthode de simulation d'éclairage en présence de milieux participatifs est alors proposée. Nous introduisons le concept d'enregistrements adaptatifs spatio-temporels (pour les surfaces et les volumes) pour interpoler les variations d'éclairement à la fois dans l'espace et dans le temps. Simulation d'éclairage Illumination Globale Rendu Milieux Participatifs Cache d'éclairement
424	Langage de mashup pour l'intégration autonomique de services de données dans des environements dynamiques Othman-Abdallah, Mohamad 26 February 2014 (has links) (PDF) Dans les communautés virtuelles, l'intégration des informations (provenant de sources ou services de données) est centrée sur les utilisateurs, i.e., associée à la visualisation d'informations déterminée par les besoins des utilisateurs. Une communauté virtuelle peut donc être vue comme un espace de données personnalisable par chaque utilisateur en spécifiant quelles sont ses informations d'intérêt et la façon dont elles doivent être présentées en respectant des contraintes de sécurité et de qualité de services (QoS). Ces contraintes sont définies ou inférées puis interprétées pour construire des visualisations personnalisées. Il n'existe pas aujourd'hui de langage déclaratif simple de mashups pour récupérer, intégrer et visualiser des données fournies par des services selon des spécifications spatio-temporelles. Dans le cadre de la thèse il s'agit de proposer un tel langage. Ce travail est réalisé dans le cadre du projet Redshine, bénéficiant d'un Bonus Qualité Recherche de Grenoble INP. services de données mashup cache
425	Novel storage architectures and pointer-free search trees for database systems Vasaitis, Vasileios January 2012 (has links) Database systems research is an old and well-established field in computer science. Many of the key concepts appeared as early as the 60s, while the core of relational databases, which have dominated the database world for a while now, was solidified during the 80s. However, the underlying hardware has not displayed such stability in the same period, which means that a lot of assumptions that were made about the hardware by early database systems are not necessarily true for modern computer architectures. In particular, over the last few decades there have been two notable consistent trends in the evolution of computer hardware. The first is that the memory hierarchy of mainstream computer systems has been getting deeper, with its different levels moving away from each other, and new levels being added in between as a result, in particular cache memories. The second is that, when it comes to data transfers between any two adjacent levels of the memory hierarchy, access latencies have not been keeping up with transfer rates. The challenge is therefore to adapt database index structures so that they become immune to these two trends. The latter is addressed by gradually increasing the size of the data transfer unit; the former, by organizing the data so that it exhibits good locality for memory transfers across multiple memory boundaries. We have developed novel structures that facilitate both of these strategies. We started our investigation with the venerable B+-tree, which is the cornerstone order-preserving index of any database system, and we have developed a novel pointer-free tree structure for its pages that optimizes its cache performance and makes it immune to the page size. We then adapted our approach to the R-tree and the GiST, making it applicable to multi-dimensional data indexes as well as generalized indexes for any abstract data type. Finally, we have investigated our structure in the context of main memory alone, and have demonstrated its superiority over the established approaches in that setting too. While our research has its roots in data structures and algorithms theory, we have conducted it with a strong experimental focus, as the complex interactions within the memory hierarchy of a modern computer system can be quite challenging to model and theorize about effectively. Our findings are therefore backed by solid experimental results that verify our hypotheses and prove the superiority of our structures over competing approaches. 004
426	Advances Towards Data-Race-Free Cache Coherence Through Data Classification Davari, Mahdad January 2017 (has links) Providing a consistent view of the shared memory based on precise and well-defined semantics—memory consistency model—has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to remove from the programmers the burden of dealing with the memory inconsistency that emerges in the presence of the private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times. In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to receive the consistent data should they need the data later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely the Total Store Order (TSO), in order to optimize the performance for more common cases. Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. As a result, hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This results in manifold optimizations, such as reducing the energy consumption and improving the performance and scalability. The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding the consistent data does not affect the correctness, but results in performance loss and increased energy consumption. In this thesis we show how data classification can be leveraged as an effective tool to simplify the cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics. Shared Memory Architectures Multicore Memory Hierarchy Cache Coherence Data Classification Computer Systems Datorsystem
427	Optimisation mémoire et exploration architecturale d'applications multimédias sur un réseau sur puce Gagné, Vincent January 2006 (has links) Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal. Encodeur MPEG-4 Fusion Mémoires cache MPSoC Pavage Pipeline fonctionnel StepNP SystemC Transformation de boucles
428	Effect of memory access and caching on high performance computing Groening, James January 1900 (has links) Master of Science / Department of Electrical and Computer Engineering / Dwight Day / High-performance computing is often limited by memory access. As speeds increase, processors are often waiting on data transfers to and from memory. Classic memory controllers focus on delivering sequential memory as quickly as possible. This will increase the performance of instruction reads and sequential data reads and writes. However, many applications in high-performance computing often include random memory access which can limit the performance of the system. Techniques such as scatter/gather can improve performance by allowing nonsequential data to be written and read in a single operation. Caching can also improve performance by storing some of the data in memory local to the processor. In this project, we try to find the benefits of different cache configurations. The different configurations include different cache line sizes as well as total size of cache. Although a range of benchmarks are typically used to test performance, we focused on a conjugate gradient solver, HPCCG. The program HPCCG incorporates many of the elements of common benchmarks used in high-performance computing, and relates better to a real world problem. Results show that the performance of a cache configuration can depend on the size of the problem. Problems of smaller sizes can benefit more from a larger cache, while a smaller cache may be sufficient for larger problems. Cache High performance computing FPGA Memory Computer Engineering (0464) Electrical Engineering (0544)
429	Använda metoder från Progressive Web Application för att lösa specifika problem i en webapplikation Carlbom, Carolina January 2019 (has links) Computers and similar units are taking a larger part in our lives and we expect ourwork tools to operate the same way the applications in our private units do. Qpick is awarehouse management software in the shape of a web application that is used onhandheld units with the operating system Android, and desktops with the operatingsystem Windows. The purpose of this thesis is to test and examine which well knowntechnical solutions can be used to solve specific problems in a web application.The study is a case study focusing on the software Qpick. To answer the thesis´ tworesearch questions, two Qpick users were interviewed and observed during a shift.Qpicks´ product owner and a Qpick technician were interviewed. The application wasexamined and underwent performance testing, before and after the implementation oftechnical solutions.The result proved that implementing a service worker is a solution to handle lack ofinternet access. Caching of resources is a solution to lower the load on the web serveras well as speeding up the application and making it more responsive. AJAX is asolution to keep the application up to date without having to update the wholeapplication, and not send or receive more information than necessary. localStorage isa solution to store data in the browser. / Tekniken blir en allt större del av vår vardag och vi förväntar oss att våraarbetsredskap ska fungera på samma sätt som applikationerna på våra privata enhetergör. Qpick är en lagerhanteringsmjukvara i form av en webapplikation som användspå handdatorer med operativsystem Android samt truckdatorer med operativsystemetWindows. Syftet med detta examensarbete är att bepröva välkända tekniska lösningarför att lösa specifika problem i en webapplikation.Studien är en fallstudie med inriktning på programmet Qpick, för att besvara studienstvå forskningsfrågor har två Qpick-användare intervjuats och observerats under ettarbetspass. Vidare har Qpicks produktägare samt en Qpick.tekniker intervjuats.Applikationen granskades och genomgick prestandatester både före och efterimplementation av tekniska lösningar.Resultatet visade att implementera en service worker är en lösning för att hanterabristande internetåtkomst, cachning av resurser är en lösning för att minska tryck påwebservern samt göra applikationen snabbare och mer responsiv. AJAX är en lösningför att hålla applikationen uppdaterad utan att behöva uppdatera hela sidans innehåll,och inte skicka och ta emot mer innehåll än det som faktiskt är nödvändigt.localStorage är en lösning för att lagra data i webbläsaren. PWA web application service worker Cache AJAX localStorage Computer Sciences Datavetenskap (datalogi)
430	Caching of key-value stores in the data plane / Caching av nyckel-värde-databaser i dataplanet Larsson, Samuel January 2019 (has links) The performance of distributed key-value stores is usually dependent on its underlying network, and have potential to improve read/write latencies by improving upon the per- formance of the network communication. We explore the potential performance increase by designing an experimental in-network cache based on NetCache in the switch data plane for the distributed key-value store DXRAM, and placing it on a programmable switch that connects the peers in a DXRAM storage cluster. To accomodate DXRAM which uses TCP for its transport protocol, we also design a TCP flow state translator for the cache and implement an experimental version of this cache design. Benchmark runs with the cache show that best-case item read latency for DXRAM is reduced to approximately half and prove the potential performance gain that can be expected once a proper cache is designed and implemented. key-value store programmable data planes cache NetCache DXRAM Computer Sciences Datavetenskap (datalogi)

Search results