Global ETD Search

1	Biased search data structures and rectangular tiling Sharp, Jonathan Paul January 2002 (has links) No description available. 005.73
2	Categories of containers Abbott, Michael Gordon January 2003 (has links) This thesis develops a new approach to the theory of datatypes based on separating data and storage resulting in a class of datatype called a container. The extension of a container is a functor which can be regarded as a generalised polynomial functor in type variables. A representation theorem allows every natural transformation between container functors to be represented as a unique pair of morphisms in a category.;Under suitable assumptions on the ambient category container functors are closed under composition, limits, coproducts and the construction of initial algebras and final coalgebras. These closure properties allow containers to provide a functorial semantics for an important class of polymorphic types including the strictly positive types.;Since polymorphic functions between functorial polymorphic types correspond to natural transformations, every polymorphic function can be represented as a container morphism; this simplifies reasoning about such functions and provides a framework for developing concepts of generic programming.;Intuitionistic well-founded trees or W-types are the initial algebras of container functors in one parameter; the construction of the initial algebra of a container in more than one parameter leads to the solution of a problem left incomplete by earlier work of Dybjer.;We also find that containers provide a suitable framework to define the derivative of a functor as a kind of linear exponential. We show that the use of containers is essential for this approach.;The theory is developed in the context of a fairly general category to allow for a wide choice of applications. We use the language of dependent type theory to develop the theory of containers in an arbitrary extensive locally cartesian closed category; much of the development in this thesis can also be generalised to display map categories. We develop the appropriate internal language and its interpretation in a category with families. 005.73
3	Unsupervised ensemble learning and its application to temporal data clustering Yang, Yun January 2011 (has links) Temporal data clustering can provide underpinning techniques for the discovery of intrinsic structures and can condense or summarize information contained in temporal data, demands made in various fields ranging from time series analysis to understanding sequential data. In the context of the treatment of data dependency in temporal data, existing temporal data clustering algorithms can be classified in three categories: model-based, temporal-proximity and feature-based clustering. However, unlike static data, temporal data have many distinct characteristics, including high dimensionality, complex time dependency, and large volume, all of which make the clustering of temporal data more challenging than conventional static data clustering. A large of number of recent studies have shown that unsupervised ensemble approaches improve clustering quality by combining multiple clustering solutions into a single consolidated clustering ensemble that has the best performance among given clustering solutions. This thesis systemically reviews existing temporal clustering and unsupervised ensemble learning techniques and proposes three unsupervised ensemble learning approaches for temporal data clustering. The first approach is based on the ensemble of HMM k-models clustering, associated with agglomerative clustering refinement, for solving problems with finding the intrinsic number of clusters, model initialization sensitivity and computational cost, problems which exist in most forms of model-based clustering. Secondly, we propose a sampling-based clustering ensemble approach namely the iteratively constructed clustering ensemble. Our approach iteratively constructs multiple partitions on the subset of whole input instances selected by a smart weighting scheme, combining the strength of both boosting and bagging approaches whilst attempting to simultaneously avoid their drawbacks. Finally, we propose a weighted ensemble learning approach to temporal data clustering which combines partitions obtained by different representations of temporal data. As a result, this approach has the capability to capture the properties of temporal data and the synergy created by reconciling diverse partitions due to combining different representations. The proposed weighted function has out-standing ability in automatic model selection and appropriate grouping for complex temporal data. 005.73
4	An algorithmic framework for visualising and exploring multidimensional data Ross, Greg January 2006 (has links) To help understand multidimensional data, information visualisation techniques are often applied to take advantage of human visual perception in exposing latent structure. A popular means of presenting such data is via two-dimensional scatterplots where the inter-point proximities reflect some notion of similarity between the entities represented. This can result in potentially interesting structure becoming almost immediately apparent. Traditional algorithms for carrying out this dimension reduction tend to have different strengths and weaknesses in terms of run times and layout quality. However, it has been found that the combination of algorithms can produce hybrid variants that exhibit significantly lower run times while maintaining accurate depictions of high-dimensional structure. The author's initial contribution in the creation of such algorithms led to the design and implementation of a software system (HIVE) for the development and investigation of new hybrid variants and the subsequent analysis of the data they transform. This development was motivated by the fact that there are potentially many hybrid algorithmic combinations to explore and therefore an environment that is conductive to their development, analysis and use is beneficial not only in exploring the data they transform but also in exploring the growing number of visualisation tools that these algorithms beget. This thesis descries three areas of the author's contribution to the field of information visualisation. Firstly, work on hybrid algorithms for dimension reduction is presented and their analysis shows their effectiveness. Secondly, the development of a framework for the creation of tailored hybrid algorithms is illustrated. Thirdly, a system embodying the framework, providing an environment conductive to the development, evaluation and use of the algorithms is described. Case studies are provided to demonstrate how the author and others have used and found value in the system across areas as diverse as environmental science, social science and investigative psychology, where multidimensional data are in abundance. 005.73 QA Mathematics
5	Pragmatic algorithms for implementing geostatistics with large datasets Ingram, Benjamin R. January 2008 (has links) With the ability to collect and store increasingly large datasets on modern computers comes the need to be able to process the data in a way that can be useful to a Geostatistician or application scientist. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed, scale quadratically and cubically respectively for likelihood-based Geostatistics. Various methods have been proposed and are extensively used in an attempt to overcome these complexity issues. This thesis introduces a number of principled techniques for treating large datasets with an emphasis on three main areas: reduced complexity covariance matrices, sparsity in the covariance matrix and parallel algorithms for distributed computation. These techniques are presented individually, but it is also shown how they can be combined to produce techniques for further improving computational efficiency. 005.73
6	Formal specification and test of COTS-based embedded railway control/command architecture / Spécification formelle et test des architectures de contrôle/commande ferroviaire embarquée à base de COTS Yang, Jing 21 January 2013 (has links) L’objectif du project FerroCOTS est de faire évoluer l’architecture de contrôle-commande ferroviaire embarqué des relais électriques vers des Composant-sur-Etagère (COTS) programmables, ici des cartes FPGA (Field-Programmable Gate Array en anglais). Toutefois, l'absence d’une méthode appropriée de spécification et vérification formelles est un obstacle important au développement d’une architecture de contrôle-commande à base de COTS. Dans cette thèse, nous proposons tout d'abord des techniques systématiques de raffinement des exigences brutes et qui permettent de transformer des exigences informelles en des spécifications formelles, tout en guidant et assistant le processus de raffinement et l'étape de formalisation. En l'occurrence, deux méthodes de raffinement des exigences ont été développées. Le cadre de formalisation choisi pour cette méthode de formalisation des exigences est la logique temporelle CTL, qui est un sur-ensemble des logiques CTL (Computation Tree Logic en anglais) et LTL (Linear Time Logic en anglais). Ainsi, les exigences raffinées peuvent être formalisées à l'aide de CTL. En outre, ces formules en CTL* offrent une base formelle pour la vérification et la validation du système. Puis, à partir des spécifications formelles exprimées sous forme de propriétés CTL, nous présentons une méthode pour générer des scénarios de test à partir des formules CTL. La méthode de test utilise le concept de « non-vacuité » pour générer des scénarios de test « Intelligents », qui soient capables de conduire la simulation à une réfutation d’une propriété pour que les bugs dans un système sous test peuvent être détectés. / The goal of the FerroCOTS project is to develop an architecture of embedded railway command/control systems from electrical relays towards programmable Commercial-Off-The-Shelf (COTS) components, here some high-speed Field-Programmable Gate Array (FPGA) digital devices. However, the lack of appropriate formal specification and verification means is a huge obstacle to develop a COTS based control/command architecture. In this thesis, firstly we propose systematic requirement refinement techniques to transform informal requirements into formal specifications, while guiding and assisting the refinement process and the formalization step. In this case, two requirement refinement methods have been developed. The formalization framework chosen for requirement formalization is the temporal logic CTL, which is a superset of the logic Computation Tree Logic (CTL) and the Linear Time Logic (LTL). Thus, the refined requirements are formalized using CTL. In addition, the obtained CTL* formulas provide a formal basis for system verification and validation. On the other hand, based on formal specifications expressed as CTL* properties, we present a method for generating test cases from CTL* formulas. The test method uses the concept of non-vacuity to generate “Intelligent” test benches that are capable of driving the simulation to a property refutation so that the bug of the system under test can be detected. Logique temporelle CTL* 005.73
7	Μελέτη και ανάπτυξη αυτοοργανώμενων δομών δεδομένων Αντωνίου, Δημήτριος 26 February 2009 (has links) Θέμα της παρούσης διπλωματικής εργασίας αποτελεί η μελέτη, ανάπτυξη και πειραματική επιβεβαίωση αλγοριθμικών τεχνικών για την σχεδίαση αυτοοργανώμενων δομών δεδομένων (self-organizing data structures) και η ανάπτυξη τυχαιοποιημένων εκδόσεών τους. Μια αυτοοργανώμενη δομή δεδομένων διαθέτει κάποιον αλγόριθμο για να αναδιοργανώνει τους δείκτες και τα δεδομένα κατάστασης μετά από κάθε πρόσβαση ή πράξη . Ο αλγόριθμος αυτοοργάνωσης είναι σχεδιασμένος ώστε αντιδρώντας σε αρχικά άγνωστες ιδιότητες της ακολουθίας αιτήσεων (request sequence), να οδηγεί τη δομή δεδομένων σε κατάσταση πλεονεκτική για τις ιδιότητες της ακολουθίας με αποτέλεσμα τη μείωση του χρόνου που χρειάζεται στο μέλλον ανά πράξη. Ο πρώτος αλλά και ο μόνος μέχρι σήμερα πιθανός υποψήφιος αλγόριθμος αναζήτησης σε δένδρο που μπορεί να είναι Ο(1)-ανταγωνιστικός είναι το splay δένδρο (splay tree) που παρουσιάστηκε από τους Sleator και Tarjan [1]. Στην εργασία των Sleator και Tarjan παρουσιάζονται κάποιες εικασίες, οι οποίες δεν έχουν αποδειχθεί. Σημαντικότερη είναι η εικασία δυναμικής βελτιστότητας (dynamic optimality conjecture) σύμφωνα με την οποία το splay δένδρο είναι Ο(1)-ανταγωνιστικό. Η εικασία δυναμικής δακτυλοδότησης (dynamic finger conjecture) και η εικασία διαπέρασης (traversal conjecture) είναι αληθείς, αν είναι αληθής η εικασία δυναμικής βελτιστότητας. Ο Cole [3], [4] προσπάθησε να αποδείξει την ορθότητα της εικασίας δυναμικής δακτυλοδότησης σε μια από τις σημαντικότερες εργασίες για τα splay δένδρα. O J. Iacono [2] ανέπτυξε εναλλακτικές δομές δεδομένων που έχουν χρόνο χειρότερης περίπτωσης ανά πράξη (και όχι επιμερισμένο κόστος) της τάξης του Ο(logn), σε αντιδιαστολή με τον Ο(n) χρόνο χειρότερης περίπτωσης των splay trees. Σε αντιπαράθεση με τη δομή του Iacono, οι Mihai Badoiu και Erik D. Demaine παρουσίασαν μια δυναμική δομή αναζήτησης[7], η οποία επιτυγχάνει την ενοποιημένη ιδιότητα και που είναι απλούστερη από τη δομή του Iacono. Μεταξύ όλων των δυναμικών δομών αναζήτησης με βάση τις συγκρίσεις , η συγκεκριμένη δομή έχει τον καλύτερο χρόνο εκτέλεσης. Εκτός της παραπάνω δομής, ο Demaine ανέπτυξε ένα Ο(loglogn) ανταγωνιστικό online δυαδικό δέντρο αναζήτησης[5] , βελτιώνοντας το μέχρι πρότινος βέλτιστο ανταγωνιστικό ποσοστό της τάξης Ο(logn). Αυτή είναι η πρώτη μεγάλη βελτίωση της εικασίας δυναμικής βελτιστότητας (dynamic optimality conjecture) των Sleator και Tarjan , σύμφωνα με την οποία υπάρχουν Ο(1) ανταγωνιστικά δυαδικά δέντρα αναζήτησης. Σε σχέση με τη δυναμική βελτιστότητα των Splay trees, σημαντική συνεισφορά αποτελεί και η εργασία του George F. Georgakopoulos[6]. Ο George F. Georgakopoulos παρουσιάζει μια επέκταση της splay τεχνικής , την οποία ονομάζει chain-splay(αλυσιδωτό splay) . Τα chain-splay δέντρα εφαρμόζουν splay στο στοιχείο που προσπελαύνουμε προς τη ρίζα όπως ακριβώς γίνεται και στα κλασικά splay trees, αλλά εκτελούν και μερικές τοπικές splay πράξεις τακτοποίησης κάτω από το στοιχείο που προσπελάσαμε. Αποδεικνύεται πως η τεχνική chain–splay είναι Ο(loglogn) ανταγωνιστική σε σχέση με οποιοδήποτε offline αλγόριθμο αναζήτησης. Tέλος, ο George F. Georgakopoulos [9] έδωσε ένα νέο λήμμα επαναζύγισης για τα splay δέντρα και με βάση αυτό το λήμμα, αποδεικνύει πως τα splay δέντρα είναι ανταγωνιστικά προς κάθε κλάση δυναμικών ισοζυγισμένων δέντρων. Οι παραπάνω δομές θα μελετηθούν τόσο σε θεωρητικό όσο και σε πειραματικό επίπεδο με σκοπό την εξαγωγή χρήσιμων συμπερασμάτων σε σχέση με την αποδοτικότητά τους αλλά και με σκοπό την καταγραφή των ακόμη ανοικτών προβλημάτων και των προοπτικών επίλυσης τους. Επιπλέον, θα παρουσιαστούν τυχαιοποιημένες εκδόσεις των δομών των Demaine και Georgakopoulos. Οι δομές αυτές θα υλοποιηθούν και η απόδοσή τους θα τεκμηριωθεί τόσο πειραματικά όσο και θεωρητικά. Σημαντικής σημασίας είναι η σύγκρισή τους με τις αρχικές δομές, ώστε να εξαχθούν συμπεράσματα σχετικά με την συμβολή της τυχαιοποίησης στη βελτίωση της απόδοσης των δομών. / - Αλγόριθμοι Splay δέντρα 005.73 Self-organizing data structures Algorithms Splay trees
8	Αποτελεσματικοί αλγόριθμοι και δομές δεδομένων με εφαρμογές στην ανάκτηση πληροφορίας και στις τεχνολογίες διαδικτύου Αντωνίου, Δημήτρης 23 May 2011 (has links) Αντικείμενο της παρούσας διδακτορικής διατριβής είναι η μελέτη και τροποποίηση βασικών δομών δεδομένων με σκοπό τη δημιουργία νέων και την τροποποίηση υπαρχουσών λύσεων, με εφαρμογές στην Ανάκτηση Πληροφορίας, τη Βιοπληροφορική και το Διαδίκτυο. Αρχικά, δίνεται έμφαση στην ανάπτυξη και πειραματική επιβεβαίωση αλγοριθμικών τεχνικών για τη σχεδίαση αυτοοργανώμενων δομών δεδομένων (self-organizing data structures). Μέχρι σήμερα, ο μόνος πιθανός υποψήφιος αλγόριθμος αναζήτησης σε δένδρο που μπορεί να είναι Ο(1)-ανταγωνιστικός είναι το splay δένδρο (splay tree) που παρουσιάστηκε από τους Sleator και Tarjan [1]. Επιπρόσθετα, μελετώνται διάφορες εναλλακτικές τεχνικές αυτοοργάνωσης ([2],[3],[4],[5],[6]) και γίνεται επιβεβαίωση των πάνω ορίων που ισχύουν για την απόδοση των splay trees και για αυτές. Η ανάπτυξη των διάφορων αλγοριθμικών αυτών τεχνικών βρίσκει εφαρμογές πάνω στη συμπίεση δεδομένων. Οι αλγόριθμοι συμπίεσης δεδομένων μπορούν να βελτιώσουν την αποδοτικότητα με την οποία τα δεδομένα αποθηκεύονται ή μεταφέρονται, μέσω της μείωσης του ποσού της πλεονάζουσας πληροφορίας. Η χρήση αυτών των αλγορίθμων τόσο στην κρυπτογράφηση όσο και στην επεξεργασία εικόνας είναι αποδοτική και έχει μεγάλο ερευνητικό ενδιαφέρον. Γενικότερα, οι αυτοοργανώμενες δομές δεδομένων χρίζουν ιδιαίτερης προσοχής στους on-line αλγόριθμους. Αναλυτικότερα, στην παρούσα διατριβή, εφαρμόζεται συμπίεση σε βιολογικά δεδομένα αλλά και σε κείμενα τόσο με χρήση του κλασσικού splay δέντρου [10] αλλά και της log log n ανταγωνιστικής παραλλαγής του. Επιπλέον, παρουσιάζονται τυχαιοποιημένες εκδόσεις των παραπάνω δομών και εφαρμόζονται και αυτές στη συμπίεση δεδομένων. Οι log log n ανταγωνιστικές δομές έχουν καλύτερη απόδοση όσον αφορά την πολυπλοκότητά τους σε σχέση με την κλασσική splay δομή. Το γεγονός αυτό επιβεβαιώνεται πειραματικά, όπου η επιτυγχανόμενη συμπίεση είναι στις περισσότερες των περιπτώσεων καλύτερη από την αντίστοιχη της κλασικής δομής . Επιπλέον, ιδιαίτερο ερευνητικό ενδιαφέρον βρίσκει η εφαρμογή βασικών δομών δεδομένων στο διαδίκτυο. Επιδιώκουμε την ανάπτυξη και θεωρητική επιβεβαίωση αλγορίθμων για προβλήματα όπως η ανάθεση «καυτών συνδέσμων» (hot links [7]), η αναδιοργάνωση ιστοσελίδων και η ανάκτηση πληροφορίας ([8],[9]). Σε πρώτο στάδιο, προτείνονται ευριστικοί αλγόριθμοι με σκοπό την ανάθεση «καυτών συνδέσμων» (hotlinks) και τη βελτίωση της τοπολογίας ενός ιστότοπου ([12],[13],[14]). Σκοπός του αλγορίθμου είναι η προώθηση των δημοφιλών ιστοσελίδων ενός ιστότοπου, μέσω της ανάθεσης συνδέσμων προς αυτές, από ιστοσελίδες οι οποίες είναι σχετικές με αυτές ως προς το περιεχόμενο αλλά και ταυτόχρονα συντελούν στη μείωση της απόστασής τους από την αρχική σελίδα. Παρουσιάζεται το μοντέλο του αλγορίθμου, καθώς και μετρικές οι οποίες χρησιμοποιούνται για την ποσοτική αξιολόγηση της αποδοτικότητας του αλγορίθμου σε σχέση με ειδικά χαρακτηριστικά ενός ιστότοπου, όπως η εντροπία του. Σε δεύτερο στάδιο, γίνεται μελέτη τεχνικών προσωποποίησης ιστοσελίδων [11]. Συγκεκριμένα, σκοπός είναι η υλοποίηση ενός αλγορίθμου, ο οποίος θα ανακαλύπτει την αυξημένη ζήτηση μίας κατηγορίας ιστοσελίδων Α από έναν χρήστη και αξιοποιώντας την καταγεγραμμένη συμπεριφορά άλλων χρηστών, θα προτείνει κατηγορίες σελίδων οι οποίες προτιμήθηκαν από χρήστες οι οποίοι ομοίως παρουσίασαν αυξημένο ενδιαφέρον προς την κατηγορία αυτή. Αναλύεται το φαινόμενο της έξαρσης επισκεψιμότητας (burst) και η αξιοποίηση του στο πεδίο της εξατομίκευσης ιστοσελίδων. Ο αλγόριθμος υλοποιείται με τη χρήση δύο δομών δεδομένων, των Binary heaps και των Splay δέντρων, και αναλύεται η χρονική και χωρική πολυπλοκότητά του. Επιπρόσθετα, γίνεται πειραματική επιβεβαίωση της ορθής και αποδοτικής εκτέλεσης του αλγορίθμου. Αξίζει να σημειωθεί πως ο προτεινόμενος αλγόριθμος λόγω της φύσης του, χρησιμοποιεί χώρο, ο οποίος επιτρέπει τη χρησιμοποίηση του στη RAM. Τέλος, ο προτεινόμενος αλγόριθμος δύναται να βρει εφαρμογή σε εξατομίκευση σελίδων με βάση το σημασιολογικό τους περιεχόμενο σε αντιστοιχία με το διαχωρισμό τους σε κατηγορίες. Σε τρίτο στάδιο, γίνεται παρουσίαση πρωτότυπης τεχνικής σύστασης ιστοσελίδων [15] με χρήση Splay δέντρων. Σε αυτή την περίπτωση, δίνεται ιδιαίτερο βάρος στην εύρεση των σελίδων που παρουσιάζουν έξαρση επισκεψιμότητας και στη σύστασή τους στους χρήστες ενός ιστότοπου. Αρχικά, τεκμηριώνεται η αξία της εύρεσης μιας σελίδας, η οποία δέχεται ένα burst επισκέψεων. H έξαρση επισκεψιμότητας (burst) ορίζεται σε σχέση τόσο με τον αριθμό των επισκέψεων, όσο και με το χρονικό διάστημα επιτέλεσής τους. Η εύρεση των σελίδων επιτυγχάνεται με τη μοντελοποίηση ενός ιστότοπου μέσω ενός splay δέντρου. Με την τροποποίηση του δέντρου μέσω της χρήσης χρονοσφραγίδων (timestamps), ο αλγόριθμος είναι σε θέση να επιστρέφει σε κάθε χρονική στιγμή την ιστοσελίδα που έχει δεχθεί το πιο πρόσφατο burst επισκέψεων. Ο αλγόριθμος αναλύεται όσον αφορά τη χωρική και χρονική του πολυπλοκότητα και συγκρίνεται με εναλλακτικές λύσεις. Μείζονος σημασίας είναι η δυνατότητα εφαρμογής του αλγορίθμου και σε άλλα φαινόμενα της καθημερινότητας μέσω της ανάλογης μοντελοποίησης. Παραδείγματος χάρη, στην περίπτωση της απεικόνισης ενός συγκοινωνιακού δικτύου μέσω ενός γράφου, ο αλγόριθμος σύστασης δύναται να επιστρέφει σε κάθε περίπτωση τον κυκλοφοριακό κόμβο ο οποίος παρουσιάζει την πιο πρόσφατη συμφόρηση. Τέλος, όσον αφορά το πεδίο της ανάκτησης πληροφορίας, η διατριβή επικεντρώνεται σε μία πρωτότυπη και ολοκληρωμένη μεθοδολογία με σκοπό την αξιολόγηση της ποιότητας ενός συστήματος λογισμικού βάσει του Προτύπου Ποιότητας ISO/IEC-9126. Το κύριο χαρακτηριστικό της είναι ότι ολοκληρώνει την αξιολόγηση ενός συστήματος λογισμικού ενσωματώνοντας την αποτίμηση όχι μόνο των χαρακτηριστικών που είναι προσανατολισμένα στο χρήστη, αλλά και εκείνων που είναι πιο τεχνικά και αφορούν τους μηχανικούς λογισμικού ενός συστήματος. Σε αυτή τη διατριβή δίνεται βάρος στην εφαρμογή μεθόδων εξόρυξης δεδομένων πάνω στα αποτελέσματα της μέτρησης μετρικών οι οποίες συνθέτουν τα χαρακτηριστικά του πηγαίου κώδικα, όπως αυτά ορίζονται από το Προτύπο Ποιότητας ISO/IEC-9126 [16][17]. Ειδικότερα εφαρμόζονται αλγόριθμοι συσταδοποίησης με σκοπό την εύρεση τμημάτων κώδικα με ιδιαίτερα χαρακτηριστικά, που χρήζουν προσοχής. / In this dissertation we take an in-depth look at the use of effective and efficient data structures and algorithms in the fields of data mining and web technologies. The main goal is to develop algorithms based on appropriate data structures, in order to improve the performance at all levels of web applications. In the first chapter the reader is introduced to the main issues studied dissertation. In the second chapter, we propose novel randomized versions of the splay trees. We have evaluated the practical performance of these structures in comparison with the original version of splay trees and with their log log n-competitive variations, in the application field of compression. Moreover, we show that the Chain Splay tree achieves O(logn) worst-case cost per query. In order to evaluate performance, we utilize plain splay trees, the log log n-competitive variations, the proposed randomized version with the Chain Splay technique to compress data. It is observed experimentally that the compression achieved in the case of the log log n-competitive technique is, as expected, more efficient than the one of the plain splay trees. The third chapter focuses on hotlinks assignment techniques. Enhancing web browsing experience is an open issue frequently dealt using hotlinks assignment between webpages, shortcuts from one node to another. Our aim is to provide a novel, more efficient approach to minimize the expected number of steps needed to reach expected pages when browsing a website. We present a randomized algorithm, which combines the popularity of the webpages, the website structure, and for the first time to the best authors’ knowledge, the similarity of context between pages in order to suggest the placement of suitable hotlinks. We verify experimentally that users need less page transitions to reach expected information pages when browsing a website, enhanced using the proposed algorithm. In the fourth chapter we investigate the problem of web personalization. The explosive growth in the size and use of the World Wide Web continuously creates new great challenges and needs. The need for predicting the users’ preferences in order to expedite and improve the browsing though a site can be achieved through personalizing of the Websites. Recommendation and personalization algorithms aim at suggesting WebPages to users based on their current visit and past users’ navigational patterns. The problem that we address is the case where few WebPages become very popular for short periods of time and are accessed very frequently in a limited temporal space. Our aim is to deal with these bursts of visits and suggest these highly accessed pages to the future users that have common interests. Hence, in this paper, we propose a new web personalization technique, based on advanced data structures. The data structures that are used are the Splay tree (1) and Binary heaps (2). We describe the architecture of the technique, analyze the time and space complexity and prove its performance. In addition, we compare both theoretically and experimentally the proposed technique to another approach to verify its efficiency. Our solution achieves O(P2) space complexity and runs in k log P time, where k is the number of pages and P the number of categories of WebPages. Extending this algorithm, we propose an algorithm which efficiently detects bursts of visits to webpages. As an increasing number of Web sites consist of multiple pages, it is more difficult for the visitors to rapidly reach their own target. This results in an urgent need for intelligent systems that effectively support the users’ navigation to high demand Web content. In many cases, due to specific conditions, web pages become very popular and receive excessively large number of hits. Therefore, there is a high probability that these web pages will be of interest to the majority of the visitors at a given time. The data structure that is used for the purposes of the recommendation algorithm is the Splay tree. We describe the architecture of the technique, analyze the time and space complexity and show its performance. The dissertation’s last chapter elaborates on how to use clustering for the evaluation of a software system’s maintainability according to the ISO/IEC-9126 quality standard. More specifically it proposes a methodology that combines clustering and multicriteria decision aid techniques for knowledge acquisition by integrating groups of data from source code with the expertise of a software system’s evaluators. A process for the extraction of elements from source code and Analytical Hierarchical Processing for assigning weights to these data are provided; k-Attractors clustering algorithm is then applied on these data, in order to produce system overviews and deductions. The methodology is evaluated on Apache Geronimo, a large Open Source Application Server, results are discussed and conclusions are presented together with directions for future work. Αλγόριθμοι Δομές δεδομένων Ανάκτηση πληροφορίας 005.73 Algorithms Data structures Data mining Web technologies
9	Variable selection in joint modelling of mean and variance for multilevel data Charalambous, Christiana January 2011 (has links) We propose to extend the use of penalized likelihood based variable selection methods to hierarchical generalized linear models (HGLMs) for jointly modellingboth the mean and variance structures. We are interested in applying these newmethods on multilevel structured data, hence we assume a two-level hierarchical structure, with subjects nested within groups. We consider a generalized linearmixed model (GLMM) for the mean, with a structured dispersion in the formof a generalized linear model (GLM). In the first instance, we model the varianceof the random effects which are present in the mean model, or in otherwords the variation between groups (between-level variation). In the second scenario,we model the dispersion parameter associated with the conditional varianceof the response, which could also be thought of as the variation betweensubjects (within-level variation). To do variable selection, we use the smoothlyclipped absolute deviation (SCAD) penalty, a penalized likelihood variable selectionmethod, which shrinks the coefficients of redundant variables to 0 and at thesame time estimates the coefficients of the remaining important covariates. Ourmethods are likelihood based and so in order to estimate the fixed effects in ourmodels, we apply iterative procedures such as the Newton-Raphson method, inthe form of the LQA algorithm proposed by Fan and Li (2001). We carry out simulationstudies for both the joint models for the mean and variance of the randomeffects, as well as the joint models for the mean and dispersion of the response,to assess the performance of our new procedures against a similar process whichexcludes variable selection. The results show that our method increases both theaccuracy and efficiency of the resulting penalized MLEs and has 100% successrate in identifying the zero and non-zero components over 100 simulations. Forthe main real data analysis, we use the Health Survey for England (HSE) 2004dataset. We investigate how obesity is linked to several factors such as smoking,drinking, exercise, long-standing illness, to name a few. We also discover whetherthere is variation in obesity between individuals and between households of individuals,as well as test whether that variation depends on some of the factorsaffecting obesity itself. 005.73
10	Contribution à la décomposition de données multimodales avec des applications en apprentisage de dictionnaires et la décomposition de tenseurs de grande taille. / Contribution to multimodal data processing with applications to dictionary learning and large-scale decomposition Traoré, Abraham 26 November 2019 (has links) Dans ce travail, on s'intéresse à des outils mathématiques spéciaux appelés tenseurs qui sont formellement définis comme des tableaux multidimensionnels définis sur le produit tensoriel d'espaces vectoriels (chaque espace vectoriel étant muni de son système de coordonnées), le nombre d'espaces vectoriels impliqués dans ce produit étant l'ordre du tenseur. L'intérêt pour les tenseurs est motivé par certains travaux expérimentaux qui ont prouvé, dans divers contextes, que traiter des données multidimensionnelles avec des tenseurs plutôt que des matrices donne un meilleur résultat aussi bien pour des tâches de régression que de classification. Dans le cadre de la thèse, nous nous sommes focalisés sur une décomposition dite de Tucker et avons mis en place une méthode pour l'apprentissage de dictionnaires, une technique pour l'apprentissage en ligne de dictionnaires, une approche pour la décomposition d'un tenseur de grandes tailles et enfin une méthodologie pour la décomposition d'un tenseur qui croît par rapport à tous les modes. De nouveaux résultats théoriques concernant la convergence et la vitesse de convergence sont établis et l'efficacité des algorithmes proposés, reposant soit sur la minimisation alternée, soit sur la descente de gradients par coordonnées, est démontrée sur des problèmes réels / In this work, we are interested in special mathematical tools called tensors, that are multidimensional arrays defined on tensor product of some vector spaces, each of which has its own coordinate system and the number of spaces involved in this product is generally referred to as order. The interest for these tools stem from some empirical works (for a range of applications encompassing both classification and regression) that prove the superiority of tensor processing with respect to matrix decomposition techniques. In this thesis framework, we focused on specific tensor model named Tucker and established new approaches for miscellaneous tasks such as dictionary learning, online dictionary learning, large-scale processing as well as the decomposition of a tensor evolving with respect to each of its modes. New theoretical results are established and the efficiency of the different algorithms, which are based either on alternate minimization or coordinate gradient descent, is proven via real-world problems. Tenseur Apprentissage de dictionnaires Descente de coordonnées Minimisation alternée Tensor Dictionary learning Coordinate descent Alternate minimization 005.73

Search results