• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 71
  • 12
  • 9
  • 7
  • 5
  • 2
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 127
  • 52
  • 38
  • 26
  • 25
  • 21
  • 17
  • 14
  • 13
  • 13
  • 12
  • 12
  • 12
  • 12
  • 12
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Lh*rs p2p : une nouvelle structure de données distribuée et scalable pour les environnements Pair à Pair / Lh*rsp2p : a new scalable and distributed data structure for Peer to Peer environnements

Yakouben, Hanafi 14 May 2013 (has links)
Nous proposons une nouvelle structure de données distribuée et scalable appelée LH*RSP2P conçue pour les environnements pair à pair(P2P).Les données de l'application forment un fichier d’enregistrements identifiés par les clés primaires. Les enregistrements sont dans des cases mémoires sur des pairs, adressées par le hachage distribué (LH*). Des éclatements créent dynamiquement de nouvelles cases pour accommoder les insertions. L'accès par clé à un enregistrement comporte un seul renvoi au maximum. Le scan du fichier s’effectue au maximum en deux rounds. Ces résultats sont parmi les meilleurs à l'heure actuelle. Tout fichier LH*RSP2P est également protégé contre le Churn. Le calcul de parité protège toute indisponibilité jusqu’à k cases, où k ≥ 1 est un paramètre scalable. Un nouveau type de requêtes, qualifiées de sûres, protège également contre l’accès à toute case périmée. Nous prouvons les propriétés de notre SDDS formellement par une implémentation prototype et des expérimentations. LH*RSP2P apparaît utile aux applications Big Data, sur des RamClouds tout particulièrement / We propose a new scalable and distributed data structure termed LH*RSP2P designed for Peer-to-Peer environment (P2P). Application data forms a file of records identified by primary keys. Records are in buckets on peers, addressed by distributed linear hashing (LH*). Splits create new buckets dynamically, to accommodate inserts. Key access to a record uses at most one hop. Scan of the file proceeds in two rounds at most. These results are among best at present. An LH*RSP2P file is also protected against Churn. Parity calculation recovers from every unavailability of up to k≥1, k is a scalable parameter. A new type of queries, qualified as sure, protects also against access to any out-of-date bucket. We prove the properties of our SDDS formally, by a prototype implementation and experiments. LH*RSP2P appears useful for Big Data manipulations, over RamClouds especially.
112

Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification

Ranganath, B N 09 1900 (has links)
Data mining is an area to find valid, novel, potentially useful, and ultimately understandable abstractions in a data. Frequent itemset mining is one of the important data mining approaches to find those abstractions in the form of patterns. Frequent Closed itemsets provide complete and condensed information for non-redundant association rules generation. For many applications mining all the frequent itemsets is not necessary, and mining frequent Closed itemsets are adequate. Compared to frequent itemset mining, frequent Closed itemset mining generates less number of itemsets, and therefore improves the efficiency and effectiveness of these tasks. Recently, much research has been done on Closed itemsets mining, but it is mainly for traditional databases where multiple scans are needed, and whenever new transactions arrive, additional scans must be performed on the updated transaction database; therefore, they are not suitable for data stream mining. Mining frequent itemsets from data streams has many potential and broad applications. Some of the emerging applications of data streams that require association rule mining are network traffic monitoring and web click streams analysis. Different from data in traditional static databases, data streams typically arrive continuously in high speed with huge amount and changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. Recent works on data stream mining based on sliding window method slide the window by one transaction at a time. But when the window size is large and support threshold is low, the existing methods consume significant time and lead to a large increase in user response time. In our first work, we propose a novel algorithm Stream-Close based on sliding window model to mine frequent Closed itemsets from the data streams within the current sliding window. We enhance the scalabality of the algorithm by introducing several optimization techniques such as sliding the window by multiple transactions at a time and novel pruning techniques which lead to a considerable reduction in the number of candidate itemsets to be examined for closure checking. Our experimental studies show that the proposed algorithm scales well with large data sets. Still the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemsets mining infeasible in many applications since users cannot interpret the large volume of output (which sometimes will be greater than the data itself when support threshold is low) and may lead to an overhead to develop extra applications which post processes the output of original algorithm to reduce the size of the output. Recent work on clustering of itemsets considers strictly either expression(consists of items present in itemset) or support of the itemsets or partially both to reduce the number of itemsets. But the drawback of the above approaches is that in some situations, number of itemsets does not reduce due to their restricted view of either considering expressions or support. So we propose a new notion of frequent itemsets called clustered itemsets which considers both expressions and support of the itemsets in summarizing the output. We introduce a new distance measure w.r.t expressions and also prove the problem of mining clustered itemsets to be NP-hard. In our second work, we propose a deterministic locality sensitive hashing based classifier using clustered itemsets. Locality sensitive hashing(LSH)is a technique for efficiently finding a nearest neighbour in high dimensional data sets. The idea of locality sensitive hashing is to hash the points using several hash functions to ensure that for each function the probability of collision is much higher for objects that are close to each other than those that are far apart. We propose a LSH based approximate nearest neighbour classification strategy. But the problem with LSH is, it randomly chooses hash functions and the estimation of a large number of hash functions could lead to an increase in query time. From Classification point of view, since LSH chooses randomly from a family of hash functions the buckets may contain points belonging to other classes which may affect classification accuracy. So, in order to overcome these problems we propose to use class association rules based hash functions which ensure that buckets corresponding to the class association rules contain points from the same class. But associative classification involves generation and examination of large number of candidate class association rules. So, we use the clustered itemsets which reduce the number of class association rules to be examined. We also establish formal connection between clustering parameter(delta used in the generation of clustered frequent itemsets) and discriminative measure such as Information gain. Our experimental studies show that the proposed method achieves an increase in accuracy over LSH based near neighbour classification strategy.
113

On the stability of document analysis algorithms : application to hybrid document hashing technologies / De la stabilité des algorithmes d’analyse de documents : application aux technologies de hachage de documents hybrides

Eskenazi, Sébastien 14 December 2016 (has links)
Un nombre incalculable de documents est imprimé, numérisé, faxé, photographié chaque jour. Ces documents sont hybrides : ils existent sous forme papier et numérique. De plus les documents numériques peuvent être consultés et modifiés simultanément dans de nombreux endroits. Avec la disponibilité des logiciels d’édition d’image, il est devenu très facile de modifier ou de falsifier un document. Cela crée un besoin croissant pour un système d’authentification capable de traiter ces documents hybrides. Les solutions actuelles reposent sur des processus d’authentification séparés pour les documents papiers et numériques. D’autres solutions reposent sur une vérification visuelle et offrent seulement une sécurité partielle. Dans d’autres cas elles nécessitent que les documents sensibles soient stockés à l’extérieur des locaux de l’entreprise et un accès au réseau au moment de la vérification. Afin de surmonter tous ces problèmes, nous proposons de créer un algorithme de hachage sémantique pour les images de documents. Cet algorithme de hachage devrait fournir une signature compacte pour toutes les informations visuellement significatives contenues dans le document. Ce condensé permettra la création de systèmes de sécurité hybrides pour sécuriser tout le document. Ceci peut être réalisé grâce à des algorithmes d’analyse du document. Cependant ceux-ci ont besoin d’être porté à un niveau de performance sans précédent, en particulier leur fiabilité qui dépend de leur stabilité. Après avoir défini le contexte de l’étude et ce qu’est un algorithme stable, nous nous sommes attachés à produire des algorithmes stables pour la description de la mise en page, la segmentation d’un document, la reconnaissance de caractères et la description des zones graphiques. / An innumerable number of documents is being printed, scanned, faxed, photographed every day. These documents are hybrid : they exist as both hard copies and digital copies. Moreover their digital copies can be viewed and modified simultaneously in many places. With the availability of image modification software, it has become very easy to modify or forge a document. This creates a rising need for an authentication scheme capable of handling these hybrid documents. Current solutions rely on separate authentication schemes for paper and digital documents. Other solutions rely on manual visual verification and offer only partial security or require that sensitive documents be stored outside the company’s premises and a network access at the verification time. In order to overcome all these issues we propose to create a semantic hashing algorithm for document images. This hashing algorithm should provide a compact digest for all the visually significant information contained in the document. This digest will allow current hybrid security systems to secure all the document. This can be achieved thanks to document analysis algorithms. However those need to be brought to an unprecedented level of performance, in particular for their reliability which depends on their stability. After defining the context of this study and what is a stable algorithm, we focused on producing stable algorithms for layout description, document segmentation, character recognition and describing the graphical parts of a document.
114

Machine learning techniques for content-based information retrieval / Méthodes d’apprentissage automatique pour la recherche par le contenu de l’information

Chafik, Sanaa 22 December 2017 (has links)
Avec l’évolution des technologies numériques et la prolifération d'internet, la quantité d’information numérique a considérablement évolué. La recherche par similarité (ou recherche des plus proches voisins) est une problématique que plusieurs communautés de recherche ont tenté de résoudre. Les systèmes de recherche par le contenu de l’information constituent l’une des solutions prometteuses à ce problème. Ces systèmes sont composés essentiellement de trois unités fondamentales, une unité de représentation des données pour l’extraction des primitives, une unité d’indexation multidimensionnelle pour la structuration de l’espace des primitives, et une unité de recherche des plus proches voisins pour la recherche des informations similaires. L’information (image, texte, audio, vidéo) peut être représentée par un vecteur multidimensionnel décrivant le contenu global des données d’entrée. La deuxième unité consiste à structurer l’espace des primitives dans une structure d’index, où la troisième unité -la recherche par similarité- est effective.Dans nos travaux de recherche, nous proposons trois systèmes de recherche par le contenu de plus proches voisins. Les trois approches sont non supervisées, et donc adaptées aux données étiquetées et non étiquetées. Elles sont basées sur le concept du hachage pour une recherche efficace multidimensionnelle des plus proches voisins. Contrairement aux approches de hachage existantes, qui sont binaires, les approches proposées fournissent des structures d’index avec un hachage réel. Bien que les approches de hachage binaires fournissent un bon compromis qualité-temps de calcul, leurs performances en termes de qualité (précision) se dégradent en raison de la perte d’information lors du processus de binarisation. À l'opposé, les approches de hachage réel fournissent une bonne qualité de recherche avec une meilleure approximation de l’espace d’origine, mais induisent en général un surcoût en temps de calcul.Ce dernier problème est abordé dans la troisième contribution. Les approches proposées sont classifiées en deux catégories, superficielle et profonde. Dans la première catégorie, on propose deux techniques de hachage superficiel, intitulées Symmetries of the Cube Locality sensitive hashing (SC-LSH) et Cluster-Based Data Oriented Hashing (CDOH), fondées respectivement sur le hachage aléatoire et l’apprentissage statistique superficiel. SCLSH propose une solution au problème de l’espace mémoire rencontré par la plupart des approches de hachage aléatoire, en considérant un hachage semi-aléatoire réduisant partiellement l’effet aléatoire, et donc l’espace mémoire, de ces dernières, tout en préservant leur efficacité pour la structuration des espaces hétérogènes. La seconde technique, CDOH, propose d’éliminer l’effet aléatoire en combinant des techniques d’apprentissage non-supervisé avec le concept de hachage. CDOH fournit de meilleures performances en temps de calcul, en espace mémoire et en qualité de recherche.La troisième contribution est une approche de hachage basée sur les réseaux de neurones profonds appelée "Unsupervised Deep Neuron-per-Neuron Hashing" (UDN2H). UDN2H propose une indexation individuelle de la sortie de chaque neurone de la couche centrale d’un modèle non supervisé. Ce dernier est un auto-encodeur profond capturant une structure individuelle de haut niveau de chaque neurone de sortie.Nos trois approches, SC-LSH, CDOH et UDN2H, ont été proposées séquentiellement durant cette thèse, avec un niveau croissant, en termes de la complexité des modèles développés, et en termes de la qualité de recherche obtenue sur de grandes bases de données d'information / The amount of media data is growing at high speed with the fast growth of Internet and media resources. Performing an efficient similarity (nearest neighbor) search in such a large collection of data is a very challenging problem that the scientific community has been attempting to tackle. One of the most promising solutions to this fundamental problem is Content-Based Media Retrieval (CBMR) systems. The latter are search systems that perform the retrieval task in large media databases based on the content of the data. CBMR systems consist essentially of three major units, a Data Representation unit for feature representation learning, a Multidimensional Indexing unit for structuring the resulting feature space, and a Nearest Neighbor Search unit to perform efficient search. Media data (i.e. image, text, audio, video, etc.) can be represented by meaningful numeric information (i.e. multidimensional vector), called Feature Description, describing the overall content of the input data. The task of the second unit is to structure the resulting feature descriptor space into an index structure, where the third unit, effective nearest neighbor search, is performed.In this work, we address the problem of nearest neighbor search by proposing three Content-Based Media Retrieval approaches. Our three approaches are unsupervised, and thus can adapt to both labeled and unlabeled real-world datasets. They are based on a hashing indexing scheme to perform effective high dimensional nearest neighbor search. Unlike most recent existing hashing approaches, which favor indexing in Hamming space, our proposed methods provide index structures adapted to a real-space mapping. Although Hamming-based hashing methods achieve good accuracy-speed tradeoff, their accuracy drops owing to information loss during the binarization process. By contrast, real-space hashing approaches provide a more accurate approximation in the mapped real-space as they avoid the hard binary approximations.Our proposed approaches can be classified into shallow and deep approaches. In the former category, we propose two shallow hashing-based approaches namely, "Symmetries of the Cube Locality Sensitive Hashing" (SC-LSH) and "Cluster-based Data Oriented Hashing" (CDOH), based respectively on randomized-hashing and shallow learning-to-hash schemes. The SC-LSH method provides a solution to the space storage problem faced by most randomized-based hashing approaches. It consists of a semi-random scheme reducing partially the randomness effect of randomized hashing approaches, and thus the memory storage problem, while maintaining their efficiency in structuring heterogeneous spaces. The CDOH approach proposes to eliminate the randomness effect by combining machine learning techniques with the hashing concept. The CDOH outperforms the randomized hashing approaches in terms of computation time, memory space and search accuracy.The third approach is a deep learning-based hashing scheme, named "Unsupervised Deep Neuron-per-Neuron Hashing" (UDN2H). The UDN2H approach proposes to index individually the output of each neuron of the top layer of a deep unsupervised model, namely a Deep Autoencoder, with the aim of capturing the high level individual structure of each neuron output.Our three approaches, SC-LSH, CDOH and UDN2H, were proposed sequentially as the thesis was progressing, with an increasing level of complexity in terms of the developed models, and in terms of the effectiveness and the performances obtained on large real-world datasets
115

Bezpečná implementace technologie blockchain / Secure Implementation of Blockchain Technology

Kovář, Adam January 2020 (has links)
This thesis describes basis of blockchain technology implementation for SAP Cloud platform with emphasis to security and safety of critical data which are stored in blockchain. This diploma thesis implements letter of credit to see and control business process administration. It also compares all the possible technology modification. Thesis describes all elementary parts of software which are necessary to implement while storing data and secure integrity. This thesis also leverages ideal configuration of each programable block in implementation. Alternative configurations of possible solutions are described with pros and cons as well. Another part of diploma thesis is actual working implementation as a proof of concept to cover letter of credit. All parts of code are design to be stand alone to provide working concept for possible implementation and can source as a help to write productive code. User using this concept will be able to see whole process and create new statutes for whole letter of credit business process.
116

Hashovací funkce - charakteristika, implementace a kolize / Hash functions - characteristics, implementation and collisions

Karásek, Jan January 2009 (has links)
Hash functions belong to elements of modern cryptography. Their task is to transfer the data expected on the entry into a unique bite sequence. Hash functions are used in many application areas, such as message integrity verification, information authentication, and are used in cryptographic protocols, to compare data and other applications. The goal of the master’s thesis is to characterize hash functions to describe their basic characteristics and use. Next task was to focus on one hash function, in particular MD5, and describe it properly. That means, to describe its construction, safety and possible attacks on this function. The last task was to implement this function and collisions. The introductory chapters describe the basic definition of hash function, the properties of the function. The chapters mention the methods preventing collisions and the areas were the hash functions are used. Further chapters are focused on the characteristics of various types of hash functions. These types include basic hash functions built on basic bit operations, perfect hash functions and cryptographic hash functions. After concluding the characteristics of hash functions, I devoted to practical matters. The thesis describes the basic appearance and control of the program and its individual functions which are explained theoretically. The following text describes the function MD5, its construction, safety risks and implementation. The last chapter refers to attacks on hash functions and describes the hash function tunneling method, brute force attack and dictionary attack.
117

[en] APPROXIMATE NEAREST NEIGHBOR SEARCH FOR THE KULLBACK-LEIBLER DIVERGENCE / [pt] BUSCA APROXIMADA DE VIZINHOS MAIS PRÓXIMOS PARA DIVERGÊNCIA DE KULLBACK-LEIBLER

19 March 2018 (has links)
[pt] Em uma série de aplicações, os pontos de dados podem ser representados como distribuições de probabilidade. Por exemplo, os documentos podem ser representados como modelos de tópicos, as imagens podem ser representadas como histogramas e também a música pode ser representada como uma distribuição de probabilidade. Neste trabalho, abordamos o problema do Vizinho Próximo Aproximado onde os pontos são distribuições de probabilidade e a função de distância é a divergência de Kullback-Leibler (KL). Mostramos como acelerar as estruturas de dados existentes, como a Bregman Ball Tree, em teoria, colocando a divergência KL como um produto interno. No lado prático, investigamos o uso de duas técnicas de indexação muito populares: Índice Invertido e Locality Sensitive Hashing. Os experimentos realizados em 6 conjuntos de dados do mundo real mostraram que o Índice Invertido é melhor do que LSH e Bregman Ball Tree, em termos de consultas por segundo e precisão. / [en] In a number of applications, data points can be represented as probability distributions. For instance, documents can be represented as topic models, images can be represented as histograms and also music can be represented as a probability distribution. In this work, we address the problem of the Approximate Nearest Neighbor where the points are probability distributions and the distance function is the Kullback-Leibler (KL) divergence. We show how to accelerate existing data structures such as the Bregman Ball Tree, by posing the KL divergence as an inner product embedding. On the practical side we investigated the use of two, very popular, indexing techniques: Inverted Index and Locality Sensitive Hashing. Experiments performed on 6 real world data-sets showed the Inverted Index performs better than LSH and Bregman Ball Tree, in terms of queries per second and precision.
118

Programming Model and Protocols for Reconfigurable Distributed Systems

Arad, Cosmin January 2013 (has links)
Distributed systems are everywhere. From large datacenters to mobile devices, an ever richer assortment of applications and services relies on distributed systems, infrastructure, and protocols. Despite their ubiquity, testing and debugging distributed systems remains notoriously hard. Moreover, aside from inherent design challenges posed by partial failure, concurrency, or asynchrony, there remain significant challenges in the implementation of distributed systems. These programming challenges stem from the increasing complexity of the concurrent activities and reactive behaviors in a distributed system on the one hand, and the need to effectively leverage the parallelism offered by modern multi-core hardware, on the other hand. This thesis contributes Kompics, a programming model designed to alleviate some of these challenges. Kompics is a component model and programming framework for building distributed systems by composing message-passing concurrent components. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic execution replay for debugging, testing, and reproducible behavior evaluation for large-scale Kompics distributed systems. The same system code is used for both simulation and production deployment, greatly simplifying the system development, testing, and debugging cycle. We highlight the architectural patterns and abstractions facilitated by Kompics through a case study of a non-trivial distributed key-value storage system. CATS is a scalable, fault-tolerant, elastic, and self-managing key-value store which trades off service availability for guarantees of atomic data consistency and tolerance to network partitions. We present the composition architecture for the numerous protocols employed by the CATS system, as well as our methodology for testing the correctness of key CATS algorithms using the Kompics simulation framework. Results from a comprehensive performance evaluation attest that CATS achieves its claimed properties and delivers a level of performance competitive with similar systems which provide only weaker consistency guarantees. More importantly, this testifies that Kompics admits efficient system implementations. Its use as a teaching framework as well as its use for rapid prototyping, development, and evaluation of a myriad of scalable distributed systems, both within and outside our research group, confirm the practicality of Kompics. / Kompics / CATS / REST
119

Forensisk hårddiskkloning och undersökning av hårddiskskrivskydd / Forensic hard disk cloning and investigation of hardware write blockers

Bengtsson, Johnny January 2004 (has links)
<p>Detta examensarbete reder ut arbetsprinciperna för olika typer av hårddiskskrivskydd; hårdvaruskrivskydd, mjukvaruskrivskydd, hybridskrivskydd och bygelskrivskydd. Slutsatsen av utredningen är att endast hårdvaruskrivskydd Detta examensarbete reder ut arbetsprinciperna för olika typer av hårddiskskrivskydd; hårdvaruskrivskydd, mjukvaruskrivskydd, hybridskrivskydd och bygelskrivskydd. Slutsatsen av utredningen är att endast hårdvaruskrivskydd bedöms ha tillräckligt pålitliga skyddsprinciper, vilket motiveras av dess oberoende från både hårdvara och operativsystem. </p><p>Vidare undersöks hårdvaruskrivskyddet Image MASSter(TM) Drive Lock från Intelligent Computer Solutions (ICS). Några egentliga slutsatser gick inte dra av kretskonstruktionen, bortsett från att den är uppbyggd kring en FPGA (Xilinx Spartan-II, XC2S15) med tillhörande PROM (XC17S15APC). </p><p>En egenutvecklad idé till autenticieringsmetod för hårddiskkloner föreslås som ett tillägg till arbetet. Principen bygger på att komplettera hårddiskklonen med unik information om hårddisk såväl kloningsomständigheter, vilka sammanflätas genom XOR-operation av komponenternas hashsummor.Autenticieringsmetoden kan vid sjösättning möjligen öka rättsäkerheten för både utredarna och den som står misstänkt vid en brottsutredning. </p><p>Arbetet är till stora delar utfört vid och på uppdrag av Statens kriminaltekniska laboratorium (SKL) i Linköping.</p>
120

Programming Model and Protocols for Reconfigurable Distributed Systems

Arad, Cosmin Ionel January 2013 (has links)
Distributed systems are everywhere. From large datacenters to mobile devices, an ever richer assortment of applications and services relies on distributed systems, infrastructure, and protocols. Despite their ubiquity, testing and debugging distributed systems remains notoriously hard. Moreover, aside from inherent design challenges posed by partial failure, concurrency, or asynchrony, there remain significant challenges in the implementation of distributed systems. These programming challenges stem from the increasing complexity of the concurrent activities and reactive behaviors in a distributed system on the one hand, and the need to effectively leverage the parallelism offered by modern multi-core hardware, on the other hand. This thesis contributes Kompics, a programming model designed to alleviate some of these challenges. Kompics is a component model and programming framework for building distributed systems by composing message-passing concurrent components. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic execution replay for debugging, testing, and reproducible behavior evaluation for largescale Kompics distributed systems. The same system code is used for both simulation and production deployment, greatly simplifying the system development, testing, and debugging cycle. We highlight the architectural patterns and abstractions facilitated by Kompics through a case study of a non-trivial distributed key-value storage system. CATS is a scalable, fault-tolerant, elastic, and self-managing key-value store which trades off service availability for guarantees of atomic data consistency and tolerance to network partitions. We present the composition architecture for the numerous protocols employed by the CATS system, as well as our methodology for testing the correctness of key CATS algorithms using the Kompics simulation framework. Results from a comprehensive performance evaluation attest that CATS achieves its claimed properties and delivers a level of performance competitive with similar systems which provide only weaker consistency guarantees. More importantly, this testifies that Kompics admits efficient system implementations. Its use as a teaching framework as well as its use for rapid prototyping, development, and evaluation of a myriad of scalable distributed systems, both within and outside our research group, confirm the practicality of Kompics. / <p>QC 20130520</p>

Page generated in 0.0907 seconds