Global ETD Search

1. Creating a Customizable Component Based ETL Solution for the Consumer / Skapandet av en anpassningsbar komponentbaserad ETL-lösning för konsumenten
Retelius, Philip; Bergström Persson, Eddie. January 2021.
In today's society, enormous amounts of data are created and stored in various databases. Because this data is often spread across different databases, organizations with large amounts of data need to merge the separated data and extract value from it. Extract, Transform and Load (ETL) systems have made it possible to merge different databases easily. However, the ETL market is dominated by large actors such as Amazon and Microsoft, and the solutions on offer are wholly owned by them. This leaves the consumer with little ownership of the solution. This thesis therefore proposes a framework for building a component-based ETL solution that consumers can own, develop and customize to their own needs. The result is a prototype ETL solution designed to be configurable and customizable. It achieves this by not depending on inflexible external libraries and by a level of modularity that makes adding and removing components easy. The results are verified with a test that shows how two different files containing data can be combined. / In today's society, an enormous amount of data is created and stored in various databases. Since data is in many cases stored in different databases, organizations with a lot of data want to be able to merge the separated data and make use of this resource. Extract, Transform and Load (ETL) systems are a solution that has made it possible to merge different databases. The problem, however, lies in this expansion of ETL technology: the ETL market has come to be owned by large actors such as Amazon and Microsoft, and the solutions offered are wholly owned by them.
This leaves the consumer with little ownership of the solution. This thesis therefore proposes a framework for creating a component-based ETL tool that gives consumers the opportunity to develop their own ETL solution, tailored to their own needs. The result of the thesis is a prototype ETL solution built so that it can be configured and tailored. The solution achieves this by being independent of inflexible external libraries and through a level of modularity that makes adding and removing components easy. The result of this thesis is verified by a test showing how two different files containing data can be combined.
Keywords: ETL; Modularity; Dataset; Components; Data merge; Transform; Data profiling; Metadata; Software Engineering.
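The component-based idea in this abstract, where extract, transform and load steps are independent parts that can be added or removed, can be sketched with the Python standard library alone. This is an illustrative sketch, not the thesis's prototype: the names `extract_csv`, `merge_on`, `uppercase`, `to_csv` and `Pipeline`, the shared `id` column and the in-memory CSV strings are all hypothetical. The usage at the bottom mirrors the verification test by combining two files containing data.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical component contract: every stage takes rows and returns rows.
Stage = Callable[[List[Row]], List[Row]]


def extract_csv(text: str) -> List[Row]:
    """Extract: parse CSV text (an in-memory stand-in for a file) into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(key: str, other: List[Row]) -> Stage:
    """Transform: left-join incoming rows with `other` on a shared key column."""
    index = {row[key]: row for row in other}

    def stage(rows: List[Row]) -> List[Row]:
        # Rows with no match in `other` pass through unchanged.
        return [{**row, **index.get(row[key], {})} for row in rows]

    return stage


def uppercase(column: str) -> Stage:
    """Transform: a small, swappable cleaning step."""

    def stage(rows: List[Row]) -> List[Row]:
        return [{**row, column: row[column].upper()} if column in row else row for row in rows]

    return stage


def to_csv(rows: List[Row]) -> str:
    """Load: serialize rows to CSV text (a real sink might write to a database)."""
    if not rows:
        return ""
    # Union of all column names in first-seen order; missing cells are written as "".
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


class Pipeline:
    """Runs an ordered, editable list of components."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def run(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage(rows)
        return rows


if __name__ == "__main__":
    customers = extract_csv("id,name\n1,ada\n2,linus\n")
    orders = extract_csv("id,total\n1,30\n3,12\n")
    pipeline = Pipeline([merge_on("id", orders), uppercase("name")])
    print(to_csv(pipeline.run(customers)))
```

Because the components only share a rows-in, rows-out contract, removing a step is a plain list operation (for example `pipeline.stages.pop()`), and no external ETL library is required.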
2. Contextualized access to distributed and heterogeneous multimedia data sources / Accès contextualisé aux sources de données multimédias distribuées et hétérogènes
Vilsmaier, Christian. 26 September 2014.
Making multimedia data available online, for example by users themselves, is becoming cheaper and more convenient every day. Web phenomena such as Facebook, Twitter and Flickr benefit from this development. These phenomena and their growing acceptance lead to a multiplication of the number of images available online. The cumulative size of these images, which are often public and therefore searchable, is on the order of several zettabytes. Running a similarity query over such volumes is a challenge that the scientific community is beginning to address. One approach envisaged for this problem proposes using a distributed, heterogeneous content-based image retrieval system (CBIRs). Many problems arise in such a scenario. One example is the use of distinct metadata formats to describe image content; another is unequal technical and structural information. The individual metrics that the CBIRs use to compute the similarity between images are a further example. Computing good results in this context is therefore a very laborious task that has not yet been solved scientifically. The problem mainly addressed in this thesis is retrieving, from CBIRs, photos similar to a given image in response to a distributed multimedia query. The main contribution of this thesis is the construction of a network of CBIRs that is sensitive to the semantics of content (CBIRn). This semantic CBIRn is able to collect and merge results from specialized external sources.
To integrate such external sources, which are willing to join the network but not to disclose their configuration, an algorithm was developed that can estimate the configuration of a CBIR system. By classifying the CBIRs and analyzing incoming queries, image queries are forwarded exclusively to the most appropriate CBIRs. In this way, images of no interest to the user can be left out in advance. The returned images are those considered similar to the query image. The feasibility of the approach and the improvement it brings to the search process are demonstrated by a prototype implementation and its evaluation on images from ImageNet. In response to an image query, the approach of this thesis returns 4.75 times as many relevant images as a predefined CBIR network. / Making multimedia data available online is becoming less expensive and more convenient every day. This development promotes web phenomena such as Facebook, Twitter and Flickr. These phenomena and their increased acceptance in society in turn lead to a multiplication of the number of images available online. The volume of these images, frequently public and therefore searchable, already exceeds a zettabyte. Executing a similarity search over this many publicly available images and obtaining top-quality results is a challenge that the scientific community has recently begun to take on. One approach to this problem assumes the use of distributed, heterogeneous Content Based Image Retrieval systems (CBIRs). This in turn means that the problems arising in a distributed query scenario must be dealt with: for example, the participating CBIRs use distinct metadata formats to describe their content and expose unequal technical and structural information.
A further issue is the individual metrics the CBIRs use to calculate the similarity between pictures, and the specific way in which these metrics are combined. Overall, obtaining good results in this environment is very labor-intensive, and the problem has been explored scientifically but not yet comprehensively. The problem primarily addressed in this work is collecting pictures from CBIRs that are similar to a given picture, in response to a distributed multimedia query. The main contribution of this thesis is the construction of a network of Content Based Image Retrieval systems that can extract and exploit information about an input image's semantic concept. This so-called semantic CBIRn is mainly composed of CBIRs that the semantic CBIRn configures itself. In addition, specialized external sources can be integrated. The semantic CBIRn collects and merges the results of all of these attached CBIRs. To integrate external sources that are willing to join the network but not to disclose their configuration, an algorithm was developed that approximates these configurations. By categorizing both existing and external CBIRs and analyzing incoming queries, image queries are forwarded only to the most suitable CBIRs. In this way, images that are of no use to the user can be excluded beforehand. The images that are then returned are made comparable so that they can be merged into a single result list of images similar to the input image. The feasibility of the approach and the resulting improvement of the search process are demonstrated by a prototype implementation. With this prototype, the number of returned images that share the input image's semantic concept increases by a factor of 4.75 compared with a predefined non-semantic CBIRn.
Keywords: Computer science; Information technology; Multimedia databases; Image retrieval; Context-sensitive; Data fusion; Data merge. 005.750 72
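The query routing and result merging described in the abstract above can be illustrated with a small sketch. The abstract does not say how returned images are made comparable, so this example assumes per-system min-max score normalization followed by max-score fusion (each image keeps its best normalized score). The `CbirSource` class, the concept labels and the canned results are hypothetical stand-ins for real CBIR systems, and the algorithm that estimates undisclosed configurations is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Result = Tuple[str, float]  # (image id, raw score as reported by one CBIR system)


@dataclass
class CbirSource:
    """Hypothetical descriptor of one CBIR system attached to the network."""
    name: str
    concepts: Set[str]  # semantic concepts this system is categorized under
    higher_is_better: bool = True  # False if the system reports distances, not similarities
    results: Dict[str, List[Result]] = field(default_factory=dict)  # canned answers for the demo

    def search(self, concept: str) -> List[Result]:
        return self.results.get(concept, [])


def route(sources: List[CbirSource], concept: str) -> List[CbirSource]:
    """Forward the query only to systems categorized under the query's concept."""
    return [s for s in sources if concept in s.concepts]


def normalize(results: List[Result], higher_is_better: bool) -> List[Result]:
    """Min-max normalize raw scores to [0, 1], where 1 means most similar."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    out = []
    for image, score in results:
        if span == 0:
            value = 1.0  # all results tie, so treat them as equally similar
        else:
            value = (score - lo) / span
            if not higher_is_better:
                value = 1.0 - value  # distances: a smaller raw score means more similar
        out.append((image, value))
    return out


def merged_search(sources: List[CbirSource], concept: str, k: int = 10) -> List[Result]:
    """Route, normalize each system's list, then fuse by keeping each image's best score."""
    best: Dict[str, float] = {}
    for source in route(sources, concept):
        for image, value in normalize(source.search(concept), source.higher_is_better):
            best[image] = max(value, best.get(image, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    sources = [
        CbirSource("color", {"dog", "cat"}, True, {"dog": [("a.jpg", 0.9), ("b.jpg", 0.5)]}),
        CbirSource("shape", {"dog"}, False, {"dog": [("b.jpg", 2.0), ("c.jpg", 6.0)]}),
        CbirSource("cars", {"car"}, True, {"car": [("z.jpg", 0.99)]}),
    ]
    print(merged_search(sources, "dog"))
```

Here the "cars" system is never queried for a dog image, which corresponds to omitting useless images in advance; min-max normalization and max fusion are common simple choices, and a real network would likely need a calibration step better suited to each system's metric.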
