Global ETD Search

121	Data Driven High Performance Data Access Ramljak, Dusan January 2018 Low-latency, high-throughput mechanisms for retrieving data are becoming increasingly crucial as cyber and cyber-physical systems pour out growing amounts of data that often must be analyzed online. Generally, as data volume increases, the marginal utility of an "average" data item tends to decline, which requires greater effort to identify the most valuable data items and make them available with minimal overhead. We believe that data-analytics-driven mechanisms have a big role to play in solving this needle-in-a-haystack problem. We rely on the claim that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many applications, offers significant potential to enable many I/O optimizations. Our research covers exploiting the storage hierarchy for data-driven caching and tiering, reducing the distance between data and computation, removing redundancy in data, using sparse representations of data, the impact of data access mechanisms on resilience, energy consumption, and storage usage, and enabling new classes of data-driven applications. For caching and prefetching, we offer a powerful model that separates the process of access prediction from the data retrieval mechanism. Predictions are made on a per-data-entity basis and use the notion of "context", and aspects of it such as "belief", to uncover and leverage future data needs. This approach allows truly opportunistic use of predictive information. We elaborate on which aspects of context we use in areas other than caching and prefetching, and why each is appropriate in its situation. We present in more detail the methods we have developed: BeliefCache for data-driven caching and prefetching, and AVSC for pattern-mining-based compression of data.
In BeliefCache, we use belief, an aspect of context representing an estimate of the probability that a storage element will be needed, to build a modular framework that makes unified, informed decisions about that element or a group of elements. For the workloads we examined, BeliefCache captured complex non-sequential access patterns better than a state-of-the-art framework for optimizing cloud storage gateways. Moreover, it adapts to variations in the workload faster. It also does not require a static workload to be effective, since its modular design allows it to discover and adapt to changes in the workload. In AVSC, we use an aspect of context to gauge the similarity of events, and we compress data by keeping relevant events intact and approximating the others. This is done in two stages: we first generate a summary of the data, then approximately match each remaining event to an existing pattern if possible, or otherwise add it to the summary as a new pattern. We show gains over plain lossless compression at a specified level of accuracy for identifying the state of the system, and a clear tradeoff between compressibility and fidelity. For the other research areas mentioned, we present challenges and opportunities in the hope that they will spur researchers to examine those issues further in the space of rapidly emerging data-intensive applications. We also discuss how our research in other domains could be applied to providing high-performance data access. / Computer and Information Science Computer Science Caching Data Filtering Data Science Locality Exploitation Prefetching Storage Systems
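A minimal sketch of belief-driven caching and prefetching, assuming a deliberately simple belief model: the belief that block b will be needed next is the observed frequency with which b has followed the current block. The names `SuccessorPredictor`, `BeliefCache`, and `prefetch_threshold` are hypothetical and do not describe BeliefCache's actual design; the sketch only shows prediction kept separate from retrieval, with eviction and prefetching both driven by belief.

```python
from collections import defaultdict


class SuccessorPredictor:
    """Hypothetical belief model: P(next block = b | current block = a),
    estimated from how often b has followed a so far."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def observe(self, block):
        if self.prev is not None:
            self.counts[self.prev][block] += 1
        self.prev = block

    def beliefs(self, block):
        successors = self.counts[block]
        total = sum(successors.values())
        return {b: c / total for b, c in successors.items()} if total else {}


class BeliefCache:
    """Cache whose eviction and prefetching both use beliefs supplied by a
    separate predictor (prediction is decoupled from data retrieval)."""

    def __init__(self, capacity, predictor, prefetch_threshold=0.5):
        self.capacity = capacity
        self.predictor = predictor
        self.threshold = prefetch_threshold
        self.belief = {}  # cached block -> belief that it will be needed next

    def access(self, block):
        hit = block in self.belief
        self.predictor.observe(block)
        predicted = self.predictor.beliefs(block)
        # Re-score every cached block with the predictor's current view.
        for b in self.belief:
            self.belief[b] = predicted.get(b, 0.0)
        if not hit:
            # Demand fetch: always admit, evicting the lowest-belief block.
            if len(self.belief) >= self.capacity:
                victim = min(self.belief, key=self.belief.get)
                del self.belief[victim]
            self.belief[block] = predicted.get(block, 0.0)
        # Prefetch confident predictions if they beat the weakest entry.
        for b, p in predicted.items():
            if p >= self.threshold and b not in self.belief:
                self._prefetch(b, p)
        return hit

    def _prefetch(self, block, p):
        if len(self.belief) < self.capacity:
            self.belief[block] = p
            return
        victim = min(self.belief, key=self.belief.get)
        if self.belief[victim] < p:
            del self.belief[victim]
            self.belief[block] = p


cache = BeliefCache(capacity=2, predictor=SuccessorPredictor())
hits = [cache.access(b) for b in "ABCABCABC"]
print(hits)
```

With a cyclic pattern of three blocks and room for only two, LRU would miss on every access; in this sketch the first four accesses miss while the predictor learns the cycle, and the remaining five hit because the next block has already been prefetched.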
122	Analysis and Modeling of World Wide Web Traffic Abdulla, Ghaleb 30 April 1998 This dissertation deals with monitoring, collecting, analyzing, and modeling World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operation are some of the important issues to consider when adapting to the growth in Web usage. The WWW Consortium launched an effort to design a new protocol that will be able to support future demands. Before doing that, however, we need to characterize current users' interactions with the WWW and understand how it is being used. We focus on proxies, since they provide a good medium for caching, information filtering, payment methods, and copyright management. We collected proxy data from our environment over a period of more than two years. We also collected data from other sources such as schools, information service providers, and commercial sites. Sampling periods range from days to years. We analyzed the collected data for important characteristics that can help in designing a better HTTP protocol. We developed a modeling approach that accounts for Web traffic characteristics such as self-similarity and long-range dependence. We developed an algorithm to characterize users' sessions. Finally, we developed a high-level Web traffic model suitable for sensitivity analysis. As a result of this work, we developed statistical models of parameters such as arrival times, file sizes, file types, and locality of reference. We describe an approach to modeling long-range dependent Web traffic, and we characterize the activities of users accessing a digital library courseware server or Web search tools.
Temporal and spatial locality of reference within the examined user communities is high, so caching can be an effective tool for reducing network traffic and helping to solve the scalability problem. We recommend using our findings to promote a smart distribution or push model that caches documents when repeat accesses are likely. / Ph. D. Time Series Modeling Scalability World Wide Web Log analysis Caching Proxy
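Self-similarity and long-range dependence are usually quantified by the Hurst parameter H: H = 0.5 for short-range dependent traffic, and 0.5 < H < 1 indicates long-range dependence. The sketch below uses the standard aggregated-variance estimator as one illustrative way to measure H; the abstract does not say which estimator the dissertation used, and the function name is hypothetical.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes):
    """Estimate H with the aggregated-variance method.

    For each block size m, the series is cut into non-overlapping blocks,
    each block is replaced by its mean, and the variance of those means is
    recorded. For a long-range dependent process Var(X^(m)) ~ m^(2H - 2),
    so the least-squares slope of log Var against log m gives H = 1 + slope / 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        means = [statistics.fmean(series[i * m:(i + 1) * m]) for i in range(n_blocks)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.pvariance(means)))
    # Ordinary least-squares slope of log_var on log_m.
    mx, my = statistics.fmean(log_m), statistics.fmean(log_var)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
    sxx = sum((x - mx) ** 2 for x in log_m)
    return 1 + (sxy / sxx) / 2


rng = random.Random(42)
iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
sizes = [2 ** k for k in range(9)]  # block sizes 1, 2, 4, ..., 256
print(round(aggregated_variance_hurst(iid, sizes), 2))  # near 0.5 for uncorrelated noise
```

Applied to a trace of per-interval request or byte counts, an estimate well above 0.5 is the kind of evidence that motivates long-range dependent traffic models rather than Poisson-style ones.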
123	Recovery of cached food by captive blue jays (Cyanocitta cristata) Callo, Paul Alexander 18 November 2008 Corvids are important seed and nut dispersers in North America. To date, the caching and recovery behaviors of four North American Corvids have been documented, most notably Clark's Nutcracker (Nucifraga columbiana). Blue Jays (Cyanocitta cristata) are important dispersers of Quercus, Fagus, and Castanea nuts in eastern North America, and their caching behavior in the wild has been well documented. However, recovery of caches by the same individual Blue Jay that created them has not been demonstrated. To test this, I conducted a laboratory study of caching and recovery behaviors. I compared the performance of caching birds with that of non-caching birds and with a random foraging model. Blue Jays do return to their own caches, with success rates higher than predicted by random searching, and they probe fewer sites than random search predicts. They also recover caches at higher success rates than non-caching birds searching for the same caches, and they probe fewer sites than the non-caching birds. A difference in probing patterns for recovered caches between caching and non-caching birds suggests that caching birds use spatial memory and that the two groups follow different foraging strategies. Cache recovery order exhibits neither a primacy nor a recency effect, and it does not appear to correlate with nearest-neighbor distance models. / Master of Science food caching spatial memory seed dispersal blue jay LD5655.V855 1996.C354
124	Hessian-based occlusion-aware radiance caching Zhao, Yangyang 10 1900 Efficiently simulating global illumination is one of the most important open problems in computer graphics. Accurately computing the effects of indirect illumination, caused by secondary bounces of light off surfaces in a 3D scene, is generally expensive and is often done with algorithms such as path tracing or photon mapping.
These approaches numerically solve the rendering equation using stochastic Monte Carlo ray tracing. Ward et al. proposed irradiance caching to accelerate these techniques when computing the indirect illumination component on diffuse surfaces. Krivanek extended the approach of Ward and Heckbert to handle the more complex case of glossy surfaces, introducing an approach referred to as radiance caching. Jarosz et al. and Schwarzhaupt et al. proposed a more accurate visibility-aware Hessian-based model that greatly improves the placement of records in the scene in an irradiance caching context, significantly increasing the quality and performance of the baseline approach. In this thesis, we extend similar approaches from this prior work to the problem of radiance caching to improve the placement of records. We also discovered a crucial problem overlooked in previous work due to the choice of test scenes. We conducted a preliminary study of this problem and found several potential solutions worth further investigation. global illumination irradiance caching radiance caching image synthesis ray tracing photo-realistic rendering
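For readers unfamiliar with the baseline, the sketch below shows how a classic irradiance cache in the style of Ward et al. decides whether stored records can be reused at a new shading point. Each record i, with position p_i, normal n_i, irradiance E_i, and validity radius R_i, receives the weight w_i = 1 / (‖p − p_i‖ / R_i + √(1 − n · n_i)), and records with w_i > 1/a, for an error tolerance a, are averaged. This shows only the baseline heuristic; the Hessian-based models cited in the abstract refine where records are placed, and none of that is implemented here. The dictionary-based record format and the function names are assumptions.

```python
import math


def ward_weight(p, n, rec):
    """Ward et al.-style weight of cache record `rec` at point p with unit normal n."""
    dist = math.dist(p, rec["pos"])
    cos_angle = sum(a * b for a, b in zip(n, rec["normal"]))
    denom = dist / rec["R"] + math.sqrt(max(0.0, 1.0 - cos_angle))
    return math.inf if denom == 0.0 else 1.0 / denom


def interpolate_irradiance(p, n, records, a=1.0):
    """Weighted average of usable records, or None if a new record must be computed."""
    total_w = 0.0
    total = 0.0
    for rec in records:
        w = ward_weight(p, n, rec)
        if math.isinf(w):
            return rec["E"]  # query point coincides with a record
        if w > 1.0 / a:      # record is considered valid at p for tolerance a
            total_w += w
            total += w * rec["E"]
    return total / total_w if total_w > 0.0 else None


records = [
    {"pos": (0.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "E": 1.0, "R": 1.0},
    {"pos": (1.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "E": 3.0, "R": 1.0},
]
print(interpolate_irradiance((0.5, 0.0, 0.0), (0.0, 0.0, 1.0), records))
```

A `None` result is where a renderer would trace rays to compute a new record, so the quality of the validity radius R_i directly controls where records end up; that placement is what the Hessian-based work improves.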
125	Nuoxus: a proactive multimedia content caching model for Fog Radio Access Networks (F-RANs) Costa, Felipe Rabuske 28 February 2018 It is estimated that by 2020 about 50 billion mobile devices will be connected to wireless networks, and that 78% of the data traffic from such devices will be multimedia content. These estimates are driving the development of the fifth generation of mobile networks (5G). One of the most recently proposed architectures, Fog Radio Access Networks (F-RAN), gives components at the edge of the network processing and storage capacity dedicated to network activities. One of the main problems of this architecture is the heavy data traffic on its centralized communication channel, the fronthaul, which connects the antennas (F-APs) to the external network. In this context, we propose Nuoxus, a multimedia content caching model for F-RANs that aims to mitigate this problem.
By storing content in the nodes closest to the user, Nuoxus reduces the number of concurrent accesses to the fronthaul, one of the factors that aggravates communication latency in the network. Nuoxus can run on any network node that has storage and processing capacity, and it is responsible for managing that node's cache. Its content replacement policy uses the similarity of requests between the child nodes and the rest of the network as a factor in deciding whether the requested content is worth storing in the cache. Using this same process, Nuoxus also proactively suggests that child nodes with a high degree of similarity cache the content, anticipating a possible future access. Our analysis of the state of the art shows that no other work to date exploits request history to cache content proactively in multi-layer wireless network architectures without relying on a centralized component for caching coordination and prediction. To demonstrate the efficiency of the model, a prototype was developed using the ns-3 simulator. The results show that Nuoxus reduced network latency by about 29.75%. In addition, compared with other caching strategies, the number of cache hits on network components increased by 53.16% relative to the strategy with the second-best result. FOG Radio Access Networks Edge Radio Access Networks F-RAN Caching Nuoxus Cosine Similarity
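The keywords indicate that Nuoxus measures request similarity with cosine similarity. A minimal sketch of the proactive step, assuming a parent node keeps per-child request counts for each content item and suggests caching to siblings whose request vectors are close enough to the requester's. The threshold value, the node names, and the function names are hypothetical, and the thesis's replacement policy may weigh similarity differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse request-count vectors stored as dicts."""
    dot = sum(count * v.get(item, 0) for item, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def proactive_suggestions(histories, requester, content, threshold=0.8):
    """Children (other than the requester) that have not yet requested
    `content` but whose request history resembles the requester's."""
    base = histories[requester]
    targets = []
    for child, history in histories.items():
        if child == requester or history.get(content, 0) > 0:
            continue
        if cosine_similarity(base, history) >= threshold:
            targets.append(child)
    return sorted(targets)


# Hypothetical per-child request counts kept by a parent node.
histories = {
    "ap1": {"v1": 5, "v2": 3},
    "ap2": {"v1": 4, "v2": 3},
    "ap3": {"v9": 7},
}
print(proactive_suggestions(histories, "ap1", "v3"))  # prints ['ap2']
```

Because only per-child counts held at the parent are needed, a decision like this can be made locally at each node, which is consistent with the abstract's emphasis on avoiding a centralized coordinator.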
126	Mobility Metrics for Routing in MANETs Xu, Sanlin, SanlinXu@yahoo.com January 2007 A Mobile Ad hoc Network (MANET) is a collection of wireless mobile nodes forming a temporary network without base stations or any other preexisting network infrastructure. Mobile nodes communicate with each other in a peer-to-peer fashion using wireless multihop communication. Owing to its low cost, high flexibility, fast network establishment, and self-reconfiguration, ad hoc networking has received much interest during the last ten years. However, without a fixed infrastructure, frequent path changes generate significant numbers of routing packets to discover new paths, leading to greater network congestion and transmission latency than in fixed networks. Many on-demand routing protocols have been developed that use various mobility metrics to choose the most reliable routes while coping with the primary obstacle caused by node mobility. ¶ In the first part, we develop an analysis framework for mobility metrics in random mobility models. Unlike previous research, where mobility metrics were mostly studied by simulation, we derive analytical expressions for mobility metrics, including link persistence, link duration, link availability, link residual time, link change rate, and their path equivalents. We also show relationships between the different metrics, where they exist. These exact expressions constitute precise mathematical relationships between network connectivity and node mobility. ¶ We further validate our analysis framework in the Random Walk Mobility Model (RWMM). For both constant and random node velocity, we construct the transition matrix of a Markov chain model by analyzing the PDF of node separation after one epoch.
In addition, we present intuitive and simple expressions for the link residual time and link duration in the RWMM, which relate them directly to the ratio between transmission range and node speed. We also illustrate the relationship between link change rate and link duration. Finally, simulation results for all of these mobility metrics are reported and match the proposed analytical framework well. ¶ In the second part, we investigate applications of mobility metrics to caching strategies and hierarchical routing algorithms. When on-demand routing is employed, stale route cache information and frequent new-route discovery processes in MANETs generate considerable routing delay and overhead. This thesis proposes a practical route caching strategy that minimizes routing delay and/or overhead by setting the route cache timeout to a mobility metric, the expected path residual time. The strategy is independent of network traffic load and adapts to various non-identical link duration distributions, so it is feasible to implement in a real-time route caching scheme. Calculated results show that the routing delay achieved by the route caching scheme is only marginally above the theoretical minimum. Simulation in NS-2 demonstrates that our caching scheme can substantially reduce the end-to-end delay of DSR routing. Using an overhead analysis model, we show that minimum routing overhead can be achieved by increasing the timeout to around twice the expected path residual time, without a significant increase in routing delay. ¶ Beyond route caching, this thesis also addresses a link cache strategy, which has the potential to use route information more efficiently than a route cache scheme. Unlike some previous link cache schemes, which delete links at a fixed time after they enter the cache, we propose using either the expected path duration or the link residual time as the link cache timeout.
Simulation results in NS-2 show that both proposed link caching schemes improve network performance in DSR by reducing dropped data packets, latency, and routing overhead, with the link residual time scheme outperforming the path duration scheme. ¶ To handle large-scale MANETs, this thesis presents an adaptive k-hop clustering algorithm (AdpKHop), which selects clusterheads (CHs) using our CH selection metrics. The proposed CH selection criteria ensure that the chosen CHs are closer to the cluster centroid and more stable than other cluster members with respect to node mobility. Using a merging threshold based on the CH selection metric, 1-hop clusters can merge into k-hop clusters, where the size of each k-hop cluster adapts to the node mobility of the chosen CH. Moreover, we propose a routing overhead analysis model for the k-hop clustering algorithm, which depends on a range of network parameters such as link change rate (related to node mobility), node degree, and cluster density. Through this overhead analysis, we show that an optimal k-hop cluster density exists and is independent of node mobility. Therefore, the corresponding optimal cluster merging threshold can be used to organise k-hop clusters efficiently so as to achieve minimum routing overhead, which is highly desirable in large-scale networks. ¶ The work presented in this thesis provides a sound basis for future research on mobility analysis for mobile ad hoc networks, in areas such as mobility metrics, caching strategies, and k-hop clustering routing protocols. MANETs mobile ad hoc networks mobility metrics Markov chain model route caching link caching cache timeout minimum routing overhead hierarchy routing k-hop clustering clusterhead selection metric optimal cluster density
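A minimal sketch of the route cache timeout idea, assuming the expected path residual time of a discovered route is already known (the thesis derives it analytically; here it is simply passed in). `RouteCache` and `timeout_factor` are hypothetical names: a factor of 1 sets the timeout to the expected path residual time, while the overhead analysis above suggests a factor of about 2 when the goal is minimum routing overhead.

```python
class RouteCache:
    """Route cache whose per-entry timeout is a multiple of a mobility metric,
    the expected path residual time, instead of a fixed constant."""

    def __init__(self, timeout_factor=1.0):
        self.timeout_factor = timeout_factor
        self.entries = {}  # destination -> (path, expiry time)

    def add(self, dest, path, expected_path_residual_time, now):
        expiry = now + self.timeout_factor * expected_path_residual_time
        self.entries[dest] = (list(path), expiry)

    def lookup(self, dest, now):
        entry = self.entries.get(dest)
        if entry is None:
            return None
        path, expiry = entry
        if now >= expiry:
            # Route is presumed broken: drop it so route discovery is triggered.
            del self.entries[dest]
            return None
        return path


delay_oriented = RouteCache(timeout_factor=1.0)
overhead_oriented = RouteCache(timeout_factor=2.0)
for cache in (delay_oriented, overhead_oriented):
    cache.add("D", ["S", "A", "D"], expected_path_residual_time=4.0, now=0.0)
print(delay_oriented.lookup("D", now=6.0), overhead_oriented.lookup("D", now=6.0))
```

The link cache variant described in the abstract would attach the same kind of expiry to individual links, using the link residual time or expected path duration as the timeout.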
127	Context-aware media services delivery in heterogeneous environments for future media networks Ait Chellouche, Soraya 09 December 2011 Users' willingness to consume media services, together with the widespread proliferation of mobile devices interconnected via multiple wired and wireless networking technologies, places high demands on the Future Internet. It is commonly believed today that the Internet should evolve to provide end users with ubiquitous, high-quality media services in a scalable, reliable, efficient, and interoperable way. However, enabling such seamless media delivery raises a number of challenges. On the one hand, services should be more context-aware so that they can be delivered to a large and disparate computational context. On the other hand, current Internet media delivery infrastructures need to scale to meet the continuously growing number of users while keeping quality at a satisfactory level. In this context, we introduce a novel architecture that enables a collaborative framework for sharing and consuming Media Services within the Future Internet (FI). The architecture comprises a number of environments and layers aimed at improving today's media delivery networks and systems toward a better user experience.
In this thesis, we are particularly interested in enabling context-aware multimedia service provisioning that meets, on the one hand, users' expectations and needs and, on the other hand, the exponentially growing user demand these services experience. This thesis therefore addresses two major and demanding challenges: (1) designing a context-awareness framework that allows adaptive multimedia service provisioning, and (2) enhancing the media delivery platform to support large-scale media services. The proposed solutions are built on the virtual Home-Box layer newly introduced in the proposed architecture. First, to achieve context-awareness, two types of frameworks are proposed, based on the two main models for context representation. The markup-scheme-based framework aims at lightweight context management to ensure performance in terms of responsiveness. The second framework uses ontologies and rules to model and manage context, aiming at greater formality, better expressiveness, and easier sharing. However, ontologies are known to be complex and thus difficult to scale. Our aim is therefore to prove the feasibility of such a solution for multimedia service provisioning when context management is distributed across the Home-Box layer. For the enhancement of media service delivery, the idea is to leverage the disk storage and upload capacity of participating, already-deployed Home-Boxes to improve service performance, scalability, and reliability.
To this end, we address two issues commonly raised by content replication: (1) server selection, for which we propose a two-level anycast-based request redirection strategy that first filters candidates based on the clients' contexts and then, in a second stage, uses accurate network distance information based not only on end-to-end delay but also on server load; and (2) content placement and replacement in caches, for which we design an adaptive online popularity-based video caching strategy within the introduced HB overlay. Future Media Networks Context-awareness Ontology Video on Demand provisioning Request redirection Load balancing Caching Home-gateway evolution
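A minimal sketch of two-level request redirection, assuming the client's context is reduced to a single required media format and that the second level ranks servers by a weighted sum of normalized delay and load. The `Server` fields, the weights, the `max_delay_ms` normalization, and the Home-Box names are illustrative assumptions, not the thesis's actual rules or metric.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    formats: frozenset  # formats this Home-Box can serve (stand-in for context rules)
    load: float         # normalized load in [0, 1]
    delay_ms: float     # measured client-to-server delay


def select_server(servers, client_format, w_delay=0.5, w_load=0.5, max_delay_ms=200.0):
    # Level 1: context-based filtering (here, a single format rule).
    eligible = [s for s in servers if client_format in s.formats]
    if not eligible:
        return None

    # Level 2: rank the remaining servers by a weighted network score (lower is better).
    def score(s):
        return w_delay * min(s.delay_ms / max_delay_ms, 1.0) + w_load * s.load

    return min(eligible, key=score)


servers = [
    Server("hb1", frozenset({"h264-480p"}), load=0.2, delay_ms=30.0),
    Server("hb2", frozenset({"h264-480p", "h264-1080p"}), load=0.9, delay_ms=10.0),
    Server("hb3", frozenset({"h264-1080p"}), load=0.1, delay_ms=80.0),
]
print(select_server(servers, "h264-1080p").name)  # prints hb3
```

Here the closest server (hb2) loses to a lightly loaded but more distant one, which is the effect of considering server load alongside end-to-end delay.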
128	External Streaming State Abstractions and Benchmarking / Extern strömmande statliga abstraktioner och benchmarking Sree Kumar, Sruthi January 2021 (has links) Distributed data stream processing is a popular research area and is one of the promising paradigms for faster and efficient data management. Application state is a first-class citizen in nearly every stream processing system. Nowadays, stream processing is, by definition, stateful. For a stream processing application, the state is backing operations such as aggregations, joins, and windows. Apache Flink is one of the most accepted and widely used stream processing systems in the industry. One of the main reasons engineers choose Apache Flink to write and deploy continuous applications is its unique combination of flexibility and scalability for stateful programmability, and the firm guarantee that the system ensures. Apache Flink’s guarantees always make its states correct and consistent even when nodes fail or when the number of tasks changes. Flink state can scale up to its compute node’s hard disk boundaries using embedded databases to store and retrieve data. Nevertheless, in all existing state backends officially supported by Flink, the state is always available locally to compute tasks. Even though this makes deployment more convenient, it creates other challenges such as non-trivial state reconfiguration and failure recovery. At the same time, compute, and state are bound to be tightly coupled. This strategy also leads to over-provisioning and is counterintuitive on state intensive only workloads or compute-intensive only workloads. This thesis investigates an alternative state backend architecture, FlinkNDB, which can tackle these challenges. FlinkNDB decouples state and computes by using a distributed database to store the state. The thesis covers the challenges of existing state backends and design choices and the new state backend implementation. 
We have evaluated the implementation of FlinkNDB against existing state backends offered by Apache Flink. Apache Flink Distributed Systems NDB FlinkNDB State State Backends External State Stream Processing Systems Benchmarking Caching Computer and Information Sciences
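The contrast between task-local state and externalized state can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Flink's state backend API and not FlinkNDB's code: `StateBackend`, `LocalStateBackend`, `ExternalStateBackend`, and `CountPerKey` are hypothetical names, and a plain dictionary shared between task instances stands in for the distributed database.

```python
from abc import ABC, abstractmethod


class StateBackend(ABC):
    """Hypothetical keyed-state interface (not Flink's actual API)."""

    @abstractmethod
    def get(self, key):
        """Return the stored value for key, or None."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key."""


class LocalStateBackend(StateBackend):
    """State lives inside the task; a new task instance starts empty
    unless it is restored from a snapshot."""

    def __init__(self):
        self._state = {}

    def get(self, key):
        return self._state.get(key)

    def put(self, key, value):
        self._state[key] = value


class ExternalStateBackend(StateBackend):
    """State lives in a shared store. Assumption: a dict stands in for
    the distributed database client."""

    def __init__(self, store, operator_id):
        self._store = store
        self._operator_id = operator_id  # namespaces keys per operator

    def get(self, key):
        return self._store.get((self._operator_id, key))

    def put(self, key, value):
        self._store[(self._operator_id, key)] = value


class CountPerKey:
    """Stateful operator: running count of events per key."""

    def __init__(self, backend):
        self.backend = backend

    def process(self, key):
        count = (self.backend.get(key) or 0) + 1
        self.backend.put(key, count)
        return count


# Usage: a replacement task instance continues from the shared state.
shared_db = {}
task = CountPerKey(ExternalStateBackend(shared_db, "count"))
task.process("a")
task.process("a")
replacement = CountPerKey(ExternalStateBackend(shared_db, "count"))
print(replacement.process("a"))  # 3
```

Because the counter lives outside the task, a task started after a failure or a rescaling step reads the current value directly instead of having state shipped to it; with the task-local backend, the same continuity requires restoring a snapshot. The cost of this design is that every state access becomes a round trip to the database.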
129	Semantic Caching for XML Queries Chen, Li 29 January 2004 (has links) With the advent of XML, great challenges arise from the demand to retrieve information efficiently from remote XML sources across the Internet. Semantic caching can improve the efficiency of XML query processing in the Web environment. Unlike traditional tuple- or page-based caching systems, semantic caching systems reuse cached query results to answer new queries, based on query containment and rewriting techniques. Fundamental results on the containment of relational queries have been established. In the XML setting, however, the containment problem remains unexplored for comprehensive XML query languages such as XQuery, and little work has addressed cache management issues such as replacement. Hence, this dissertation addresses two issues fundamental to building an XQuery-based semantic caching system: XQuery containment and rewriting, and an effective replacement strategy. We first define a restricted XQuery fragment for which we tackle the containment problem. Given two queries $Q_1$ and $Q_2$, a preprocessing step consisting of variable minimization and query normalization transforms them into a normal form. Two tree structures are then constructed to represent, respectively, the pattern-matching and result-construction components of the query semantics. Based on these tree structures, query containment is reduced to tree homomorphism under specific mapping conditions. We also present the notation and theorems that support our XQuery containment and rewriting approaches. For cache replacement, we propose a fine-grained strategy based on detailed user access statistics recorded on the internal XML view structure, so that less frequently used XML view fragments are replaced, achieving better utilization of the cache space.
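The reduction of containment to tree homomorphism can be illustrated on a much smaller language than the thesis's XQuery fragment. The sketch below is an assumption-laden simplification: it handles only child-axis tree patterns with branching and `*` wildcards (an XPath-like fragment), ignores the result-construction tree, and uses the hypothetical names `PNode`, `maps_to`, and `contained_in`. A homomorphism from $Q_2$'s pattern into $Q_1$'s is a sufficient condition for $Q_1 \subseteq Q_2$; for a small fragment like this one it is also known to be necessary, which does not carry over to richer fragments.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """Tree-pattern node: element tag (or "*"), child-axis children,
    and a flag marking the query's output node."""
    label: str
    children: list = field(default_factory=list)
    out: bool = False


def maps_to(p, q):
    """True if some homomorphism sends pattern node p (and its subtree)
    onto pattern node q: tags agree unless p is "*", p's output node lands
    on q's output node, and every child of p maps to some child of q
    (distinct children may share an image)."""
    if p.label != "*" and p.label != q.label:
        return False
    if p.out and not q.out:
        return False
    return all(any(maps_to(c, d) for d in q.children) for c in p.children)


def contained_in(q1, q2):
    """Homomorphism test: Q2's pattern must map into Q1's, root to root."""
    return maps_to(q2, q1)


# /book[author]/title  versus  /book/title
q1 = PNode("book", [PNode("author"), PNode("title", out=True)])
q2 = PNode("book", [PNode("title", out=True)])
print(contained_in(q1, q2), contained_in(q2, q1))  # True False
```

Every title returned by `/book[author]/title` is also returned by `/book/title`, so a cached result of the second query could in principle answer the first after a rewriting step that re-applies the `[author]` filter.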
Finally, we have implemented a semantic caching system called ACE-XQ to realize the proposed techniques. Case studies confirm the correctness of our XQuery containment and rewriting approaches by comparing the query results produced by ACE-XQ against those produced by the remote XQuery engine. Experimental studies show that adopting ACE-XQ significantly improves query performance, and that our partial replacement improves cache hits and utilization compared with traditional total replacement. Replacement Strategy Query Rewriting Query Containment Semantic Caching Query XML XML (Document markup language) Cache memory Query languages (Computer science)
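The difference between partial (fragment-level) and total (whole-view) replacement can be sketched as a least-frequently-used policy applied to view fragments. This is a hypothetical illustration, not ACE-XQ's strategy: `Fragment`, `FragmentCache`, and the `path` addressing of fragments are assumptions, and a plain hit counter stands in for the detailed access statistics the thesis records on its internal XML view structure.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    view_id: str   # cached query result (view) the fragment belongs to
    path: str      # hypothetical address of the fragment inside the view
    size: int      # bytes occupied in the cache
    hits: int = 0  # accesses recorded so far


class FragmentCache:
    """Partial replacement: evict least-accessed fragments, not whole views."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fragments = {}  # (view_id, path) -> Fragment

    def used(self):
        return sum(f.size for f in self.fragments.values())

    def touch(self, view_id, path):
        """Record that a query was answered (partly) from this fragment."""
        self.fragments[(view_id, path)].hits += 1

    def admit(self, fragment):
        """Insert fragment, evicting least-hit fragments until it fits.
        Ties go to the fragment inserted earliest (dict order)."""
        if fragment.size > self.capacity:
            return False
        key = (fragment.view_id, fragment.path)
        self.fragments.pop(key, None)  # replace any older copy
        while self.used() + fragment.size > self.capacity:
            victim = min(self.fragments, key=lambda k: self.fragments[k].hits)
            del self.fragments[victim]
        self.fragments[key] = fragment
        return True


cache = FragmentCache(capacity=100)
cache.admit(Fragment("v1", "/book/title", 40))
cache.admit(Fragment("v1", "/book/review", 40))
cache.touch("v1", "/book/title")
cache.admit(Fragment("v2", "/article", 30))
print(sorted(cache.fragments))  # [('v1', '/book/title'), ('v2', '/article')]
```

Only the unused `/book/review` fragment of view `v1` is evicted; a total-replacement policy would have dropped all of `v1`, including the fragment that was actually being reused.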
130	Scalable Visual Hierarchy Exploration Stroe, Ionel Daniel 10 May 2000 (has links) More and more modern computer applications, from business decision support to scientific data analysis, use visualization techniques to support exploratory activities. Various tools have been proposed in the past decade to help users better interpret data using such display techniques. However, most do not scale well with the size of the dataset on which they operate: the level of clutter on the screen is typically unacceptable, and performance is poor. To solve the cluttering problem at the interface level, visualization tools have recently been extended to support hierarchical views of the data, with focusing and drilling down through interactive brushes. To solve the scalability problem, we investigate how best to couple such a visualization tool with a database management system without losing its real-time characteristics. This integration must be done carefully, since visual user interactions implemented as main-memory operations do not map directly onto efficient database operations. The main efficiency issue in this integration is avoiding the recursive processing required for hierarchical data retrieval. For this problem, we have developed a tree labeling method, called the MinMax tree, that moves the on-line recursive processing into an off-line precomputation step. Thus, at run time, the recursive processing operations translate into linear-cost range queries. In addition, we employ a main-memory access strategy to support incremental loading of data into main memory. The techniques have been incorporated into XmdvTool, a multidimensional visual exploration tool, to achieve scalability. The tool now scales successfully to datasets on the order of 10^5 to 10^7 records.
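The abstract does not spell out the MinMax labeling itself, so the sketch below uses one plausible interval labeling as an assumption: a single off-line traversal gives each node a `(lo, hi, depth)` triple, after which "the nodes k levels below X" becomes a plain range predicate that a DBMS can answer with an indexed scan instead of recursion. Python's built-in `sqlite3` module plays the database backend; `Node`, `label_minmax`, `load`, and `subtree_level` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str  # assumed unique within the hierarchy
    children: list = field(default_factory=list)


def label_minmax(root):
    """Off-line step: one depth-first pass assigns (lo, hi, depth) per node.

    Leaves receive consecutive integers; an internal node spans the first
    and last leaf numbers of its subtree, so descendants nest inside it.
    """
    labels = {}
    next_leaf = 0

    def visit(node, depth):
        nonlocal next_leaf
        if not node.children:
            labels[node.name] = (next_leaf, next_leaf, depth)
            next_leaf += 1
            return
        for child in node.children:
            visit(child, depth + 1)
        lo = labels[node.children[0].name][0]
        hi = labels[node.children[-1].name][1]
        labels[node.name] = (lo, hi, depth)

    visit(root, 0)
    return labels


def load(conn, labels):
    """Store the precomputed labels in a relational table with an index."""
    conn.execute("CREATE TABLE hier (name TEXT, lo INT, hi INT, depth INT)")
    conn.executemany(
        "INSERT INTO hier VALUES (?, ?, ?, ?)",
        [(name, lo, hi, d) for name, (lo, hi, d) in labels.items()],
    )
    conn.execute("CREATE INDEX hier_depth_lo ON hier (depth, lo)")


def subtree_level(conn, name, levels_below):
    """Run-time step: nodes `levels_below` (>= 0) levels under `name`,
    answered by one range query instead of a recursive traversal."""
    lo, hi, depth = conn.execute(
        "SELECT lo, hi, depth FROM hier WHERE name = ?", (name,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM hier WHERE depth = ? AND lo >= ? AND hi <= ? ORDER BY lo",
        (depth + levels_below, lo, hi),
    ).fetchall()
    return [row[0] for row in rows]


tree = Node("root", [
    Node("A", [Node("A1"), Node("A2")]),
    Node("B", [Node("B1", [Node("B1a"), Node("B1b")])]),
])
conn = sqlite3.connect(":memory:")
load(conn, label_minmax(tree))
print(subtree_level(conn, "root", 2))  # ['A1', 'A2', 'B1']
```

The depth condition matters: with leaf-based intervals, a node with a single child shares that child's interval, and comparing depths is what separates descendants from ancestors.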
Lastly, we report experimental results that illustrate the impact of the proposed techniques on the system's overall performance. semantic caching prefetching recursive queries hierarchical structures database backend visual exploration Visualization Data processing Database management Computer storage devices
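The abstract gives no detail on the main-memory access strategy for incremental loading. Purely as an assumption, one simple way to load hierarchy data incrementally is to remember which label ranges are already in memory and ask the database only for the uncovered gaps, in the spirit of the semantic caching named in the keywords. `missing_ranges`, `merge`, `IncrementalLoader`, and the `fetch` callback are hypothetical.

```python
def missing_ranges(loaded, lo, hi):
    """Sub-ranges of the closed integer range [lo, hi] not covered by
    `loaded`, a sorted list of disjoint closed intervals already in memory."""
    gaps = []
    cursor = lo
    for a, b in loaded:
        if b < cursor:
            continue
        if a > hi:
            break
        if a > cursor:
            gaps.append((cursor, a - 1))
        cursor = max(cursor, b + 1)
        if cursor > hi:
            break
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps


def merge(intervals):
    """Merge overlapping or adjacent closed integer intervals."""
    out = []
    for a, b in sorted(intervals):
        if out and a <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out


class IncrementalLoader:
    """Keeps rows keyed by label position in memory and asks the database
    only for positions it has not loaded yet."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical: fetch(a, b) -> iterable of (pos, row)
        self.loaded = []    # sorted, disjoint intervals already in memory
        self.rows = {}

    def get(self, lo, hi):
        gaps = missing_ranges(self.loaded, lo, hi)
        for a, b in gaps:
            self.rows.update(self.fetch(a, b))
        self.loaded = merge(self.loaded + gaps)
        return [self.rows[p] for p in sorted(self.rows) if lo <= p <= hi]


def fetch_from_db(a, b):  # stand-in for a range query against the DBMS
    return [(p, f"record {p}") for p in range(a, b + 1)]


loader = IncrementalLoader(fetch_from_db)
loader.get(0, 9)
loader.get(5, 14)     # only positions 10..14 are fetched
print(loader.loaded)  # [(0, 14)]
```

Because a subtree occupies one contiguous label range, a drill-down that overlaps an earlier view costs only the missing part of that range.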
