Global ETD Search

21	Efficient processing of multiway spatial join queries in distributed systems / Processamento eficiente de consultas de multi-junção espacial em sistemas distribuídos
Oliveira, Thiago Borges de
29 November 2017
Multiway spatial join is an important type of query in spatial data processing, and its efficient execution is a requirement to move spatial data analysis to scalable platforms, as has already happened with relational and unstructured data. In this thesis, we provide a set of comprehensive models and methods to efficiently execute multiway spatial join queries in distributed systems. We introduce a cost-based optimizer that is able to select a good execution plan for processing such queries in distributed systems taking into account: the partitioning of data based on the spatial attributes of datasets; the intra-operator level of parallelism, which enables high scalability; and the economy of cluster resources by appropriately scheduling the queries before execution.
We propose a cost model based on relevant metadata about the spatial datasets and the data distribution, which identifies the pattern of costs incurred when processing a query in this environment. We formalized the distributed multiway spatial join plan scheduling problem as a bi-objective linear integer model, considering the minimization of both the makespan and the communication cost as objectives. Three methods are proposed to compute schedules based on this model that significantly reduce the resource consumption required to process a query. Although targeting multiway spatial join query scheduling, these methods can be applied to other kinds of problems in distributed systems, notably problems that require both the alignment of data partitions and the assignment of jobs to machines. Additionally, we propose a method to control the usage of resources and increase system throughput in the presence of constraints on the network or processing capacity. The proposed cost-based optimizer was able to select good execution plans for all queries in our experiments, using public datasets with a significant range of sizes and complex spatial objects. We also present an execution engine that is capable of performing the queries with near-linear scalability with respect to execution time.
Keywords: Distributed multiway spatial join; Cost-based optimizer; Job scheduling; Histograms
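To make the two objectives named in this abstract concrete, the following minimal Python sketch scores one candidate schedule by makespan and by communication cost. It is an illustration under assumed inputs, not the thesis's integer model or any of its three scheduling methods: the job costs, partition sizes, partition placement, and the `evaluate_schedule` helper are all hypothetical.

```python
from collections import defaultdict

def evaluate_schedule(assignment, job_cost, job_inputs, partition_home, partition_size):
    """Score a job-to-machine assignment on two objectives.

    assignment:     job -> machine that runs it
    job_cost:       job -> estimated processing time (assumed known)
    job_inputs:     job -> list of data partitions the job reads
    partition_home: partition -> machine that stores it
    partition_size: partition -> size in bytes
    Returns (makespan, communication_cost).
    """
    load = defaultdict(float)
    communication = 0
    for job, machine in assignment.items():
        load[machine] += job_cost[job]
        for part in job_inputs[job]:
            # A partition stored elsewhere must be shipped to the job's machine.
            if partition_home[part] != machine:
                communication += partition_size[part]
    # Makespan is the finishing time of the most loaded machine.
    makespan = max(load.values(), default=0.0)
    return makespan, communication

# Hypothetical toy instance: three join jobs over four partitions on two machines.
job_cost = {"j1": 4.0, "j2": 3.0, "j3": 2.0}
job_inputs = {"j1": ["a1", "b1"], "j2": ["a2", "b2"], "j3": ["a1", "b2"]}
partition_home = {"a1": "m1", "b1": "m1", "a2": "m2", "b2": "m2"}
partition_size = {"a1": 100, "b1": 80, "a2": 120, "b2": 60}

balanced = {"j1": "m1", "j2": "m2", "j3": "m2"}
cheaper_network = {"j1": "m1", "j2": "m2", "j3": "m1"}
print(evaluate_schedule(balanced, job_cost, job_inputs, partition_home, partition_size))
print(evaluate_schedule(cheaper_network, job_cost, job_inputs, partition_home, partition_size))
```

In this toy instance neither schedule dominates the other: one finishes sooner while the other moves less data, which is the kind of trade-off a bi-objective formulation has to resolve.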
22	An investigation into parallel job scheduling using service level agreements
Ali, Syed Zeeshan
January 2014
A scheduler, as a central component of a computing site, aggregates computing resources and is responsible for distributing the incoming load (jobs) among the resources. In such an environment, the optimum performance of the system for service level agreement (SLA) based workloads can be achieved by calculating the priority of SLA-bound jobs using an integrated heuristic. The SLA defines the service obligations and expectations for using the computational resources. The integrated heuristic combines the different SLA terms, each with a specific weight. The weights are computed by applying a parameter sweep technique in order to obtain the best schedule for the optimum performance of the system under the workload. Sweeping the parameters of the integrated heuristic was observed to be computationally expensive. It becomes even more expensive if no value of the computed weights results in a performance improvement with the resulting schedule; in such situations it incurs computation cost without delivering optimum performance. Therefore, there is a need to detect the situations in which the integrated heuristic can be exploited beneficially. For that reason, in this thesis we propose a metric, based on the concept of utilization, to evaluate SLA-based parallel workloads of independent jobs and detect any impact of the integrated heuristic on the workload.
Keywords: 004
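The idea of a weighted combination of SLA terms tuned by a parameter sweep can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two SLA terms (deadline urgency and job value), the single-machine setting, the tardiness metric, and every name below are hypothetical and are not taken from the thesis.

```python
from itertools import product

# Hypothetical jobs: runtime, deadline, and a client-assigned value.
JOBS = [
    {"name": "a", "runtime": 4, "deadline": 4, "value": 1.0},
    {"name": "b", "runtime": 2, "deadline": 9, "value": 3.0},
    {"name": "c", "runtime": 3, "deadline": 5, "value": 2.0},
]

def priority(job, weights):
    """Integrated heuristic: weighted sum of SLA terms (higher runs first)."""
    w_deadline, w_value = weights
    urgency = 1.0 / job["deadline"]  # an earlier deadline is more urgent
    return w_deadline * urgency + w_value * job["value"]

def total_tardiness(jobs, weights):
    """Run jobs back to back in priority order and sum their lateness."""
    clock, tardiness = 0, 0
    for job in sorted(jobs, key=lambda j: priority(j, weights), reverse=True):
        clock += job["runtime"]
        tardiness += max(0, clock - job["deadline"])
    return tardiness

def sweep(jobs, grid):
    """Parameter sweep: evaluate every weight combination, keep the best."""
    return min(product(grid, repeat=2), key=lambda w: total_tardiness(jobs, w))

best = sweep(JOBS, [0.0, 0.5, 1.0])
print(best, total_tardiness(JOBS, best))
```

The sweep evaluates `len(grid) ** k` schedules for `k` SLA terms, which illustrates why the abstract calls it computationally expensive and why it helps to know beforehand whether any weighting can improve the schedule at all.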
23	Towards Workload-aware Efficient Machine Learning Systems
Khan, Redwan Ibne Seraj
03 March 2025
Machine learning (ML) is transforming various aspects of our lives, driving the need for computing systems that efficiently support large-scale ML workloads. As models grow in size and complexity, existing systems struggle to adapt, limiting both performance and flexibility. Additionally, ML techniques can enhance traditional computing tasks, but current systems lack the adaptability to integrate these advancements effectively. Building systems for running machine learning workloads and running workloads using machine learning both require a careful understanding of the nature of the systems and the ML models. In this dissertation, we design and develop a series of novel storage and scheduling solutions for ML systems by bringing attention to the unique characteristics of workloads and the underlying system. We find that by designing ML systems that are finely tuned to workload characteristics and underlying infrastructure, we can significantly enhance application performance and maximize resource utilization. In the first part of this dissertation (Ch. 3), we analyze popular ML models and datasets, uncovering insights that inspired SHADE, a data-importance-aware caching solution for ML. The second part of this dissertation (Ch. 4) proposes to leverage the system characteristics of hundreds of client devices, along with the characteristics of the samples within the clients, to design novel sampling, caching, and client scheduling mechanisms that tackle the data and system heterogeneity among client devices and thereby fundamentally improve the performance of federated learning using edge devices in the cloud.
The third part of this dissertation (Ch. 5) proposes to leverage multi-agent LLM application and user request characteristics to design an efficient request scheduling mechanism that can serve clients in multi-tenant environments in a fair and efficient manner while preventing abuse. My dissertation demonstrates that leveraging workload-aware strategies can significantly enhance the efficiency (e.g., reduced training time, increased throughput, lower latency) and flexibility (e.g., improved ease of use, deployment, and programmability) of machine learning systems. By accounting for workload dynamicity and heterogeneity, these principles can guide the design of next-generation ML systems, ensuring adaptability to emerging models and evolving hardware technologies. / Doctor of Philosophy / Machine learning (ML) has become an integral part of our daily lives, powering applications from virtual assistants to medical diagnostics. As ML models grow larger and more complex, the systems that run them must evolve to keep pace. This dissertation explores how we can build more efficient and adaptable computing systems to support large-scale ML workloads. Traditional computing systems often struggle to accommodate the ever-changing demands of ML applications. Similarly, ML techniques can be leveraged to improve the performance of non-ML workloads, but existing systems lack the flexibility to integrate these advancements seamlessly. This research tackles both challenges: designing systems optimized for ML workloads and enhancing traditional systems using ML-driven insights. By designing intelligent, workload-aware strategies, this research demonstrates substantial improvements in the speed, efficiency, and flexibility of ML systems. These principles will help shape the next generation of computing infrastructure, ensuring that future ML models and applications can be deployed smoothly, regardless of scale or complexity.
Keywords: Machine Learning; Deep Learning; Federated Learning; High Performance Computing; Cloud Computing; Storage Systems; Data Storage; Data Management; Machine Learning Systems; Job Scheduling; Resource Management; MLSys; SysML; Efficiency; Flexibility
24	Heuristic Methods For Job Scheduling In A Heat Treatment Shop To Maximize Kiln Utilization
Srinidhi, S
02 1900
Scheduling in the context of manufacturing systems has become increasingly important in order for organizations to achieve success in dynamic and competitive scenarios. Scheduling can be described as the allocation of available jobs over resources to meet the performance criteria defined in a domain. Our research work focuses on scheduling a given set of three-dimensional cylindrical items, each characterized by width w_j, height h_j, and depth d_j, onto parallel non-identical rectangular heat treatment kilns, such that the capacities of the kilns are optimally used. The problem is strongly NP-hard, as it generalizes the (one-dimensional) Bin Packing Problem (1BP), in which a set of n positive values w_j has to be partitioned into the minimum number of subsets so that the total value in each subset does not exceed the bin capacity W. The problem has been formulated as a variant of the 3D-BPP by following the MILP approach, and we propose a weight optimization heuristic that produces solutions comparable to those of the LP problem, in addition to reducing the computational complexity. Finally, we also propose a Decomposition Algorithm (DA) and validate the performance effectiveness of our heuristic. The numerical analyses provide useful insights that influence the shop-floor decision making process.
Keywords: Manufacturing Paradigms; Job-Shop Scheduling; Kiln Utilization - Job Scheduling; Kiln Utilization - Heuristics; Bin Packing Problem; Scheduling (Management); Strip Packing Problem (SPP); Container Packing Problem (CPP); Incompatible Jobs; Decomposition Algorithm (DA); Management
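The one-dimensional Bin Packing Problem described in this abstract can be illustrated with first-fit decreasing, a standard textbook approximation heuristic. The Python sketch below is not the weight optimization heuristic or the Decomposition Algorithm proposed in the thesis; the function name and the sample weights are assumptions made for illustration.

```python
def first_fit_decreasing(weights, capacity):
    """Pack weights into bins of the given capacity; return the list of bins.

    Items are placed largest first, each into the first bin with enough room.
    The result is a feasible packing, not necessarily a minimum one.
    """
    bins = []  # each bin is a list of weights whose sum is <= capacity
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item {w} exceeds bin capacity {capacity}")
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([w])
    return bins

# Hypothetical item weights w_j and bin capacity W.
packing = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(packing), packing)
```

For this sample the total weight is 20, so no packing can use fewer than two bins of capacity 10, and the heuristic happens to reach that bound.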
25	Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels
Bharath Kumar Comandur Jagannathan Raghunathan (9187466)
31 July 2020
This dissertation addresses the problem of how to design a convolutional neural network (CNN) that assigns semantic labels to points on the ground, given satellite image coverage of the area and, as ground truth, the noisy labels in OpenStreetMap (OSM). The problem is challenging for three reasons: (1) most of the images are likely to have been recorded from off-nadir viewpoints for the area of interest on the ground; (2) the user-supplied labels in OSM are frequently inaccurate and, not uncommonly, entirely missing; and (3) the area covered on the ground must be large enough to possess any engineering utility. As this dissertation demonstrates, solving this problem requires that we first construct a DSM (Digital Surface Model) from a stereo fusion of the available images, and subsequently use the DSM to map the individual pixels in the satellite images to points on the ground. That creates an association between the pixels in the images and the noisy labels in OSM. The CNN-based solution we present yields a 4-8% improvement in the per-class segmentation IoU (Intersection over Union) scores compared to the traditional approaches that use the views independently of one another. The system we present is end-to-end automated, which facilitates comparing classifiers trained directly on true orthophotos vis-à-vis first training them on the off-nadir images and subsequently translating the predicted labels to geographical coordinates. This work also presents, for arguably the first time, an in-depth discussion of large-area image alignment and DSM construction using tens of true multi-date and multi-view WorldView-3 satellite images on a distributed OpenStack cloud computing platform.
Keywords: Photogrammetry and Remote Sensing; Computer Vision; Semantic segmentation; Deep Learning Applications; Open Street Map; Data Fusion Approach; Stereo Matching Algorithm; 3D reconstruction, multi-view data processing satellite images; Building detection; Road detection; Remote sensing imagery; Automated methods
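The per-class IoU (Intersection over Union) score mentioned in this abstract is a standard segmentation metric, and a minimal Python sketch shows how it is computed. The flattened label lists, the class IDs, and the `per_class_iou` helper are hypothetical; the dissertation's own evaluation code is not shown in the source.

```python
def per_class_iou(predicted, truth, num_classes):
    """Return the IoU of each class for two equally long label sequences.

    predicted, truth: per-pixel class IDs (label maps flattened to lists).
    A class that appears in neither sequence gets None instead of a score.
    """
    if len(predicted) != len(truth):
        raise ValueError("label sequences must have the same length")
    scores = []
    for c in range(num_classes):
        # Pixels that both maps assign to class c.
        intersection = sum(1 for p, t in zip(predicted, truth) if p == c and t == c)
        # Pixels that at least one map assigns to class c.
        union = sum(1 for p, t in zip(predicted, truth) if p == c or t == c)
        scores.append(intersection / union if union else None)
    return scores

# Hypothetical six-pixel example with three classes.
predicted = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
print(per_class_iou(predicted, truth, num_classes=3))
```

Because the score is reported per class, an improvement on a rare class is visible even when overall pixel accuracy barely changes.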
26	Δενδρικές δομές διαχείρισης πληροφορίας και βιομηχανικές εφαρμογές / Tree structures for information management and industrial applications
Σοφοτάσιος, Δημήτριος
06 February 2008
The dissertation examines problems of efficient organization of spatial data, proposes specific tree structures for their management, and finally gives examples of their use in specific application areas. The first chapter deals with the geometric problem of finding the iso-oriented rectangles that enclose a query object, which can be an iso-oriented rectangle, a point, or a vertical or horizontal line segment. A multilevel tree structure is proposed to solve the problem, which improves on the complexities of the best previously known solutions. The second chapter examines the problem of point retrieval on polygons. The proposed geometric structure is also multilevel and is efficient when the query polygon has specific properties. The third chapter is about the application of tree structures to two manufacturing problems. The first one concerns reducing the complexity of collision detection as a robotic arm moves in a planar scene with obstacles. The solution uses a priority queue and a UNION-FIND structure, and applies known data structures and algorithms of Computational Geometry such as construction of convex hulls, polygon inclusion testing, etc. The second problem is about material requirements planning (MRP) in a manufacturing production system. To this end an MRP processor was developed, which uses linked lists and runs in main memory to retain efficiency. The last chapter examines the production control problem, and more specifically the job scheduling problem. In this context, an intelligent scheduling system was designed and developed for flow shop production control, which combines knowledge-based technology and simulation with on-line control in order to support the production manager in decision making.
Keywords: 658.403 8; Geometric data structures; Range searching; Enclosure of iso-oriented objects; Point retrieval on polygons; Geometric duality; Half plane searching; Collision detection; Priority queues; Union-find structures; Convex hulls; Material requirements planning; Manufacturing production control; Job scheduling; Intelligent decision support systems; Knowledge-based technology; Simulation with on-line control
