  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

[en] A DATA-CENTRIC APPROACH TO IMPROVING SEGMENTATION MODELS WITH DEEP LEARNING IN MAMMOGRAPHY IMAGES / [pt] UMA ABORDAGEM CENTRADA EM DADOS PARA O APRIMORAMENTO DE MODELOS DE SEGMENTAÇÃO COM APRENDIZADO PROFUNDO EM IMAGENS DE MAMOGRAFIA

SANTIAGO STIVEN VALLEJO SILVA 07 December 2023
[pt] A segmentação semântica das estruturas anatômicas em imagens de mamografia desempenha um papel significativo no apoio da análise médica. Esta tarefa pode ser abordada com o uso de um modelo de aprendizado de máquina, que deve ser capaz de identificar e delinear corretamente as estruturas de interesse tais como papila, tecido fibroglandular, músculo peitoral e tecido gorduroso. No entanto, a segmentação de estruturas pequenas como papila e peitoral é frequentemente um desafio. Sendo o maior desafio o reconhecimento ou deteção do músculo peitoral na vista craniocaudal (CC), devido ao seu tamanho variável, possíveis ausências e sobreposição de tecido fibroglandular. Para enfrentar esse desafio, este trabalho propõe uma abordagem centrada em dados para melhorar o desempenho do modelo de segmentação na papila mamária e no músculo peitoral. Especificamente, aprimorando os dados de treinamento e as anotações em duas etapas. A primeira etapa é baseada em modificações nas anotações. Foram desenvolvidos algoritmos para buscar automaticamente anotações fora do comum dependendo da sua forma. Com estas anotações encontradas, foi feita uma revisão e correção manual. A segunda etapa envolve um downsampling do conjunto de dados, reduzindo as amostras de imagens do conjunto de treinamento. Foram analisados os casos de falsos positivos e falsos negativos, identificando as imagens que fornecem informações confusas, para posteriormente removê-las do conjunto. Em seguida, foram treinados modelos usando os dados de cada etapa e foram obtidas as métricas de classificação para o músculo peitoral em vista CC e o IoU para cada estrutura nas vistas CC e MLO (Mediolateral Oblíqua). Os resultados do treinamento mostram uma melhora progressiva na identificação e segmentação do músculo peitoral em vista CC e uma melhora na papila em vista MLO, mantendo as métricas para as demais estruturas. 
/ [en] The semantic segmentation of anatomical structures in mammography images plays a significant role in supporting medical analysis. This task can be approached using a machine learning model, which must be capable of identifying and accurately delineating the structures of interest, such as the nipple, fibroglandular tissue, pectoral muscle, and fatty tissue. However, segmentation of small structures such as the nipple and the pectoral muscle is often challenging. The greatest challenge is the recognition or detection of the pectoral muscle in the craniocaudal (CC) view, due to its variable size, possible absence, and overlap with fibroglandular tissue. To tackle this challenge, this work proposes a data-centric approach to improve the segmentation model's performance on the mammary papilla and pectoral muscle, specifically by enhancing the training data and annotations in two stages. The first stage is based on modifications to the annotations. Algorithms were developed to automatically search for unusual annotations based on their shape. Once these annotations were found, they were manually reviewed and corrected. The second stage involves downsampling the dataset, reducing the number of image samples in the training set. Cases of false positives and false negatives were analyzed to identify images that provide confusing information, which were subsequently removed from the set. Next, models were trained using the data from each stage, and classification metrics were obtained for the pectoral muscle in the CC view, along with the IoU for each structure in the CC and MLO (mediolateral oblique) views. The training results show a progressive improvement in the identification and segmentation of the pectoral muscle in the CC view and an improvement for the mammary papilla in the MLO view, while maintaining segmentation metrics for the other structures.
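The abstract above reports IoU (intersection over union) per structure. As a minimal sketch of how that metric is commonly computed for one class, the following Python function compares two binary masks. The function name `iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 (relevant when a structure such as the pectoral muscle is absent from a view) are assumptions made for illustration; the abstract does not say how the thesis handles these details.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Illustrative helper, not taken from the thesis.
    """
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel labelled in both masks
            if p or t:
                union += 1  # pixel labelled in at least one mask
    # Assumed convention: if the structure is absent from both masks,
    # treat the prediction as a perfect match.
    return 1.0 if union == 0 else inter / union


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 2 shared pixels out of 4 labelled pixels -> 0.5
```

For a multi-class segmentation, the same function would be applied once per structure after turning each class label into its own binary mask.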
2

Decidability and complexity of simulation preorder for data-centric Web services / Décidabilité et complexité de la relation de simulation des services Web orientés données

Akroun, Lakhdar 08 December 2014
Dans cette thèse nous nous intéressons au problème d’analyse des spécifications des protocoles d’interactions des services Web orientés données. La spécification de ce type de protocoles inclut les données en plus de la signature des opérations et des contraintes d’ordonnancement des messages. L’analyse des services orientés données est complexe car l’exécution d’un service engendre une infinité d’états. Notre travail se concentre autour du problème d’existence d’une relation de simulation quand les spécifications des protocoles des services Web sont représentés en utilisant un système à transition orienté données. D’abord nous avons étudié le modèle Colombo [BCG+05]. Dans ce modèle, un service (i) échange des messages en utilisant des variables ; (ii) modifie une base de donnée partagée ; (iii) son comportement est modélisé avec un système à transition. Nous montrons que tester l’existence de la relation de simulation entre deux services Colombo non bornée est indécidable. Puis, nous considérons le cas où les services sont bornés. Nous montrons pour ce cas que le test de simulation est (i) exptime-complet pour les services Colombo qui n’accèdent pas à la base de donnée (noté ColomboDB=∅), et (ii) 2exptime-complet quand le service peut accéder à une base de donnée bornée (Colombobound). Dans la seconde partie de cette thèse, nous avons défini un modèle générique pour étudier l’impact de différents paramètres sur le test de simulation dans le contexte des services Web orientés données. Le modèle générique est un système à transition gardé qui peut lire et écrire à partir d’une base de donnée et échanger des messages avec son environnement (d’autres services ou un client). Dans le modèle générique toutes les actions sont des requêtes sur des bases de données (modification de la base de données, messages échangés et aussi les gardes). 
Dans ce contexte, nous avons obtenu les résultats suivant : (i) pour les services gardés sans mise à jour, le test de simulation est caractérisé par rapport à la décidabilité du test de satisfiabilité du langage utilisé pour exprimer les gardes augmenté avec une forme restrictive de négation, (ii) pour les services sans mise à jour mais qui peuvent envoyer comme message le résultat d’une requête, nous avons trouvé des conditions suffisantes d’indécidabilité et de décidabilité par rapport au langage utilisé pour exprimer l’échange de messages, et (iii) nous avons étudié le cas des services qui ne peuvent que insérer des tuples dans la base de donnée. Pour ce cas, nous avons étudié la simulation ainsi que la weak simulation et nous avons montré que : (a) la weak simulation est indécidable quand les requêtes d’insertion sont des requêtes conjonctives, (b) le test de simulation est indécidable si la satisfiabilité du langage de requête utilisé pour exprimer les insertions augmenté avec une certaine forme de négation est indécidable. Enfin, nous avons étudié l’interaction entre le langage utilisé pour exprimer les gardes et celui utilisé pour les insertions, nous exhibons une classe de service où la satisfiabilité des deux langages est décidable alors que le test de simulation entre les services qui leur sont associés ne l’est pas. / In this thesis we address the problem of analyzing specifications of data-centric Web service interaction protocols (also called data-centric business protocols). Specifications of such protocols include data in addition to operation signatures and messages ordering constraints. Analysis of data-centric services is a complex task because of the inherently infinite states of the underlying service execution instances. Our work focuses on characterizing the problem of checking a refinement relation between service interaction protocol specifications. 
More specifically, we consider the problem of checking the simulation preorder when service business protocols are represented using data-centric state machines. First we study the Colombo model [BCG+05]. In this framework, a service (i) exchanges messages using variables; (ii) acts on a shared database; (iii) has a transition-based behavior. We show that the simulation test for unbounded Colombo is undecidable. Then, we consider the case of bounded Colombo, where we show that simulation is (i) EXPTIME-complete for Colombo services without any access to the database (denoted ColomboDB=∅), and (ii) 2EXPTIME-complete when only bounded databases are considered (the resulting model is denoted Colombobound). In the second part of this thesis, we define a generic model to study the impact of various parameters on the simulation test in the context of data-centric services. The generic model is a guarded transition system acting (i.e., reading and writing) on databases (i.e., local and shared) and exchanging messages with its environment (i.e., other services or users). The model was designed from a database theory perspective, where all actions are viewed as queries (i.e., modifications of databases, message exchanges, and guards). 
In this context, we obtain the following results: (i) for update-free guarded services (i.e., generic services with guards that are only able to send empty messages), the decidability of simulation is fully characterized with respect to the decidability of satisfiability of the query language used to express the guards, augmented with a restrictive form of negation; (ii) for update-free send services (i.e., generic services without guards that are able to send as messages the results of queries over the local and shared databases), we exhibit sufficient conditions for both decidability and undecidability of the simulation test with respect to the language used to compute message payloads; and (iii) we study the case of insert services (i.e., generic services without guards that are able to insert the results of queries into the local and shared databases). In this case, we study both the simulation and the weak simulation relations, and we show that (a) weak simulation is undecidable when the insertions are expressed as conjunctive queries, and (b) simulation is undecidable if satisfiability of the query language used to express the insertions, augmented with a restricted form of negation, is undecidable. Finally, we study the interaction between the queries used as guards and those used as insertions, and we exhibit a class of services where satisfiability of both languages is decidable while simulation is undecidable.
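For orientation, the simulation preorder that this abstract studies can be computed directly when the transition systems are finite and carry no data. The Python sketch below shows that textbook baseline only: it starts from all state pairs and removes pairs until a fixpoint is reached. It is not the thesis's procedure, since the data-centric models discussed above have infinitely many states. The function name `simulation`, the dictionary encoding of transitions, and the two toy protocols are assumptions made for illustration.

```python
def simulation(states_a, states_b, trans_a, trans_b):
    """Largest simulation relation between two finite labelled transition systems.

    Returns the set of pairs (p, q) such that state q of B simulates state p of A.
    trans_x maps a state to a set of (label, successor) pairs.
    """
    rel = {(p, q) for p in states_a for q in states_b}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            for (label, p_next) in trans_a.get(p, ()):
                # q must match every move of p with the same label,
                # landing in a pair that is still in the relation.
                matched = any(
                    lbl == label and (p_next, q_next) in rel
                    for (lbl, q_next) in trans_b.get(q, ())
                )
                if not matched:
                    rel.discard((p, q))
                    changed = True
                    break
    return rel


# Hypothetical protocols: B offers everything A does, plus a "fail" reply.
states_a = {"a0", "a1", "a2"}
states_b = {"b0", "b1", "b2"}
trans_a = {"a0": {("req", "a1")}, "a1": {("ok", "a2")}}
trans_b = {"b0": {("req", "b1")}, "b1": {("ok", "b2"), ("fail", "b2")}}

rel_ab = simulation(states_a, states_b, trans_a, trans_b)
rel_ba = simulation(states_b, states_a, trans_b, trans_a)
print(("a0", "b0") in rel_ab)  # True: B simulates A
print(("b0", "a0") in rel_ba)  # False: A cannot answer B's "fail" move
```

The loop terminates because each pass either removes at least one pair from a finite set or stops. With guards, databases, and message payloads the state space is no longer finite, which is why decidability becomes the central question in the thesis.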
3

Opacité des artefacts d'un système Workflow / Opacity of artifacts in Workflow system

Diouf, Mohamadou Lamine 10 October 2014
Une propriété d'un objet est dite opaque pour un observateur si celui-ci ne peut déduire que la propriété est satisfaite sur la base de l'observation qu'il a de cet objet. Supposons qu'un certain nombre de propriétés (appelées secrets) soient attachées à chaque intervenant d'un système, nous dirons alors que le système lui-même est opaque si chaque secret d'un observateur lui est opaque : il ne peut percer aucun des secrets qui lui ont été attachés. L'opacité a été étudiée préalablement dans le contexte des systèmes à événements discrets où différents jeux d'hypothèses ont pu être identifiés pour lesquels on pouvait d'une part décider de l'opacité d'un système et d'autre part développer des techniques pour diagnostiquer et/ou forcer l'opacité. Cette thèse constitue la première contribution au problème de l'opacité des artefacts d'un système à flots de tâches (système workflow). Notre propos est par conséquent de formaliser ce problème en dégageant les hypothèses qui doivent être posées sur ces systèmes pour que l'opacité soit décidable. Nous indiquons quelques techniques pour assurer l'opacité d'un système. / A property (of an object) is opaque to an observer when he or she cannot deduce that the property holds from his or her observations of the object. If each observer is attached to a given set of properties (the so-called secrets), then the system is said to be opaque if each secret is opaque to the corresponding observer. Opacity has previously been studied in the context of discrete event systems, where sets of assumptions were identified under which opacity can be decided and techniques were developed to diagnose and/or enforce it. This thesis is the first attempt to formalize opacity of artifacts in data-centric workflow systems. We motivate this problem and give some assumptions that guarantee the decidability of opacity. Some techniques for enforcing opacity are indicated.
4

Au-delà des frontières entre langages de programmation et bases de données / Breaking boundaries between programming languages and databases

Lopez, Julien 13 September 2019
Plusieurs classes de solutions permettent d'exprimer des requêtes dans des langages de programmation: les interfaces spécifiques telles que JDBC, les mappings objet-relationnel ou object-relational mapping en anglais (ORMs) comme Hibernate, et les frameworks de requêtes intégrées au langage comme le framework LINQ de Microsoft. Cependant, la plupart de ces solutions ne permet de requêtes visant plusieurs bases de données en même temps, et aucune ne permet l'utilisation de logique d'application complexe dans des requêtes aux bases de données. Dans cette thèse, nous détaillons la création d'un framework de requêtes intégrées au langage nommé BOLDR qui permet d'évaluer dans les bases de données des requêtes écrites dans des langages de programmation généralistes qui contiennent de la logique d'application, et qui ciblent différentes bases de données potentiellement basées sur des modèles de données différents. Dans ce framework, les requêtes d'une application sont traduites vers une représentation intermédiaire de requêtes, puis réécrites pour éviter le phénomène "d'avalanche de requêtes" et pour profiter au maximum des capacités d'optimisation des bases de données, et enfin envoyées pour évaluation vers les bases de données ciblées et les résultats obtenus sont convertis dans le langage de programmation de l'application. Nos expériences montrent que les techniques implémentées dans ce framework sont applicables pour de véritables applications centrées données, et permettent de gérer efficacement un vaste champ de requêtes intégrées à des langages de programmation généralistes. / Several classes of solutions allow programming languages to express queries: Specific APIs such as JDBC, Object-Relational Mappings (ORMs) such as Hibernate, and language-integrated query frameworks such as Microsoft's LINQ. 
However, most of these solutions do not allow for efficient cross-database queries, and none allows the use of complex application logic from the programming language in queries. In this thesis, we create a language-integrated query framework called BOLDR that, in particular, allows the evaluation in databases of queries written in general-purpose programming languages that contain application logic, and that target different databases of possibly different data models. In this framework, application queries are translated to an intermediate representation, then rewritten in order to avoid query avalanches and make the most of database optimizations, and finally sent for evaluation to the corresponding databases; the results are then converted back into values of the application's programming language. Our experiments show that the techniques we implemented are applicable to real-world database applications, successfully handling a variety of language-integrated queries with good performance.
5

Test Data Extraction and Comparison with Test Data Generation

Raza, Ali 01 August 2011
Testing an integrated information system that relies on data from multiple sources can be a challenge, particularly when the data is confidential. This thesis describes a novel test data extraction approach, called semantic-based test data extraction for integrated systems (iSTDE) that solves many of the problems associated with creating realistic test data for integrated information systems containing confidential data. iSTDE reads a consistent cross-section of data from the production databases, manipulates that data to obscure individual identities while still preserving overall semantic data characteristics that are critical to thorough system testing, and then moves that test data to an external test environment. This thesis also presents a theoretical study that compares test-data extraction with a competing technique, named test-data generation. Specifically, this thesis a) describes a comparison method that includes a comprehensive list of characteristics essential for testing the database applications organized into seven different areas, b) presents an analysis of the relative strengths and weaknesses of the different test-data creation techniques, and c) reports a number of specific conclusions that will help testers make appropriate choices.
6

XML και σχεσιακές βάσεις δεδομένων: πλαίσιο αναφοράς και αξιολόγησης / XML and relational databases: a frame of report and evaluation

Παλιανόπουλος, Ιωάννης 16 May 2007
Η eXtensible Markup Language (XML) είναι εμφανώς το επικρατέστερο πρότυπο για αναπαράσταση δεδομένων στον Παγκόσμιο Ιστό. Αποτελεί μια γλώσσα περιγραφής δεδομένων, κατανοητή τόσο από τον άνθρωπο, όσο και από τη μηχανή. Η χρήση της σε αρχικό στάδιο περιορίστηκε στην ανταλλαγή δεδομένων, αλλά λόγω της εκφραστικότητάς της (σε αντίθεση με το σχεσιακό μοντέλο) μπορεί να αποτελέσει ένα αποτελεσματικό \"όχημα\" μεταφοράς και αποθήκευσης πληροφορίας. Οι σύγχρονες εφαρμογές κάνουν χρήση της τεχνολογίας XML εξυπηρετώντας ανάγκες διαλειτουργικότητας και επικοινωνίας. Ωστόσο, θεωρείται βέβαιο ότι η χρήση της σε επίπεδο υποδομής θα ενδυναμώσει περαιτέρω τις σύγχρονες εφαρμογές. Σε επίπεδο υποδομής, μια βάση δεδομένων που διαχειρίζεται την γλώσσα XML είναι σε θέση να πολλαπλασιάσει την αποδοτικότητά της, εφόσον η βάση δεδομένων μετατρέπεται σε βάση πληροφορίας. Έτσι, όσο οι εφαρμογές γίνονται πιο σύνθετες και απαιτητικές, η ενδυνάμωση των βάσεων δεδομένων με τεχνολογίες που φέρουν/εξυπηρετούν τη σημασιολογία των προβλημάτων υπόσχεται αποτελεσματικότερη αντιμετώπιση στο παραπάνω μέτωπο. Αλλά ποιος είναι ο καλύτερος τρόπος αποδοτικού χειρισμού των XML εγγράφων (XML documents); Με μια πρώτη ματιά η απάντηση είναι προφανής. Εφόσον ένα XML έγγραφο αποτελεί παράδειγμα μιας σχετικά νέας τεχνολογίας, γιατί να μη χρησιμοποιηθούν ειδικά συστήματα για το χειρισμό της; Αυτό είναι πράγματι μια βιώσιμη προσέγγιση και υπάρχει σημαντική δραστηριότητα στην κοινότητα των βάσεων δεδομένων που εστιάζει στην εκμετάλλευση αυτής της προσέγγισης. Μάλιστα, για το σκοπό αυτό, έχουν δημιουργηθεί ειδικά συστήματα βάσεων δεδομένων, οι επονομαζόμενες \"Εγγενείς XML Βάσεις Δεδομένων\" (Native XML Databases). Όμως, το μειονέκτημα της χρήσης τέτοιων συστημάτων είναι ότι αυτή η προσέγγιση δεν αξιοποιεί την πολυετή ερευνητική δραστηριότητα που επενδύθηκε για την τεχνολογία των σχεσιακών βάσεων δεδομένων. 
Είναι πράγματι γεγονός ότι δεν αρκεί η σχεσιακή τεχνολογία και επιβάλλεται η ανάγκη για νέες τεχνικές; Ή μήπως με την κατάλληλη αξιοποίηση των υπαρχόντων συστημάτων μπορεί να επιτευχθεί ποιοτική ενσωμάτωση της XML; Σε αυτήν την εργασία γίνεται μια μελέτη που αφορά στην πιθανή χρησιμοποίηση των σχεσιακών συστημάτων βάσεων δεδομένων για το χειρισμό των XML εγγράφων. Αφού αναλυθούν θεωρητικά οι τρόποι με τους οποίους γίνεται αυτό, στη συνέχεια εκτιμάται πειραματικά η απόδοση σε δύο από τα πιο δημοφιλή σχεσιακά συστήματα βάσεων δεδομένων. Σκοπός είναι η χάραξη ενός πλαισίου αναφοράς για την αποτίμηση και την αξιολόγηση των σχεσιακών βάσεων δεδομένων που υποστηρίζουν XML (XML-enabled RDBMSs). / The eXtensible Markup Language (XML) is clearly the prevailing standard for data representation on the World Wide Web (WWW). It is a data description language comprehensible by both humans and computers. Its use was initially limited to the exchange of data, but owing to its expressiveness (in contrast to the relational model) it can constitute an effective "vehicle" for transporting, handling, and storing information. Contemporary applications make heavy use of XML technology in order to support communication and interoperability. However, supporting XML at the infrastructure level would reduce application development time, would make applications almost automatically compliant with standards, and would make them less error-prone. In terms of infrastructure, a database able to handle XML properly would be beneficial to a wide range of applications, thus multiplying its efficiency. In this way, as applications become more complex and demanding, strengthening databases with technologies that serve the nature of the problems promises a more effective response to this challenge. But how can XML documents be supported at the infrastructure level? At first glance, the answer seems obvious. 
Since XML constitutes a relatively new technology, new XML-aware infrastructures can be built from scratch. This is indeed a viable approach, and there is considerable activity in the database research community that focuses on exploiting it. In particular, this is the reason why special database systems have been created, called "Native XML Databases". However, the disadvantage of using such systems is that this approach does not build on the existing knowledge in the relational database field. The research question is therefore whether relational technology is able to correctly support XML data. In this thesis, we present a study of whether relational database management systems (RDBMSs) provide suitable ground for handling XML documents. After a theoretical analysis of the ways in which RDBMSs handle XML, the performance of two of the most popular relational database management systems is assessed experimentally. The aim is to establish a reference framework for the assessment and evaluation of relational database management systems that support XML (XML-enabled RDBMSs).
7

Design Principles for Data Export : Action Design Research in U-CARE

Mustafa, Mudassir Imran January 2012
In this thesis, we report the findings of designing data export functionality in the Uppsala University Psychosocial Care Program (U-CARE) at Uppsala University. The aim of this thesis was to explore the design space for generic data export functionality for data analysis in data-centric clinical research applications. This was attained by the construction and evaluation of a prototype for a data-centric clinical research application. For this purpose, Action Design Research (ADR) was conducted, situated in the domain of clinical research. The results consist of a set of design principles expressing key aspects that need to be addressed when designing data export functionality. Each of the artifacts derived from the development and evaluation process constitutes an example of how to design data export functionality of this kind.
8

Data Quality Evaluation and Improvement for Machine Learning

Chen, Haihua 05 1900
This research focuses on data-centric AI, with a specific concentration on data quality evaluation and improvement for machine learning. We first present a practical framework for data quality evaluation and improvement, using the legal domain as a case study, and build a corpus for legal argument mining. We created an initial corpus with 4,937 manually labeled instances. We define five data quality evaluation dimensions: comprehensiveness, correctness, variety, class imbalance, and duplication, and conduct a quantitative evaluation on these dimensions for the legal dataset and for two existing datasets in the medical domain for medical concept normalization. The first group of experiments showed that class imbalance and insufficient training data are the two major data quality issues that negatively impacted the quality of the system built on the legal corpus. The second group of experiments showed that the overlap between the test datasets and the training datasets, which we define as "duplication," is the major data quality issue for the two medical corpora. We explore several widely used machine learning methods for data quality improvement. Compared to pseudo-labeling, co-training, and expectation-maximization (EM), a generative adversarial network (GAN) is more effective for automated data augmentation, especially when a small portion of labeled data and a large amount of unlabeled data are available. The data validation process, the performance improvement strategy, and the machine learning framework for data evaluation and improvement discussed in this dissertation can be used by machine learning researchers and practitioners to build high-performance machine learning systems. All the materials, including the data, code, and results, will be released at: https://github.com/haihua0913/dissertation-dqei.
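Two of the dimensions named in this abstract, duplication and class imbalance, lend themselves to simple numeric checks. The Python sketch below shows one plausible way to measure them. The abstract only defines duplication as overlap between test and training data, so the exact formulas, the function names `duplication_rate` and `imbalance_ratio`, and the toy data are assumptions made for illustration, not the dissertation's definitions.

```python
from collections import Counter


def duplication_rate(train, test):
    """Fraction of test instances that also appear verbatim in the training set.

    Assumed measure; near-duplicates are not detected.
    """
    if not test:
        return 0.0
    seen = set(train)
    return sum(1 for item in test if item in seen) / len(test)


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class.

    Assumes `labels` is non-empty; 1.0 means perfectly balanced.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data.
train_texts = ["claim A", "premise B", "claim C"]
test_texts = ["premise B", "claim D"]
labels = ["claim", "claim", "claim", "premise"]

print(duplication_rate(train_texts, test_texts))  # 1 of 2 test items overlaps -> 0.5
print(imbalance_ratio(labels))                    # 3 claims vs 1 premise -> 3.0
```

A high duplication rate means test scores partly reflect memorized training items, which is consistent with the abstract's finding for the two medical corpora.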
9

MHNCS: um middleware para o desenvolvimento de aplicações móveis cientes de contexto com requisitos de QoC / MNCS: a middleware for development of context-aware mobile applications with requirements of QoC

Pinheiro, Dejailson Nascimento 06 August 2014
Mobile Social Networks (MSNs) are social structures in which members relate in groups and interaction is accomplished through information and communication technologies using portable devices and wireless network technologies. Healthcare is one of the many possible application areas of MSNs. The MobileHealthNet project, developed in partnership by UFMA and PUC-Rio, aims to develop a middleware that allows access to social networks and facilitates the development of collaborative services targeting the health domain, the exchange of experiences and communication between patients and health professionals, as well as better management of health resources by government agencies. An important aspect in the development of the MobileHealthNet middleware is the infrastructure necessary for the gathering, distribution, and processing of context data. In this master's thesis we propose a software infrastructure incorporated into the MobileHealthNet middleware that allows the specification, acquisition, validation, and distribution of context data, considering quality requirements, making them available to context-aware applications. The distribution of context data is based on a data-centric publish/subscribe model, using the OMG-DDS specification. / Redes Sociais Móveis (RSMs) são estruturas sociais em que seus membros relacionam-se em grupos e a interação é realizada através de tecnologias de informação e comunicação utilizando dispositivos portáteis com acesso a tecnologias de rede sem fio. Entre os muitos domínios de aplicação das RSMs, temos a área da saúde. 
O projeto MobileHealthNet, desenvolvido em parceria pela UFMA e PUC-Rio, tem por objetivo desenvolver um middleware que permita o acesso às redes sociais e facilite o desenvolvimento de serviços colaborativos para o setor da saúde, a troca de experiências e a comunicação entre pacientes e profissionais da saúde, além de uma melhor gestão dos recursos da saúde por órgãos governamentais. Um aspecto importante no desenvolvimento do middleware proposto pelo projeto MobileHealthNet é a infraestrutura necessária para a coleta, distribuição e processamento de dados de contexto. Neste trabalho de mestrado é proposta uma infraestrutura de software incorporada ao middleware MobileHealthNet que permite a especificação, obtenção, validação e distribuição de dados de contexto, considerando requisitos de qualidade, tornando-os disponíveis a aplicações sensíveis ao contexto. A distribuição dos dados de contexto é baseado no modelo publish/subscribe centrado em dados, utilizando-se a especificação OMG-DDS.
10

On the construction of decentralised service-oriented orchestration systems

Jaradat, Ward January 2016
Modern science relies on workflow technology to capture, process, and analyse data obtained from scientific instruments. Scientific workflows are precise descriptions of experiments in which multiple computational tasks are coordinated based on the dataflows between them. Orchestrating scientific workflows presents a significant research challenge: they are typically executed in a manner such that all data pass through a centralised computer server known as the engine, which causes unnecessary network traffic that leads to a performance bottleneck. These workflows are commonly composed of services that perform computation over geographically distributed resources, and involve the management of dataflows between them. Centralised orchestration is clearly not a scalable approach for coordinating services dispersed across distant geographical locations. This thesis presents a scalable decentralised service-oriented orchestration system that relies on a high-level data coordination language for the specification and execution of workflows. This system's architecture consists of distributed engines, each of which is responsible for executing part of the overall workflow. It exploits parallelism in the workflow by decomposing it into smaller sub-workflows, and determines the most appropriate engines to execute them using computation placement analysis. This permits the workflow logic to be distributed closer to the services providing the data for execution, which reduces the overall data transfer in the workflow and improves its execution time. This thesis provides an evaluation of the presented system which concludes that decentralised orchestration provides scalability benefits over centralised orchestration, and improves the overall performance of executing a service-oriented workflow.
