1 |
Novel Algorithms for Understanding Online Reviews
Shi, Tian 14 September 2021
This dissertation focuses on the review understanding problem, which has gained attention from both industry and academia, and has found applications in many downstream tasks, such as recommendation, information retrieval and review summarization. In this dissertation, we aim to develop machine learning and natural language processing tools to understand and learn structured knowledge from unstructured reviews, which can be investigated in three research directions, including understanding review corpora, understanding review documents, and understanding review segments.
For the corpus-level review understanding, we have focused on discovering knowledge from corpora that consist of short texts. Since they have limited contextual information, automatically learning topics from them remains a challenging problem. We propose a semantics-assisted non-negative matrix factorization model to deal with this problem. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of a corpus. We conduct extensive sets of experiments on several short text corpora to demonstrate the proposed model can discover meaningful and coherent topics.
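As a rough illustration of the underlying technique only, the sketch below factors a tiny document-word count matrix with plain non-negative matrix factorization using multiplicative updates. It is not the semantics-assisted model described in the abstract, which additionally incorporates word-context correlations learned from the skip-gram view of the corpus. The vocabulary, the counts, and all function names are made up for the example, and the code is kept dependency-free for readability rather than speed.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rp in zip(v, wh) for x, y in zip(rv, rp))

def nmf(v, k, iters=200, eps=1e-9, seed=0):
    """Factor non-negative V (docs x words) into W (docs x k) and H (k x words)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

# Hypothetical toy corpus: four short "reviews" over a four-word vocabulary.
vocab = ["battery", "charge", "pizza", "crust"]
counts = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
w, h = nmf(counts, k=2)
for t, row in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: -row[j])[:2]
    print(f"topic {t}:", [vocab[j] for j in top])
```

Each row of H can be read as a topic (a weighting over words) and each row of W as a document's mixture over topics. With very short texts the count matrix is sparse, which is the limitation the abstract's model targets by adding semantic information.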
For document-level review understanding, we have focused on building interpretable and reliable models for the document-level multi-aspect sentiment analysis (DMSA) task, which can help us to not only recover missing aspect-level ratings and analyze sentiment of customers, but also detect aspect and opinion terms from reviews. We conduct three studies in this research direction. In the first study, we collect a new DMSA dataset in the healthcare domain and systematically investigate reviews in this dataset, including a comprehensive statistical analysis and topic modeling to discover aspects. We also propose a multi-task learning framework with self-attention networks to predict sentiment and ratings for given aspects. In the second study, we propose corpus-level and concept-based explanation methods to interpret attention-based deep learning models for text classification, including sentiment classification. The proposed corpus-level explanation approach aims to capture causal relationships between keywords and model predictions via learning importance of keywords for predicted labels across a training corpus based on attention weights. We also propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model predictions. We apply these methods to the classification task and show that they are powerful in extracting semantically meaningful keywords and concepts, and explaining model predictions. In the third study, we propose an interpretable and uncertainty aware multi-task learning framework for DMSA, which can achieve competitive performance while also being able to interpret the predictions made. Based on the corpus-level explanation method, we propose an attention-driven keywords ranking method, which can automatically discover aspect terms and aspect-level opinion terms from a review corpus using the attention weights. 
In addition, we propose a lecture-audience strategy to estimate model uncertainty in the context of multi-task learning.
For the segment-level review understanding, we have focused on the unsupervised aspect detection task, which aims to automatically extract interpretable aspects and identify aspect-specific segments from online reviews. The existing deep learning-based topic models suffer from several problems such as extracting noisy aspects and poorly mapping aspects discovered by models to the aspects of interest. To deal with these problems, we propose a self-supervised contrastive learning framework in order to learn better representations for aspects and review segments. We also introduce a high-resolution selective mapping method to efficiently assign aspects discovered by the model to the aspects of interest. In addition, we propose using a knowledge distillation technique to further improve the aspect detection performance. / Doctor of Philosophy / Nowadays, online reviews are playing an important role in our daily lives. They are also critical to the success of many e-commerce and local businesses because they can help people build trust in brands and businesses, provide insights into products and services, and improve consumers' confidence. As a large number of reviews accumulate every day, a central research problem is to build an artificial intelligence system that can understand and interact with these reviews, and further use them to offer customers better support and services. In order to tackle challenges in these applications, we first have to get an in-depth understanding of online reviews.
In this dissertation, we focus on the review understanding problem and develop machine learning and natural language processing tools to understand reviews and learn structured knowledge from unstructured reviews. We address the problem in three directions: understanding a collection of reviews, understanding a single review, and understanding a segment of a review. In the first direction, we propose a short-text topic modeling method to extract topics from review corpora that consist of the primary complaints of consumers. In the second direction, we focus on building sentiment analysis models to predict the opinions of consumers from their reviews. Our deep learning models provide good prediction accuracy as well as human-understandable explanations for their predictions. In the third direction, we develop an aspect detection method that automatically extracts, from reviews, sentences that mention features consumers are interested in; this can help customers navigate reviews efficiently and help businesses identify the advantages and disadvantages of their products.
|
2 |
Construction of a solid 3D model of geology in Sardinia using GIS methods
Tavakoli, Saman January 2009
3D visualization of geological structures is an efficient way to build a good understanding of geological features. It is not only illustrative for non-specialists but also a comprehensive way to interpret the results of a study. Geologists, geophysical engineers and GIS experts sometimes need to visualize an area to carry out their research. A 3D model can show how sample data are distributed over the area and can therefore serve as a suitable means of validating results. Some 3D modeling methods are expensive or complicated, so a methodology that enables easy and inexpensive construction of a 3D model is in high demand.
Several obstacles arise, however, when constructing a 3D model of geology. The main point of debate over suitable interpolation methods is that 3D modelers may obtain different results even when working with the same set of data. Furthermore, part of the data is often itself a source of error, so it is extremely important to decide whether to omit those data or to adopt another strategy. Even after all these points have been considered, the work may still not be accurate enough for scientific research if it is not interpreted precisely. This research sought to explain an approach for 3D modeling of the Sedini platform in Sardinia, Italy. GIS was used as flexible software together with Surfer and Voxler. Data manipulation, geodatabase creation and interpolation tests were all carried out with the aid of GIS. A variety of interpolation methods available in Surfer were tested, together with ArcView, to select a suitable method.
A solid 3D model was created in the Voxler environment. In Voxler, in contrast to many other 3D software packages, four components are needed to construct a 3D model. In addition to the XYZ coordinates, a C value was used as the fourth component to differentiate particular features in the platform and to perform gridding based on the chosen value. With the aid of the C value, one can mark a layer of interest to distinguish it from other layers.
The final result is a solid 3D model of the Sedini platform that includes both surface and subsurface. An isosurface with its unique value (isovalue) can mark a layer of interest and make the results easy to interpret. However, errors in some parts of the model are also noticeable. Because the data were originally acquired to study the geology and mineralogy of the area, in line with the main goals of the initial project, relatively few data points were collected per unit volume. Moreover, along some geological boundary lines the density of sample points is not high enough to estimate the location of the lines accurately.
The results of the study are applicable to a broad range of geological studies; resource evaluation, geomorphology, structural geology and GIS are only a few examples. They can also be compared with the results of similar work carried out with different software, in order to understand the pros and cons of each package and the tasks for which each is best suited.
Keywords: GIS, Image Interpretation, Geodatabase, Geology, Interpolation, 3D Modeling
|
3 |
Interpreting and Diagnosing Deep Learning Models: A Visual Analytics Approach
Wang, Junpeng 11 July 2019
No description available.
|
4 |
Interpretation, identification and reuse of models: theory and algorithms with applications in predictive toxicology
Palczewska, Anna Maria January 2014
This thesis is concerned with developing methodologies that enable existing models to be effectively reused. Results of this thesis are presented in the framework of Quantitative Structure-Activity Relationship (QSAR) models, but their application is much more general. QSAR models relate chemical structures to their biological, chemical or environmental activity. There are many applications that offer an environment to build and store predictive models. Unfortunately, they do not provide advanced functionalities that allow for efficient model selection and for interpretation of model predictions for new data. This thesis aims to address these issues and proposes methodologies for dealing with three research problems: model governance (management), model identification (selection), and interpretation of model predictions. The combination of these methodologies can be employed to build more efficient systems for model reuse in QSAR modelling and other areas. The first part of this study investigates toxicity data and model formats and reviews some of the existing toxicity systems in the context of model development and reuse. Based on the findings of this review and the principles of data governance, a novel concept of model governance is defined. Model governance comprises model representation and model governance processes. These processes are designed and presented in the context of model management. As an application, minimum information requirements and an XML representation for QSAR models are proposed. Once a collection of validated, accepted and well-annotated models is available within a model governance framework, they can be applied to new data. It may happen that there is more than one model available for the same endpoint. Which one should be chosen? The second part of this thesis proposes a theoretical framework and algorithms that enable automated identification of the most reliable model for new data from the collection of existing models. 
The main idea is based on partitioning the search space into groups and assigning a single model to each group. Constructing this partitioning is difficult because it is a bi-criteria problem. The main contribution in this part is the application of Pareto points to the search space partition. The proposed methodology is applied to three endpoints in chemoinformatics and predictive toxicology. After having identified a model for the new data, we would like to know how the model obtained its prediction and how trustworthy it is. Interpretation of model predictions is straightforward for linear models thanks to the availability of model parameters and their statistical significance. For nonlinear models this information can be hidden inside the model structure. This thesis proposes an approach for interpreting a random forest classification model. This approach allows for the determination of the influence (called feature contribution) of each variable on the model prediction for an individual data instance. In this part, three methods are proposed that allow analysis of feature contributions. Such analysis might lead to the discovery of new patterns that represent a standard behaviour of the model and allow additional assessment of the model reliability for new data. The application of these methods to two standard benchmark datasets from the UCI machine learning repository shows the great potential of this methodology. The algorithm for calculating feature contributions has been implemented and is available as an R package called rfFC.
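To make the notion of a per-instance feature contribution more concrete, here is a minimal sketch under one common formulation: follow the instance down a decision tree and credit each split feature with the change in the positive-class fraction between parent and child node. This is an illustrative assumption about the general idea, not the rfFC implementation; the `Node` class, the toy tree and its values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    value: float                    # fraction of positive training samples in this node
    feature: Optional[int] = None   # index of the split feature (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken otherwise

def feature_contributions(root: Node, x: List[float], n_features: int) -> Tuple[float, List[float], float]:
    """Return (root value, per-feature contributions, leaf prediction) for one instance."""
    contrib = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the split feature with the change in class fraction it caused.
        contrib[node.feature] += child.value - node.value
        node = child
    return root.value, contrib, node.value

# Hypothetical hand-built tree over two features.
tree = Node(0.5, feature=0, threshold=1.0,
            left=Node(0.2),
            right=Node(0.8, feature=1, threshold=3.0,
                       left=Node(0.6), right=Node(1.0)))

bias, contrib, pred = feature_contributions(tree, [2.0, 5.0], n_features=2)
print(bias, contrib, pred)
```

By construction the root value plus the sum of the contributions equals the leaf prediction, so the prediction for a single instance decomposes additively over features. For a forest, one natural extension is to average these quantities over all trees.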
|
5 |
On the Use of Model-Agnostic Interpretation Methods as Defense Against Adversarial Input Attacks on Tabular Data
Kanerva, Anton; Helgesson, Fredrik January 2020
Context. Machine learning is a constantly developing subfield of artificial intelligence. The number of domains in which machine learning models are deployed is constantly growing, and the systems using these models spread almost unnoticed into our daily lives through different devices. In previous years, much time and effort has been put into increasing the performance of these models, overshadowing the significant risk of attacks targeting the very core of the systems: the trained machine learning models themselves. A specific attack that aims to fool the decision-making of a model, called the adversarial input attack, has been researched almost exclusively for models processing image data. However, the threat of adversarial input attacks stretches beyond systems using image data to, e.g., the tabular domain, which is the most common data domain used in industry. Methods for interpreting complex machine learning models can help humans understand the behavior and predictions of these systems. Understanding the behavior of a model is an important component in detecting, understanding and mitigating its vulnerabilities. Objectives. This study aims to reduce the research gap concerning adversarial input attacks and defenses targeting machine learning models in the tabular data domain. The goal of this study is to analyze how model-agnostic interpretation methods can be used to detect and mitigate adversarial input attacks on tabular data. Methods. The goal is reached by conducting three consecutive experiments in which model interpretation methods are analyzed and adversarial input attacks are evaluated as well as visualized in terms of perceptibility. Additionally, a novel method for adversarial input attack detection based on model interpretation is proposed, together with a novel way of using feature selection defensively to reduce the attack vector size. Results. 
The adversarial input attack detection showed state-of-the-art results with an accuracy of over 86%. The proposed feature selection-based mitigation technique was successful in hardening the model against adversarial input attacks, reducing their scores by 33% without decreasing the performance of the model. Conclusions. This study contributes satisfactory and useful methods for adversarial input attack detection and mitigation, as well as methods for evaluating and visualizing the imperceptibility of attacks on tabular data. / Context. Machine learning is a field within artificial intelligence that is under constant development. The number of domains in which we deploy machine learning models keeps growing, and the systems spread unnoticed into our daily lives through various electronic devices. Over the years, much time and work has been spent on increasing the performance of these models, which has overshadowed the risk of vulnerabilities at the core of the systems: the trained model. A relatively new attack, called the "adversarial input attack", which aims to fool the model into making incorrect decisions, has been researched almost exclusively within image recognition. However, the threat posed by adversarial input attacks extends beyond image data to other data domains such as the tabular domain, which is the most common data domain in industry. Methods for interpreting complex machine learning models can help humans understand the behavior of these complex machine learning systems and the decisions they make. Understanding a model's behavior is an important component in detecting, understanding and mitigating vulnerabilities of the model. Objectives. This study attempts to reduce the research gap concerning adversarial input attacks and corresponding defense methods in the tabular domain. The goal of this study is to analyze how model-agnostic interpretation methods can be used to mitigate and detect adversarial input attacks on tabular data. Methods. The goal is reached through three consecutive experiments in which model interpretation methods are analyzed and adversarial input attacks are evaluated and visualized, and in which a new method based on model interpretation is proposed for detecting adversarial input attacks, together with a new mitigation technique in which feature selection is used defensively to reduce the size of the attack vector. Results. The proposed method for detecting adversarial input attacks shows state-of-the-art results with over 86% accuracy. The proposed mitigation technique proved successful in hardening the model against adversarial input attacks by reducing their attack strength by 33% without degrading the model's classification performance. Conclusions. This study contributes useful methods for detecting and mitigating adversarial input attacks, as well as methods for evaluating and visualizing hard-to-perceive attacks on tabular data.
|
6 |
Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology.
Palczewska, Anna Maria January 2014
This thesis is concerned with developing methodologies that enable existing
models to be effectively reused. Results of this thesis are presented in
the framework of Quantitative Structure-Activity Relationship (QSAR)
models, but their application is much more general. QSAR models relate
chemical structures with their biological, chemical or environmental
activity. There are many applications that offer an environment to build
and store predictive models. Unfortunately, they do not provide advanced
functionalities that allow for efficient model selection and for interpretation
of model predictions for new data. This thesis aims to address these
issues and proposes methodologies for dealing with three research problems:
model governance (management), model identification (selection),
and interpretation of model predictions. The combination of these methodologies
can be employed to build more efficient systems for model reuse
in QSAR modelling and other areas.
The first part of this study investigates toxicity data and model formats
and reviews some of the existing toxicity systems in the context of model
development and reuse. Based on the findings of this review and the principles
of data governance, a novel concept of model governance is defined.
Model governance comprises model representation and model governance
processes. These processes are designed and presented in the context of
model management. As an application, minimum information requirements
and an XML representation for QSAR models are proposed.
Once a collection of validated, accepted and well annotated models is
available within a model governance framework, they can be applied to
new data. It may happen that there is more than one model available for
the same endpoint. Which one should be chosen? The second part of this thesis
proposes a theoretical framework and algorithms that enable automated
identification of the most reliable model for new data from the collection
of existing models. The main idea is based on partitioning of the search
space into groups and assigning a single model to each group. The construction
of this partitioning is difficult because it is a bi-criteria problem.
The main contribution in this part is the application of Pareto points for
the search space partition. The proposed methodology is applied to three
endpoints in chemoinformatics and predictive toxicology.
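The abstract does not name the two criteria being traded off, so the following is only a generic sketch of what Pareto points are in a bi-criteria setting: candidates that no other candidate beats on both criteria at once. The function name, the candidate tuples and the assumption that both criteria are minimized are placeholders for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both criteria minimized)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical candidate partitions scored on two criteria (lower is better for both).
candidates = [(0.10, 5), (0.20, 2), (0.15, 4), (0.30, 3), (0.12, 6)]
print(pareto_front(candidates))
```

Here (0.30, 3) is dropped because (0.20, 2) is better on both criteria, and (0.12, 6) is dropped because (0.10, 5) is. The remaining points are the trade-offs among which a final choice still has to be made.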
After having identified a model for the new data, we would like to know
how the model obtained its prediction and how trustworthy it is. An interpretation
of model predictions is straightforward for linear models thanks
to the availability of model parameters and their statistical significance.
For nonlinear models this information can be hidden inside the model
structure. This thesis proposes an approach for interpretation of a random
forest classification model. This approach allows for the determination of
the influence (called feature contribution) of each variable on the model
prediction for an individual data instance. In this part, three methods are proposed
that allow analysis of feature contributions. Such analysis might
lead to the discovery of new patterns that represent a standard behaviour
of the model and allow additional assessment of the model reliability for
new data. The application of these methods to two standard benchmark
datasets from the UCI machine learning repository shows the great potential
of this methodology. The algorithm for calculating feature contributions
has been implemented and is available as an R package called rfFC. / BBSRC and Syngenta (International Research Centre at Jealott’s Hill, Bracknell, UK).
|
8 |
Permanganate Reaction Kinetics and Mechanisms and Machine Learning Application in Oxidative Water Treatment
Zhong, Shifa 21 June 2021
No description available.
|
9 |
HybridMDSD: Multi-Domain Engineering with Model-Driven Software Development using Ontological Foundations
Lochmann, Henrik 04 March 2010
Software development is a complex task. Executable applications comprise a multitude of diverse components that are developed with various frameworks, libraries, or communication platforms. The technical complexity of development ties up resources, hampers efficient problem solving, and thus increases the overall cost of software production. Another significant challenge in market-driven software engineering is the variety of customer needs. It demands maximum flexibility in software implementations to facilitate the deployment of different products that are based on a single core.
To reduce technical complexity, the paradigm of Model-Driven Software Development (MDSD) facilitates the abstract specification of software based on modeling languages. Corresponding models are used to generate actual programming code without the need for manually written, error-prone assets. Modeling languages that are tailored towards a particular domain are called domain-specific languages (DSLs). Domain-specific modeling (DSM) brings technical solutions closer to intentional problems and fosters the unfolding of specialized expertise. To cope with feature diversity in applications, the Software Product Line Engineering (SPLE) community provides means for the management of variability in software products, such as feature models and appropriate tools for mapping features to implementation assets.
Model-driven development, domain-specific modeling, and the dedicated management of variability in SPLE are vital for the success of software enterprises. Yet these paradigms exist in isolation and need to be integrated in order to fully exploit the advantages of each approach. In this thesis, we propose a way to do so.
We introduce the paradigm of Multi-Domain Engineering (MDE), which means model-driven development with multiple domain-specific languages in variability-intensive scenarios. MDE strongly emphasizes the advantages of MDSD with multiple DSLs as a necessity for efficiency in software development and treats the paradigm of SPLE as an indispensable means of achieving a maximum degree of reuse and flexibility. We present HybridMDSD as our approach to implementing the MDE paradigm.
The core idea of HybridMDSD is to capture the semantics of particular DSLs based on properly defined semantics for software models contained in a central upper ontology. The resulting semantic foundation can then be used to establish references between arbitrary domain-specific models (DSMs), and sophisticated instance-level reasoning ensures integrity and allows particular change adaptation scenarios to be handled. Moreover, we present an approach to automatically generate composition code that integrates generated assets from separate DSLs. All necessary development tasks are arranged in a comprehensive development process. Finally, we validate the introduced approach with a thorough prototype implementation and an industrial-scale case study. / Software development is complex: executable applications contain and combine a large number of components that are developed with different frameworks, libraries or communication platforms. The technical complexity of development ties up resources, prevents efficient problem solving and leads to high overall costs in the production of software. Additional challenges arise from the variety and diversity of customer requirements, which demand a high degree of flexibility in software implementations and make it necessary to deliver different products on the basis of one core implementation.
The paradigm of model-driven software development (MDSD) lends itself to reducing technical complexity. Software specifications in the form of abstract models are used to generate program code, which makes the error-prone, manual programming of similar components unnecessary. Modeling languages that are tailored to a particular problem domain are called domain-specific languages (DSLs). Domain-specific modeling (DSM) unites technical solutions with intentional problems and enables the unfolding of specialized expertise. To master the functional diversity of software, the research field of software product line engineering (SPLE) offers various means of managing variability in software products. These include feature models as well as suitable tools for mapping features to implementation artifacts.
Model-driven development, domain-specific modeling and a dedicated handling of variability in software product lines are of decisive importance for the success of software companies. At present these paradigms exist in isolation from one another and must be integrated so that the advantages of each can be realized for software development as a whole. This thesis presents an approach that makes this possible.
The Multi-Domain Engineering (MDE) paradigm is introduced, which describes model-driven software development with multiple domain-specific languages in variability-centered scenarios. MDE highlights the advantages of model-driven development with multiple DSLs as a necessity for efficiency in development and regards the SPLE paradigm as an indispensable means of achieving maximum reusability and flexibility. The thesis presents an approach to implementing the MDE paradigm, named HybridMDSD.
|
10 |
HybridMDSD: Multi-Domain Engineering with Model-Driven Software Development using Ontological Foundations
Lochmann, Henrik 21 December 2009
Software development is a complex task. Executable applications comprise a multitude of diverse components that are developed with various frameworks, libraries, or communication platforms. The technical complexity of development ties up resources, hampers efficient problem solving, and thus increases the overall cost of software production. Another significant challenge in market-driven software engineering is the variety of customer needs. It demands maximum flexibility in software implementations to facilitate the deployment of different products that are based on a single core.
To reduce technical complexity, the paradigm of Model-Driven Software Development (MDSD) facilitates the abstract specification of software based on modeling languages. The corresponding models are used to generate actual programming code, which removes the need for manually written, error-prone assets. Modeling languages that are tailored towards a particular domain are called domain-specific languages (DSLs). Domain-specific modeling (DSM) brings technical solutions together with intentional problems and fosters the unfolding of specialized expertise. To cope with feature diversity in applications, the Software Product Line Engineering (SPLE) community provides means for the management of variability in software products, such as feature models and appropriate tools for mapping features to implementation assets.
Model-driven development, domain-specific modeling, and the dedicated management of variability in SPLE are vital for the success of software enterprises. Yet these paradigms exist in isolation and need to be integrated in order to fully exploit the advantages of each approach. In this thesis, we propose a way to do so.
We introduce the paradigm of Multi-Domain Engineering (MDE), which means model-driven development with multiple domain-specific languages in variability-intensive scenarios. MDE strongly emphasizes the advantages of MDSD with multiple DSLs as a necessity for efficiency in software development and treats the paradigm of SPLE as an indispensable means to achieve a maximum degree of reuse and flexibility. We present HybridMDSD as our solution approach to implement the MDE paradigm.
The core idea of HybridMDSD is to capture the semantics of particular DSLs based on properly defined semantics for software models contained in a central upper ontology. The resulting semantic foundation can then be used to establish references between arbitrary domain-specific models (DSMs), and sophisticated instance-level reasoning ensures integrity and allows handling particular change adaptation scenarios. Moreover, we present an approach to automatically generate composition code that integrates generated assets from separate DSLs. All necessary development tasks are arranged in a comprehensive development process. Finally, we validate the introduced approach with a profound prototypical implementation and an industrial-scale case study. / Software development is complex: executable applications contain and combine a multitude of components that are developed with different frameworks, libraries, or communication platforms. The technical complexity in development ties up resources, prevents efficient problem solving, and leads to high overall costs in software production. Additional challenges arise from the variety and diversity of customer requirements, which demand a high degree of flexibility in software implementations and make it necessary to deliver different products on the basis of one base implementation.
To reduce technical complexity, the paradigm of model-driven software development (MDSD) suggests itself. Here, software specifications in the form of abstract models are used to generate program code, which makes the error-prone manual programming of similar components unnecessary. Modeling languages tailored to a particular problem domain are called domain-specific languages (DSLs). Domain-specific modeling (DSM) unites technical solutions with intentional problems and enables the unfolding of specialized expertise. To master the diversity of functionality in software, the research field of software product line engineering (SPLE) offers various means for managing variability in software products. These include feature models as well as suitable tools for mapping features to implementation assets.
Model-driven development, domain-specific modeling, and dedicated handling of variability in software product lines are of decisive importance for the success of software companies. At present, these paradigms exist in isolation from one another and must be integrated so that the advantages of each can be realized for software development as a whole. This thesis presents an approach that makes this possible.
The Multi-Domain Engineering (MDE) paradigm is introduced, which describes model-driven software development with multiple domain-specific languages in variability-centered scenarios. MDE highlights the advantages of model-driven development with multiple DSLs as a necessity for efficient development and regards the SPLE paradigm as an indispensable means of achieving maximum reusability and flexibility. The thesis presents an approach for implementing the MDE paradigm, named HybridMDSD.
|