Global ETD Search

1	Spatio-Temporal Data Analysis by Transformed Gaussian Processes Yan, Yuan 06 December 2018 In the analysis of spatio-temporal data, statistical inference based on the Gaussian assumption is ubiquitous due to its many attractive properties. However, data collected from different fields of science rarely meet the assumption of Gaussianity. One option is to apply a monotonic transformation to the data such that the transformed data have a distribution that is close to Gaussian. In this thesis, we focus on a flexible two-parameter family of transformations, the Tukey g-and-h (TGH) transformation. This family has the desirable property that its two parameters, g ∈ R and h ≥ 0, control the skewness and tail-heaviness of the distribution, respectively. Applying the TGH transformation to a standard normal distribution results in the univariate TGH distribution. Extensions to the multivariate case and to a spatial process were developed recently. In this thesis, motivated by the need to exploit wind as renewable energy, we tackle the challenges of modeling big spatio-temporal data that are non-Gaussian by applying the TGH transformation to different types of Gaussian processes: spatial (random field), temporal (time series), spatio-temporal, and their multivariate extensions. We explore various aspects of spatio-temporal data modeling techniques using transformed Gaussian processes with the TGH transformation. First, we use the TGH transformation to generate non-Gaussian spatial data with the Matérn covariance function, and study the effect of non-Gaussianity on Gaussian likelihood inference for the parameters in the Matérn covariance via a carefully designed simulation study. Second, we build two autoregressive time series models using the TGH transformation. One model is applied to a dataset of observational wind speeds and shows advantages in forecasting accuracy; the other model is used to fit wind speed data from a climate model on gridded locations covering Saudi Arabia and to Gaussianize the data for each location. Third, we develop a parsimonious spatio-temporal model for time series data on a spatial grid and use the aforementioned Gaussianized climate model wind speed data to fit the latent Gaussian spatio-temporal process. Finally, we discuss issues under a unified framework of modeling multivariate trans-Gaussian processes and adopt one of the TGH autoregressive models to build a stochastic generator for global wind speed. non-gaussian spatiotemporal model data transformation wind data
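For orientation, the TGH transformation named in this abstract is commonly written as τ(z) = (exp(gz) − 1)/g · exp(hz²/2) for g ≠ 0 and τ(z) = z · exp(hz²/2) for g = 0. The following is a minimal sketch of drawing univariate TGH samples under that definition; the function names, parameter values and seed are illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def tukey_gh(z, g=0.0, h=0.0):
    """Tukey g-and-h transform of a standard normal value z.

    g (any real number) controls skewness; h >= 0 controls tail heaviness.
    """
    if h < 0:
        raise ValueError("h must be non-negative")
    tail = math.exp(h * z * z / 2.0)
    if g == 0.0:
        # Limit of (exp(g*z) - 1) / g as g -> 0 is z.
        return z * tail
    return math.expm1(g * z) / g * tail


def sample_tgh(n, g, h, seed=0):
    """Draw n univariate TGH samples by transforming standard normal draws."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    return [tukey_gh(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]
```

With g = 0 and h = 0 the transform is the identity, so the Gaussian case is recovered as a special case.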
2	QuickMig: automatic schema matching for data migration projects Drumm, Christian, Schmitt, Matthias, Do, Hong-Hai, Rahm, Erhard 14 December 2018 A common task in many database applications is the migration of legacy data from multiple sources into a new system. This requires identifying semantically related elements of the source and target systems and creating mapping expressions to transform instances of those elements from the source format to the target format. Currently, data migration is typically done manually, a tedious and time-consuming process that is difficult to scale to a large number of data sources. In this paper, we describe QuickMig, a new semi-automatic approach to determining semantic correspondences between schema elements for data migration applications. QuickMig advances the state of the art with a set of new techniques exploiting sample instances, domain ontologies, and reuse of existing mappings to detect not only element correspondences but also their mapping expressions. QuickMig further includes new mechanisms to effectively incorporate domain knowledge of users into the matching process. The results from a comprehensive evaluation using real-world schemas and data indicate the high quality and practicability of the overall approach.
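To make the idea of exploiting sample instances concrete, here is a minimal sketch of instance-based matching that scores column pairs by the Jaccard overlap of their sample values. This is a generic illustration, not QuickMig's actual algorithm; the column names, sample data and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of sample values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_by_instances(source, target, threshold=0.5):
    """Propose source -> target element correspondences from sample values."""
    matches = {}
    for s_name, s_values in source.items():
        best_name, best_score = None, 0.0
        for t_name, t_values in target.items():
            score = jaccard(s_values, t_values)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_score >= threshold:
            matches[s_name] = (best_name, best_score)
    return matches


# Hypothetical sample instances from a legacy source and a target schema.
source = {"cust_name": ["Ada", "Bob", "Cy"], "ctry": ["DE", "US", "DE"]}
target = {"CountryCode": ["US", "DE", "FR"], "Name": ["Bob", "Ada", "Cy"]}
proposed = match_by_instances(source, target)
```

A semi-automatic workflow would present such proposals to a user for confirmation rather than apply them directly.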
3	Agrupamento de trabalhadores com perfis semelhantes de aprendizado utilizando técnicas multivariadas / Grouping workers with similar learning profiles using multivariate techniques Azevedo, Bárbara Brzezinski January 2013 Manufacturing customized products results in a wide variety of models, reduced batch sizes and frequent alternation of the tasks performed by workers. In this context, manual tasks are especially affected by workers' adaptation to new product models. This learning process may unfold at a different pace for each worker in a group. This thesis therefore groups workers with similar learning profiles in order to monitor and avoid bottlenecks in production lines caused by learning dissimilarities among workers. To that end, we present approaches for clustering workers based on parameters derived from Learning Curve (LC) modeling. These parameters, which characterize how workers adapt to tasks, are transformed through Principal Component Analysis (PCA), and the PCA scores are used as clustering variables. Next, further transformations of the parameters using kernel functions are tested. The workers are clustered using the K-Means and Fuzzy C-Means methods, and the quality of the resulting clusters is measured by the Silhouette Index. Finally, we suggest a variable importance index based on parameters derived from PCA to select the most relevant variables for clustering. The proposed approaches are applied to a process in the footwear industry, yielding satisfactory results when compared with clustering performed without prior transformation of the data or without variable selection. Organizational learning Industrial cluster Learning curves Clustering Principal components analysis Data transformation Variable selection
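As an illustration of the cluster-quality measure mentioned in this abstract, the sketch below computes the mean silhouette width for a labelled set of points using Euclidean distance. It assumes the points are already-transformed clustering variables (for example PCA scores); the toy coordinates and labels are invented for the example and do not come from the thesis.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical two-dimensional scores for four workers.
scores = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_index(scores, [0, 0, 1, 1])
bad = silhouette_index(scores, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.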
4	XML as a Format for Representation and Manipulation of Data from Radar Communications Alfredsson, Anders January 2001 XML was designed to be a new standard for marking up data on the web. However, as a result of its extensible and flexible properties, XML is now increasingly used for purposes other than those originally intended. Today XML is prompting an approach more focused on data exchange, between different applications inside companies or even between cooperating businesses. Businesses are showing interest in using XML as an integral part of their work. Ericsson Microwave Systems (EMW) is a company that sees XML as a conceivable solution to problems in its work with radar communications. An approach towards a solution based on a relational database system has been analysed earlier. In this project we present an investigation of the work at EMW, and an identification and documentation of the problems in the radar communication work. The requirements and expectations that EMW has of XML are also presented. Moreover, an analysis has been made to decide to what extent XML could be used to solve the problems of EMW. The analysis was conducted by elucidating the problems and possibilities of XML compared to the previous approach for solving the problems at EMW, which was based on a relational database management system. The analysis shows that XML has good features for representing hierarchically structured data, as in the EMW case. It is also shown that XML is good for data integration purposes. Furthermore, the analysis shows that XML, due to its self-describing and weakly typed nature, is inappropriate for the data semantics and integrity problems of EMW. However, it also shows that the new XML Schema standard could be used as a complement to the core XML standard to partially solve the semantics problems. XML data representation data transformation data semantics query language Computer and systems science
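The two properties discussed in this abstract, hierarchical structure and weak typing, can be seen in a small sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names below are invented for illustration and do not reflect EMW's actual data format.

```python
import xml.etree.ElementTree as ET

# Build a small hierarchical message (all names are hypothetical).
message = ET.Element("message", {"id": "m1"})
header = ET.SubElement(message, "header")
ET.SubElement(header, "sender").text = "station-A"
body = ET.SubElement(message, "body")
for name, value in [("range_km", "42.5"), ("bearing_deg", "117")]:
    field = ET.SubElement(body, "field", {"name": name})
    field.text = value

# Serialize and parse back, as a receiving application would.
xml_text = ET.tostring(message, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Core XML carries no data types: every value comes back as a string,
# so numeric constraints must be enforced elsewhere (e.g. by a schema).
fields = {f.get("name"): f.text for f in parsed.iter("field")}
```

The nesting is preserved through the round trip, but `fields["range_km"]` is the text `"42.5"`, not a number, which is the kind of gap a schema language is meant to address.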
5	Transformacijų šablonais grindžiamas duomenų saugyklos projektavimo procesas / Data warehouse building process based on data transformation templates Paulavičiūtė, Kristina 11 January 2006 The growing volume of data and the growing need for data analysis drive the demand for data warehouses. Much of an organization's operational data accumulates in OLTP databases, while its historical data accumulates in data warehouses, where it is adapted for analysis. The ETL tools of DBMSs offer limited support for building data warehouses. The ETL tool created for MS SQL Server makes the data warehouse building process easier and faster. Informatics Data transformation Warehouse building warehousing Data warehouse
8	Komponentizace transformací linked data / Componentization of Linked Data Transformations Pilař, Štěpán January 2013 The diploma thesis focuses on the transformation of linked data and on opportunities for componentizing the extract, transform, load (ETL) process so that the resulting components are reusable. UnifiedViews serves as the framework for demonstrating the implementation of selected components. An initial review of related fields of study, most prominently ETL for relational data and linked data quality management, is followed by a bottom-up analysis of existing extractors and transformations. The common transformations identified are supplemented by operations known from transformations of relational data. The options and limits of each component candidate are discussed, as well as possible cooperation with other components. The next section discusses the supported ways of implementation in the selected environment and provides a list of key questions for the decision-making process. The last part describes the implementation of selected components with respect to the approach suggested in the preceding section. The practical use and the limitations of the implemented components are demonstrated on tasks transforming public contracts datasets.
10	Návrh části webové aplikace pro výpočet režijních nákladů / A Design of a Portion of Web Application for Overhead Cost Calculation Florians, Patrik January 2021 The subject of this thesis is the design of a web application for overhead cost calculation, intended to replace the currently used solution, which is considered outdated. The work is part of the strategy of SAP SE, the corporation for which the solution is designed. The ambition to develop and improve the company's cloud portfolio of existing applications should lead to better applications of this type and, in the long run, to an improvement of the company's market position and its products. The thesis is divided into three parts. It begins with a description of the theoretical concepts, tools and principles that are used in the later chapters. The second chapter analyzes the current state of affairs and illustrates what the current solution looks like, along with its key parts. The third and final chapter describes the implemented solution and discusses in detail the key differences mentioned in the second chapter.
