31

Product Information Management / Product Information Management

Antonov, Anton January 2012 (has links)
Product Information Management (PIM) is a field concerned with the management of product master data; it combines the experience and principles of data integration and data quality into a single base. PIM merges the specific attributes of products across all channels in the supply chain. By unifying, centralizing and standardizing product information on one platform, quality and timely information with added value can be achieved. The goal of the theoretical part of the thesis is to construct a picture of PIM, to place PIM into a broader context, to define and describe the various parts of a PIM solution, to describe the main differences in characteristics between product data and customer data, and to summarize the available information on the administration and management of PIM data quality knowledge bases relevant for solving practical problems. The practical part of the thesis focuses on designing the structure, the content and the method of populating the knowledge base of a Product Information Management solution in the environment of the DataFlux software tools from SAS Institute. It further incorporates the analysis of real product data, the design of the definitions and objects of the knowledge base, the creation of a reference database, and the testing of the knowledge base with the help of specially designed web services.
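As a hedged illustration of the kind of standardization rule such a knowledge base encodes (a minimal Python sketch with hypothetical attribute names and reference values, not the DataFlux implementation used in the thesis):

    # Hedged sketch: standardize raw product attribute values against a
    # small reference table; attribute names and values are illustrative.
    REFERENCE = {
        "colour": {"blk": "black", "wht": "white", "rd": "red"},
        "unit":   {"pcs": "piece", "kg": "kilogram", "l": "litre"},
    }

    def standardize(attribute: str, raw_value: str) -> str:
        """Return the canonical value for a raw attribute value, or the
        cleaned input itself when no reference entry exists."""
        cleaned = raw_value.strip().lower()
        return REFERENCE.get(attribute, {}).get(cleaned, cleaned)

    # Example: a product record arriving from one sales channel.
    record = {"colour": " BLK ", "unit": "pcs"}
    standardized = {attr: standardize(attr, val) for attr, val in record.items()}
    # -> {'colour': 'black', 'unit': 'piece'}

In a real PIM solution the reference table would live in the reference database described above and be maintained as part of the knowledge base rather than hard-coded.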
32

MDM of Product Data / MDM produktových dat

Čvančarová, Lenka January 2012 (has links)
This thesis is focused on Master Data Management (MDM) of product data. At present, most publications on MDM concentrate on customer data, and only a very limited number of sources focus solely on product data. Even the resources that attempt to cover MDM in full depth are typically very customer-oriented. This lack of literature on product MDM became one of the motivations for this thesis. Another motivation was to outline and analyze the specifics of product MDM in the context of its implementation and the software requirements it places on a vendor of MDM application software. To this end, I chose to create and describe a methodology for implementing MDM of product data. The methodology was derived from personal experience on projects focused on MDM of customer data and applied to the findings of the theoretical part of this thesis. By analyzing the characteristics of product data, their impact on MDM implementation and their requirements for application software, this thesis helps vendors of customer MDM understand the challenges of product MDM and thus embark on the product data MDM domain. Moreover, the thesis can also serve as an information resource for enterprises considering adopting MDM of product data into their infrastructure.
33

Linked Data Quality Assessment and its Application to Societal Progress Measurement

Zaveri, Amrapali 17 April 2015 (has links)
In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows publishing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases in which datasets that contain quality problems are still useful for certain applications, depending on the use case at hand. Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial for measuring how accurately the data represents the real world. On the document Web, data quality can only be indirectly or vaguely defined, but there is a requirement for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness or consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology includes the employment of LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems through the automatic creation of an extended schema for DBpedia.
The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers of online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we consider a domain-specific use case that consumes LD and relies on data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
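As a hedged illustration of the kind of measurable quality metric referred to above (a generic sketch, not one of the thesis's 69 metrics specifically), the following Python snippet computes a simple completeness-style indicator, the share of distinct subjects that carry an rdf:type statement, over an RDF graph using the rdflib library; the file name is a placeholder:

    # Hedged sketch: a completeness-style quality metric for an RDF dataset,
    # namely the fraction of distinct subjects that have an rdf:type.
    from rdflib import Graph, RDF

    g = Graph()
    g.parse("dataset.ttl", format="turtle")  # placeholder file name

    subjects = set(g.subjects())
    typed = {s for s in subjects if (s, RDF.type, None) in g}

    if subjects:
        completeness = len(typed) / len(subjects)
        print(f"Typed-subject completeness: {completeness:.2%}")

A full assessment framework would compute many such dimension-specific scores and report the offending entities alongside the aggregate numbers, as R2RLint does.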
34

Estimating prevalence of subjective cognitive decline in and across international cohort studies of aging: a COSMIC study

Röhr, Susanne, Pabst, Alexander, Riedel-Heller, Steffi Gerlinde, Jessen, Frank, Turana, Yuda, Handajani, Yvonne S., Brayne, Carol, Matthews, Fiona E., Stephan, Blossom C. M., Lipton, Richard B., Katz, Mindy J., Wang, Cuiling, Guerchet, Maëlenn, Preux, Pierre-Marie, Mbelesso, Pascal, Ritchie, Karen, Ancelin, Marie-Laure, Carrière, Isabelle, Guaita, Antonio, Davin, Annalisa, Vaccaro, Roberta, Kim, Ki Woong, Han, Ji Won, Suh, Seung Wan, Shahar, Suzana, Din, Normah C., Vanoh, Divya, van Boxtel, Martin, Köhler, Sebastian, Ganguli, Mary, Jacobsen, Erin P., Snitz, Beth E., Anstey, Kaarin J., Cherbuin, Nicolas, Kumagai, Shuzo, Chen, Sanmei, Narazaki, Kenji, Ng, Tze Pin, Gao, Qi, Gwee, Xinyi, Brodaty, Henry, Kochan, Nicole A., Trollor, Julian, Lobo, Antonio, López-Antón, Raúl, Santabárbara, Javier, Crawford, John D., Lipnicki, Darren M., Sachdev, Perminder S. 08 March 2022 (has links)
Background: Subjective cognitive decline (SCD) is recognized as a risk stage for Alzheimer’s disease (AD) and other dementias, but its prevalence is not well known. We aimed to use uniform criteria to better estimate SCD prevalence across international cohorts. Methods: We combined individual participant data for 16 cohorts from 15 countries (members of the COSMIC consortium) and used qualitative and quantitative (Item Response Theory/IRT) harmonization techniques to estimate SCD prevalence. Results: The sample comprised 39,387 cognitively unimpaired individuals above age 60. The prevalence of SCD across studies was around one quarter with both qualitative harmonization/QH (23.8%, 95%CI = 23.3–24.4%) and IRT (25.6%, 95%CI = 25.1–26.1%); however, prevalence estimates varied widely between studies (QH: 6.1%, 95%CI = 5.1–7.0%, to 52.7%, 95%CI = 47.4–58.0%; IRT: 7.8%, 95%CI = 6.8–8.9%, to 52.7%, 95%CI = 47.4–58.0%). Across studies, SCD prevalence was higher in men than in women, among those with lower levels of education, in Asian and Black African people compared to White people, in lower- and middle-income countries compared to high-income countries, and in studies conducted in later decades. Conclusions: SCD is frequent in old age. That around a quarter of older individuals report SCD warrants further investigation of its significance as a risk stage for AD and other dementias, and of ways to help individuals with SCD who seek medical advice. Moreover, a standardized instrument to measure SCD is needed to overcome the measurement variability currently dominant in the field.
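As a hedged aside on the quantitative harmonization mentioned above: IRT models relate the probability of endorsing an SCD item to a latent trait, and a common form (the abstract does not specify which model the study used) is the two-parameter logistic model

\[
  P\bigl(X_{ij} = 1 \mid \theta_i\bigr) \;=\; \frac{1}{1 + \exp\!\bigl[-a_j(\theta_i - b_j)\bigr]},
\]

where $\theta_i$ is the latent SCD propensity of person $i$, and $a_j$ and $b_j$ are the discrimination and difficulty parameters of item $j$. Placing items from different cohorts on a common $\theta$ scale is what allows prevalence to be compared across studies despite differing questionnaires.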
35

Dolovací moduly systému pro dolování z dat v prostředí Oracle / Mining Modules of the Data Mining System in Oracle

Mader, Pavel January 2009 (has links)
This master's thesis deals with questions of data mining and the extension of a data mining system developed at FIT in the Oracle environment. So far, the system could not be applied in real-life conditions, as no data mining modules were available. The system's core application design includes an interface allowing the addition of mining modules. Until now, this interface had been tested only on a sample mining module that performed no actual mining and merely demonstrated the use of the interface. The main focus of this thesis is the study of this interface and the implementation of a functional mining module that tests the applicability of the implemented interface. An association rule mining module was selected for implementation.
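As a hedged illustration of what an association rule mining module computes (a generic Python sketch over toy data, not the module implemented in the thesis), the snippet below derives the support and confidence of a candidate rule from a handful of transactions:

    # Hedged sketch: support and confidence of an association rule A -> B
    # over a toy transaction set; item names are illustrative only.
    transactions = [
        {"milk", "bread", "butter"},
        {"milk", "bread"},
        {"bread", "butter"},
        {"milk", "butter"},
    ]

    def support(itemset, transactions):
        """Fraction of transactions containing every item of the itemset."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    antecedent, consequent = {"milk"}, {"bread"}
    rule_support = support(antecedent | consequent, transactions)
    confidence = rule_support / support(antecedent, transactions)
    print(f"support={rule_support:.2f}, confidence={confidence:.2f}")
    # -> support=0.50, confidence=0.67 (2 of 4 transactions; 2 of the 3 with milk)

An algorithm such as Apriori generates candidate rules and keeps only those whose support and confidence exceed user-defined thresholds.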
36

Applied Science for Water Quality Monitoring

Khakipoor, Banafsheh 25 August 2020 (has links)
No description available.
37

Big Data Competence Center ScaDS Dresden/Leipzig

Rahm, Erhard, Nagel, Wolfgang E., Peukert, Eric, Jäkel, René, Gärtner, Fabian, Stadler, Peter F., Wiegreffe, Daniel, Zeckzer, Dirk, Lehner, Wolfgang 16 June 2023 (has links)
Since its launch in October 2014, the Competence Center for Scalable Data Services and Solutions (ScaDS) Dresden/Leipzig has carried out collaborative research on Big Data methods and their use in challenging data science applications of different domains, leading to both general and application-specific solutions and services. In this article, we give an overview of the structure of the competence center, its primary goals and research directions. Furthermore, we outline selected research results on scalable data platforms, distributed graph analytics, data augmentation and integration, and visual analytics. We also briefly report on planned activities for the second funding period (2018-2021) of the center.
38

Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control Problem

Erdogdu, Utku 01 April 2012 (has links) (PDF)
The need to analyze and closely study gene-related mechanisms motivated research on the modeling and control of gene regulatory networks (GRN). Different approaches exist to model GRNs; they are mostly simulated as mathematical models that represent relationships between genes. Though it makes the problem more challenging, we argue that partial observability is a more natural and realistic setting for handling the control of GRNs. Partial observability is a fundamental aspect of the problem; it is mostly ignored and substituted by the assumption that the states of the GRN are known precisely, i.e. full observability. On the other hand, current works addressing partial observability focus on formulating algorithms for the finite-horizon GRN control problem. In this work we therefore explore the feasibility of treating the problem in a partially observable setting, mainly with Partially Observable Markov Decision Processes (POMDP). We propose a POMDP formulation for the infinite-horizon version of the problem. Knowing that POMDP problems suffer from the curse of dimensionality, we also propose a POMDP solution method that automatically decomposes the problem by isolating its unrelated parts and then solves the reduced subproblems. We further propose a method to enrich the gene expression data sets given as input to the POMDP control task, because the available data sets contain thousands of genes but only tens or, rarely, hundreds of samples. The method is based on the idea of generating more than one model from the available data sets, sampling data from each of the models, and finally filtering the generated samples with the help of metrics that measure their compatibility, diversity and coverage.
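As a hedged reminder of the machinery the abstract refers to (standard POMDP theory, not a formulation specific to this thesis): a POMDP agent maintains a belief $b$ over the hidden GRN states and, after taking action $a$ and receiving observation $o$, updates it as

\[
  b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)},
\]

where $T$ is the state transition model, $O$ the observation model, and the denominator a normalizing constant. In the infinite-horizon setting the objective is to choose actions maximizing the expected discounted reward $\mathbb{E}\bigl[\sum_{t=0}^{\infty} \gamma^{t} r_t\bigr]$ with discount factor $\gamma \in [0,1)$; the dimensionality of the belief space over gene-state combinations is what motivates the decomposition approach described above.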
39

Integrace a konzumace důvěryhodných Linked Data / Towards Trustworthy Linked Data Integration and Consumption

Knap, Tomáš January 2013 (has links)
Title: Towards Trustworthy Linked Data Integration and Consumption Author: RNDr. Tomáš Knap Department: Department of Software Engineering Supervisor: RNDr. Irena Holubová, PhD., Department of Software Engineering Abstract: We are now finally at a point where datasets based upon open standards are being published on an increasing basis by a variety of Web communities, governmental initiatives, and various companies. Linked Data offers information consumers a level of information integration and aggregation agility that has not been possible up to now. Consumers can now "mashup" and readily integrate information for use in a myriad of alternative end uses. Indiscriminate addition of information can, however, come with inherent problems, such as the provision of poor quality, inaccurate, irrelevant or fraudulent information. All of this comes with associated costs of the consumed data, which negatively affect the data consumer's benefit and the usage and uptake of Linked Data applications. In this thesis, we address these issues by proposing ODCleanStore, a Linked Data management and querying tool able to provide data consumers with Linked Data which is cleansed, properly linked, integrated, and trustworthy according to the consumer's subjective requirements. Trustworthiness of data means that the data has associated...
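To illustrate what consumer-specific trustworthiness can mean during integration (a generic, hedged sketch, not the ODCleanStore algorithm), the snippet below resolves conflicting values for the same property by preferring the source the consumer trusts most; source names and scores are made up:

    # Hedged sketch: pick a value for a conflicting property based on
    # per-source trust scores supplied by the data consumer.
    trust = {"source_a": 0.9, "source_b": 0.4}  # hypothetical consumer scores

    conflicting_values = [
        ("source_a", "Prague"),
        ("source_b", "Praha 1"),
    ]

    def resolve(values, trust, default_trust=0.0):
        """Return the value from the most trusted source and its score."""
        source, value = max(values, key=lambda sv: trust.get(sv[0], default_trust))
        return value, trust.get(source, default_trust)

    value, score = resolve(conflicting_values, trust)
    print(value, score)  # -> Prague 0.9

A real integration pipeline would also propagate the chosen score as provenance metadata so that downstream queries can filter on it.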
40

Visual narrative: a theory and model for image-driven digital historiography based on a case study of China's Boxer Uprising (c. 1900)

Sebring, Ellen Irene January 2016 (has links)
Digitization, which has enabled instant access to vast numbers of archival historical images, demands a new paradigm for the use of visual imagery in historical research. This thesis proposes a new form of historiography in the digital medium, an image-based narrative mode for authoring and reading history. I propose a digital model for conveying history through the visual record, as an alternative to the printed book. Unlike the quantitative “big data” approach to digital humanities, this research explores visuality itself. In a practice-led approach, the research addresses both aspects of historiography: (1) a method of historical representation; and (2) original historical work on a selected topic. The testbed for the historiographic and narrative experiments which led to the model was my case study on the Boxer Uprising in China, c. 1900. While many written histories of the Boxer Uprising exist, I collected a large portion of its extensive visual record for the first time. Sources from around the world, in a variety of media, were assembled into a digital data set that reveals previously unexplored historical themes. A series of visual narratives built in the case study culminated in a proposed “Visual Narrative Field” model. In this model, meaning emerges in the patterns observed between images within a complex visual field. The model vertically integrates three narrative approaches in order to support the alternating cognitive modes used to read texts and perceive images. Linear concentration is blended with the non-linear exploration of interactive forms. The model provides historians with a much-needed tool for authoring narrative through relationships between images in a scalable way. Due to digitization, visual databases are easily assembled, and images are as easily reproduced as written text. The Visual Narrative Field model takes advantage of the characteristics of the newly digitized visual record, providing a means of authoring visual narrative that can be comprehended without the use of extensive written text. The model thus creates an unprecedented image-based method for performing and presenting historical research.
