11 |
Analýza datových zdrojů veřejné správy v BI / Analysis of public administration data sources in BI. Bílek, Milan. January 2016
The thesis is divided into two parts, the first theoretical and the second practical. The theoretical part examines the area of open data sources and the trends that appear in this area. In the practical part, open data are processed with Business Intelligence (BI) methods. The goal of this part was to show that such open data can be processed and reported with BI in much the same way as company data, and to point out the obstacles that can appear. The thesis has six chapters: the first is the introduction, the second is a literature survey, the third focuses on trends and uses of open data, the fourth shows where open data can be found, the fifth describes the processing of the chosen data with BI, and the sixth concludes the work and describes how the goals were fulfilled.
|
12 |
Improving Teacher Preparation Through Data Sources. Sharp, L. Kathryn. 01 February 2017
No description available.
|
13 |
iFuice - Information Fusion utilizing Instance Correspondences and Peer Mappings. Rahm, Erhard; Thor, Andreas; Aumüller, David; Do, Hong-Hai; Golovin, Nick; Kirsten, Toralf. 04 February 2019
We present a new approach to information fusion of web data sources. It is based on peer-to-peer mappings between sources and utilizes correspondences between their instances. Such correspondences are already available between many sources, e.g. in the form of web links; they help combine the information about specific objects and support high-quality data fusion. Sources and mappings relate to a domain model to support a semantically focused information fusion. The iFuice architecture incorporates a mapping mediator offering both interactive and script-driven, workflow-like access to the sources and their mappings. The script programmer can use powerful generic operators to execute and manipulate mappings and their results. The paper motivates the new approach and outlines the architecture and its main components, in particular the domain model, the source and mapping model, and the script operators and their usage.
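As a rough illustration of what script-level operators over instance correspondences can look like, the following Python sketch uses invented operator names, identifiers and URLs; it is not the operator set defined in the iFuice paper.

```python
# Illustrative script-style operators over instance correspondences; the
# operator names, identifiers and URLs below are invented for this sketch
# and are not the operator set defined in the iFuice paper.

from collections import defaultdict

# A mapping is modelled as a set of (source_instance_id, target_instance_id) pairs.

def compose(m1, m2):
    """Chain two mappings: follow m1 first, then m2."""
    by_source = defaultdict(set)
    for a, b in m2:
        by_source[a].add(b)
    return {(a, c) for a, b in m1 for c in by_source[b]}

def traverse(instances, mapping):
    """Return all instances reachable from the given set via one mapping."""
    return {b for a, b in mapping if a in instances}

def fuse(instances, *attribute_sources):
    """Merge the attribute dictionaries that several sources hold for the same instances."""
    fused = {}
    for source in attribute_sources:
        for inst in instances:
            fused.setdefault(inst, {}).update(source.get(inst, {}))
    return fused

# Toy peer mappings between a bibliographic source and an author-profile source.
dblp_to_profiles = {("dblp:author/1", "profile:42")}
profiles_to_homepages = {("profile:42", "http://example.org/~author1")}

print(compose(dblp_to_profiles, profiles_to_homepages))   # chained correspondence
print(traverse({"dblp:author/1"}, dblp_to_profiles))       # {'profile:42'}
print(fuse({"dblp:author/1"},
           {"dblp:author/1": {"name": "A. Author"}},
           {"dblp:author/1": {"affiliation": "Univ. X"}}))  # merged attributes
```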
|
14 |
Contextual Outlier Detection from Heterogeneous Data Sources. Yan, Yizhou. 17 May 2020
The dissertation focuses on detecting contextual outliers from heterogeneous data sources. Modern sensor-based applications such as Internet of Things (IoT) applications and autonomous vehicles generate huge amounts of heterogeneous data, including not only structured multi-variate data points but also other complex types of data such as time-stamped sequence data and image data. Detecting outliers from such data sources is critical to diagnose and fix malfunctioning systems, prevent cyber attacks, and save human lives. The outlier detection techniques in the literature are typically unsupervised algorithms with a pre-defined logic, such as leveraging the probability density at each point to detect outliers. Our analysis of modern applications reveals that this rigid probability-density-based methodology has severe drawbacks: low-probability-density objects are not necessarily outliers, while objects with relatively high probability densities might in fact be abnormal. In many cases, determining the outlierness of an object has to take into consideration the context in which the object occurs. Within this scope, the dissertation focuses on four research innovations: techniques and a system for scalable contextual outlier detection from multi-dimensional data points, contextual outlier pattern detection from sequence data, contextual outlier image detection from image data sets, and an integrative end-to-end outlier detection system capable of automatic outlier detection, outlier summarization and outlier explanation.
1. Scalable contextual outlier detection from multi-dimensional data. Mining contextual outliers from big datasets is computationally expensive because of the complex recursive kNN search used to define the context of each point. Leveraging the power of distributed compute clusters, we design distributed contextual outlier detection strategies that optimize the key factors determining the efficiency of local outlier detection, namely localizing the kNN search while still ensuring load balancing.
2. Contextual outlier detection from sequence data. For big sequence data, such as messages exchanged between devices and servers and log files measuring complex system behaviors over time, outliers typically occur as a subsequence of symbolic values (a sequential pattern) in which each individual value itself may be completely normal. However, existing sequential pattern mining semantics tend to misclassify outlier patterns as typical patterns because they ignore the context in which the pattern occurs. The dissertation presents new context-aware pattern mining semantics and designs efficient mining strategies to support these new semantics, along with methodologies that continuously extract these outlier patterns from sequence streams.
3. Contextual outlier detection from image data. An image classification system not only needs to accurately classify objects from target classes, but should also safely reject unknown objects that belong to classes not present in the training data. Here, the training data defines the context of the classifier, and unknown objects then correspond to contextual image outliers. Although existing Convolutional Neural Networks (CNNs) achieve high accuracy when classifying known objects, the sum operation on the multiple features produced by the convolutional layers causes an unknown object to be classified into a target class with high confidence even if it matches some key features of that class only by chance. We design an Unknown-aware Deep Neural Network (UDN) to detect contextual image outliers. The key idea of UDN is to enhance the CNN with a product operation that models the product relationship among the features produced by the convolutional layers; this way, missing a single key feature of a target class greatly reduces the probability of assigning an object to that class. To further improve the performance of UDN at detecting contextual outliers, we propose an information-theoretic regularization strategy that incorporates the objective of rejecting unknowns into the learning process.
4. An end-to-end integrated outlier detection system. Although numerous detection algorithms have been proposed in the literature, no single approach brings the wealth of these alternative algorithms to bear in an integrated infrastructure that supports versatile outlier discovery. We design the first end-to-end outlier detection service that integrates outlier-related services, including automatic outlier detection, outlier summarization and explanation, and human-guided outlier detector refinement, within one integrated outlier discovery paradigm.
Experimental studies, including performance evaluations and user studies conducted on benchmark outlier detection datasets and real-world datasets (Geolocation, Lighting, MNIST, CIFAR and log file datasets), confirm both the effectiveness and efficiency of the proposed approaches and systems.
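To make the kNN-based notion of "context" concrete, here is a minimal NumPy sketch of local, context-relative outlier scoring; it only illustrates the general idea, not the dissertation's algorithms or their distributed implementation, and the data and parameter values are invented.

```python
# A minimal NumPy sketch of kNN-based, context-relative outlier scoring in the
# spirit of the local-outlier methods discussed above; not the dissertation's
# algorithms or their distributed implementation.

import numpy as np

def knn_contextual_scores(X, k=10):
    """Score each point by its kNN sparsity relative to that of its neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)      # pairwise distances
    np.fill_diagonal(d, np.inf)                                     # exclude self
    knn_idx = np.argsort(d, axis=1)[:, :k]                          # k nearest neighbors
    knn_dist = np.take_along_axis(d, knn_idx, axis=1).mean(axis=1)  # local sparsity
    neighbor_sparsity = knn_dist[knn_idx].mean(axis=1)              # sparsity of the context
    return knn_dist / (neighbor_sparsity + 1e-12)                   # >> 1 suggests an outlier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),  # a dense "normal" cluster
               [[8.0, 8.0]]])                         # one injected outlier
scores = knn_contextual_scores(X, k=10)
print(int(np.argmax(scores)))  # 100, the injected point
```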
|
15 |
Computational Prediction of Gene Function From High-throughput Data Sources. Mostafavi, Sara. 31 August 2011
A large number and variety of genome-wide genomics and proteomics datasets are now available for model organisms. Each dataset on its own presents a distinct but noisy view of cellular state. However, collectively, these datasets embody a more comprehensive view of cell function. This motivates the prediction of function for uncharacterized genes by combining multiple datasets, in order to exploit the associations between such genes and genes of known function--all in a query-specific fashion.
Commonly, heterogeneous datasets are represented as networks in order to facilitate their combination. Here, I show that it is possible to accurately predict gene function in seconds by combining multiple large-scale networks. This facilitates function prediction on-demand, allowing users to take advantage of the persistent improvement and proliferation of genomics and proteomics datasets and continuously make up-to-date predictions for large genomes such as humans.
Our algorithm, GeneMANIA, uses constrained linear regression to combine multiple association networks and label propagation to make predictions from the combined network. I introduce extensions that improve predictions when the number of labeled examples for training is limited, or when an ontological structure describing a hierarchical gene function categorization scheme is available. Further, motivated by our empirical observations on predicting node labels for general networks, I propose a new label propagation algorithm that exploits common properties of real-world networks to increase both the speed and accuracy of our predictions.
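As a rough sketch of the two steps described above, the NumPy example below combines association networks with hand-picked weights and then propagates labels on the combined graph; GeneMANIA learns those weights with constrained linear regression, so this illustrates only the propagation idea, and the toy networks and labels are invented.

```python
# Combining several association networks into one weighted network and
# propagating labels on it. The weights are fixed by hand purely for
# illustration; GeneMANIA learns them, so this is not the actual algorithm.

import numpy as np

def combine_networks(networks, weights):
    """Weighted sum of symmetric association (adjacency) matrices."""
    return sum(w * W for w, W in zip(weights, networks))

def propagate_labels(W, y, alpha=0.9, iterations=100):
    """Iterative label propagation: f <- alpha * S f + (1 - alpha) * y."""
    degrees = W.sum(axis=1)
    S = W / (np.sqrt(np.outer(degrees, degrees)) + 1e-12)  # symmetric normalization
    f = y.astype(float).copy()
    for _ in range(iterations):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# Five genes, two noisy association networks; only gene 0 has a known label.
W1 = np.array([[0, 1, 1, 0, 0],
               [1, 0, 1, 0, 0],
               [1, 1, 0, 0, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 1, 0]], dtype=float)
W2 = np.array([[0, 1, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 0, 1, 0, 1],
               [0, 0, 0, 1, 0]], dtype=float)
y = np.array([1, 0, 0, 0, 0], dtype=float)

W = combine_networks([W1, W2], weights=[0.7, 0.3])
print(np.round(propagate_labels(W, y), 3))  # unlabeled genes 1 and 2 score above 3 and 4
```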
|
17 |
Desenvolvimento de banco de dados de pacientes submetidos ao transplante de células-tronco hematopoéticas / Development of a database of patients undergoing hematopoietic stem cell transplantation. Silva, Tatiana Schnorr. January 2018
Introduction: Hematopoietic stem cell transplantation (HSCT) is a complex procedure involving different biopsychosocial factors and conditions. Monitoring the data of these patients is fundamental for obtaining information that can support management, improve the care provided and underpin new research on the subject. Objectives: To develop a database (DB) model for patients undergoing HSCT, covering the main variables of interest in the area. Methods: This is an applied study using a relational database development methodology with three main steps (conceptual model, relational model, physical model). The proposed physical model was implemented on the Research Electronic Data Capture (REDCap) platform. A pilot test was performed with data from three patients who underwent HSCT at Hospital Moinhos de Vento in 2016/2017 in order to evaluate the use of the tools and their applicability. Results: Nine forms were developed in REDCap: sociodemographic data; diagnostic data; history and previous clinical data; pre-transplant evaluation; procedure; immediate post-transplant follow-up; late post-transplant follow-up; readmissions; and death. In addition, three report templates were developed with the variables contained in the forms to assist in exporting data to the institutions involved with HSCT. After the pilot test, small adjustments were made to the nomenclature of some variables, and others were excluded due to the complexity of obtaining them. Conclusion: It is hoped that the proposed DB model can serve as a basis for improving the care provided to patients, supporting management and facilitating future research in the area.
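For illustration only, the sketch below shows how records might be exported from a REDCap project over its standard HTTP API using the requests library; the URL, token and instrument names are placeholders and do not refer to the project described in the thesis.

```python
# Hedged sketch: exporting records from a REDCap project via its HTTP API.
# URL, token and instrument names are placeholders.

import requests

REDCAP_URL = "https://redcap.example.org/api/"  # placeholder REDCap instance
API_TOKEN = "YOUR_PROJECT_API_TOKEN"            # issued per project by REDCap

def export_records(forms=None):
    """Export project records as JSON, optionally limited to selected instruments."""
    payload = {
        "token": API_TOKEN,
        "content": "record",
        "format": "json",
        "type": "flat",
    }
    if forms:
        # Multiple instruments are passed as forms[0], forms[1], ...
        payload.update({f"forms[{i}]": name for i, name in enumerate(forms)})
    response = requests.post(REDCAP_URL, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()

# e.g. pull only two of the nine instruments (hypothetical instrument names):
# records = export_records(forms=["sociodemographic_data", "procedure"])
```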
|
19 |
Material Hub – Ordnung im Chaos der Werkstoffdatenquellen / Material Hub – Bringing Order to the Chaos of Materials Data Sources. Mosch, Marc; Radeck, Carsten; Schumann, Maria. 17 May 2018
Novel materials play a decisive role in innovation processes and are a prerequisite for a large number of new products. With TU Dresden, a university of excellence, and numerous non-university institutions, Dresden is a major European center of materials research. The broad scientific and technological spectrum and the enormous research density, combined with a high degree of professional networking, create synergies among researchers and give industry a considerable locational advantage. Fully exploiting these advantages requires a unified, intuitive way to access information. At present, however, materials data are typically held in a multitude of separate, partly access-restricted data collections and are described according to heterogeneous schemas and at varying levels of detail. Search portals do exist, but they are domain-specific, fee-based, or offer user interfaces tailored to specific target groups that are barely usable for other users. Distributed searches across several data sources and portals are time-consuming and laborious. The integrated materials research platform Material Hub presented here is intended to remedy this. It must meet the requirements of the manufacturers and suppliers whose data it contains as well as those of users from research, industry and the skilled trades. By integrating the Dresden research landscape, the platform is intended to stimulate further first-class research and innovation, foster cooperation and substantially ease the commercialization of innovative ideas and solutions. Material Hub is also intended to increase the visibility and reach of materials research in Dresden and thus significantly strengthen its existing capabilities.
This article presents the basic technical concept of Material Hub. A key aspect is the integration of different data sources into a central search portal. Research data, manufacturer information and application examples are integrated that are heterogeneous with respect to domain, level of detail and underlying schema. To this end, a schema for describing materials and a semantic knowledge base, modeling for example synonyms and content-related relationships, are designed in coordination with materials scientists. Based on these, the data collections are indexed and made accessible for search. The user interface supports several search modes, from classic keyword search through faceted search to more strongly guided approaches, in order to serve target-group-specific use cases with suitable UI concepts. In addition to the conceptual approach, the article covers first implementation and evaluation results. (The work was funded by the European Union and the European Regional Development Fund.)
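A simplified sketch of the kind of synonym-aware indexing and faceted filtering described above follows; the synonym table, field names and records are invented for illustration and are not taken from Material Hub itself.

```python
# Toy synonym-aware inverted index with facet filtering; not Material Hub's
# actual schema or knowledge base.

from collections import defaultdict

SYNONYMS = {"aluminium": "aluminum", "al": "aluminum"}  # toy semantic knowledge base

def normalize(term):
    """Map a term to its canonical form via the synonym table."""
    term = term.lower()
    return SYNONYMS.get(term, term)

class MaterialIndex:
    def __init__(self):
        self.records = []
        self.inverted = defaultdict(set)  # normalized term -> record ids

    def add(self, record):
        rid = len(self.records)
        self.records.append(record)
        for value in record.values():
            for token in str(value).split():
                self.inverted[normalize(token)].add(rid)

    def search(self, query, facets=None):
        """Keyword search over all fields, followed by facet filtering."""
        terms = [normalize(t) for t in query.split()]
        hits = set.intersection(*(self.inverted.get(t, set()) for t in terms)) if terms else set()
        results = [self.records[i] for i in sorted(hits)]
        for field, value in (facets or {}).items():
            results = [r for r in results if r.get(field) == value]
        return results

index = MaterialIndex()
index.add({"name": "AlSi10Mg", "class": "aluminum alloy", "source": "research data"})
index.add({"name": "PA12", "class": "polymer", "source": "manufacturer"})
print(index.search("aluminium alloy", facets={"source": "research data"}))
```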
|
20 |
Komplexní řízení kvality dat a informací / Towards Complex Data and Information Quality Management. Pejčoch, David. January 2010
This work deals with the issue of Data and Information Quality. It critically assesses the current state of knowledge of the various methods used for Data Quality Assessment and Data (Information) Quality improvement, and proposes new principles where this critical assessment revealed gaps. The main idea of this work is the concept of Data and Information Quality Management across the entire universe of data. This universe represents all data sources which the respective subject comes into contact with and which are used in its existing or planned processes. For all these data sources, this approach considers setting a consistent set of rules, policies and principles with respect to the current and potential benefits of these resources, while also taking into account the potential risks of their use. An imaginary red thread that runs through the text is the importance of additional knowledge within the process of Data (Information) Quality Management. The introduction of a knowledge base oriented to support Data (Information) Quality Management (QKB) is therefore one of the fundamental principles of the author's proposed set of best practices.
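As a small illustration of applying one consistent set of data-quality rules across several data sources, in the spirit of the "universe of data" idea above, the sketch below uses invented rules, field names and records; it is not the author's proposed QKB.

```python
# Toy rule set applied uniformly to several data sources, reporting
# completeness and validity per field. Rules and data are invented examples.

import re

RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", str(v)) is not None,
    "birth_year": lambda v: str(v).isdigit() and 1900 <= int(v) <= 2025,
}

def assess(records):
    """Compute completeness and validity per rule for one data source."""
    report = {}
    for field, rule in RULES.items():
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None]
        report[field] = {
            "completeness": len(present) / len(values),
            "validity": sum(1 for v in present if rule(v)) / len(values),
        }
    return report

crm = [{"email": "a@b.cz", "birth_year": 1980}, {"email": None, "birth_year": 1899}]
erp = [{"email": "not-an-address", "birth_year": 1975}]
for name, records in (("CRM", crm), ("ERP", erp)):
    print(name, assess(records))
```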
|