131

Découverte interactive de connaissances dans le web des données / Interactive Knowledge Discovery over Web of Data

Alam, Mehwish 01 December 2015
Recently, the "Web of Documents" has become the "Web of Data": documents are annotated in the form of RDF triples, turning data that was previously processable only by humans into data that machines can process directly. Users can then explore this data through SPARQL queries. Just as web clustering engines classify the results of querying the Web of Documents, a framework for classifying SPARQL query answers is needed to make sense of what the data contains. Exploratory data mining focuses on providing insight into the data; it also allows non-interesting parts of the data to be filtered out by involving the domain expert directly in the process. This thesis contributes to guiding the user in the exploration of Linked Data with the help of exploratory data mining.
We study three research directions: 1) creating views over RDF graphs and allowing user interaction over these views; 2) assessing the quality of RDF data and completing it; and 3) simultaneous navigation and exploration over multiple heterogeneous resources of Linked Data. First, we introduce a solution modifier, View By, to create views over RDF graphs by classifying SPARQL query answers with the help of Formal Concept Analysis (FCA). To navigate the resulting concept lattice and extract knowledge units, we developed a new tool, RV-Explorer (Rdf View eXplorer), which implements several navigation modes. This navigation and exploration, however, reveals several incompletenesses in the datasets, so we use association rule mining to complete the RDF data. Furthermore, to support navigation and exploration directly over RDF graphs together with background knowledge, RDF triples are clustered with respect to that background knowledge, and the resulting clusters can be navigated and explored interactively. In conclusion, rather than providing direct exploration, we use FCA as an aid for clustering RDF data, letting the user explore these clusters and reduce the exploration space through interaction.
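Since View By is a solution modifier proposed by the thesis rather than standard SPARQL, the following Python sketch only emulates the idea client-side: it runs an ordinary SPARQL query and groups the answers on one variable to form views. The endpoint, query, and grouping variable are illustrative assumptions, not the thesis's implementation.

```python
# Hedged emulation of a "View By ?director" view over SPARQL answers:
# group query results by one variable and inspect the resulting classes.
from collections import defaultdict
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # illustrative endpoint
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?film ?director WHERE {
        ?film a dbo:Film ;
              dbo:director ?director .
    } LIMIT 100
""")
sparql.setReturnFormat(JSON)
bindings = sparql.query().convert()["results"]["bindings"]

views = defaultdict(list)  # one "view" (answer class) per director
for row in bindings:
    views[row["director"]["value"]].append(row["film"]["value"])

for director, films in sorted(views.items(), key=lambda kv: -len(kv[1]))[:5]:
    print(director, "->", len(films), "films")
```

In the thesis, Formal Concept Analysis then organizes such answer classes into a concept lattice that RV-Explorer lets the user navigate.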
132

Apport des ontologies de domaine pour l'extraction de connaissances à partir de données biomédicales / Contribution of domain ontologies for knowledge discovery in biomedical data

Personeni, Gabin 09 November 2018
The Semantic Web proposes standards and tools to formalize and share knowledge on the Web in the form of ontologies. Biomedical ontologies and their associated data represent a vast collection of complex, heterogeneous, and linked knowledge, whose analysis presents great opportunities in healthcare, for instance in pharmacovigilance. This thesis explores several ways to use this biomedical knowledge in the data mining step of a knowledge discovery process. In particular, we propose three methods in which several ontologies cooperate to improve data mining results. The first contribution describes a method based on pattern structures, an extension of Formal Concept Analysis, to extract associations between adverse drug events from patient data. Here, a phenotype ontology and a drug ontology cooperate to allow a semantic comparison of these complex adverse events, leading to the discovery of associations between such events at varying degrees of generalization, for instance at the level of drugs or drug classes.
The second contribution uses a numerical method based on semantic similarity measures to classify different types of genetic intellectual disability, characterized both by their phenotypes and by the functions of their associated genes. We study two similarity measures, applied with different combinations of phenotype and gene-function ontologies. In particular, we investigate the influence of each domain of knowledge represented in each ontology on the classification process, and how the ontologies can cooperate to improve it. Finally, the third contribution uses the data component of the Semantic Web, the Linked Open Data (LOD), together with linked ontologies, to characterize genes responsible for intellectual disabilities. We use Inductive Logic Programming (ILP), a method well suited to mining relational data such as the LOD while exploiting domain knowledge from ontologies through reasoning mechanisms. ILP allows us to extract from the LOD and ontologies a descriptive and predictive model of the genes responsible for intellectual disabilities. Together, these contributions illustrate how one or several ontologies can usefully cooperate in various data mining processes.
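The abstract does not name the two similarity measures studied, so the sketch below illustrates only the general family: similarity computed from shared ancestors in an is-a ontology, here a Jaccard-style overlap over an invented toy phenotype hierarchy.

```python
# Generic illustration of an ontology-based semantic similarity measure
# (a Jaccard-style overlap of ancestor sets); the toy ontology is invented.
ontology = {  # child -> parents (a small is-a DAG)
    "seizure": ["neurological_abnormality"],
    "intellectual_disability": ["neurological_abnormality"],
    "neurological_abnormality": ["phenotypic_abnormality"],
    "short_stature": ["growth_abnormality"],
    "growth_abnormality": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}

def ancestors(term, onto):
    """Return the term plus all of its ancestors in the is-a hierarchy."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(onto.get(t, []))
    return seen

def similarity(t1, t2, onto):
    """Jaccard overlap of ancestor sets: 1.0 for identical terms."""
    a1, a2 = ancestors(t1, onto), ancestors(t2, onto)
    return len(a1 & a2) / len(a1 | a2)

print(similarity("seizure", "intellectual_disability", ontology))  # 0.5: shares 2 of 4
print(similarity("seizure", "short_stature", ontology))            # 0.2: shares only the root
```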
133

Geração de perguntas em linguagem natural a partir de bases de dados abertos e conectados: um estudo exploratório / Natural language question generation from linked open databases: an exploratory study

Rodrigues, Emílio Luiz Faria 04 December 2017
Rapid growth of linked open databases has been observed in recent years, motivated by everything from the automatic generation of such bases from text to their construction directly from information systems. This growth has produced a large set of databases holding a great volume of information, opening up the possibility of large-scale use in question-answering systems. Such systems depend on an information structure to support both the generation of sentences and the checking of answers, and the current landscape of linked open data provides exactly this support. A review of the literature revealed both an opportunity for broader use of natural-language sentence generation from linked open databases in a variety of applications and several challenges to the effective use of these resources for that purpose. This work therefore aims to identify which aspects of the structure of linked open databases can support the generation of questions in natural language. To that end, an exploratory study was carried out and a general approach was defined, then tested in a prototype that generated natural-language question sentences supported by linked open databases. The results were evaluated by a specialist in linguistics and were considered promising.
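As a hedged sketch of the idea (not the prototype evaluated in this study), the example below generates natural-language questions from RDF-style triples using predicate-specific templates; all triples, predicates, and templates are invented.

```python
# Hypothetical template-based question generation from RDF-style triples;
# not the study's prototype, just an illustration of the mechanism.
triples = [
    ("Brazil", "capital", "Brasília"),
    ("Machado de Assis", "author_of", "Dom Casmurro"),
]

templates = {
    # predicate -> question template (the unused slot becomes the answer)
    "capital": "What is the capital of {s}?",
    "author_of": "Who wrote {o}?",
}

def generate_questions(triples, templates):
    """Yield (question, answer) pairs for predicates we have templates for."""
    for s, p, o in triples:
        if p in templates:
            question = templates[p].format(s=s, o=o)
            answer = o if "{s}" in templates[p] else s
            yield question, answer

for q, a in generate_questions(triples, templates):
    print(q, "->", a)
```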
134

Dados abertos governamentais: implicações e possibilidades em políticas públicas / Open government data: implications and possibilities in public policies

Issa, Marcelo Kalil 21 October 2013
In recent years, groups of digital activists spread across the planet collectively shaped a set of technical standards for publishing data on the web so as to facilitate the capture and reuse of these elements. The political discourse associated with providing information according to these criteria gave rise to a global coordination that has been called the open data movement, whose militancy maintains above all the need to publish scientific and government data according to these parameters. Thus, within political science, so-called open government data and its implications have been established as a relatively autonomous object of inquiry. This work addresses the main aspects of the current debate on open government data and attempts to identify the context in which this discussion takes place, along with its meanings and most relevant implications for improving democratic dynamics, especially with respect to transparency, social participation, and collaborative governance. The central aim of the dissertation is to examine how open government data practices can affect the formulation, implementation, and evaluation of public policies.
To this end, aspects of the open data initiatives carried out by the governments of the United States and Brazil are analyzed, in particular the political and administrative contexts in which they have taken place and the applications developed by reusing the datasets offered on both countries' official open data portals. The incipience of the Brazilian initiatives made it necessary to consult the main actors involved in discussions and actions on the subject in the country, through qualitative interviews and published statements. The final considerations point out some of the most pressing challenges and possible ways to realize the potential of open government data publication as an instrument of democratic assertion.
135

Apprentissage automatique pour la détection d'anomalies dans les données ouvertes : application à la cartographie / Satellite images analysis for anomaly detection in open geographical data.

Delassus, Rémi 23 November 2018
In this thesis we study the problem of anomaly detection in the open data used by the company Qucit: both its customers' business data and the open data used to contextualize it. Initially, we were interested in detecting defective bicycles in the trip data of New York's bike-share system, looking for data that reflects an anomaly in the real world. Features describing the behaviour of each observed bicycle are clustered; abnormal behaviours are extracted from this clustering and compared with the monthly reports giving the number of bikes repaired, making this an aggregated-output learning problem. The results of this first study proved unsatisfactory because of the paucity of the data. The work then turned to the detection of buildings in satellite images, this time looking for anomalies in geographical data that do not reflect reality. We propose a method for fusing segmentation models that improves the error metric by up to 7% over the standard method. We assess the robustness of our model to the removal of buildings from the labels, in order to determine the extent to which such omissions are likely to alter the results. This type of noise is commonly encountered in the OpenStreetMap data regularly used by Qucit, and the robustness we observe indicates that it could be corrected.
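A minimal sketch of the clustering step described above, under invented assumptions about the per-bike features: behaviour vectors are clustered and the under-populated cluster is flagged as candidate defective bikes. This illustrates the partitioning idea only, not Qucit's pipeline or the aggregated-output comparison against repair reports.

```python
# Cluster per-bike behaviour features; treat the minority cluster as
# candidate anomalies. Features and data are invented stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Rows = bikes; columns = illustrative features such as
# [trips per day, mean trip duration (min), days idle].
normal = rng.normal([12.0, 15.0, 0.5], [2.0, 4.0, 0.3], size=(200, 3))
broken = rng.normal([1.0, 40.0, 6.0], [0.5, 10.0, 2.0], size=(8, 3))
X = StandardScaler().fit_transform(np.vstack([normal, broken]))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
minority = np.argmin(np.bincount(labels))  # under-populated = "abnormal"
suspects = np.flatnonzero(labels == minority)
print(f"{len(suspects)} bikes flagged, e.g. indices {suspects[:5]}")
```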
136

IDEO Integrador de dados da Execução Orçamentária Brasileira: um estudo de caso da integração de dados das receitas e despesas nas Esferas Federal, Estadual Governo de São Paulo, e Municipal Municípios do Estado de São Paulo / The integration of multi-source heterogeneous data: an open data case study for budgetary execution in Brazil.

José Rodolfo Beluzo 30 September 2015
This dissertation presents a group of processes for integrating the data and schemas of Brazilian public budget revenues and expenditures across the three levels of government: federal, state, and municipal. These processes aim to solve the heterogeneity problems citizens face when seeking public information from different government entities. This information is currently disclosed on transparency portals, which must comply with the requirements of the Brazilian legal framework: records of revenues, expenses, financial transfers, and bidding processes must be published in a complete, primary, authentic, and up-to-date form. Despite these legal requirements, however, there is no publication standard, and the data shows inconsistencies and ambiguities across portals. As a proof of concept, revenue and expenditure data were selected from the Brazilian federal government, the state government of São Paulo, and 645 municipalities of São Paulo state, and a conceptual model of revenues and expenses was standardized based on the technical budget manual written annually by the federal government.
From this model, we created standardized data schemas matching the datasets available on each government entity's transparency portal, as well as an integrated schema across them. Budget execution data disclosed by these governments for 2010-2014 was extracted from the portals, transformed, cleaned, and loaded into the prototype. The resulting data warehouse answers questions about budget execution in Brazil that cannot be answered by directly accessing the transparency portals, or only at the cost of a very time-consuming compilation effort. The validation phase also made it possible to analyze and point out possible systemic failures in the e-gov portals, and recommendations are offered as a contribution to their improvement.
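As a simplified illustration of the schema-standardization step (the actual conceptual model follows the federal budget manual and is far richer), the sketch below maps two invented portal layouts onto one shared schema; every column name and figure is hypothetical.

```python
# Hypothetical mini-ETL: rename heterogeneous portal columns to a shared
# schema and tag provenance; real portals expose many more fields.
import pandas as pd

federal = pd.DataFrame({"orgao": ["26000"], "funcao": ["12"], "valor_pago": [1500.0]})
municipal = pd.DataFrame({"unidade": ["Educação"], "vl_liquidado": [300.0]})

def to_standard(df, column_map, source):
    """Map source-specific columns onto the integrated schema."""
    out = df.rename(columns=column_map)[list(column_map.values())]
    return out.assign(source=source)

standard = pd.concat([
    to_standard(federal, {"funcao": "function", "valor_pago": "amount"}, "federal"),
    to_standard(municipal, {"unidade": "function", "vl_liquidado": "amount"}, "municipal_sp"),
], ignore_index=True)

print(standard.groupby("source")["amount"].sum())
```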
138

Fit for purpose? : a metascientific analysis of metabolomics data in public repositories

Spicer, Rachel January 2019
Metabolomics is the study of metabolites and metabolic processes. Because metabolites vary widely in structure and polarity, no single analytical technique can measure the entire metabolome; instead, a varied set of experimental designs and instrumental technologies is used to measure specific portions of it. This has led to the development of many distinct data-processing and analysis methods and software tools. There is hope that metabolomics can be used for clinical applications, in toxicology, and to measure the exposome. For these applications to be realised, however, data must be of high quality, sufficiently standardised and annotated, and FAIR (Findable, Accessible, Interoperable, and Reusable), and standardised, FAIR software workflows must be available. FAIR, open data and workflows can also help address the recent concern over the reproducibility of scientific research. To these ends, this thesis assesses current practices and standards for sharing data within the field of metabolomics, using metascientific approaches; the types and functions of software for processing and analysing metabolomics data are also assessed.
Reporting standards are designed to ensure that the minimum information required to understand and interpret the results of an analysis is reported, yet such standards are often ignored and not complied with. Compliance with the biological-context Metabolomics Standards Initiative (MSI) guidelines was examined in order to investigate their timeliness. The state of open data within the metabolomics community was examined by investigating how much publicly available metabolomics data exists and where it has been deposited. To explore whether journal data-sharing policies are driving open metabolomics data, the journals that publish articles whose underlying data are made open were also identified. Open data alone, however, is not inherently useful: data that is incomplete, lacking in quality, or missing crucial metadata has little value, whereas reuse demonstrates the worth of public data archiving. Levels of reuse of public metabolomics data were therefore examined. Finally, with more than 250 software tools specific to metabolomics, practitioners face a daunting task in selecting the best tools for data collection and analysis; to help educate researchers about what software is available, a taxonomy of metabolomics software tools and a GitHub Pages wiki providing extensive details about all included software were developed.
139

Интерактивно састављање машински читљивих и разумљивих судских писмена базирано на знању / Interaktivno sastavljanje mašinski čitljivih i razumljivih sudskih pismena bazirano na znanju / Knowledge-based Interactive Assembly of Machine-readable and Machine-understandable Judicial Documents

Marković Marko 20 December 2018
This thesis proposes a knowledge-based judicial document assembly system. Document assembly is recognized as one of the challenges facing junior lawyers at the beginning of their professional careers, causing them to rely on the experience of senior colleagues, while the preparation of filings is a demanding task for non-lawyers, who usually need to hire a lawyer. The knowledge required for document assembly can be divided into stated knowledge, found in regulations and legal textbooks, and tacit knowledge gained through experience.
The thesis introduces a machine-readable and machine-understandable format for legal norms, a machine-readable and machine-understandable format for judicial documents, and a system for judicial document assembly. The assembly system has potential in the education of law students, as it explains how fragments of the generated document relate to the facts of the case. This approach also improves the quality of open judicial data and increases the transparency of the judiciary, because the generated documents are machine-readable and machine-understandable by default. In addition, a set of guidelines for opening judicial data is proposed. Finally, the machine-readable and machine-understandable format of the generated documents is a step toward automatic document processing at the court clerk's office.
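As a hypothetical miniature of the knowledge-based assembly idea, the sketch below keeps, for each generated fragment, the list of case facts it was built from, so the link between input facts and output text can be explained to the user, as the thesis describes for law students. Templates and fact names are invented, not the system's actual formats.

```python
# Miniature document assembly with fact-to-text traceability; every name
# and template here is an invented illustration.
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    supporting_facts: list  # names of the facts this sentence relies on

TEMPLATES = [
    # (required facts, sentence template)
    (["plaintiff", "defendant"], "{plaintiff} files this claim against {defendant}."),
    (["claim_amount"], "The plaintiff seeks damages of {claim_amount}."),
]

def assemble(facts: dict) -> list:
    """Instantiate every template whose required facts are all present."""
    doc = []
    for required, template in TEMPLATES:
        if all(f in facts for f in required):
            doc.append(Fragment(template.format(**facts), required))
    return doc

for frag in assemble({"plaintiff": "A. Petrović", "defendant": "B. Jovanović",
                      "claim_amount": "50,000 RSD"}):
    print(frag.text, "<- based on:", ", ".join(frag.supporting_facts))
```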
