1 |
Establishing a Framework for an African Genome Archive. Southgate, Jamie. January 2021.
Magister Scientiae - MSc / The generation of biomedical research data on the African continent is growing, with numerous studies realizing the importance of African genetic diversity in discoveries of human origins and disease susceptibility. The decrease in the costs to purchase and utilize such tools has enabled research groups to produce datasets of significant scientific value. However, this success story has resulted in a new challenge for African researchers and institutions. An increase in data scale and complexity has led to an imbalance between the data and the infrastructure and skills available to manage, store and analyse it. The lack of physical infrastructure has left genomic research on the continent lagging behind its counterparts abroad, drastically limiting the sharing of data and posing challenges for researchers wishing to explore secondary analysis, study verification and amalgamation. The scope of this project entailed the design and implementation of a prototype genome archive to support the effective use of data resources amongst researchers. The prototype consists of a web interface and storage backend through which users upload and browse projects, datasets and metadata stored in the archive. The server, middleware, database and server-side framework are the components of the genome archive and form its software stack. The server component provides shared resources such as network connectivity, file storage, security and the metadata database. The metadata relating to the sample files is stored in a NoSQL database. This database is interfaced with the iRODS middleware component, which controls data being sent between the server, the database and the Flask framework. The Flask framework, which is based on the Python programming language, is the development platform of the archive web application.
The Cognitive Walkthrough methodology was used to evaluate the suitability of the software for its users. Results showed that the core conceptual model adopted by the prototype software is consistent and that actions available to the user are visible. Issues were raised concerning user feedback when performing tasks and the meaning of metadata terms. The development of a continent-wide genome archive for Africa is feasible by utilizing open-source software and metadata standards to improve data discovery and reuse.
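To make the stack description concrete, the following is a minimal sketch, assuming a Flask application with one endpoint to register a dataset's metadata and one to browse a project; an in-memory list stands in for the NoSQL metadata store, and the hand-off of files to the iRODS middleware is only indicated by a comment. Endpoint paths and field names are illustrative assumptions, not the prototype's actual interface.

from flask import Flask, jsonify, request

app = Flask(__name__)
metadata_store = []  # stand-in for the NoSQL metadata database

@app.route("/datasets", methods=["POST"])
def register_dataset():
    record = request.get_json(silent=True) or {}
    # Hypothetical required fields; the real archive defines its own metadata schema.
    for field in ("project", "dataset_name", "sample_count"):
        if field not in record:
            return jsonify({"error": f"missing field: {field}"}), 400
    # In the prototype described above, the uploaded files themselves would be
    # handed to the iRODS middleware here; only the metadata is kept in this sketch.
    metadata_store.append(record)
    return jsonify({"status": "registered"}), 201

@app.route("/projects/<name>/datasets", methods=["GET"])
def browse_project(name):
    # Return all dataset records registered under a given project.
    matches = [r for r in metadata_store if r.get("project") == name]
    return jsonify(matches)

if __name__ == "__main__":
    app.run(debug=True)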
|
2 |
Architecture of Databases for Mineralogy and Astrobiology. Lafuente Valverde, Barbara. January 2016.
This dissertation focuses on the design of the Open Data Repository's Data Publisher (ODR), a web-based central repository for scientific data, primarily for mineralogical properties but also applicable to other data types, including, for instance, morphological, textural and contextual images and chemical, biochemical, isotopic and sequencing information. Using simple web-based tools, the goal of ODR is to lower the cost and training barrier so that any researcher can easily publish their data, ensure that it is archived for posterity, and comply with mandates for data sharing. There are only a few databases in the mineralogical community, including RRUFF (http://rruff.info) for professionals and mindat.org (http://www.mindat.org) for amateurs. These databases contain certain specific mineral information, but none provides the ability to include, in the same platform, the many data types that characterize the properties of minerals. The ODR framework provides the flexibility required to include unforeseen data without the need for additional software programming. Once ODR is completed, the RRUFF database will be migrated into ODR and populated with additional data from other analytical techniques, such as Mössbauer data from Dr. Richard Morris and VNIR data from Dr. Ralf Milliken. The current ODR pilot studies are also described here, including 1) a database of the XRD analyses performed by the CheMin instrument on the Mars Science Laboratory rover Curiosity, 2) the NASA Ames Astrobiology Habitable Environments Database (AHED), which aims to provide a central, high-quality, long-term data repository for relevant astrobiology information, 3) the University of Arizona Mineral Museum (UAMM), with over 21,000 records of minerals and fossils from the museum collection, and 4) the Mineral Evolution Database (MED), which uses the ages of mineral species and their localities to correlate the diversification of mineral species through time with Earth's physical, chemical and biological processes. A good database design requires understanding the fundamentals of its content, so part of this thesis is also devoted to developing my skills in mineral analysis and characterization through the study of the crystal chemistry of diverse minerals, using X-ray diffraction, Raman spectroscopy and microprobe analysis as the principal techniques.
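As an illustration of the kind of schema flexibility described above, the following sketch (not ODR code) stores each record as a free-form document so that unforeseen data types can be added without new programming; the field names, such as "raman_shift_cm-1", are invented for the example.

records = [
    {"mineral": "Quartz", "locality": "Tucson, AZ",
     "raman_shift_cm-1": [128, 206, 464]},
    {"mineral": "Hematite", "locality": "Gale Crater, Mars",
     "xrd_2theta_deg": [24.1, 33.2, 35.6],
     "instrument": "CheMin"},
]

def find(records, **criteria):
    """Return records whose fields match all given criteria, ignoring fields a record lacks."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# New data types can appear in later records without changing this query code.
print(find(records, instrument="CheMin"))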
|
3 |
Facilitating data sharing: a design approach to incorporate context into the research data repository. Garza Gutierrez, Kristian. January 2017.
We asked whether the design of a Science Data Repository (SDR) can influence data sharing behaviour in small scientific collaborations. We hypothesised that an SDR can influence data sharing behaviour when its design takes the context of data sharing into account. We proposed an alternative approach to those documented in the literature, employing a combination of socio-technical empirical and analytical methods for context capturing and choice architecture for context incorporation. To evaluate the approach, we applied it to the design of features in a Science Data Repository for a population of small scientific collaborations within the life sciences. The application of the approach consisted of an exploratory case study, a review of factors associated with data sharing, the definition of design claims, and the implementation of a set of design features. We collected data through interviews with members of the collaborations and designers of the SDR, as well as from the data logs of the collaborations' SDR. We evaluated the resulting design features using an asynchronous web experiment. We found that the empirical approach to context capturing allowed us to effectively identify the factors associated with data sharing in the small scientific collaborations. We also identified a number of limitations on the application of the analytical approach to context capturing. Furthermore, we found that the choice-architecture-based procedure for context incorporation can define effective design features in Science Data Repositories. In this work, we show that data sharing can be facilitated by incorporating context into the design of a Science Data Repository, and we identify a set of restrictions on the use of our approach. The approach proposed in this thesis can be used by practitioners wishing to improve data sharing in an SDR. Contributions such as the survey of factors associated with data sharing behaviour can be used by researchers to understand the problems associated with data sharing in small scientific collaborations.
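As a hedged illustration of what a choice-architecture-informed design feature might look like in code (this is not taken from the thesis), the sketch below pre-selects a collaboration-visible default for a new dataset so that sharing is the path of least resistance while the depositor can still opt out; the option and field names are assumptions.

DEFAULT_VISIBILITY = "collaboration"   # nudge: shared within the collaboration by default
VISIBILITY_OPTIONS = ("private", "collaboration", "public")

def new_dataset_form(visibility=None):
    """Build the form state for a new dataset, applying the sharing default."""
    chosen = visibility if visibility in VISIBILITY_OPTIONS else DEFAULT_VISIBILITY
    return {"visibility": chosen, "options": VISIBILITY_OPTIONS}

print(new_dataset_form())            # depositor accepts the default, so the dataset is shared
print(new_dataset_form("private"))   # depositor opts out explicitly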
|
4 |
An Analysis and Reasoning Framework for Project Data Software Repositories. Attarian, Ioanna Maria. January 2012.
As the requirements for software systems increase, their size, complexity and functionality increase as well. This has a direct impact on the complexity of the numerous artifacts related to the system, such as its specification, design, implementation and testing models. Furthermore, as the software market becomes more and more competitive, the need for software products that are of high quality and require the least monetary, time and human resources for their development and maintenance becomes evident. Therefore, it is important that project managers and software engineers are given the necessary tools to obtain a more holistic and accurate perspective of the status of their projects, in order to identify early the potential risks, flaws and quality issues that may arise during each stage of the software project life cycle. In this respect, practitioners and academics alike have recognized the significance of investigating new methods for supporting software management operations in large software projects. The main target of this M.A.Sc. thesis is the design of a framework in terms of, first, a reference architecture for mining and analyzing software project data repositories according to specific objectives and analytic knowledge, second, the techniques to model such analytic knowledge and, third, a reasoning methodology for verifying or denying hypotheses related to analysis objectives. Such a framework could assist project managers, team leaders and development teams towards more accurate quality analysis, risk assessment, cost estimation and progress evaluation. More specifically, the framework utilizes goal models to specify analysis objectives as well as possible ways by which these objectives can be achieved. Examples of such analysis objectives for a project could be to yield high code quality, achieve low production cost, or cope with tight delivery deadlines. Such goal models are consequently transformed into collections of Markov Logic Network rules, which are then applied to the repository data in order to verify or deny, with a degree of probability, whether the particular project objectives can be met as the project evolves. The proposed framework has been applied, as a proof of concept, on a repository pertaining to three industrial projects with more than one hundred development tasks.
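The following toy sketch illustrates the general idea of scoring a project hypothesis against repository facts with weighted rules; it is only a caricature of Markov Logic Network inference (which requires grounding and a partition function), and the predicates, weights and logistic scoring below are illustrative assumptions.

import math

# Facts extracted from a hypothetical project repository.
facts = {"high_test_coverage": True, "frequent_code_reviews": True, "tight_deadline": True}

# Weighted rules: a positive weight supports the hypothesis "high code quality".
rules = [
    (1.5, lambda f: f["high_test_coverage"]),
    (1.0, lambda f: f["frequent_code_reviews"]),
    (-0.8, lambda f: f["tight_deadline"]),
]

def hypothesis_probability(facts, rules):
    """Map the summed weight of satisfied rules to a pseudo-probability via a logistic function."""
    score = sum(w for w, cond in rules if cond(facts))
    return 1.0 / (1.0 + math.exp(-score))

print(f"P(high code quality) ~ {hypothesis_probability(facts, rules):.2f}")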
|
5 |
Elaboration d'un cadre méthodologique pour l'analyse de l'information médicale de la tarification à l'activité / Construction of a methodological framework for the analysis of medical information within the French prospective payment system. Boudemaghe, Thierry. 9 December 2016.
Year after year, millions of standardized computerized records describing inpatient hospital activity are produced in France. Despite the availability of this massive amount of data, no global, precisely calibrated and regularly repeated study of health-system efficiency has been carried out on this basis. The data are of course exploited in a wide variety of specific studies, but their systematic use for analysing hospital activity, hospital catchment areas and population health needs remains to be implemented, leaving a whole area of health-system efficiency analysis out of reach. Our objective is to contribute to a precise framework for exploiting these data, through a three-step approach: definition of a data consolidation method; construction of reference repositories for data analysis; and development of analysis methods for two general topics, the characterization of hospital activity and the study of hospital catchment, with example applications of these methods.
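As an illustration of the catchment-area side of the framework (this sketch is not from the thesis), one elementary measure such methods might standardize is, for each hospital, the share of its stays contributed by each patient residence area; the record fields below are assumptions, not actual PMSI variable names.

from collections import Counter, defaultdict

stays = [
    {"hospital": "CHU-A", "residence_area": "30000"},
    {"hospital": "CHU-A", "residence_area": "30000"},
    {"hospital": "CHU-A", "residence_area": "30100"},
    {"hospital": "CHU-B", "residence_area": "30100"},
]

def catchment_shares(stays):
    """Return, per hospital, the fraction of stays coming from each residence area."""
    counts = defaultdict(Counter)
    for s in stays:
        counts[s["hospital"]][s["residence_area"]] += 1
    return {h: {area: n / sum(c.values()) for area, n in c.items()}
            for h, c in counts.items()}

print(catchment_shares(stays))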
|
6 |
Streamlining user processes for a general data repository for life science in accordance with the FAIR principles. Asklöf, Anna. January 2021.
With the increasing amounts of data generated in life science, methods for data storage and sharing are being developed and implemented, and online data repositories are increasingly used for data sharing. The Swedish national platform Science for Life Laboratory (SciLifeLab) has decided to use an institutional data repository as a means of addressing the increasing amounts of data generated at the platform. In this project, the system used for the institutional repository at SciLifeLab was studied and compared to implementations of the same system at other institutions in order to create user documentation for the repository. The documentation was written with the FAIR principles as guidance. Feedback on the guidelines was then sought from users, and the user documentation was improved on the basis of the feedback received. Items published in the repository were assessed with a FAIR evaluation tool called FAIR Evaluation Services, and the results were examined in relation to the items' records in the repository. Of ten evaluated datasets, all except one scored exactly the same on the FAIR Evaluation Services tests, which may indicate that the tests do not evaluate the aspects needed to detect the differences between these published items. On this basis, conclusions as to what extent user documentation can increase the FAIRness of data could not be drawn.
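The sketch below illustrates the kind of machine-actionable checks a FAIR evaluation performs on a published item, such as the presence of a persistent identifier, a licence and structured metadata; it is a simplification written for illustration, not the FAIR Evaluation Services tool used in the project, and the record fields are assumptions.

def fair_checks(record):
    """Return a dict of named checks and whether the record passes each one."""
    return {
        "has_persistent_identifier": bool(record.get("doi")),
        "has_licence": bool(record.get("licence")),
        "has_structured_metadata": isinstance(record.get("metadata"), dict)
                                   and len(record["metadata"]) > 0,
    }

item = {"doi": "10.1234/example", "licence": "CC BY 4.0",
        "metadata": {"title": "RNA-seq of sample X", "keywords": ["transcriptomics"]}}
results = fair_checks(item)
print(results, "-", sum(results.values()), "of", len(results), "checks passed")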
|
7 |
Foundational Data Repository for Numeric Engine Validation. Hollingsworth, Jason Michael. 19 November 2008.
Many different numeric models have been created to address a variety of hydraulic and hydrologic engineering applications. Each utilizes formulations and numeric methods to represent processes such as contaminant transport, coastal circulation, and watershed runoff. Although one process may be adequately represented by a model, this does not guarantee that another process will be, even if that process is similar. For example, a model that computes subcritical flow does not necessarily compute supercritical flow. Selecting an appropriate numeric model for a situation is therefore a prerequisite to obtaining accurate results. Current policies and resources do not provide adequate guidance in the model selection process. Available resources range from approved lists to guidelines for performing calculations to technical documentation of candidate numeric models. Many of these resources are available only from the developers of the numeric models; they focus on strengths with little or no mention of weaknesses or limitations. For this reason, engineers must make a selection based on publicity and/or familiarity rather than capability, often resulting in inappropriate application, frustration, and/or incorrect results. A comprehensive selection tool to aid engineers needs to test model capabilities by comparing model output with analytical solutions, laboratory tests, and physical case studies. The first step in building such a tool involves gathering and categorizing robust data that can be used for such model comparisons. A repository has been designed, created, and made available to the engineering community for this purpose at http://verification.aquaveo.com. It allows engineers and regulators to store studies with assigned characteristics, as well as search and access studies based on a desired set of characteristics. Studies with characteristics similar to those of a proposed project can help identify appropriate numeric models.
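The comparison the repository is meant to support can be sketched as follows: score a numeric model's output against an analytical solution with simple error norms. This is an illustrative sketch; the placeholder depth profile below is not one of the repository's actual verification studies.

import math

def error_norms(model, analytic):
    """Return maximum and root-mean-square error between two equal-length samples."""
    diffs = [m - a for m, a in zip(model, analytic)]
    max_err = max(abs(d) for d in diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_err, rmse

x = [i * 10.0 for i in range(11)]                # station along the channel (m)
analytic = [1.0 + 0.005 * xi for xi in x]        # placeholder analytical depth profile (m)
model = [a + 0.002 * math.sin(xi) for xi, a in zip(x, analytic)]  # simulated model output

max_err, rmse = error_norms(model, analytic)
print(f"max error = {max_err:.4f} m, RMSE = {rmse:.4f} m")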
|
8 |
Cryptography and Computer Communications Security. Extending the Human Security Perimeter through a Web of Trust. Adeka, Muhammad I. January 2015.
This work modifies Shamir's algorithm by sharing a random key that is used to lock up the secret data, rather than sharing the data itself. This is significant in cloud computing, especially with homomorphic encryption. Using web design, the resulting scheme practically globalises secret sharing, with authentication and inherent secondary applications. The work aims at improving cybersecurity via a joint exploitation of human factors and technology: a human-centred cybersecurity design as opposed to a technology-centred one. The completed functional scheme is tagged CDRSAS.
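A compact sketch of the hybrid idea, assuming a (k, n) threshold over a prime field and a toy XOR lock standing in for a real cipher: the random key, not the data, is what gets split into shares. This is illustrative code, not the CDRSAS implementation.

import secrets

PRIME = 2**127 - 1  # a Mersenne prime large enough to hold a 16-byte key

def split_key(key_int, n, k):
    """Split key_int into n Shamir shares, any k of which recover it."""
    coeffs = [key_int] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_key(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

key = secrets.randbits(120)                       # the random locking key
shares = split_key(key, n=5, k=3)
assert recover_key(shares[:3]) == key             # any 3 of the 5 shares suffice
data = b"secret data"
locked = bytes(b ^ kb for b, kb in zip(data, key.to_bytes(16, "big")))  # toy XOR lock
print("locked:", locked.hex(), "| key recovered:",
      recover_key([shares[0], shares[2], shares[4]]) == key)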
The literature on secret sharing schemes is reviewed together with the concepts of human factors, trust, cyberspace/cryptology, and an analysis of a three-factor security assessment process. This is followed by a discussion of the relevance of passwords within the context of human factors. The main research design and implementation and the system's performance are then analysed, together with a proposal for a new antidote against 419 fraudsters. Two pairs of equations were derived in the course of the investigation: one pair for secret sharing and one for a risk-centred security assessment technique.
The building blocks and software used for the CDRSAS include Shamir's algorithm, MD5, HTML5, PHP, Java, Servlets, JSP, JavaScript, MySQL, jQuery, CSS, MATLAB, MS Excel, MS Visio, and Photoshop. The code was developed in the Eclipse IDE, and the Java-based system runs on Tomcat and Apache using the XAMPP server. Its code units have passed JUnit tests. The system compares favourably with SSSS.
Defeating socio-cryptanalysis in cyberspace requires strategies centred on human trust, trust-related human attributes, and technology. The PhD research is complete, but there is scope for future work. / Petroleum Technology Development Fund (PTDF), Abuja, Nigeria.
|
9 |
Strategy and methodology for enterprise data warehouse development: integrating data mining and social networking techniques for identifying different communities within the data warehouse. Rifaie, Mohammad. January 2010.
Data warehouse technology has been successfully integrated into the information infrastructure of major organizations as a potential solution for eliminating redundancy and providing comprehensive data integration. Recognizing the importance of a data warehouse as the main data repository within an organization, this dissertation addresses different aspects of data warehouse architecture and performance. Many data warehouse architectures have been presented by industry analysts and research organizations. These architectures vary from independent, physical, business-unit-centric data marts to the centralised two-tier hub-and-spoke data warehouse. The operational data store is a third tier, offered later to address business requirements for inter-day data loading. While the industry-available architectures are all valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity). In this dissertation, I advocate a new architecture (the Hybrid Architecture) which encompasses the industry-advocated architectures. The Hybrid Architecture demands the acquisition, loading and consolidation of enterprise atomic and detailed data into a single integrated enterprise data store (the Enterprise Data Warehouse), where business-unit-centric data marts and operational data stores (ODS) are built in the same instance of the Enterprise Data Warehouse. For the purpose of highlighting the role of data warehouses for different applications, we describe an effort to develop a data warehouse for a geographical information system (GIS). We further study the importance of data practices, quality and governance for financial institutions by commenting on the RBC Financial Group case.
The development and deployment of the Enterprise Data Warehouse based on the Hybrid Architecture spawned its own issues and challenges. Organic data growth and business requirements to load additional new data will significantly increase the amount of stored data. Consequently, the number of users will increase significantly. Enterprise data warehouse obesity, performance degradation and navigation difficulties are chief amongst the issues and challenges. Association rule mining and social networks are adopted in this thesis to address the above-mentioned issues and challenges. We describe an approach that uses frequent pattern mining and social network techniques to discover different communities within the data warehouse. These communities include sets of tables frequently accessed together, sets of tables retrieved together most of the time, and sets of attributes that mostly appear together in the queries. We concentrate on tables in the discussion; however, the model is general enough to discover other communities. We first build a frequent pattern mining model by considering each query as a transaction and the tables as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables that are mostly accessed together and hence should be treated as one unit in storage and retrieval for better overall performance. We utilize social network construction and analysis to find maximum-sized sets of related tables; this is a more robust approach than taking the union of overlapping itemsets. We derive the Jaccard distance between the closed itemsets and construct the social network of tables by adding links that represent distance above a given threshold. The constructed network is analyzed to discover communities of tables that are mostly accessed together. The reported test results are promising and demonstrate the applicability and effectiveness of the developed approach.
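A condensed sketch of the pipeline follows, with the simplification that pairwise Jaccard similarity between tables (rather than closed frequent itemsets and the distance between them) defines the links: each query contributes a set of tables, strongly co-occurring pairs become edges, and connected components are read off as communities. The query log and threshold are illustrative assumptions.

from itertools import combinations

query_log = [                      # each entry: the set of tables touched by one query
    {"orders", "customers"},
    {"orders", "customers", "products"},
    {"orders", "products"},
    {"gl_accounts", "gl_postings"},
    {"gl_accounts", "gl_postings"},
]
THRESHOLD = 0.5

tables = sorted(set().union(*query_log))
appears_in = {t: {i for i, q in enumerate(query_log) if t in q} for t in tables}

# Build edges between tables whose query sets overlap strongly (Jaccard >= threshold).
edges = {}
for a, b in combinations(tables, 2):
    union = appears_in[a] | appears_in[b]
    jaccard = len(appears_in[a] & appears_in[b]) / len(union) if union else 0.0
    if jaccard >= THRESHOLD:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(b, set()).add(a)

# Communities = connected components of the thresholded co-access graph.
seen, communities = set(), []
for t in tables:
    if t not in seen:
        stack, component = [t], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(edges.get(node, ()))
        seen |= component
        communities.append(component)

print(communities)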
|