About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Konzepte und Techniken der Datenversorgung für komponentenbasierte Informationssysteme (Concepts and Techniques of Data Provisioning for Component-Based Information Systems)

Sellentin, Jürgen. January 1999 (has links)
Stuttgart, Univ., Diss., 1999.
2

Extending dimensional modeling through the abstraction of data relationships and development of the semantic data warehouse

Hart, Robert 04 December 2017 (has links)
The Kimball methodology, often referred to as dimensional modelling, is well established in data warehousing and business intelligence as a highly successful means of turning data into information. Yet weaknesses exist in the Kimball approach that make it difficult to rapidly extend or interrelate dimensional models in complex business areas such as health care. This thesis looks at the development of a methodology that provides for the rapid extension and interrelation of Kimball dimensional models. This is achieved through techniques similar to those employed in the Semantic Web, which allow rapid analysis of, and insight into, highly variable data that was previously difficult to achieve. / Graduate
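As a loose illustration of the abstraction this abstract describes, the sketch below represents relationships between dimension members as explicit subject-predicate-object statements in the spirit of Semantic Web triples, so that new relationships can be added without restructuring a star schema. The class, relation and entity names are illustrative assumptions, not the thesis's actual design.

```python
# Illustrative sketch only: typed relationships between dimensional entities
# stored as subject-predicate-object triples, so new relationship types can be
# added without altering existing dimension tables. Names are assumptions.
from collections import defaultdict

class SemanticDimensionStore:
    """Stores typed relationships between dimension members."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def relate(self, subject, predicate, obj):
        # Record one typed relationship between two dimension members.
        self._by_subject[subject].add((subject, predicate, obj))

    def related(self, subject, predicate=None):
        """Return objects related to a subject, optionally filtered by predicate."""
        return [o for (_, p, o) in self._by_subject[subject]
                if predicate is None or p == predicate]

store = SemanticDimensionStore()
store.relate("Patient:42", "attendedBy", "Provider:7")
store.relate("Patient:42", "diagnosedWith", "Condition:Diabetes")
store.relate("Condition:Diabetes", "classifiedAs", "ChronicDisease")

print(store.related("Patient:42"))                   # all related members
print(store.related("Patient:42", "diagnosedWith"))  # ['Condition:Diabetes']
```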
3

Perceptions, motivations and behaviours towards research impact: a cross-disciplinary perspective

Chikoore, Lesley January 2016 (has links)
In recent years, the UK higher education sector has seen notable policy changes with regard to how research is funded, disseminated and evaluated. Important amongst these changes is the emphasis that policy makers have placed on disseminating peer-reviewed scholarly journal articles via Open Access (OA) publishing routes, e.g. OA journals or OA repositories. Through the Open Science agenda there have also been a number of initiatives to promote the dissemination of other types of output that have not traditionally been made publicly available via the scholarly communication system, such as data, workflows and methodologies. The UK Research Excellence Framework (REF) 2014 introduced social/economic impact of research as an evaluation measure. This has been a significant policy shift away from academic impact being the sole measure of impact and has arguably raised the profile of public engagement activities (although it should be noted that public engagement is not equivalent to social/economic impact, but is an important pathway to realising such impact). This exploratory study sought to investigate the extent to which these recent policy changes are aligned with researchers' publication, dissemination and public engagement practices across different disciplines. Furthermore, it sought to identify the perceptions and attitudes of researchers towards the concept of social/economic impact. The study adopted a mixed-methods approach consisting of a questionnaire-based survey and semi-structured interviews with researchers from a broad range of disciplines across the physical, health, engineering, social sciences, and arts and humanities at fifteen UK universities. The work of Becher (1987) and Becher & Trowler (2001) on disciplinary classification was used as an explanatory framework to understand disciplinary differences. The study found evidence of a lack of awareness of the principle of OA among some researchers across all disciplines, and that researchers, in the main, are not sharing their research data; only the few who do so are realising the benefits championed in research funders' policies. Moreover, the study uncovered that, owing to the increased emphasis on impact in research evaluation, conflicting goals exist between researchers and academic leaders. The study found that researchers, particularly from Applied and Interdisciplinary (as opposed to Pure) disciplinary groups, felt that research outputs such as articles published in practitioner journals were more appropriate than prestigious peer-reviewed scholarly journal articles for targeting practitioners and making research accessible to them. The thesis argues that there is still more to learn about what impact means to researchers and how it might be measured. The thesis makes an overall contribution to knowledge on a general level by providing greater understanding of how researchers have responded to the impact agenda. On a more specific level, it identifies the effect of the impact agenda on academic autonomy and situates this in different disciplinary contexts. It identifies that it is not only researchers from Pure disciplines who feel disadvantaged by the impact agenda, but also those from Interdisciplinary and Applied groups, who feel an encroachment on their academic autonomy, particularly in selecting channels to disseminate their research and the audiences they wish to engage with. Implications of the study's findings for researchers, higher education institutions and research funders are highlighted, and recommendations to researchers, academic leaders and research funders are given.
4

DEPENDABLE CLOUD RESOURCES FOR BIG-DATA BATCH PROCESSING & STREAMING FRAMEWORKS

Bara M Abusalah (10692924) 07 May 2021 (has links)
Observers of cloud computing systems over the last few years have noted that new Big Data frameworks emerge every year. Since Hadoop was developed in 2007, new frameworks have followed, such as Spark, Storm, Heron, Apex, Flink, Samza and Kafka. Each framework is developed to achieve certain objectives better than other frameworks do, yet a few functionalities and aspects are shared between them. One vital aspect all these frameworks strive for is better reliability and faster recovery in case of failures. Despite all the advances in making datacenters dependable, failures still happen. This is particularly onerous for long-running "big data" applications, where partial failures can lead to significant losses and lengthy recomputations. It is also crucial for streaming systems, where events are processed and monitored online in real time and any delay in data delivery causes major inconvenience to users.

Another observation is that some reliability implementations are redundant between different frameworks. Big data processing frameworks like Hadoop MapReduce include fault tolerance mechanisms, but these are commonly targeted at specific system/failure models and are often redundant between frameworks. Encapsulating these implementations into one shared layer would benefit more than one framework without the burden of re-implementing the same reliability approach in each framework.

These observations motivated us to solve the problem by presenting two systems: Guardian and Warden. Guardian is tailored towards batch processing big data systems, while Warden targets stream processing systems. Both are robust, RMS-based, generic, multi-framework, flexible, customizable, low-overhead systems that allow users to run their applications with individually configurable fault tolerance granularity and degree, with only minor changes to their implementation.

Most reliability approaches carry out one rigid fault tolerance technique targeted at one system at a time. It is more challenging to provide a reliability approach that is pluggable into multiple Big Data frameworks at once and achieves low overheads comparable with single-framework approaches, yet remains flexible and customizable by its users. Genericity is attained by providing an interface that can be used in different applications from different frameworks in any part of the application code. Low overhead is achieved by providing faster application finish times with and without failures. Customizability is fulfilled by letting users choose between two fault tolerance guarantees (crash failures / Byzantine failures) and, for streaming systems, combining this with two delivery semantics (exactly once / at most once).

In other words, this thesis proposes the paradigm of dependable resources: big data processing frameworks are typically built on top of resource management systems (RMSs), and providing fault tolerance support at the level of such an RMS yields generic fault tolerance mechanisms, which can be provided with low overhead by leveraging constraints on resources. To the best of our knowledge, such an approach had never been tried on multiple big data batch processing and streaming frameworks before.

We demonstrate the benefits of Guardian by evaluating batch processing frameworks such as Hadoop, Tez, Spark and Pig on a prototype of Guardian running on Amazon EC2, improving completion time by around 68% in the presence of failures while maintaining around 6% overhead. We have also built a prototype of Warden on the Flink and Samza (with Kafka) streaming frameworks. Our evaluations of Warden highlight the effectiveness of our approach in the presence of failures and without failures compared to other fault tolerance techniques (such as checkpointing).
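As a rough illustration of what a framework-agnostic, RMS-level fault tolerance interface with a configurable guarantee and degree might look like, consider the sketch below. All names, the policy fields and the reconciliation logic are assumptions for illustration; they are not Guardian's or Warden's actual API.

```python
# Hypothetical sketch of a pluggable, RMS-level fault tolerance policy.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Guarantee(Enum):
    CRASH = "crash"          # tolerate crash failures (restart/replicate tasks)
    BYZANTINE = "byzantine"  # tolerate arbitrary faults (redundant runs + voting)

class Delivery(Enum):
    EXACTLY_ONCE = "exactly_once"
    AT_MOST_ONCE = "at_most_once"

@dataclass
class FaultTolerancePolicy:
    guarantee: Guarantee
    replicas: int                        # degree: number of redundant executions
    delivery: Optional[Delivery] = None  # only meaningful for streaming jobs

class DependableResourceManager:
    """Toy stand-in for an RMS layer applying one policy to tasks from any framework."""

    def __init__(self, policy: FaultTolerancePolicy):
        self.policy = policy

    def submit(self, task_fn, *args):
        # Run the task redundantly per the configured degree, then pick the
        # most common result (a majority vote, as Byzantine tolerance would need).
        results = [task_fn(*args) for _ in range(self.policy.replicas)]
        return max(set(results), key=results.count)

policy = FaultTolerancePolicy(Guarantee.CRASH, replicas=2)
rm = DependableResourceManager(policy)
print(rm.submit(lambda x: x * x, 7))  # 49
```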
5

Integração semântica de publicações científicas e dados de pesquisa: proposta de modelo de publicação ampliada para a área de Ciências Nucleares (Semantic integration of scientific publications and research data: a proposed enhanced-publication model for the Nuclear Sciences)

Sales, Luana Farias 23 July 2014 (has links)
This research takes place under the conditions of an emerging scientific paradigm known as e-Science, or the 4th scientific paradigm. This new way of doing science is characterised by intensive use of computer networks, distributed digital repositories and the extraordinary generation of research data, a consequence of the heavy use of information and simulation technologies and of advances in scientific instrumentation. The information environment established as a result of these transformations significantly impacts the patterns of scientific communication, especially with regard to cooperative research, the sharing and reuse of information resources, and ways of communicating and disseminating research results. To contextualise its field of study, the thesis helps delineate new and renewed concepts for Information Science such as e-Science, research data curation, complex digital objects, data repositories, CRIS (Current Research Information System) and other key infrastructures for the management of research and of new conceptions of academic and scientific publication. The research is based on two assumptions: the first raises the need for a model of scientific publication that can express and reflect the new, data-rich pattern of generating scientific knowledge and is able to integrate these data with publications; the second holds that this can be achieved using the technological possibilities and standards arising from the Semantic Web. These two assumptions embody the hypothesis raised by this thesis: a scientific publication can be enriched, and brought closer to the new ways of generating knowledge that characterise contemporary science, if it is configured according to a model that links research data and datasets to the conventional publication through semantic relations. The method adopted was deductive, starting from general concepts of Information Science applied to the specificity of the Nuclear Sciences. This was accomplished in two ways: through a literature review, for the purposes of analysis and qualitative interpretation of the general concepts, and through a domain analysis approach that allowed the particular application area to be analysed empirically. The final result is a proposal of guidelines for a national digital curation policy and a model of scientific publication for the Nuclear Sciences in which research data are linked to academic publications by means of semantic relations systematised in a taxonomy built for this purpose. Graphical models were used as a tool to represent and synthesise the resulting concepts. In conclusion, the thesis observes changes in the scholarly communication cycle, the possibility of building a new model of scientific publication as a relevant standard for the practice of a more open and more collaborative science, and the feasibility of incorporating the principles and theories of Librarianship and Information Science for the organisation of technical and scientific knowledge in the world of e-Science.
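A minimal sketch of the enhanced-publication idea, assuming the links between a conventional publication and its research data are expressed as typed semantic relations drawn from a small taxonomy; the relation names and identifiers below are illustrative assumptions, not the taxonomy proposed in the thesis.

```python
# Illustrative sketch: a publication record linked to research data and other
# resources through typed semantic relations. Relation names are assumptions.
ENHANCED_PUBLICATION = {
    "publication": {
        "doi": "10.xxxx/example-article",
        "title": "Neutron flux measurements in reactor X",
    },
    "links": [
        {"relation": "isSupportedBy",  "target": "dataset:raw-neutron-counts"},
        {"relation": "isDerivedFrom",  "target": "dataset:calibration-tables"},
        {"relation": "isDocumentedBy", "target": "workflow:analysis-pipeline-v2"},
    ],
}

def linked_resources(record, relation=None):
    """Return linked resources, optionally filtered by relation type."""
    return [link["target"] for link in record["links"]
            if relation is None or link["relation"] == relation]

print(linked_resources(ENHANCED_PUBLICATION, "isSupportedBy"))
```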
6

Integriertes Management und Publikation von wissenschaftlichen Artikeln, Software und Forschungsdaten am Helmholtz-Zentrum Dresden-Rossendorf (HZDR) (Integrated management and publication of scientific articles, software and research data at the Helmholtz-Zentrum Dresden-Rossendorf, HZDR)

Reschke, Edith, Konrad, Uwe 24 April 2020 (has links)
With the goal of supporting the publication of articles, research data and scientific software in accordance with the FAIR principles, an integrated publication management system was established at the HZDR. Data and software publications in particular require the development of needs-based organisational and technical structures that complement publication management services that already work very well. In collaboration with HZDR scientists and international partners in selected projects, the need for support in research data management was analysed. Building on this, an integrated system of infrastructures and services was developed and made available step by step. A data policy, in force since May 2018, defines the framework conditions and rules both for scientific staff and for external measurement guests. The talk discusses experiences with integrated publication management for articles, research data and research software, and from these derives the next tasks and goals.
7

Wissenswertes rund um Forschungsdaten: 10. November 2020, 10 - 11 Uhr (Things worth knowing about research data: 10 November 2020, 10-11 a.m.)

Kuhnert, Dana, Queitsch, Manuela 23 November 2020 (has links)
Research data generated in research projects are an essential foundation of scientific work, and they are becoming ever more important in almost all disciplines. Documenting research data, preserving them for the long term and making them available promotes the traceability and quality of scientific research. In addition, the publication and long-term preservation of research data is in many cases a prerequisite for funding of research projects by the DFG, the EU and the BMBF. What exactly are research data? What is meant by the FAIR principles? Open research data: what advantages do they offer researchers? Where can research data be archived and published? Which research data services do the UB Freiberg and the Kontaktstelle Forschungsdaten of SLUB/ZiH Dresden offer researchers at TU Bergakademie Freiberg? These and other questions are answered by Manuela Queitsch, coordinator for research data at SLUB Dresden and team member at the Kontaktstelle Forschungsdaten in Dresden, and Dr. Dana Kuhnert, subject specialist for economics and law at the UB Bergakademie Freiberg.
8

Forschungsdaten-Policy der TU Bergakademie Freiberg (Research data policy of TU Bergakademie Freiberg)

Technische Universität Bergakademie Freiberg 11 December 2023 (has links)
On 28 November 2023, the Senate of the Technische Universität Bergakademie Freiberg adopted an institutional research data policy that offers all scientists at the university important guidance on handling research data.
9

SPECIES- TO COMMUNITY-LEVEL RESPONSES TO CLIMATE CHANGE IN EASTERN U.S. FORESTS

Jonathan A Knott (8797934) 12 October 2021 (has links)
Climate change has dramatically altered the ecological landscape of the eastern U.S., leading to shifts in phenological events and redistribution of tree species. However, shifts in phenology and species distributions have implications for the productivity of different populations and the communities these species are a part of. Here, I utilized two studies to quantify the effects of climate change on forests of the eastern U.S. First, I used phenology observations at a common garden of 28 populations of northern red oak (Quercus rubra) across seven years to assess shifts in phenology in response to warming, identify population differences in sensitivity to warming, and correlate sensitivity to the productivity of the populations. Second, I utilized data from the USDA Forest Service's Forest Inventory and Analysis Program to identify forest communities of the eastern U.S., assess shifts in their species compositions and spatial distributions, and determine which climate-related variables are most associated with changes at the community level. In the first study, I found that populations were shifting their spring phenology in response to warming, with the greatest sensitivity in populations from warmer, wetter climates. However, these populations with higher sensitivity did not have the highest productivity; rather, populations closer to the common garden with intermediate levels of sensitivity had the highest productivity. In the second study, I found that there were 12 regional forest communities of the eastern U.S., which varied in the amount their species composition shifted over the last three decades. Additionally, all 12 communities shifted their spatial distributions, but their shifts were not correlated with the distance and direction that climate change predicted them to shift. Finally, areas with the highest changes across all 12 communities were associated with warmer, wetter, lower temperature-variable climates generally in the southeastern U.S. Taken together, these studies provide insight into the ways in which forests are responding to climate change and have implications for the management and sustainability of forests in a continuously changing global environment.
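One plausible way to quantify "sensitivity to warming" at the population level is as the regression slope of a population's spring phenology date on spring temperature across years. The sketch below illustrates that calculation with made-up numbers; it is an assumption for illustration, not the exact method used in the thesis.

```python
# Hedged sketch: sensitivity as the least-squares slope of leaf-out day of year
# against mean spring temperature across years, for one population.
def sensitivity(temps_c, leafout_doy):
    """Days of phenological shift per degree Celsius (negative = earlier leaf-out)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_d = sum(leafout_doy) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(temps_c, leafout_doy))
    var = sum((t - mean_t) ** 2 for t in temps_c)
    return cov / var

# Hypothetical seven-year record for one Quercus rubra population:
spring_temp = [8.1, 9.0, 7.5, 10.2, 9.6, 8.8, 10.9]  # mean spring temperature (deg C)
leafout_day = [121, 116, 124, 109, 112, 118, 106]     # day of year of leaf-out
print(f"{sensitivity(spring_temp, leafout_day):.1f} days per deg C")
```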
10

EXPLOITING THE SPATIAL DIMENSION OF BIG DATA JOBS FOR EFFICIENT CLUSTER JOB SCHEDULING

Akshay Jajoo (9530630) 16 December 2020 (has links)
With the growing business impact of distributed big data analytics jobs, it has become crucial to optimize their execution and resource consumption. In most cases, such jobs consist of multiple sub-entities called tasks and are executed online in a large shared distributed computing system. The ability to accurately estimate runtime properties and to coordinate the execution of a job's sub-entities allows a scheduler to schedule jobs efficiently. This thesis presents the first study that highlights the spatial dimension, an inherent property of distributed jobs, and underscores its importance in efficient cluster job scheduling. We develop two new classes of spatial-dimension-based algorithms to address the two primary challenges of cluster scheduling. First, we propose, validate, and design two complete systems that employ learning algorithms exploiting the spatial dimension. We demonstrate high similarity in runtime properties between sub-entities of the same job through detailed trace analysis on four different industrial cluster traces. We identify design challenges and propose principles for a sampling-based learning system for two examples: first a coflow scheduler, and second a cluster job scheduler. We also propose, design, and demonstrate the effectiveness of new multi-task scheduling algorithms based on effective synchronization across the spatial dimension. We underline, and validate by experimental analysis, the importance of synchronization between the sub-entities (flows, tasks) of a distributed entity (coflow, data analytics job) for its efficient execution. We also highlight that ignoring sibling sub-entities when scheduling can lead to sub-optimal overall cluster performance. We propose, design, and implement a full coflow scheduler based on these assertions.
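The sampling-based learning idea suggested by the abstract can be illustrated with a toy sketch: because tasks of the same job tend to have similar runtimes (the spatial dimension), running a small pilot sample of a job's tasks yields an estimate of the whole job's cost that a scheduler can use to order jobs. The details below are assumptions for illustration, not the thesis's algorithms.

```python
# Hedged sketch: estimate a job's per-task runtime from a small pilot sample of
# its tasks, then order jobs shortest-estimate-first. Numbers are hypothetical.
import random

def estimate_job_runtime(task_runtimes, sample_fraction=0.1, rng=random.Random(0)):
    """Estimate mean task runtime from a pilot sample of the job's tasks."""
    k = max(1, int(len(task_runtimes) * sample_fraction))
    sample = rng.sample(task_runtimes, k)
    return sum(sample) / len(sample)

# Two hypothetical jobs; a shortest-estimated-job-first policy picks job_a first.
jobs = {
    "job_a": [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.0, 5.1],          # seconds per task
    "job_b": [12.0, 11.5, 12.4, 11.8, 12.1, 12.3, 11.9, 12.2],
}
order = sorted(jobs, key=lambda name: estimate_job_runtime(jobs[name]))
print(order)  # ['job_a', 'job_b']
```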
