1. From One Environment to Many: The Problem of Reproducibility of Experimental Results. Lin, Jinguang (January 1900)
Master of Science / Department of Statistics / Michael J. Higgins / When the same experiment is carried out in a different environment, the error term includes not only the random error within a given experiment but also the additional sources of variability introduced by conducting the experiment in different environments. These sources include natural factors such as location, time, or weather, as well as factors such as the personnel or equipment needed to carry out the experiment. By considering the effect of changing experimental environments on the reproducibility of experiments, we examine the situations in which the initial experimental results are likely to carry over to other environments. We study how the p-value, effect size, sample size, and the ratio of the standard deviation of the environment-by-treatment interaction to the standard deviation of experimental error interact with one another and, together, affect an experiment's reproducibility. We suggest that not only p-values but also effect sizes and the environmental effect ratio (the ratio of the standard deviation of the environment-by-treatment interaction to the standard deviation of experimental error) should be considered when researchers make statistical inferences. Large effect sizes and/or small environmental effect ratios favor a high probability of reproducibility. If the environmental effect ratio is too large, the probability of reproducibility may be reduced to little more than a coin toss, and if effect sizes are small, researchers should be very cautious about making inferences about reproducibility even when the observed p-value is small and the sample size is large.
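To make the role of the environmental effect ratio concrete, the following is a minimal Monte Carlo sketch, not the thesis's actual model: it assumes a two-group design and treats the environment-by-treatment interaction as a random shift of the true effect in each new environment, with the shift's standard deviation set to ratio * sigma_e. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def replication_probability(effect=0.5, sigma_e=1.0, ratio=0.5, n=100, reps=20000):
    """Estimate P(a replication in a new environment is significant in the same
    direction) when the realised effect is shifted by a random
    environment-by-treatment interaction with SD = ratio * sigma_e."""
    sigma_te = ratio * sigma_e
    shifted = effect + rng.normal(0.0, sigma_te, reps)   # effect realised in a new environment
    se = sigma_e * np.sqrt(2.0 / n)                      # SE of a two-sample mean difference, n per group
    z = rng.normal(shifted, se) / se                     # test statistic observed in the replication
    crit = 1.96                                          # two-sided 5% normal critical value
    return np.mean((np.abs(z) > crit) & (np.sign(z) == np.sign(effect)))

for r in (0.0, 0.5, 1.0, 2.0):
    print(f"environmental effect ratio = {r:.1f}: P(reproduce) ~ {replication_probability(ratio=r):.2f}")
```

In this toy setup, increasing the ratio drives the probability toward the coin-toss behaviour described above even though the original design is well powered.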
2. Criteria for demonstrating the efficacy of a medical test. Blackman, Nicole Jill-Marie (January 2002)
No description available.
3. An analysis of software defect prediction studies through reproducibility and replication. Mahmood, Zaheed (January 2018)
Context. Software defect prediction is essential in reducing software development costs and in helping companies protect their reputation. Defect prediction uses mathematical models to identify patterns associated with defects within code. The resources spent reviewing an entire codebase can be minimised by focusing on the parts of the code predicted to be defective. Recent findings suggest many published prediction models may not be reliable. Critical scientific methods for identifying reliable research are replication and reproduction: replication can test the external validity of studies, while reproduction can test their internal validity.
Aims. The aims of my dissertation are, first, to study the use and quality of replications and reproductions in defect prediction and, second, to identify factors that aid or hinder these scientific methods.
Methods. My methodology is based on tracking the replication of 208 defect prediction studies identified in a highly cited Systematic Literature Review (SLR) [Hall et al. 2012]. I analyse how often each of these 208 studies has been replicated and determine the type of replication carried out. I use quality, citation counts, publication venue, impact factor, and data availability for all 208 papers to see whether any of these factors is associated with how frequently a study is replicated. I further reproduce the original studies that have been replicated in order to check their internal validity. Finally, I identify factors that affect reproducibility.
Results. Only 13 (6%) of the 208 studies are replicated, most of which fail a quality check. Of the 13 replicated original studies, 62% agree with their replications and 38% disagree. The feature most strongly associated with a study being replicated is publication in the IEEE Transactions on Software Engineering (TSE). The number of citations an original paper had was also an indicator of the probability of being replicated. In addition, studies conducted using closed source data have more replications than those based on open source data. For 4 of the 5 papers I reproduced, my results differed from those of the original by more than 5%. Four factors are likely to have caused these failures: i) the lack of a single, definitive version of the data used by the original study; ii) differences in the properties of the available dataset versions, which affect model performance; iii) unreported data preprocessing; and iv) inconsistent results from alternative versions of the same tools.
Conclusions. Very few defect prediction studies are replicated. The lack of replication and the failures of reproduction mean that it remains unclear how reliable defect prediction is. Investigating these failures highlights key aspects researchers need to consider when designing primary studies and when performing replication and reproduction studies. Finally, I provide practical steps for improving the likelihood of replication and the chance of a study being validated, by reporting key factors.
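As a rough illustration of factor iii), unreported preprocessing, the toy sketch below (not taken from the dissertation) trains the same defect-prediction model on raw versus log-transformed synthetic software metrics; the metrics, the model choice, and the transform are illustrative assumptions, but the point is that a single undocumented preprocessing step can move reported performance appreciably.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
loc_raw = rng.lognormal(mean=4.0, sigma=1.5, size=n)       # heavily skewed size metric (made up)
complexity = rng.lognormal(mean=2.0, sigma=0.8, size=n)    # skewed complexity metric (made up)
# Defect-proneness is assumed to depend on the *log* of the metrics
logit = -6.0 + 1.0 * np.log(loc_raw) + 0.8 * np.log(complexity)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def auc_with(features):
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.5, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

raw = np.column_stack([loc_raw, complexity])
logged = np.log(raw)                                       # the "unreported" preprocessing step
print(f"AUC with raw metrics:             {auc_with(raw):.3f}")
print(f"AUC with log-transformed metrics: {auc_with(logged):.3f}")
```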
4. The Reliability of Local Sweat Rate Measured Via the Ventilated Capsule Technique: Effects of Measurement Region and Level of Heat Strain. Rutherford, Maura McLean (14 September 2020)
Ventilated capsules (i.e., hygrometry) are widely used to measure time-dependent changes in local sweat rate. Despite this, understanding of the reliability (consistency) of local sweat rate is limited to the forearm during mild hyperthermia. Further, extensive regional heterogeneity in sweating may render some regions more reliable than others. Knowledge of reliability has important implications for experimental design, statistical analysis and interpretation, yet it remains largely unquantified. The purpose of this study was to determine local sweat rate across various regions of the body, and the reliability of these responses, during increasing levels of hyperthermia. On three separate occasions, fourteen young men (age: 24 [SD 5] years) donned a whole-body water-perfusion suit used to raise and clamp esophageal temperature to elicit low (+0.6°C), moderate (+1.2°C) and high (+1.8°C) levels of heat strain. Local sweat rate was measured at the forehead, chest, abdomen, bicep, forearm, hand, quadriceps, calf, and foot via ventilated capsules (3.8 cm²). Absolute reliability was assessed using the coefficient of variation (CV%), which quantifies the amount of error in a given measurement. Relative reliability was evaluated via the intraclass correlation coefficient (ICC), which reflects the consistency of an individual's rank within a group across repeated measurements. At low heat strain, most sites demonstrated acceptable relative (ICC ≥0.70) and moderate absolute (CV <25%) reliability. At moderate heat strain, the abdomen, hand, quadriceps, calf and foot had acceptable relative reliability, while the forehead, abdomen, forearm, hand and quadriceps had moderate absolute reliability. At high heat strain, relative reliability was acceptable at the abdomen, quadriceps, calf and foot, and absolute reliability was moderate at the chest, abdomen, forearm, hand, quadriceps, calf and foot. Our findings indicate that the reliability of local sweat rate depends on both the measurement site and the level of hyperthermia. Researchers should consider this in their experimental design to increase the likelihood of detecting an effect of an intervention if one exists.
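For readers unfamiliar with these two statistics, here is a minimal sketch of how CV% and a two-way ICC can be computed from a subjects-by-trials matrix of repeated measurements; the numbers and the choice of ICC(3,1) are illustrative assumptions, not the study's actual analysis.

```python
import numpy as np

def within_subject_cv(x):
    """Mean of (SD across repeated trials / subject mean) * 100 over subjects."""
    return np.mean(x.std(axis=1, ddof=1) / x.mean(axis=1)) * 100

def icc_consistency(x):
    """ICC(3,1): two-way mixed effects, consistency, single measurement."""
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)   # between-subject mean square
    ss_err = np.sum((x - x.mean(axis=1, keepdims=True)
                       - x.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))                           # residual mean square
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical forearm sweat rates (mg/cm^2/min) for 5 participants over 3 visits
sweat = np.array([[0.62, 0.58, 0.65],
                  [0.41, 0.45, 0.39],
                  [0.80, 0.74, 0.77],
                  [0.55, 0.60, 0.57],
                  [0.33, 0.30, 0.36]])
print(f"CV% = {within_subject_cv(sweat):.1f}, ICC(3,1) = {icc_consistency(sweat):.2f}")
```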
5. The impact factor: a useful indicator of journal quality or fatally flawed? Elliott, David (January 2014)
No description available.
6. Retractions, Post-Publication Peer Review and Fraud: Scientific Publishing's Wild West. Oransky, Ivan (27 October 2016)
Presentation given on October 27, 2016, at the Data Reproducibility: Integrity and Transparency program, part of Open Access Week 2016. / Ivan Oransky and Adam Marcus founded Retraction Watch in 2010. Unbeknownst to them, retractions had grown ten-fold in the previous decade. Oransky will discuss the reasons for that increase, whether fraud is on the rise, the growth of post-publication peer review, and other trends he and Marcus have seen as they've built a site that is now viewed by 150,000 people per month and funded by philanthropies including the MacArthur and Arnold Foundations.
7. Reliability and validity of the Treatment Priority Index (TPI). Renata Biella de Salles Oliveira (11 April 2011)
The present study sought to estimate the reliability and validity of the Treatment Priority Index (TPI) for the assessment of occlusal changes through analysis of 200 dental casts from the files of the Department of Orthodontics at Bauru Dental School, selected as a representative sample of the various types of malocclusion. These casts were evaluated twice: first by a panel of 16 experienced orthodontists, who subjectively assessed the severity of malocclusion and predicted the difficulty and duration of the required treatment, and then with the TPI by two orthodontists calibrated beforehand in the use of the index. Intra-rater reliability was tested by means of a reassessment of 50 pairs of casts, using the intraclass correlation coefficient (ICC) and the dependent t-test. Inter-rater reliability was estimated on all 200 pairs of casts, using the ICC and the independent t-test. Finally, the validity of the index was assessed by comparing mean TPI scores with the subjective perceptions using Pearson's correlation coefficient. For comparative purposes, the casts were also analyzed with the PAR (Peer Assessment Rating) index, a valid and widely accepted instrument for the assessment of occlusal outcomes. Results showed that the TPI, like the PAR, is a highly reproducible index, with high levels of inter-rater (ICC = 0.97) and intra-rater reliability (ICC1 = 0.97, ICC2 = 0.96), and is a valid instrument for assessing malocclusion severity (R = 0.25), treatment difficulty (R = 0.24) and treatment duration (R = 0.29). However, despite their statistical significance, the correlations found for the TPI during validation were very weak, particularly in comparison with the PAR index. It is concluded that the TPI, although reproducible, has very limited validity as an instrument for assessing the occlusal changes produced by orthodontic treatment.
8. Development of strategies for assessing reporting in biomedical research: moving toward enhancing reproducibility. Florez Vargas, Oscar (January 2016)
The idea that the same experimental findings can be reproduced by a variety of independent approaches is one of the cornerstones of science's claim to objective truth. In recent years, however, it has become clear that science is plagued by findings that cannot be reproduced, invalidating research studies and undermining public trust in the research enterprise. The observed lack of reproducibility may be a result, among other things, of a lack of transparency or completeness in reporting. In particular, omissions in reporting the technical details of the experimental method make it difficult to verify the findings of experimental research in biomedicine. In this context, the assessment of scientific reports could help to overcome, at least in part, the ongoing reproducibility crisis. In addressing this issue, this thesis undertakes the challenge of developing strategies for evaluating the reporting of biomedical experimental methods in scientific manuscripts. Considering the complexity of experimental design, which often involves different technologies and models, we characterise the problem of methods reporting through domain-specific checklists. Then, by using checklists as a decision-making tool, supported by miniRECH, a spreadsheet-based approach that can be used by authors, editors and peer-reviewers, a reasonable level of consensus on reporting assessments was achieved regardless of the domain-specific expertise of referees. In addition, by using a text-mining system as a screening tool, a framework to guide an automated assessment of the reporting of bio-experiments was created. The usefulness of these strategies was demonstrated in several domain-specific scientific areas as well as in mouse models across biomedical research. In conclusion, we suggest that the strategies developed in this work could be implemented throughout the publication process, both as barriers to prevent incomplete reporting from entering the scientific literature and as promoters of completeness in reporting, improving the general value of the scientific evidence.
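To make the checklist-plus-text-mining idea concrete, here is an illustrative sketch, not the thesis's miniRECH tool or its actual text-mining system, of screening a methods section against a small reporting checklist; the checklist items and patterns below are invented examples.

```python
import re

# Each checklist item maps to indicator patterns expected in a well-reported methods section
CHECKLIST = {
    "randomisation": r"\brandomi[sz](?:ed|ation)\b",
    "blinding":      r"\bblind(?:ed|ing)\b",
    "sample size":   r"\b(sample size|power (analysis|calculation))\b",
    "animal strain": r"\b(C57BL/6|BALB/c|strain)\b",
    "statistics":    r"\b(t-test|ANOVA|regression|Mann-Whitney)\b",
}

def screen_methods(text):
    """Return checklist items with no supporting match in the methods text."""
    return [item for item, pattern in CHECKLIST.items()
            if not re.search(pattern, text, flags=re.IGNORECASE)]

methods = ("Mice (C57BL/6) were randomized to treatment groups and outcomes "
           "were compared with two-way ANOVA.")
print("Items not reported:", screen_methods(methods))   # e.g. ['blinding', 'sample size']
```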
9. Data preservation and reproducibility at the LHCb experiment at CERN. Trisovic, Ana (January 2018)
This dissertation presents the first study of data preservation and research reproducibility in data science at the Large Hadron Collider at CERN. In particular, provenance capture of the experimental data and the reproducibility of physics analyses at the LHCb experiment were studied. First, the preservation of the software and hardware dependencies of the LHCb experimental data and simulations was investigated. It was found that the links between the data processing information and the datasets themselves were obscure. In order to document these dependencies, a graph database was designed and implemented. The nodes in the graph represent the data with their processing information, software and computational environment, whilst the edges represent their dependence on other nodes. The database provides a central place to preserve information that was previously scattered across the LHCb computing infrastructure. Using the developed database, a methodology to recreate the LHCb computational environment and to execute the data processing on the cloud was implemented with the use of virtual containers. It was found that the produced physics events were identical to the official LHCb data, meaning that the system can aid in data preservation. Furthermore, the developed method can be used for outreach purposes, providing a streamlined way for a person external to CERN to process and analyse the LHCb data. Following this, the reproducibility of data analyses was studied. A data provenance tracking service was implemented within the LHCb software framework Gaudi. The service allows analysts to capture, within a dataset itself, the data-processing configuration that can be used to reproduce it. Furthermore, to assess the current status of the reproducibility of LHCb physics analyses, the major parts of an analysis were reproduced by following methods described in publicly and internally available documentation. This study allowed the identification of barriers to reproducibility and specific points where documentation is lacking. With this knowledge, one can specifically target areas that need improvement and encourage practices that would improve reproducibility in the future. Finally, contributions were made to the CERN Analysis Preservation portal, which is a general knowledge preservation framework developed at CERN to be used across all the LHC experiments. In particular, the functionality to preserve source code from git repositories and Docker images in one central location was implemented.
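As a simplified illustration of the node-and-edge model described above, the sketch below uses networkx in place of the actual graph database; the node names, attributes and relation labels are invented for illustration and do not reflect the LHCb schema.

```python
import networkx as nx

g = nx.DiGraph()
# Nodes: a dataset plus the processing information it depends on
g.add_node("dataset:MC/2016/stream1", kind="dataset")
g.add_node("app:DaVinci v41r2", kind="application")
g.add_node("env:x86_64-slc6-gcc49-opt", kind="platform")
g.add_node("config:options.py@abc123", kind="configuration")

# Edges point from a node to the things it depends on
g.add_edge("dataset:MC/2016/stream1", "app:DaVinci v41r2", relation="produced_by")
g.add_edge("dataset:MC/2016/stream1", "config:options.py@abc123", relation="configured_by")
g.add_edge("app:DaVinci v41r2", "env:x86_64-slc6-gcc49-opt", relation="runs_on")

# Everything needed to recreate the processing of the dataset
print(sorted(nx.descendants(g, "dataset:MC/2016/stream1")))
```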
10. Active provenance for data intensive research. Spinuso, Alessandro (January 2018)
The role of provenance information in data-intensive research is a significant topic of discussion among technical experts and scientists. Typical use cases addressing traceability, versioning and reproducibility of research findings are extended with more interactive scenarios in support, for instance, of computational steering and results management. In this thesis we investigate the impact that lineage records can have on the early phases of the analysis, for instance when performed through near-real-time systems and Virtual Research Environments (VREs) tailored to the requirements of a specific community. By positioning provenance at the centre of the computational research cycle, we highlight the importance of having mechanisms on the data scientists' side that, by integrating with the abstractions offered by the processing technologies, such as scientific workflows and data-intensive tools, facilitate the experts' contribution to the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for rapid feedback, the thesis aims to improve the synergy between different user groups in order to increase productivity and understanding of their processes. We present a model of provenance, called S-PROV, that uses and further extends PROV and ProvONE. The relationships and properties characterising the workflow's abstractions and their concrete executions are reworked to include aspects related to delegation, distribution and steering of stateful streaming operators. The model is supported by the Active framework for tuneable and actionable lineage, which ensures user engagement by fostering rapid exploitation of the captured lineage. Here, concepts such as provenance types, configuration and explicit state management allow users to capture complex provenance scenarios and activate selective controls based on domain and user-defined metadata. We outline how the traces are recorded in a new comprehensive system, called S-ProvFlow, enabling different classes of consumers to explore the provenance data with services and tools for monitoring, in-depth validation and comprehensive visual analytics. The work of this thesis is discussed in the context of an existing computational framework and of the experience gained in implementing provenance-aware tools for seismology and climate VREs. It will continue to evolve through newly funded projects, thereby providing generic and user-centred solutions for data-intensive research.
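The following schematic sketch, which does not use the S-PROV or Active APIs, illustrates the general idea of tuneable, selective lineage capture for a streaming operator: a user-supplied "provenance type" decides which data items generate lineage records and what domain metadata is attached. All class and method names are invented for illustration.

```python
import time

class ProvenanceType:
    """Base policy: capture nothing."""
    def extract_metadata(self, record):
        return None            # None means "do not record lineage for this item"

class SeismicPeaks(ProvenanceType):
    """Example user policy: only trace records whose amplitude exceeds a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
    def extract_metadata(self, record):
        if record["amplitude"] >= self.threshold:
            return {"station": record["station"], "amplitude": record["amplitude"]}
        return None

class StreamOperator:
    def __init__(self, name, prov_type):
        self.name, self.prov_type, self.lineage = name, prov_type, []
    def process(self, record):
        meta = self.prov_type.extract_metadata(record)
        if meta is not None:   # selective capture, driven by user-defined metadata
            self.lineage.append({"operator": self.name, "time": time.time(), **meta})
        return record          # pass the data item downstream unchanged

op = StreamOperator("peak-detector", SeismicPeaks(threshold=5.0))
for r in [{"station": "ST01", "amplitude": 2.3}, {"station": "ST02", "amplitude": 7.8}]:
    op.process(r)
print(op.lineage)
```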