191 |
Prediction of Surfactant Mildness for Rinse-off Formulations Using Preclinical Assays
McCardy, Nicole R. 21 October 2016
No description available.
|
192 |
Statistical Analysis of Asthma Hospitalization Incidences in Canadian Children
Dai, Jennifer 12 1900
Asthma is the leading chronic disease of children in industrialized countries. In Canada, it is the most common cause of hospital admissions in children. Data were assembled for all asthma hospitalizations in Canada from 1990 to 2000 by the Canadian Institute for Health Information (CIHI). The annual cycles of asthma hospitalization among Canadian children from 1990 to 2000 were compared. For every year, region and latitude, asthma hospitalizations were lowest in July and August, followed by a major peak in September and then a rapid decline. Contingency table analyses were done to examine the homogeneity of the distributions of asthma hospitalization counts for the factors age, gender, region and latitude groups. Age, region and latitude groups were found to be significantly different with respect to their distribution of asthma counts. However, the distributions of asthma hospitalization counts did not differ significantly for gender. A nonlinear least squares model was fitted to the asthma hospitalization data for weeks 30 to 42. The primary objective was to obtain estimates of the parameter that describes the timing of the September peak. Next, a likelihood ratio test was done to assess the homogeneity of the September peaks for the factors age, gender, latitude and region. We found that, apart from gender, the September peaks were significantly different. Furthermore, the annual cycle of asthma hospitalization for children aged 2 to 4 was identical to that of children aged 5 to 15, except that the peak in hospitalization for 2- to 4-year-olds occurred on average 2 days after that of the older children. We suspect that the increased prevalence of and exposure to viral infections, exposure of school-aged children to allergens at school and the transmission of these factors to younger siblings are responsible for the September asthma epidemic. A Quasi-Poisson log-linear model was also fitted to the data to assess jointly the effects of age, gender, latitude, year and risk group size. 
The data were overdispersed; after accounting for overdispersion, we found that age, gender, latitude, year and the interactions between age and gender, age and latitude, and gender and latitude were significant in explaining the data. Surprisingly, time had a negative effect, suggesting a decline over the years in the number of asthma incidences requiring hospitalization. / Thesis / Master of Science (MSc)
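The contingency table analyses mentioned in this abstract can be illustrated with a Pearson chi-square test of homogeneity. The sketch below is only an illustration under stated assumptions: the counts are made up, the function name `chi_square_homogeneity` is hypothetical, and the thesis may have used a different test variant or software.

```python
def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Rows are the groups being compared (e.g. age groups); columns are the
    categories over which counts are distributed (e.g. months).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every group shared the same distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: two groups (rows) across three months (columns).
example_counts = [[120, 80, 310], [100, 70, 150]]
stat, dof = chi_square_homogeneity(example_counts)
print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

A large statistic relative to the chi-square distribution with the reported degrees of freedom suggests that the groups do not share a common distribution of counts.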
|
193 |
Computationally-effective Modeling of Far-field Underwater Explosion for Early-stage Surface Ship Design
Lu, Zhaokuan 23 March 2020
The vulnerability of a ship to the impact of underwater explosions (UNDEX), and how to incorporate this factor into early-stage ship design, is an important aspect of ship survivability studies. In this dissertation, attention is focused on the cost-efficient simulation of the ship response to a far-field UNDEX, which involves fluid shock waves, cavitation, and fluid-structure interaction. Traditional fluid numerical simulation approaches that use the Finite Element Method to track wave propagation and cavitation require a high level of mesh refinement to prevent numerical dispersion from discontinuities. Computation also becomes quite expensive for full ship-related problems due to the large fluid domain necessary to envelop the ship. The burden is aggravated by the need to generate a fluid mesh around the irregular ship hull geometry, which typically requires significant manual intervention. To accelerate the design process and enable the consideration of far-field UNDEX vulnerability, several contributions are made in this dissertation to make the simulation more efficient. First, a Cavitating Acoustic Spectral Element approach, which has shown computational advantages in UNDEX problems but has not been systematically assessed in total ship applications, is used to model the fluid. The use of spectral elements shows greater structural response accuracy and lower computational cost than the traditional FEM. Second, a novel fully automatic all-hexahedral mesh generation scheme is applied to generate the fluid mesh. Along with the spectral element, the all-hex mesh shows greater accuracy than the all-tetrahedral finite element mesh that is typically used. This new meshing approach saves significant mesh generation time and allows the spectral element, which is confined to hexahedral elements, to be applied in practical ship problems. 
A further contribution of this dissertation is the development of a surrogate non-numerical approach to predict structural peak responses based on the shock factor concept. The regression analysis reveals a reasonably strong linear relationship between the structural peak response and the shock factor. The shock factor can be conveniently employed in design tasks where the peak response is sufficient, using far fewer computational resources than numerical solvers. / Doctor of Philosophy / The vulnerability of a ship to the impact of underwater explosions (UNDEX), and how to incorporate this factor into early-stage ship design, is an important aspect of ship survivability studies. In this dissertation, attention is focused on the cost-efficient simulation of the ship response to a far-field UNDEX, which involves fluid shock waves, cavitation, and fluid-structure interaction. Traditional fluid numerical simulation approaches that use the Finite Element Method to track wave propagation and cavitation require a highly refined mesh to deal with large numerical errors. Computation also becomes quite expensive for full ship-related problems due to the large fluid domain necessary to envelop the ship. The burden is aggravated by the need to generate a fluid mesh around the irregular ship hull geometry, which typically requires significant manual intervention. To accelerate the design process and enable the consideration of far-field UNDEX vulnerability, several contributions are made in this dissertation to make the simulation more efficient. First, a Cavitating Acoustic Spectral Element approach, which has shown computational advantages in UNDEX problems but has not been systematically assessed in total ship applications, is used to model the fluid. The use of spectral elements shows greater structural response accuracy and lower computational cost than the traditional FEM. 
Second, a novel fully automatic all-hexahedral mesh generation scheme is applied to generate the fluid mesh. Along with the spectral element, the all-hex mesh shows greater accuracy than the all-tetrahedral finite element mesh which is typically used. A further contribution of this dissertation is the development of a non-numerical approach which can approximate peak structural responses comparable to the numerical solution with far less computational effort.
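The surrogate idea in this abstract, regressing peak structural response on a shock factor, can be sketched with an ordinary least-squares line. This is a rough illustration only: the abstract does not state which shock factor definition the dissertation uses, so the commonly quoted hull shock factor sqrt(W)/R is assumed here, and the charge masses, standoff distances and peak responses are invented numbers.

```python
import math


def hull_shock_factor(charge_mass, standoff):
    """Assumed definition: square root of charge mass divided by standoff distance."""
    return math.sqrt(charge_mass) / standoff


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical scenarios: (charge mass, standoff distance) and a peak response
# that would normally come from a numerical solver.
scenarios = [(100.0, 40.0), (100.0, 20.0), (400.0, 40.0), (400.0, 25.0)]
peak_responses = [1.1, 2.4, 2.3, 3.9]

factors = [hull_shock_factor(w, r) for w, r in scenarios]
slope, intercept, r_squared = fit_line(factors, peak_responses)
print(f"peak ~ {slope:.2f} * shock_factor + {intercept:.2f} (R^2 = {r_squared:.3f})")
```

Once such a line has been fitted, a new design case only needs its shock factor to obtain a rough peak-response estimate, which is the appeal of a non-numerical surrogate.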
|
194 |
Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations
Kamal, Tariq 21 September 2016
Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe the socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) (of these compute-heavy simulations) can only be possible on high-performance clusters (HPC). These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication and load imbalance.
For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. The non-overlapping computations cause idle times, which introduce the load balancing and cost estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making the cost estimation very challenging. Finally, the simulations are iterative and the workload (computation and communication) may change through iterations, as a result introducing load imbalances.
This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform the static and dynamic load analysis of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally for all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of those applications where the computation is performed by classes of tasks, separated by synchronization. It enables the performance analysis of parallel applications by computing their execution times on a number of partitions. Our evaluations show that the model is helpful in performance prediction, resource allocation and evaluation of load balancing schemes. As part of load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We have also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives a performance similar to Metis, but with half the partitioning overhead (runtime and memory). On the other hand, the Colocation scheme achieves a similar performance to Metis on a larger number of partitions, but at a far lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed the dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period and incubation period). 
We quantified the effects using an analytical equation with separate constants for SIS, SIR and SI disease models.
The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. The applicability of the metric is application-, machine- and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets. / Ph. D.
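To make the idea of a cost model built from computational imbalances and communication latencies concrete, here is a deliberately simplified sketch. It assumes a bulk-synchronous picture in which each class of tasks ends at a global synchronization, so a phase costs as much as its slowest partition plus a fixed latency. The function names and numbers are hypothetical, and the dissertation's actual model (which also accounts for overlapped communication) is not reproduced here.

```python
def load_imbalance(loads):
    """Ratio of the heaviest partition load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


def phase_cost(loads, sync_latency):
    """A phase ends at a global synchronization, so the slowest partition sets its cost."""
    return max(loads) + sync_latency


def iteration_cost(phases, sync_latency):
    """Sum the cost of each synchronized phase in one simulation iteration."""
    return sum(phase_cost(loads, sync_latency) for loads in phases)


# Hypothetical per-partition compute times (seconds) for two task classes.
person_phase = [1.0, 2.0, 3.0]
location_phase = [4.0, 4.0, 4.0]

print("imbalance, person phase:", load_imbalance(person_phase))
print("imbalance, location phase:", load_imbalance(location_phase))
print("iteration cost:", iteration_cost([person_phase, location_phase], sync_latency=0.5))
```

Comparing the estimated iteration cost across candidate partitionings is one way such a model can be used to evaluate load balancing schemes before running the full simulation.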
|
195 |
Introducing a new sharpness factor to evaluate chess openings
Salmi, Samuli January 2024
This thesis presents a comprehensive analysis of chess openings through the lens of data science. Utilizing the python-chess library, this study analyzes millions of chess games and thousands of opening sequences to define the term ‘sharpness’ in chess openings and to evaluate whether it relates to popularity at different levels of play. The methods used in the study involve data mining, extraction, and transformation in addition to statistical modeling, leveraging Python for all of these methods. Key findings of the research indicate that sharpness can be quantified and sorted through chess engine evaluations and applied to opening sequences. Another key finding is that the preferences of opening choice vary significantly between low-level and high-level players. The results point out certain opening sequences that should be introduced to players’ opening repertoires based on the sharpness factor. The significance of this research is its contribution to both the field of data science and the chess community. For data scientists and statisticians, it showcases the application of analytical techniques to define a new take on the fuzzy concept of sharpness in such a complex game as chess. For chess players and enthusiasts, it offers a new perspective on opening strategies, potentially enhancing their opening theory knowledge.
|
196 |
STRESS TESTING AN SME PORTFOLIO : Effects of an Adverse Macroeconomic Scenario on Credit Risk Transition Matrices
Almqvist, Siri, Nordin, Oskar January 2021
The financial crisis of 2007-2008 was a severe global crisis causing a worldwide recession. One of the main contributing factors of the crisis was the excessive risk appetite of banks and financial institutions. Since then, regulatory authorities and financial institutions have directed focus towards risk management with the main objective of averting a similar crisis in the future. The aim of this thesis is to investigate how an adverse macroeconomic scenario would affect the migrations between risk classes of an SME portfolio, an exercise referred to as a stress test. This thesis utilises two frameworks, one by Belkin and Suchower and one by Carlehed and Petrov, for creating a single systematic indicator describing the credit class migrations of the portfolio. Four different regression model setups (Ordinary Least Squares, Additive Model, XGBoost and SVM) are then used to describe the relationship between macroeconomic indicators and this systematic indicator. The four models are evaluated in terms of interpretability and predictive ability in order to find the main drivers for the systematic indicator. Their corresponding prediction errors are compared to find the best model. The portfolio is stress tested by using the regression models to predict the corresponding systematic indicator given an adverse macroeconomic scenario. The probabilities of default, estimated from the indicator using each of the frameworks, are then compared and analysed with regard to the systematic indicator. The results show that unemployment is the main driver of the risk class migrations for an SME portfolio, from both a statistical and an economic perspective. The most appropriate regression model is the additive model because of its performance and interpretability, and it is therefore recommended for this problem. From the PD estimations, it is concluded that the framework by Belkin and Suchower gives a more volatile estimate than that of Carlehed and Petrov.
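The role of a single systematic indicator can be sketched with the one-factor construction usually associated with the Belkin and Suchower framework: average transition probabilities are mapped to thresholds on a standard normal scale, and a systematic factor Z shifts the probability mass between rating bins. The code below is a minimal sketch under that assumption. The transition row, the correlation value and the function names are hypothetical, and the thesis's exact parameterisation is not given in the abstract.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _quantile(p):
    """Standard normal quantile that tolerates cumulative probabilities of 0 or 1."""
    if p <= 0.0:
        return float("-inf")
    if p >= 1.0:
        return float("inf")
    return _STD_NORMAL.inv_cdf(p)


def thresholds(avg_row):
    """Bin boundaries for one rating class.

    avg_row holds average transition probabilities ordered from the worst
    state (default) to the best state, and is assumed to sum to 1.
    """
    bounds = [float("-inf")]
    cumulative = 0.0
    for p in avg_row[:-1]:
        cumulative += p
        bounds.append(_quantile(cumulative))
    bounds.append(float("inf"))
    return bounds


def conditional_row(avg_row, z, rho):
    """Transition probabilities given systematic factor z and correlation rho (0 <= rho < 1)."""
    bounds = thresholds(avg_row)
    shift = (rho ** 0.5) * z
    scale = (1.0 - rho) ** 0.5
    cdf_values = [_STD_NORMAL.cdf((b - shift) / scale) for b in bounds]
    return [upper - lower for lower, upper in zip(cdf_values, cdf_values[1:])]


# Hypothetical average row: [default, downgrade, stay, upgrade].
average_row = [0.02, 0.08, 0.80, 0.10]
stressed_row = conditional_row(average_row, z=-2.0, rho=0.1)
print("average PD :", average_row[0])
print("stressed PD:", round(stressed_row[0], 4))
```

In this construction a negative Z represents an adverse year, so the probability of default in the stressed row exceeds the average one. A stress test then amounts to predicting Z from macroeconomic variables and reading off the shifted row.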
|
197 |
Model-based Analysis of Diversity in Higher Education
Andalib, Maryam Alsadat 03 July 2018
U.S. higher education is an example of a large multi-organizational system within the service sector. Its performance regarding workforce development can be analyzed through the lens of industrial and systems engineering. In this three-essay dissertation, we seek the answer to the following question: How can the U.S. higher education system achieve an equal representation of female and minority members in its student and faculty populations? In essay 1, we model the education pipeline with a focus on the system's gender composition from K-12 to graduate school. We use a system dynamics approach to present a systems view of the mechanisms that affect the dynamics of higher education, replicate historical enrollment data, and forecast future trends of higher education's gender composition. Our results indicate that, in the next two decades, women will be the majority of advanced degree holders. In essay 2, we look at the support mechanisms for new-parent, tenure-track faculty in universities with a specific focus on tenure-clock extension policies. We construct a unique data set to answer questions around the effectiveness of removing the stigma connected with automatic tenure-clock policies. Our results show that such policies are successful in removing the stigma and that, overall, faculty members who have newborns and are employed by universities that adopt auto-TCE policies stay one year longer in their positions than other faculty members. In addition, although faculty employed at universities that adopt such policies are generally more satisfied with their jobs, there is no statistically significant effect of auto-TCE policies on the chances of obtaining tenure. In essay 3, we focus on the effectiveness of training underrepresented minorities (e.g., African Americans and Hispanics) in U.S. higher education institutions using a Data Envelopment Analysis approach. 
Our results indicate that graduation rates, average GPAs, and post-graduate salaries of minority students are higher in selective universities and those located in more diverse towns/cities. Furthermore, the graduation rate of minority students in private universities and those with affirmative action programs is higher than in other institutions. Overall, this dissertation provides new insights into improving diversity within the science workforce at different organizational levels by using industrial and systems engineering and management sciences methods. / Ph. D. / One of the goals of higher education institutions is to increase diversity within student and faculty bodies. Equal inclusion of all individuals in student and faculty populations is important to society in several ways. First, providing an equal chance for individuals’ higher education and employment, regardless of demographic characteristics, is a cornerstone of any democratic society. Second, improving educational system diversity leads to higher educational achievements, as overall diversity of U.S. universities is a key indicator of global excellence. Despite improvement over the last decades, we still do not see an equitable distribution of women and racial minorities in such populations. The disparities in minority representation are even greater at higher levels of education and academic employment, such as graduate school and tenure-track positions. In this dissertation, our focus is on the trends, processes, and performance of the U.S. higher education system as it relates to diversity. We apply innovative industrial, systems engineering, and management sciences methods to the subject of diversity in the higher education context. The goal is to investigate answers to the following question: How can the U.S. higher education system achieve equal representation of female and minority groups in its student and faculty populations? 
The results of this dissertation could be used to train policy makers at institution and state levels on the ways of transforming universities into better places for females and minority groups. In particular, the system dynamics model could be used as a flight simulator in performing policy tests for educational workshops. Moreover, the outcomes could inform individuals and policy makers about the barriers doctorate holders face in following a successful academic path. Finally, this dissertation could be used in system dynamics and Data Envelopment Analysis classes as both case study and teaching materials.
|
198 |
Contribution à l'analyse et l'évaluation des requêtes expertes : cas du domaine médical / Contribution to the analyze and evaluation of clinical queries : medical domain
Znaidi, Eya 30 June 2016
Information retrieval requires strategies that consist of (1) identifying the information need; (2) formulating the information need; (3) locating relevant sources; (4) identifying the tools to use for those sources; (5) querying the tools; and (6) evaluating the quality of the results. The field has evolved continuously, yielding techniques and approaches that select, from a corpus of documents, the relevant information able to satisfy the need expressed by the user. Moreover, in the application context of biomedical IR, heterogeneous information sources are constantly evolving, in both structure and content. Likewise, information needs may be expressed by users with different profiles: medical experts such as practitioners, clinicians and health professionals, and novice users (with no expertise or knowledge of the domain) such as patients and their families. Several challenges are tied to the biomedical IR task: (1) the variation and diversity of information needs, (2) the different types of medical knowledge, (3) differences in linguistic competence between experts and novices, (4) the large volume of medical literature, and (5) the nature of the medical IR task. This makes it difficult to access relevant information specific to the search context, especially for domain experts, whom such information would help in their medical decision-making. Our thesis work falls within biomedical IR and addresses the challenges of formulating expert information needs and identifying relevant sources to better answer clinical needs. 
Regarding the formulation and analysis of expert queries, we propose exploratory analyses of query attributes that we defined, formalized and computed, namely: (1) two length attributes, in number of terms and in number of concepts, (2) two specificity facets, term-document and hierarchical, and (3) query clarity based on relevance and query clarity based on the query topic. We carried out statistical studies and analyses on collections from different medical evaluation campaigns, CLEF and TREC, in order to take the different IR tasks into account. After the descriptive analyses, we studied, on the one hand, pairwise correlations between query attributes and multidimensional correlation analyses and, on the other hand, the impact of these correlations on retrieval performance. We were thus able to compare and characterize the different queries according to the medical task in a more generalizable way. Regarding access to information, we propose semantic matching and query expansion techniques within the framework of IR based on clinical evidence. / The research topic of this document deals with a particular setting of medical information retrieval (IR), referred to as expert-based information retrieval. We were interested in information needs expressed by medical domain experts such as practitioners and physicians. It is well known in the information retrieval (IR) area that expressing queries that accurately reflect the information needs is a difficult task, in general domains as well as specialized ones, and even for expert users. Thus, identifying the users' intention hidden behind the queries that they submit to a search engine is a challenging issue. 
Moreover, the increasing amount of health information available from various sources such as government agencies, non-profit and for-profit organizations, and internet portals presents opportunities and challenges for improving health care information delivery to medical professionals, patients and the general public. One critical issue is understanding users' search strategies and tactics for bridging the gap between their intention and the delivered information. In this thesis, we focus on two main aspects of expert medical information needs:
- Understanding the users' intents behind the queries is critically important to gain better insight into how to select relevant results. While many studies have investigated how users in general carry out exploratory health searches in digital environments, few have focused on how queries are formulated, specifically by domain expert users. We address domain expert health search through the analysis of query attributes, namely length, specificity and clarity, using proposed measures built according to different sources of evidence. In this respect, we undertake an in-depth statistical analysis of queries from IR evaluation campaigns, namely the Text REtrieval Conference (TREC) and the Conference and Labs of the Evaluation Forum (CLEF), devoted to different medical tasks within controlled evaluation settings.
- We address the issue of answering PICO (Population, Intervention, Comparison and Outcome) clinical queries formulated within the Evidence Based Medicine framework. The contributions of this part include (1) a new algorithm for query elicitation based on the semantic mapping of each facet of the query to a reference terminology, and (2) a new document ranking model based on a prioritized aggregation operator. 
We tackle the issue of retrieving the best evidence that fits a PICO question, which is an underexplored research area. We propose a new document ranking algorithm that relies on semantic query expansion leveraged by each question facet. The expansion is moreover bounded by the local search context to better discard irrelevant documents. The experimental evaluation carried out on the CLIREC dataset shows the benefit of our approaches.
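Two of the query attributes named in this abstract, length in terms and term-document specificity, can be illustrated with a small sketch. The abstract does not give the thesis's formulas, so the version below simply assumes that specificity is the mean smoothed inverse document frequency of the query terms. The corpus, query and function names are hypothetical.

```python
import math


def inverse_document_frequency(term, document_term_sets):
    """Smoothed IDF: rarer terms across the collection score higher."""
    containing = sum(1 for terms in document_term_sets if term in terms)
    return math.log((len(document_term_sets) + 1) / (containing + 1))


def query_attributes(query, documents):
    """Return the query length in terms and its mean IDF over a document collection."""
    terms = query.lower().split()
    document_term_sets = [set(doc.lower().split()) for doc in documents]
    if not terms:
        return {"length": 0, "specificity": 0.0}
    mean_idf = sum(
        inverse_document_frequency(t, document_term_sets) for t in terms
    ) / len(terms)
    return {"length": len(terms), "specificity": mean_idf}


# Hypothetical toy collection and expert query.
toy_documents = [
    "aspirin reduces risk of stroke in adults",
    "statin therapy in adults with diabetes",
    "stroke rehabilitation outcomes in adults",
]
print(query_attributes("aspirin stroke prevention", toy_documents))
```

Attributes computed this way for each query can then be correlated with one another and with retrieval performance, in the spirit of the analyses described above.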
|
199 |
Statistical Analysis of Mining Parameters to Create Empirical Models to Predict Mine Pool Formation in Underground Coal Mines
Schafer, Lindsey A. 01 October 2018
No description available.
|
200 |
Metabolic profiling of plant disease : from data alignment to pathway predictions
Perera, Munasinhage Venura Lakshitha January 2011
Understanding the complex metabolic networks present in organisms, through the use of high-throughput liquid chromatography coupled with mass spectrometry, will give insight into the physiological changes that occur in response to stress. However, the lack of a proper workflow and robust methodology hinders verifiable biological interpretation of mass profiling data. In this study a novel workflow has been developed. A novel kernel-based feature alignment algorithm, which outperformed Agilent’s Mass Profiler and showed roughly a 20% increase in alignment accuracy, is presented for the alignment of mass profiling data. Prior to statistical analysis, post-processing of the data is carried out in two stages. First, noise filtering is applied, retaining consensus features that were aligned at a rate of 50% or higher. Second, missing values are imputed, for which a method was developed that performs better in both model recovery and false positive detection. The use of parametric methods for statistical analysis is inefficient and produces a large number of false positives. To tackle this, three non-parametric methods were considered. The histogram method for statistical analysis was found to yield the lowest false positive rate. Data analysed using these methods are presented to reveal metabolomic changes during plant pathogenesis. A high-resolution time series dataset was produced to explore the infection of Arabidopsis thaliana by the (hemi)biotroph Pseudomonas syringae pv. tomato DC3000 and its disarmed mutant DC3000hrpA, which is incapable of causing infection. Approximately 2000 features were found to be significant through the time series. It was also found that by 4h the plant's basal defence mechanism caused the significant ‘up-regulation’ of roughly 400 features, of which 240 showed a 4-fold change. 
The identification of these features' role in pathogenesis is supported by the fact that, among the features found to discriminate between treatments, a number of pathways were identified that have previously been documented to be active during pathogenesis.
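The noise-filtering stage and the fold-change criterion mentioned in this abstract can be sketched as follows. This is an interpretation rather than the thesis's procedure: the 50% rule is read here as "the feature was detected in at least half of the samples", missing detections are represented by `None`, and the feature names and intensities are invented.

```python
def filter_consensus_features(features, min_rate=0.5):
    """Keep features detected (non-None) in at least min_rate of the samples."""
    kept = {}
    for name, intensities in features.items():
        detected = sum(1 for value in intensities if value is not None)
        if detected / len(intensities) >= min_rate:
            kept[name] = intensities
    return kept


def fold_change(treated_mean, control_mean):
    """Ratio of mean intensities; a value of 4.0 corresponds to a 4-fold increase."""
    return treated_mean / control_mean


# Hypothetical aligned features: one intensity per sample, None where undetected.
aligned = {
    "feature_A": [1200.0, None, 1350.0, 1280.0],
    "feature_B": [None, None, None, 310.0],
    "feature_C": [540.0, 560.0, None, None],
}
retained = filter_consensus_features(aligned)
print(sorted(retained))
print(fold_change(treated_mean=2000.0, control_mean=500.0))
```

Features that fail the detection-rate threshold are dropped before imputation, so that imputed values are only generated for features with enough observed data to support them.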
|