651.
Process Data Applications in Educational Assessment. Qi, Jitong. January 2023.
The widespread adoption of computer-based testing has opened up new possibilities for collecting process data, providing valuable insights into the problem-solving processes that examinees engage in when answering test items. In contrast to final response data, process data offers a more diverse and comprehensive view of test takers, including construct-irrelevant characteristics. However, leveraging the potential of process data poses several challenges, including dealing with serial categorical responses, navigating nonstandard formats, and handling the inherent variability. Despite these challenges, the incorporation of process data in educational assessments holds immense promise as it enriches our understanding of students' cognitive processes and provides additional insights into their interactive behaviors. This thesis focuses on the application of process data in educational assessments across three key aspects.
Chapter 2 explores how incorporating process data can improve the accuracy with which a student's ability is assessed. Through a combination of theoretical analysis, simulations, and an empirical study, we demonstrate that appropriately integrating process data significantly enhances assessment precision.
Building upon this foundation, Chapter 3 takes a step further by addressing not only the target attribute of interest but also the nuisance attributes present in the process data to mitigate the issue of differential item functioning. We present a novel framework that leverages process data as proxies for nuisance attributes in item response functions, effectively reducing or potentially eliminating differential item functioning. We validate the proposed framework using both simulated data and real data from the PIAAC PSTRE items.
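To make the idea concrete: the abstract does not reproduce the item response functions, but a rough, hypothetical sketch of the general mechanism is a two-parameter logistic model augmented with a nuisance term whose value is proxied by a process-data feature (all names and numbers below are invented for illustration).

```python
import numpy as np

def irf_with_nuisance(theta, eta, a, b, gamma):
    """Hypothetical item response function: probability of a correct
    response given the target ability theta, a nuisance attribute eta
    (proxied by a process-data feature), item discrimination a,
    item difficulty b, and nuisance loading gamma."""
    return 1.0 / (1.0 + np.exp(-(a * (theta - b) + gamma * eta)))

# Two examinees with the same target ability but different nuisance levels
# (e.g., computer familiarity summarized from their action sequences).
theta = 0.5
for eta in (-1.0, 1.0):
    p = irf_with_nuisance(theta, eta, a=1.2, b=0.0, gamma=0.6)
    print(f"eta = {eta:+.1f}  ->  P(correct) = {p:.3f}")
# Modeling eta explicitly lets the eta-driven gap between such examinees
# be absorbed by the item response function instead of appearing as DIF.
```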
Furthermore, this thesis extends beyond the analysis of existing tests and explores enhanced strategies for item administration. Specifically, in Chapter 4, we investigate the potential of incorporating process data in computerized adaptive testing. Our adaptive item selection algorithm leverages information about individual differences in both measured proficiency and other meaningful traits that can influence item informativeness. A new framework for process-based adaptive testing, encompassing real-time proficiency scoring and item selection, is presented and evaluated through a comprehensive simulation study to demonstrate its efficacy.
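The process-based selection criterion itself is not given in the abstract; for orientation only, the sketch below shows the classical baseline such a framework extends, namely maximum Fisher information item selection for a 2PL item bank (item parameters are synthetic).

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

def select_next_item(theta_hat, item_bank, administered):
    """Pick the unadministered item with maximum information at the
    current provisional ability estimate."""
    best, best_info = None, -np.inf
    for idx, (a, b) in enumerate(item_bank):
        if idx in administered:
            continue
        info = fisher_info(theta_hat, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

rng = np.random.default_rng(0)
bank = list(zip(rng.uniform(0.8, 2.0, 50), rng.normal(0.0, 1.0, 50)))
print(select_next_item(theta_hat=0.3, item_bank=bank, administered={1, 7}))
```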
652.
Overlapping Communities on Large-Scale Networks: Benchmark Generation and Learning via Adaptive Stochastic Optimization. Grande, Alessandro Antonio. January 2022.
This dissertation builds on two lines of research that are related to the task of community detection on large-scale network data.
Our first contribution is a novel generator for large-scale networks with overlapping communities. Synthetic generators are essential for algorithm testing and simulation studies for networks, as these data are scarce and constantly evolving. We propose a generator based on a flexible random graph model that allows for the control of two complementary measures of centrality -- the degree centrality and the eigencentrality. For an arbitrary centrality target and community structure, we study the problem of recovering the model parameters that enforce these targets in expectation. We find that this problem always admits a solution in the parameter space, which is also unique for large graphs. We propose to recover this solution via a properly initialized multivariate Newton-Raphson algorithm. The resulting benchmark generator is able to simulate networks with a billion edges and hundreds of millions of nodes in 30 seconds, while reproducing a wide spectrum of network topologies -- including assortative mixing and power-law centrality distributions.
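As a simplified, hypothetical illustration of recovering parameters that hit expected-degree targets (the dissertation's model additionally controls eigencentrality and community structure, and uses a properly initialized Newton-Raphson scheme rather than the generic root finder used here), consider:

```python
import numpy as np
from scipy.optimize import fsolve

def expected_degrees(w):
    """Expected degrees in a simple Chung-Lu-type model in which edge
    (i, j) appears with probability w_i * w_j / sum(w) (assumed < 1)."""
    s = w.sum()
    return w * (s - w) / s

def recover_weights(target_deg):
    """Solve expected_degrees(w) = target_deg for the model weights,
    starting from the targets themselves as the initial guess."""
    residual = lambda w: expected_degrees(w) - target_deg
    return fsolve(residual, np.asarray(target_deg, dtype=float))

target = np.linspace(2.0, 6.0, 50)          # illustrative degree targets
w = recover_weights(target)
print(np.max(np.abs(expected_degrees(w) - target)))   # ~0: targets matched
```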
Our second contribution involves variance reduction techniques for stochastic variational inference (SVI). SVI scales approximate inference to large-scale data -- including massive networks -- via stochastic optimization. SVI is efficient because, at each iteration, it only uses a random minibatch of the data to produce a noisy estimate of the gradient. However, such estimates can suffer from high variance, which slows down convergence. One strategy to reduce the variance of the gradient is importance sampling: biasing the distribution of data for each minibatch towards the data points that are most influential to the inference at hand. Here, we develop an importance sampling strategy for SVI. Our adaptive stochastic variational inference algorithm (AdaSVI) reweights the sampling distribution to minimize the variance of the stochastic natural gradient. We couple the importance sampling strategy with an adaptive learning rate, providing a parameter-free stochastic optimization algorithm whose only required user input is the minibatch size. We study AdaSVI on a matrix factorization model and find that it significantly improves SVI, leading to faster convergence on synthetic data.
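AdaSVI itself is not reproduced here; the minimal sketch below only illustrates the underlying mechanism, an importance-sampled minibatch gradient that stays unbiased because each sampled point is reweighted by the inverse of its sampling probability. It is shown for a plain least-squares gradient on synthetic data rather than a natural gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 10_000, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
theta = np.zeros(d)

def full_gradient(theta):
    """Exact gradient of the mean squared error 0.5/N * ||X theta - y||^2."""
    return X.T @ (X @ theta - y) / N

def importance_sampled_gradient(theta, q, batch_size=64):
    """Unbiased minibatch estimate under sampling distribution q:
    each sampled point's gradient is reweighted by 1 / (N * q_i)."""
    idx = rng.choice(N, size=batch_size, p=q)
    resid = X[idx] @ theta - y[idx]
    weights = 1.0 / (N * q[idx])              # inverse-probability weights
    return (X[idx] * (weights * resid)[:, None]).sum(axis=0) / batch_size

# A heuristic importance distribution: sample rows with larger norm more
# often, since they contribute more to the gradient's variance.
q = np.linalg.norm(X, axis=1) ** 2
q /= q.sum()
est = np.mean([importance_sampled_gradient(theta, q) for _ in range(500)], axis=0)
print(np.round(full_gradient(theta), 3))
print(np.round(est, 3))   # close to the exact gradient: the estimator is unbiased
```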
653.
The effect of sampling error on the interpretation of a least squares regression relating phosphorus and chlorophyll. Beedell, David C. (David Charles). January 1995.
No description available.
654.
Data Compilation and Statistical Analysis of Bachelor of Science in Engineering Graduates at the University of Central Florida. Hagerty, June A. 01 January 1985.
The College of Engineering at the University of Central Florida (UCF) required a data set of Bachelor of Science in Engineering (BSE) graduates from 1970 onward, to be updated each semester. The data set was created in 1984 using the Statistical Analysis System/Full-Screen Product (SAS/FSP), a computer system that allows easy access to and editing of data values, together with SAS programming for statistical analysis of the data set. The data set presently (1985) contains 1,483 observations, each with 70 variables, covering personal information (age, social security number, ethnic origin), degree information (junior college attended, grade point average, honors), post-graduate information (master's and doctoral degrees, first job after graduation), and test results (CLAST, SAT, GRE). The current data were obtained through the Department of Institutional Research and from transcripts. Because this data set will be in use for a long time, a manual has been written that contains a detailed description of (a) the data set and all its variables, (b) the use of the full-screen product, with a tutorial, (c) the use of the questionnaires, and (d) the method used to collect data. Five tests were performed on four semesters of graduates, totaling 301 observations. The math and overall grade point averages (GPAs) of transfer and time-shortened-degree (TSD) students were tested against the math and overall GPAs of the general UCF BSE population. The math and overall GPAs of graduates who received Associate of Arts degrees from Florida community colleges before entering UCF dropped significantly at UCF. The tests also suggested a possible difference in academic approaches between the community colleges and UCF, one that warrants more than simple acknowledgment of the drop in math and overall GPA. The TSD graduates did not perform as well in the math and overall curriculum as might be expected. Recommendations for manual testing, updating of the data set, and further testing with SAS are included.
655.
Uncertainty and Predictability of Seasonal-to-Centennial Climate Variability. Lenssen, Nathan. January 2022.
The work presented in this dissertation is driven by three fundamental questions in climate science: (1) What is the natural variability of our climate system? (2) What components of this variability are predictable? (3) How does climate change affect variability and predictability? Determining the variability and predictability of the chaotic and nonlinear climate system is an inherently challenging problem. Climate scientists face additional complications from limited and error-prone observational data of the true climate system and from imperfect dynamical climate models used to simulate it. This dissertation contains five chapters, each of which explores at least one of the three fundamental questions by providing novel approaches to address these complications.
Chapter 1 examines the uncertainty in the observational record. Because surface temperature data are among the highest-quality historical records of the Earth's climate, they are a critical source of information about the natural variability and forced response of the climate system. However, there is still uncertainty in global and regional mean temperature series due to limited and inaccurate measurements. This chapter provides an assessment of the global and regional uncertainty in temperature from 1880 to the present in the NASA Goddard Institute for Space Studies (GISS) Surface Temperature Analysis (GISTEMP).
Chapter 2 extends the work of Chapter 1 to the regional spatial scale and monthly time scale. An observational uncertainty ensemble of historical global surface temperature is provided for easy use in future studies. Two applications of this uncertainty ensemble are discussed. First, an analysis of recent global and Arctic warming shows that the Arctic is warming four times faster than the rest of the globe, updating the oft-cited statistic that Arctic warming is double the global rate. Second, the regional uncertainty product is used to provide uncertainty on country-level temperature change estimates from 1950 to the present.
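As a schematic illustration, with synthetic numbers rather than the actual GISTEMP ensemble, of how such an observational uncertainty ensemble can be propagated through a derived statistic like the Arctic amplification ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1979, 2023)

def trend(series, years):
    """Least-squares linear trend in degrees C per decade."""
    return 10.0 * np.polyfit(years, series, 1)[0]

# Synthetic stand-in for an observational uncertainty ensemble:
# 200 equally plausible realizations of Arctic and global mean series.
n_members = 200
arctic = 0.08 * (years - years[0]) + rng.normal(0, 0.15, (n_members, years.size))
globe = 0.02 * (years - years[0]) + rng.normal(0, 0.05, (n_members, years.size))

ratios = np.array([trend(a, years) / trend(g, years) for a, g in zip(arctic, globe)])
lo, med, hi = np.percentile(ratios, [2.5, 50, 97.5])
print(f"Arctic amplification ratio: {med:.1f} (95% range {lo:.1f}-{hi:.1f})")
```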
Chapter 3 investigates the impacts of the El Niño-Southern Oscillation (ENSO) on seasonal precipitation globally. In this study, a novel methodology is developed to detect ENSO-precipitation teleconnections while accounting for missing data in the CRU TS historical precipitation dataset. In addition, the predictability of seasonal precipitation is assessed through simple empirical forecasts derived from the historical impacts. These simple forecasts provide significant skill over climatological forecasts for much of the globe, suggesting that accurate predictions of ENSO immediately provide skillful forecasts of precipitation for many regions.
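The detection methodology and the treatment of missing data are not reproduced here; a minimal sketch of the kind of ENSO-conditional empirical forecast described above, on synthetic data, is:

```python
import numpy as np

rng = np.random.default_rng(3)
n_years = 70
nino34 = rng.normal(0, 1, n_years)                        # DJF Nino3.4 index
precip = 100 + 15 * nino34 + rng.normal(0, 20, n_years)   # seasonal total (mm)

def phase(index, threshold=0.5):
    """Classify each year as El Nino (+1), La Nina (-1), or neutral (0)."""
    return np.where(index > threshold, 1, np.where(index < -threshold, -1, 0))

phases = phase(nino34)
climatology = precip.mean()
composite = {p: precip[phases == p].mean() for p in (-1, 0, 1)}

# Empirical forecast for a season in which El Nino conditions are expected:
forecast = composite[1]
print(f"climatology {climatology:.0f} mm, El Nino composite forecast {forecast:.0f} mm")
```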
Chapter 4 explores the role of initialization shock in long-lead ENSO forecasts. Initialized predictions from the CMIP6 decadal prediction project and uninitialized predictions using an analog prediction method are compared to assess the role of model biases in climatology and variability on long-lead ENSO predictability. Comparable probabilistic skill is found in the first year between the model-analogs and the initialized dynamical forecasts, although the initialized dynamical forecasts generally show higher skill. The presence of skill in the initialized dynamical forecasts in spite of large initialization shocks suggests that initialization of the subsurface ocean may be a key component of multi-year ENSO skill.
Chapter 5 brings together ideas from the previous chapters through an attribution of historical temperature variability to various anthropogenic and natural sources of variability. The radiative forcing due to greenhouse gas emissions is necessary to explain the observed variability in temperature nearly everywhere on the land surface. Regional fingerprints of anthropogenic aerosols are detected as well as the impact of major sources of natural variability such as ENSO and Atlantic Multidecadal Variability (AMV).
656.
Multivariate Statistical Methods for Testing a Set of Variables Between Groups with Application to Genomics. Alsulami, Abdulhadi Huda. 10 1900.
The use of traditional univariate analyses for comparing groups in high-dimensional genomic studies, such as the ordinary t-test that is typically used to compare two independent groups, might be suboptimal because of methodological challenges including the multiple testing problem and the failure to incorporate correlation among genes. Hence, multivariate methods are preferred for the joint analysis of a group or set of variables. These methods aim to test for differences in the average values of a set of variables across groups. The variables that make up the set could be determined statistically (using exploratory methods such as cluster analysis) or biologically (based on membership in known pathways). In this thesis, the traditional One-Way Multivariate Analysis of Variance (MANOVA) method and a robustified version of MANOVA (robustified MANOVA) are compared with respect to Type I error rates and power through a simulation study. We generated data from multivariate normal as well as multivariate gamma distributions with different parameter settings. The methods are illustrated using real gene expression data. In addition, we investigated a popular method known as Gene Set Enrichment Analysis (GSEA), in which sets of genes (variables) that belong to known biological pathways are considered jointly and assessed for whether or not they are "enriched" with respect to their association with a disease or phenotype of interest. We applied this method to real genotype data. / Master of Science (MSc)
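For the two-group case, one-way MANOVA reduces to Hotelling's T^2; a minimal sketch with synthetic expression data (the classical test only, not the robustified version studied in the thesis) is:

```python
import numpy as np
from scipy import stats

def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T^2 test (the two-group special case of
    one-way MANOVA). Returns the F statistic and p-value."""
    n1, p = x1.shape
    n2, _ = x2.shape
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return f_stat, stats.f.sf(f_stat, p, n1 + n2 - p - 1)

rng = np.random.default_rng(4)
genes = 5
controls = rng.normal(0.0, 1.0, size=(20, genes))
cases = rng.normal(0.4, 1.0, size=(25, genes))    # shifted mean vector
print(hotelling_t2(controls, cases))
```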
657.
Survey design and computer-aided analysis: the 1972 W.I.Y.S. summer survey. Edwardes, Michael D. deB. (Michael David deBurgh), 1952-. January 1975.
No description available.
658.
Load and resistance factor design for wood structures. Penketgorn, Thiwa. January 1985.
Uncertainties in engineering design exist due to the random nature of loads and materials, lack of knowledge, and imperfect modelling of design parameters. Conventional design methods based on deterministic procedures do not always yield designs with consistent levels of safety. In recent years, considerable research has been conducted on the use of probability theory for modelling uncertainties in engineering design, and several probabilistic design formats have been developed. Probability-based design methods provide a unified procedure applicable to all construction materials, all loads, and all types of uncertainties. Code committees are currently working on the development of new design codes based on probabilistic concepts for various construction materials such as steel, concrete, and wood.
The objective of this study is to investigate a probability-based design format for wood members. Reliability analyses of wood structural elements such as beams, columns, and beam-columns are conducted, and the risk level is measured by the reliability or safety index, β. Wood members subjected to dead plus live load and dead plus snow load combinations are considered. After a reliability analysis of current designs, a target reliability index is selected. The reliability index is then used in conjunction with predetermined load factors and load combinations to determine resistance factors. Finally, a design format is proposed for Load and Resistance Factor Design of wood structures. / M.S.
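As a hypothetical numerical illustration of how a reliability (safety) index can be estimated for a single resistance-versus-load check (the distributions and statistics below are invented and are not the wood member models of this study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 1_000_000

# Illustrative, invented distributions: lognormal member resistance R
# and a dead-plus-snow load effect Q = D + S.
R = rng.lognormal(mean=np.log(55.0), sigma=0.15, size=n)   # resistance
D = rng.normal(10.0, 1.0, size=n)                          # dead load effect
S = rng.gumbel(loc=18.0, scale=3.0, size=n)                # snow load effect
Q = D + S

p_f = np.mean(R < Q)                       # probability of failure
beta = -stats.norm.ppf(p_f)                # reliability (safety) index
print(f"P_f = {p_f:.2e}, beta = {beta:.2f}")
# In LRFD calibration, a resistance factor phi in a checking equation such as
# phi * R_n >= gamma_D * D_n + gamma_S * S_n is adjusted until designs meet a
# target beta like the one estimated above.
```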
659.
Discrimination of water from shadow regions on radar imagery using computer vision techniques. Qian, Jianzhong. January 1985.
Unlike MSS LANDSAT imagery and other photography, SAR imagery has intensity characteristics for water and shadow that make the task of discriminating between them extremely difficult. In this thesis, we analyze the reflectivity mechanisms of water and shadow on radar imagery and describe a scene analysis system that consists of a texture-preserving noise-removal procedure as the preprocessing step, a probabilistic relaxation algorithm for low-level labeling, and a spatial reasoning procedure based on a relational model for high-level interpretation. Experimental results obtained from SAR images are presented to illustrate the performance of the system. / M.S.
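The abstract does not detail the algorithms; as a generic illustration, a classical Rosenfeld-Hummel-Zucker relaxation update on a toy two-label (water/shadow) problem, with made-up compatibilities, might look like this:

```python
import numpy as np

def relaxation_step(P, neighbors, R):
    """One probabilistic relaxation iteration.
    P[i, l]      : current probability that pixel i has label l
    neighbors[i] : indices of pixels adjacent to i
    R[l, m]      : compatibility of label l at a pixel with label m nearby
    """
    n, L = P.shape
    P_new = np.empty_like(P)
    for i in range(n):
        # support q_i(l): average over neighbors j of sum_m R[l, m] * P[j, m]
        q = np.mean([R @ P[j] for j in neighbors[i]], axis=0)
        unnorm = P[i] * (1.0 + q)
        P_new[i] = unnorm / unnorm.sum()
    return P_new

# Toy example: 3 pixels in a row, labels 0 = water, 1 = shadow.
R = np.array([[0.8, -0.3],
              [-0.3, 0.8]])        # like labels support each other
P = np.array([[0.9, 0.1],
              [0.5, 0.5],          # ambiguous middle pixel
              [0.7, 0.3]])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
for _ in range(5):
    P = relaxation_step(P, neighbors, R)
print(np.round(P, 3))              # the middle pixel drifts toward "water"
```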
660.
Frequency analysis of low flows: comparison of a physically based approach and hypothetical distribution methods. Mattejat, Peter Paul. January 1985.
Several different approaches are applied to low-flow frequency analysis, and each method's theory and application are explained. The methods are (1) a physically based recession model dealing with time series, (2) log-Pearson type III and mixed log-Pearson type III using the annual minimum series, (3) a double-bounded pdf using the annual minimum series, and (4) the partial duration series applied to truncated and censored flows.
A computer program is provided for applying each method. One-day low-flow analysis was applied to 15 stations: 10 perennial streams and 5 intermittent streams. The physically based method uses the exponential baseflow recession, with duration, initial recession flow, and recharge due to incoming storms treated as random variables; it shows promise as an alternative to black-box methods and is appealing because it incorporates the effect of drought length. The log-Pearson approach is modified to handle zero flows by adding a point-mass probability for zero flows. Another approach to zero flows is the double-bounded probability density function, which also includes a point-mass probability for zero flows. Maximum likelihood estimation is used to estimate the distribution parameters. The partial duration series is applied because of the drawbacks of using only one low flow per year in the annual minimum series. Two approaches were used with the partial duration series, (i) truncation and (ii) censoring, which represent different low-flow populations. The parameters are again estimated by maximum likelihood. / M.S.
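As a hedged sketch of the conditional-probability adjustment for zero flows described above, using scipy's generic fit of a Pearson type III to log-transformed nonzero flows on synthetic data (not the estimators or data of this study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
annual_min = rng.lognormal(mean=0.5, sigma=0.8, size=40)
annual_min[rng.random(40) < 0.15] = 0.0        # intermittent stream: some zero-flow years

p0 = np.mean(annual_min == 0.0)                # point-mass probability of zero flow
nonzero = annual_min[annual_min > 0.0]
skew, loc, scale = stats.pearson3.fit(np.log10(nonzero))  # log-Pearson III fit

def low_flow_quantile(p):
    """Flow with non-exceedance probability p under the mixed model
    F(x) = p0 + (1 - p0) * G(x), where G is log-Pearson type III."""
    if p <= p0:
        return 0.0
    g = (p - p0) / (1.0 - p0)
    return 10.0 ** stats.pearson3.ppf(g, skew, loc=loc, scale=scale)

for T in (2, 10, 20):          # T-year one-day low flow (non-exceedance 1/T)
    print(f"{T:>2}-year low flow: {low_flow_quantile(1.0 / T):.2f}")
```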