1

Nonparametric methods of assessing spatial isotropy

Guan, Yong Tao 15 November 2004 (has links)
A common requirement for spatial analysis is the modeling of the second-order structure. While the assumption of isotropy is often made for this structure, it is not always appropriate. A conventional practice for checking isotropy is to informally assess plots of direction-specific sample second-order properties, e.g., the sample variogram or the sample second-order intensity function. While useful diagnostics, these graphical techniques are difficult to assess and open to interpretation. Formal alternatives to graphical diagnostics are valuable, but have been applied to only a limited class of models. In this dissertation, we propose a formal approach to testing for isotropy that is both objective and appropriate for a wide class of models. This approach, which is based on the asymptotic joint normality of the sample second-order properties, can be used to compare these properties in multiple directions. An $L_2$-consistent subsampling estimator for the asymptotic covariance matrix of the sample second-order properties is derived and used to construct a test statistic with a limiting $\chi^2$ distribution under the null hypothesis. Our testing approach is purely nonparametric and can be applied to both quantitative spatial processes and spatial point processes. For quantitative processes, the results apply to both regularly and irregularly spaced data when the point locations are generated by a homogeneous point process. In addition, the shape of the random field can be quite irregular. Examples and simulations demonstrate the efficacy of the approach.
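To make the idea concrete, here is a minimal sketch of such a test on gridded data, assuming a two-direction contrast and non-overlapping row blocks as the subsampling scheme; all function names and the block design are illustrative, not taken from the dissertation.

```python
import numpy as np
from scipy.stats import chi2

def directional_variogram(z, lag, axis):
    """Sample variogram of gridded data z at a given lag along one axis."""
    n = z.shape[axis]
    a = np.take(z, np.arange(lag, n), axis=axis)
    b = np.take(z, np.arange(n - lag), axis=axis)
    return 0.5 * np.mean((a - b) ** 2)

def isotropy_test(z, lag=1, n_blocks=16):
    """Chi-square test of the E-W vs N-S variogram contrast; the variance
    of the contrast is estimated by subsampling non-overlapping row blocks."""
    g = directional_variogram(z, lag, axis=1) - directional_variogram(z, lag, axis=0)
    contrasts = [directional_variogram(z[rows, :], lag, axis=1)
                 - directional_variogram(z[rows, :], lag, axis=0)
                 for rows in np.array_split(np.arange(z.shape[0]), n_blocks)]
    var_hat = np.var(contrasts, ddof=1) / n_blocks  # variance of the full-data contrast
    t = g ** 2 / var_hat                            # one contrast -> chi2 with 1 df
    return t, chi2.sf(t, df=1)

# z = np.random.default_rng(0).standard_normal((64, 64))
# print(isotropy_test(z))   # large p-value expected for isotropic noise
```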
2

Massive data K-means clustering and bootstrapping via A-optimal Subsampling

Zhou, Dali 08 1900 (has links)
Purdue University West Lafayette (PUWL) / For massive data analysis, the computational bottlenecks arise in two ways. First, the data may be too large to store and read easily. Second, the computation time may be too long. To tackle these problems, parallel computing algorithms such as Divide-and-Conquer have been proposed, but one of their drawbacks is that some correlations may be lost when the data are divided into chunks. Subsampling is another way to solve the problems of massive data analysis while taking correlation into consideration. Uniform sampling is simple and fast but inefficient; see the detailed discussions in Mahoney (2011) and Peng and Tan (2018). The bootstrap approach uses uniform sampling and is computing-time intensive, which becomes enormously challenging when the data size is massive. k-means clustering is a standard method in data analysis. The method iterates to find centroids, which becomes difficult when the data size is massive. In this thesis, we propose an optimal subsampling approach for massive data bootstrapping and massive data k-means clustering. We seek the sampling distribution that minimizes the trace of the variance-covariance matrix of the resulting subsampling estimators; this is referred to as A-optimal in the literature. We define the optimal sampling distribution by minimizing the sum of the component variances of the subsampling estimators. We show that the subsampling k-means centroids consistently approximate the full-data centroids, and prove asymptotic normality using empirical process theory. We perform extensive simulations to evaluate the numerical performance of the proposed optimal subsampling approach through the empirical MSE and running times, and we also apply the subsampling approach to real data. For the massive data bootstrap, we conducted a large simulation study in the framework of linear regression based on the A-optimal theory proposed by Peng and Tan (2018), focusing on the performance of confidence intervals computed from A-optimal subsampling, including coverage probabilities, interval lengths and running times. In both bootstrap and clustering we compared A-optimal subsampling with uniform subsampling.
3

Massive Data K-means Clustering and Bootstrapping via A-optimal Subsampling

Dali Zhou (6569396) 16 August 2019 (has links)
For massive data analysis, the computational bottlenecks arise in two ways. First, the data may be too large to store and read easily. Second, the computation time may be too long. To tackle these problems, parallel computing algorithms such as Divide-and-Conquer have been proposed, but one of their drawbacks is that some correlations may be lost when the data are divided into chunks. Subsampling is another way to solve the problems of massive data analysis while taking correlation into consideration. Uniform sampling is simple and fast but inefficient; see the detailed discussions in Mahoney (2011) and Peng and Tan (2018). The bootstrap approach uses uniform sampling and is computing-time intensive, which becomes enormously challenging when the data size is massive. k-means clustering is a standard method in data analysis. The method iterates to find centroids, which becomes difficult when the data size is massive. In this thesis, we propose an optimal subsampling approach for massive data bootstrapping and massive data k-means clustering. We seek the sampling distribution that minimizes the trace of the variance-covariance matrix of the resulting subsampling estimators; this is referred to as A-optimal in the literature. We define the optimal sampling distribution by minimizing the sum of the component variances of the subsampling estimators. We show that the subsampling k-means centroids consistently approximate the full-data centroids, and prove asymptotic normality using empirical process theory. We perform extensive simulations to evaluate the numerical performance of the proposed optimal subsampling approach through the empirical MSE and running times, and we also apply the subsampling approach to real data. For the massive data bootstrap, we conducted a large simulation study in the framework of linear regression based on the A-optimal theory proposed by Peng and Tan (2018), focusing on the performance of confidence intervals computed from A-optimal subsampling, including coverage probabilities, interval lengths and running times. In both bootstrap and clustering we compared A-optimal subsampling with uniform subsampling.
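A minimal sketch of nonuniform-subsampling k-means along the lines described above; the distance-based sampling probabilities are an illustrative surrogate for the A-optimal distribution, which the thesis derives formally, and all names are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def subsample_kmeans(X, k, m, n_iter=50, rng=np.random.default_rng(0)):
    """k-means centroids estimated from a nonuniform subsample of size m."""
    n = len(X)
    # Pilot step: rough centroids from a small uniform sample.
    pilot = X[rng.choice(n, size=min(20 * k, n), replace=False)]
    centroids, _ = kmeans2(pilot, k, minit="++", seed=1)

    # Sampling probabilities grow with the distance to the nearest pilot
    # centroid (points that move the centroids most get sampled more),
    # mixed with uniform so every point has positive probability.
    d = np.min(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    p = 0.5 * d / (d.sum() + 1e-12) + 0.5 / n

    idx = rng.choice(n, size=m, replace=True, p=p)
    S, w = X[idx], 1.0 / (n * p[idx])   # subsample + inverse-probability weights

    # Weighted Lloyd iterations on the subsample only.
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(S[:, None, :] - centroids[None, :, :],
                                          axis=2), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = np.average(S[labels == j], axis=0,
                                          weights=w[labels == j])
    return centroids
```

The inverse-probability weights keep the subsample centroids unbiased for the full-data centroids, which is the property the consistency result above rests on.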
4

Software Defined Radio for Maritime Collision Avoidance Applications

Humphris, Les January 2015 (has links)
The design and development of a software defined radio (SDR) receiver prototype has been completed. The goal is to replace the existing automatic identification system (AIS) manufactured by Vesper Marine with a software-driven system that reduces costs and provides a high degree of reconfigurability. One of the key concepts of the SDR is the direct digitization of the radio frequency (RF) signal using subsampling. This idea arises from the ambition to place the analog-to-digital converter (ADC) as close to the antenna interface as practically possible, so that the majority of the RF processing is encapsulated within the digital domain. Evaluation of a frequency planning strategy that combines subsampling and oversampling illustrates how the maritime band is aliased to a lower frequency. An analog front-end (AFE) board was constructed to implement the frequency planning strategy so that the digitized bandwidth can be streamed into a field programmable gate array (FPGA) for real-time processing. Digital front-end (DFE) techniques that condition the digitized maritime signal for baseband processing are presented. Digital downconversion (DDC) is performed on the FPGA, which acquires the in-phase and quadrature signals. Demodulation of an AIS test signal is evaluated by implementing a digital signal processor (DSP) for baseband processing. The SDR prototype achieved a receiver sensitivity of -113 dBm, outperforming the required sensitivity of -107 dBm specified in the International Electrotechnical Commission (IEC) 62287-1 standard for AIS applications [1].
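For intuition, the frequency-planning computation at the heart of subsampling reduces to folding the carrier into the first Nyquist zone. A small sketch, using the standard AIS channel near 162 MHz and an arbitrary example clock (not a rate taken from the thesis):

```python
def alias_frequency(f_c, f_s):
    """Center frequency to which a bandpass signal at f_c folds when
    sampled at rate f_s (both in Hz), by standard subsampling theory."""
    f = f_c % f_s
    return f if f <= f_s / 2 else f_s - f

# AIS channel 1 sits at 161.975 MHz; with an illustrative 19.2 MHz
# sampling clock it aliases to a low IF of 8.375 MHz:
print(alias_frequency(161.975e6, 19.2e6) / 1e6, "MHz")
```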
5

Sample Size Determination in Multivariate Parameters With Applications to Nonuniform Subsampling in Big Data High Dimensional Linear Regression

Wang, Yu 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Subsampling is an important method in the analysis of Big Data. Subsample size determination (SSSD) plays a crucial part in extracting information from data and in overcoming the challenges that result from huge data sizes. In this thesis, (1) sample size determination (SSD) is investigated for multivariate parameters, and sample size formulas are obtained for the multivariate normal distribution; (2) sample size formulas are obtained based on concentration inequalities; (3) improved bounds for McDiarmid’s inequalities are obtained; (4) the obtained results are applied to nonuniform subsampling in Big Data high-dimensional linear regression; and (5) numerical studies are conducted. The sample size formula for the univariate normal distribution is a staple of elementary statistics. To the best of our knowledge, its generalization to the multivariate normal (or, more generally, to multivariate parameters) has not attracted much attention. In this thesis, we introduce a definition for SSD and obtain explicit formulas for the multivariate normal distribution, in gratifying analogy with the univariate normal sample size formula. Commonly used concentration inequalities provide exponential rates, and sample sizes based on these inequalities are often loose. Talagrand (1995) provided the missing factor needed to sharpen these inequalities. We obtain the numeric values of the constants in the missing factor and slightly improve his results. Furthermore, we provide the missing factor in McDiarmid’s inequality. These improved bounds are used to give smaller sample sizes.
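For reference, a sketch of the classical univariate sample size formula alluded to above, together with one natural (and here conservative) multivariate analogue; the thesis's actual definitions and formulas may differ.

```python
import math
from scipy.stats import norm, chi2

def n_univariate(sigma, margin, alpha=0.05):
    """Smallest n with P(|Xbar - mu| <= margin) >= 1 - alpha (known sigma)."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

def n_multivariate(lam_max, p, margin, alpha=0.05):
    """A conservative p-variate analogue (an assumption, not the thesis's
    formula): since (Xbar - mu)' (Sigma/n)^{-1} (Xbar - mu) ~ chi2(p) and
    ||v||^2 <= lam_max(Sigma/n) * v'(Sigma/n)^{-1} v, requiring
    lam_max * chi2_quantile / n <= margin^2 guarantees the mean estimate
    lies in a Euclidean ball of radius `margin` with prob. >= 1 - alpha."""
    q = chi2.ppf(1 - alpha, df=p)
    return math.ceil(lam_max * q / margin ** 2)

print(n_univariate(sigma=10, margin=1))            # 385
print(n_multivariate(lam_max=100, p=3, margin=1))  # 782: larger, as expected
```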
6

Optimal Subsampling of Finite Mixture Distribution

Neupane, Binod Prasad 05 1900 (has links)
A mixture distribution is a compounding of statistical distributions, which arises when sampling from heterogeneous populations with a different probability density function in each component. A finite mixture has a finite number of components. In the past decade the extent and the potential of the applications of finite mixture models have widened considerably.

The objective of this project is to add functionality to the package 'mixdist', developed by Du and Macdonald (Du 2002) and Gao (2004) in the R environment (R Development Core Team 2004), for estimating the parameters of a finite mixture distribution from data grouped in bins together with conditional data. Mixed data together with conditional data provide better estimates of the parameters than mixed data alone. Our main objective is to obtain the optimal sample size for each bin of the mixed data from which to collect conditional data, given approximate values of the parameters and the distributional form of the mixture for the given data. We have also replaced the dependence of the function mix on the optimizer nlm with the optimizer optim, in order to allow limits on the parameters.

Our purpose is to provide easily available tools for modeling fish growth using mixture distributions, but the methods have a number of applications in other areas as well. / Thesis / Master of Science (MSc)
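The nlm-to-optim switch matters because optim supports box constraints on the parameters. An analogous sketch in Python rather than R, fitting a two-component normal mixture to binned counts under parameter bounds; the example data and names are illustrative, not taken from mixdist.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

edges = np.array([0, 2, 4, 6, 8, 10, 12.])   # bin boundaries
counts = np.array([5, 30, 40, 15, 25, 10.])  # grouped (binned) data

def neg_loglik(theta):
    """Multinomial negative log-likelihood of the binned mixture data."""
    pi, mu1, mu2, s1, s2 = theta
    cdf = lambda x: pi * norm.cdf(x, mu1, s1) + (1 - pi) * norm.cdf(x, mu2, s2)
    p = np.diff(cdf(edges))
    p = np.clip(p / p.sum(), 1e-12, None)     # renormalize over the bins
    return -np.sum(counts * np.log(p))

# L-BFGS-B plays the role of optim's box-constrained mode: each parameter
# is kept inside explicit limits, which nlm-style optimizers cannot enforce.
res = minimize(neg_loglik, x0=[0.5, 3, 9, 1, 1], method="L-BFGS-B",
               bounds=[(0.01, 0.99), (0, 12), (0, 12), (0.1, 6), (0.1, 6)])
print(res.x)   # [pi, mu1, mu2, sigma1, sigma2]
```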
7

Inference for treatment effect evaluation on retrospective data: based on an application on collaboration within Resursteam in Uppsala 2004-2007

Avdic, Daniel January 2008 (has links)
Matching observations on background variables is a practical way to avoid the problems that arise when, in non-deterministic observational studies, we want to measure the potential effects of, for example, project initiatives in health care. The problem is that our choice of potential control individuals is limited, since these often differ from the treated individuals with respect to factors that are highly correlated with the outcome variable of interest. Using matching on retrospective data, we can nevertheless estimate counterfactual individuals, which can then be used as controls when estimating the treatment effect. Inference for this evaluation estimator is problematic, however, since some individuals can potentially be used several times in the analysis, biasing the variance estimator. Instead, this thesis motivates an alternative method of inference: subsampling. The starting point of this procedure is to approximate, asymptotically, the empirical distribution of the estimator of interest through resampling, and then to base inference on this approximation. The empirical part of the thesis is based on an application concerning Resursteam, whose overall goal is to shorten periods of illness for individuals at risk of long-term sick leave. For comparison, an earlier evaluation of Resursteam is used, in which a different county was used to select suitable controls.
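A minimal sketch of subsampling inference of this kind, assuming a generic root-n-consistent estimator `tau_hat` computed from a data matrix; details such as the subsample size b are tuning choices the thesis addresses in context.

```python
import numpy as np

def subsampling_ci(data, tau_hat, b, alpha=0.05, n_sub=500,
                   rng=np.random.default_rng(0)):
    """Confidence interval for tau_hat(data) via subsampling: recompute
    the estimator on many size-b subsamples drawn without replacement,
    approximate the distribution of sqrt(b) * (tau_b - tau_n), and
    invert its quantiles."""
    n = len(data)
    tau_n = tau_hat(data)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)
        roots[i] = np.sqrt(b) * (tau_hat(data[idx]) - tau_n)
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # invert: tau_n - hi/sqrt(n) <= tau <= tau_n - lo/sqrt(n)
    return tau_n - hi / np.sqrt(n), tau_n - lo / np.sqrt(n)

# e.g. a difference in means between treated (column 1 == 1) and controls:
# tau = lambda d: d[d[:, 1] == 1, 0].mean() - d[d[:, 1] == 0, 0].mean()
# print(subsampling_ci(data, tau, b=int(len(data) ** 0.7)))
```

Because each subsample is a genuine draw from the data, repeated use of the same control individuals does not bias this variance approximation the way it does the naive estimator.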
8

Evolution of Silica Biomineralizing Plankton

Kotrc, Benjamin 18 September 2013 (has links)
The post-Paleozoic history of the silica cycle involves just two groups of marine plankton, radiolarians and diatoms. I apply paleobiological methods to better understand the Cenozoic evolution of both groups. The Cenozoic rise in diatom diversity has long been related to a concurrent decline in radiolarian test silicification. I address evolutionary questions on both sides of this coevolutionary coin: Was the taxonomic diversification of diatoms accompanied by morphological diversification? Is our view of morphological diatom diversification affected by sampling biases? What evolutionary mechanisms underlie the macroevolutionary decline in radiolarian silicification? Conventionally, diatom diversification describes a steep, monotonic rise, a view recently questioned due to sampling bias. For a different perspective, I constructed a diatom morphospace based on discrete characters, populated through time using an occurrence-level database. Distances between taxa in morphospace and on a molecular phylogeny are not strongly correlated, suggesting that morphospace was explored early in the group's evolutionary history, followed by relative stasis. I quantified morphospace occupancy through time using several disparity metrics. Metrics describing the average separation of taxa show stasis, while metrics describing occupied volume show an increase with time. Disparity metrics are also subject to sampling biases. Under subsampling, I find that disparity metrics show varied responses: metrics describing the separation of taxa in morphospace are unaffected, while those describing occupied volume lose their clear increases. Disparity can have geographic components, analogous to $\alpha$ and $\beta$ taxonomic diversity; I find more evidence of stasis in an analysis of $\bar{\alpha}$ disparity. Overall, these results suggest stasis in Cenozoic diatom disparity. The radiolarian decline in silicification could result either from macroevolutionary processes operating above the species level (punctuated equilibria) or from anagenetic changes within lineages. I measured silicification in three phyletic lineages, Stichocorys, Didymocyrtis, and Centrobotrys, from four tropical Pacific DSDP sites. Likelihood-based model fitting finds no strong support for directional evolution, pointing toward selection among species rather than within species. Each lineage shows a different trajectory, perhaps due to differences in the ecological role played by the test. Because Stichocorys shows close correspondence to the assemblage-level trend, abundance may be an important factor through which within-lineage changes can influence the macroevolutionary pattern. / Earth and Planetary Sciences
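A minimal sketch of disparity under subsampling (rarefaction), contrasting the two families of metrics mentioned above, sum of variances (occupied volume) versus mean pairwise distance (average separation); the implementation details are illustrative, not the dissertation's code.

```python
import numpy as np
from scipy.spatial.distance import pdist

def rarefied_disparity(M, n_taxa, n_reps=1000, rng=np.random.default_rng(0)):
    """M: taxa-by-characters morphospace coordinates for one time bin.
    Repeatedly subsample n_taxa taxa and average two disparity metrics,
    so that bins with different sample sizes become comparable."""
    sov, mpd = np.empty(n_reps), np.empty(n_reps)
    for i in range(n_reps):
        S = M[rng.choice(len(M), size=n_taxa, replace=False)]
        sov[i] = S.var(axis=0, ddof=1).sum()   # sum of variances (volume)
        mpd[i] = pdist(S).mean()               # mean pairwise distance (separation)
    return sov.mean(), mpd.mean()
```

Mean pairwise distance is nearly insensitive to how many taxa are drawn, while sum of variances tracks sample size, which is why the volume-type increases can vanish under subsampling.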
9

Charge-domain sampling of high-frequency signals with embedded filtering

Karvonen, S. (Sami) 18 January 2006 (has links)
Subsampling can be used in a radio receiver to perform signal downconversion and sample-and-hold operations in order to relieve the operating-frequency and bandwidth requirements of the subsequent discrete-time circuitry. However, due to the inherent aliasing of wideband noise and interference in subsampling, and the difficulty of implementing appropriate bandpass anti-aliasing filtering at high frequencies, straightforward use of a low subsampling rate can significantly degrade the receiver dynamic range. The aim of this thesis is to investigate and implement methods for integrating filtering into high-frequency signal sampling and downconversion by subsampling, to alleviate the requirements for additional front-end filters and to mitigate the effects of noise and out-of-band signal aliasing, thereby facilitating use in integrated high-quality radio receivers. The charge-domain sampling technique studied here allows simple integration of both continuous- and discrete-time filtering functions into high-frequency signal sampling. Gated current integration results in a lowpass sin(x)/x (sinc) response capable of performing built-in anti-aliasing filtering in baseband signal sampling. Weighted integration of several successive current samples can further be used to obtain an embedded discrete-time finite-impulse-response (FIR) filtering response, which can provide internal anti-aliasing and image-rejection filtering in the downconversion of bandpass signals by subsampling. The detailed analysis of elementary charge-domain sampling circuits presented here shows that the use of integrated FIR filtering with subsampling allows acceptable noise figures to be achieved and can provide effective internal anti-aliasing rejection. The new methods presented here for increasing the selectivity of elementary charge-domain sampling circuits enable the integration of advanced, digitally programmable FIR filtering functions into high-frequency signal sampling, thereby markedly relieving the requirements for additional anti-aliasing, image-rejection and possibly even channel-selection filters in a radio receiver. BiCMOS and CMOS IF sampler implementations are presented to demonstrate the feasibility of the charge-domain sampling technique for integrated anti-aliasing and image-rejection filtering in IF signal quadrature downconversion by subsampling. Circuit measurements show that this sampling technique with built-in filtering yields an accurate frequency response and allows the use of high subsampling ratios while still achieving a competitive dynamic range.
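As a numerical illustration of the embedded-filtering idea, gated integration contributes a sinc-shaped lowpass response, and tap-weighting successive integrated samples multiplies it by a discrete-time FIR response; the sampling rate and tap values below are arbitrary examples, not the thesis's designs.

```python
import numpy as np

f_s = 100e6                      # rate of the integrated samples (example)
T_i = 1 / f_s                    # integration window: one full sample period
f = np.linspace(1, 300e6, 2000)  # evaluation frequencies

# |H| of the gated integrator: nulls at multiples of f_s
h_sinc = np.abs(np.sinc(f * T_i))          # np.sinc(x) = sin(pi x)/(pi x)

# |H| of a 4-tap FIR formed by weighting four successive samples
taps = np.array([1, 2, 2, 1]) / 6.0
h_fir = np.abs(np.exp(-2j * np.pi * np.outer(f, np.arange(4)) / f_s) @ taps)

h_total = h_sinc * h_fir   # combined embedded-filter magnitude response
print(h_total[:5])
```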
