11 |
Statistical data compression by optimal segmentation. Theory, algorithms and experimental results. Steiner, Gottfried 09 1900 (has links) (PDF)
The work deals with statistical data compression, or data reduction, by a general class of classification methods. The compression yields a representation of the data set by a partition or by some typical points (called prototypes). The optimization problems are related to minimum variance partitions and principal point problems. A fixpoint method and an adaptive approach are applied to solve these problems. The work presents the theoretical background of the optimization problems and lists pseudo-code for the numerical solution of the data compression. The main part of the work concentrates on practical questions arising when carrying out a data compression: determining a suitable number of representing points, choosing an objective function, establishing an adjacency structure and improving the fixpoint algorithm. The performance of the proposed methods and algorithms is compared and evaluated experimentally, and numerous examples deepen the understanding of the applied methods. (author's abstract)
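The fixpoint iteration for minimum variance partitions and principal points is not spelled out in the abstract; the following is a minimal Lloyd-style sketch under the usual squared-error objective, in which the function name, initialization and stopping rule are illustrative assumptions rather than the author's algorithm.

```python
import numpy as np

def fixpoint_prototypes(data, k, n_iter=100, tol=1e-8, seed=0):
    """Lloyd-style fixpoint iteration (illustrative): alternate between assigning
    points to their nearest prototype (a minimum variance partition) and moving
    each prototype to the mean of its cell (the principal point update)."""
    rng = np.random.default_rng(seed)
    prototypes = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # distance of every point to every prototype, shape (n, k)
        dists = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_prototypes = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else prototypes[j]
            for j in range(k)
        ])
        if np.max(np.abs(new_prototypes - prototypes)) < tol:  # fixpoint reached
            prototypes = new_prototypes
            break
        prototypes = new_prototypes
    return prototypes, labels

# usage: compress 1000 points to 5 prototypes
X = np.random.default_rng(1).normal(size=(1000, 2))
protos, partition = fixpoint_prototypes(X, k=5)
```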
|
12 |
Efficiency, risk and regulation compliance : applications to Lake Victoria fisheries in Tanzania /Lokina, Razack Bakari, January 1900 (has links) (PDF)
Diss. (summary) Göteborg : Univ., 2005. / Accompanied by 3 papers.
|
13 |
Creating space for fishermen's livelihoods : Anlo-Ewe beach seine fishermen's negotiations for livelihood space within multiple governance structures in Ghana /Kraan, Marloes, January 2009 (has links)
Diss. Amsterdam : University, 2009. / DVD title: If you do good : beach seine fishing in Ghana.
|
14 |
Canine health, disease and death : data from a Swedish animal insurance database /Egenvall, Agneta, January 1900 (has links) (PDF)
Diss. (summary) Uppsala : Sveriges lantbruksuniv. / Accompanied by 5 papers.
|
15 |
Latent pattern mixture models for binary outcomes /Saba, Laura M. January 2007 (has links)
Thesis (Ph.D. in Biostatistics) -- University of Colorado Denver, 2007. / Typescript. Includes bibliographical references (leaves 70-71). Free to UCD affiliates. Online version available via ProQuest Digital Dissertations.
|
16 |
A comparison of stratified and unstratified modeling for binary logistic regression in the presence of a simulated interaction Beebe, Claire Elizabeth. January 2008 (has links) (PDF)
Thesis--University of Oklahoma. / Bibliography: leaves 48-49.
|
17 |
Atmospheric boundary layer characterizations over Highveld Region South Africa Luhunga, P.M. (Philbert Modest) 16 May 2013 (has links)
Atmospheric Boundary Layer (ABL) characteristics can be highly complex; the links between spatial and temporal variability of ABL meteorological quantities and existing land use patterns are still poorly understood due to the non-linearity of air-land interaction processes. This study describes results from Monin-Obukhov similarity theory and statistical analysis of meteorological observations collected by a network of ten Automatic Weather Stations (AWSs). The stations were in operation in the Highveld Priority Area (HPA) of the Republic of South Africa during 2008–2010. The spatial distribution of stability regimes, as represented by both the bulk Richardson number (BRN) and the Obukhov length (L), indicates that the HPA is dominated by a strong stability regime. The momentum and heat fluxes show no significant spatial variation between stations. Statistical analysis revealed localization, enhancement and homogenization in the inter-station variability of observed meteorological quantities (temperature, relative humidity and wind speed) over diurnal and seasonal cycles. Enhancement of the meteorological spatial variability was found on a broad range of scales, from 20 to 50 km, during morning hours and in the dry winter season. These spatial scales are comparable to the scales of observed land use heterogeneity, which suggests links between atmospheric variability and land use patterns through the excitation of horizontal meso-scale circulations. Convective motions homogenized and synchronized meteorological variability during afternoon hours in the winter season and during large parts of the day in the moist summer season. The analysis also revealed that turbulent convection overwhelms horizontal meso-scale circulations in the study area during extensive parts of the annual cycle. / Dissertation (MSc)--University of Pretoria, 2013. / Geography, Geoinformatics and Meteorology / Unrestricted
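As an illustration of one of the stability diagnostics named above, here is a minimal sketch of the bulk Richardson number between two measurement levels; the variable names and the use of virtual potential temperature differences follow a standard textbook formulation and are assumptions, not the study's actual processing chain.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m s^-2

def bulk_richardson(theta_v_low, theta_v_high, u_low, u_high, v_low, v_high, dz):
    """Bulk Richardson number between two levels separated by dz metres.
    Positive values indicate static stability, negative values instability."""
    dtheta = theta_v_high - theta_v_low            # virtual potential temperature difference, K
    du2 = (u_high - u_low) ** 2 + (v_high - v_low) ** 2  # squared wind shear, m^2 s^-2
    theta_mean = 0.5 * (theta_v_low + theta_v_high)
    return (G / theta_mean) * dtheta * dz / np.maximum(du2, 1e-6)

# usage: a stable nocturnal profile (values purely illustrative)
print(bulk_richardson(288.0, 290.5, 1.0, 3.5, 0.0, 0.5, dz=10.0))
```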
|
18 |
RISK INTERPRETATION OF DIFFERENTIAL PRIVACY Jiajun Liang (13190613) 31 July 2023 (has links)
How to set privacy parameters is a crucial problem for the consistent application of DP in practice. The current privacy parameters do not provide direct suggestions for this problem. On the other hand, different databases may have varying degrees of information leakage, allowing attackers to enhance their attacks with the available information. This dissertation provides an additional interpretation of the current DP notions by introducing a framework that directly considers the worst-case average failure probability of attackers under different levels of knowledge.
To achieve this, we introduce a novel measure of attacker knowledge and establish a dual relationship between (type I error, type II error) and (prior, average failure probability). By leveraging this framework, we propose an interpretable paradigm to consistently set privacy parameters on different databases with varying levels of leaked information.
Furthermore, we characterize the minimax limit of private parameter estimation, driven by $1/(n(1-2p))^2+1/n$, where $p$ represents the worst-case probability risk and $n$ is the number of data points. This characterization is more interpretable than the current lower bound $\min\{1/(n\epsilon^2),\,1/(n\delta^2)\}+1/n$ on $(\epsilon,\delta)$-DP. Additionally, we identify the phase transition of private parameter estimation based on this limit and provide suggestions for protocol designs to achieve optimal private estimations.
Last, we consider a federated learning setting where the data are stored in a distributed manner and privacy-preserving interactions are required. We extend the proposed interpretation to federated learning, considering two scenarios: protecting against privacy breaches against local nodes and protecting against privacy breaches against the center. Specifically, we consider a non-convex sparse federated parameter estimation problem and apply it to generalized linear models. We tackle two challenges in this setting. Firstly, we encounter the issue of initialization due to the privacy requirements that limit the number of queries to the database. Secondly, we overcome the heterogeneity in the distribution among local nodes to identify low-dimensional structures.
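The dissertation's own duality between (type I error, type II error) and (prior, average failure probability) is not reproduced here; as a hedged illustration of how such quantities can interact, the sketch below uses the standard hypothesis-testing characterization of $(\epsilon,\delta)$-DP and numerically minimizes an attacker's prior-weighted error. The function names and the grid search are assumptions, not the dissertation's construction.

```python
import numpy as np

def min_type2_error(alpha, eps, delta):
    """Smallest type II error compatible with (eps, delta)-DP at type I error alpha
    (standard hypothesis-testing characterization of differential privacy)."""
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1.0 - delta - np.exp(eps) * alpha,
        np.exp(-eps) * (1.0 - delta - alpha),
    ])

def worst_case_failure_probability(eps, delta, prior, grid=10_001):
    """Minimal average failure probability of an attacker who holds `prior`
    on the first of two neighbouring databases:
    min over tests of prior*alpha + (1 - prior)*beta(alpha)."""
    alpha = np.linspace(0.0, 1.0, grid)
    beta = min_type2_error(alpha, eps, delta)
    return float(np.min(prior * alpha + (1.0 - prior) * beta))

# usage: stronger privacy (smaller eps) forces a larger attacker failure probability
for eps in (0.1, 1.0, 3.0):
    print(eps, round(worst_case_failure_probability(eps, 1e-5, prior=0.5), 4))
```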
|
19 |
Pragmatic Statistical Approaches for Power Analysis, Causal Inference, and Biomarker Detection Fan Wu (16536675) 26 July 2023 (has links)
Mediation analyses play a critical role in social and personality psychology research. However, current approaches for assessing power and sample size in mediation models have limitations, particularly when dealing with complex mediation models and multiple mediator sequential models. These limitations stem from limited software options and the substantial computational time required. In this part, we address these challenges by extending the joint significance test and product of coefficients test to incorporate the fourth-pathed mediated effect and generalized kth-pathed mediated effect. Additionally, we propose a model-based bootstrap method and provide convenient R tools for estimating power in complex mediation models. Through our research, we demonstrate that power decreases as the number of mediators increases and as the influence of coefficients varies. We summarize our results and discuss the implications of power analysis in relation to mediator complexity and coefficient influence. We provide insights for researchers seeking to optimize study designs and enhance the reliability of their findings in complex mediation models.
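The abstract refers to R tools for simulation-based power in sequential mediation models; as an independent, simplified illustration (not the dissertation's code), a Monte Carlo power estimate for the joint significance test in a two-mediator chain X -> M1 -> M2 -> Y might look as follows. The coefficients, sample size, and use of unadjusted simple regressions for each path are simplifying assumptions.

```python
import numpy as np
from scipy import stats

def mediation_power(n, a=0.3, b=0.3, c=0.3, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo power of the joint significance test for the mediated effect
    in the chain X -> M1 -> M2 -> Y: every path must be significant.
    (Simple regressions per path; a full analysis would adjust for upstream variables.)"""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)
        m1 = a * x + rng.normal(size=n)
        m2 = b * m1 + rng.normal(size=n)
        y = c * m2 + rng.normal(size=n)
        pvals = [
            stats.linregress(x, m1).pvalue,
            stats.linregress(m1, m2).pvalue,
            stats.linregress(m2, y).pvalue,
        ]
        hits += all(p < alpha for p in pvals)
    return hits / n_sim

# usage: power for n = 200 with all path coefficients equal to 0.3
print(mediation_power(n=200))
```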
Matching is a crucial step in causal inference, as it allows for more robust and reasonable analyses by creating better-matched pairs. However, in real-world scenarios, data are often collected and stored by different local institutions or separate departments, posing challenges for effective matching due to data fragmentation. Additionally, the harmonization of such data needs to prioritize privacy preservation. In this part, we propose a new hierarchical framework that addresses these challenges by implementing differential privacy on raw data to protect sensitive information while maintaining data utility. We also design a data access control system with three different access levels for designers based on their roles, ensuring secure and controlled access to the matched datasets. Simulation studies and analyses of datasets from the 2017 Atlantic Causal Inference Conference Data Challenge are conducted to showcase the flexibility and utility of our framework. Through this research, we contribute to the advancement of statistical methodologies in matching and privacy-preserving data analysis, offering a practical solution for data integration and privacy protection in causal inference studies.
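The framework above is hierarchical and role-based; as a toy illustration of only its first ingredient (perturbing raw covariates under differential privacy before matching), one could use the Laplace mechanism on bounded covariates followed by greedy nearest-neighbour matching. Everything in this sketch, including the choice of mechanism and the per-covariate budget, is an assumption rather than the dissertation's design.

```python
import numpy as np

def laplace_perturb(X, lower, upper, eps, rng):
    """Release each bounded covariate under eps-DP via the Laplace mechanism;
    the sensitivity of one record's value is its range (upper - lower).
    Budget composition across covariates is ignored for brevity."""
    X = np.clip(X, lower, upper)
    scale = (upper - lower) / eps
    return X + rng.laplace(scale=scale, size=X.shape)

def greedy_nearest_match(treated, control):
    """Greedy 1:1 nearest-neighbour matching on (privatized) covariates."""
    available = list(range(len(control)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(available, key=lambda j: np.linalg.norm(t - control[j]))
        pairs.append((i, j))
        available.remove(j)
    return pairs

rng = np.random.default_rng(0)
treated = rng.uniform(0, 1, size=(20, 3))
control = rng.uniform(0, 1, size=(100, 3))
priv_treated = laplace_perturb(treated, 0.0, 1.0, eps=2.0, rng=rng)
priv_control = laplace_perturb(control, 0.0, 1.0, eps=2.0, rng=rng)
print(greedy_nearest_match(priv_treated, priv_control)[:5])
```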
Biomarker discovery is a complex and resource-intensive process, encompassing discovery, qualification, verification, and validation stages prior to clinical evaluation. Streamlining this process by efficiently identifying relevant biomarkers in the discovery phase holds immense value. In this part, we present a likelihood ratio-based approach to accurately identify truly relevant protein markers in discovery studies. Leveraging the observation of unimodal underlying distributions of expression profiles for irrelevant markers, our method demonstrates promising performance when evaluated on real experimental data. Additionally, to address non-normal scenarios, we introduce a kernel ratio-based approach, which we evaluate using non-normal simulation settings. Through extensive simulations, we observe the high effectiveness of the kernel method in discovering the set of truly relevant markers, resulting in precise biomarker identifications with elevated sensitivity and a low empirical false discovery rate.
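One way to operationalize a likelihood-ratio screen under the unimodality observation above is to compare a one-component against a two-component Gaussian fit per marker; the sketch below is such an illustration, and the specific statistic and null model are assumptions, not the dissertation's actual method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_lr_statistic(x):
    """Total log-likelihood ratio of a two-component vs one-component Gaussian fit.
    Irrelevant markers (unimodal expression profiles) should give small values;
    markers whose expression splits into groups should give large values."""
    x = np.asarray(x).reshape(-1, 1)
    ll1 = GaussianMixture(1, random_state=0).fit(x).score(x) * len(x)
    ll2 = GaussianMixture(2, random_state=0, n_init=5).fit(x).score(x) * len(x)
    return ll2 - ll1

# usage: a unimodal (irrelevant) profile versus a clearly bimodal (relevant) one
rng = np.random.default_rng(0)
irrelevant = rng.normal(0, 1, 200)
relevant = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
print(mixture_lr_statistic(irrelevant), mixture_lr_statistic(relevant))
```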
|
20 |
Analyzing The Community Structure Of Web-like Networks: Models And AlgorithmsCami, Aurel 01 January 2005 (has links)
This dissertation investigates the community structure of web-like networks (i.e., large, random, real-life networks such as the World Wide Web and the Internet). Recently, it has been shown that many such networks have a locally dense and globally sparse structure, with certain small, dense subgraphs occurring much more frequently than they do in the classical Erdős-Rényi random graphs. This peculiarity--which is commonly referred to as community structure--has been observed in seemingly unrelated networks such as the Web, email networks, citation networks, biological networks, etc. The pervasiveness of this phenomenon has led many researchers to believe that such cohesive groups of nodes might represent meaningful entities. For example, in the Web such tightly-knit groups of nodes might represent pages with a common topic, geographical location, etc., while in neural networks they might represent evolved computational units. The notion of community has emerged in an effort to formalize the empirical observation of the locally dense, globally sparse structure of web-like networks. In the broadest sense, a community in a web-like network is defined as a group of nodes that induces a dense subgraph which is sparsely linked with the rest of the network. Due to a wide array of envisioned applications, ranging from crawlers and search engines to network security and network compression, there has recently been widespread interest in finding efficient community-mining algorithms. In this dissertation, the community structure of web-like networks is investigated by a combination of analytical and computational techniques: First, we consider the problem of modeling web-like networks. In recent years, many new random graph models have been proposed to account for some recently discovered properties of web-like networks that distinguish them from the classical random graphs. The vast majority of these random graph models take into account only the addition of new nodes and edges. Yet, several empirical observations indicate that deletion of nodes and edges occurs frequently in web-like networks. Inspired by such observations, we propose and analyze two dynamic random graph models that combine node and edge addition with a uniform and a preferential deletion of nodes, respectively. In both cases, we find that the random graphs generated by such models follow power-law degree distributions (in agreement with the degree distribution of many web-like networks). Second, we analyze the expected density of certain small subgraphs--such as defensive alliances on three and four nodes--in various random graph models. Our findings show that while in the binomial random graph the expected density of such subgraphs is very close to zero, in some dynamic random graph models it is much larger. These findings converge with our results obtained by computing the number of communities in some Web crawls. Next, we investigate the computational complexity of the community-mining problem under various definitions of community. Assuming the definition of community as a global defensive alliance or a global offensive alliance, we prove--using transformations from the dominating set problem--that finding optimal communities is an NP-complete problem. These and other similar complexity results, coupled with the fact that many web-like networks are huge, indicate that it is unlikely that fast, exact sequential algorithms for mining communities may be found.
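As a toy illustration of the kind of dynamics described above, the sketch below grows a graph by preferential attachment while deleting a uniformly chosen node with some probability at each step (roughly the uniform-deletion variant). The probabilities, the number of edges per new node, and the degree-plus-one attachment weighting are illustrative assumptions, not the dissertation's models.

```python
import random
import networkx as nx

def dynamic_pa_graph(steps, m=2, p_delete=0.2, seed=0):
    """Preferential attachment growth combined with uniform node deletion."""
    random.seed(seed)
    G = nx.complete_graph(m + 1)
    for _ in range(steps):
        if random.random() < p_delete and G.number_of_nodes() > m + 1:
            G.remove_node(random.choice(list(G.nodes)))   # uniform deletion
        else:
            nodes = list(G.nodes)
            weights = [G.degree(v) + 1 for v in nodes]     # attach proportional to degree + 1
            targets = set()
            while len(targets) < min(m, len(nodes)):
                targets.add(random.choices(nodes, weights=weights)[0])
            new = max(nodes) + 1
            G.add_node(new)
            G.add_edges_from((new, t) for t in targets)
    return G

# usage: the resulting degree sequence is heavy-tailed for suitable parameters
G = dynamic_pa_graph(2000)
print(G.number_of_nodes(), G.number_of_edges())
```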
To handle this difficulty we adopt an algorithmic definition of community and a simpler version of the community-mining problem, namely: find the largest community to which a given set of seed nodes belongs. We propose several greedy algorithms for this problem. The first proposed algorithm starts out with a set of seed nodes--the initial community--and then repeatedly selects nodes from the community's neighborhood and pulls them into the community. In each step, the algorithm uses the clustering coefficient--a parameter that measures the fraction of the neighbors of a node that are themselves neighbors--to decide which nodes from the neighborhood should be pulled into the community. This algorithm's time complexity is governed by the number of nodes visited by the algorithm and the maximum degree encountered; thus, assuming a power-law degree distribution, it is expected to run in near-linear time. The proposed algorithm achieved good accuracy when tested on some real and computer-generated networks: the fraction of community nodes classified correctly is generally above 80% and often above 90%. A second algorithm, based on a generalized clustering coefficient where not only the first neighborhood is taken into account but also the second, the third, etc., is also proposed. This algorithm achieves better accuracy than the first one but also runs slower. Finally, a randomized version of the second algorithm, which improves the time complexity without affecting the accuracy significantly, is proposed. The main target application of the proposed algorithms is focused crawling--the selective search for web pages that are relevant to a pre-defined topic.
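A minimal sketch of the first greedy algorithm described above follows: starting from the seed set, it repeatedly pulls in the frontier node with the highest clustering coefficient. The threshold-based stopping rule, the size cap, and the use of networkx are assumptions for illustration, not the dissertation's exact selection and termination criteria.

```python
import networkx as nx

def expand_community(G, seeds, threshold=0.3, max_size=500):
    """Greedy community expansion: repeatedly add the neighborhood node with
    the highest clustering coefficient until none exceeds the threshold."""
    community = set(seeds)
    while len(community) < max_size:
        frontier = {v for c in community for v in G.neighbors(c)} - community
        if not frontier:
            break
        cc = nx.clustering(G, frontier)   # clustering coefficients of frontier nodes
        best = max(frontier, key=cc.get)
        if cc[best] < threshold:
            break
        community.add(best)
    return community

# usage on a toy graph with two planted dense groups of 50 nodes each
G = nx.planted_partition_graph(2, 50, 0.3, 0.01, seed=1)
print(sorted(expand_community(G, seeds={0, 1}))[:10])
```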
|