Global ETD Search

161	A conditional view of causality Weinert, Friedel January 2007 No / Causal inference is perhaps the most important form of reasoning in the sciences. A panoply of disciplines, ranging from epidemiology to biology, from econometrics to physics, make use of probability and statistics to infer causal relationships. The social and health sciences analyse population-level data using statistical methods to infer average causal relations. In diagnosis of disease, probabilistic statements are based on population-level causal knowledge combined with knowledge of a particular person's symptoms. For the physical sciences, the Salmon-Dowe account develops an analysis of causation based on the notion of process and interaction. In artificial intelligence, the development of graphical methods has lent impetus to a probabilistic analysis of causality. The biological sciences use probabilistic methods to look for evolutionary causes of the state of a current species and to look for genetic causal factors. This variegated situation raises at least two fundamental philosophical issues: about the relation between causality and probability, and about the interpretation of probability in causal analysis. In this book we bring philosophers and scientists together to discuss the relation between causality and probability, and the applications of these concepts within the sciences. Causality ; Probability ; Causal inference ; Causal relationships
162	A logical typology for nominal compounds Modlin, Russell Garvin 10 January 2009 Semantic analysis of nominal compounds includes characterizing the semantic relationships implicit among the separate elements of nominal compounds. This thesis presents a typology for nominal compounds that classifies the binary relationships implicit in nominal compounds according to their status with regard to the logical properties of transitivity and symmetry. Employing theorems from modal logic, the categories of this logical typology assist in descriptively characterizing the semantics of nominal compounds by supporting inferences concerning the sharing of properties between objects related within nominal compounds. The individual categories of the logical typology are detailed as well as the types of inferences each category supports, but actual inferences about the sharing of properties between related objects cannot be made without a general knowledge data base containing information regarding the attributes of related objects. This thesis additionally describes an implemented computer system that classifies nominal compounds in the categories of the logical typology on the basis of syntactic information concerning the nominals and the taxonomic types of their referents. / Master of Science semantics inference logic LD5655.V855 1995.M635
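The two logical properties named in entry 162 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the system the thesis describes: the relations `part_of` and `adjacent_to` and the function names are hypothetical, and the thesis may define its categories differently. The code treats a binary relation as a set of ordered pairs and reports whether it is transitive and whether it is symmetric.

```python
from itertools import product


def is_symmetric(pairs):
    """True if (b, a) is present whenever (a, b) is."""
    return all((b, a) in pairs for a, b in pairs)


def is_transitive(pairs):
    """True if (a, d) is present whenever (a, b) and (b, d) are."""
    return all(
        (a, d) in pairs
        for (a, b), (c, d) in product(pairs, repeat=2)
        if b == c
    )


def classify(pairs):
    """Return the (transitivity, symmetry) status of a binary relation."""
    pairs = set(pairs)
    return (
        "transitive" if is_transitive(pairs) else "non-transitive",
        "symmetric" if is_symmetric(pairs) else "non-symmetric",
    )


# Hypothetical toy relations, not taken from the thesis.
part_of = {("finger", "hand"), ("hand", "arm"), ("finger", "arm")}
adjacent_to = {("kitchen", "hall"), ("hall", "kitchen")}

print(classify(part_of))
print(classify(adjacent_to))
```

A transitive relation such as the toy `part_of` would let a property pass along a chain of related objects, which is the kind of inference the abstract says each category supports.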
163	Reframing the reproducibility crisis: using an error-statistical account to inform the interpretation of replication results in psychological research Parker, Caitlin Grace 17 June 2015 Experimental psychology is said to be having a reproducibility crisis, marked by a low rate of successful replication. Researchers attempting to respond to the problem lack a framework for consistently interpreting the results of statistical tests, as well as standards for judging the outcomes of replication studies. In this paper I introduce an error-statistical framework for addressing these issues. I demonstrate how the severity requirement (and the associated severity construal of test results) can be used to avoid fallacious inferences that are complicit in the perpetuation of unreliable results. Researchers, I argue, must probe for error beyond the statistical level if they want to support substantive hypotheses. I then suggest how severity reasoning can be used to address standing questions about the interpretation of replication results. / Master of Arts statistics replicability reproducibility psychology scientific inference
164	Inferring Signal Transduction Pathways from Gene Expression Data using Prior Knowledge Aggarwal, Deepti 03 September 2015 Plants have developed specific responses to external stimuli such as drought, cold, high salinity in soil, and precipitation in addition to internal developmental stimuli. These stimuli trigger signal transduction pathways in plants, leading to cellular adaptation. A signal transduction pathway is a network of entities that interact with one another in response to a given stimulus. Such participating entities control and affect gene expression in response to the stimulus. For computational purposes, a signal transduction pathway is represented as a network where nodes are biological molecules. The interaction of two nodes is a directed edge. A plethora of research has been conducted to understand signal transduction pathways. However, there are a limited number of approaches to explore and integrate signal transduction pathways. Therefore, we need a platform to integrate and expand the information of each signal transduction pathway. One of the major computational challenges in inferring signal transduction pathways is that the addition of new nodes and edges can affect the information flow between existing ones in an unknown manner. Here, I develop the Beacon inference engine to address these computational challenges. This software engine employs a network inference approach to predict new edges. First, it uses mutual information and context likelihood of relatedness to predict edges from gene expression time-series data. Subsequently, it incorporates prior knowledge to limit false-positive predictions. Finally, a naive Bayes classifier is used to predict new edges. The Beacon inference engine predicts new edges with a recall of 77.6% and a precision of 81.4%. 24% of the total predicted edges are new, i.e., they are not present in the prior knowledge. / Master of Science Signal Transduction Pathways Gene Expression Inference Engine
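The first step described in entry 164, scoring candidate edges with mutual information and context likelihood of relatedness (CLR), can be sketched as follows. This is a minimal illustration and not the Beacon engine: the gene names and discretized expression levels are hypothetical, the z-score form of CLR used here is one common formulation, and the prior-knowledge and naive Bayes steps are left out.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def mutual_information(x, y):
    """Mutual information (in nats) of two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def clr_scores(mi):
    """CLR-style score for each gene pair from a symmetric MI matrix."""
    n = len(mi)
    # Background for gene i: mean and spread of its MI with all other genes.
    background = []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        background.append((mean(row), pstdev(row)))

    def z(i, j):
        mu, sigma = background[i]
        return max(0.0, (mi[i][j] - mu) / sigma) if sigma > 0 else 0.0

    return {
        (i, j): math.hypot(z(i, j), z(j, i))
        for i in range(n)
        for j in range(i + 1, n)
    }


# Hypothetical discretized expression levels (low=0, medium=1, high=2).
expr = {
    "geneA": [0, 0, 1, 1, 2, 2, 1, 0],
    "geneB": [0, 0, 1, 1, 2, 2, 1, 0],
    "geneC": [2, 1, 0, 2, 1, 0, 2, 1],
    "geneD": [1, 1, 1, 0, 0, 2, 2, 2],
}
genes = list(expr)
mi = [[mutual_information(expr[a], expr[b]) for b in genes] for a in genes]
scores = clr_scores(mi)
for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(genes[i], genes[j], round(s, 3))
```

Pairs whose score exceeds a chosen threshold would become candidate edges; in the workflow the abstract describes, prior knowledge is then used to limit false positives.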
165	Exploiting Update Leakage in Searchable Symmetric Encryption Haltiwanger, Jacob Sayid 15 March 2024 Dynamic Searchable Symmetric Encryption (DSSE) provides efficient techniques for securely searching and updating an encrypted database. However, efficient DSSE schemes leak some sensitive information to the server. Recent works have implemented forward and backward privacy as security properties to reduce the amount of information leaked during update operations. Many attacks have shown that leakage from search operations can be abused to compromise the privacy of client queries. However, the attack literature has not rigorously investigated techniques to abuse update leakage. In this work, we investigate update leakage under DSSE schemes with forward and backward privacy from the perspective of a passive adversary. We propose two attacks based on a maximum likelihood estimation approach, the UFID Attack and the UF Attack, which target forward-private DSSE schemes with no backward privacy and Level 2 backward privacy, respectively. These are the first attacks to show that it is possible to leverage the frequency and contents of updates to recover client queries. We propose a variant of each attack which allows the update leakage to be combined with search pattern leakage to achieve higher accuracy. We evaluate our attacks against a real-world dataset and show that using update leakage can improve the accuracy of attacks against DSSE schemes, especially those without backward privacy. / Master of Science / Remote data storage is a ubiquitous application made possible by the prevalence of cloud computing. Dynamic Searchable Symmetric Encryption (DSSE) is a privacy-preserving technique that allows a client to search and update a remote encrypted database while greatly restricting the information the server can learn about the client's data and queries. However, all efficient DSSE schemes have some information leakage that can allow an adversarial server to infringe upon the privacy of clients. Many prior works have studied the dangers of leakage caused by the search operation, but have neglected the leakage from update operations. As such, researchers have been unsure about whether update leakage poses a threat to user privacy. To address this research gap, we propose two new attacks which exploit leakage from DSSE update operations. Our attacks are aimed at learning what keywords a client is searching and updating, even in DSSE schemes with forward and backward privacy, two security properties implemented by the strongest DSSE schemes. Our UFID Attack compromises forward-private schemes while our UF Attack targets schemes with both forward privacy and Level 2 backward privacy. We evaluate our attacks on a real-world dataset and show that they efficiently compromise client query privacy under realistic conditions. Searchable Encryption Inference Analysis Database Privacy
166	Causal Network ANOVA and Tree Model Explainability Zhongli Jiang (18848698) 24 June 2024 In this dissertation, we present research results on two independent projects, one on analysis of variance of multiple causal networks and the other on feature-specific coefficients of determination in tree ensembles. Statistics not elsewhere classified Causal Inference interpretability
167	Development of a Performance Index for Stormwater Pipe Infrastructure using Fuzzy Inference Method Velayutham Kandasamy, Vivek Prasad 30 June 2017 Stormwater pipe infrastructure collects and conveys surface runoff resulting from rainfall or snowmelt to nearby streams. Traditionally, stormwater pipe systems were integrated with wastewater infrastructure through a combined sewer system. Many of these systems are being separated due to the impact of environmental laws and regulations, and the same factors have led to the creation of stormwater utilities. However, in the current ASCE Infrastructure Report Card, stormwater infrastructure is considered a sub-category of wastewater infrastructure. Stormwater infrastructure has always lacked attention compared to water and wastewater infrastructure. However, this notion has begun to shift, as aging stormwater pipes coupled with changes in climatic patterns and urban landscapes make stormwater infrastructure more complex to manage. These changes and the lack of needed maintenance have resulted in increased rates of deterioration and reduced capacity. Stormwater utility managers have limited resources and funds to manage their pipe system. To effectively make decisions on allocating limited resources and funds, a utility should be able to understand and assess the performance of its pipe system. There is no standard rating system or comprehensive list of performance parameters for stormwater pipe infrastructure. Previous research has identified performance parameters affecting stormwater pipes and developed a performance index using a weighted factor method. However, the weighted performance index model does not capture interdependencies between performance parameters. This research developed a comprehensive list of parameters affecting stormwater pipe performance. This research also developed a performance index using a fuzzy inference method to capture interdependencies among parameters. The performance index was evaluated and validated with the pipe ratings provided by one stormwater utility to document its effectiveness in real world conditions. / Master of Science / Stormwater pipe infrastructure collects and conveys the surface water resulting from rainfall or snowmelt to nearby streams. Traditionally, stormwater pipe systems were integrated with wastewater infrastructure through combined sewer systems. Environmental regulations forced the creation of stormwater utilities and separate stormwater systems; however, according to the ASCE Infrastructure Report Card, stormwater infrastructure has been considered a sub-category of wastewater infrastructure. Stormwater infrastructure has always lacked attention compared to water and wastewater infrastructure. However, this notion has to shift, as aging stormwater pipes coupled with changes in climatic patterns and urban landscapes make stormwater infrastructure complex to manage, resulting in an increased rate of deterioration and reduced design capacity. Stormwater utility managers have limited resources and funds to manage their pipe system. To effectively make decisions on allocating limited resources and funds, a utility should be able to understand and assess the performance of its pipe system. There is no standard rating system for assessing the condition of stormwater pipe infrastructure. This research developed an index using a fuzzy inference method to capture the interdependencies. The fuzzy inference method captures the interdependencies between parameters using if-then rule statements. Parameters are individual elements affecting the performance of stormwater pipes. The performance index was evaluated and validated with the pipe ratings provided by one stormwater utility to document its effectiveness in real world conditions. Stormwater Pipes Performance Index Fuzzy Inference System
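The if-then rule idea described in entry 167 can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the two parameters (pipe age and blockage fraction), the membership breakpoints, the four rules, and the rule outputs are hypothetical, and the weighted-average (zero-order Sugeno-style) defuzzification is a simple stand-in for whatever inference scheme the thesis actually used.

```python
def ramp_up(x, lo, hi):
    """Membership that is 0 below lo, 1 above hi, and linear in between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def performance_index(age_years, blockage_fraction):
    """Return a score in [0, 1], where 1 is best, from four if-then rules."""
    # Fuzzify the inputs (hypothetical breakpoints).
    old = ramp_up(age_years, 20.0, 80.0)
    new = 1.0 - old
    high = ramp_up(blockage_fraction, 0.1, 0.6)
    low = 1.0 - high

    # Each rule: (firing strength using min for AND, crisp output level).
    rules = [
        (min(new, low), 1.0),   # IF new AND low blockage THEN good
        (min(old, low), 0.6),   # IF old AND low blockage THEN fair
        (min(new, high), 0.5),  # IF new AND high blockage THEN fair
        (min(old, high), 0.1),  # IF old AND high blockage THEN poor
    ]

    # Defuzzify with a firing-strength-weighted average of rule outputs.
    total = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / total


print(performance_index(10, 0.05))
print(performance_index(50, 0.35))
print(performance_index(90, 0.70))
```

Because each rule fires on a combination of conditions, the joint effect of age and blockage is set rule by rule instead of being a fixed weighted sum, which is the interdependency the abstract says a weighted factor method does not capture.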
168	Generalization of prior information for rapid Bayesian time estimation Roach, N.W., McGraw, Paul V., Whitaker, David J., Heron, James 2016 December 1922 Yes / To enable effective interaction with the environment, the brain combines noisy sensory information with expectations based on prior experience. There is ample evidence showing that humans can learn statistical regularities in sensory input and exploit this knowledge to improve perceptual decisions and actions. However, fundamental questions remain regarding how priors are learned and how they generalize to different sensory and behavioral contexts. In principle, maintaining a large set of highly specific priors may be inefficient and restrict the speed at which expectations can be formed and updated in response to changes in the environment. However, priors formed by generalizing across varying contexts may not be accurate. Here, we exploit rapidly induced contextual biases in duration reproduction to reveal how these competing demands are resolved during the early stages of prior acquisition. We show that observers initially form a single prior by generalizing across duration distributions coupled with distinct sensory signals. In contrast, they form multiple priors if distributions are coupled with distinct motor outputs. Together, our findings suggest that rapid prior acquisition is facilitated by generalization across experiences of different sensory inputs but organized according to how that sensory information is acted on. Bayesian inference Time perception Sensorimotor learning
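The combination of noisy sensory information with prior expectations mentioned in entry 168 is often illustrated with Gaussian distributions, for which the posterior has a closed form. The sketch below assumes a Gaussian prior and a Gaussian measurement; the function name and the durations in milliseconds are hypothetical, and the model in the paper itself may differ.

```python
def gaussian_posterior(prior_mean, prior_sd, measurement, measurement_sd):
    """Posterior mean and SD for a Gaussian prior and Gaussian likelihood."""
    # Precisions (inverse variances) act as reliability weights.
    w_prior = 1.0 / prior_sd ** 2
    w_meas = 1.0 / measurement_sd ** 2
    mean = (w_prior * prior_mean + w_meas * measurement) / (w_prior + w_meas)
    sd = (w_prior + w_meas) ** -0.5
    return mean, sd


# Hypothetical values: a prior centered on 600 ms, a sensed duration of 800 ms.
estimate, uncertainty = gaussian_posterior(600.0, 100.0, 800.0, 100.0)
print(round(estimate, 1), round(uncertainty, 1))
```

With equal reliabilities the estimate lands midway between the prior mean and the measurement, which is the kind of contextual bias toward the prior that duration reproduction experiments look for.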
169	Statistical Learning for Sequential Unstructured Data Xu, Jingbin 30 July 2024 Unstructured data, which cannot be organized into predefined structures, such as texts, human behavior status, and system logs, is often presented in a sequential format with inherent dependencies. Probabilistic models are commonly used to capture these dependencies in the data generation process through latent parameters and can naturally extend into hierarchical forms. However, these models rely on the correct specification of assumptions about the sequential data generation process, which often limits their scalable learning abilities. The emergence of neural network tools has enabled scalable learning for high-dimensional sequential data. From an algorithmic perspective, efforts are directed towards reducing dimensionality and representing unstructured data units as dense vectors in low-dimensional spaces, learned from unlabeled data, a practice often referred to as numerical embedding. While these representations offer measures of similarity, automated generalizations, and semantic understanding, they frequently lack the statistical foundations required for explicit inference. This dissertation aims to develop statistical inference techniques tailored for the analysis of unstructured sequential data, with their application in the field of transportation safety. The first part of the dissertation presents a two-stage method. It adopts numerical embedding to map large-scale unannotated data into numerical vectors. Subsequently, a kernel test using maximum mean discrepancy is employed to detect abnormal segments within a given time period. Theoretical results showed that learning from numerical vectors is equivalent to learning directly through the raw data. A real-world example illustrates how a driver's mismatched visual behavior occurred during a lane change. The second part of the dissertation introduces a two-sample test for comparing text generation similarity. The hypothesis tested is whether the probabilistic mapping measures that generate textual data are identical for two groups of documents. The proposed test compares the likelihood of text documents, estimated through neural network-based language models under the autoregressive setup. The test statistic is derived from an estimation and inference framework that first approximates data likelihood with an estimation set before performing inference on the remaining part. The theoretical result indicates that the test statistic's asymptotic behavior approximates a normal distribution under mild conditions. Additionally, a multiple data-splitting strategy is utilized, combining p-values into a unified decision to enhance the test's power. The third part of the dissertation develops a method to measure differences in text generation between a benchmark dataset and a comparison dataset, focusing on word-level generation variations. This method uses the sliced-Wasserstein distance to compute the contextual discrepancy score. A resampling method establishes a threshold to screen the scores. Crash report narratives are analyzed to compare crashes involving vehicles equipped with level 2 advanced driver assistance systems and those involving human drivers. / Doctor of Philosophy / Unstructured data, such as texts, human behavior records, and system logs, cannot be neatly organized. This type of data often appears in sequences with natural connections. Traditional methods use models to understand these connections, but these models depend on specific assumptions, which can limit their effectiveness. New tools using neural networks have made it easier to work with large and complex data. These tools help simplify data by turning it into smaller, manageable pieces, a process known as numerical embedding. While this helps in understanding the data better, it often still requires a statistical foundation for the subsequent inferential analysis. This dissertation aims to develop statistical inference techniques for analyzing unstructured sequential data, focusing on transportation safety. The first part of the dissertation introduces a two-step method. First, it transforms large-scale unorganized data into numerical vectors. Then, it uses a statistical test to detect unusual patterns over a period. For example, it can identify when a driver's visual behavior does not properly align with the driving attention demand during lane changes. The second part of the dissertation presents a method to compare the similarity of text generation. It tests whether the way texts are generated is the same for two groups of documents. This method uses neural network-based models to estimate the likelihood of text documents. Theoretical results show that as more data are observed, the distribution of the test statistic will get closer to the desired distribution under certain conditions. Additionally, combining multiple data splits improves the test's power. The third part of the dissertation constructs a score to measure differences in text generation processes, focusing on word-level differences. This score is based on a specific distance measure. To check that the difference is not a false discovery, a screening threshold is established using a resampling technique. If the score exceeds the threshold, the difference is considered significant. An application of this method compares crash reports from vehicles with advanced driver assistance systems to those from human-driven vehicles. Statistical Inference Text Mining Neural Networks
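The second stage of the method in entry 169, a kernel test based on maximum mean discrepancy (MMD) applied to embedded vectors, can be sketched as below. This is an illustration under stated assumptions and not the dissertation's procedure: the two-dimensional "embeddings" are synthetic, the Gaussian kernel and its bandwidth are arbitrary choices, and the permutation calibration is a generic substitute for however the dissertation sets its threshold.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def avg_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return avg_kernel(xs, xs) + avg_kernel(ys, ys) - 2.0 * avg_kernel(xs, ys)


def permutation_p_value(xs, ys, n_perm=200, seed=0, bandwidth=1.0):
    """P-value for H0 'same distribution', calibrated by shuffling labels."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], bandwidth) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for embedded segments: a reference window and a
# shifted window that plays the role of an abnormal segment.
data_rng = random.Random(1)
reference = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
shifted = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(20)]

p_value = permutation_p_value(reference, shifted)
print(p_value)
```

A small p-value flags the second window as differing in distribution from the reference window; in practice the vectors would come from the embedding stage instead of a random number generator.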
170	Inferring hidden features in the Internet Gürsun, Gonca 23 July 2024 The Internet is a large-scale decentralized system that is composed of thousands of independent networks. In this system, there are two main components, interdomain routing and traffic, that are vital inputs for many tasks such as traffic engineering, security, and business intelligence. However, due to the decentralized structure of the Internet, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending the knowledge of the interdomain-level Internet. In the first part of this dissertation we investigate the relationship between the interdomain topology and an individual network's inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows. In the second part, we study the ability of networks to identify which paths flow through their network. We first discuss that answering this question is surprisingly hard due to the design of interdomain routing systems where each network can learn only a limited set of routes. Therefore, network operators have to rely on observed traffic. However, observed traffic can only identify that a particular route passes through its network but not that a route does not pass through its network. In order to solve the routing inference problem, we propose a nonparametric inference technique that works quite accurately. The key idea behind our technique is measuring the distances between destinations. In order to accomplish that, we define a metric called Routing State Distance (RSD) to measure distances in terms of routing similarity. Finally, in the third part, we study our new metric, RSD, in detail. Using RSD, we address an important and difficult problem of characterizing the set of paths between networks. The collection of the paths across networks is a rich source for understanding important phenomena in the Internet, as path selections are driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that can discover these hidden phenomena. Computer science Computer networks Statistical inference
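The idea in entry 170 of measuring distance between destinations by routing similarity can be sketched as follows. The abstract does not give the definition of RSD, so this sketch assumes one plausible reading: the distance between two destinations is the number of networks whose next hop toward one differs from their next hop toward the other. The network names, prefixes, and next-hop table are hypothetical.

```python
def routing_state_distance(next_hop, dest_a, dest_b):
    """Count the networks whose next hop differs between two destinations.

    `next_hop` maps each network to a dict of destination -> next hop.
    This definition is an assumption made for illustration.
    """
    return sum(
        1
        for hops in next_hop.values()
        if hops.get(dest_a) != hops.get(dest_b)
    )


# Hypothetical routing state: three networks, three destination prefixes.
next_hop = {
    "AS1": {"p1": "AS2", "p2": "AS2", "p3": "AS3"},
    "AS2": {"p1": "AS4", "p2": "AS4", "p3": "AS4"},
    "AS3": {"p1": "AS4", "p2": "AS5", "p3": "AS5"},
}

print(routing_state_distance(next_hop, "p1", "p2"))
print(routing_state_distance(next_hop, "p1", "p3"))
print(routing_state_distance(next_hop, "p2", "p3"))
```

Destinations that are close under such a measure are routed alike by most networks, so a path observed for one is evidence about the path for the other.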
