21

Category-theoretic quantitative compositional distributional models of natural language semantics

Grefenstette, Edward Thomas January 2013 (has links)
This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produce distributional representations for larger units of text (such as a verb and its arguments) by composing the distributional representations of smaller units of text (such as individual words). This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, other competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on both the syntactic and semantic fronts, incorporating a number of syntactic analysis formalisms and providing learning procedures allowing for the generation of concrete compositional distributional models. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, which also suggest directions for future research.
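As an illustration of the kind of composition such categorical models license — a minimal numpy sketch with assumed toy dimensions, not the thesis's implementation — a transitive verb can be represented as a tensor and contracted with the distributional vectors of its subject and object to yield a sentence vector:

```python
import numpy as np

# Illustrative (hypothetical) dimensions: noun space N and sentence space S.
N, S = 4, 3
rng = np.random.default_rng(0)

subject = rng.random(N)          # distributional vector for the subject noun
obj = rng.random(N)              # distributional vector for the object noun
verb = rng.random((N, S, N))     # transitive verb as a tensor in N (x) S (x) N

# Syntax-driven composition: contract the verb tensor with its arguments,
# producing a meaning vector in the sentence space S.
sentence = np.einsum('i,isj,j->s', subject, verb, obj)
print(sentence.shape)  # (3,)
```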
22

Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data

Zhong, Jianling January 2015 (has links)
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration. / Dissertation
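A heavily simplified sketch of the kind of procedure described above — random-walk Metropolis moves maximizing a correlation-based objective — might look as follows. The thermodynamic model here is a hypothetical stand-in (Boltzmann weights over candidate binding positions), not the dissertation's actual model:

```python
import numpy as np

def predicted_occupancy(energies, positions):
    # Toy stand-in for the thermodynamic model: Boltzmann weights over
    # candidate binding positions (the real model is far richer).
    w = np.exp(-energies[positions])
    return w / w.sum()

def correlation(pred, observed):
    return np.corrcoef(pred, observed)[0, 1]

def metropolis_maximize(observed, positions, n_iter=5000, step=0.1, temp=0.05):
    rng = np.random.default_rng(1)
    energies = rng.normal(size=positions.max() + 1)
    score = correlation(predicted_occupancy(energies, positions), observed)
    for _ in range(n_iter):
        proposal = energies + step * rng.normal(size=energies.shape)
        new = correlation(predicted_occupancy(proposal, positions), observed)
        # Accept moves that raise the correlation; occasionally accept worse
        # ones to escape local optima (simulated-annealing flavour).
        if new > score or rng.random() < np.exp((new - score) / temp):
            energies, score = proposal, new
    return energies, score
```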
23

Embedding population dynamics in mark-recapture models

Bishop, Jonathan R. B. January 2009 (has links)
Mark-recapture methods use repeated captures of individually identifiable animals to provide estimates of properties of populations. Different models allow estimates to be obtained for population size and rates of processes governing population dynamics. State-space models consist of two linked processes evolving simultaneously over time. The state process models the evolution of the true, but unknown, states of the population. The observation process relates observations on the population to these true states. Mark-recapture models specified within a state-space framework allow population dynamics models to be embedded in inference, ensuring that estimated changes in the population are consistent with assumptions regarding the biology of the modelled population. This overcomes a limitation of current mark-recapture methods. Two alternative approaches are considered. The "conditional" approach conditions on the known numbers of animals possessing capture history patterns that include capture in the current time period. An animal's capture history determines its state; consequently, capture parameters appear in the state process rather than the observation process. There is no observation error in the model. Uncertainty occurs only through the numbers of animals not captured in the current time period. An "unconditional" approach is considered in which the capture histories are regarded as observations. Consequently, capture histories do not influence an animal's state, and capture probability parameters appear in the observation process. Capture histories are considered a random realization of the stochastic observation process, which is more consistent with traditional mark-recapture methods. Development and implementation of particle filtering techniques for fitting these models under each approach are discussed. Simulation studies show reasonable performance for the unconditional approach and highlight problems with the conditional approach. Strengths and limitations of each approach are outlined, with reference to Soay sheep data analysis, and suggestions are presented for future analyses.
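For readers unfamiliar with the machinery, a bootstrap particle filter for a toy mark-recapture state-space model might be sketched as follows. The population dynamics and parameter values are illustrative assumptions, not those of the thesis:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)

def particle_filter(captures, n_particles=1000, phi=0.8, births=20.0, p=0.3):
    """Bootstrap particle filter for a toy mark-recapture state-space model.

    State process:       N_t = Binomial(N_{t-1}, phi) + Poisson(births)
    Observation process: y_t ~ Binomial(N_t, p)  (animals captured at time t)
    """
    N = rng.integers(50, 150, size=n_particles)  # particles for population size
    estimates = []
    for y in captures:
        # Propagate each particle through the population-dynamics state process.
        N = rng.binomial(N, phi) + rng.poisson(births, size=n_particles)
        # Weight particles by the observation likelihood of the capture count.
        w = binom.pmf(y, N, p) + 1e-300
        w /= w.sum()
        # Resample to curb weight degeneracy, then record the filtered mean.
        N = N[rng.choice(n_particles, size=n_particles, p=w)]
        estimates.append(N.mean())
    return estimates
```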
24

Statistical Fault Detection with Applications to IMU Disturbances

Törnqvist, David January 2006 (has links)
This thesis deals with the problem of detecting faults in an environment where the measurements are affected by additive noise. To do this, a residual sensitive to faults is derived and statistical methods are used to distinguish faults from noise. Standard methods for fault detection compare a batch of data with a model of the system using the generalized likelihood ratio. Careful treatment of the initial state of the model is quite important, in particular for short batch sizes. One method to handle this is the parity-space method, which solves the problem by removing the influence of the initial state using a projection.

In this thesis, the case where prior knowledge about the initial state is available is treated. This knowledge can be obtained, for example, from a Kalman filter. Combining the prior estimate with a minimum variance estimate from the data batch results in a smoothed estimate, and the influence of the estimated initial state is then removed. It is also shown that removing the influence of the initial state by an estimate from the data batch alone recovers the parity-space method. To model slowly changing faults, an efficient parameterization using Chebyshev polynomials is given.

The methods described above have been applied to an inertial measurement unit, IMU. The IMU usually consists of accelerometers and gyroscopes, but has in this work been extended with a magnetometer. Traditionally, the IMU has been used to estimate the position and orientation of airplanes, missiles, etc. Recently, the size and cost have decreased, making it possible to use IMUs for applications such as augmented reality and body motion analysis. Since a magnetometer is very sensitive to disturbances from metal, such disturbances have to be detected; detection makes compensation possible. Another topic covered is the fundamental question of observability for fault inputs: given a fixed or linearly growing fault, conditions for observability are given.

The measurements from the IMU show that the noise distribution of the sensors can be well approximated by white Gaussian noise. This gives good correspondence between practical and theoretical results when the sensor is kept at rest. The disturbances for the IMU can be approximated by smooth functions of time, so low-rank parameterizations can be used to describe them. The results show that using smoothing to obtain the initial state estimate and parameterizing the disturbances improves the detection performance drastically.
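A minimal sketch of the parity-space idea described above, for a generic linear system; the noise whitening and fault parameterization treated in the thesis are omitted, and the chi-square alarm assumes unit-variance white measurement noise:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.stats import chi2

def parity_residual(A, C, ys):
    """Parity-space residual for a batch of L outputs of x_{k+1} = A x_k,
    y_k = C x_k + noise. The stacked batch satisfies Y = O x0 + noise, so
    projecting Y onto the left null space of the extended observability
    matrix O removes the influence of the unknown initial state x0."""
    L = len(ys)
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
    W = null_space(O.T)              # columns span the left null space of O
    return W.T @ np.concatenate(ys)  # zero-mean when no fault is present

def alarm(r, level=0.99):
    # GLR-flavoured test: fire when the residual energy exceeds a
    # chi-square quantile (valid under unit-variance white noise).
    return r @ r > chi2.ppf(level, df=len(r))
```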
26

Recursive Residuals and Model Diagnostics for Normal and Non-Normal State Space Models

Frühwirth-Schnatter, Sylvia January 1994 (has links) (PDF)
Model diagnostics for normal and non-normal state space models are based on recursive residuals, which are defined from the one-step-ahead predictive distribution. Routine calculation of these residuals is discussed in detail. Various diagnostic tools are suggested to check, for example, for misspecified observation distributions and for autocorrelation. The paper also covers model diagnostics for discrete time series, model diagnostics for generalized linear models, and model discrimination via Bayes factors. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
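A minimal sketch of recursive residuals for the simplest normal state space model, the local-level model — an assumed illustrative case, not the paper's general setting:

```python
import numpy as np

def recursive_residuals(y, q=0.1, r=1.0):
    """Standardized one-step-ahead prediction residuals for a local-level
    model x_t = x_{t-1} + w_t, y_t = x_t + v_t, with Var(w)=q, Var(v)=r."""
    m, P = y[0], 1.0                   # filtered mean and variance
    out = []
    for yt in y[1:]:
        P_pred = P + q                 # predict the state variance
        S = P_pred + r                 # one-step-ahead predictive variance
        e = yt - m                     # innovation: the recursive residual
        out.append(e / np.sqrt(S))     # ~ N(0, 1) i.i.d. if the model is right
        K = P_pred / S                 # Kalman gain
        m = m + K * e                  # filtered mean update
        P = (1 - K) * P_pred           # filtered variance update
    return np.array(out)
```

If the model is correctly specified, these standardized residuals should be approximately i.i.d. standard normal, so departures in distribution or autocorrelation flag the kinds of misspecification the paper discusses.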
27

Estimating The Neutral Real Interest Rate For Turkey By Using An Unobserved Components Model

Ogunc, Fethi 01 July 2006 (has links) (PDF)
In this study, the neutral real interest rate gap and the output gap are estimated jointly under two different multivariate unobserved components models, with the motivation of providing empirical measures that can be used to analyze the amount of stimulus that monetary policy is passing on to the economy and to understand historical macroeconomic developments. In the analyses, the Kalman filter is applied to a small-scale macroeconomic model of the Turkish economy to estimate the unobserved variables for the period 1989-2005. Two alternative specifications for the neutral real interest rate are used: the first model treats the neutral real interest rate as a random walk, whereas the second employs a more structural specification, which links the neutral real rate to the trend growth rate and the long-term course of the risk premium. Comparison of the models using various performance criteria clearly favours the structural specification over the random-walk specification. Results suggest that although the uncertainty surrounding the neutral real interest rate estimates is too high for them to be used directly in the policy-making process, the estimates are very useful for ex-post evaluations of monetary policy.
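Illustratively, the two specifications could be written along the following lines — a hypothetical rendering based on the abstract's description, not the thesis's exact equations:

```latex
% Specification 1: the neutral real rate r*_t as a pure random walk
r^{*}_{t} = r^{*}_{t-1} + \varepsilon_{t}

% Specification 2: a structural link to trend growth g_t and a slowly
% moving risk premium \rho_t (itself modelled as a random walk)
r^{*}_{t} = c\, g_{t} + \rho_{t}, \qquad \rho_{t} = \rho_{t-1} + \eta_{t}
```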
28

Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Gorrell, Genevieve January 2006 (has links)
The current surge of interest in search and comparison tasks in natural language processing has brought with it a focus on vector space approaches and vector space dimensionality reduction techniques. Presenting data as points in hyperspace provides opportunities to use a variety of well-developed tools pertinent to this representation. Dimensionality reduction allows data to be compressed and generalised. Eigen decomposition and related algorithms are one category of approaches to dimensionality reduction, providing a principled way to reduce data dimensionality that has time and again shown itself capable of enabling access to powerful generalisations in the data. Issues with the approach, however, include computational complexity and limitations on the size of dataset that can reasonably be processed in this way. Large datasets are a persistent feature of natural language processing tasks. This thesis focuses on two main questions. Firstly, in what ways can eigen decomposition and related techniques be extended to larger datasets? Secondly, this having been achieved, of what value is the resulting approach to information retrieval and to statistical language modelling at the n-gram level? The applicability of eigen decomposition is shown to be extendable through the use of an extant algorithm, the Generalized Hebbian Algorithm (GHA), and a novel extension of this algorithm to paired data, the Asymmetric Generalized Hebbian Algorithm (AGHA). Several original extensions to these algorithms are also presented, improving their applicability in various domains. The applicability of GHA to Latent Semantic Analysis-style tasks is investigated. Finally, AGHA is used to investigate the value of singular value decomposition, an eigen decomposition variant, to n-gram language modelling. A sizeable perplexity reduction is demonstrated.
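For reference, the core GHA update (Sanger's rule) can be sketched in a few lines — an illustrative implementation, not the thesis code:

```python
import numpy as np

def gha(X, k=2, eta=0.01, epochs=10):
    """Generalized Hebbian Algorithm (Sanger's rule): streams data vectors
    and converges on the top-k eigenvectors of the data covariance without
    ever forming the covariance matrix -- the property that makes it
    attractive for large natural language datasets."""
    n = X.shape[1]
    rng = np.random.default_rng(3)
    W = rng.normal(scale=0.1, size=(k, n))  # rows -> estimated eigenvectors
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # Sanger's update: Hebbian term minus deflation by earlier outputs,
            # which forces successive rows toward successive eigenvectors.
            W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```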
29

System Surveillance

Mansoor, Shaheer January 2013 (has links)
In recent years, trade activity in stock markets has increased substantially. This is mainly attributed to the development of powerful computers and intranets connecting traders to markets across the globe. Trades have to be carried out almost instantaneously, and the systems that handle them are burdened with millions of transactions a day, several thousand a minute. With increasing transactions, the time to execute a single trade increases, which can be seen as an impact on performance. There is a need to model the performance of these systems and provide forecasts that give advance warning of when a system is expected to be overwhelmed by transactions. This was done in this study, in cooperation with Cinnober Financial Technologies, a firm which provides trading solutions to stock markets. To ensure that the models developed weren't biased, the dataset was cleansed: operational and other non-trade transactions were removed, so that only valid trade transactions remained. For this purpose, a descriptive analysis of the time series along with change point detection and LOESS regression were used. A state space model with Kalman filtering was further used to develop a time-varying coefficient model for the performance, and this model was applied to make forecasts. Wavelets were also used to produce forecasts, and in addition high-pass filters were used to identify regions of low performance. The state space model performed very well in capturing the overall trend in performance and produced reliable forecasts, which can be ascribed to the Kalman filter's ability to handle noisy data well. Wavelets, on the other hand, didn't produce reliable forecasts but were more efficient in detecting regions of low performance.
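A generic sketch of the model class described above — a regression whose coefficients drift as random walks, estimated by Kalman filtering. The specification and parameters are illustrative assumptions, not those of the study:

```python
import numpy as np

def tvc_filter(y, X, q=1e-4, r=1.0):
    """Kalman filter for a time-varying coefficient regression:
    beta_t = beta_{t-1} + w_t,  y_t = x_t' beta_t + v_t."""
    n = X.shape[1]
    beta = np.zeros(n)                  # filtered coefficient estimates
    P = np.eye(n)                       # coefficient covariance
    betas = []
    for t in range(len(y)):
        P = P + q * np.eye(n)           # coefficients drift as random walks
        x = X[t]
        S = x @ P @ x + r               # innovation variance
        K = P @ x / S                   # Kalman gain
        beta = beta + K * (y[t] - x @ beta)
        P = P - np.outer(K, x) @ P
        betas.append(beta.copy())
    return np.array(betas)

# One-step-ahead forecast of the next observation: X[t+1] @ betas[-1].
```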
30

Portfolio of original compositions

Soria Luz, Rosalia January 2016 (has links)
This portfolio of compositions investigates the adaptation of state-space models, frequently used in engineering control theory, to the electroacoustic composition context. These models are mathematical descriptions of physical systems that provide several variables representing the system's behaviours. The composer adapts a set of state-space models of abstract, mechanical or electrical systems to a music creation environment. She uses them in eight compositions: five mixed media multi-channel pieces and three mixed media pieces. In the portfolio, the composer investigates multiple ways of meaningfully mapping these systems' behaviours onto musical parameters, either by exploring and creating timbre in synthetic sound or by transforming existing sounds. The research also involves incorporating state-space models into a real-time software tool using Max and SuperCollider. As the real-time models offer several continuously evolving variables, the composer mapped them to different dimensions of sound simultaneously, representing the models' evolutions with short/interrupted, long or indefinitely evolving sounds. The evolution implies changes in timbre, length and dynamic range. The composer creates gestures, textures and spaces based on the models' behaviours, and explores how the models' nature influences the musical language and the integration of these materials with other music sources such as recordings or musical instruments. As the models represent physical processes, the composer observes that the resulting sounds evolve in organic ways. Moreover, the composer not only sonifies the real-time models but actually excites them to cause changes, developing a compositional methodology that involves interacting with the models while observing/designing changes in sound. In that sense, the composer regards real-time state-space models as her own instruments for creating music. The models are regarded as additional forces and as sound-transforming agents in mixed media pieces. In fixed media pieces, the composer additionally exploits their linearity to create space through sound de-correlation.
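As a rough illustration of the approach — a hypothetical Python analogue of the Max/SuperCollider tools, with an invented mapping from state to sound parameters — a linear state-space system can be iterated in real time and its continuously evolving states routed to pitch and dynamics:

```python
import numpy as np

def simulate(A, x0, steps):
    """Iterate a discrete linear state-space system x_{t+1} = A x_t."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append(A @ xs[-1])
    return np.array(xs)

# A lightly damped oscillator: its two states trace slowly decaying sinusoids,
# the kind of organic evolution the portfolio maps onto sound.
theta, damping = 0.12, 0.995
A = damping * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
states = simulate(A, np.array([1.0, 0.0]), steps=400)

# Hypothetical mapping of the system's behaviour onto sound parameters:
freq = 220.0 + 440.0 * np.abs(states[:, 0])  # state 1 -> pitch contour (Hz)
amp = np.clip(np.abs(states[:, 1]), 0, 1)    # state 2 -> dynamic envelope
```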
