Global ETD Search

311	Statistical methods for variant discovery and functional genomic analysis using next-generation sequencing data Tang, Man 03 January 2020 The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data, allowing the identification of biomarkers in early disease diagnosis and driving the transformation of most disciplines in biology and medicine. Greater effort is needed to develop novel, powerful, and efficient tools for NGS data analysis. This dissertation focuses on modeling "omics" data in various NGS applications, with the primary goal of developing novel statistical methods to identify sequence variants, find transcription factor (TF) binding patterns, and decode the relationship between TF and gene expression levels. Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in NGS applications. Existing methods for calling these variants often make the simplifying assumption of positional independence and fail to leverage the dependence of genotypes at nearby loci induced by linkage disequilibrium. We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. Simulation experiments show that, under various sequencing depths, vi-HMM outperforms existing methods in terms of sensitivity and F1 score. When applied to human whole-genome sequencing data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs. One important NGS application is chromatin immunoprecipitation followed by sequencing (ChIP-seq), which characterizes protein-DNA relations through genome-wide mapping of TF binding sites. Multiple TFs binding to DNA sequences often show complex binding patterns, which indicate how TFs with similar functionalities work together to regulate the expression of target genes. 
To help uncover the transcriptional regulation mechanism, we propose a novel nonparametric Bayesian method to detect the clustering pattern of multiple-TF bindings from ChIP-seq datasets. A simulation study demonstrates that our method performs best with regard to precision, recall, and F1 score in comparison to traditional methods. We also apply the method to real data and observe several TF clusters that have been recognized previously in mouse embryonic stem cells. Recent advances in ChIP-seq and RNA sequencing (RNA-Seq) technologies provide more reliable and accurate characterization of TF binding sites and gene expression measurements, which serves as a basis for studying the regulatory functions of TFs on gene expression. We propose a log Gaussian Cox process with a wavelet-based functional model to quantify the relationship between TF binding site locations and gene expression levels. Through a simulation study, we demonstrate that our method performs well, especially with a large sample size and small variance. It also shows a remarkable ability to distinguish real local features in the function estimates. / Doctor of Philosophy / The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data and brings about innovations in biology and medicine. Greater effort is needed to develop novel, powerful, and efficient tools for NGS data analysis. In this dissertation, we mainly focus on three problems closely related to NGS and its applications: (1) how to improve variant calling accuracy, (2) how to model transcription factor (TF) binding patterns, and (3) how to quantify the contribution of TF binding to gene expression. We develop novel statistical methods to identify sequence variants, find TF binding patterns, and explore the relationship between TF binding and gene expression. 
We expect our findings will be helpful in promoting a better understanding of disease causality and facilitating the design of personalized treatments. Keywords: next-generation sequencing; hidden Markov model; variant calling; transcription factor; nonparametric Bayesian; log Gaussian Cox process; Dirichlet process mixture; gene expression; wavelet-based functional model
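Illustrative note for entry 311: the abstract describes calling variants with a hidden Markov model so that neighbouring loci inform one another. The sketch below is not vi-HMM. It shows only the standard Viterbi decoding step that HMM-based callers typically build on, using a hypothetical two-state model ("ref" and "var") whose states, probabilities and observation symbols are all assumptions made for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM (log-space)."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model (not taken from the dissertation)
states = ["ref", "var"]
start_p = {"ref": 0.9, "var": 0.1}
trans_p = {"ref": {"ref": 0.9, "var": 0.1}, "var": {"ref": 0.3, "var": 0.7}}
emit_p = {"ref": {"match": 0.95, "mismatch": 0.05},
          "var": {"match": 0.2, "mismatch": 0.8}}
obs = ["match", "match", "mismatch", "mismatch", "match"]

path = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)  # ['ref', 'ref', 'var', 'var', 'ref']
```

The transition matrix is what lets adjacent positions share evidence: two consecutive mismatches are decoded as one variant stretch rather than as two independent errors.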
312	Deep Learning Empowered Unsupervised Contextual Information Extraction and its applications in Communication Systems Gusain, Kunal 16 January 2023 Master of Science / There has been an astronomical increase in data at the network edge due to the rapid development of 5G infrastructure and the proliferation of the Internet of Things (IoT). In order to improve the network controller's decision-making capabilities and the user experience, it is of paramount importance to properly analyze this data. However, transporting such a large amount of data from edge devices to the network controller requires large bandwidth and increases latency, presenting a significant challenge to resource-constrained wireless networks. By using information processing techniques, one could effectively address this problem by sending only pertinent and critical information to the network controller. Nevertheless, finding critical information in high-dimensional observations is not an easy task, especially when large amounts of background information are present. Our thesis proposes to extract critical but low-dimensional information from high-dimensional observations using an information-theoretic deep learning framework. We focus on two distinct problems where critical information extraction is imperative. In the first problem, we study feature extraction from video frames collected in a dynamic environment and showcase its effectiveness using a video game simulation experiment. In the second problem, we investigate the detection of anomaly signals in the spectrum by extracting and analyzing useful features from spectrograms. Using extensive simulation experiments based on a practical data set, we conclude that our proposed approach is highly effective in detecting anomaly signals over a wide range of signal-to-noise ratios. 
Keywords: Information Extraction; Autoencoder; Convolutional Neural Networks; Hidden Markov Model; Multi-Modal Information; H-Score; Anomaly Detection; One Class SVM; Isolation Forest
313	[en] GOAL-BASED INVESTMENTS: A DYNAMIC STOCHASTIC PROGRAMMING APPROACH / [pt] POLÍTICA DE INVESTIMENTO ORIENTADA A OBJETIVO DE LONGO PRAZO ANDRE FREDERICO MACIEL GUTIERREZ 13 June 2024 [en] The aim of this study is to develop an investment policy that minimizes the total contribution required to achieve a long-term financial objective. To achieve this goal, we developed a multi-stage optimization problem that integrates a Hidden Markov Model to capture the stochastic dynamics of asset returns. Unlike conventional portfolio optimization models, which are based on unrealistic assumptions, our approach is based on the goal-oriented investment framework, which provides a more practical and effective solution. In addition, by using the Hidden Markov Model in our optimization process, we obtain a more accurate estimate of the dynamics of asset returns, which translates into better investment decision-making. 
By using our model, the contribution required to achieve a desired financial goal is minimized through an investment policy that considers current levels of wealth and prevailing economic conditions. Keywords: SIMULATION; GOAL ORIENTED INVESTMENT; HIDDEN MARKOV MODEL; PENSION; LINEAR OPTIMIZATION
314	Predictive maintenance using the classification of time series Siddik, Md Abu Bakar January 2024 In today's industrial landscape, the pursuit of operational excellence has driven organizations to seek innovative approaches to ensure the uninterrupted functionality of machinery and equipment. Predictive maintenance (PM) provides a pivotal strategy to achieve this goal by detecting faults earlier and predicting maintenance before the system enters a critical state. This thesis proposes a fault detection and diagnosis (FDD) method for predictive maintenance using particle filter resampling and a particle tracking technique. To develop this FDD method, the efficiency of the particle filter and the hidden Markov model in forecasting system state variables is studied on a hydraulic wind power transfer system with different noise levels and system faults. Furthermore, a particle tracker is developed to analyze the particle filter's resampling process and study the particle selection process. The proposed FDD method is then developed and validated through three simulation tests employing system degradation models. Finally, the system's remaining useful life (RUL) is estimated for those simulation tests. Keywords: Predictive maintenance; PM; Particle Filter; Hidden Markov model; RUL; Fault detection; Time series methods; Bootstrap particle filter; Metropolis-Hastings algorithm; PF; HMM. Subject: Control Engineering
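Illustrative note for entry 314: the abstract centres on the particle filter and its resampling step. The following is a minimal sketch of a generic bootstrap particle filter, not the thesis's FDD method. The scalar random-walk state model, the Gaussian observation noise and all parameter values are assumptions chosen only to make the propagate, weight and resample cycle visible.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.5, obs_std=1.0, seed=0):
    """Estimate a 1-D random-walk state from noisy observations."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate every particle through the assumed random-walk model
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # 2. Weight each particle by the Gaussian likelihood of the observation
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        # 3. Record the weighted-mean state estimate
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling: draw particles in proportion to weight
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Hypothetical usage: a sensor that keeps reading 5.0
readings = [5.0] * 30
estimates = bootstrap_particle_filter(readings)
print(round(estimates[0], 2), round(estimates[-1], 2))
```

Step 4 is the part a particle tracker would inspect: it determines which particles survive into the next time step and which are discarded.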
315	Word Classes in Language Modelling Erikson, Emrik, Åström, Marcus January 2024 This thesis concerns itself with word classes and their application to language modelling. Considering a purely statistical Markov model trained on sequences of word classes in the Swedish language, different problems in language engineering are examined. The problems considered are part-of-speech tagging, evaluating text modifiers such as translators with the help of probability measurements and matrix norms, and lastly detecting different types of text using the Fourier transform of cross entropy sequences of word classes. The results show that the word class language model is quite weak by itself but that it is able to improve part-of-speech tagging for 1- and 2-letter models. There are indications that a stronger word class model could aid 3-letter and potentially even stronger models. For evaluating modifiers, the model is often able to distinguish shuffled, and sometimes translated, text, as well as to assign a score for how much a text has been modified. Future work on this should, however, take better care to ensure large enough test data. The results from the Fourier approach indicate that a Fourier analysis of the cross entropy sequence between word classes may allow the model to distinguish AI-generated text, as well as translated text, from human-written text. Future work on machine learning word class models could be carried out to get further insights into the role of word class models in modern applications. The results could also give interesting insights in linguistic research regarding word classes. Keywords: Word class; Language Model; POS-tagging; n-gram; Markov Model; Transition Matrix; Matrix norm; Cross Entropy; Discrete Fourier Transform. Subject: Mathematics
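Illustrative note for entry 315: the abstract refers to a Markov model over word-class sequences, its transition matrix and cross entropy. The sketch below shows one plausible way such quantities could be computed. The tag set, the training sequences and the add-one smoothing are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

def train_transitions(tag_sequences, smoothing=1.0):
    """Estimate first-order transition probabilities between word classes."""
    tags = sorted({t for seq in tag_sequences for t in seq})
    counts = defaultdict(Counter)
    for seq in tag_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    probs = {}
    for prev in tags:
        # Additive smoothing keeps unseen transitions at non-zero probability
        total = sum(counts[prev].values()) + smoothing * len(tags)
        probs[prev] = {cur: (counts[prev][cur] + smoothing) / total
                       for cur in tags}
    return probs

def cross_entropy(seq, probs):
    """Average negative log2-probability per transition under the model."""
    logs = [-math.log2(probs[p][c]) for p, c in zip(seq, seq[1:])]
    return sum(logs) / len(logs)

# Hypothetical training data: sentences already mapped to word classes
training = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
]
model = train_transitions(training)

natural = cross_entropy(["DET", "NOUN", "VERB"], model)
shuffled = cross_entropy(["NOUN", "DET", "VERB"], model)
print(natural < shuffled)  # True
```

A lower cross entropy means the sequence looks more like the training data, which is the intuition behind scoring shuffled or otherwise modified text against the original.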
316	Data Transformation Trajectories in Embedded Systems Kasinathan, Gokulnath January 2016 Mobile phone tracking is the ascertaining of the position or location of a mobile phone as it moves from one place to another. Location-based services solutions include a mobile positioning system that can be used for a wide array of consumer-demand services such as search, mapping, navigation, road transport traffic management and emergency-call positioning. The Mobile Positioning System (MPS) supports complementary positioning methods for 2G, 3G and 4G/LTE (Long Term Evolution) networks. A mobile phone is known as a UE (User Equipment) in LTE. This thesis proposes a prototype method of live trajectory estimation for massive UE in an LTE network. RSRP (Reference Signal Received Power) values and TA (Timing Advance) values are part of the LTE events for a UE. These specific LTE events can be streamed to a system from the eNodeB of LTE in real time by activating measurements on UEs in the network. AoA (Angle of Arrival) and TA values are used to estimate the UE position. The AoA calculation is performed using RSRP values. The calculated UE positions are filtered using a Particle Filter (PF) to estimate the trajectory. To obtain live trajectory estimation for massive UEs, the LTE event streamer is modelled to produce several task units with event data for massive UEs. The task-level modelled data structures are scheduled across an Arm Cortex A15 based MPcore with multiple threads. Finally, with massive UE live trajectory estimation, the IMSI (International Mobile Subscriber Identity) is used to maintain the hidden Markov requirements of the particle filter functionality while maintaining load balance across 4 Arm A15 cores. This is demonstrated by serial and parallel performance engineering. Future work is proposed on decentralized task-level scheduling with a hash function for the IMSI, with extension of cores, and on a concentric circles method for AoA accuracy. 
Keywords: Angle of Arrival; Hidden Markov Model; Particle Filter; Arm A15 MPcore; Parallel Programming; Real time task level scheduler. Subject: Computer and Information Sciences
317	有記憶性信用價差期間結構模型 [A Term Structure Model of Credit Spreads with Memory] 李弘道 Unknown Date In this thesis we develop a credit migration model with memory for the term structure of credit risk spreads. Our model incorporates stochastic default probability, stochastic recovery rate, and the correlation between the recovery rate and the term structure of risk-free interest rates. We derive valuation formulae for a credit spread option and a plain vanilla option with counterparty risk. This model provides greater variability in credit spreads, and it has properties in line with what has been observed in practice: (1) credit spreads show diffusion-like behavior even though the credit rating of the firm has not changed; (2) the model injects correlation between spreads and the term structure of interest rates; (3) the model enables firm-specific and security-specific variability of spreads to be accommodated; and (4) the model enables us to estimate the yield curves corresponding to the positive and negative trends of credit ratings and match the observed risky bond prices more precisely. This model is useful for pricing and hedging OTC derivatives with counterparty risk, for pricing and hedging credit derivatives, and for risk management. Key Words: Credit Risk, Credit Risk Spread, Markov Model, Credit Derivative.
318	Mathematical modelling and analysis of aspects of bacterial motility Rosser, Gabriel A. January 2012 The motile behaviour of bacteria underlies many important aspects of their actions, including pathogenicity, foraging efficiency, and ability to form biofilms. In this thesis, we apply mathematical modelling and analysis to various aspects of the planktonic motility of flagellated bacteria, guided by experimental observations. We use data obtained by tracking free-swimming Rhodobacter sphaeroides under a microscope, taking advantage of the availability of a large dataset acquired using a recently developed, high-throughput protocol. A novel analysis method using a hidden Markov model for the identification of reorientation phases in the tracks is described. This is assessed and compared with an established method using a computational simulation study, which shows that the new method has a reduced error rate and less systematic bias. We proceed to apply the novel analysis method to experimental tracks, demonstrating that we are able to successfully identify reorientations and record the angle changes of each reorientation phase. The analysis pipeline developed here is an important proof of concept, demonstrating a rapid and cost-effective protocol for the investigation of myriad aspects of the motility of microorganisms. In addition, we use mathematical modelling and computational simulations to investigate the effect that the microscope sampling rate has on the observed tracking data. This is an important, but often overlooked aspect of experimental design, which affects the observed data in a complex manner. Finally, we examine the role of rotational diffusion in bacterial motility, testing various models against the analysed data. This provides strong evidence that R. sphaeroides undergoes some form of active reorientation, in contrast to the mainstream belief that the process is passive. 579.3
319	A comparative study between algorithms for time series forecasting on customer prediction: An investigation into the performance of ARIMA, RNN, LSTM, TCN and HMM Almqvist, Olof January 2019 Time series prediction is one of the main areas of statistics and machine learning. In 2018, two new algorithms, the higher-order hidden Markov model and the temporal convolutional network, were proposed and emerged as challengers to the more traditional recurrent neural network and long short-term memory network, as well as the autoregressive integrated moving average (ARIMA). In this study, most major algorithms, together with recent innovations for time series forecasting, are trained and evaluated on two datasets from the theme park industry with the aim of predicting the future number of visitors. To develop the models, the Python libraries Keras and Statsmodels were used. Results from this thesis show that the neural network models are slightly better than ARIMA and the hidden Markov model, and that the temporal convolutional network does not perform significantly better than the recurrent or long short-term memory networks, although it has the lowest prediction error on one of the datasets. Interestingly, the Markov model performed worse than all neural network models even when using no independent variables. Keywords: machine learning; deep learning; time series forecasting; time series regression; time series prediction; data science; prediction; CRISP-DM; Keras; Markov model; neural network; exploratory data analysis; data analysis. Subject: Engineering and Technology
320	A Novel Cloud Broker-based Resource Elasticity Management and Pricing for Big Data Streaming Applications Runsewe, Olubisi A. 28 May 2019 The pervasive availability of streaming data from various sources is driving today's enterprises to acquire low-latency big data streaming applications (BDSAs) for extracting useful information. In parallel, recent advances in technology have made it easier to collect, process and store these data streams in the cloud. For most enterprises, gaining insights from big data is immensely important for maintaining competitive advantage. However, the majority of enterprises have difficulty managing the multitude of BDSAs and the complex issues cloud technologies present, giving rise to the incorporation of cloud service brokers (CSBs). Generally, the main objective of the CSB is to maintain the heterogeneous quality of service (QoS) of BDSAs while minimizing costs. In pursuing this goal, the cloud, despite its many desirable features, presents two major challenges for CSBs: resource prediction and resource allocation. First, most stream processing systems allocate a fixed amount of resources at runtime, which can lead to under- or over-provisioning as BDSA demands vary over time. Thus, obtaining an optimal trade-off between QoS violation and cost requires an accurate demand prediction methodology to prevent waste, degradation or shutdown of processing. Second, coordinating resource allocation and pricing decisions for self-interested BDSAs to achieve fairness and efficiency can be complex. This complexity is exacerbated by the recent introduction of containers. 
This dissertation addresses the cloud resource elasticity management issues for CSBs as follows. First, we provide two contributions to the resource prediction challenge: we propose a novel layered multi-dimensional hidden Markov model (LMD-HMM) framework for managing time-bounded BDSAs and a layered multi-dimensional hidden semi-Markov model (LMD-HSMM) to address unbounded BDSAs. Second, we present a container resource allocation mechanism (CRAM) for optimal workload distribution to meet the real-time demands of competing containerized BDSAs. We formulate the problem as an n-player non-cooperative game among a set of heterogeneous containerized BDSAs. Finally, we incorporate a dynamic incentive-compatible pricing scheme that coordinates the decisions of self-interested BDSAs to maximize the CSB’s surplus. Experimental results demonstrate the effectiveness of our approaches. Keywords: Cloud Computing; Big Data; Resource Prediction; Resource Allocation; Stream Processing; Game Theory; Layered Hidden Markov Model; Resource Management; Container-Clusters; Virtual Machines; Streaming Applications; Nash Equilibrium; Queuing Theory; Dynamic Pricing; Resource scaling
