701 |
Topics in Signal Processing: applications in genomics and genetics
Elmas, Abdulkadir January 2016 (has links)
The information in genomic or genetic data is influenced by various complex processes, and appropriate mathematical modeling is required for studying the underlying processes and the data. This dissertation formulates mathematical models for certain problems in genomics and genetics and develops algorithms that propose efficient solutions. A Bayesian approach for transcription factor (TF) motif discovery is examined, and extensions are proposed to deal with the many interdependent parameters of TF-DNA binding. The problem is described in statistical terms, and a sequential Monte Carlo sampling method is employed for the estimation of unknown parameters. In particular, a class-based resampling approach is applied for the accurate estimation of a set of intrinsic properties of the DNA binding sites. Through statistical analysis of gene expression, a motif-based computational approach is developed for the inference of novel regulatory networks in a given bacterial genome. To deal with high false-discovery rates in genome-wide TF binding predictions, discriminative learning approaches are examined in the context of sequence classification, and a novel mathematical model is introduced to the family of kernel-based Support Vector Machine classifiers. Furthermore, the problem of haplotype phasing is examined based on genetic data obtained from cost-effective genotyping technologies. Based on the identification and augmentation of a small and relatively more informative genotype set, a sparse dictionary selection algorithm is developed to infer the haplotype pairs for the sampled population. In a related context, to detect redundant information in single nucleotide polymorphism (SNP) sites, the problem of representative (tag) SNP selection is introduced. An information-theoretic heuristic is designed for the accurate selection of tag SNPs that capture the genetic diversity in a large sample set from multiple populations. The method is based on a multi-locus mutual information measure, reflecting linkage disequilibrium, a key principle of population genetics.
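As a sketch of that final idea, a greedy tag SNP selector built on mutual information might look like the following. The pairwise MI here is a simplified stand-in for the dissertation's multi-locus measure, and every name and threshold is illustrative rather than taken from the thesis:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def select_tag_snps(genotypes, mi_threshold=0.3):
    """Greedy tag SNP selection on a (n_samples, n_snps) matrix of 0/1/2
    genotype calls: repeatedly pick the SNP sharing the most mutual
    information with the still-untagged SNPs, then drop every SNP it
    'captures' (pairwise MI above the threshold)."""
    n_snps = genotypes.shape[1]
    # Pairwise MI between all SNP columns (small panels only: O(p^2) pairs).
    mi = np.zeros((n_snps, n_snps))
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            mi[i, j] = mi[j, i] = mutual_info_score(genotypes[:, i],
                                                    genotypes[:, j])
    remaining, tags = set(range(n_snps)), []
    while remaining:
        # The SNP most informative about the rest of the remaining set.
        best = max(remaining, key=lambda s: mi[s, list(remaining)].sum())
        tags.append(best)
        captured = {s for s in remaining if mi[best, s] >= mi_threshold}
        remaining -= captured | {best}
    return tags

# Toy usage on random genotypes; real data would come from sampled populations.
rng = np.random.default_rng(0)
print(select_tag_snps(rng.integers(0, 3, size=(200, 12))))
```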
|
702 |
Statistics, scaling and structures in fluid turbulence: case studies for thermal convection and pipe flow. / CUHK electronic theses & dissertations collection
January 2002 (has links)
Shang Xiandong. / "September 2002." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (p. 141-146). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
|
703 |
Statistical inference in population genetics using microsatellites
Csilléry, Katalin January 2009 (has links)
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data has been produced, and the amount is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role: they revolutionized the population genetics of natural populations and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and the statistical methods applied to them are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools and developing some new ones. My work encompasses a range of statistical methods, from simple hypothesis testing to more recent, complex computational tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I survey the different models of the microsatellite mutation process, ask which models are best supported by data, and examine how these models have been incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data. Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations. I demonstrate that the overall performance of marker-based pairwise relatedness estimators depends mainly on the relatedness composition of the population, and can be improved through marker data quality only within the limits set by that composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations, I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and that instead a null hypothesis should be tested that accounts for the “background LD” due to finite population size. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation (ABC), inference based on summary statistics, for estimating demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
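The ABC machinery in the fourth topic follows a standard pattern; below is a minimal, generic rejection-ABC loop. The simulator, summary statistics, and prior are user-supplied placeholders, not the thesis's demographic model:

```python
import numpy as np

def abc_rejection(observed_stats, simulate, summarize, prior_draw,
                  n_sims=100_000, quantile=0.001):
    """Basic rejection ABC: draw parameters from the prior, simulate data,
    and keep the draws whose summary statistics fall closest to the
    observed ones.  The retained draws approximate the posterior."""
    s_obs = np.asarray(observed_stats, dtype=float)
    draws, dists = [], []
    for _ in range(n_sims):
        theta = prior_draw()                      # e.g. an admixture proportion
        s_sim = np.asarray(summarize(simulate(theta)), dtype=float)
        draws.append(theta)
        dists.append(np.linalg.norm(s_sim - s_obs))  # Euclidean distance
    cutoff = np.quantile(np.asarray(dists), quantile)  # keep closest draws
    return [t for t, d in zip(draws, dists) if d <= cutoff]

# Toy usage: "data" is a sample mean, the parameter is a normal location.
rng = np.random.default_rng(1)
posterior = abc_rejection(
    observed_stats=[0.7],
    simulate=lambda th: rng.normal(th, 1.0, size=50),
    summarize=lambda x: [x.mean()],
    prior_draw=lambda: rng.uniform(-2, 2),
    n_sims=20_000, quantile=0.01)
print(round(float(np.mean(posterior)), 2))        # near 0.7
```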
|
704 |
Estimations for statistical arbitrage in horse racing markets.
January 2010 (has links)
Xiong, Liying.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaf 34).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Hong Kong Horse Racing Market and Models in Horse Racing --- p.3
Chapter 2.1 --- Hong Kong Horse Racing Market --- p.4
Chapter 2.2 --- Models in Horse Racing --- p.5
Chapter 3 --- Probit Regression Model Incorporating with Public Estimates --- p.9
Chapter 3.1 --- Estimation under No Particular Conditions --- p.10
Chapter 3.2 --- Estimators under Particular Condition --- p.15
Chapter 4 --- Prediction and Testing --- p.23
Chapter 4.1 --- Prediction of Win Probability --- p.24
Chapter 5 --- Conclusion --- p.32
Bibliography --- p.34
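Chapter 3's model incorporates the public's estimates (odds-implied win probabilities) into a probit regression. The sketch below illustrates that general idea on synthetic data; the variable names, coefficients, and the exact way the public estimate enters are assumptions, not the thesis's specification:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Toy data, one row per horse: `public_prob` is the win probability implied
# by the tote odds; `rating` stands in for any other handicapping covariate.
rng = np.random.default_rng(0)
n = 500
public_prob = rng.uniform(0.02, 0.50, n)
rating = rng.normal(0, 1, n)

# Probit-scale design: the public estimate enters through its probit score.
X = sm.add_constant(np.column_stack([norm.ppf(public_prob), rating]))
true_beta = np.array([-0.2, 1.0, 0.3])
y = (X @ true_beta + rng.normal(0, 1, n) > 0).astype(int)   # win indicator

fit = sm.Probit(y, X).fit(disp=0)
print(fit.params.round(2))          # recovers roughly (-0.2, 1.0, 0.3)
print(fit.predict(X)[:5].round(3))  # fitted win probabilities
```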
|
705 |
Essays in Basketball Analytics
Keshri, Suraj Kumar January 2019 (has links)
With the increasing popularity of and competition in professional basketball over the past decade, data-driven decision making has emerged as a major competitive edge. The advent of high-frequency player tracking data from SportVU has enabled a rigorous analysis of player abilities and interactions that was not possible before. The tracking data record the two-dimensional x-y coordinates of the 10 players on the court, as well as the x-y-z coordinates of the ball, at a resolution of 25 frames per second, yielding over 1 billion space-time observations over the course of a full season. This dissertation offers a collection of spatio-temporal models and player evaluation metrics that provide insight into player interactions and performance, allowing teams to make better decisions.
Conventional approaches to simulating matches have ignored the fact that, in basketball, the dynamics of ball movement are very sensitive to the lineups on the court and to the unique identities of the players on both the offensive and defensive sides. In chapter 2, we propose a simulation infrastructure that can bridge the gap between player identity and team-level network. We model the progression of a basketball match using a probabilistic graphical model, treating every touch event in a game as a sequence of transitions between discrete states. The progression of a match is a graph, where each node represents the network structure of players on the court, their actions, events, etc., and edges denote possible moves in the game flow (a minimal sketch of this state-transition view follows below). Our results show that changes in either the team's lineup or the opponent's lineup significantly affect the dynamics of a match's progression. Evaluation on match data for the 2013-16 NBA seasons suggests that the graphical model approach is appropriate for modeling a basketball match.
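A toy version of that state-transition machinery, assuming a handful of invented possession states; the thesis's graph nodes additionally encode lineups, ball handlers, and court locations:

```python
import numpy as np

# Invented possession states; terminal states end the possession.
STATES = ["bring_up", "pass", "drive", "shot_made", "shot_missed", "turnover"]
TERMINAL = {"shot_made", "shot_missed", "turnover"}

def estimate_transitions(sequences, n_states=len(STATES)):
    """Maximum-likelihood transition matrix from observed possessions,
    each given as a list of state indices (with add-one smoothing)."""
    counts = np.ones((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def simulate_possession(P, rng, start=0, max_steps=50):
    """Walk the chain from `start` until a terminal state (or a step cap)."""
    state, path = start, [start]
    for _ in range(max_steps):
        state = int(rng.choice(len(STATES), p=P[state]))
        path.append(state)
        if STATES[state] in TERMINAL:
            break
    return [STATES[s] for s in path]

rng = np.random.default_rng(0)
P = estimate_transitions([[0, 1, 2, 3], [0, 1, 1, 5], [0, 2, 4]])
print(simulate_possession(P, rng))
```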
NBA teams value players who can "stretch" the floor, i.e. create space on the court by drawing their defender(s) closer to themselves. This ability to attract defenders clearly varies across players; furthermore, the effect may also vary with the court location of the offensive player and with whether or not the player is the ball handler. For instance, a ball handler near the basket attracts a defender more than a non-ball-handler at the three-point line does. This has a significant effect on defensive assignment, which is particularly important because defensive assignment has become the cornerstone of all tracking-data-based player evaluation models. In chapter 3, we propose a new model to learn player- and court-location-specific offensive attraction. We show that offensive players indeed vary in their ability to attract the defender in different parts of the court. Using this metric, teams can evaluate players to construct a roster or lineup that maximizes spacing. We also improve upon the existing defensive matchup inference algorithm for SportVU data.
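To make the attraction idea concrete, here is a minimal computation of a player's average defender proximity relative to a league baseline, split by zone and ball-handler status. The column names, toy data, and simple baseline adjustment are illustrative assumptions, not the chapter's actual model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000
# Invented frame-level table: one row per offensive player per frame.
frames = pd.DataFrame({
    "off_player": rng.choice(["A", "B", "C"], n),
    "zone": rng.choice(["rim", "midrange", "arc"], n),
    "has_ball": rng.integers(0, 2, n).astype(bool),
    "def_dist": rng.gamma(4.0, 1.2, n),   # feet to the matched defender
})

# Baseline: league-average defender distance for each (zone, ball) cell.
baseline = frames.groupby(["zone", "has_ball"])["def_dist"].transform("mean")
# Attraction: how much closer than baseline a player's defender stays
# (positive values mean the player pulls his defender in more than average).
frames["attraction"] = baseline - frames["def_dist"]
gravity = (frames.groupby(["off_player", "zone", "has_ball"])["attraction"]
                 .mean())
print(gravity.head())
```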
While the ultimate goal of the offense is to shoot the ball, the strategy lies in creating good shot opportunities. Offensive play event detection has been a topic of research interest, and current research in this area has used supervised learning to detect and classify such events. We instead take an unsupervised learning approach, which has two inherent benefits: first, there is no need for pre-tagged data, the production of which is a labor-intensive and error-prone task; second, an unsupervised approach allows us to detect events that have not been tagged yet, i.e. novel events. We use an HMM-based approach to detect these events at any point during a possession by specifying the functional form of the prior distribution on the player movement data. We test our framework on detecting the ball screen, post up, and drive; however, it can easily be extended to events like the isolation, or to any new event with a distinct defensive matchup or player movement signature relative to a non-event. This is the topic of chapter 4.
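A bare-bones unsupervised detector in this spirit, using a Gaussian HMM over invented per-frame movement features; note that off-the-shelf EM, as below, does not encode the informative priors the chapter specifies:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Toy per-frame features for one possession, e.g. ball-handler speed,
# screener-to-handler distance, defender-to-handler distance.  Real
# tracking-derived features would replace these synthetic segments.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal([2.0, 12.0, 6.0], 0.5, (80, 3)),   # "no event" motion
    rng.normal([1.0,  3.0, 2.0], 0.5, (20, 3)),   # screen-like proximity
    rng.normal([5.0,  8.0, 4.0], 0.5, (30, 3)),   # drive-like burst
])

hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50,
                  random_state=0)
hmm.fit(X)                      # unsupervised: no event tags needed
states = hmm.predict(X)         # per-frame latent state sequence
# Contiguous runs of one state become candidate events (screen, drive, ...),
# labeled post hoc by inspecting each state's learned mean features.
print(hmm.means_.round(1))
```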
Accurate estimation of the offensive and defensive abilities of NBA players plays a crucial role in player selection and ranking. A typical approach is to learn the defensive assignment for each shot and then use a random effects model to estimate each player's offensive and defensive abilities; the scalar estimates from the random effects model can then be used to rank players. In this approach a shot has a binary outcome: it is either made or missed, so the approach cannot take advantage of the "quality" of the shot trajectory. In chapter 5, we propose a new method for ranking players that infers the quality of a shot trajectory using a deep recurrent neural network, and then uses this quality measure in a random effects model to rank players while taking the defensive matchup into account. We show that the quality information significantly improves the player ranking. We also show that including the quality of shots increases the separation between the learned random effect coefficients, and thus allows for better differentiation of player abilities. Further, we show that with a trajectory-based model we are able to infer changes in a player's ability on a game-by-game basis; a shot-based model does not carry enough information to detect such changes.
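The trajectory-scoring component might, in its simplest form, look like the recurrent sketch below; the architecture, sizes, and input encoding are assumptions standing in for the thesis's deep recurrent model:

```python
import torch
import torch.nn as nn

class ShotQualityRNN(nn.Module):
    """Scores a shot trajectory (a sequence of ball x-y-z positions) with a
    continuous make probability, a stand-in for the chapter's quality model."""
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, traj):            # traj: (batch, time, 3)
        _, h = self.gru(traj)           # h: (num_layers, batch, hidden)
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)

model = ShotQualityRNN()
traj = torch.randn(8, 25, 3)            # 8 toy shots, 25 frames each
quality = model(traj)                   # soft "quality" scores in (0, 1)
# These soft scores, rather than binary make/miss labels, would then feed
# the random effects model that ranks shooters against their matchups.
print(quality.detach().numpy().round(2))
```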
A good defensive player prevents the opponent from making a shot, attempting a good shot, making an easy pass, or otherwise creating scoring events, eventually leading to wasted shot clock time. The salient feature here is that a good defender prevents events; consequently, event-driven metrics such as box scores cannot measure defensive ability. Conventional wisdom in basketball is that "pesky" defenders continuously maintain a close distance to the ball handler: a closely guarded offensive player is less likely to take or make a shot, less likely to pass, and more likely to lose the ball. In chapter 6, we introduce the Defensive Efficiency Rating (DER), a new statistic that measures the defensive effectiveness of a player. DER is the effective distance a defender maintains from the ball handler during an interaction, where we control for the identity and wingspan of the defender, the shot efficiency of the ball handler, and the zone on the court. DER allows us to quantify the quality of a defensive interaction without being limited by the occurrence of discrete and infrequent events like shots and rebounds. We show that the ranking from this statistic naturally picks out defenders known to perform well in particular zones.
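One plausible reading of "effective distance, controlling for covariates" is a regression with defender effects; the sketch below uses invented covariate names and synthetic data, and the thesis's actual estimator may differ:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5_000
df = pd.DataFrame({
    "defender": rng.choice(["D1", "D2", "D3", "D4"], n),
    "zone": rng.choice(["rim", "midrange", "arc"], n),
    "wingspan": rng.normal(84, 3, n),            # inches; toy values
    "handler_efg": rng.uniform(0.40, 0.65, n),   # ball handler efficiency
})
# Toy guarding distance: wider wingspans guard from farther out, better
# shooters get guarded tighter, plus noise.
df["dist"] = (6 - 0.05 * (df["wingspan"] - 84) + 2 * df["handler_efg"]
              + rng.normal(0, 1, n))

# Effective guarding distance adjusted for wingspan, handler quality, zone:
# the per-defender coefficient plays the role of a DER-like quantity.
fit = smf.ols("dist ~ C(defender) + C(zone) + wingspan + handler_efg",
              data=df).fit()
print(fit.params.filter(like="defender").round(3))
```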
|
706 |
Exploring Data Quality of Weigh-In-Motion Systems
Dai, Chengxin 24 July 2013
This research focuses on data quality control methods for evaluating the performance of Weigh-In-Motion (WIM) systems on Oregon highways. It identifies and develops a new methodology and algorithm for examining the accuracy of each station's weight and spacing data at a corridor level, and further implements the Statistical Process Control (SPC) method, a finite mixture model, an axle spacing error rating method, and a data flag method from published research to examine the soundness of WIM systems. Historical WIM data are used to analyze sensor health and to compare the evaluation results of the methods. The results suggest that the new triangulation method identified most of the possible WIM malfunctions sensed by the other methods, and that, uniquely among them, it monitors process behavior while controlling for time and meteorological variables. The SPC method was superior at differentiating sensor noise from sensor errors or drift, but it drew wrong conclusions when an accurate WIM data reference was absent. The axle spacing error rating method cannot check the essential weight data in special cases, but its multiple linear regression model yielded reliable loop sensor evaluations. The results of the data flag method and the finite mixture model were not accurate, so they are best used as additional tools to complement the other data quality evaluations. Overall, these data quality analyses are valuable for the early detection of system malfunctions, sensor drift, and similar problems, and allow WIM operators to correct the situation in time, before large amounts of measurement data are lost.
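As one concrete instance of the SPC idea, an individuals (Shewhart) chart on a daily weight summary might look like the sketch below; the data, constants, and choice of chart are illustrative, and the research compares several SPC variants:

```python
import numpy as np

def individuals_chart(x, k=3.0):
    """Shewhart individuals chart: flag observations that drift outside
    center +/- k * sigma, with sigma estimated from the moving range."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    mr = np.abs(np.diff(x)).mean()   # average moving range of size-2 windows
    sigma = mr / 1.128               # d2 constant for subgroup size n = 2
    lcl, ucl = center - k * sigma, center + k * sigma
    return np.where((x < lcl) | (x > ucl))[0], (lcl, center, ucl)

# Toy daily mean steer-axle weights (kips) at one WIM station, with an
# injected sensor drift over the last week.
daily = np.r_[np.random.default_rng(4).normal(10.2, 0.15, 23),
              np.linspace(10.5, 11.2, 7)]
flags, limits = individuals_chart(daily)
print(flags, np.round(limits, 2))    # indices of out-of-control days
```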
|
707 |
Malleefowl in the fragmented Western Australian wheatbelt : spatial and temporal analysis of a threatened species
Parsons, Blair January 2009 (has links)
[Truncated abstract] The malleefowl (Leipoa ocellata) is a large, ground-dwelling bird that is listed as threatened in all states of Australia in which it occurs. Its range encompasses much of southern Australia; however, much of this range has been cleared for agriculture. Malleefowl are thought to have suffered substantial decline owing to multiple threats, including habitat loss, predation by exotic predators, grazing of habitat by introduced herbivores, and fire, all common threats in the decline of many Australian vertebrate species. The malleefowl has an unmistakeable appearance, unique biology, and widespread distribution across Australia; consequently, it has been the focus of much scientific and community interest. In the Western Australian wheatbelt, community groups are working to conserve the species and have been actively collecting data on its distribution for over 15 years. The vast majority of these data are presence-only and have been collected opportunistically but, combined with long-term data from government agencies and museums spanning over 150 years, they present a significant opportunity to inform ecological questions relevant to the conservation of the species. The purpose of this study was to answer key ecological questions regarding the distribution, status and habitat preferences of malleefowl using unstructured occurrence records supplemented by reliable absences derived from Bird Atlas data sets and targeted surveys. Malleefowl in the Western Australian wheatbelt were used as a case study to illustrate: 1) how the decline of a species can be quantified and the causes of that decline identified; and 2) how threats can be identified and responses to threats explored. I used bioclimatic modelling to define and explore variation within the climatic niche of the malleefowl across Australia. '...' This thesis provides substantial additional knowledge about the ecology, distribution and status of malleefowl in Western Australia. It also illustrates how opportunistic and unstructured data can be augmented to investigate key aspects of a species' ecology. Despite the limitations of these data, which relate primarily to variation in observer effort across time and space, they can provide important outcomes that may not be achieved using standard survey and data collection techniques. The utility of opportunistic data is greatest where the species is recognisable and easily observed, is relatively sedentary, and occurs within a landscape containing consistent land use and habitat types. The approaches used in this study could be applied by researchers wherever community interest exists for species with these attributes. At a national scale, the malleefowl is predicted to decline by at least 20% over the next three generations. The findings of this thesis suggest that the future for the species in the Western Australian wheatbelt may not be as dire as predicted elsewhere within its range, owing largely to the easing and cessation of threatening processes (e.g. land clearing, grazing of habitat by livestock) and the ability of the species to occupy a variety of habitat types. Despite this perceived security, some caution must be exercised until there is more complete knowledge of the impact of fox predation and of reduced rainfall due to climate change on malleefowl populations. Furthermore, the status of the species beyond the agricultural landscapes of Western Australia requires closer examination.
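In its simplest form, the combination of presence records and derived absences could feed a classifier like the toy logistic model below; the climate covariates and values are invented, and this is only a stand-in for the bioclimatic modelling actually used in the thesis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
# Toy climate covariates (annual rainfall in mm, mean temperature in C)
# at presence records and at derived absences (e.g. Bird Atlas visits
# that did not record the species).
presence = rng.normal([350, 24], [40, 2], (200, 2))
absence = rng.normal([500, 21], [60, 2], (400, 2))
X = np.vstack([presence, absence])
y = np.r_[np.ones(len(presence)), np.zeros(len(absence))]

sdm = LogisticRegression().fit(X, y)
# Predicted habitat suitability along a coarse rainfall gradient:
grid = np.column_stack([np.linspace(250, 650, 5), np.full(5, 23.0)])
print(sdm.predict_proba(grid)[:, 1].round(2))
```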
|
708 |
Statistical methods on detecting superpositional signals in a wireless channel
Chan, Francis Chun Ngai, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2006 (has links)
This thesis is concerned with the problem of detecting superpositional signals in a wireless channel. In many wireless systems, an observed signal is commonly represented as a linear combination of the transmitted signal with interfering signals dispersed in space and time. These systems are generally known as interference-limited systems, and their mathematical model is generally referred to as a superpositional model. A distinguishing characteristic of signal transmission in a time-varying wireless channel is that the channel process is not known a priori; reliable signal reception inherently requires exploiting the structure of the interfering signals under channel uncertainty. Our goal is to design computationally efficient receivers for various interference-limited systems using advanced statistical signal processing techniques. The thesis consists of four main parts. Firstly, we propose a novel Multi-Input Multi-Output (MIMO) signal detector, known as the neighbourhood exploring detector (NED). By the maximum likelihood principle, the space-time MIMO detection problem is equivalent to an NP-hard combinatorial optimization problem; the proposed detector is a sub-optimal maximum likelihood detector that eliminates exhaustive multidimensional searches. Secondly, we address the problem of signal synchronization for the Global Positioning System (GPS) in a multipath environment. Multipath mitigation constitutes a joint estimation of the unknown amplitudes, phases, and time delays of the linearly combined signals, and the complexity of this nonlinear joint estimation problem increases exponentially with the number of signals. We propose two robust GPS code acquisition systems with low complexity. Thirdly, we deal with the problem of multipath mitigation in the spatial domain, considering a GPS receiver integrated with the Inertial Navigation System (INS) and a multiple-antenna array. We design a software-based GPS receiver that effectively estimates the directions of arrival and the times of arrival of the linearly combined signals. Finally, the problem of communication with unknown channel state information is investigated. Conventionally, the information-theoretic communication problem and the channel estimation problem are decoupled; however, the training sequence that facilitates estimation of the channel reduces the throughput of the channel. We analytically derive the optimal length of the training sequence that maximizes the mutual information in a block fading channel.
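In the spirit of the NED, a local neighbourhood search around the zero-forcing solution can avoid the exhaustive ML search. The following toy QPSK detector is a sketch of that general idea, not the thesis's actual algorithm:

```python
import numpy as np
from itertools import product

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def nearest_qpsk(x):
    """Quantize each entry of x to the nearest QPSK constellation point."""
    return QPSK[np.argmin(np.abs(x[:, None] - QPSK[None, :]), axis=1)]

def neighborhood_detect(y, H):
    """Start from the quantized zero-forcing estimate and greedily explore
    single-antenna symbol substitutions until the ML metric ||y - Hx||^2
    stops improving: a local-search stand-in for the NED."""
    x = nearest_qpsk(np.linalg.pinv(H) @ y)
    best = np.linalg.norm(y - H @ x) ** 2
    improved = True
    while improved:
        improved = False
        for i, s in product(range(len(x)), QPSK):
            cand = x.copy()
            cand[i] = s
            m = np.linalg.norm(y - H @ cand) ** 2
            if m < best - 1e-12:
                x, best, improved = cand, m, True
    return x

rng = np.random.default_rng(6)
H = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
x_true = QPSK[rng.integers(0, 4, 4)]
y = H @ x_true + 0.05 * (rng.normal(size=4) + 1j * rng.normal(size=4))
print(np.allclose(neighborhood_detect(y, H), x_true))   # True at this SNR
```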
|
709 |
The influence of occasion on consumer choice: an occasion based, value oriented investigation of wine purchase, using means-end chain analysis / by Edward John Hall
Hall, Edward John January 2003 (has links)
Includes a list of supplementary refereed publications relating to the thesis, and of refereed conference papers, as appendix 1 / Includes bibliographical references (p. 316-343) / xix, 381 p. : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Focuses particularly on the purchase of wine, the factors that influence consumer choice, and the values that drive the decision process across different consumption occasions. The effectiveness of occasion as part of the theoretical model of means-end chain analysis is investigated, as well as the feasibility of occasion in the Olsen and Thach (2001) conceptual framework of consumer behavior relating to wine. / Thesis (Ph.D.)--University of Adelaide, School of Agriculture and Wine, Discipline of Wine and Horticulture, 2003
|
710 |
Multiscale fractality with application and statistical modeling and estimation for computer experiment of nano-particle fabrication
Woo, Hin Kyeol 24 August 2012
The first chapter proposes multifractal analysis to measure the inhomogeneity of regularity in a 1H-NMR spectrum using wavelet-based multifractal tools. The geometric summaries of the multifractal spectrum are informative and, as such, are employed to discriminate 1H-NMR spectra associated with different treatments. The methodology is applied to evaluate the effect of sulfur amino acids.
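As a flavor of the wavelet machinery involved, the sketch below extracts a single scaling slope from detail-coefficient energies. This is a simplified, monofractal cousin of the multifractal spectrum the chapter actually uses, and the wavelet, level count, and toy signal are all assumptions:

```python
import numpy as np
import pywt

def wavelet_scaling_slope(x, wavelet="db4", levels=8):
    """Log2 detail-coefficient energy per scale, and the slope of its
    regression on scale, which summarizes the signal's overall regularity
    (a Hurst-type exponent rather than a full multifractal spectrum)."""
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    details = coeffs[1:][::-1]                  # finest scale first
    log_e = [np.log2(np.mean(d ** 2)) for d in details]
    scales = np.arange(1, len(log_e) + 1)
    slope = np.polyfit(scales, log_e, 1)[0]
    return slope, log_e

# Toy "spectrum": a random-walk-like signal standing in for NMR data.
x = np.cumsum(np.random.default_rng(7).normal(size=4096))
print(round(wavelet_scaling_slope(x)[0], 2))
```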
The second part of this thesis provides the essential engineering background for a nano-particle fabrication process. The third chapter introduces a constrained random effects model. Since certain combinations of process variables result in unproductive process outcomes, a logistic model is used to characterize this behavior; for the cases with productive outcomes, a normal regression model serves as the second part of the model. Additionally, random effects are included in both the logistic and normal regression models to describe the potential spatial correlation among the data. The chapter develops a way to approximate the likelihood function and to find estimates that maximize the approximated likelihood.
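Without the random effects (which are the hard part that the chapter's likelihood approximation addresses), the two-part structure reduces to a logistic model plus a conditional normal regression, roughly as below on synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 400
X = sm.add_constant(rng.normal(size=(n, 2)))   # two toy process variables
# Whether a run is productive follows a logistic model in the settings...
productive = rng.random(n) < 1 / (1 + np.exp(-(X @ [0.5, 1.2, -0.8])))
# ...and productive runs get a normally distributed outcome.
y = np.where(productive, X @ [2.0, 0.7, 0.3] + rng.normal(0, 0.4, n), np.nan)

# Part 1: which settings yield a productive run at all.
part1 = sm.Logit(productive.astype(int), X).fit(disp=0)
# Part 2: outcome regression fitted on the productive runs only.
part2 = sm.OLS(y[productive], X[productive]).fit()
print(part1.params.round(2), part2.params.round(2))
```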
The last chapter presents a method for deciding the sample size in a multi-layer system, a series of nested layers that become progressively smaller; our focus is deciding the sample size within each layer. The sample size decision has several objectives, the most important being that the sample size should be large enough to point the search in the right direction for the next layer. In particular, the bottom layer, the smallest neighborhood around the optimum, should meet the tolerance requirement. Performing a hypothesis test of whether the next layer includes the optimum yields the required sample size.
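For intuition, if that hypothesis test is approximated by a two-sided z-test, the familiar power calculation gives a per-layer sample size; the sketch below illustrates this standard formula, not the thesis's exact procedure:

```python
from math import ceil
from scipy.stats import norm

def layer_sample_size(sigma, delta, alpha=0.05, power=0.8):
    """Two-sided z-test sample size: enough observations that a shift of
    `delta` (the resolution needed to choose the right next layer) is
    detected with the stated power:  n = ((z_{a/2} + z_b) * sigma / delta)^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil((z * sigma / delta) ** 2)

# e.g. noise sd 2.0, needing to resolve differences of 0.5 at the bottom layer
print(layer_sample_size(sigma=2.0, delta=0.5))   # -> 126
```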
|