Global ETD Search

11	Improvements in the Accuracy of Pairwise Genomic Alignment Hudek, Alexander Karl January 2010 (has links) Pairwise sequence alignment is a fundamental problem in bioinformatics with wide applicability. This thesis presents three new algorithms for this well-studied problem. First, we present a new algorithm, RDA, which aligns sequences in small segments rather than by individual bases. Then, we present two algorithms for aligning long genomic sequences: CAPE, a pairwise global aligner, and FEAST, a pairwise local aligner. RDA produces interesting alignments that can differ substantially in structure from traditional alignments. It is also better than traditional alignment at the task of homology detection. However, its main drawback is a very slow run time. Further, although it produces alignments with a different structure, it is not clear whether the differences have practical value in genomic research. Our main success comes from our local aligner, FEAST. We describe two main improvements: a new, more descriptive model of evolution, and a new local extension algorithm that considers all possible evolutionary histories rather than only the most likely one. Our new model of evolution provides improved alignment accuracy and substantially improved parameter training. In particular, we produce a new parameter set for aligning human and mouse sequences that properly describes both regions of weak similarity and regions of strong similarity. The second result is our new extension algorithm. Depending on heuristic settings, our new algorithm can provide more sensitivity than existing extension algorithms, more specificity, or a combination of the two. By comparing to CAPE, our global aligner, we find that the sensitivity increase provided by our local extension algorithm is so substantial that it outperforms CAPE on sequences with 0.9 or more expected substitutions per site. 
CAPE itself gives improved sensitivity for sequences with 0.7 or more expected substitutions per site, but at a great run time cost. FEAST and our local extension algorithm improve on this as well: the run time is only slightly slower than that of existing local alignment algorithms and asymptotically the same. bioinformatics pairwise alignment Hidden Markov Models Computer Science
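For orientation only, here is a minimal sketch of the max-versus-sum distinction behind an extension step that "considers all possible evolutionary histories rather than only the most likely one". `global_alignment_score` is the classic Needleman-Wunsch dynamic program, which keeps only the single best-scoring alignment. `log_sum_over_alignments` runs the same recurrence with log-sum-exp in place of max, so it accumulates every alignment path (analogous to Viterbi versus the forward algorithm in a pair HMM). Neither function is CAPE or FEAST. The function names and the match, mismatch and gap scores are illustrative assumptions.

```python
import math
from typing import List


def global_alignment_score(x: str, y: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch optimal global alignment score, linear gap penalty.

    Scoring values are illustrative defaults, not parameters from the thesis.
    """
    m = len(y)
    # prev[j] = best score aligning the first i-1 characters of x with y[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr[j] = max(diag,               # align x[i-1] with y[j-1]
                          prev[j] + gap,      # x[i-1] against a gap
                          curr[j - 1] + gap)  # y[j-1] against a gap
        prev = curr
    return prev[m]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_sum_over_alignments(x: str, y: str, match: float = 1.0,
                            mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Same recurrence with max replaced by log-sum-exp: the result is
    log(sum over all alignment paths of exp(path score))."""
    m = len(y)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr[j] = _logsumexp([diag, prev[j] + gap, curr[j - 1] + gap])
        prev = curr
    return prev[m]
```

Because the summed version includes the best path plus every alternative, its value is always at least the max-based score for the same scoring scheme.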
12	Bilateral Asymmetry in Incisors: Implications for Miocene Hominoid Species Diagnosis Davis, Candace Ann 01 August 2011 (has links) AN ABSTRACT OF THE DISSERTATION OF CANDACE A. DAVIS, for the Doctor of Philosophy Degree in ANTHROPOLOGY, presented on March 31, 2011, at Southern Illinois University at Carbondale. TITLE: BILATERAL ASYMMETRY IN INCISORS: IMPLICATIONS FOR MIOCENE HOMINOID SPECIES DIAGNOSIS MAJOR PROFESSOR: Dr. Robert S. Corruccini The primary purpose of this dissertation is to show how knowledge of variation and asymmetry in incisor antimeric pairs of living great ape genera can be used as a "yardstick" for pairwise comparisons of isolated Miocene ape incisors from the two genera Kenyapithecus and Equatorius. The research was designed to help determine whether these fossil teeth could be reliably sorted into one genus or more than one. Both metric and morphological data for each class of incisor were recorded for Kenyapithecus and Equatorius, and resampling was performed to determine the significance of variation (p < 0.05) for each of 12 traits. Intraindividual antimeric differences in three genera of extant great apes were compared with interspecimen differences between Equatorius and Kenyapithecus. Pairwise comparisons using resampling sorted out which traits showed significant intraindividual variation and which could be used to discriminate between the two fossil genera under consideration. Based on these results, one can cautiously conclude that the two fossil species within these genera are not different enough to justify placing them in two different genera. fluctuating asymmetry incisors Kenyapithecus Miocene pairwise comparisons sclerocarpy
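As a hypothetical sketch of the general resampling idea (not the dissertation's actual procedure, traits, or data), the function below asks how often extant-ape antimeric (left-right) differences, resampled with replacement, are on average at least as large as the differences observed between isolated fossil incisors. A small value, such as one below 0.05, would suggest that the fossil differences exceed ordinary within-individual asymmetry. The function name, its inputs and the use of the mean absolute difference as the test statistic are assumptions.

```python
import random
from statistics import fmean
from typing import Sequence


def resampling_p_value(antimeric_diffs: Sequence[float],
                       fossil_diffs: Sequence[float],
                       n_resamples: int = 10_000,
                       seed: int = 0) -> float:
    """Share of resampled extant-ape samples whose mean absolute left-right
    difference is at least the mean absolute fossil interspecimen difference.

    Hypothetical setup: antimeric_diffs are within-individual differences for
    one trait in extant apes; fossil_diffs are differences between isolated
    fossil incisors for the same trait.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    k = len(fossil_diffs)
    observed = fmean(abs(d) for d in fossil_diffs)
    hits = 0
    for _ in range(n_resamples):
        # Draw k extant differences with replacement to form the null sample
        sample = rng.choices(antimeric_diffs, k=k)
        if fmean(abs(d) for d in sample) >= observed:
            hits += 1
    return hits / n_resamples
```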
13	User Behavior Learning in Designing Restaurant Recommender Systems: An Approach to Leveraging Historical Data and Implicit Feedback Haoxian, Feng January 2017 (has links) In typical restaurant recommendation, knowledge-based methods are used most often, and they do not take advantage of personal historical data. In this thesis, we make several improvements to the Chicago Entrée restaurant recommender system. We exploit historical data and propose a weighted similarity approach that combines heuristic similarity with tag similarity between restaurants. We also show an improved way to mine the semantics of user behaviors using a heuristic metric. The proposed approaches are evaluated through a comparison of three different pairwise learning-to-rank (LTR) approaches in matrix factorization and five classic recommendation algorithms. The results show that the combined similarity outperforms the heuristic similarity on precision, recall, F-score, and mean reciprocal rank. Pairwise learning Restaurant recommender system User behavior learning
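The similarity and evaluation vocabulary in this abstract can be made concrete with a short sketch. The weighted similarity below mixes a precomputed heuristic similarity with a Jaccard tag similarity; both the Jaccard choice and the `weight` parameter are assumptions, since the abstract specifies neither. The metric functions follow the standard definitions of precision, recall, F-score and mean reciprocal rank; all function names are illustrative.

```python
from typing import Collection, Iterable, Sequence, Tuple


def tag_similarity(tags_a: Iterable[str], tags_b: Iterable[str]) -> float:
    """Jaccard similarity of two restaurants' tag sets (an assumed choice)."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(heuristic_sim: float, tags_a: Iterable[str],
                        tags_b: Iterable[str], weight: float = 0.5) -> float:
    """Weighted mix of heuristic and tag similarity; `weight` in [0, 1] is a
    hypothetical tuning parameter."""
    return weight * heuristic_sim + (1.0 - weight) * tag_similarity(tags_a, tags_b)


def precision_recall_f1(recommended: Iterable[str],
                        relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and their harmonic mean (F-score)."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         relevant_sets: Sequence[Collection[str]]) -> float:
    """Average of 1/rank of the first relevant item per user (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)
```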
14	Pairwise Classification and Pairwise Support Vector Machines Brunner, Carl 04 June 2012 (has links) (PDF) Several modifications have been suggested to extend binary classifiers to multiclass classification, for instance the One Against All technique, the One Against One technique, or Directed Acyclic Graphs. A recent approach to multiclass classification is pairwise classification, which relies on two input examples instead of one and predicts whether the two input examples belong to the same class or to different classes. A Support Vector Machine (SVM) that is able to handle pairwise classification tasks is called a pairwise SVM. A common pairwise classification task is face recognition. In this area, one set of images is given for training and another set of images is given for testing. Often, one is interested in the interclass setting, which means that no person represented by an image in the training set is represented by any image in the test set. Of the multiclass classification techniques mentioned, only pairwise classification provides meaningful results in the interclass setting. For a pairwise classifier, the order of the two examples should not influence the classification result. A common approach to enforcing this symmetry is the use of selected kernels. Relations between such kernels and certain projections are provided. It is shown that those projections can lead to an information loss. For pairwise SVMs, another approach to enforcing symmetry is the symmetrization of the training sets: if the pair (a,b) of examples is a training pair, then (b,a) is a training pair, too. It is proven that both approaches lead to the same decision function for selected parameters. Empirical tests show that the approach using selected kernels is three to four times faster. For good interclass generalization, pairwise SVMs need training sets with several million training pairs. 
A technique is presented that further speeds up the training of pairwise SVMs by a factor of up to 130 and thus enables learning from training sets with several million pairs. Another factor affecting run time is the need to select several parameters. Even with the applied speed-up techniques, a grid search over the parameter set would be very expensive. Therefore, a model selection technique is introduced that is much less computationally expensive. In machine learning, the training set and the test set are created by some data generating process. Several pairwise data generating processes are derived from a given non-pairwise data generating process, and the advantages and disadvantages of the different pairwise data generating processes are evaluated. Pairwise Bayes' classifiers are introduced and their properties are discussed. It is shown that pairwise Bayes' classifiers for interclass generalization tasks can differ from pairwise Bayes' classifiers for interexample generalization tasks. In face recognition, the interexample task implies that each person who is represented by an image in the test set is also represented by at least one image in the training set, while the set of training images and the set of test images are disjoint. Pairwise SVMs are applied to four synthetic and two real-world datasets. One of the real-world datasets is the Labeled Faces in the Wild (LFW) database, while the other is provided by Cognitec Systems GmbH. The synthetic datasets provide empirical evidence for the presented model selection heuristic, the discussion of information loss, and the speed-up techniques, and they show that pairwise SVM classifiers achieve a quality similar to that of pairwise Bayes' classifiers. Additionally, a pairwise classifier is identified for the LFW database that achieves an average equal error rate (EER) of 0.0947 with a standard error of the mean (SEM) of 0.0057. 
This result is better than that of the current state-of-the-art classifier, namely the combined probabilistic linear discriminant analysis classifier, which achieves an average EER of 0.0993 with a SEM of 0.0051. Pairwise Support Vector Machines Interclass Generalization Pairwise Kernels Large Scale Problems Model Selection Pairwise Classification ddc:510 rvk:SK 880
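Both symmetry mechanisms described in this abstract can be sketched in a few lines, assuming examples are plain feature vectors. `symmetrize` adds (b, a) for every training pair (a, b) with the same label. `symmetric_pair_kernel` is one common symmetric pairwise kernel, built here from a Gaussian base kernel; the "selected kernels" in the thesis may differ. The kernel averages the two ways of matching the members of one pair with the members of the other, so swapping the order within either pair leaves its value unchanged. All names and the choice of base kernel are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def symmetrize(pairs: List[Pair], labels: List[int]) -> Tuple[List[Pair], List[int]]:
    """Training-set symmetrization: for every (a, b) also add (b, a), same label."""
    out_pairs: List[Pair] = []
    out_labels: List[int] = []
    for (a, b), y in zip(pairs, labels):
        out_pairs.extend([(a, b), (b, a)])
        out_labels.extend([y, y])
    return out_pairs, out_labels


def rbf(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) base kernel on single examples."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def symmetric_pair_kernel(p: Pair, q: Pair,
                          base: Callable[[Vector, Vector], float] = rbf) -> float:
    """Symmetrized tensor-product kernel on pairs:
    K((a, b), (c, d)) = 0.5 * (k(a, c) k(b, d) + k(a, d) k(b, c))."""
    (a, b), (c, d) = p, q
    return 0.5 * (base(a, c) * base(b, d) + base(a, d) * base(b, c))
```

Symmetrizing the data doubles the number of training pairs, which is consistent with the abstract's observation that the kernel-based approach is the faster of the two.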
15	Pairwise Classification and Pairwise Support Vector Machines Brunner, Carl 16 May 2012 (has links) Several modifications have been suggested to extend binary classifiers to multiclass classification, for instance the One Against All technique, the One Against One technique, or Directed Acyclic Graphs. A recent approach to multiclass classification is pairwise classification, which relies on two input examples instead of one and predicts whether the two input examples belong to the same class or to different classes. A Support Vector Machine (SVM) that is able to handle pairwise classification tasks is called a pairwise SVM. A common pairwise classification task is face recognition. In this area, one set of images is given for training and another set of images is given for testing. Often, one is interested in the interclass setting, which means that no person represented by an image in the training set is represented by any image in the test set. Of the multiclass classification techniques mentioned, only pairwise classification provides meaningful results in the interclass setting. For a pairwise classifier, the order of the two examples should not influence the classification result. A common approach to enforcing this symmetry is the use of selected kernels. Relations between such kernels and certain projections are provided. It is shown that those projections can lead to an information loss. For pairwise SVMs, another approach to enforcing symmetry is the symmetrization of the training sets: if the pair (a,b) of examples is a training pair, then (b,a) is a training pair, too. It is proven that both approaches lead to the same decision function for selected parameters. Empirical tests show that the approach using selected kernels is three to four times faster. For good interclass generalization, pairwise SVMs need training sets with several million training pairs. 
A technique is presented that further speeds up the training of pairwise SVMs by a factor of up to 130 and thus enables learning from training sets with several million pairs. Another factor affecting run time is the need to select several parameters. Even with the applied speed-up techniques, a grid search over the parameter set would be very expensive. Therefore, a model selection technique is introduced that is much less computationally expensive. In machine learning, the training set and the test set are created by some data generating process. Several pairwise data generating processes are derived from a given non-pairwise data generating process, and the advantages and disadvantages of the different pairwise data generating processes are evaluated. Pairwise Bayes' classifiers are introduced and their properties are discussed. It is shown that pairwise Bayes' classifiers for interclass generalization tasks can differ from pairwise Bayes' classifiers for interexample generalization tasks. In face recognition, the interexample task implies that each person who is represented by an image in the test set is also represented by at least one image in the training set, while the set of training images and the set of test images are disjoint. Pairwise SVMs are applied to four synthetic and two real-world datasets. One of the real-world datasets is the Labeled Faces in the Wild (LFW) database, while the other is provided by Cognitec Systems GmbH. The synthetic datasets provide empirical evidence for the presented model selection heuristic, the discussion of information loss, and the speed-up techniques, and they show that pairwise SVM classifiers achieve a quality similar to that of pairwise Bayes' classifiers. Additionally, a pairwise classifier is identified for the LFW database that achieves an average equal error rate (EER) of 0.0947 with a standard error of the mean (SEM) of 0.0057. 
This result is better than that of the current state-of-the-art classifier, namely the combined probabilistic linear discriminant analysis classifier, which achieves an average EER of 0.0993 with a SEM of 0.0051. info:eu-repo/classification/ddc/510 ddc:510
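The reported figures (an average EER together with its SEM) can be computed along the following lines. This is a minimal sketch assuming that higher scores mean "same person", that a pair is accepted when its score is at least the threshold, and that the EER is approximated by sweeping the threshold over the observed scores. The function names are hypothetical, and this is not the evaluation code behind the LFW numbers.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping the decision threshold over observed scores.

    FAR = fraction of impostor (different-person) pairs accepted,
    FRR = fraction of genuine (same-person) pairs rejected.
    Returns (FAR + FRR) / 2 at the threshold where the two rates are closest.
    """
    best_gap, best_eer = math.inf, 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean, e.g. of per-fold EERs (needs >= 2 values)."""
    return mean(values), stdev(values) / math.sqrt(len(values))
```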
16	Agricultural trade liberalization : an international trade network approach May Montana, Daniel Esteban January 2018 (has links) A number of attempts have been made to facilitate agricultural trade liberalisation over the last decades. In spite of these efforts, trade liberalisation of agricultural and processed food goods has been modest. It is argued that this lack of trade liberalisation is explained by the existence of governments that are politically biased, in the sense that they implement anti-trade policies in order to favour powerful sectors of the economy. While there is some evidence supporting this argument, it is difficult to assess how these biases influence agricultural trade patterns, because existing quantitative modelling approaches do not normally consider simultaneously the key aspects that characterise the food industry, such as intra-industry trade and the existence of intermediaries in the supply chain with significant market power, among others. The objective of this thesis is to offer an alternative theoretical model with the potential to accommodate these key aspects: an international trade network model that extends the framework developed by Goyal and Joshi (2006). The model was solved by means of simulations, and the results revealed that policy bias can indeed prevent trade liberalisation of agricultural and processed food goods. However, other factors were also identified that apparently have not been reported so far and that are related to the market power exercised by intermediaries. These are the position of a country in the trade network (i.e. a country occupying a central position in the network is less likely to support trade liberalisation, independently of any policy bias), the possibility that global free trade is an unlikely outcome, and the possibility that the world is trapped in an inefficient international trade network. 
The results also revealed that compensatory lump-sum payments across countries (i.e. inter-node transfers) or across sectors within a country (i.e. intra-node transfers) could be used as potential tools to achieve global free trade in agriculture, as they allow gainers from trade to compensate losers and, as a consequence, achieve Pareto-improving outcomes. 330
17	Ranking from Pairwise Comparisons : The Role of the Pairwise Preference Matrix Rajkumar, Arun January 2016 (has links) (PDF) Ranking a set of candidates or items from pairwise comparisons is a fundamental problem that arises in many settings, such as elections, recommendation systems, sports team rankings, document rankings and so on. Indeed, it is well known in the psychology literature that when a large number of items are to be ranked, it is easier for humans to give pairwise comparisons than complete rankings. The problem of ranking from pairwise comparisons has been studied in multiple communities, such as machine learning, operations research, linear algebra and statistics, and several algorithms (both classic and recent) have been proposed. However, it is not well understood under what conditions these different algorithms perform well. In this thesis, we aim to fill this fundamental gap by elucidating precise conditions under which different algorithms perform well, as well as by giving new algorithms that provably perform well under broader conditions. In particular, we consider a natural statistical model wherein for every pair of items (i, j) there is a probability P_ij such that each time items i and j are compared, item j beats item i with probability P_ij. Such models, which we summarize through a matrix containing all these pairwise probabilities, have been used explicitly or implicitly in much previous work in the area; we refer to the resulting matrix as the pairwise preference matrix, and elucidate clearly the crucial role it plays in determining the performance of various algorithms. In the first part of the thesis, we consider a natural generative model where all pairs of items can be sampled and where the underlying preferences are assumed to be acyclic. 
Under this setting, we elucidate the conditions on the pairwise preference matrix under which popular algorithms such as matrix Borda, spectral ranking, least squares and maximum likelihood under a Bradley-Terry-Luce (BTL) model produce optimal rankings that minimize the pairwise disagreement error. Specifically, we derive explicit sample complexity bounds for each of these algorithms to output an optimal ranking under interesting subclasses of the class of all acyclic pairwise preference matrices. We show that none of these popular algorithms is guaranteed to produce optimal rankings for all acyclic preference matrices. We then propose a novel support vector machine based rank aggregation algorithm that provably does so. In the second part of the thesis, we consider the setting where preferences may contain cycles. Here, finding a ranking that minimizes the pairwise disagreement error is in general NP-hard. However, even in the presence of cycles, one may wish to rank 'good' items ahead of the rest. We develop a framework for this setting using notions of winners based on tournament solution concepts from social choice theory. We first show that none of the existing algorithms is guaranteed to rank winners ahead of the rest for popular tournament-solution-based winners such as the top cycle, the Copeland set and the Markov set. We propose three algorithms (matrix Copeland, unweighted Markov and parametric Markov) that provably rank winners at the top for these popular tournament solutions. In addition to ranking winners at the top, we show that the rankings output by the matrix Copeland and parametric Markov algorithms also minimize the pairwise disagreement error for certain classes of acyclic preference matrices. Finally, in the third part of the thesis, we consider the setting where the number of items to be ranked is large and it is impractical to obtain comparisons among all pairs. 
Here, one samples a small set of pairs uniformly at random and compares each pair a fixed number of times; in particular, the goal is to come up with good algorithms that sample comparisons among only O(n log n) item pairs (where n is the number of items). Unlike existing results for such settings, which assume either a noisy permutation model (under which there is a true underlying ranking and the outcome of every comparison differs from the true ranking with some fixed probability) or a BTL or Thurstone model, we develop a general algorithmic framework based on ideas from matrix completion, termed low-rank pairwise ranking, which provably produces a good ranking by comparing only O(n log n) pairs, O(log n) times each, not only for popular classes of models such as BTL and Thurstone, but also for much more general classes of models wherein a suitable transform of the pairwise probabilities leads to a low-rank matrix; this subsumes the guarantees of many previous algorithms in this setting. Overall, our results help to understand at a fundamental level the statistical properties of various algorithms for the problem of ranking from pairwise comparisons and, under various natural settings, lead to novel algorithms with improved statistical guarantees compared to existing algorithms for this problem. Pairwise Comparison - Ranking Pairwise Preference Matrix Bradley Terry Luce Condition Algorithms Probability Models Bradley-Terry-Luce (BTL) Binary Choice Polytope (BCP) Triangle Inequality (TI) Computer Science
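A minimal sketch of two rules named in this abstract, using its convention that P[i][j] is the probability that item j beats item i. A Borda-style rule ranks items by their summed probability of beating every other item (the column sums of the matrix, diagonal excluded); a Copeland-style rule ranks them by how many opponents they beat with probability above 1/2. `estimate_preference_matrix` builds the empirical matrix from (winner, loser) outcomes. It sets unobserved pairs to 1/2, which is an assumption. Ties are broken by item index. These are simplified illustrations of the general rules, not the thesis's matrix Borda or matrix Copeland implementations.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def estimate_preference_matrix(n: int,
                               outcomes: Iterable[Tuple[int, int]]) -> List[List[float]]:
    """Empirical P[i][j] = share of i-vs-j comparisons won by j.

    `outcomes` holds (winner, loser) index pairs; unobserved pairs stay at 0.5.
    """
    wins = [[0] * n for _ in range(n)]  # wins[i][j] = number of times j beat i
    for winner, loser in outcomes:
        wins[loser][winner] += 1
    P = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total:
                P[i][j] = wins[i][j] / total
    return P


def borda_ranking(P: Matrix) -> List[int]:
    """Rank items by their total probability of beating each other item."""
    n = len(P)
    scores = [sum(P[i][j] for i in range(n) if i != j) for j in range(n)]
    return sorted(range(n), key=lambda j: (-scores[j], j))


def copeland_ranking(P: Matrix) -> List[int]:
    """Rank items by the number of opponents they beat with probability > 1/2."""
    n = len(P)
    wins = [sum(1 for i in range(n) if i != j and P[i][j] > 0.5) for j in range(n)]
    return sorted(range(n), key=lambda j: (-wins[j], j))
```

Usage: `borda_ranking(estimate_preference_matrix(n, outcomes))` ranks n items directly from raw comparison outcomes.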
18	Islands, Metapopulations, and Archipelagos: Genetic Equilibrium and Non-equilibrium Dynamics of Structured Populations in the Context of Conservation Reynolds, Robert Graham 01 May 2011 (has links) Understanding complex population dynamics is critical for both basic and applied ecology. Analysis of genetic data has been promoted as a way to reconstruct recent non-equilibrium processes that influence the apportioning of genetic diversity among populations of organisms. In a structured-deme context, where individual populations exist as geographically distinct units, island biogeography theory and metapopulation genetics predict that the demographic processes of extinction, colonization, and migration will affect the magnitude and rate of genetic divergence between demes. New methods have been developed that attempt to detect the influence of non-equilibrium dynamics in structured populations. I challenged two of these methods, decomposed pairwise regression and allele frequency analyses, using simulations of genetic data from structured demes. I found that these methods suffer from a high type II error rate, that is, they fail to reject the null hypothesis of mutation-migration-drift equilibrium for demes experiencing historical demographic events. In addition, island biogeography and metapopulation ecology predict that at equilibrium some species in a patch will be recent colonists, as equilibrium indicates a balance between colonization of the patch and extinction from the patch. Recent colonists are unlikely to have reached population mutation-migration-drift equilibrium; hence a paradox exists between population-level and community-level equilibrium. I used nuclear and mitochondrial genetic data from populations of two species of reptiles from the Turks and Caicos Islands, British West Indies, to test for patterns of equilibrium vs. non-equilibrium. 
I found unexpectedly shallow genetic divergence in the Turks Island boa (Epicrates chrysogaster), indicating that this species likely existed as a panmictic population during the last glaciation, prior to the inundation of the Turks and Caicos Banks. Because the methods I first tested using simulations proved unreliable, I used methods from phylogeography, landscape genetics, and island biogeography to detect significant non-equilibrium dynamics in the Turks and Caicos curly-tailed lizard (Leiocephalus psammodromus), finding evidence for high levels of biased gene flow. I propose that studies of genetic diversity on island archipelagos use tools from all three of these fields to evaluate empirical data in the context of equilibrium and the null hypotheses offered by island biogeography and population genetics theory. I frame the results in the context of both conservation and an understanding of equilibrium and non-equilibrium dynamics. Turks and Caicos Islands equilibrium/non-equilibrium historical demography metapopulation decomposed pairwise regression bottleneck Evolution Population Biology
19	Aspects of Composite Likelihood Inference Jin, Zi 07 March 2011 (has links) A composite likelihood consists of a combination of valid likelihood objects; in particular, lower-dimensional marginal likelihoods are typically adopted. Composite marginal likelihood appears to be an attractive alternative for modeling complex data and has received increasing attention for high-dimensional data sets when the joint distribution is computationally difficult to evaluate or intractable due to a complex dependence structure. We present some aspects of methodological development in composite likelihood inference. The resulting estimator enjoys desirable asymptotic properties such as consistency and asymptotic normality. Composite-likelihood-based test statistics and their asymptotic distributions are summarized. Higher-order asymptotic properties of the signed composite likelihood root statistic are explored. We also compare the accuracy and efficiency of composite likelihood estimation relative to estimation based on the ordinary likelihood. Analytical and simulation results are presented for different models, including multivariate normal distributions, time series models, and correlated binary data. Composite likelihood inference Pairwise likelihood Asymptotic relative efficiency Higher-order asymptotics 0463
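As a concrete, hypothetical instance of the pairwise likelihood named in the keywords, the sketch below estimates a common correlation ρ for an equicorrelated multivariate normal with known zero means and unit variances. It sums bivariate normal log densities over all coordinate pairs instead of evaluating the full joint density. The model, the function names, and the grid-search maximizer are illustrative assumptions, not code or results from the thesis.

```python
import math
import random
from itertools import combinations


def pairwise_stats(data):
    """Sufficient statistics for the pairwise likelihood: number of bivariate
    terms m, a = sum(x^2 + y^2), and b = sum(x * y) over all coordinate pairs."""
    d = len(data[0])
    m, a, b = 0, 0.0, 0.0
    for row in data:
        for j, k in combinations(range(d), 2):
            x, y = row[j], row[k]
            m += 1
            a += x * x + y * y
            b += x * y
    return m, a, b


def pairwise_loglik(rho, stats):
    """Composite log-likelihood: sum of standard bivariate normal log densities
    with correlation rho, log phi2 = -log(2 pi) - log(1 - rho^2)/2
    - (x^2 - 2 rho x y + y^2) / (2 (1 - rho^2))."""
    m, a, b = stats
    one_minus = 1.0 - rho * rho
    return (-m * math.log(2 * math.pi)
            - 0.5 * m * math.log(one_minus)
            - (a - 2 * rho * b) / (2 * one_minus))


def fit_rho(data, step=0.001, bound=0.999):
    """Maximize the pairwise log-likelihood over a grid on [-bound, bound]."""
    stats = pairwise_stats(data)
    k = round(bound / step)
    grid = [i * step for i in range(-k, k + 1)]
    return max(grid, key=lambda r: pairwise_loglik(r, stats))


def simulate_equicorrelated(n, d, rho, seed=0):
    """Draw n rows of a d-variate normal with unit variances and common
    correlation rho, via a shared factor (assumes 0 <= rho < 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        rows.append([math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(d)])
    return rows


if __name__ == "__main__":
    data = simulate_equicorrelated(n=2000, d=4, rho=0.5, seed=1)
    print(fit_rho(data))
```

Each bivariate term depends on the data only through Σx², Σy², and Σxy, so the pairwise log-likelihood collapses to three running sums. A grid search is used here for simplicity; in practice a one-dimensional optimizer would do the same job faster.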

Search results