Global ETD Search

111	Commutative n-ary Arithmetic Bingham, Aram 15 May 2015 (has links) Motivated by primality and integer factorization, this thesis introduces generalizations of standard binary multiplication to commutative n-ary operations based upon geometric construction and representation. This class of operations is constructed to preserve commutativity and identity, so that binary multiplication is included as a special case and relationships with ordinary multiplicative number theory are preserved. This leads to a study of their expression in terms of elementary symmetric polynomials, and connections are made to results from the theory of polyadic (n-ary) groups. Higher-order operations yield wider factorization and representation possibilities, which correspond to reductions in the set of primes as well as tiered notions of primality. This comes at the expense of familiar algebraic properties such as associativity and unique factorization. Criteria for primality and a naive testing algorithm are given for the ternary arithmetic, drawing heavily upon modular arithmetic. Finally, connections with the theory of partitions of integers and quadratic forms are discussed in relation to questions about the cardinality of primes. number theory primality factorization ternary operation partitions Algebra Discrete Mathematics and Combinatorics Number Theory
112	Recherche de similarité dans du code source / Looking for similarity in source code Chilowicz, Michel 25 November 2010 (has links) Source code duplication has several origins, such as copying and adaptation between projects or cloning within a single project. Finding matches between copied code makes it possible to factor the code out within a project or to expose cases of plagiarism. We study static similarity retrieval methods for source code that may have undergone edit operations such as insertion, deletion, and transposition, as well as inlining and outlining of functions. Sequence similarity methods from genomics are examined and adapted to find clones in tokenized source code. After a discussion of token alignment and n-gram fingerprint lookup techniques, we present a factorization method that merges the function call graphs of projects into a single graph, introducing synthetic functions that express nested matches. It relies on suffix indexing structures to find repeated token factors. A second line of investigation, aimed at handling large indexed code bases, uses syntax trees: similar subtrees are found by hashing and indexing them under varying abstraction profiles. Exact subtree clones that lie close together in their host trees can then be merged to obtain approximate and extended matches. Before and after match retrieval, similarity metrics are defined to preselect the regions to examine, refine the search, or present the results more clearly. Obfuscation Duplication Factorization Token sequence Syntax tree Plagiarism
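The n-gram fingerprint lookup mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the regular-expression tokenizer, the small keyword set, and the function names (`tokenize`, `ngram_index`, `jaccard`) are all assumptions made for illustration. Identifiers and numbers are abstracted to placeholder tokens so that renaming variables does not hide a clone.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude tokenizer: words, integers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}

def tokenize(source):
    """Turn source text into tokens, abstracting identifiers and numbers."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")    # any identifier
        elif tok[0].isdigit():
            tokens.append("NUM")   # any integer literal
        else:
            tokens.append(tok)     # punctuation and operators
    return tokens

def ngram_index(tokens, n):
    """Map each n-gram of tokens to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        index[tuple(tokens[i:i + n])].append(i)
    return index

def jaccard(src_a, src_b, n=5):
    """Jaccard similarity of the two sources' n-gram sets."""
    a = set(ngram_index(tokenize(src_a), n))
    b = set(ngram_index(tokenize(src_b), n))
    return len(a & b) / len(a | b) if a | b else 0.0

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s\n"
renamed = "def sum_all(values):\n    acc = 0\n    for v in values:\n        acc = acc + v\n    return acc\n"
print(jaccard(original, renamed))
```

Because both snippets reduce to the same token sequence, every n-gram is shared. A production tool would use a real lexer and hashed fingerprints rather than raw tuples.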
113	A wikification prediction model based on the combination of latent, dyadic and monadic features / Um modelo de previsão para Wikification baseado na combinação de atributos latentes, diádicos e monádicos Ferreira, Raoni Simões 25 April 2016 (has links) Most reference information today is found in repositories of semantically linked documents, created collaboratively and freely available on the web. Among the many problems faced by content providers in these repositories, one of the most important is Wikification, that is, the placement of links in the articles. These links have to support user navigation and should provide a deeper semantic interpretation of the content. Wikification is a hard task, since the continuous growth of such repositories makes it increasingly demanding for editors. As a consequence, their focus shifts away from content creation, which should be their main objective. This has motivated the design of automatic Wikification tools which, traditionally, address two distinct problems: (a) how to identify which words (or phrases) in an article should be selected as anchors and (b) how to determine which article the link associated with the anchor should point to. Most methods in the literature that address these problems are based on machine learning approaches that attempt to capture, through statistical features, characteristics of the concepts and their associations. Although these strategies treat the repository as a graph of concepts, they normally take limited advantage of the topological structure of this graph, as they describe it by means of human-engineered statistical link features. Despite the effectiveness of these machine learning methods, better models could take fuller advantage of the topology by describing it with data-oriented approaches such as matrix factorization. This has been done successfully in other domains, such as movie recommendation. In this work, we fill this gap by proposing a wikification prediction model that combines the strengths of traditional predictors based on statistical features with a latent component that models the concept graph topology by means of matrix factorization. Comparing our model with a state-of-the-art wikification method on a sample of Wikipedia articles, we obtained a gain of up to 13% in F1. We also provide a comprehensive analysis of the model's performance, showing the importance of the latent predictor component and of the attributes derived from the associations between concepts. The study also analyzes the impact of ambiguous concepts, which allows us to conclude that the model is resilient to ambiguity even though it does not include an explicit disambiguation phase. Finally, we study the impact of selecting training samples from specific content quality classes, information that is available in some repositories, such as Wikipedia. We show empirically that the quality of the training samples affects precision and overlinking: training with high-quality documents improves the precision of the method and reduces unnecessary links, compared with training on samples of random quality. Link prediction Machine learning Matrix factorization Wikification Wikipedia
114	Desenvolvimento de preditores para recomendação automática de produtos. / Development of predictors for automated products recommendation. Fuks, Willian Jean 28 May 2013 (has links) With the growth of the internet, new types of business have emerged. One example is online advertising: publishers of websites and other content can dedicate part of a page to displaying ads from various stores in exchange for a fee paid by the advertiser. This work takes place in that context. Its main goal is the development of algorithms that predict the probability that a given user will be interested in, and click on, an ad he or she is shown. This problem is known as CTR (click-through rate) prediction. To that end, a logistic regression approach is combined with matrix factorization techniques that predict, through latent factors, the probability of a click on an ad displayed on a given site. In addition, tests with a dynamic (time-dependent) strategy are presented, indicating that the performance obtained can be improved further. To the author's knowledge, this is the first time this procedure has been reported in the CTR prediction literature. Computational advertising CTR predictor Artificial intelligence Machine learning Matrix factorization Online advertising Predictors (Development) Products SVD
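As a rough illustration of combining logistic regression with latent factors for click prediction, the sketch below passes a bias plus a dot product of user and ad factors through a sigmoid and trains it by stochastic gradient descent on the log loss. It is a minimal sketch under assumptions, not the thesis's model: the function name `train_ctr`, the hyperparameters, and the toy impression log are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_ctr(impressions, n_users, n_ads, rank=2, epochs=200,
              lr=0.1, reg=0.01, seed=0):
    """Fit P(click) = sigmoid(bias + P[u] . Q[a]) by SGD on the log loss.

    impressions: list of (user, ad, clicked) with clicked in {0, 1}.
    Returns a predict(user, ad) function and the mean loss per epoch.
    """
    rng = random.Random(seed)
    bias = 0.0
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_ads)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for u, a, clicked in impressions:
            p = sigmoid(bias + sum(x * y for x, y in zip(P[u], Q[a])))
            total -= math.log(p if clicked else 1.0 - p)
            err = p - clicked              # gradient of the log loss w.r.t. the logit
            bias -= lr * err
            for k in range(rank):
                pu, qa = P[u][k], Q[a][k]
                P[u][k] -= lr * (err * qa + reg * pu)
                Q[a][k] -= lr * (err * pu + reg * qa)
        losses.append(total / len(impressions))

    def predict(u, a):
        return sigmoid(bias + sum(x * y for x, y in zip(P[u], Q[a])))

    return predict, losses

# Toy log (made up): users 0-1 click ads 0-1 only, users 2-3 click ads 2-3 only.
impressions = [(u, a, int((u < 2) == (a < 2))) for u in range(4) for a in range(4)]
predict, losses = train_ctr(impressions, n_users=4, n_ads=4)
print(losses[0], losses[-1], predict(0, 0), predict(0, 2))
```

A realistic predictor would add observed features (site, position, time) as ordinary logistic regression terms next to the latent term and would be evaluated on held-out impressions.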
115	Factorisation de matrices et analyse de contraste pour la recommandation / Matrix Factorization and Contrast Analysis Techniques for Recommendation Aleksandrova, Marharyta 07 July 2017 (has links) In many application areas, data elements can be high-dimensional, which raises the problem of dimensionality reduction. Dimensionality reduction techniques can be classified by their aim (optimal data representation or classification) and by their strategy (feature selection or feature extraction). The set of features produced by feature extraction methods is usually uninterpretable. The first research question of the thesis is therefore how to extract interpretable latent features. Dimensionality reduction for classification aims to enhance the classification power of the selected subset of features. We view the development of the classification task as the task of identifying trigger factors, that is, factors that can influence the transfer of data elements from one class to another. The second research question of the thesis is how to identify these trigger factors automatically. We address both questions within the application domain of recommender systems. We propose to interpret the latent features of matrix factorization-based recommender systems as real users. We design an algorithm for the automatic identification of trigger factors based on the concepts of contrast analysis. Through experiments, we show that the patterns so defined can indeed be considered trigger factors. Data mining Matrix factorization Recommender systems 006.312
116	Statistical Methods for Characterizing Genomic Heterogeneity in Mixed Samples Zhang, Fan 12 December 2016 (has links) Recently, sequencing technologies have generated massive and heterogeneous data sets. However, interpreting these data sets is a major barrier to understanding genomic heterogeneity in complex diseases. In this dissertation, we develop a Bayesian statistical method for single nucleotide level analysis and a global optimization method for gene expression level analysis to characterize genomic heterogeneity in mixed samples. The detection of rare single nucleotide variants (SNVs) is important for understanding genetic heterogeneity using next-generation sequencing (NGS) data. Various computational algorithms have been proposed to detect variants at the single nucleotide level in mixed samples. Yet the noise inherent in the biological processes involved in NGS technology necessitates the development of statistically accurate methods to identify true rare variants. At the single nucleotide level, we propose a Bayesian probabilistic model and a variational expectation maximization (EM) algorithm to estimate non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm has sensitivity and specificity comparable to those of a Markov Chain Monte Carlo (MCMC) sampling inference algorithm, and is more computationally efficient on tests of relatively low coverage (27x and 298x) data. Furthermore, we show that our model with a variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a directed evolution longitudinal yeast data set, we are able to identify a time-series trend in non-reference allele frequency and detect novel variants that have not yet been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, as well as a pair of concomitant variants. Characterization of heterogeneity in gene expression data is a critical challenge for personalized treatment and drug resistance due to intra-tumor heterogeneity. Mixed membership factorization has become popular for analyzing data sets that have within-sample heterogeneity. In recent years, several algorithms have been developed for mixed membership matrix factorization, but they only guarantee estimates from a local optimum. At the gene expression level, we derive a global optimization (GOP) algorithm that provides a guaranteed epsilon-global optimum for a sparse mixed membership matrix factorization problem for molecular subtype classification. We test the algorithm on simulated data and find that it always bounds the global optimum across random initializations and explores multiple modes efficiently. The GOP algorithm is well suited to parallel computation in the key optimization steps. Rare variant detection Next-generation sequencing Bayesian statistics Variational inference Global optimization Matrix factorization
117	Using hyperbolic tangents in integer factoring Pinter, Ron Y January 1980 (has links) Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING / Bibliography: leaf 45. / by Ron Yair Pinter. / M.S. Factorization (Mathematics) Hyperbola Number theory Algorithms
118	Functions of structured matrices Arslan, Bahar January 2017 (has links) The growing interest in computing structured matrix functions stems from the fact that preserving and exploiting the structure of matrices can help us obtain physically meaningful solutions with lower computational cost and memory requirements. The work presented here is divided into two parts. The first part deals with the computation of functions of structured matrices. The second part is concerned with structured error analysis in the computation of matrix functions. We present algorithms that apply the inverse scaling and squaring method and use the Schur-like form of symplectic matrices, as an alternative to algorithms that use the Schur decomposition, to compute the logarithm of symplectic matrices. There are two main calculations in the inverse scaling and squaring method: taking a square root and evaluating the Padé approximants. Numerical experiments suggest that using the Schur-like form with structure-preserving iterations for the square root helps us exploit the Hamiltonian structure of the logarithm of symplectic matrices. Some types of matrices are nearly structured. We discuss the conditions for using the nearest structured matrix in place of the nearly structured one by analysing the forward error bounds. Since structure-preserving algorithms for computing functions of matrices provide advantages in terms of accuracy and data storage, we suggest computing the function of the nearest structured matrix. The analysis is applied to nearly unitary, nearly Hermitian and nearly positive semi-definite matrices for the matrix logarithm, square root, exponential, cosine and sine functions. It is important to investigate the effect of structured perturbations in the sensitivity analysis of matrix functions. We study the structured condition number of matrix functions defined between smooth square matrix manifolds. 
We develop algorithms for computing and estimating the structured condition number. We also present lower and upper bounds on the structured condition number, which are cheaper to compute than the "exact" structured condition number. We observe that the lower bounds give a good estimate of the structured condition numbers. Comparing the structured and unstructured condition numbers reveals that they can differ by several orders of magnitude. Having discussed how to compute the structured condition number of matrix functions defined between smooth square matrix manifolds, we apply the theory of structured condition numbers to structured matrix factorizations. We measure the sensitivity of matrix factors to structured perturbations for the structured polar decomposition, the structured sign factorization and the generalized polar decomposition. Finally, we consider unstructured perturbation analysis for the canonical generalized polar decomposition using three different methods. Apart from the theoretical aspect of the perturbation analysis, the perturbation bounds obtained from these methods are compared numerically, and our findings show an improvement in sharpness over the perturbation bounds in the literature. 510
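The inverse scaling and squaring idea described in this abstract (take repeated square roots until the matrix is close to the identity, approximate the logarithm there, then scale back by a power of two) can be sketched for 2x2 matrices in plain Python. This is a sketch under stated assumptions, not the thesis's algorithm: it uses the Denman-Beavers iteration for the square root instead of a structure-preserving Schur-like method, and a truncated Taylor series for log(I + X) instead of Padé approximants. All function names are hypothetical.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B, s=1.0):
    """Return A + s*B."""
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def norm(A):
    """Infinity norm (maximum absolute row sum)."""
    return max(abs(row[0]) + abs(row[1]) for row in A)

def sqrtm_db(A, iters=30):
    """Principal square root by the Denman-Beavers iteration."""
    Y, Z = A, I2
    for _ in range(iters):
        Y, Z = ([[0.5 * v for v in r] for r in add(Y, inv(Z))],
                [[0.5 * v for v in r] for r in add(Z, inv(Y))])
    return Y

def logm_iss(A, tol=0.25, terms=30):
    """log(A) = 2^k * log(A^(1/2^k)), with a Taylor series near the identity."""
    k = 0
    while norm(add(A, I2, -1.0)) > tol:
        A = sqrtm_db(A)
        k += 1
    X = add(A, I2, -1.0)
    L = [[0.0, 0.0], [0.0, 0.0]]
    power = I2
    for j in range(1, terms + 1):
        power = mul(power, X)
        L = add(L, power, (-1.0) ** (j + 1) / j)   # + (-1)^(j+1) X^j / j
    return [[(2 ** k) * v for v in row] for row in L]

S = [[2.0, 1.0], [1.0, 1.0]]   # determinant 1, so symplectic in the 2x2 case
L = logm_iss(S)
print(L, L[0][0] + L[1][1])
```

For 2x2 matrices, symplectic means determinant 1 and Hamiltonian means trace 0, so the printed trace should be zero up to rounding error. A structure-preserving method aims to keep that property exactly rather than approximately.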
119	Analýza útoků na asymetrické kryptosystémy / Analysis of attacks on asymmetric cryptosystems Tvaroh, Tomáš January 2011 (has links) This thesis analyzes various attacks on the computational problems underlying asymmetric cryptosystems. The first part introduces the two problems on which asymmetric cryptography is most often based: integer factorization and computation of the discrete logarithm. Algorithms for solving these problems are described, and for each of them there is a discussion of when its use is appropriate and when it is not. In the next part, the computational problems are related to the RSA and ECC algorithms, and it is shown how solving the underlying problem enables us to break the cipher. As part of this thesis, an application was developed that measures the efficiency of the described attacks; by providing an easy-to-understand enumeration of each algorithm's steps, it can also be used to demonstrate how an attack works. Based on the results of the analysis, the most secure asymmetric cryptosystem is selected, along with some recommendations regarding key pair generation.
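The link between integer factorization and breaking RSA can be made concrete with a toy example. The sketch below is not the application developed in the thesis: it factors a deliberately tiny modulus with Pollard's rho method (one of several possible factoring algorithms), then recomputes the private exponent from the factors. The function names and the textbook-sized key (n = 3233, e = 17) are chosen only for illustration, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import math

def pollard_rho(n):
    """Return a non-trivial factor of a composite n (Floyd cycle detection)."""
    if n % 2 == 0:
        return 2
    for c in range(1, n):                  # retry with a new polynomial on failure
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % n            # tortoise: one step
            y = (y * y + c) % n            # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:
            return d
    raise ValueError("no factor found; n may be prime")

def crack_rsa(n, e):
    """Recover (p, q, d) from a public key by factoring the modulus."""
    p = pollard_rho(n)
    q = n // p
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                    # modular inverse of e mod phi
    return p, q, d

n, e = 3233, 17                            # toy public key, n = 53 * 61
message = 65
ciphertext = pow(message, e, n)
p, q, d = crack_rsa(n, e)
recovered = pow(ciphertext, d, n)
print(p, q, d, recovered)
```

Real RSA moduli are thousands of bits long, far beyond the reach of this method, which is why the security of the scheme rests on the difficulty of factoring.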
