Global ETD Search

221	Evaluation of relational algebra queries on probabilistic databases : tractability and approximation Fink, Robert D. January 2014 Query processing is a core task in probabilistic databases: given a query and a database that encodes uncertainty in data by means of probability distributions, the problem is to compute possible query answers together with their respective probabilities of being correct. This thesis advances the state of the art in two aspects of query processing in probabilistic databases: complexity analysis and query evaluation techniques. A dichotomy is established for non-repeating, conjunctive relational algebra queries with negation that separates #P-hard queries from those with PTIME data complexity. A framework for computing probabilities of relational algebra queries is presented; the probability computation algorithm is based on decomposition methods and provides exact answers in the case of exhaustive decompositions, or anytime approximate answers with absolute or relative error guarantees in the case of partial decompositions. The framework is extended to queries with aggregation operators. An experimental evaluation of the proposed algorithms' implementations within the SPROUT query engine complements the theoretical results. The SPROUT² system uses this query engine to compute answers to queries on uncertain, tabular Web data. 005.74
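To make the task in entry 221 concrete, the sketch below computes the probability of a single query answer by enumerating every possible world. This is a brute-force baseline and not the decomposition-based algorithm of the thesis. It assumes independent tuples; the function name `lineage_probability`, the variable names, and the example lineage are all hypothetical.

```python
from itertools import product


def lineage_probability(var_probs, lineage):
    """Exact probability that `lineage` is true, by enumerating possible worlds.

    var_probs: dict mapping a tuple variable to its marginal probability
               (assumption: tuples are independent of each other).
    lineage:   function taking a dict {variable: bool} and returning a bool.
    """
    names = sorted(var_probs)
    total = 0.0
    # Each assignment of True/False to the variables is one possible world.
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            p = var_probs[name]
            weight *= p if world[name] else 1.0 - p
        if lineage(world):
            total += weight
    return total


# Hypothetical lineage of one answer tuple: (r1 AND s1) OR (r2 AND s1).
probs = {"r1": 0.5, "r2": 0.5, "s1": 0.8}
answer_prob = lineage_probability(
    probs, lambda w: (w["r1"] and w["s1"]) or (w["r2"] and w["s1"])
)
```

The enumeration visits 2^n worlds for n variables, which is why exact evaluation is only feasible for small inputs and why tractability results and approximation guarantees matter for larger ones.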
222	Probabilistic modelling of morphologically rich languages Botha, Jan Abraham January 2014 This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes. 410.285
223	Global connectivity, information diffusion, and the role of multilingual users in user-generated content platforms Hale, Scott A. January 2014 Internet content and Internet users are becoming more linguistically diverse as more people speaking different languages come online and produce content on user-generated content platforms. Several platforms have emerged as truly global platforms with users speaking many different languages and coming from around the world. It is now possible to study human behavior on these platforms using the digital trace data the platforms make available about the content people are authoring. Network literature suggests that people cluster together by language, but also that there is a small average path length between any two people on most Internet platforms (including two speakers of different languages). If so, multilingual users may play critical roles as bridges or brokers on these platforms by connecting clusters of monolingual users together across languages. The large differences in the content available in different languages online underscore the importance of such roles. This thesis studies the roles of multilingual users and platform design on two large, user-generated content platforms: Wikipedia and Twitter. It finds that language plays a strong role in structuring each platform, that multilingual users do act as linguistic bridges subject to certain limitations, that the size of a language correlates with the roles its speakers play in cross-language connections, and that there is a correlation between activity and multilingualism. In contrast to the general understanding in linguistics of high levels of multilingualism offline, this thesis finds relatively low levels of multilingualism on Twitter (11%) and Wikipedia (15%). The findings have implications for both platform design and social network theory. 
The findings suggest design strategies to increase multilingualism online through the identification and promotion of multilingual starter tasks, the discovery of related other-language information, and the promotion of user choice in linguistic filtering. While weak ties have received much attention in the social networks literature, cross-language ties are often not distinguished from same-language weak ties. This thesis finds that cross-language ties are similar to same-language weak ties in that both connect distant parts of the network, have limited bandwidth, and yet transfer a non-trivial amount of information when considered in aggregate. At the same time, cross-language ties are distinct from same-language weak ties for the purposes of information diffusion. In general, cross-language ties are fewer in number than same-language ties, but each cross-language tie may convey more diverse information given the large differences in the content available in different languages and the relative ease with which a multilingual speaker may access content in multiple languages compared to a monolingual speaker. 006.3
224	Improper colourings of graphs Kang, Ross J. January 2008 We consider a generalisation of proper vertex colouring of graphs, referred to as improper colouring, in which each vertex can only be adjacent to a bounded number t of vertices with the same colour, and we study this type of graph colouring problem in several different settings. The thesis is divided into six chapters. In Chapter 1, we outline previous work in the area of improper colouring. In Chapters 2 and 3, we consider improper colouring of unit disk graphs -- a topic motivated by applications in telecommunications -- and take two approaches, first an algorithmic one and then an average-case analysis. In Chapter 4, we study the asymptotic behaviour of the improper chromatic number for the classical Erdős-Rényi model of random graphs. In Chapter 5, we discuss acyclic improper colourings, a specialisation of improper colouring, for graphs of bounded maximum degree. Finally, in Chapter 6, we consider another type of colouring, frugal colouring, in which no colour appears more than a bounded number of times in any neighbourhood. Throughout the thesis, we will observe a gradient of behaviours: for random unit disk graphs and "large" unit disk graphs, we can greatly reduce the required number of colours relative to proper colouring; in Erdős-Rényi random graphs, we do gain some improvement but only when t is relatively large; for acyclic improper chromatic numbers of bounded degree graphs, we discern an asymptotic difference in only a very narrow range of choices for t. 511
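The definition of improper colouring in entry 224 can be checked mechanically. The sketch below is a minimal illustration and is not taken from the thesis; the function name `is_t_improper_colouring` and the adjacency-list representation are assumptions made for the example.

```python
def is_t_improper_colouring(adjacency, colour, t):
    """Return True if every vertex has at most t neighbours of its own colour.

    adjacency: dict mapping each vertex to a list of its neighbours.
    colour:    dict mapping each vertex to its colour.
    t:         allowed number of same-coloured neighbours (t = 0 is proper colouring).
    """
    for v, neighbours in adjacency.items():
        same = sum(1 for u in neighbours if colour[u] == colour[v])
        if same > t:
            return False
    return True


# A triangle needs three colours when t = 0, but two colours suffice when t = 1.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
two_colours = {0: "a", 1: "a", 2: "b"}
```

Setting t = 0 recovers ordinary proper colouring, which is the sense in which improper colouring is a generalisation.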
225	Análise bayesiana de dados espaciais explorando diferentes estruturas de variância [Bayesian analysis of spatial data exploring different variance structures] Rampaso, Renato Couto [UNESP] 11 August 2014 In disease mapping, the overall goal is to study the incidence or mortality risk of a specific disease across a set of geographical regions. It is common to assume that the response variable, generally a count, follows a Poisson distribution whose mean rate can be explained by a group of covariates and a random effect. For this random effect, conditional autoregressive (CAR) models are considered, which carry information about the neighborhood relationships between the regions. These neighborhood relationships are expressed through the variance matrix present in the models. Each CAR model has distinct characteristics that serve different purposes to be considered by the researcher. The focus of this dissertation was the study and comparison of some conditional autoregressive models proposed in the literature. To better understand the characteristics of each model, two applications with epidemiological data were conducted to model the risk of death due to Crohn's disease and ulcerative colitis, and due to cancer of the trachea, bronchi and lungs, in the State of São Paulo from 2008 to 2012... Computer science - Mathematics. Mortality - 2008 to 2012. Crohn's disease. Ulcerative colitis - 2008 to 2012. Trachea - Cancer - 2008 to 2012. Diseases - Statistics - 2008 to 2012. Bayesian statistical decision theory.
226	Estudo sobre construção de escalas com base na Teoria da Resposta ao Item : avaliação de proficiência em conteúdos matemáticos básicos [A study on scale construction based on Item Response Theory: assessing proficiency in basic mathematical content] / Fujii, Tânia Robaskiewicz Coneglian. January 2018 Advisor: Aparecida Donizete Pires de Souza / Committee member: Adriano Ferreti Borgatto / Committee member: Mario Hissamitsu Tarumoto / Abstract: This work studies the construction of scales based on Item Response Theory (IRT), resulting in the construction and pedagogical interpretation of a knowledge scale to measure proficiency in the mathematical content that incoming students in exact sciences courses need in order to follow calculus and similar subjects. The mathematical model adopted in this research was the unidimensional three-parameter logistic model. The item parameters and the respondents' proficiencies were estimated under a Bayesian approach using the Gibbs sampler, an algorithm of the Markov chain Monte Carlo (MCMC) class, implemented in the OpenBUGS software (Bayesian inference Using Gibbs Sampling), which is aimed at Bayesian analysis of complex models. The BILOG-MG software was also used to compare the results. The instrument used to measure knowledge was a test composed of thirty-six multiple-choice items, each with five alternatives, only one of which is correct. The items were written from a reference matrix built for this purpose and divided into three themes: "space and form", "quantities and measures" and "numbers and operations/algebra and functions". Each theme is composed of competencies, and each competency describes a skill to be measured. To construct the proposed scale, we chose to adopt a scale with a mean of 250 and a standard deviation of 50. In this scale, we selected levels to be i... / Master's degree Computer science - Mathematics. Mathematicians. Aptitude. Mathematical statistics. Statistics. Data modeling. Bayesian inference. Scale segmentation.
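For readers unfamiliar with the three-parameter logistic model named in entry 226, the sketch below evaluates its standard textbook form and maps a proficiency onto a scale with mean 250 and standard deviation 50. It is an illustration only: the function names are hypothetical, the parameter values are invented, and the linear transformation assumes proficiencies are estimated on a standard scale (mean 0, standard deviation 1), which the abstract does not state.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: respondent proficiency
    a:     item discrimination
    b:     item difficulty
    c:     pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def to_reporting_scale(theta, mean=250.0, sd=50.0):
    """Linear map from an assumed standard (0, 1) proficiency scale to (250, 50)."""
    return mean + sd * theta


# Hypothetical item: c = 0.2 would correspond to blind guessing among five alternatives.
p = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
score = to_reporting_scale(1.0)
```

When the proficiency equals the item difficulty, the probability lies halfway between the guessing level c and 1, which is a quick sanity check on any estimated item parameters.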
227	Reconhecimento de contorno de edifício em imagens de alta resolução usando os momentos complexos de Zernike [Building contour recognition in high-resolution images using complex Zernike moments] Imada, Renata Nagima [UNESP] 24 October 2014 Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / This research studied a method for recognizing building roof contours in high-resolution digital images that classifies them with respect to their shape. The method is based on Zernike moments, which are built on the orthogonal Zernike polynomials, and creates a feature vector for each region of the image. The image must first be segmented so that its objects are divided into different regions. This method of shape description is based on the area of the object of interest, and the moments are invariant under the geometric transformations of rotation, translation and scaling, which makes the method attractive for the proposed image analysis problem. A database containing sketches (or models) of possible appearances of building roof contours in a given scene was created, and a Zernike feature vector was associated with each of these sketches. The Euclidean distance between this vector and the feature vector calculated from a segmented region of the image then indicates whether the given region corresponds to a building contour or to another object. The capacity of the proposed method to discriminate between different building shapes, and between building shapes and non-building shapes, was evaluated experimentally and showed positive results. Computer science - Mathematics. Zernike, Frits 1888-1966. Buildings. Physics. Digital images. Remote sensing images.
228	Estabilidade de sistemas dinâmicos: Estudo do memristor [Stability of dynamical systems: a study of the memristor] Moreira, Marília Davoli [UNESP] 15 April 2014 This work presents a detailed study of the stability of the equilibrium points of some mathematical models that represent the behavior of an electrical circuit containing a memristor in addition to other electrical components. The models consist of systems of ordinary differential equations of third and fourth order involving piecewise linear functions. Knowledge of results on the zeros of polynomials is of fundamental importance in this process, because the stability analysis of these systems is related to determining the eigenvalues of the system's coefficient matrix. The Routh-Hurwitz criterion is used in this study. Hurwitz, Adolf 1859-1919. Routh, Edward John 1831-1907. Computer science - Mathematics. Memristors. Linear time-invariant systems. Stability. Matrices (Mathematics).
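As a small illustration of the Routh-Hurwitz criterion mentioned in entry 228, the sketch below tests the standard conditions for a monic cubic characteristic polynomial, the case that arises for a third-order system. It is not taken from the thesis; the function name and the example coefficients are hypothetical.

```python
def cubic_is_hurwitz_stable(a2, a1, a0):
    """Routh-Hurwitz test for the monic cubic s^3 + a2*s^2 + a1*s + a0.

    All roots have negative real part exactly when
    a2 > 0, a0 > 0 and a2 * a1 > a0 (these together imply a1 > 0).
    """
    return a2 > 0 and a0 > 0 and a2 * a1 > a0


# (s + 1)^3 = s^3 + 3s^2 + 3s + 1 has a triple root at -1, so it is stable.
stable_example = cubic_is_hurwitz_stable(3.0, 3.0, 1.0)
# s^3 + s^2 + s + 2 violates a2 * a1 > a0, so it has a root with non-negative real part.
unstable_example = cubic_is_hurwitz_stable(1.0, 1.0, 2.0)
```

The appeal of the criterion is that stability of an equilibrium can be decided from the coefficients of the characteristic polynomial alone, without computing the eigenvalues explicitly.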
229	Análise bayesiana de dados espaciais explorando diferentes estruturas de variância [Bayesian analysis of spatial data exploring different variance structures] / Rampaso, Renato Couto. January 2014 Advisor: Aparecida Doniseti Pires de Souza / Co-advisor: Edilson Ferreira Flores / Committee member: Vilma Mayumi Tachibana / Committee member: Ricardo Sandes Ehlers / Abstract: In disease mapping, the overall goal is to study the incidence or mortality risk of a specific disease across a set of geographical regions. It is common to assume that the response variable, generally a count, follows a Poisson distribution whose mean rate can be explained by a group of covariates and a random effect. For this random effect, conditional autoregressive (CAR) models are considered, which carry information about the neighborhood relationships between the regions. These neighborhood relationships are expressed through the variance matrix present in the models. Each CAR model has distinct characteristics that serve different purposes to be considered by the researcher. The focus of this dissertation was the study and comparison of some conditional autoregressive models proposed in the literature. To better understand the characteristics of each model, two applications with epidemiological data were conducted to model the risk of death due to Crohn's disease and ulcerative colitis, and due to cancer of the trachea, bronchi and lungs, in the State of São Paulo from 2008 to 2012... / Master's degree Computer science - Mathematics. Mortality - 2008 to 2012. Crohn's disease. Ulcerative colitis - 2008 to 2012. Trachea - Cancer - 2008 to 2012. Diseases - Statistics - 2008 to 2012.
230	A framework for processing correlated probabilistic data van Schaik, Sebastiaan Johannes January 2014 The amount of digitally-born data has surged in recent years. In many scenarios, this data is inherently uncertain (or: probabilistic), such as data originating from sensor networks, image and voice recognition, location detection, and automated web data extraction. Probabilistic data requires novel and different approaches to data mining and analysis, which explicitly account for the uncertainty and the correlations therein. This thesis introduces ENFrame: a framework for processing and mining correlated probabilistic data. Using this framework, it is possible to express both traditional and novel algorithms for data analysis in a special user language, without having to explicitly address the uncertainty of the data on which the algorithms operate. The framework will subsequently execute the algorithm on the probabilistic input, and perform exact or approximate parallel probability computation. During the probability computation, correlations and provenance are succinctly encoded using probabilistic events. This thesis contains novel contributions in several directions. An expressive user language – a subset of Python – is introduced, which allows a programmer to implement algorithms for probabilistic data without requiring knowledge of the underlying probabilistic model. Furthermore, an event language is presented, which is used for the probabilistic interpretation of the user program. The event language can succinctly encode arbitrary correlations using events, which are the probabilistic counterparts of deterministic user program variables. These highly interconnected events are stored in an event network, a probabilistic interpretation of the original user program. Multiple techniques for exact and approximate probability computation (with error guarantees) of such event networks are presented, as well as techniques for parallel computation. 
Adaptations of multiple existing data mining algorithms are shown to work in the framework, and are subsequently subjected to an extensive experimental evaluation. Additionally, a use-case is presented in which a probabilistic adaptation of a clustering algorithm is used to predict faults in energy distribution networks. Lastly, this thesis presents techniques for integrating a number of different probabilistic data formalisms for use in this framework and in other applications. 519.20285
