Global ETD Search

291	Identificação de autoridades em tópicos na blogosfera brasileira usando comentários como relacionamento / Topical authority identification in the Brazilian blogosphere using comments as relationships
Santos, Henrique Dias Pereira dos. January 2013.
As more users access the Internet in Brazil, the amount of content produced by Brazilians grows. It therefore becomes important to identify the best authors, so that readers can place more confidence in the texts they read. This dissertation studies the discovery of topic authorities in the Brazilian blogosphere. The scope of the study is the Blogspot publishing platform, focusing on bloggers who identify themselves as Brazilian. To this end, nine million posts from 2012 were collected, and comments were treated as relationships between bloggers in order to generate a social network. This network was used in experiments with the proposed approach for identifying topic authorities. The algorithm is based on Topic PageRank: it separates the topics of the blogosphere by the tags that users assign to their posts and then builds the list of authorities for each topic. The experiments show that the proposed approach yields a better ranking than the original PageRank algorithm. The collected data are also characterized through a survey of over four thousand authors.
Keywords: Information systems; Information retrieval; Data storage; Authority; Brazilian blogosphere; Social network analysis; Ranking
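As a rough illustration of the idea in entry 291, the sketch below runs a topic-sensitive variant of PageRank over a small comment graph. It is not the dissertation's implementation: the function name `topic_pagerank`, the edge direction (a comment counts as a vote from the commenter to the post author), the damping factor, and the toy bloggers are all assumptions made for this example.

```python
from collections import defaultdict


def topic_pagerank(edges, topic_members, damping=0.85, iterations=50):
    """Topic-sensitive PageRank sketch (hypothetical helper, not from the thesis).

    edges: (commenter, author) pairs; each comment is treated as a vote.
    topic_members: bloggers who used the topic's tag; teleportation is
    restricted to them, which biases the ranking toward the topic.
    """
    if not topic_members:
        raise ValueError("topic_members must not be empty")
    nodes = sorted({n for edge in edges for n in edge} | set(topic_members))
    out_links = defaultdict(list)
    for commenter, author in edges:
        out_links[commenter].append(author)

    # Teleport vector: uniform over the topic's bloggers, zero elsewhere.
    teleport = {n: (1.0 / len(topic_members) if n in topic_members else 0.0)
                for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n)
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its rank back through the teleport vector.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


# Toy comment graph (assumed data): who commented on whose blog.
edges = [("caio", "bia"), ("bia", "ana"), ("ana", "bia"), ("dani", "caio")]
rank = topic_pagerank(edges, topic_members={"bia", "caio"})
authorities = sorted(rank, key=rank.get, reverse=True)
```

Restricting the teleport step to bloggers who use a given tag is one common way to bias PageRank toward a topic; the dissertation may build its per-topic lists differently.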
292	PhenoVis : a visual analysis tool to phenological phenomena / PhenoVis : uma ferramenta de análise visual para fenômenos fenológicos
Leite, Roger Almeida. January 2015.
Phenology studies recurrent periodic phenomena of plants and their relationship to environmental conditions. Monitoring forest ecosystems with digital cameras allows the study of several phenological events, such as leaf expansion or leaf fall. Since phenological phenomena are cyclic, comparing successive years can reveal interesting variations in annual patterns. However, because the goal is to compare data from several years, the number of collected images quickly becomes too large to compare side by side. Instead of analyzing the images directly, experts prefer statistics derived from the pixel values of the images (such as average values). We propose PhenoVis, a visual analytics tool that provides insightful ways to analyze phenological data. The main idea behind PhenoVis is the Chronological Percentage Map (CPM), a visual mapping that offers a summary view of one year of phenological data. CPMs are highly customizable and encode more information about the images than a common line chart, using a pre-defined histogram, a mapping function that translates histogram values into colors, and a normalized stacked bar chart to display the results. PhenoVis supports different color encodings, visual pattern analysis over CPMs, and similarity searches that rank vegetation patterns found in different time periods. Results for datasets comprising up to nine consecutive years of data show that PhenoVis is capable of finding relevant phenological patterns over time.
Keywords: Computer graphics; Visualization; Visual analytics; Multidimensional analysis; Percentage distribution; Similarity ranking; Phenology
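The three ingredients named in entry 292 (a pre-defined histogram, a value-to-color mapping, and a normalized stacked bar per time step) can be sketched as follows. This is only an illustration under assumptions: the per-pixel values are taken to be some index in [0, 1], and the names `percentage_column`, `chronological_percentage_map`, the bin edges, and the colors are hypothetical, not taken from PhenoVis.

```python
def percentage_column(pixel_values, bin_edges):
    """Fraction of pixels falling in each pre-defined histogram bin.

    Bins are half-open [lo, hi), except the last one, which also includes
    its upper edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in pixel_values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= value < bin_edges[i + 1]
            if in_bin or (is_last and value == bin_edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def chronological_percentage_map(daily_pixels, bin_edges, colors):
    """One normalized stacked column per day, as (color, fraction) segments."""
    if len(colors) != len(bin_edges) - 1:
        raise ValueError("need exactly one color per histogram bin")
    return [list(zip(colors, percentage_column(pixels, bin_edges)))
            for pixels in daily_pixels]


# Two toy "days" of per-pixel values (assumed data).
bin_edges = [0.0, 0.5, 1.0]
colors = ["brown", "green"]
daily_pixels = [[0.1, 0.2, 0.6, 0.9], [0.7, 0.8, 0.9, 1.0]]
cpm = chronological_percentage_map(daily_pixels, bin_edges, colors)
```

Each inner list is one column of the map; drawing the segments stacked, with heights equal to their fractions, gives a normalized stacked bar for that day.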
293	Generation and Ranking of Candidate Networks of Relations for Keyword Search over Relational Databases
Oliveira, Péricles Silva de. 28 April 2017.
Repository record: submitted by Divisão de Documentação/BC Biblioteca Central on 2017-08-22; made available in DSpace on 2017-08-22.
Several systems proposed for processing keyword queries over relational databases rely on the generation and evaluation of Candidate Networks (CNs), i.e., networks of joined database relations that, when processed as SQL queries, provide a relevant answer to the input keyword query. Although the evaluation of CNs has been extensively addressed in the literature, problems related to efficiently generating meaningful CNs have received much less attention. Generating useful CNs requires automatically locating, given a handful of keywords, relations in the database that may contain relevant pieces of information, and determining suitable ways of joining these relations to satisfy the implicit information need expressed by the user when formulating the query. In this thesis, we present two main contributions related to the processing of Candidate Networks. As our first contribution, we present a novel approach for generating CNs, in which the possible matches of the query in the database are first efficiently enumerated. These query matches are then used to guide the CN generation process, avoiding the exhaustive search procedure used by current state-of-the-art approaches. We show that our approach allows the generation of a compact set of CNs that leads to superior-quality answers and that demands fewer resources in terms of processing time and memory. As our second contribution, we first argue that the number of possible Candidate Networks that can be generated by any algorithm is usually very high, but that only very few of them produce answers relevant to the user and are indeed worth processing. There is thus no point in wasting resources processing useless CNs. Based on this argument, we present an algorithm for ranking CNs according to their probability of producing answers relevant to the user. This relevance is estimated from the current state of the underlying database using a probabilistic Bayesian model we have developed. By doing so, we are able to discard a large number of CNs, ultimately leading to better results in terms of quality and performance. Our claims and proposals are supported by a comprehensive set of experiments, carried out using several query sets and datasets from previous related work, whose results we report and analyse here.
Keywords: Keyword-search; Match graph; Relational database; Ranking; Candidate networks
294	Learning to rank: combinação de algoritmos aplicando stacking e análise dos resultados
Paris, Bruno Mendonça. 07 November 2017.
Repository record: submitted on 2018-02-21; made available in DSpace on 2018-04-04.
The amount of information available has grown in recent years and will continue to grow with the increase in users, devices and information shared over the internet, so the desired information must be reachable quickly, without spending too much time searching for it. In search engines such as Google, Yahoo and Bing, the first results are expected to contain the desired information. The field that aims to bring relevant documents to the user is known as Information Retrieval, and it can be aided by Learning to Rank algorithms, which apply machine learning to present the important documents to users in the best possible order. This work investigates a way to obtain an even better ordering of documents using a technique for combining algorithms known as Stacking. To do so, it uses the RankLib tool, part of the Lemur Project and developed in Java, which contains several Learning to Rank algorithms, together with datasets from a collection maintained by Microsoft Research known as LETOR.
Keywords: information retrieval; ranking; learning to rank; stacking
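To make the stacking idea in entry 294 concrete, the sketch below combines the scores of two base rankers with a second-level model fitted on held-out predictions. It does not use RankLib and is not the thesis's setup: the linear combiner, the grid search over a single mixing weight, the pairwise-accuracy criterion, and all names and numbers are assumptions chosen to keep the example small.

```python
def pairwise_accuracy(scores, labels):
    """Share of document pairs with different relevance that are ordered correctly."""
    correct = total = 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                total += 1
                if scores[i] > scores[j]:
                    correct += 1
    return correct / total if total else 0.0


def fit_stacker(base_scores, labels, steps=10):
    """Fit the level-1 model: one mixing weight w for two base rankers.

    base_scores: (score_a, score_b) per document, produced by the base
    rankers on data they were not trained on.
    Returns (best_w, best_accuracy) for w * score_a + (1 - w) * score_b.
    """
    best_w, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1 - w) * b for a, b in base_scores]
        acc = pairwise_accuracy(combined, labels)
        if acc > best_acc:  # keep the first weight that reaches the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy held-out predictions for one query (assumed data); labels are relevance grades.
base_scores = [(0.9, 0.5), (0.2, 0.6), (0.5, 0.1)]
labels = [2, 1, 0]
acc_a = pairwise_accuracy([a for a, _ in base_scores], labels)
acc_b = pairwise_accuracy([b for _, b in base_scores], labels)
best_w, best_acc = fit_stacker(base_scores, labels)
```

In this toy case each base ranker misorders one pair, while the combined score orders all three documents correctly. A real experiment would fit the combiner on many queries and evaluate it on separate test data.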
295	Image Representation using Attribute-Graphs
Prabhu, Nikita. January 2016.
In a digital world of Flickr, Picasa and Google Images, developing a semantic image representation has become a vital problem. To date, image processing and computer vision researchers have used several different representations for images, ranging from low-level features such as SIFT, HOG and GIST to high-level concepts such as objects and people. When asked to describe an object or a scene, people usually resort to mid-level features such as size, appearance, feel, use and behaviour. Such descriptions are commonly referred to as the attributes of the object or scene. These human-understandable, machine-detectable attributes have recently become a popular feature category for image representation in various vision tasks. In addition to image and object characteristics, object interactions, background/context information and the actions taking place in the scene form an important part of an image description. It is therefore essential to develop an image representation that can effectively describe the various image components and their interactions. Towards this end, we propose a novel image representation, termed Attribute-Graph. An Attribute-Graph is an undirected graph incorporating both local and global image characteristics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object topology and the actions being performed. We demonstrate the effectiveness of Attribute-Graphs by applying them to the problem of image ranking. Since an image retrieval system should rank images in a way that is compatible with visual similarity as perceived by humans, it is intuitive to work in a human-understandable feature space. Most content-based image retrieval algorithms treat images as a set of low-level features or try to define them in terms of the associated text. Such a representation fails to capture the semantics of the image and, more often than not, results in retrieved images that are semantically dissimilar to the query. Ranking using the proposed Attribute-Graph representation alleviates this problem. We benchmark the performance of our ranking algorithm on the rPascal and rImageNet datasets, which we created in order to evaluate ranking performance on complex queries containing multiple objects. Our experimental evaluation shows that modelling images as Attribute-Graphs results in improved ranking performance over existing techniques.
Keywords: Attribute-Graphs; Image Representation; Convolutional Neural Networks; Graphs; Attribute-Graph; Image Ranking; Datasets; Computer Science
296	kernlab - An S4 Package for Kernel Methods in R
Karatzoglou, Alexandros; Smola, Alex; Hornik, Kurt; Zeileis, Achim. 11 1900.
kernlab is an extensible package for kernel-based machine learning methods in R. It takes advantage of R's new S4 object model and provides a framework for creating and using kernel-based algorithms. The package contains dot product primitives (kernels), implementations of support vector machines and the relevance vector machine, Gaussian processes, a ranking algorithm, kernel PCA, kernel CCA, and a spectral clustering algorithm. Moreover, it provides a general-purpose quadratic programming solver and an incomplete Cholesky decomposition method.
297	Závislost velikosti příjmu nejvyšších fotbalových lig a žebříčku zemí dle klubového koeficientu UEFA. / The dependence of the amount of revenue of the highest ranking football leagues based on the ranking of countries according to the UEFA club coefficient.
Oralová, Nika. January 2017.
Objectives: The main aim of this work was to examine the correlation between the income of the top football leagues and the countries' positions in the UEFA club coefficient ranking. Should the correlation be proven, further aims are to describe and quantify it. A secondary aim is to analyse present-day football as a phenomenon and demonstrate its worldwide significance. Methods: Publicly available data (professional literature, annual reports, statistical yearbooks) were analysed. The collected data were compiled and compared in order to follow trends and draw thorough conclusions. The correlations between the monitored indicators were then measured using the statistical method of regression analysis. Results: The main finding of this work is that the correlation between the income of the top football leagues and the countries' positions in the UEFA club coefficient ranking was proven. This finding is based on the results of the...
298	Performance Analysis of Credit Scoring Models on Lending Club Data
Polena, Michal. January 2017.
In our master thesis, we compare ten classification algorithms for credit scoring. Their prediction performance is measured by six different classification performance measures. For the comparison, we use a unique P2P lending data set with more than 200,000 records and 23 variables. This data set comes from Lending Club, the biggest P2P lending platform in the United States. According to our results, logistic regression, artificial neural network, and linear discriminant analysis are the three best classifiers. Random forest ranks as the fifth best classifier. On the other hand, classification and regression tree and k-nearest neighbors are ranked as the worst classifiers.
299	Ranking faktorů ovlivňujících vývoj cen na realitním trhu v Šanghaji / The Ranking of Shanghai Real Estate Price Determinants
Sýkora, Michal. January 2017.
This thesis studies the price determinants of newly developed residential buildings in Shanghai. It introduces the history and present state of the real estate market in the PRC and Shanghai. It further describes the characteristics, trends, and structural problems the market is facing now, as well as possible future developments. The empirical research is based on regression analysis using the ordinary least squares (OLS) method. The data analysis is performed in version 14 of the STATA statistical software. Statistically significant determinants are ranked according to their impact. The most impactful determinant is the money supply, followed by the interest rate, the SSE index, income, and the amount of finished real estate investments. The least significant determinant is the floor space of newly constructed residential buildings in Shanghai.
300	The Use of Synthetic Mixture Based Libraries to Identify Hit Compounds for ESKAPE Pathogens, Leishmaniasis, and Inhibitors of Palmitoylation
Giulianotti, Marcello. 06 April 2016.
The goal of this work is to demonstrate the utility of systematically formatted mixture-based libraries as part of the drug discovery process. While there are a number of valid approaches for identifying hit and tool compounds, systematically formatted mixture-based libraries, such as those described in this study, make it possible to develop a significant amount of structure-activity relationship (SAR) data from the testing of very few samples. In support of this claim, a review of recent developments in the area of systematically formatted mixture-based libraries and three case studies are presented. The three case studies provide the detailed approach and the results obtained from using systematically formatted mixture-based libraries in programs focused on identifying broad-spectrum antibiotics, therapeutics to treat leishmaniasis, and inhibitors of palmitoylation. In each of these three cases, approximately 200 samples were used to survey millions of compounds in order to develop a series of hit and tool compounds, as well as significant SAR data around the compounds identified. This information will be used in future studies to potentially uncover novel mechanisms of action for treating infections and diseases, as well as to develop therapeutics for the patients affected by them. So while systematically formatted mixture-based libraries are not the only option for identifying hit or tool compounds, they do provide a very efficient method that can be adapted to a variety of assay formats and should therefore be considered when conducting a screening campaign.
Keywords: Positional Scanning Libraries; Scaffold Ranking; Structure Activity Relationship; Tool Compounds; Chemistry
