191 |
Re-ranking de busca visual de produtos usando informação multimodal / Re-ranking of visual product search using multimodal information
Santos, Joyce Miranda dos (12 March 2013)
With the fast development of the Internet and the popularization of mobile devices, searching for a specific product on e-commerce Web sites through a query image has become a very promising area of research. In this context, CBIR (Content-Based Image Retrieval) techniques have been exploited to support and improve the shopping experience of consumers. In this dissertation, we address the problem of visual product search using an image as a query, instead of the more popular keyword-based approach. We propose a re-ranking strategy based on multimedia information usually available in product databases. Our strategy makes use of the category information and textual description associated with the top-k images of an initial ranking generated by CBIR techniques only. Experiments were performed considering the judgment of users on two collections of images collected from popular e-commerce Web sites. Our results show that our strategy achieves significant gains compared to an approach based only on CBIR techniques. / Com o rápido desenvolvimento da Internet, a popularização de dispositivos móveis e de sites de comércio eletrônico, procurar um produto específico a partir de uma imagem tem se tornado uma área de pesquisa promissora. Nesse contexto, técnicas de CBIR (Content-Based Image Retrieval) vêm sendo exploradas para apoiar e melhorar a experiência de compra dos consumidores. Neste trabalho, abordamos o problema de busca visual de produtos usando uma imagem como consulta, no lugar da mais popular abordagem de busca que é baseada em palavras-chave. Nós propomos uma estratégia de re-ranking que faz uso de informações multimídia normalmente disponíveis nas bases de dados de produtos. Nossa estratégia faz uso de informações de categoria e descrição textual associadas às imagens melhor posicionadas de um ranking inicial gerado por técnicas puramente de CBIR. Experimentos foram realizados considerando o julgamento de usuários em duas coleções de imagens coletadas a partir de sites de comércio eletrônico. Nossos resultados mostram que nossa estratégia alcança ganhos significativos quando comparada à busca puramente visual.
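As a rough illustration of the re-ranking idea in this abstract, the sketch below boosts items that share the dominant category of the top-k results of an initial visual ranking. The abstract does not give the scoring rule, so the `rerank` function, the `boost` weight, and the sample items are all hypothetical, and the textual-description signal is left out.

```python
from collections import Counter

def rerank(initial, k=3, boost=0.5):
    """Re-rank (item_id, visual_score, category) tuples.

    Assumption (not from the abstract): the most frequent category among
    the top-k visual results approximates the query's category, so items
    in that category receive a fixed score bonus.
    """
    top_k = initial[:k]
    dominant, _ = Counter(cat for _, _, cat in top_k).most_common(1)[0]
    rescored = [
        (item, score + (boost if cat == dominant else 0.0), cat)
        for item, score, cat in initial
    ]
    # sorted() is stable, so ties keep their initial (visual) order.
    return sorted(rescored, key=lambda r: r[1], reverse=True)

# Hypothetical initial ranking produced by visual similarity only.
initial = [
    ("shoe_1", 0.90, "shoes"),
    ("bag_1", 0.85, "bags"),
    ("shoe_2", 0.80, "shoes"),
    ("bag_2", 0.75, "bags"),
    ("shoe_3", 0.70, "shoes"),
]
reranked_ids = [item for item, _, _ in rerank(initial)]
print(reranked_ids)
```

With this toy input the three shoe items move ahead of the two bags, because "shoes" dominates the top three visual results.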
|
192 |
Scientific Publishing in Information Systems
Bukvova, Helena; Kruse, Paul; Kummer, Christian (January 2014)
Many academic decision makers rely on quantified ranking measures to estimate the quality of journal publications. The aim of this study was to map journals in Information Systems (IS) with regard to their topic and their rank and determine whether there is a relationship between the scope of a journal and its rank. The study used content analysis, applying both qualitative and quantitative methods.
The results of the analysis show relationships between the journal rank, the research area, and the type, as well as significant differences in ranking across the three lists. The findings illustrate that ranking measures, as indicators of the quality of research published in a journal, ought to be considered only in the context of a particular research area and scientific community.
|
193 |
Traitement continu de requêtes top-k dans les réseaux sociaux / Continuous processing of top-k queries in social networks
Alkhouli, Abdulhafiz (29 September 2017)
En raison du grand succès des réseaux sociaux, la nature et le mode de diffusion de l’information sur le Web ont changé en faveur de contenus dynamiques diffusés sous forme de flux d’information. Dans le contexte des réseaux sociaux, les utilisateurs peuvent s’abonner à de multiples sources d’information et recevoir continuellement de nouveaux contenus. Or, ce nouveau mode de publication/consommation peut entraîner d’énormes quantités d’information, en surchargeant les utilisateurs. Ainsi, il est essentiel de développer des techniques efficaces de filtrage et de classement qui permettent aux utilisateurs d’être efficacement mis à jour avec le contenu le plus intéressant. Les requêtes top-k sur les flux d’information limitent les résultats au contenu le plus pertinent. Pour améliorer la pertinence des résultats, le modèle de classement des résultats de requêtes devrait tenir compte de divers facteurs de contexte, y compris les facteurs traditionnels basés sur le contenu, les facteurs liés aux utilisateurs et leurs relations (réseau social). Dans le réseau social, le maintien des ensembles de top-k peut être plus difficile car de nombreux événements pourraient changer les messages de top-k tels que le nouveau message, la nouvelle action, le nouvel utilisateur, les modifications de profil, etc. Pour un grand réseau social avec des millions d’utilisateurs et des milliards de messages, le traitement continu des requêtes top-k est l’approche la plus efficace. Cependant, les systèmes actuels pour le traitement continu des requêtes top-k échouent lorsque ces systèmes considèrent des modèles de classement riches avec des critères de réseau social. En outre, de tels systèmes ne tiennent pas compte de la diversité des contenus publiés. Dans cette thèse, nous nous concentrons sur le filtrage des flux d’information basé sur le calcul des messages top-k pour chaque utilisateur dans le réseau social.
Nous visons à développer un système à large échelle capable d’évaluer efficacement les requêtes top-k continues avec une fonction de classement complexe. Nous proposons l’algorithme SANTA, capable de gérer des fonctions de classement complexes avec des critères sociaux tout en maintenant un traitement continu des requêtes top-k. Nous proposons aussi une variante (SANTA+) qui accélère le traitement d’actions dans les réseaux sociaux. Pour tenir compte de la diversité des contenus publiés, nous proposons l’algorithme DA-SANTA qui étend l’algorithme SANTA pour intégrer la diversité dans le modèle top-k continu tout en maintenant l’efficacité du système. Nos expérimentations sont menées sur des données réelles extraites de Twitter, illustrant les propriétés de nos algorithmes et montrant leur efficacité. / Information streams are today a prevalent way of publishing and consuming content on the Web, especially due to the great success of social networks. In the social network context, users may subscribe to several information sources of interest and continuously receive newly published content. However, this new publishing/consumption mode may lead to huge amounts of received information, which can overwhelm users. Thus, there is a vital need to develop effective filtering and ranking techniques that allow users to be efficiently kept up to date with the most interesting content. Top-k queries over the streams of interest limit results to the most relevant content. To provide relevant content, the ranking model should consider various context factors, including traditional IR factors and the social network. In a social network, maintaining top-k sets may be more difficult because many events can change the top-k sets, such as a new message, a new action, a new user, profile changes, etc. For a large social network with millions of users and billions of messages, the continuous processing of top-k queries is the most effective approach.
However, current systems fail to combine continuous top-k processing with rich scoring models that include social network criteria. Moreover, such systems do not consider the diversity of published content. In this thesis, we focus on filtering information streams based on the computation of top-k messages for each user in the social network. We aim to develop a scalable system that can efficiently evaluate continuous top-k queries with a ranking function that includes social network criteria. We propose the SANTA algorithm, which handles scoring functions that include content similarity as well as social network criteria and events in a continuous processing of top-k queries. We propose a variant (SANTA+) that accelerates the processing of interaction events in social networks. To provide both diverse and relevant messages in top-k sets, we propose the DA-SANTA algorithm, which extends the SANTA algorithm to integrate diversity into the continuous top-k model while maintaining the efficiency of the system. Our experiments are conducted on a real dataset extracted from Twitter, illustrating the properties of our algorithms and demonstrating their efficiency.
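To make the notion of continuously maintaining a top-k set concrete, here is a minimal, generic sketch that keeps the k best-scored messages for one user as messages arrive. It is not the SANTA algorithm: it assumes each message has a fixed score on arrival and ignores the score changes caused by social actions that the thesis addresses. The `TopK` class and its method names are hypothetical.

```python
import heapq

class TopK:
    """Keeps the k highest-scoring messages seen so far for one user."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap of (score, message_id); root is the weakest entry

    def offer(self, message_id, score):
        """Process one incoming message; return True if it enters the top-k."""
        entry = (score, message_id)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if entry > self._heap[0]:
            heapq.heapreplace(self._heap, entry)  # evict the current weakest entry
            return True
        return False

    def result(self):
        """Current top-k message ids, best first."""
        return [m for _, m in sorted(self._heap, reverse=True)]

feed = TopK(k=2)
for message_id, score in [("m1", 0.2), ("m2", 0.9), ("m3", 0.5), ("m4", 0.1)]:
    feed.offer(message_id, score)
print(feed.result())
```

Each arrival costs O(log k), so a message that cannot beat the weakest entry is rejected after a single comparison, which is the basic reason incremental maintenance scales better than recomputing the ranking per query.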
|
194 |
On the ranking property and underlying dynamics of complex systems / Sur la propriété classement et dynamique sous-jacente des systèmes complexes
Deng, Weibing (21 June 2013)
Des procédures de classement sont largement utilisées pour décrire les phénomènes observés dans de nombreux domaines des sciences sociales et naturelles, par exemple la sociologie, l’économie, la linguistique, la démographie, la physique, la biologie, etc. Dans cette thèse, nous nous sommes attachés à l’étude des propriétés de classement et des dynamiques sous-jacentes intégrées dans les systèmes complexes. En particulier, nous nous sommes concentrés sur les classements par score ou par prix dans les systèmes sportifs et les classements d’utilisation des mots ou caractères dans les langues humaines. Le but est de comprendre les mécanismes sous-jacents à ces questions en utilisant les méthodes de la physique statistique, de la statistique bayésienne et de la modélisation multi-agents. Les résultats concrets concernent les aspects suivants. Nous avons tout d’abord traité une étude sur les classements par score/prix dans les systèmes sportifs et analysé 40 échantillons de données dans 12 disciplines sportives différentes. Nous avons trouvé des similitudes frappantes dans différents sports, à savoir le fait que la répartition des résultats/prix suit les lois puissance universelles. Nous avons également montré que le principe de Pareto est largement respecté dans de nombreux systèmes sociaux: ainsi 20% des joueurs accumulent 80% des scores et de l’argent. Les données concernant les matchs de tennis en individuels nous ont révélé que lorsque deux joueurs s’affrontent, la probabilité que le joueur de rang supérieur gagne est liée à la différence de rang des deux adversaires. Afin de comprendre les origines de la mise à l’échelle universelle, nous avons proposé un modèle multi-agents, qui peut simuler les matchs de joueurs à travers différentes compétitions. Les résultats de nos simulations sont cohérents avec les résultats empiriques. L’extension du domaine d’étude de la simulation indique que le modèle est assez robuste par rapport aux modifications de certains paramètres.
La loi de Zipf est le comportement le plus régulièrement observé dans la linguistique statistique. Elle a dès lors servi de prototype pour les relations entre rang d’apparitions et fréquence d’apparitions (relations rang-fréquence dans la suite du texte) et les lois d’échelle dans les sciences naturelles. Nous avons étudié plusieurs textes, précisé le domaine de validité de la loi de Zipf, et trouvé que la plage de validité augmente lors du mélange de différents textes. Basé sur l’analyse sémantique latente, nous avons proposé un modèle probabiliste, dans lequel nous avons supposé que les mots sont ajoutés au texte avec des probabilités aléatoires, tandis que leur densité a priori est liée, via la statistique bayésienne, aux caractéristiques générales du lexique mental de l’auteur de ce même texte. Notre modèle explique la loi de Zipf ainsi que ses limites de validité, et la généralise aux hautes et basses fréquences et aux hapax legomena. Dans une autre étude, nous avons précisé les relations rang-fréquence pour les caractères chinois. Nous avons choisi d’étudier des textes courts en premier, car pour le bien de l’analyse rang-fréquence, les longs textes ne sont que des mélanges de textes plus courts, thématiquement homogènes. Nos résultats ont montré que la loi de Zipf appliquée aux caractères chinois tient parfaitement pour des textes assez courts (quelques milliers de caractères différents). Le même domaine de validité est observé pour les textes courts anglais. Nous avons soutenu que les longs textes chinois montrent une structure hiérarchique à deux couches: des caractères dont la fréquence d’apparition suit une loi puissance (première couche) et des caractères dont l’apparition suit une loi exponentielle (deuxième couche)... / Ranking procedures are widely used to describe phenomena in many different fields of the social and natural sciences, e.g., sociology, economics, linguistics, demography, physics, biology, etc.
In this dissertation, we studied the ranking properties and underlying dynamics embedded in complex systems. In particular, we focused on the scores/prizes ranking in sports systems and the words/characters usage ranking in human languages. The aim is to understand the mechanisms behind these issues by using the methods of statistical physics, Bayesian statistics and agent-based modeling. The concrete results concern the following aspects. We took up the topic of scores/prizes ranking in sports systems, and analyzed 40 data samples in 12 different sports fields. We found striking similarities across different sports, i.e., the distributions of scores/prizes follow universal power laws. We also showed that the data obey the Pareto principle widely observed in many social systems: 20% of the players accumulate 80% of the scores and money. For the tennis head-to-head data, we revealed that when two players compete, the probability that the higher-ranked player will win is related to the rank difference of the two opponents. In order to understand the origins of the universal scaling, we proposed an agent-based model, which can simulate the competitions of players in different matches, and results from our simulations are consistent with the empirical findings. Extensive simulation studies indicate that the model is quite robust with respect to the modifications of some parameters. Zipf’s law is the major regularity of statistical linguistics that served as a prototype for the rank-frequency relations and scaling laws in natural sciences. We investigated several English texts, clarified the valid range of Zipf’s law, and found that this valid range increases upon mixing different texts.
Based on latent semantic analysis, we proposed a probabilistic model, in which we assumed that the words are drawn into the text with random probabilities, while their a priori density relates, via Bayesian statistics, to the general features of the mental lexicon of the author who produced the text. Our model explained Zipf’s law together with the limits of its validity, and its generalization to high and low frequencies and hapax legomena. In another work, we specified the rank-frequency relations for Chinese characters. We chose to study short texts first, since for the sake of the rank-frequency analysis, long texts are just mixtures of shorter, thematically homogeneous pieces. Our results showed that Zipf’s law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters), and the scenario of its validity is similar to that for short English texts. We argued that long Chinese texts display a two-layer, hierarchical structure: power-law rank-frequency characters (first layer) and exponential ones (second layer). The previous results on the invalidity of Zipf’s law for long texts are accounted for by showing that between the Zipfian range and the region of very rare characters (hapax legomena) there emerges a range of ranks where the rank-frequency relation is approximately exponential. From a comparative analysis of rank-frequency relations for Chinese and English, we suggested that characters play for Chinese writers the same role that words play for those writing in alphabetic systems.
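The rank-frequency relation discussed in this abstract can be illustrated with a small sketch that sorts token frequencies by rank and estimates the Zipf exponent as the negated least-squares slope of log-frequency against log-rank. This is only a generic fitting recipe, not the Bayesian model proposed in the thesis; the function names and the synthetic token list are assumptions made for the example.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return frequencies sorted in descending order (index 0 is rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(freqs):
    """Negated least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Synthetic tokens whose frequencies follow f(r) = 120 / r exactly.
tokens = []
for rank, word in enumerate(["a", "b", "c", "d", "e"], start=1):
    tokens.extend([word] * (120 // rank))

freqs = rank_frequency(tokens)
exponent = zipf_exponent(freqs)
print(freqs, round(exponent, 3))
```

Because the synthetic frequencies are exactly proportional to 1/rank, the estimated exponent is 1 up to floating-point error; on real texts the fit would only be meaningful inside the validity range the abstract describes.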
|
195 |
Análise da competitividade no mercado de energia Brasileiro por meio de redes complexas / Competitiveness analysis of the Brazilian energy market through complex networks
Silva, Guilherme Borin da (15 September 2016)
O presente trabalho tem como meta auxiliar na resposta a um dos principais problemas estudados no campo das ciências econômicas: o quanto e como intervenções regulatórias afetam a dinâmica dos mercados. Para isso será feita uma análise dos dados contratuais de compra e venda de energia elétrica no ambiente livre de comercialização de energia brasileiro por meio de uma metodologia que utiliza métricas de análise de redes complexas para avaliação da competitividade. Os dados abordam a atividade dos agentes comercializadores de energia nesse mercado durante o período de 2006 a 2015. É estabelecido então um ranking mensal desses agentes e criada a rede por meio da verificação das trocas de posições nesses rankings. Os resultados da análise indicam em quais anos houve maior variação na competitividade no mercado e pela análise das redes resultantes verifica-se a formação de estruturas de mercado. Posteriormente os resultados são comparados com métricas tradicionais de avaliação de competitividade e concentração de mercado e, por fim, é feita uma avaliação qualitativa dos índices sob a luz das principais alterações regulatórias ocorridas no período. / The main goal of this project is to help answer one of the main questions in the study of economics: how regulatory interventions affect the dynamics of markets, in this case specifically electricity markets. This is achieved through an analysis of the contractual data for the purchase and sale of electric energy in the Brazilian free energy market, using a methodology that applies complex network analysis metrics to the evaluation of competitiveness. The data cover the contracts of all energy traders in this market in the period from 2006 to 2015. A monthly ranking of these agents is established and a network is created by checking the position changes in these rankings.
The results of the analysis indicate in which years there was greater variation in competitiveness, and the analysis of the resulting networks reveals the formation of market structures. The results are then compared with traditional metrics for competitiveness and market concentration. Finally, a qualitative assessment of the results is made considering the major regulatory changes that occurred in the study period.
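One way to read the network construction described in this abstract is sketched below: two agents are linked whenever their relative order flips between consecutive monthly rankings, and the edge weight counts how often that happens. The abstract does not spell out the exact rule, so this linking criterion, the `swap_network` function, and the toy rankings are assumptions.

```python
from collections import Counter
from itertools import combinations

def swap_network(monthly_rankings):
    """Count, for each pair of agents, how often their relative order flips
    between consecutive monthly rankings (lists ordered best to worst)."""
    edges = Counter()
    for prev, curr in zip(monthly_rankings, monthly_rankings[1:]):
        pos_prev = {agent: i for i, agent in enumerate(prev)}
        pos_curr = {agent: i for i, agent in enumerate(curr)}
        # Only agents present in both months can swap positions.
        common = sorted(set(prev) & set(curr))
        for a, b in combinations(common, 2):
            if (pos_prev[a] < pos_prev[b]) != (pos_curr[a] < pos_curr[b]):
                edges[(a, b)] += 1
    return edges

# Hypothetical rankings of three trading agents over three months.
rankings = [
    ["A", "B", "C"],  # month 1
    ["B", "A", "C"],  # month 2: A and B swap
    ["B", "C", "A"],  # month 3: A and C swap
]
network = swap_network(rankings)
print(dict(network))
```

Under this reading, a densely connected network would indicate frequent reordering among agents, while sparse regions would suggest stable positions.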
|
196 |
Estimativas de parâmetros genéticos e estudo comparativo de índices de seleção fenotípico e genético em provas de ganho de peso na raça Nelore / Estimation of genetic parameters and comparison study of phenotypic and genetic selection indexes of performance tests of Nellore beef cattle
Manicardi, Fernando Ricardo (20 December 2011)
O presente estudo teve como objetivos estimar os parâmetros genéticos para características de crescimento e perímetro escrotal, bem como avaliar as alterações no ranking dos animais participantes de provas de ganho de peso (PGP) quando diferentes índices de seleção são usados e identificar eventuais erros de seleção. Os índices fenotípicos utilizaram os valores fenotípicos das características e os índices genéticos utilizaram os valores genéticos aditivos para a classificação dos animais. Como dados de crescimento foi analisado o peso aos 120 dias (p120), ao desmame (pdes), aos 12 meses (p12) e aos 18 meses (p18) de 3149, 3958, 2484 e 1872 animais, respectivamente, bem como o ganho de peso do desmame aos 12 meses (gpdes-12) e dos 12 aos 18 meses (gp12-18) com 1455 e 1465 animais, respectivamente. Foi analisado o perímetro escrotal ao desmame (pedes), 12 meses (pe12), 15 meses (pe15) e 18 meses (pe18) de 1535, 1166, 1212 e 852 animais, respectivamente. Os componentes de (co)variância, os parâmetros genéticos e as soluções para os efeitos fixos e aleatórios foram estimados pelo método REML, com o programa VCE6 sob modelo animal e utilizando um arquivo de pedigree com 15.522 animais. Para a comparação de ranking foram utilizados dados de 793 machos nascidos em 2008 e 2009 participantes das PGP. Dois critérios foram usados para a comparação de ranking dos animais, a correlação de ranking de Spearman e os erros de seleção. Para avaliar os erros de seleção duas estratégias foram adotadas: animais que apresentaram índices iguais ou acima da média mais um desvio padrão foram selecionados e os animais que apresentaram índices acima da média, reduzindo com isso a pressão de seleção. As estimativas dos coeficientes de herdabilidade direta foram 0,26, 0,25, 0,13 e 0,15 e materna de 0,20, 0,16, 0,09 e 0,15 para p120, pdes, p12 e p18, respectivamente. As estimativas de herdabilidade direta para gpdes-12 e gp12-18 foram de 0,13 e 0,20, respectivamente.
Para as características pedes, pe12, pe15 e pe18 as estimativas de herdabilidade direta foram de 0,56, 0,59, 0,54 e 0,50, e materna de 0,29, 0,29, 0,26 e 0,05, nesta ordem. Em sua maioria, as estimativas estão de acordo com as descritas na literatura. As correlações genéticas entre p120, pdes e p12 com o p18, ficaram abaixo da relatada na literatura. As correlações de ranking apresentaram-se relativamente altas entre os índices fenotípicos e genéticos. Erros de seleção de 19,3 a 73,2% foram observados com a classificação dos animais sendo igual ou maior que um desvio padrão acima da média. Quando a seleção foi obtida nos animais com índice igual ou maior que a média, os erros de seleção permaneceram entre 9,0 e 22,1%. Os resultados deste estudo indicam que a seleção de touros jovens pelos índices fenotípicos pode acarretar em erros na escolha de reprodutores, principalmente quanto maior a pressão de seleção. / The objectives of this study were to estimate genetic parameters of growth traits and scrotal circumference, as well as to evaluate changes in the ranking of animals submitted to performance tests (PGP) and the selection errors when two selection indexes were used. The phenotypic index used the phenotypic values, expressed as deviations from the group mean, to rank the animals, and the genetic index used the combined additive genetic values for the animals' ranking. Growth traits comprised weights at 120 days (p120), at weaning (pdes), at 12 months (p12) and at 18 months (p18) of 3149, 3958, 2484 and 1872 animals, respectively, along with weight gain from weaning to 12 months (gpdes-12) and from 12 to 18 months (gp12-18) measured in 1455 and 1465 animals, in that order. Scrotal circumference was analyzed at weaning (pedes), at 12 months (pe12), at 15 months (pe15) and at 18 months (pe18) in 1535, 1166, 1212 and 852 animals, respectively.
(Co)variance components, genetic parameters and fixed and random effects solutions were estimated by the REML method with the VCE6 program under an animal model, using a pedigree file of 15,522 animals. For the comparison of the animals' rankings, performance test data of 793 animals born in 2008 and 2009 were used. Two criteria were used for rank comparison: Spearman rank correlation and selection error. Selection error was evaluated under two strategies: selecting animals with an index equal to or greater than the mean plus one standard deviation, and selecting animals with an index greater than the mean, which reduces the selection pressure. Direct heritability estimates were 0.26, 0.25, 0.13 and 0.15, and maternal heritability estimates were 0.20, 0.16, 0.09 and 0.15 for p120, pdes, p12 and p18, respectively. For gpdes-12 and gp12-18, the heritability estimates were 0.13 and 0.20, in that order. For scrotal circumference at weaning, 12 months, 15 months and 18 months, the direct heritability estimates were 0.56, 0.59, 0.54 and 0.50 and the maternal heritability estimates were 0.29, 0.29, 0.26 and 0.05, respectively. In general, these estimates agreed with those in the literature. The genetic correlations of p120, pdes and p12 with p18 were slightly lower than those described in other studies. Spearman rank correlations between the phenotypic and genetic indexes were high. Selection errors between 19.3 and 73.2% were observed when selected animals had indexes equal to or greater than the mean plus one standard deviation. When selecting animals with an index greater than the mean, the selection errors observed were between 9.0 and 22.1%. The results indicate that selecting young replacement animals on the phenotypic index can lead to selection errors, especially when the selection pressure is higher.
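The two comparison criteria named in this abstract can be sketched as follows: Spearman rank correlation between two indexes, and a selection error computed with the mean-plus-standard-deviation cutoff. The abstract does not define the selection error formula, so the definition used here (share of animals selected on the phenotypic index that the genetic index would not select) is an assumption, as are the function names and the toy values.

```python
import statistics

def ranks(values):
    """Rank positions (1 = highest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def selection_error(phenotypic, genetic, n_sd):
    """Share of phenotypically selected animals not selected genetically.

    An animal is selected on an index when its value is at least the mean
    plus n_sd sample standard deviations (n_sd=1 or n_sd=0 in the abstract).
    """
    def selected(values):
        cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
        return {i for i, v in enumerate(values) if v >= cutoff}

    chosen_p, chosen_g = selected(phenotypic), selected(genetic)
    if not chosen_p:
        return 0.0
    return len(chosen_p - chosen_g) / len(chosen_p)

# Hypothetical index values for five animals.
phenotypic = [105.0, 98.0, 110.0, 101.0, 95.0]
genetic = [1.2, 0.9, 1.0, 1.5, 0.3]
rho = spearman(phenotypic, genetic)
strict = selection_error(phenotypic, genetic, n_sd=1.0)
relaxed = selection_error(phenotypic, genetic, n_sd=0.0)
print(rho, strict, relaxed)
```

In this toy data the stricter cutoff produces the larger error, which mirrors the direction of the reported result (19.3 to 73.2% versus 9.0 to 22.1%) without reproducing any of the study's numbers.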
|
198 |
Extraktion und Identifikation von Entitäten in Textdaten im Umfeld der Enterprise Search / Extraction and identification of entities in text data in the field of enterprise search
Brauer, Falk (January 2010)
Die automatische Informationsextraktion (IE) aus unstrukturierten Texten ermöglicht völlig neue Wege, auf relevante Informationen zuzugreifen und deren Inhalte zu analysieren, die weit über bisherige Verfahren zur Stichwort-basierten Dokumentsuche hinausgehen. Die Entwicklung von Programmen zur Extraktion von maschinenlesbaren Daten aus Texten erfordert jedoch nach wie vor die Entwicklung von domänenspezifischen Extraktionsprogrammen.
Insbesondere im Bereich der Enterprise Search (der Informationssuche im Unternehmensumfeld), in dem eine große Menge von heterogenen Dokumenttypen existiert, ist es oft notwendig ad-hoc Programmmodule zur Extraktion von geschäftsrelevanten Entitäten zu entwickeln, die mit generischen Modulen in monolithischen IE-Systemen kombiniert werden. Dieser Umstand ist insbesondere kritisch, da potentiell für jeden einzelnen Anwendungsfall ein von Grund auf neues IE-System entwickelt werden muss.
Die vorliegende Dissertation untersucht die effiziente Entwicklung und Ausführung von IE-Systemen im Kontext der Enterprise Search und effektive Methoden zur Ausnutzung bekannter strukturierter Daten im Unternehmenskontext für die Extraktion und Identifikation von geschäftsrelevanten Entitäten in Dokumenten. Grundlage der Arbeit ist eine neuartige Plattform zur Komposition von IE-Systemen auf Basis der Beschreibung des Datenflusses zwischen generischen und anwendungsspezifischen IE-Modulen. Die Plattform unterstützt insbesondere die Entwicklung und Wiederverwendung von generischen IE-Modulen und zeichnet sich durch eine höhere Flexibilität und Ausdrucksmächtigkeit im Vergleich zu vorherigen Methoden aus.
Ein in der Dissertation entwickeltes Verfahren zur Dokumentverarbeitung interpretiert den Datenaustausch zwischen IE-Modulen als Datenströme und ermöglicht damit eine weitgehende Parallelisierung von einzelnen Modulen. Die autonome Ausführung der Module führt zu einer wesentlichen Beschleunigung der Verarbeitung von Einzeldokumenten und verbesserten Antwortzeiten, z. B. für Extraktionsdienste. Bisherige Ansätze untersuchen lediglich die Steigerung des durchschnittlichen Dokumentendurchsatzes durch verteilte Ausführung von Instanzen eines IE-Systems.
Die Informationsextraktion im Kontext der Enterprise Search unterscheidet sich z. B. von der Extraktion aus dem World Wide Web dadurch, dass in der Regel strukturierte Referenzdaten z. B. in Form von Unternehmensdatenbanken oder Terminologien zur Verfügung stehen, die oft auch die Beziehungen von Entitäten beschreiben. Entitäten im Unternehmensumfeld haben weiterhin bestimmte Charakteristiken: Eine Klasse von relevanten Entitäten folgt bestimmten Bildungsvorschriften, die nicht immer bekannt sind, auf die aber mit Hilfe von bekannten Beispielentitäten geschlossen werden kann, so dass unbekannte Entitäten extrahiert werden können. Die Bezeichner der anderen Klasse von Entitäten haben eher umschreibenden Charakter. Die korrespondierenden Umschreibungen in Texten können variieren, wodurch eine Identifikation derartiger Entitäten oft erschwert wird.
Zur effizienteren Entwicklung von IE-Systemen wird in der Dissertation ein Verfahren untersucht, das alleine anhand von Beispielentitäten effektive Reguläre Ausdrücke zur Extraktion von unbekannten Entitäten erlernt und damit den manuellen Aufwand in derartigen Anwendungsfällen minimiert. Verschiedene Generalisierungs- und Spezialisierungsheuristiken erkennen Muster auf verschiedenen Abstraktionsebenen und schaffen dadurch einen Ausgleich zwischen Genauigkeit und Vollständigkeit bei der Extraktion. Bekannte Regellernverfahren im Bereich der Informationsextraktion unterstützen die beschriebenen Problemstellungen nicht, sondern benötigen einen (annotierten) Dokumentenkorpus.
Eine Methode zur Identifikation von Entitäten, die durch Graph-strukturierte Referenzdaten vordefiniert sind, wird als dritter Schwerpunkt untersucht. Es werden Verfahren konzipiert, welche über einen exakten Zeichenkettenvergleich zwischen Text und Referenzdatensatz hinausgehen und Teilübereinstimmungen und Beziehungen zwischen Entitäten zur Identifikation und Disambiguierung heranziehen. Das in der Arbeit vorgestellte Verfahren ist bisherigen Ansätzen hinsichtlich der Genauigkeit und Vollständigkeit bei der Identifikation überlegen. / Automatic information extraction (IE) from unstructured text enables new ways to access relevant information and analyze text content that go beyond existing technologies for keyword-based search in document collections. However, the development of systems for extracting machine-readable data from text still requires the implementation of domain-specific extraction programs.
In the field of enterprise search in particular (the retrieval of information in enterprise settings), where a large number of heterogeneous document types exist, it is often necessary to develop ad hoc program modules and to combine them with generic program components in order to extract business-relevant entities. This is particularly critical because a new IE system may have to be developed from scratch for each individual application.
In this work we examine efficient methods to develop and execute IE systems in the context of enterprise search and effective algorithms to exploit pre-existing structured data in the business context for the extraction and identification of business entities in documents. The basis of this work is a novel platform for composition of IE systems through the description of the data flow between generic and application-specific IE modules. The platform supports in particular the development and reuse of generic IE modules and is characterized by a higher flexibility as compared to previous methods.
A technique developed in this work interprets document processing as a data stream between IE modules and thus enables extensive parallelization of individual modules. The autonomous execution of each module allows for a significant runtime improvement for individual documents and thus improves response times, e.g., for extraction services. Previous parallelization approaches focused only on improved throughput for large document collections, e.g., by leveraging distributed instances of an IE system.
Information extraction in the context of enterprise search differs from extraction from the World Wide Web, for instance, in that a variety of structured reference data (corporate databases or terminologies) is usually available, which often describes the relationships among entities. Furthermore, entity names in a business environment have particular characteristics. On the one hand, relevant entities such as product identifiers follow certain patterns that are not always known beforehand but can be inferred from known sample entities, so that unknown entities can be extracted. On the other hand, many designators have a more descriptive character (a concatenation of descriptive words). The corresponding references in texts can vary because of the diversity of possible descriptions, which often makes such entities difficult to identify.
To address IE applications in the presence of available structured data, we study in this work the inference of effective regular expressions from given sample entities. Various generalization and specialization heuristics are used to identify patterns at different syntactic abstraction levels and thus generate regular expressions which promise both high recall and precision. Compared to previous rule learning techniques in the field of information extraction, our technique does not require any annotated document corpus.
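The idea of inferring a regular expression from sample entities alone can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes a single, very simple generalization heuristic (map each character to a character class, collapse runs, and widen the repetition counts across samples), and the names `char_class`, `signature`, and `infer_regex` as well as the sample identifiers are hypothetical.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Generalize one character to a regex character class (assumed heuristic)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_uppercase:
        return "[A-Z]"
    if ch in string.ascii_lowercase:
        return "[a-z]"
    # Anything else (separators, punctuation) is kept as an escaped literal.
    return re.escape(ch)


def signature(entity):
    """Run-length encode an entity as [(character class, run length), ...]."""
    classes = [char_class(ch) for ch in entity]
    return [(cls, len(list(run))) for cls, run in groupby(classes)]


def infer_regex(examples):
    """Build one pattern from sample entities, without any annotated corpus."""
    # Samples with the same sequence of classes are merged into one alternative.
    groups = {}
    for example in examples:
        sig = signature(example)
        key = tuple(cls for cls, _ in sig)
        groups.setdefault(key, []).append([count for _, count in sig])

    alternatives = []
    for key, count_lists in groups.items():
        parts = []
        for i, cls in enumerate(key):
            lo = min(counts[i] for counts in count_lists)
            hi = max(counts[i] for counts in count_lists)
            if lo == hi == 1:
                quantifier = ""
            elif lo == hi:
                quantifier = "{%d}" % lo
            else:
                # Widen the run length to cover all samples (generalization).
                quantifier = "{%d,%d}" % (lo, hi)
            parts.append(cls + quantifier)
        alternatives.append("".join(parts))
    return "(?:" + "|".join(alternatives) + ")"


# Hypothetical product identifiers used as sample entities.
pattern = infer_regex(["AB-1234", "XY-98", "QZ-555"])
print(pattern)
print(bool(re.fullmatch(pattern, "CD-777")))
```

Mapping characters to broad classes favors recall, while keeping separators as literals and bounding the run lengths favors precision; a real system would weigh several such heuristics at different abstraction levels.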
A method for identifying entities that are predefined by graph-structured reference data is examined as a third contribution. An algorithm is presented that goes beyond exact string comparison between the text and the reference data set. It enables effective identification and disambiguation of candidate entities by exploiting approximate matching strategies, and it further leverages relationships among entities for identification and disambiguation. The method presented in this work is superior to previous approaches with regard to precision and recall.
|
199 |
Die Umweltleistung in der Umweltberichterstattung von Unternehmen und deren Zusammenhang mit der ökonomischen Leistung Meier, Kerstin 22 September 2009 (has links) (PDF)
This thesis examines the relationship between corporate environmental reporting and the economic performance of companies. After defining the key technical terms, it gives a comprehensive account of the current state of research on the topic. This review shows that the field has already been studied extensively, with differing results. Some studies found a positive relationship between environmental reporting and the economic performance of companies, whereas many others did not. Building on these findings, the thesis prepares an analysis of the relationship between environmental reporting and economic performance for the companies in the Good Company Ranking (GCR). To this end, economic indicators that reflect the economic performance of the GCR companies are determined to match the ranking's assessments. The relationship is then tested using simple and multiple linear regressions. The regression results show that "good" economic performance can have a positive effect on a company's environmental reporting. In addition, quantitatively and qualitatively extensive environmental reporting can account for an increase in economic performance. This increase can occur immediately after the report is published or with a time lag in later periods.
|
200 |
Detection of KRAS Synthetic Lethal Partners through Integration of Existing RNAi Screens Christodoulou, Eleni 18 December 2014 (has links) (PDF)
KRAS is a gene that plays a very important role in the initiation and development of several types of cancer. In particular, 90% of human pancreatic cancers are due to KRAS mutations. KRAS is difficult to target directly, and a promising therapeutic path is its indirect inactivation by targeting one of its Synthetic Lethal Partners (SLPs).
A gene G is a Synthetic Lethal Partner of KRAS if the simultaneous perturbation of KRAS and G leads to cell death. Past efforts to identify KRAS SLPs have used high-throughput RNAi screens. These studies have reported only a few top-ranked SLPs. To our knowledge, these screens have never been considered in combination for further examination.
This thesis employs an integrative analysis of the published screens, utilizing additional, independent data, with the aim of detecting more robust therapeutic targets.
To this end, RankSLP, a novel statistical analysis approach, was implemented, which for the first time
i) consistently integrates existing KRAS-specific RNAi screens,
ii) consistently integrates and normalizes the results of various ranking methods,
iii) evaluates its findings with the use of external data, and
iv) explores the effects of random data inclusion.
This analysis was able to predict novel SLPs of KRAS and confirm some of the existing ones.
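Integrating several screens into one ranking can be illustrated with a minimal sketch. It is not RankSLP itself: it assumes that each screen assigns every gene a score where lower means more lethal in combination with KRAS, that ranks are normalized to [0, 1] within each screen, and that the combined rank is the plain mean over the screens in which a gene appears. The function names, gene names, and scores are hypothetical.

```python
def normalized_ranks(scores):
    """Map each gene to a rank in [0, 1]; 0 is the lowest (assumed most lethal) score.

    Ties are broken by insertion order, which a real analysis would handle explicitly.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {gene: i / (n - 1) for i, gene in enumerate(ordered)}


def aggregate(screens):
    """Order genes by their mean normalized rank across all screens."""
    ranks_per_gene = {}
    for scores in screens:
        for gene, rank in normalized_ranks(scores).items():
            ranks_per_gene.setdefault(gene, []).append(rank)
    mean_rank = {g: sum(rs) / len(rs) for g, rs in ranks_per_gene.items()}
    # Sort by mean rank, then by name so that the output is deterministic.
    return sorted(mean_rank, key=lambda g: (mean_rank[g], g))


# Two hypothetical screens covering partly different gene sets.
screen_1 = {"GENE_A": -2.0, "GENE_B": -0.5, "GENE_C": 0.3}
screen_2 = {"GENE_A": -1.0, "GENE_B": -1.5, "GENE_C": 0.1, "GENE_D": -0.2}
print(aggregate([screen_1, screen_2]))
```

Normalizing ranks within each screen keeps screens of different sizes and score scales comparable before they are combined.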
|