  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Positioning of online betting services : A case study of finding the gap between companies’ view versus customers’ view / Positionering av online betting tjänster : En studie för att hitta gapet mellan företagets vy mot kundernas vy

BONIECKI, PAWEL January 2016
Under det senaste decenniet har informationsteknologin utvecklats explosionsartat. Införandet av smartphones och surfplattor som alltid är anslutna till Internet har förändrat vårt sätt att "konsumera" den. Allt detta ledde till digitaliseringen av betting tjänster och flytten till Internet. I deras tidiga dagar hade online betting tjänster ett lätt jobb att göra när det gällde positionering, de behövde bara finnas på Internet och det var det som lockade kunder. Men tiderna har förändrats och online betting tjänster står inför nya utmaningar. De måste hitta en tydlig positionering och ha en konkurrensfördel gentemot sina konkurrenter, men det är inte en lätt uppgift. Därför koncentrerade sig denna studie på en utredning av positioneringen för dessa tjänster. Fallstudietillvägagångssätt valdes för detta examensarbete och det undersökta företaget var Betsson Group, som är ett av de större företagen med många varumärken i sin portfölj. Forskningen är begränsad till deras tre mest populära varumärken: Betsson, Betsafe och NordicBet med undantag för text mining där Bet365 (en konkurrent) också analyserades. Empirisk data som samlades in under denna studie kom från intervjuer på fallstudieföretaget, frågeformulär besvarat av kunder och genom text mining av online betting tjänsters omdömen. Studien resulterade i att hitta gapet mellan positioneringen av företaget, kunderna och online recensenterna. Kundernas vy och recensenters vy överensstämmer ganska bra, men företagets uppfattning är helt avvikande. När kunderna väljer online betting tjänst bryr de sig mest om bästa odds, bonusar och mängden av betts, där å andra sidan företag placerar sig på ett statiskt sätt med målgrupper: folklig, nerdy och lyxig. Kunderna tenderar att inte bry sig om varumärkespositioneringen och driver företag in i ett priskrig, men samtidigt måste de känna igen varumärket för att lita på det med sina pengar. 
Studien bidrar till att förstå kunders vy när de väljer en online betting tjänst och denna aspekt kan generaliseras för hela online bettingindustrin. Dessutom kan en del av teorierna appliceras på andra online tjänster där bästa pris söks. / During the last decade, information technology has developed explosively. The introduction of smartphones and tablets that are always connected to the Internet changed our way of “consuming” it. All of this led to the digitalization of betting services and their move to the Internet. In their early days, online betting services had an easy job regarding positioning: they only needed to exist on the Internet, and that attracted customers. But times have changed and online betting services face new challenges. They must find a clear positioning and have a competitive advantage over their competitors, but that is not an easy task. Therefore, this study concentrated on the investigation of the positioning of these services. A case study approach was chosen for this thesis and the investigated company was Betsson Group, which is one of the bigger companies with many brands in its portfolio. However, the research was delimited to its three most popular brands: Betsson, Betsafe and NordicBet, with the exception of text mining, where Bet365 (a competitor) was also analyzed. Empirical data gathered during this study came from interviews at the case company, a questionnaire answered by customers, and text mining of reviews of online betting services. The study resulted in finding the gap in positioning between the company, the customers and the online reviewers. The customers' view and the reviewers' view align quite well; however, the company's view diverges completely. When choosing an online betting service, customers mostly care about best odds, bonuses and variety of bets, whereas companies position themselves in a static way with the target groups folksy, nerdy and luxurious. 
Customers tend not to care about brand positioning and push companies into a price war; however, they need to recognize the brand in order to trust it with their money. The study contributes to understanding the customers' view when they choose an online betting service, and this aspect can be generalized to the online betting industry as a whole. Moreover, some of the theories can be applied to other online services where the best price is sought.
152

Text Mining for Pathway Curation

Weber-Genzel, Leon 17 November 2023
Biolog:innen untersuchen häufig Pathways, Netzwerke von Interaktionen zwischen Proteinen und Genen mit einer spezifischen Funktion. Neue Erkenntnisse über Pathways werden in der Regel zunächst in Publikationen veröffentlicht und dann in strukturierter Form in Lehrbüchern, Datenbanken oder mathematischen Modellen weitergegeben. Deren Kuratierung kann jedoch aufgrund der hohen Anzahl von Publikationen sehr aufwendig sein. In dieser Arbeit untersuchen wir wie Text Mining Methoden die Kuratierung unterstützen können. Wir stellen PEDL vor, ein Machine-Learning-Modell zur Extraktion von Protein-Protein-Assoziationen (PPAs) aus biomedizinischen Texten. PEDL verwendet Distant Supervision und vortrainierte Sprachmodelle, um eine höhere Genauigkeit als vergleichbare Methoden zu erreichen. Eine Evaluation durch Expert:innen bestätigt die Nützlichkeit von PEDLs für Pathway-Kurator:innen. Außerdem stellen wir PEDL+ vor, ein Kommandozeilen-Tool, mit dem auch Nicht-Expert:innen PPAs effizient extrahieren können. Drei Kurator:innen bewerten 55,6 % bis 79,6 % der von PEDL+ gefundenen PPAs als nützlich für ihre Arbeit. Die große Anzahl von PPAs, die durch Text Mining identifiziert werden, kann für Forscher:innen überwältigend sein. Um hier Abhilfe zu schaffen, stellen wir PathComplete vor, ein Modell, das nützliche Erweiterungen eines Pathways vorschlägt. Es ist die erste Pathway-Extension-Methode, die auf überwachtem maschinellen Lernen basiert. Unsere Experimente zeigen, dass PathComplete wesentlich genauer ist als existierende Methoden. Schließlich schlagen wir eine Methode vor, um Pathways mit komplexen Ereignisstrukturen zu erweitern. Hier übertrifft unsere neue Methode zur konditionalen Graphenmodifikation die derzeit beste Methode um 13-24% Genauigkeit in drei Benchmarks. Insgesamt zeigen unsere Ergebnisse, dass Deep Learning basierte Informationsextraktion eine vielversprechende Grundlage für die Unterstützung von Pathway-Kurator:innen ist. 
/ Biological knowledge often involves understanding the interactions between molecules, such as proteins and genes, that form functional networks called pathways. New knowledge about pathways is typically communicated through publications and later condensed into structured formats such as textbooks, pathway databases or mathematical models. However, curating updated pathway models can be labour-intensive due to the growing volume of publications. This thesis investigates text mining methods to support pathway curation. We present PEDL (Protein-Protein-Association Extraction with Deep Language Models), a machine learning model designed to extract protein-protein associations (PPAs) from biomedical text. PEDL uses distant supervision and pre-trained language models to achieve higher accuracy than the state of the art. An expert evaluation confirms its usefulness for pathway curators. We also present PEDL+, a command-line tool that allows non-expert users to efficiently extract PPAs. When applied to pathway curation tasks, 55.6% to 79.6% of PEDL+ extractions were found useful by curators. The large number of PPAs identified by text mining can be overwhelming for researchers. To help, we present PathComplete, a model that suggests potential extensions to a pathway. It is the first method based on supervised machine learning for this task, using transfer learning from pathway databases. Our evaluations show that PathComplete significantly outperforms existing methods. Finally, we generalise pathway extension from PPAs to more realistic complex events. Here, our novel method for conditional graph modification outperforms the current best by 13-24% accuracy on three benchmarks. We also present a new dataset for event-based pathway extension. Overall, our results show that deep learning-based information extraction is a promising basis for supporting pathway curators.
153

Using hotel reviews to assess hotel frontline employees’ roles and performances

Hu, F., Trivedi, Rohit, Teichert, T. 20 April 2022
This study aims to explore how marketers can use text mining to analyze actors, actions and performance effects of service encounters by building on role theory. This enables hotel managers to use the introduced methodology to measure and monitor frontline employees’ role behavior and optimize their service. Design/methodology/approach: The authors’ approach links text mining and importance-performance analysis with role theory’s conceptual foundations, taking into account the hotel industry’s specifics, to assess the effect of frontline hotel employees’ actions on consumer satisfaction and to derive specific management implications for the hospitality sector. Findings: This study identifies different actors involved in hotel frontline interactions, revealing distinct role behaviors that characterize consumers’ perspectives of service encounters with different role types associated with front-office employees. This research also identifies role performance related to role behavior to improve service encounters. Practical implications: Customer–employee interactions can be assessed through user-generated content (UGC). Performance evaluations relate to frontline employee roles associated with distinct role scripts, whereby different hotel segments require tailored role designs. Insights of this study can be used for service optimization, market positioning as well as for improving human resource management practices in the hotel industry. Originality/value: This study contributes to the service encounter literature by applying role theory in the text mining of UGC to assess frontline employees as actors and the effects of their actions on service quality delivery. / Science Foundation of Ministry of Education, PR China (Grant No. 21YJA630031)
154

Text Mining Infrastructure in R

Meyer, David, Hornik, Kurt, Feinerer, Ingo 31 March 2008
During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package, which provides a framework for text mining applications within R. We give a survey of text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels. (authors' abstract)
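The count-based analysis mentioned in this abstract usually starts from a document-term matrix. The following is a minimal sketch of that idea in plain Python; it does not use or reproduce the tm package's API, and the helper names `tokenize` and `document_term_matrix` as well as the two sample sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


# Hypothetical two-document corpus.
docs = ["Text mining in R", "Mining text with R packages"]
vocab, dtm = document_term_matrix(docs)
for term, column in zip(vocab, zip(*dtm)):
    print(term, column)
```

Each row of the matrix is one document and each column one term, so clustering or classification can operate on the rows as feature vectors.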
155

運用文字探勘技術建置知識本體之研究 -以財經文件為例 / The study of constructing ontology with text mining techniques-Take the macroeconomic analysis report for an instance

蘇晏譁, Su, Yan Hua Unknown Date
隨著理財觀念日漸普及,個人與企業對於財經相關資訊的需求也與日俱增。然而,各式各樣隱含有用資訊的財經相關文件雖然越來越容易取得,但多是以文字的方式呈現,無固定格式,較不易整理。如何協助使用者自大量財經文件中尋找和擷取出適當的資訊,已經成為財經相關應用領域的重要研究議題。   在目前眾多知識挖掘相關方法中,文字探勘(text mining)即是以文件內容為主要分析對象,目的在於自非結構或半結構化的文件中萃取出有意義的知識。為此,若有一個良好的機制能將文字探勘所挖掘的知識加以彙整併保存,便可使財經文件內所隱藏的知識進一步的被應用在相關領域上(如決策支援、資訊檢索、知識管理),而這也成為提昇競爭力的重要利基。   本研究針對財經領域相關文件(如財經新聞、投顧之研究報告…等)進行分析,結合文字探勘知識挖掘的能力與知識本體的概念,運用文字探勘中重要演算法-關聯分析挖掘財經文件中隱含的關鍵資訊,提出一套藉由關聯分析所得之關聯規則建立知識本體的新方法。此方法有以下幾點特色:(1)建構一「財經標的模型」,定義財經文件內容之基本架構(2)將文字探勘挖掘之知識以知識本體的方式呈現(3)自動化的建構知識本體。 / As the concept of financial management becomes more widespread, individuals and corporations increasingly demand financial information. However, although macroeconomic documents containing all kinds of useful information are readily available, most are free text with no fixed format and are difficult to collate. Helping users find and retrieve appropriate information from a large number of macroeconomic documents has become an important research topic in finance-related applications.  Among the many knowledge mining approaches, text mining analyzes the content of documents; its purpose is to extract meaningful knowledge from unstructured or semi-structured documents. If a good mechanism can consolidate and preserve the knowledge discovered by text mining, the knowledge hidden in macroeconomic documents can be applied effectively in decision support, information retrieval, knowledge management and other related fields, which is a foundation for enhancing competitiveness.  This study analyzes macroeconomic documents such as financial and economic news and the research reports of investment consultants. It combines the knowledge mining ability of text mining with the concept of ontology and uses association analysis, one of the important text mining algorithms, to discover latent key information in macroeconomic documents, proposing a new method for building an ontology from the resulting association rules. 
The method has the following characteristics: (1) it constructs a "Target Model" that defines the basic structure of macroeconomic documents; (2) it presents the knowledge obtained from text mining as an ontology; (3) it constructs the ontology automatically.
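The association analysis this abstract relies on can be illustrated with a small sketch. It mines one-to-one rules between terms using the standard support and confidence measures; it is not the thesis's method for turning rules into an ontology. The function name `association_rules`, the thresholds, and the keyword sets are all assumptions made for illustration.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine one-to-one rules (a -> b) from sets of terms.

    support(a, b)      = fraction of documents containing both a and b
    confidence(a -> b) = support(a, b) / support(a)
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        with_a = sum(1 for t in transactions if a in t)  # never 0: a occurs somewhere
        support = both / n
        confidence = both / with_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical keyword sets, one per financial document.
keyword_sets = [
    {"rate", "inflation", "bond"},
    {"rate", "inflation"},
    {"rate", "bond"},
    {"inflation", "export"},
]
rules = association_rules(keyword_sets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

A rule such as bond -> rate could then be treated as a candidate relation between two concepts, which is roughly the kind of input an ontology-building step would consume.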
156

An Approach to Incorporate Texts into a Social Network Analysis of Communication Graphs

Bohn, Angela, Feinerer, Ingo, Hornik, Kurt, Mair, Patrick January 2009
Social network analysis (SNA) provides tools to examine relationships between people. Text mining (TM) captures the text they produce, for example in Web 2.0 applications, but it neglects their social structure. This paper applies an approach named "content-based SNA" (CB-SNA) that combines the two methods. Using the R mailing lists, R-help and R-devel, we show how this combination can be used to describe people's interests and to find out if authors who have similar interests actually communicate. We find that the expected positive relationship between sharing interests and communicating gets stronger as the centrality scores of authors in the communication networks increase. / Series: Research Report Series / Department of Statistics and Mathematics
157

Dolování textu na úrovni diskursu / Mining texts at the discourse level

Van de Moosdijk, Sara Francisca January 2014
Linguistic discourse refers to the meaning of larger text segments, and could be very useful for guiding attempts at text mining such as document selection or summarization. The aim of this project is to apply discourse information to Knowledge Discovery in Databases. As far as we know, this is the first attempt at combining these two very different fields, so the goal is to create a basis for this type of knowledge extraction. We approach the problem by extracting discourse relations using unsupervised methods, and then model the data using pattern structures in Formal Concept Analysis. Our method is applied to a corpus of medical articles compiled from PubMed. This medical data can be further enhanced with concepts from the UMLS MetaThesaurus, which are combined with the UMLS Semantic Network to apply as an ontology in the pattern structures. The results show that despite having a large amount of noise, the method is promising and could be applied to domains other than the medical domain.
158

Fouille des médias sociaux français : expertise et sentiment / French Social Media Mining : Expertise and Sentiment

Abdaoui, Amine 05 December 2016
Les médias sociaux ont changé notre manière de communiquer entre individus, au sein des organisations et des communautés. La disponibilité de ces données sociales ouvre de nouvelles opportunités pour comprendre et influencer le comportement des utilisateurs. De ce fait, la fouille des médias sociaux connait un intérêt croissant dans divers milieux scientifiques et économiques. Dans cette thèse, nous nous intéressons spécifiquement aux utilisateurs de ces réseaux et cherchons à les caractériser selon deux axes : (i) leur expertise et leur réputation et (ii) les sentiments qu’ils expriment. De manière classique, les données sociales sont souvent fouillées selon leur structure en réseau. Cependant, le contenu textuel des messages échangés peut faire émerger des connaissances complémentaires qui ne peuvent être connues via la seule analyse de la structure. Jusqu’à récemment, la majorité des travaux concernant l’analyse du contenu textuel était proposée pour l’Anglais. L’originalité de cette thèse est de développer des méthodes et des ressources basées sur le contenu pour la fouille des réseaux sociaux pour la langue Française. Dans le premier axe, nous proposons d'abord d’identifier l'expertise des utilisateurs. Pour cela, nous avons utilisé des forums qui recrutent des experts en santé pour apprendre des modèles de classification qui servent à identifier les messages postés par les experts dans n’importe quel autre forum. Nous démontrons que les modèles appris sur des forums appropriés peuvent être utilisés efficacement sur d’autres forums. Puis, dans un second temps, nous nous intéressons à la réputation des utilisateurs dans ces forums. L’idée est de rechercher les expressions de confiance et de méfiance exprimées dans les messages, de rechercher les destinataires de ces messages et d’utiliser ces informations pour en déduire la réputation des utilisateurs. 
Nous proposons une nouvelle mesure de réputation qui permet de pondérer le score de chaque réponse selon la réputation de son auteur. Des évaluations automatiques et manuelles ont démontré l’efficacité de l’approche. Dans le deuxième axe, nous nous sommes focalisés sur l’extraction de sentiments (polarité et émotion). Pour cela, dans un premier temps, nous avons commencé par construire un lexique de sentiments et d’émotions pour le Français que nous appelons FEEL (French Expanded Emotion Lexicon). Ce lexique est construit de manière semi-automatique en traduisant et en étendant son homologue Anglais NRC EmoLex. Nous avons ensuite comparé FEEL avec les lexiques Français de la littérature sur des benchmarks de référence. Les résultats ont montré que FEEL permet d’améliorer la classification des textes Français selon leurs polarités et émotions. Dans un deuxième temps, nous avons proposé d’évaluer de manière assez exhaustive différentes méthodes et ressources pour la classification de sentiments en Français. Les expérimentations menées ont permis de déterminer les caractéristiques utiles dans la classification de sentiments pour différents types de textes. Les systèmes appris se sont montrés particulièrement efficaces sur des benchmarks de référence. De manière générale, ces travaux ont ouvert des perspectives prometteuses sur diverses tâches d’analyse des réseaux sociaux pour la langue française incluant: (i) combiner plusieurs sources pour transférer la connaissance sur les utilisateurs des réseaux sociaux; (ii) la fouille des réseaux sociaux en utilisant les images, les vidéos, les géolocalisations, etc. et (iii) l'analyse multilingues de sentiment. / Social media have changed the way we communicate between individuals and within organizations and communities. The availability of these social data opens new opportunities to understand and influence user behavior. 
Therefore, Social Media Mining is experiencing a growing interest in various scientific and economic circles. In this thesis, we are specifically interested in the users of these networks, whom we try to characterize in two ways: (i) their expertise and their reputations and (ii) the sentiments they express. Conventionally, social data is often mined according to its network structure. However, the textual content of the exchanged messages may reveal additional knowledge that cannot be obtained through analysis of the structure alone. Until recently, the majority of work on the analysis of textual content was proposed for English. The originality of this thesis is to develop methods and resources based on the textual content of the messages for French Social Media Mining. In the first axis, we first propose to identify user expertise. For this, we used forums that recruit health experts to learn classification models that serve to identify messages posted by experts in any other health forum. We demonstrate that models learned on appropriate forums can be used effectively on other forums. Then, in a second step, we focus on user reputation in these forums. The idea is to look for expressions of trust and distrust in the textual content of the exchanged messages, to identify the recipients of these messages, and to use this information to deduce users' reputation. We propose a new reputation measure that weights the score of each response by the reputation of its author. Automatic and manual evaluations have demonstrated the effectiveness of the proposed approach. In the second axis, we focus on the extraction of sentiments (emotions and polarity). For this, we started by building a French lexicon of sentiments and emotions that we call FEEL (French Expanded Emotion Lexicon). This lexicon is built semi-automatically by translating and expanding its English counterpart NRC EmoLex. 
We then compare FEEL with existing French lexicons from the literature on reference benchmarks. The results show that FEEL improves the classification of French texts according to their polarities and emotions. Finally, we propose to evaluate different features, methods and resources for the classification of sentiments in French. The conducted experiments have identified useful features and methods in the classification of sentiments for different types of texts. The learned systems have been particularly effective on reference benchmarks. Generally, this work opens promising perspectives on various analytical tasks of Social Media Mining including: (i) combining multiple sources in mining Social Media users; (ii) multi-modal Social Media Mining using not just text but also images, videos, location, etc. and (iii) multilingual sentiment analysis.
159

Aprendizado não supervisionado de hierarquias de tópicos a partir de coleções textuais dinâmicas / Unsupervised learning of topic hierarchies from dynamic text collections

Marcacini, Ricardo Marcondes 19 May 2011
A necessidade de extrair conhecimento útil e inovador de grandes massas de dados textuais, tem motivado cada vez mais a investigação de métodos para Mineração de Textos. Dentre os métodos existentes, destacam-se as iniciativas para organização de conhecimento por meio de hierarquias de tópicos, nas quais o conhecimento implícito nos textos é representado em tópicos e subtópicos, e cada tópico contém documentos relacionados a um mesmo tema. As hierarquias de tópicos desempenham um papel importante na recuperação de informação, principalmente em tarefas de busca exploratória, pois permitem a análise do conhecimento de interesse em diversos níveis de granularidade e exploração interativa de grandes coleções de documentos. Para apoiar a construção de hierarquias de tópicos, métodos de agrupamento hierárquico têm sido utilizados, uma vez que organizam coleções textuais em grupos e subgrupos, de forma não supervisionada, por meio das similaridades entre os documentos. No entanto, a maioria dos métodos de agrupamento hierárquico não é adequada em cenários que envolvem coleções textuais dinâmicas, pois são exigidas frequentes atualizações dos agrupamentos. Métodos de agrupamento que respeitam os requisitos existentes em cenários dinâmicos devem processar novos documentos assim que são adicionados na coleção, realizando o agrupamento de forma incremental. Assim, neste trabalho é explorado o uso de métodos de agrupamento incremental para o aprendizado não supervisionado de hierarquias de tópicos em coleções textuais dinâmicas. O agrupamento incremental é aplicado na construção e atualização de uma representação condensada dos textos, que mantém um sumário das principais características dos dados. Os algoritmos de agrupamento hierárquico podem, então, ser aplicados sobre as representações condensadas, obtendo-se a organização da coleção textual de forma mais eficiente. 
Foram avaliadas experimentalmente três estratégias de agrupamento incremental da literatura, e proposta uma estratégia alternativa mais apropriada para hierarquias de tópicos. Os resultados indicaram que as hierarquias de tópicos construídas com uso de agrupamento incremental possuem qualidade próxima às hierarquias de tópicos construídas por métodos não incrementais, com significativa redução do custo computacional / The need to extract new and useful knowledge from large textual collections has motivated research on Text Mining methods. Among the existing methods, initiatives for knowledge organization by topic hierarchies are very popular. In topic hierarchies, knowledge is represented by topics and subtopics, and each topic contains documents of similar content. They play an important role in information retrieval, especially in exploratory search tasks, allowing the analysis of knowledge at various levels of granularity and interactive exploration of large document collections. Hierarchical clustering methods have been used to support the construction of topic hierarchies. These methods organize textual collections in clusters and subclusters, in an unsupervised manner, using similarities among documents. However, most existing hierarchical clustering methods are not suitable for scenarios with dynamic text collections, since frequent clustering updates are necessary. Clustering methods that meet these requirements must process new documents as soon as they are added to the collection, performing the clustering incrementally. Thus, we studied incremental clustering methods for unsupervised learning of topic hierarchies for dynamic text collections. Incremental clustering is used to build and update a condensed representation of the texts, which maintains a summary of the main features of the data. The hierarchical clustering algorithms are then applied to these condensed representations, obtaining the textual organization more efficiently. 
We experimentally evaluate three incremental clustering algorithms available in the literature. We also propose an alternative strategy more appropriate for the construction of topic hierarchies. The results indicate that topic hierarchies built with incremental clustering have quality close to that of hierarchies built by non-incremental methods. Furthermore, the computational cost is considerably reduced using incremental clustering methods.
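The idea of maintaining a condensed representation that is updated as documents arrive can be sketched as follows. This is a generic leader-style incremental clustering, not the strategy proposed in the thesis nor any of the three algorithms it evaluates. The class name `IncrementalClusterer`, the similarity threshold, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Keeps one summed term-count vector per cluster as its summary."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.summaries = []  # condensed representation: one Counter per cluster

    def add(self, text):
        """Assign a new document to the most similar cluster or open a new one."""
        vec = Counter(text.lower().split())
        best, best_sim = None, self.threshold
        for i, summary in enumerate(self.summaries):
            sim = cosine(vec, summary)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            self.summaries.append(vec)
            return len(self.summaries) - 1
        self.summaries[best].update(vec)  # fold the document into the summary
        return best


clusterer = IncrementalClusterer(threshold=0.3)
stream = [
    "text mining methods",
    "text mining tools",
    "football match results",
    "football match report",
]
labels = [clusterer.add(doc) for doc in stream]
print(labels)
```

Only the per-cluster summaries are kept, so each new document is compared against a handful of vectors rather than the whole collection; a hierarchical clustering algorithm could then be run over `clusterer.summaries` instead of the raw documents.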
160

Aprendizado de máquina parcialmente supervisionado multidescrição para realimentação de relevância em recuperação de informação na WEB / Partially supervised multi-view machine learning for relevance feedback in WEB information retrieval

Soares, Matheus Victor Brum 28 May 2009
Atualmente, o meio mais comum de busca de informações é a WEB. Assim, é importante procurar métodos eficientes para recuperar essa informação. As máquinas de busca na WEB usualmente utilizam palavras-chaves para expressar uma busca. Porém, não é trivial caracterizar a informação desejada. Usuários diferentes com necessidades diferentes podem estar interessados em informações relacionadas, mas distintas, ao realizar a mesma busca. O processo de realimentação de relevância torna possível a participação ativa do usuário no processo de busca. A idéia geral desse processo consiste em, após o usuário realizar uma busca na WEB permitir que indique, dentre os sites encontrados, quais deles considera relevantes e não relevantes. A opinião do usuário pode então ser considerada para reordenar os dados, de forma que os sites relevantes para o usuário sejam retornados mais facilmente. Nesse contexto, e considerando que, na grande maioria dos casos, uma consulta retorna um número muito grande de sites WEB que a satisfazem, das quais o usuário é responsável por indicar um pequeno número de sites relevantes e não relevantes, tem-se o cenário ideal para utilizar aprendizado parcialmente supervisionado, pois essa classe de algoritmos de aprendizado requer um número pequeno de exemplos rotulados e um grande número de exemplos não-rotulados. Assim, partindo da hipótese que a utilização de aprendizado parcialmente supervisionado é apropriada para induzir um classificador que pode ser utilizado como um filtro de realimentação de relevância para buscas na WEB, o objetivo deste trabalho consiste em explorar algoritmos de aprendizado parcialmente supervisionado, mais especificamente, aqueles que utilizam multidescrição de dados, para auxiliar na recuperação de sites na WEB. Para avaliar esta hipótese foi projetada e desenvolvida uma ferramenta denominada C-SEARCH que realiza esta reordenação dos sites a partir da indicação do usuário. 
Experimentos mostram que, em casos que buscas genéricas, que o resultado possui um bom diferencial entre sites relevantes e irrelevantes, o sistema consegue obter melhores resultados para o usuário / As nowadays the WEB is the most common source of information, it is very important to find reliable and efficient methods to retrieve this information. However, the WEB is a highly volatile and heterogeneous information source, thus keyword-based querying may not be the best approach when little information is given. This is because different users with different needs may want distinct information, although related to the same keyword query. The process of relevance feedback makes it possible for the user to interact actively with the search engine. The main idea is that, after an initial search in the WEB, the process enables the user to indicate, among the retrieved sites, a small number of the ones considered relevant or irrelevant to the information he or she needs. The user's preferences can then be used to rearrange the sites returned in the initial search, so that relevant sites are ranked first. As in most cases a search returns a large number of WEB sites that fit the keyword query, this is an ideal situation in which to use partially supervised machine learning algorithms. This kind of learning algorithm requires a small number of labeled examples and a large number of unlabeled examples. Thus, based on the assumption that partially supervised learning is appropriate for inducing a classifier that can be used as a filter for relevance feedback in WEB information retrieval, the aim of this work is to explore the use of a partially supervised machine learning algorithm, more specifically one that uses multi-description data, to assist the WEB search. To this end, a computational tool called C-SEARCH, which reorders the search results using the user's feedback, has been implemented. 
Experimental results show that in cases where the keyword query is generic and there is a clear distinction between relevant and irrelevant sites, which is recognized by the user, the system can achieve good results.
