Global ETD Search

1	Métodos para seleção de palavras-chave em sistemas de publicidade contextual / Methods for keyword selection in contextual advertising systems Berlt, Klessius Renato 19 December 2012
CNPQ - Conselho Nacional de Desenvolvimento Científico e Tecnológico
In this work we address the problem of selecting keywords for contextual advertising systems in two different scenarios: web pages and short texts. We deal with the problem of selecting keywords from web pages using machine learning. While traditional machine learning approaches usually aim to select keywords that humans consider good, the new machine learning strategy proposed here drives the selection by the expected impact of each keyword on the final quality of the ad placement system. We name this strategy ad collection aware keyword selection (also referred to in this work as ACAKS).
This new approach relies on users' judgements of the ads that each keyword can retrieve. Although this strategy requires more effort to build the training set than previous approaches, we believe the gain in recall is large enough to make the ad collection aware approach the better choice. In experiments with an ad collection, using features proposed in previous work, the new ad collection aware approach led to a gain of 62% in recall over the baseline without reducing precision. Besides this new way of selecting keywords, we also study the use of features extracted from the ad collection for keyword selection. We also present three new methods for extracting keywords from web pages that require no learning process and use Wikipedia as an external source of information to support keyword selection. The information used from Wikipedia includes article titles, keyword co-occurrence, and the categories associated with each Wikipedia article. Experimental results show that our methods are competitive solutions for selecting good keywords to represent target web pages while remaining simple, effective, and time efficient. Besides selecting keywords from web pages, we also study methods for selecting keywords from short texts. Short texts have become a very popular way for users to publish content on the web. Every day, millions of users post their thoughts, needs, and feelings on the web through social networks such as Facebook and Twitter, or through comment sections on news web sites. Much of these systems' revenue comes from contextual advertising, so selecting keywords in this new scenario arises as a new challenge. We propose and study a novel family of methods that uses the connectivity information present in Wikipedia to discover the most related concepts in each short text.
We also used the proposed methods as a new set of features in a machine learning framework to improve the quality of the results. We show that this approach performs well and outperforms the best baselines by more than 35%. Finally, we apply the ACAKS approach to short texts, where it yielded good results, outperforming a traditional machine learning approach by more than 80% in both precision and recall.
Keyword selection; Machine learning; Contextual advertising
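The abstract above mentions Wikipedia article titles as one source of evidence for choosing keywords without any training. As a rough illustration only, and not the thesis's actual methods, the sketch below matches word n-grams of a page against a set of known titles and ranks the matches by frequency. The names `candidate_keywords`, `TITLES`, and `PAGE` are hypothetical, the title set is a tiny stand-in for a real title index, and the tokenizer is ASCII-only by assumption.

```python
import re
from collections import Counter


def candidate_keywords(text, titles, max_len=3):
    """Count word n-grams of `text` that exactly match a known article title."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Normalize titles the same way as the page text (ASCII-only assumption).
    known = {" ".join(re.findall(r"[a-z0-9]+", t.lower())) for t in titles}
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in known:
                counts[gram] += 1
    # Most frequent first; among ties, prefer longer (more specific) phrases.
    return sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0].split()), kv[0]))


# Hypothetical stand-in for a Wikipedia title index.
TITLES = {"Machine learning", "Contextual advertising", "Keyword"}
PAGE = "Contextual advertising uses machine learning. Machine learning picks each keyword."
print(candidate_keywords(PAGE, TITLES))
```

Exact title matching is the simplest possible use of the title list. The co-occurrence and category signals named in the abstract would need additional data and are not modeled here.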
2	The Company that You Keep: When to Buy a Competitor's Keyword Shin, Woo Choel January 2010
Search advertising is the practice in which advertisers place text-based advertisements on a search engine's results page alongside the organic search results. With its growing importance, search advertising has seen a recent surge in academic interest. However, the literature has ignored some practical yet important problems that advertisers face, including the keyword selection problem. In my dissertation, I focus on the keyword selection problem and, more specifically, on the choice of branded keywords in search advertising.
My dissertation begins with an observation of different patterns of branded keyword purchases by the brand owner and its competitor. Under some branded keywords, only the brand owner or only the competitor appears in the sponsored links. Under others, both firms appear, or neither does. I aim to understand what drives this puzzling pattern in a competitive environment. To this end, I develop a duopoly model in which two firms compete in the product market with products that are differentiated both horizontally and vertically. Their products are evaluated by consumers whose perception is affected by what they see in search advertising. With this setup, I then derive a subgame perfect equilibrium of the two-stage game.
In the pricing equilibrium, I find that any benefit a firm gets from search advertising, whether from exposure or from contrast or assimilation, helps that firm charge a higher price while forcing the other firm to charge a lower price. This result affects each firm's incentive to buy the branded keyword in the advertising stage. Specifically, firms have an incentive to buy the keyword only when the cost of advertising is justified by the exposure benefit, and even then each firm buys only when no detrimental context effect is present. If the quality difference between the brand owner and the competitor is large, so that there is a contrast between the two firms, the competitor with the low-quality product refrains from buying the keyword because the contrast effect hurts it. On the other hand, if the quality difference is small, so that the two brands are assimilated, the brand owner with the high-quality product refuses to buy the keyword because it is hurt by the assimilation effect. If the quality difference is in the intermediate range, so that neither context effect harms either firm, both firms buy the keyword. Probing the underlying incentives further, I find that in some cases the brand owner may buy its own keyword only to defend itself against the competitor's threat. In contrast, I also identify a case in which the brand owner buys its own keyword and thereby precludes the competitor from buying it. My results also suggest that both firms may be worse off by engaging in advertising, as in a prisoner's dilemma.
In an extension, I analyze the impact of an insufficient advertising budget. If the budget is limited, each firm may have an incentive to hurt a rival that takes the higher slot by increasing its own bid and thus quickly exhausting the rival's budget. The budget constraint also deprives advertisers of the incentive to buy the keyword, so budget-constrained advertisers may refuse to match the competitor's purchase of the keyword. Finally, an experimental investigation shows the existence of the exposure effect and the context effects. It also supports the model's predictions, based on estimated model parameters together with the empirical observation.
Dissertation; Business Administration, Marketing; Branded Keywords; Game Theory; Keyword Selection; Raising Rivals' Cost; Search Advertising
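The four purchase patterns described in this abstract (only the brand owner, only the competitor, both, or neither) can be reproduced in a toy version of the advertising stage. The sketch below is not the dissertation's duopoly model: the payoff form (an exposure benefit minus the advertising cost, plus a context effect that applies only when both ads appear) and all numbers are assumptions chosen for illustration, and `payoffs` and `pure_nash` are hypothetical names. It enumerates the pure-strategy Nash equilibria of the resulting 2x2 game.

```python
from itertools import product


def payoffs(buy_owner, buy_comp, benefit, cost, ctx_owner, ctx_comp):
    """Toy payoffs: exposure benefit minus cost, plus a context effect when both ads show."""
    both = buy_owner and buy_comp
    u_owner = (benefit - cost if buy_owner else 0.0) + (ctx_owner if both else 0.0)
    u_comp = (benefit - cost if buy_comp else 0.0) + (ctx_comp if both else 0.0)
    return u_owner, u_comp


def pure_nash(benefit, cost, ctx_owner, ctx_comp):
    """Return all (owner_buys, competitor_buys) profiles with no profitable unilateral deviation."""
    args = (benefit, cost, ctx_owner, ctx_comp)
    equilibria = []
    for a, b in product([False, True], repeat=2):
        u_owner, u_comp = payoffs(a, b, *args)
        dev_owner, _ = payoffs(not a, b, *args)  # owner switches its decision
        _, dev_comp = payoffs(a, not b, *args)   # competitor switches its decision
        if u_owner >= dev_owner and u_comp >= dev_comp:
            equilibria.append((a, b))
    return equilibria


# Illustrative parameter choices (assumptions, not estimates from the dissertation).
print("contrast hurts competitor:", pure_nash(3, 1, ctx_owner=1, ctx_comp=-3))
print("assimilation hurts owner: ", pure_nash(3, 1, ctx_owner=-3, ctx_comp=1))
print("no harmful context effect:", pure_nash(3, 1, ctx_owner=0, ctx_comp=0))
print("cost exceeds exposure:    ", pure_nash(1, 3, ctx_owner=0, ctx_comp=0))
```

Under these assumed payoffs, a harmful context effect keeps the affected firm out, and a cost above the exposure benefit keeps both firms out, which mirrors the qualitative pattern in the abstract. Prices, bidding, and budgets are left out entirely.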
3	設計與實作一個針對遊戲論壇的中文文章整合系統 / Design and Implementation of a Chinese Document Integration System for Game Forums 黃重鈞 (Huang, Chung Chun) Unknown Date
With today's convenient network infrastructure, people exchange information in more varied ways, and news is no longer the only source: on forums, anyone can share information quickly and with few barriers. The resulting surge in volume means that, even with search engines, users still spend considerable effort collecting, filtering, and processing articles on a specific topic. This study takes the League of Legends discussion board on the Bahamut game information site (巴哈姆特電玩資訊站) as an example and aims to give users a comprehensive yet concise description of a game character, so that they gain at least a general understanding of that character. Drawing on work in web forum mining and news document summarization, we design a summarization system suited to multiple forum articles. We first analyze the characteristics of the forum, experiment with ways to mine its latent information, and identify the difficulties of forum mining. The system is then divided into three phases:
1. Data preprocessing: forum articles differ from news articles, and nouns and verbs cannot easily be used directly as keywords, so TF-IDF rather than part of speech is used to select representative terms, which serve as the dimensions of the sentence vector space.
2. Clustering: the K-means clustering algorithm groups similar sentences into the same cluster.
3. Sentence selection: based on the clustering result, sentences are weighted by two features, their keyword content and TF-IDF, and the sentences that best represent the document set are selected.
We conduct an experiment with data collected from the game forum and find useful information in the analysis. The thesis closes with possible improvements, and we expect the techniques and results to inform better forum article classification and summarization systems in the future.
Chinese game forum; summary; keyword selection; K-means clustering
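The three phases described in this abstract (TF-IDF term weighting, K-means clustering of sentences, and selection of representative sentences) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: sentences are assumed to be already segmented into tokens (Chinese text would need a word segmenter first), the first k sentences seed the clusters so the run is deterministic, distance is squared Euclidean, and each sentence is scored by the sum of its TF-IDF weights. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Phase 1: map each tokenized sentence to a sparse {term: tf-idf} vector."""
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(s))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c / len(s) * idf[t] for t, c in Counter(s).items()} for s in sentences]


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in set(u) | set(v))


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Phase 2: plain K-means; the first k vectors seed the centroids (assumption)."""
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def summarize(sentences, k):
    """Phase 3: from each cluster, keep the sentence with the largest total TF-IDF weight."""
    vectors = tfidf_vectors(sentences)
    labels = kmeans(vectors, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(max(members, key=lambda i: sum(vectors[i].values())))
    return sorted(picks)


# Hypothetical pre-tokenized forum sentences about two game characters.
SENTENCES = [
    ["ahri", "mid", "lane", "burst"],
    ["garen", "top", "lane", "tank"],
    ["ahri", "burst", "charm"],
    ["garen", "tank", "spin"],
]
print(summarize(SENTENCES, k=2))  # indices of the selected sentences
```

Seeding with the first k sentences keeps the example reproducible. A real system would more likely use random or k-means++ initialization and a tuned sentence-weighting scheme.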
