  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

基於意見探勘與主題模型之部落格食記剖析研究 / A Study of Opinion Mining and Topic Model Analysis on Food Diaries

賴柏帆, Lai, Po Fan. Unknown Date
隨著Web 2.0興起,社群網站在資訊傳遞與獲取所占比重相當高。以美食領域來看,人們在進餐廳前先行閱覽食記評論之情形越來越常見,而部落格文章因圖文並茂,常被消費者列入參考比較之來源。儘管這一類食記內容相對短篇食評來說較為完整,但評論分散於文章中,且多半沒有評分可供參考,讀者很難在第一時間獲悉評論樣貌,得花上一番心力進行閱覽,才能對餐廳整體有所評鑑。 本研究提出一套基於意見探勘與主題模型的食記剖析方法,由部落格中各餐廳貼文情緒量來反映正負面評價,將提及評論歸納為「食物」、「服務」及「環境」三個評分面向,進而提供該家餐廳的整體推薦分數,供讀者快速參閱之。實驗語料自痞客邦美食類貼文中選定添好運台灣-台北站前店、京星港式飲茶PART2、金泰日式料理-內湖店以及喀佈貍(一店)大眾和風串燒居酒洋食堂,合計4家餐廳與200篇語料。 透過LDA主題模型對食記敘述進行主題式分群,使擁有相近主題概念的句子分為一群,並歸類至各面向,例如喀佈貍(一店)之語料可分為10群主題語句,食物面向上有6群,服務與環境面向各為2群。另一方面,為了更有效辨別食記中含有的正負向情緒,本研究透過語意導向方法(SO-PMI)來計算食記中常出現情緒詞彙之極性,以建置該領域的意見詞詞庫。 實驗結果方面,以線上餐廳評論網站-iPeen愛評網作為驗證對象,顯示其語料的平均情緒量相近,於大眾觀感與評價上傾向一致,且相較一般評論網站,本研究能從較細微的面向來切入,並以情緒量反映真實的餐廳評價。最後提出未來欲探討與改善之處,供後續研究參考之。 / With the rise of Web 2.0, social media platforms play a crucial role in transmitting and receiving information. More and more people read related posts before dining at a restaurant. Blog posts are rich in content and include photographs, so they are among the sources consulted most often. Blog posts are more complete than short reviews, but their actual evaluations are scattered among purely descriptive passages, and there is usually no rating scale to refer to. As a result, readers find it hard to form a quick overview of a review and therefore to decide whether to visit the restaurant. Our study proposes a method for analyzing food diaries based on opinion mining and topic modeling. The amount of sentiment in blog posts about a restaurant indicates whether its reviews are positive or negative. Comments are categorized into three aspects (food, service, and environment). The restaurant is scored on each aspect, and these scores yield an overall recommendation score for the user. We collected a total of 200 articles about 4 restaurants from PIXNET and clustered their content by topic using the LDA (Latent Dirichlet Allocation) model. Sentences with similar topics are grouped together and then assigned to the three aspects mentioned above. To better determine whether the sentiment in a food diary is positive or negative, our study also calculated the polarity of common opinion words in food diaries using semantic orientation (SO-PMI) and built an opinion lexicon specifically for food diaries. For validation, we used iPeen, a restaurant rating website, as the reference. The average sentiment scores we obtained for the restaurants are close to those on iPeen, which suggests that they agree with public opinion and reviews. Compared with common rating websites, our study also examines finer-grained aspects and uses cumulative sentiment to reflect the blog authors' actual evaluation of each restaurant. Finally, we outline issues to explore and improve in future research.
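The SO-PMI step in the first abstract scores a candidate opinion word by how much more strongly it co-occurs with positive seed words than with negative ones. Below is a minimal Python sketch. It assumes that sentences are already segmented into sets of words. The seed lists, the smoothing constant `eps`, the choice of the sentence as the co-occurrence unit, and the toy English sentences are all illustrative assumptions, not the thesis's lexicon or corpus.

```python
import math
from typing import Iterable


def so_pmi(word: str, units: list[set[str]], pos_seeds: Iterable[str],
           neg_seeds: Iterable[str], eps: float = 0.01) -> float:
    """SO-PMI(word) = sum of PMI(word, p) over positive seeds
    minus sum of PMI(word, n) over negative seeds.

    Co-occurrence is counted per unit (here, one unit = one sentence's word set).
    `eps` is added to joint counts so that a zero co-occurrence stays finite.
    """
    n = len(units)

    def count(*terms: str) -> int:
        # Number of units that contain every one of `terms`.
        return sum(1 for u in units if all(t in u for t in terms))

    def pmi(a: str, b: str) -> float:
        ca, cb = count(a), count(b)
        if ca == 0 or cb == 0:
            return 0.0  # no evidence either way
        p_ab = (count(a, b) + eps) / n
        return math.log2(p_ab / ((ca / n) * (cb / n)))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, q) for q in neg_seeds))


# Toy, already-segmented sentences (English stand-ins for Chinese food-blog text).
sentences = [
    {"delicious", "good", "noodles"},
    {"delicious", "good", "service"},
    {"greasy", "bad", "noodles"},
    {"greasy", "bad"},
    {"good", "environment"},
    {"bad", "service"},
]
for w in ("delicious", "greasy", "noodles"):
    print(w, round(so_pmi(w, sentences, ["good"], ["bad"]), 2))
```

In this example, "delicious" gets a positive score and "greasy" a negative one. "noodles" appears equally often with both seeds, so it scores zero. Counting co-occurrence per sentence is only one option; the original SO-PMI work estimated these probabilities from search-engine hit counts.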
2

多重插補法在線上使用者評分之應用 / Managing online user-generated product reviews using multiple imputation methods

李岑志, Li, Cen Jhih. Unknown Date
隨著網路普及,人們越來越常在網路上購物並在線上評價商品,產生了非常大的口碑效應。不論對廠商或對消費者來說,線上商品評論都已經變得非常重要;消費者能藉由他人購買經驗判斷產品優劣,廠商能藉由消費者評價來提升產品品質,目前已有許多電子商務網站都有蒐集消費者購買產品後的意見回饋。 這些網站中有些提供消費者能對產品打一個總分並寫一段文字評論,然而每個消費者所評論的產品特徵通常各有不同,尤其是較晚購買的消費者更可能因為自己的意見已經有人提過而省略。將每個人提到的文字敘述量化為數字分數時,沒有寫到的特徵將會使量化後的資料存在許多遺漏值。 同時消費者也有可能提到一些不重要的特徵,若能找到消費者評論中,各個特徵影響消費者的多寡,廠商就能針對產品較重要的缺點改進。本研究將會著重探討消費者所提到的特徵對產品總分的影響,以及這些遺漏值填補後是否能接近消費者真實意見。 過去許多填補遺漏值的方法都是一次填補全部資料,並沒有考慮消費者會受到時間較早的評論影響。本研究設計一套多重插補的方法並透過模擬驗證,以之填補亞馬遜網站的Canon 系列 SX210、SX230、SX260等三個世代數位相機之消費者評論資料。研究結果指出此方法能夠準確估計各項特徵對產品總分的影響。 / Online user-generated product reviews have become a rich source of product-quality information for both producers and customers. As a result, many e-commerce websites let customers rate products with scores, and some also accept text comments. However, people usually comment only on the features they care about and may omit features that earlier customers have already mentioned. Consequently, when the comments are quantified as feature ratings, many values are missing. Customers may also comment on features that influence neither their satisfaction nor sales volume. It is therefore important to identify the significant features so that manufacturers can address the main defects. Our research focuses on modeling customer reviews and their influence on predicting overall ratings. We aim to determine whether, once the missing values are filled in, the critical features can be identified and the feature ratings authentically reflect customer opinion. Many previous studies impute the whole dataset at once, without considering that customer reviews might be influenced by earlier reviews. We propose a method based on multiple imputation and apply it to Amazon customer reviews of three generations of Canon digital cameras (SX210, SX230, and SX260). A simulation study indicates that the method accurately estimates the influence of each feature on the overall rating and identifies the critical features.
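Multiple imputation, the core of the second abstract, fills each missing value several times, analyzes each completed dataset, and pools the results with Rubin's rules. The sketch below is a generic illustration under simplifying assumptions:

- missing feature ratings are marked `None`;
- imputations are hot-deck draws made via the approximate Bayesian bootstrap;
- the quantity of interest is a mean rating.

It does not model the influence of earlier reviews, which is the distinctive part of the method the thesis proposes. The ratings are hypothetical.

```python
import random
import statistics


def abb_impute(values, rng):
    """One completed copy of `values`, filling None via the approximate Bayesian bootstrap."""
    observed = [v for v in values if v is not None]
    # Resample the donor pool first so that uncertainty about the
    # observed-data distribution carries into the imputations.
    donors = [rng.choice(observed) for _ in observed]
    return [v if v is not None else rng.choice(donors) for v in values]


def pooled_mean(values, m=20, seed=0):
    """Estimate a mean and its total variance from m imputations using Rubin's rules."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        filled = abb_impute(values, rng)
        estimates.append(statistics.fmean(filled))
        # Sampling variance of the mean within this completed dataset.
        within.append(statistics.variance(filled) / len(filled))
    q_bar = statistics.fmean(estimates)    # pooled point estimate
    w_bar = statistics.fmean(within)       # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b  # Rubin's total variance


# Hypothetical ratings of one camera feature; None = feature not mentioned.
ratings = [4, 5, 3, None, 5, None, 4]
estimate, total_variance = pooled_mean(ratings)
print(round(estimate, 2), round(total_variance, 3))
```

The between-imputation term `b` is what separates this approach from single imputation: it widens the pooled variance to reflect uncertainty about the values that were never written.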
3

網路評價搜尋結果的正負意見分類系統 / A sentiment classification system on search results of web opinions

黃泓彰, Huang, Hung Chang. Unknown Date
本研究嘗試建置一個包含兩個主要功能的系統,分別是網路評價搜尋以及情感分類。在網路評價搜尋的部份,我們使用Google搜尋並蒐集一攜帶型智慧裝置(智慧型手機、平板電腦與筆記型電腦)的網路評價搜尋結果;情感分類的部分則是將搜尋結果依照對該產品的意見分類為,共有正面/負面/中立、正面/負面、正面/非正面,以及負面/非負面等四種分類方式。為了建置此系統,我們首先從知名的網路論壇Mobile01和批踢踢蒐集和攜帶型智慧裝置有關的網路文章以及產品名稱,接著以人工的方式標記每篇文章,以及部分文章中的句子的情感。本研究設計了兩個層次的情感分類實驗,我們首先從語句層次出發,以監督式機器學習法訓練將句子分為正面/負面/中立等三個類別的分類模型後,再進入文章層次,將句子的意見彙整,並同樣以監督式機器學習法訓練四種不同文章層次的分類模型:正面/負面/中立、正面/負面、正面/非正面,以及負面/非負面。我們分別選出四種分類實驗中表現最佳的模型,並用於系統建置,其中表現最佳的是分類為正面/負面的分類模型,平均的F-measure為0.87;其次是分類為負面/非負面的模型,對負面類別的F-measure為0.83;接著是分類為正面/非正面的模型,對正面類別的F-measure為0.81;表現最差的是正面/負面/中立的分類,平均的F-measure為0.77。在正面/負面分類的準確率上,本研究的表現並不壞於過去以英文為主要語言的相關研究。最後,我們也以過去不經過語句層次的分類方法進行實驗並比較,其結果發現經過語句層次的情感分類比不經過語句層次的情感分類較佳。 / In this research, we implemented a system that retrieves Google search results about mobile phones, tablets, and notebooks and then classifies them under one of four schemes: (1) positive, negative, or neutral; (2) positive or negative; (3) positive or non-positive; or (4) negative or non-negative. To build this system, we first collected documents about mobile phones, tablets, and notebooks from two popular web forums, mobile01.com and ptt.cc. Next, we manually attached a sentiment label (positive, negative, or neutral) to each document and to the sentences of some of these documents. We designed a two-level supervised sentiment classification experiment. At the sentence level, we trained classifiers that label sentences as positive, negative, or neutral. The best sentence classifier was then used at the document level, where the sentiment labels of a document's sentences are aggregated. We trained document-level classifiers for four classification problems: (1) positive, negative, or neutral; (2) positive vs. negative; (3) positive vs. non-positive; and (4) negative vs. non-negative. The positive vs. negative classifier performed best, with an average F-measure of 0.87. It was followed by the negative vs. non-negative classifier (F-measure of 0.83 on the negative class) and the positive vs. non-positive classifier (F-measure of 0.81 on the positive class). The positive/negative/neutral classifier performed worst, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is not worse than that reported in previous English-language studies. Finally, we ran a comparison experiment using a document-level-only classification method. The two-level approach (sentence level first, then document level) outperformed document-level-only classification.
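The third abstract reports both per-class and averaged F-measures. The short sketch below shows how these figures are typically computed from gold and predicted labels. It assumes that "average F-measure" means the unweighted (macro) mean of the per-class F1 scores. The abstract does not say which averaging was used, and the labels below are toy data.

```python
from collections import Counter


def f_measures(gold, pred, labels):
    """Per-class F-measure (F1) plus their unweighted (macro) average."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the item was not p
            fn[g] += 1  # missed an item whose true label is g
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores, sum(scores.values()) / len(scores)


gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
per_class, macro = f_measures(gold, pred, ["pos", "neg"])
print(per_class, round(macro, 4))
```

A figure such as "F-measure of 0.83 on the negative class" corresponds to a single entry of `per_class`, whereas an "average" figure corresponds to `macro` under the assumption above.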
4

運用資料探勘分析社會輿情與廣告影響房地產行情短期波動行為之研究 / A Study of Applying Data Mining to Find the Influence of Public Opinion and Advertisement on the Sales of Real Estate in the Short Run

張修維, Chang, Hsiu Wei. Unknown Date
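網際網路時代資訊接收的便利性,使得大眾容易接收到媒體所發布的媒體資訊,而這些資料具含的意見詞彙間接反應出群眾對特定主題的情緒傾向。在針對房地產的媒體當中,當特定區域的房地產市場具有良好的發展空間而成為交易熱區時,這些針對特定區域且帶含情緒的房市篇章報導或其他影響房市之相關新聞以及廣告往往會影響我們的購屋決策。 本研究將以桃園市及台中市-兩個近五年來台灣房市較為熱門的區域作為研究區域進行分析及研究,期望找出在短期時間新聞輿情及廣告和房市交易價量的相關性以及會影響該房地產市場之因素。首先蒐集桃園市及台中市的實價登錄的房地產交易資料以及廣告後,運用文字探勘分析房市整體輿情與兩都市房地產價量之關聯性,再將新聞分群後找出特徵詞,個別建立時間序列來了解各種情緒及房地產價量的共同移動性,並結合廣告投入量找出房地產市場價量以及影響因素的領先關係。並透過自建的類神經網路模型建立針對桃園市和台中市的交易量預測模型以及針對特定房市熱門區域-青埔和七期的交易量預測模型,並透過計算輸入變數的權重總和來判別新聞情緒對於房地產成交價量的影響程度。 研究首先提供了對於新聞情緒的分類包含區域經濟情緒、區域社會情緒、區域環境情緒、區域政治情緒、稅制情緒、選舉情緒。接著進行時間序列分析指出總情緒序列與成交量的時間序列相關係數都有高於70%以上,桃園市成交量與桃園市情緒的相關係數為0.73,台中市成交量與台中市情緒的相關係數為0.81,皆呈現高度正相關,顯示桃園及台中的房市交易量與情緒現存在高度相關性。在特定新聞類別當中,透過兩個城市的相關係數比對顯示稅制新聞情緒,區域環境相關情緒,區域社會相關情緒,以上三個情緒跟房市的交易量共同移動較為明顯,相關係數皆在0.5左右甚至以上,可見這些類別的新聞能夠適時反映大眾對於特定區域的房地產的看好及看壞。在此階段也透過領先指標驗證了情緒以及廣告是會領先房市交易量,桃園以及台中兩個區域都有情緒領先交易量一個月的現象。針對特定區域的交易量研究包含青埔特區及七期重劃區,也發現到兩地的交易量高峰前一至兩個月都有一波廣告的高峰。 而在類神經網路模型方面的研究結果能夠良好地預測漲跌趨勢,利用桃園資料進行訓練並以台中資料做為測試的模型在19次的漲跌中預測出17次,而將百分之七十的桃園及台中混合資料進行訓練並其餘百分之三十做為測試的模型結果也成功在14次漲跌中預測出10次,顯示模型效果預測能力良好,並透過將輸入權重加總的方式來衡量各輸入變數的影響程度,研究結果指出總情緒,稅制情緒量,區域環境情緒量與兩地房地產市場交易量最有關聯且影響最重。最後利用時間序列得知廣告高峰會領先總交易高峰一至兩個月的特性,利用從2012年10月至2016年2月的青埔特區資料及2012年10月至2013年12月的七期重劃區資料混合進行訓練並以2014年1月至2016年2月七期重劃區資料做為測試資料的模型能夠有效在兩年內預測中三次交易高峰,顯示該模型能透過預測出下一期的廣告投入量做為中介變數進而推估出交易量高峰的時間透過此模型可在未來應用於相關政策投入市場後對市場交易量的影響,也能夠快速有效的得到預測結果,而在針對特定市場我們也可以透過預測廣告以及運用廣告為交易量的領先特性來了解在近期何時會有交易量高峰,如能配合了解市場輿情脈絡,可為房屋仲介以及建商在更精確的時間點投放廣告時機點達到廣告的最大效益。 / In the Internet era, information is easy to receive, so the public is readily exposed to media reports, and the opinion words in these reports indirectly reflect the public's sentiment toward specific topics. In real-estate media, a particular area's housing market may have good development prospects and become a transaction hot spot. In such cases, sentiment-laden reports on that area's housing market, other housing-related news, and advertisements often influence home-buying decisions. This study takes Taoyuan City and Taichung City as study areas, two of the most active housing markets in Taiwan over the past five years. It aims to identify short-run correlations between news sentiment, advertising, and housing transaction prices and volumes, and to find the factors that influence these markets. We first collected actual-price-registration transaction data and advertising data for Taoyuan and Taichung. We then used text mining to relate overall housing-market sentiment to prices and volumes in the two cities. Next, we clustered the news articles, identified feature words, and built a separate time series for each category. These series were used to examine the co-movement of each sentiment with prices and volumes, and advertising expenditure was added to identify lead-lag relationships between market prices, volumes, and influencing factors. Using a self-built neural network, we constructed transaction-volume forecasting models for Taoyuan and Taichung and for two specific hot spots, Qingpu (青埔) and the Seventh Redevelopment Zone (七期). We assessed the influence of news sentiment on transaction prices and volumes by summing the weights of the input variables. The study first proposes six categories of news sentiment: regional economic, regional social, regional environmental, regional political, tax-system, and election sentiment. Time-series analysis then shows that the correlation between the total sentiment series and transaction volume exceeds 0.7 in both cities: 0.73 for Taoyuan and 0.81 for Taichung. Both values indicate a strong positive correlation between housing transaction volume and sentiment. Among the specific news categories, comparing correlations across the two cities shows that tax-system, regional environmental, and regional social sentiment co-move most clearly with transaction volume, with correlation coefficients of about 0.5 or higher. This suggests that news in these categories reflects, in a timely way, whether the public is optimistic or pessimistic about housing in a specific area. Leading-indicator analysis also confirms that sentiment and advertising lead transaction volume; in both Taoyuan and Taichung, sentiment leads transaction volume by one month. In the Qingpu Special District and the Seventh Redevelopment Zone, an advertising peak precedes each transaction-volume peak by one to two months. The neural network models predict up-and-down trends well. A model trained on Taoyuan data and tested on Taichung data correctly predicted 17 of 19 movements. A model trained on 70% of the pooled Taoyuan and Taichung data and tested on the remaining 30% correctly predicted 10 of 14 movements. We measured each input variable's influence by summing its input weights. Total sentiment, tax-system sentiment, and regional environmental sentiment turned out to be most closely associated with, and most influential on, transaction volume in the two markets. Finally, we exploited the time-series finding that advertising peaks lead transaction peaks by one to two months. A model was trained on pooled data from the Qingpu Special District (October 2012 to February 2016) and the Seventh Redevelopment Zone (October 2012 to December 2013), then tested on Seventh Redevelopment Zone data from January 2014 to February 2016. It correctly predicted three transaction peaks within two years. This shows that the model can estimate the timing of transaction peaks by forecasting next-period advertising expenditure as a mediating variable. In the future, the model could be used to assess, quickly and effectively, how newly introduced policies affect market transaction volume. For a specific market, forecasting advertising and exploiting its lead over transaction volume can indicate when a near-term transaction peak is likely. Combined with an understanding of market sentiment, this could help real-estate agents and developers time their advertising more precisely to maximize its effect.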
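The fourth abstract's lead-lag finding, that sentiment leads transaction volume by about one month, is the kind of pattern that lagged Pearson correlations reveal. The idea is to shift the candidate leading series back by k months, correlate it with the other series, and see which lag gives the strongest correlation. The sketch below assumes equal-length monthly series; the numbers are made up. The abstract does not describe the thesis's exact leading-indicator procedure, so this is one plausible reading rather than the author's method.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlations(leader, follower, max_lag):
    """Map each lag k to corr(leader[t - k], follower[t]) for k = 0..max_lag."""
    result = {}
    for k in range(max_lag + 1):
        # Drop the last k leader values and the first k follower values
        # so that leader month t - k lines up with follower month t.
        result[k] = pearson(leader[: len(leader) - k], follower[k:])
    return result


# Hypothetical monthly series: volume repeats sentiment one month later.
sentiment = [1, 3, 2, 5, 4, 6, 3]
volume = [0, 1, 3, 2, 5, 4, 6]
lags = lagged_correlations(sentiment, volume, max_lag=2)
best_lag = max(lags, key=lags.get)
print(best_lag, {k: round(r, 3) for k, r in lags.items()})
```

With real data, a peak at k = 1 would correspond to the reported one-month lead. Note that shifting shortens the overlapping window, so correlations at large lags rest on fewer months.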
5

探索美國財務報表的主觀性詞彙與盈餘的關聯性:意見分析之應用 / Exploring the relationships between annual earnings and subjective expressions in US financial statements: opinion analysis applications

陳建良, Chen, Chien Liang. Unknown Date
財務報表中的主觀性詞彙往往影響市場中的參與者對於報導公司價值和獲利能力衡量的決策判斷。因此,公司的管理階層往往有高度的動機小心謹慎的選擇用詞以隱藏負面的消息而宣揚正面的消息。然而使用人工方式從文字量極大的財務報表挖掘有用的資訊往往不可行,因此本研究採用人工智慧方法驗證美國財務報表中的主觀性多字詞 (subjective MWEs) 和公司的財務狀況是否具有關聯性。多字詞模型往往比傳統的單字詞模型更能掌握句子中的語意情境,因此本研究應用條件隨機域模型 (conditional random field) 辨識多字詞形式的意見樣式。另外,本研究的實證結果發現一些跡象可以印證一般人對於財務報表的文字揭露往往與真實的財務數字存在有落差的印象;更發現在負向的盈餘變化情況下,公司管理階層通常輕描淡寫當下的短絀卻堅定地承諾璀璨的未來。 / Subjective assertions in financial statements influence the judgments of market participants when they assess the value and profitability of the reporting corporations. Corporate management may therefore attempt to conceal the negative and accentuate the positive through "prudent" wording. To uncover this accounting phenomenon hidden behind financial statements, we designed an artificial-intelligence-based strategy. It investigates the link between financial status, measured by annual earnings, and subjective multi-word expressions (MWEs). We applied conditional random field (CRF) models to identify opinion patterns in the form of MWEs, and our approach outperformed previous work employing unigram models. Our novel algorithms were also the first to uncover evidence supporting a common belief: the implications of the written statements are often inconsistent with the reality shown by the figures in the financial statements. Unexpected negative earnings are often accompanied by ambiguous, mild statements and sometimes by promises of a glorious future.
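The fifth abstract uses conditional random fields to find opinion-bearing multi-word expressions. One common way to frame this task is BIO sequence tagging, which is assumed here rather than taken from the thesis. Each token is labeled as beginning (B), inside (I), or outside (O) an expression, and a linear-chain CRF chooses the tag sequence with the highest total score. The sketch covers only the decoding step, the Viterbi algorithm. The weights, the cue lexicon `CUES`, and the example sentence are hypothetical, and learning the weights from labeled financial statements is omitted.

```python
TAGS = ("B", "I", "O")  # Begin / Inside / Outside an opinion MWE


def viterbi(tokens, emission, transition):
    """Highest-scoring BIO tag sequence under a linear-chain CRF.

    emission(token, tag) and transition[(prev, tag)] are log-space scores
    (weighted feature sums); a transition missing from the dict is forbidden.
    """
    if not tokens:
        return []
    neg_inf = float("-inf")
    # best[t][tag] = (score of best path ending in `tag` at t, previous tag)
    best = [{tag: (transition.get(("START", tag), neg_inf) + emission(tokens[0], tag), None)
             for tag in TAGS}]
    for t in range(1, len(tokens)):
        row = {}
        for tag in TAGS:
            prev, score = max(
                ((p, best[t - 1][p][0] + transition.get((p, tag), neg_inf)) for p in TAGS),
                key=lambda item: item[1],
            )
            row[tag] = (score + emission(tokens[t], tag), prev)
        best.append(row)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda tg: best[-1][tg][0])
    path = [tag]
    for t in range(len(tokens) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]


def spans(tokens, tags):
    """Join each B tag and the I tags that follow it into one multi-word expression."""
    found, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "I" and current:
            current.append(token)
            continue
        if current:
            found.append(" ".join(current))
        current = [token] if tag == "B" else []
    if current:
        found.append(" ".join(current))
    return found


# Hypothetical weights: every legal transition scores 0; START -> I and O -> I are forbidden.
TRANSITION = {(p, t): 0.0 for p in ("START", "B", "I", "O") for t in TAGS
              if not (t == "I" and p in ("START", "O"))}
CUES = {"remain": "B", "confident": "I", "strong": "B", "growth": "I"}


def emission(token, tag):
    # Toy feature: +1 if the tag matches the cue lexicon (default O).
    return 1.0 if tag == CUES.get(token, "O") else 0.0


tokens = ["we", "remain", "confident", "about", "strong", "growth"]
tags = viterbi(tokens, emission, TRANSITION)
print(tags, spans(tokens, tags))
```

Leaving O → I out of the transition table keeps decoded expressions well-formed. A trained CRF would instead learn such preferences as transition weights from the labeled data.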
