Global ETD Search

21	社群媒體新詞偵測系統以PTT八卦版為例 / Chinese new words detection from social media
王力弘 (Wang, Li Hung), Unknown Date
Online communities have become very active in recent years, and many users share and discuss current events on social media. The collective influence of these communities is moving from the virtual world into the real one, and the reach of social media is now comparable to that of mass media. The Gossiping board (八卦版) of the PTT BBS at National Taiwan University (台大) is one such bellwether: many news stories and events are first discussed there and then spread to mainstream media. We observed that users often coin new, slightly humorous words to discuss events and people, for example 割闌尾, 祭止兀, 婉君, and 貫老闆. The appearance of such words may signal that a new hot topic is emerging, yet conventional keyword search may fail to find the posts that contain them. Detecting new Chinese words among unknown words is a thorny problem, so this study proposes a sliding-window technique to assist Chinese word segmentation, find these new words, and use them to discover emerging topics on social media. We modified the well-known Jieba segmentation tool with this technique, added a new-word detection mechanism, and monitored the PTT Gossiping board over a long period. The system achieved a 96.94% correct rate, and cross-validation against news media and Google Trends showed that the detected new words are highly correlated with new topics.
Keywords: Chinese word segmentation; new word detection; social media data analysis
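The abstract for entry 21 names a sliding-window technique but does not spell out the algorithm. The following is only a minimal sketch of the general idea, under the assumption that candidate new words are character n-grams that recur across posts and are absent from a known dictionary. The function name `candidate_new_words` and the parameters `known_words`, `min_len`, `max_len`, and `min_count` are hypothetical, and the sketch does not use or modify Jieba.

```python
from collections import Counter

def candidate_new_words(posts, known_words, min_len=2, max_len=3, min_count=2):
    """Count character n-grams with a sliding window; keep frequent unknown ones.

    Illustrative only: this is not the thesis's algorithm.
    """
    counts = Counter()
    for post in posts:
        for width in range(min_len, max_len + 1):
            # Slide a window of `width` characters across the post.
            for start in range(len(post) - width + 1):
                counts[post[start:start + width]] += 1
    # A candidate must recur and must not already be in the dictionary.
    return {gram: n for gram, n in counts.items()
            if n >= min_count and gram not in known_words}
```

A real system would also need to suppress fragments of longer words and set the frequency threshold relative to corpus size; those choices are left open here.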
22	多重群集的偵測研究 / A study of methods for detecting multiple clusters
黃柏誠 (Huang, Bo Cheng), Unknown Date
Cluster detection, that is, testing whether some areas have a higher disease incidence rate, has become one of the main applications of spatial statistics in epidemiology. Common detection methods include SaTScan (Kulldorff, 1995) and the Spatial Scan Statistic (Li et al., 2011). Most of these methods detect clusters in a single pass by comparing the relative risk inside and outside each suspected cluster. This is computationally efficient and examines all suspected clusters at once, but a single-pass scan is affected by other high-incidence clusters outside the one being tested, so it is overly conservative for clusters with a smaller relative risk (Zhang et al., 2010). This study targets the detection of multiple clusters. It modifies SaTScan and similar methods with a sequential search that screens out significant high-incidence clusters one at a time, and it explores when the sequential approach is appropriate and where its limits lie. We evaluate the improvement through computer simulation and compare detection results on cancer mortality data from Taiwan. The sequential method does improve the power to detect clusters with a smaller relative risk, and it is especially effective for clusters with a relative risk not greater than 1.6. When the relative risk is greater than 1.6, the detection power of SaTScan is not affected by multiple clusters. The sequential method also suffers from identifying false clusters.
Keywords: cluster detection; spatial statistics; sequential method; computer simulation
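To make the sequential idea in entry 22 concrete, here is a simplified sketch, not the thesis's procedure and not the SaTScan software. It assumes each candidate zone is a single region, uses the standard Poisson log likelihood ratio for an excess-risk zone, and omits the Monte Carlo significance test. The names `poisson_llr` and `sequential_scan` are hypothetical, and `rounds` must be smaller than the number of regions so that an "outside" always exists.

```python
import math

def poisson_llr(cases, expected, total_cases):
    """Poisson log likelihood ratio for an excess-risk zone (0 if no excess)."""
    if cases <= expected:
        return 0.0
    rest = total_cases - cases
    inside = cases * math.log(cases / expected)
    outside = rest * math.log(rest / (total_cases - expected)) if rest > 0 else 0.0
    return inside + outside

def sequential_scan(regions, rounds=2):
    """regions: {name: (cases, population)} -> [(name, relative_risk, llr), ...]."""
    remaining = dict(regions)
    found = []
    for _ in range(rounds):
        total_cases = sum(c for c, _ in remaining.values())
        total_pop = sum(p for _, p in remaining.values())
        best_name, best_llr = None, -1.0
        for name, (cases, pop) in remaining.items():
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_name, best_llr = name, llr
        cases, pop = remaining[best_name]
        expected = total_cases * pop / total_pop
        # Relative risk: rate ratio inside the zone versus outside it.
        rr = (cases / expected) / ((total_cases - cases) / (total_cases - expected))
        found.append((best_name, rr, best_llr))
        # Remove the detected zone so it no longer masks weaker clusters.
        del remaining[best_name]
    return found
```

With `{"A": (30, 100), "B": (15, 100), "C": (10, 100), "D": (10, 100)}`, region B shows no excess in the first pass because A inflates the baseline, but it is picked up in the second pass once A is removed.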
23	基於大數據資料的非監督分散式分群演算法 / An Effective Distributed GHSOM Algorithm for Unsupervised Clustering on Big Data
邱垂暉 (Chiu, Chui Hui), Unknown Date
Clustering techniques that group samples by attribute similarity are widely used in fields such as pattern recognition, feature extraction, and malicious behavior characterization. Because of their importance, various clustering techniques have been reimplemented on distributed frameworks for scalable computation, for example K-means with Hadoop in Apache Mahout. K-means requires the number of clusters to be given in advance, and self-organizing maps (SOM) require the map size. The growing hierarchical self-organizing map (GHSOM) instead clusters samples dynamically to satisfy a tolerance on the variation between samples, which makes it an attractive unsupervised learning method for data with too little information to fix the number of clusters in advance. However, GHSOM is a sequential algorithm and does not scale, which limits its application to big data. In this thesis we present a novel distributed GHSOM algorithm. We implement it with the Scala actor model, distributing the vertical and horizontal expansion tasks of GHSOM to actors, and show a significant performance improvement. To evaluate the approach, we collect and analyze the real-world execution behaviors of thousands of malware samples and derive malware detection rules from unsupervised clustering on millions of samples, demonstrating the performance improvement, the effectiveness of the rules, and potential usage in practice.
Keywords: unsupervised clustering; GHSOM; actor model; malware detection; parallel computation
24	臉書相片分類及使用者樣貌分析 / Identifying User Profile Using Facebook Photos
張婷雅 (Chang, Ting Ya), Unknown Date
Apart from text messages, photo posting is a popular function of Facebook. The uploaded photos vary widely and include selfies, outdoor scenes, and food. In this thesis we use computer vision techniques to analyze image content and examine the relationship between the photos and the users who post them, so that information from photos can help characterize user profiles. We collected photos uploaded to Facebook by 32 participants. We then applied techniques such as face detection, scene understanding, and saliency map identification to gather information for automatic image tagging and classification. These two levels of information, tags and photo classes, serve as feature vectors for grouping users with hierarchical clustering, and the characteristics of each group are analyzed from the clustering results. The study gives an initial classification of users and addresses questions such as which types of photos are posted most often, how photo posting behavior differs by gender, and how users can be classified by image content, deepening our understanding of photo uploading activity on Facebook.
Keywords: Facebook; face detection; scene understanding; image tag; user behavior analysis
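25	以未經糾正之 DMC 航空影像自動產製崩塌地地理空間資料與資料庫建置 / Automated Generation of Landslide Geospatial Data from Unrectified Aerial DMC Imagery and Database Building
胡惠雅, Unknown Date
A complete landslide database supports land-use suitability assessment and the formulation of environmental protection measures. At present, landslide geospatial data are produced mainly by manually interpreting ortho-rectified remote sensing imagery and digitizing the identified targets on that imagery. However, ortho-rectification and manual interpretation are often too slow to meet urgent post-disaster needs. To make data collection more efficient, this study develops an automated workflow. It interprets unrectified remote sensing imagery with object-based classification, uses existing terrain data to georeference and filter the automatically detected results, and attaches attributes derived from overlay analysis with existing ancillary data to ease later use of the landslide geospatial data. Object-based classification has two steps: image segmentation and object classification. Segmentation uses the multiresolution segmentation algorithm. Because spectral differences among land-cover types are less distinct in shadow, shadowed areas are assigned a smaller scale parameter to avoid segmentation errors. In the classification step, a support vector machine (SVM) with a linear kernel, trained on training data, detects areas that are neither cloud nor vegetation and outputs them as vector data. The vector output is then georeferenced with a ray-tracing algorithm based on the existing terrain data, and user-defined terrain-feature criteria are applied as a second-stage filter. Experimental results show that the landslide geospatial data produced by this workflow have a producer's accuracy between 0.85 and 0.99 and a user's accuracy between 0.44 and 0.96.
Keywords: landslide detection; landslide database; object-based image classification; ray-tracing algorithm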
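Entry 25 reports results as producer's accuracy and user's accuracy. As a reminder of how these two measures are conventionally defined in remote sensing accuracy assessment, the sketch below computes them for a single class from confusion counts. The function name and the example counts are hypothetical and are not taken from the thesis.

```python
def producers_users_accuracy(true_pos, false_neg, false_pos):
    """Return (producer's accuracy, user's accuracy) for one class.

    Producer's accuracy: share of reference landslide area that was detected.
    User's accuracy: share of detected landslide area that is truly landslide.
    """
    producers = true_pos / (true_pos + false_neg)
    users = true_pos / (true_pos + false_pos)
    return producers, users
```

For example, 85 correctly detected units, 15 missed units, and 40 false detections give a producer's accuracy of 0.85 and a user's accuracy of 0.68, which shows how a detector can miss little while still over-detecting.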
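26	蛋白質質譜儀資料之峰偵測探討 / On Peak Detection of Proteomic Mass Spectrum Data
楊智凱 (Yang, Zhi Kai), Unknown Date
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) produces high-dimensional proteomic mass spectrum data and is used mainly to detect the expression of protein molecules. The technique mixes the protein with a matrix, uses a laser to break up the protein molecules and release charged ions, measures the ions' time of flight, and finally produces a mass spectrum. The spectrum has two important variables: mass-to-charge ratio (m/z) and intensity. We hope to use these data to study how the spectra of cancer patients differ from those of healthy people, and to find biomarkers in the spectra that can serve as a basis for identifying cancer patients. Because of technical limitations of MALDI, however, the scanned spectra often contain noise and error, which makes preprocessing especially important. Biologists hold that abnormally expressed proteins appear at the peaks of the spectrum, so focusing the analysis on the peaks of such high-dimensional data simplifies later work, and peak detection becomes a key preprocessing step. Du, Kibbe, and Lin (2006) reported that peak detection based on the continuous wavelet transform (CWT) reduces the errors produced by traditional peak detection methods. This thesis applies the CWT method and a traditional method to MALDI mass spectrum data whose true peak positions are known. At every signal-to-noise ratio (SNR) examined, the CWT method had higher sensitivity and a lower false discovery rate (FDR) than the traditional method. For the simulated data used here, CWT is therefore the better peak detection method.
Keywords: preprocessing; peak detection; continuous wavelet transform; signal-to-noise ratio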
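Entry 26 compares peak detectors by sensitivity and false discovery rate against known peak positions. The abstract does not say how detected peaks are matched to true ones, so the sketch below assumes a simple m/z tolerance. The function name `peak_detection_metrics` and the parameter `tol` are hypothetical.

```python
def peak_detection_metrics(detected, true_peaks, tol=1.0):
    """Return (sensitivity, FDR) for detected m/z positions versus known peaks.

    A detected peak counts as correct if it lies within `tol` of any true peak
    (an assumed matching rule, not one stated in the source).
    """
    # True peaks that have at least one detection nearby.
    matched_true = sum(
        any(abs(t - d) <= tol for d in detected) for t in true_peaks
    )
    # Detections that are not near any true peak.
    false_hits = sum(
        all(abs(d - t) > tol for t in true_peaks) for d in detected
    )
    sensitivity = matched_true / len(true_peaks)
    fdr = false_hits / len(detected) if detected else 0.0
    return sensitivity, fdr
```

With detections at 1000.2, 1500.0, 2100.5, and 3000.0 and true peaks at 1000.0, 2100.0, and 2500.0, two of three true peaks are found (sensitivity 2/3) and two of four detections are spurious (FDR 0.5).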
27	利用多元衛星影像監測格陵蘭Russell冰河之變動行為與消融機制分析 / A remote sensing monitoring of Greenland Russell glacier dynamics and analysis of melting mechanism
蔡亞倫 (Tsai, Ya Lun), Unknown Date
Global warming has grown more severe in recent years, and it is significantly increasing the melting rate of ice in polar regions such as Greenland. The resulting sea level rise poses a serious threat to people worldwide. Because mass change in the Greenland Ice Sheet (GrIS) is highly correlated with the velocity of glacier movement, this study monitors the impact of global warming by tracking glacier displacement on the GrIS with remote sensing. Satellite imagery can capture surface information over large areas quickly and continuously, and image processing techniques can derive surface deformation from it, so it is widely used for wide-area glacier monitoring. Different image types and techniques each have strengths and limits. We therefore propose a monitoring strategy that uses Synthetic Aperture Radar (SAR) and optical images together with Differential Interferometric SAR (D-InSAR), Multi-aperture Interferometric SAR (MAI), and Pixel-offset (PO) techniques to estimate glacier surface displacement vectors in different directions. The vectors are then merged with a 3D decomposition method to derive the three-dimensional surface deformation. Based on the resulting deformation, a Numerical Ice Sheet Model (ISM) is run and combined with a modeled subglacial drainage channel network, digitized surface crevasses and supraglacial lakes, meteorological observations, and glaciological theory to better understand the dynamics and melting mechanism of the Russell glacier in Greenland.
Keywords: Greenland ice sheet; Synthetic Aperture Radar; pixel offset; numerical ice sheet model
28	由職官年表中利用循序共現樣式探勘人脈網絡 / Social network analysis from official chronology using sequential co-occurrence pattern mining
宋邡熏 (Song, Fang Shiun), Unknown Date
In a political power structure, chief officials and cliques play important roles in the social network of political figures and strongly influence politics. This thesis proposes an approach to mining social networks from official chronologies in order to discover chief officials and cliques. We develop a data mining algorithm that discovers sequential co-occurrence patterns in official chronologies, capturing how the promotions and demotions of government officials co-occur. A social network of officials is then constructed from the discovered patterns. Chief officials are identified by network centrality analysis, and cliques are identified by community detection on the constructed network. Experiments use the official chronology of the Kangxi period of the Qing dynasty, and visual analysis shows that the proposed methods can assist historians in their research.
Keywords: social network mining; network centrality; community detection; historical document mining; official chronology
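Entry 28 identifies chief officials through network centrality on a co-occurrence network. The abstract does not say which centrality measure is used, so the sketch below assumes the simplest one, degree centrality, on an undirected graph built from co-occurring pairs of officials. The function name and the placeholder labels in the example are hypothetical.

```python
from collections import defaultdict

def degree_centrality(cooccurrences):
    """Map each official to degree / (n - 1) in the undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for a, b in cooccurrences:
        if a != b:
            # Sets collapse repeated pairs into a single edge.
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {name: 0.0 for name in neighbors}
    return {name: len(adj) / (n - 1) for name, adj in neighbors.items()}
```

For the pairs `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]`, official A is linked to every other official and scores 1.0, while D scores 1/3. A weighted or directed measure could be substituted if pattern support or the order of appointments matters.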
29	焦點檢定方法比較 / A simulation study for evaluating focused tests of cluster detection
蔡丞庭, Unknown Date
Cancer incidence and mortality rates in Taiwan have been increasing over the past 30 years. Previous studies indicate that environmental pollutants, especially from sources of air pollution and excess radiation, are one potential cause of the increase. Testing whether cancer clusters exist near possible pollution sources can help future cancer prevention. Spatial statistics offers many methods for detecting clusters. Those that test whether a cluster occurs around one specific location are called focused tests. This thesis introduces and evaluates frequently used focused tests and applies the better ones to suspected pollution sources in Taiwan. First, we use computer simulation to compare the power of the focused tests under different scenarios, such as the size of the study region and the shape of the cluster. Next, we apply the focused tests to township-level cancer mortality data from Taiwan to decide whether cancer mortality rates are higher around the Chinshan nuclear power plant, the Maanshan nuclear power plant, and the Mailiao sixth naphtha cracker. The results show that cancer mortality rates around the Chinshan nuclear power plant and the Mailiao sixth naphtha cracker are significantly higher.
Keywords: cluster detection; focused test; cancer mortality; power; computer simulation
30	多源遙測影像於海岸變遷之研究 / Coastal changes detection using multi-source remote sensing images
梁平 (Liang, Ping), Unknown Date
This study used multi-temporal aerial and satellite remote sensing images to detect shoreline changes along the Yilan coast. The images include old aerial photographs taken in 1947, Corona satellite images acquired in 1971, a photo base map produced in 1985, SPOT-5 satellite images obtained in 2003, and high-resolution aerial images taken in 2009 with a Z/I DMC (Digital Mapping Camera). Because the images were taken at different times with different sensors, different procedures were applied to process the data and georeference the images. GIS (Geographic Information Systems) software was used to digitize the shoreline and the beach and dune areas, and overlay analysis was applied to observe shoreline and beach changes across time periods. Natural and human-related data for the Yilan area, such as tides, precipitation, and sediment load, were then collected to analyze the causes of coastal change. Because manual digitization of the shoreline takes a great deal of time and labor, semi-automatic methods such as image classification and image segmentation were also tried for shoreline extraction and compared with the manual results. The results show that multi-temporal remote sensing images combined with the spatial analysis functions of GIS can effectively capture the historical changes of the shoreline and the beach and dune areas.
Keywords: multi-source remote sensing images; georeferencing; GIS; overlay analysis; change detection

Search results