371 |
基植於非負矩陣分解之華語流行音樂曲式分析 / Chinese popular music structure analysis based on non-negative matrix factorization 黃柏堯, Huang, Po Yao Unknown Date (has links)
近幾年來,華語流行音樂的發展越來越多元,而大眾所接收到的資訊是流行音樂當中的組成元素「曲與詞」,兩者分別具有賦予人類感知的功能,使人能夠深刻體會音樂作品當中所表達的內容與意境。然而,作曲與作詞都是屬於專業的創作藝術,作詞者通常在填詞時,會先對樂曲當中的結構進行粗略的分析,找出整首曲子的曲式,而針對可以填詞的部份,再進行更細部的分析將詞填入最適當的位置。流行音樂當中,曲與詞存在著密不可分的關係,瞭解歌曲結構不僅能降低填詞的門檻,亦能夠明白曲子的骨架與脈絡;在音樂教育與音樂檢索方面亦有幫助。
本研究的目標為:使用者輸入流行音樂歌曲,系統會自動分析出曲子的『曲式結構』。方法主要分成三個部分,分別為主旋律擷取、歌句分段與音樂曲式結構擷取。首先,我們利用Support Vector Machine以學習之方式建立模型後,擷取出符號音樂中之主旋律。第二步驟我們以「歌句」為單位,對主旋律進行分段,並對分段之結果建構Self-Similarity Matrix。最後再利用Non-Negative Matrix Factorization針對不同特徵值矩陣進行分解並建立第二層之Self-Similarity Matrix,以歧異度之方式找出曲式邊界。
我們針對分段方式對歌曲結構之影響進行分析與觀察。實驗數據顯示,事先將歌曲以歌句為單位分段之效果較未分段佳,而歌句分段之評測結果F-Score為0.82;將音樂中以不同特徵值建構之自相似度矩陣進行Non-Negative Matrix Factorization後,另一空間中之基底特徵更能有效地分辨出不同的歌曲結構,其F-Score為0.71。 / Music structure analysis is helpful for music information retrieval, music education, and the alignment between lyrics and music. This thesis investigates techniques of music structure analysis for Chinese popular music.
Our work analyzes music form automatically in three steps: main melody finding, sentence discovery, and music form discovery. First, we extract the main melody by learning from user-labeled samples with a support vector machine. Then, the boundaries of music sentences are detected by two-way classification, also using a support vector machine. To discover the music form, a sentence-based Self-Similarity Matrix is constructed for each music feature. Non-negative Matrix Factorization is employed to extract new features and to construct a second-level Self-Similarity Matrix. Checkerboard kernel correlation is utilized to find music form boundaries on the second-level Self-Similarity Matrix.
Experiments on eighty Chinese popular songs are performed to evaluate the proposed approaches. For main melody finding, our learning-based approach outperforms existing methods. The proposed approaches achieve an 82% F-score for sentence discovery and a 71% F-score for music form discovery.
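A minimal sketch (not the thesis implementation) of the two-level idea above: build a sentence-level self-similarity matrix (SSM) from feature vectors, factorize it with NMF, and form a second-level SSM from the factor activations. The sentence feature vectors here are random placeholders simulating two repeated section types.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# 12 "sentences", each a 20-dim feature vector; a verse/chorus
# alternation is simulated with two prototype vectors plus noise.
verse, chorus = rng.random(20), rng.random(20)
feats = np.array([verse, verse, chorus, chorus, verse, verse,
                  chorus, chorus, verse, verse, chorus, chorus])
feats = feats + 0.01 * rng.random(feats.shape)

# First-level SSM: pairwise cosine similarity between sentences.
norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
ssm1 = norm @ norm.T                      # shape (12, 12), non-negative

# Factorize the SSM; each row of W is a low-dimensional
# "section profile" for one sentence.
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(ssm1)

# Second-level SSM built on the NMF representation; same-section
# sentence pairs should score higher than cross-section pairs.
wn = W / np.linalg.norm(W, axis=1, keepdims=True)
ssm2 = wn @ wn.T
print(ssm2.shape)
```

In the thesis the boundaries are then found on this second-level matrix (e.g. via checkerboard kernel correlation), which is omitted here.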
|
372 |
Técnicas de Sistemas Automáticos de Soporte Vectorial en la Réplica del Rating Crediticio Campos Espinoza, Ricardo Álex 10 July 2012 (has links)
La correcta qualificació de risc de crèdit d'un emissor és un factor crític en l'economia actual. Aquest és un punt d'acord entre professionals i acadèmics. Actualment, des dels mitjans de comunicació s'han difós sovint notícies d'impacte provocades per agències de ràting. És per aquest motiu que el treball d'anàlisi realitzat per experts financers aporta importants recursos a les empreses de consultoria d'inversió i agències qualificadores. Avui en dia, hi ha molts avenços metodològics i tècnics que permeten donar suport a la tasca que fan els professionals de la qualificació de la qualitat de crèdit dels emissors. Tanmateix encara queden molts buits per completar i àrees a desenvolupar per tal que aquesta tasca sigui tan precisa com cal.
D'altra banda, els sistemes d'aprenentatge automàtic basats en funcions nucli, particularment les Support Vector Machines (SVM), han donat bons resultats en problemes de classificació quan les dades no són linealment separables o quan hi ha patrons amb soroll. A més, al usar estructures basades en funcions nucli és possible tractar qualsevol espai de dades, ampliant les possibilitats per trobar relacions entre els patrons, tasca que no resulta fàcil amb tècniques estadístiques convencionals.
L’objectiu d'aquesta tesi és examinar les aportacions que s'han fet en la rèplica de ràting, i alhora, examinar diferents alternatives que permetin millorar l'acompliment de la rèplica amb SVM. Per a això, primer s'ha revisat la literatura financera amb la idea d'obtenir una visió general i panoràmica dels models usats per al mesurament del risc de crèdit. S'han revisat les aproximacions de mesurament de risc de crèdit individuals, utilitzades principalment per a la concessió de crèdits bancaris i per l'avaluació individual d'inversions en títols de renda fixa. També s'han revisat models de carteres d'actius, tant aquells proposats des del món acadèmic com els patrocinats per institucions financeres. A més, s'han revisat les aportacions dutes a terme per avaluar el risc de crèdit usant tècniques estadístiques i sistemes d'aprenentatge automàtic. S'ha fet especial èmfasi en aquest últim conjunt de mètodes d'aprenentatge i en el conjunt de metodologies usades per realitzar adequadament la rèplica de ràting. Per millorar l'acompliment de la rèplica, s'ha triat una tècnica de discretització de les variables sota la suposició que, per emetre l'opinió tècnica del ràting de les companyies, els experts financers en forma intuïtiva avaluen les característiques de les empreses en termes intervalars.
En aquesta tesi, per fer la rèplica de ràting, s'ha fet servir una mostra de dades de companyies de països desenvolupats. S'han usat diferents tipus de SVM per replicar i s'ha exposat la bondat dels resultats d'aquesta rèplica, comparant-la amb altres dues tècniques estadístiques àmpliament usades en la literatura financera. S'ha concentrat l'atenció de la mesura de la bondat de l'ajust dels models en les taxes d'encert i en la forma en què es distribueixen els errors.
D'acord amb els resultats obtinguts es pot sostenir que l'acompliment de les SVM és millor que el de les tècniques estadístiques usades en aquesta tesi, i després de la discretització de les dades d'entrada s'ha mostrat que no es perd informació rellevant en aquest procés. Això contribueix a la idea que els experts financers instintivament realitzen un procés similar de discretització de la informació financera per lliurar la seva opinió creditícia de les companyies qualificades. / La correcta calificación de riesgo crediticio de un emisor es un factor crítico en nuestra actual economía. Profesionales y académicos están de acuerdo en esto, y los medios de comunicación han difundido mediáticamente eventos de impacto provocados por agencias de rating. Por ello, el trabajo de análisis del deudor realizado por expertos financieros conlleva importantes recursos en las empresas de consultoría de inversión y agencias calificadoras. Hoy en día, muchos avances metodológicos y técnicos permiten el apoyo a la labor que hacen los profesionales en la calificación de la calidad crediticia de los emisores. No obstante, aún quedan muchos vacíos por completar y áreas que desarrollar para que esta tarea sea todo lo precisa que necesita.
Por otra parte, los sistemas de aprendizaje automático basados en funciones núcleo, particularmente las Support Vector Machines (SVM), han dado buenos resultados en problemas de clasificación cuando los datos no son linealmente separables o cuando hay patrones ruidosos. Además, al usar estructuras basadas en funciones núcleo resulta posible tratar cualquier espacio de datos, expandiendo las posibilidades para encontrar relaciones entre los patrones, tarea que no resulta fácil con técnicas estadísticas convencionales.
El propósito de esta tesis es examinar los aportes que se han hecho en la réplica de rating, y a la vez, examinar diferentes alternativas que permitan mejorar el desempeño de la réplica con SVM. Para ello, primero se ha revisado la literatura financiera con la idea de obtener una visión general y panorámica de los modelos usados para la medición del riesgo crediticio. Se han revisado las aproximaciones de medición de riesgo crediticio individuales, utilizadas principalmente para la concesión de créditos bancarios y para la evaluación individual de inversiones en títulos de renta fija. También se han revisado modelos de carteras de activos, tanto aquellos propuestos desde el mundo académico como los patrocinados por instituciones financieras.
Además, se han revisado los aportes llevados a cabo para evaluar el riesgo crediticio usando técnicas estadísticas y sistemas de aprendizaje automático. Se ha hecho especial énfasis en este último conjunto de métodos de aprendizaje y en el conjunto de metodologías usadas para realizar adecuadamente la réplica de rating. Para mejorar el desempeño de la réplica, se ha elegido una técnica de discretización de las variables bajo la suposición de que, para emitir la opinión técnica del rating de las compañías, los expertos financieros en forma intuitiva evalúan las características de las empresas en términos intervalares.
En esta tesis, para realizar la réplica de rating, se ha usado una muestra de datos de compañías de países desarrollados. Se han usado diferentes tipos de SVM para replicar y se ha expuesto la bondad de los resultados de dicha réplica, comparándola con otras dos técnicas estadísticas ampliamente usadas en la literatura financiera. Se ha concentrado la atención de la medición de la bondad del ajuste de los modelos en las tasas de acierto y en la forma en que se distribuyen los errores.
De acuerdo con los resultados obtenidos se puede sostener que el desempeño de las SVM es mejor que el de las técnicas estadísticas usadas en esta tesis; y luego de la discretización de los datos de entrada se ha mostrado que no se pierde información relevante en dicho proceso. Esto contribuye a la idea de que los expertos financieros instintivamente realizan un proceso similar de discretización de la información financiera para entregar su opinión crediticia de las compañías calificadas. / Proper credit rating of an issuer is a critical factor in our current economy. Professionals and academics agree on this, and the media have widely reported impactful events caused by rating agencies. Therefore, the debtor analysis performed by financial experts consumes significant resources at investment consulting firms and rating agencies. Nowadays, many methodological and technical advances exist to support the professionals who rate the credit quality of issuers. However, there are still many gaps to fill and areas to develop for this task to be as precise as needed.
Moreover, machine learning systems based on kernel functions, particularly Support Vector Machines (SVM), have been successful in classification problems when the data are not linearly separable or when the patterns are noisy. In addition, by using structures based on kernel functions it is possible to treat any data space, expanding the possibilities of finding relationships between patterns, a task that is not easy with conventional statistical techniques.
The purpose of this thesis is to examine the contributions made in rating replication and to look at different alternatives for improving the performance of replication with SVM. To do this, we first reviewed the financial literature to obtain an overview of the models used to measure credit risk. We reviewed individual credit risk measurement approaches, used principally for bank lending and for the individual assessment of investments in fixed income securities. Models based on portfolios of assets have also been reviewed, both those proposed by academia and those sponsored by financial institutions. In addition, we have reviewed the contributions made to assessing credit risk using statistical techniques and machine learning systems. Particular emphasis has been placed on the latter set of learning methods and on the methodologies used to replicate ratings adequately. To improve replication performance, a discretization technique was chosen for the variables, under the assumption that, when issuing a technical rating opinion on companies, financial experts intuitively evaluate company characteristics in interval terms.
In this thesis, for rating replication, we used a data sample of companies from developed countries. Different types of SVM were used for replication, and the goodness of the results is discussed in comparison with two other statistical techniques widely used in the financial literature. Special attention has been given to measuring the goodness of fit of the models in terms of success rates and how the errors are distributed.
According to the results, it can be argued that the performance of SVM is better than that of the statistical techniques used in this thesis. In addition, it has been shown that no relevant information is lost in the discretization of the input data. This supports the idea that financial experts instinctively perform a similar discretization of financial information when delivering their credit opinion of the rated companies.
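The replication comparison can be sketched as follows, under loud assumptions: synthetic data stand in for company financial ratios and rating classes, and quantile binning stands in for the thesis's discretization technique; none of this is the thesis's actual data or code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8 "financial ratios", 3 "rating classes".
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Discretize each ratio into quantile intervals, mimicking the
# interval-based judgment assumed of financial experts.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
Xd_tr = disc.fit_transform(X_tr)
Xd_te = disc.transform(X_te)

# SVM versus a classical statistical baseline on the same inputs.
svm_acc = SVC(kernel="rbf").fit(Xd_tr, y_tr).score(Xd_te, y_te)
log_acc = LogisticRegression(max_iter=1000).fit(Xd_tr, y_tr).score(Xd_te, y_te)
print(round(svm_acc, 3), round(log_acc, 3))
```

In the thesis the comparison also weighs how the misclassification errors are distributed across rating notches, not just the hit rate.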
|
373 |
Frequency Analysis of Droughts Using Stochastic and Soft Computing Techniques Sadri, Sara January 2010 (has links)
In the Canadian Prairies, recurring droughts are a reality that can have significant economic, environmental, and social impacts. For example, the droughts of 1997 and 2001 cost over $100 million across different sectors. Drought frequency analysis is a technique for analyzing how frequently a drought event of a given magnitude may be expected to occur. In this study the state of the science related to frequency analysis of droughts is reviewed and studied. The main contributions of this thesis include the development of a Matlab model that uses Fuzzy C-Means (FCM) clustering and corrects the formed regions to meet the criteria of effective hydrological regions. In FCM, each site has a degree of membership in each of the clusters. The developed algorithm takes the number of regions and the return period as inputs and shows the final corrected clusters as output for most case scenarios. Since drought is a bivariate phenomenon, with the two statistical variables of duration and severity to be analyzed simultaneously, an important step in this study is extending the initial Matlab model to correct regions based on L-comoment statistics (as opposed to L-moments). Implementing a reasonably straightforward approach to bivariate drought frequency analysis using bivariate L-comoments and copulas is another contribution of this study. Quantile estimation at ungauged sites for return periods of interest is studied by introducing two classes of neural network and machine learning methods: Radial Basis Function (RBF) networks and Support Vector Machine Regression (SVM-R). These two techniques are selected for their good reviews in the literature on function estimation and nonparametric regression. The functionalities of RBF and SVM-R are compared with the traditional nonlinear regression (NLR) method. A nonlinear regression with regionalization method, in which catchments are first regionalized using FCM, is also applied, and its results are compared with the other three models. Drought data from 36 natural catchments in the Canadian Prairies are used in this study. This study provides a methodology for bivariate drought frequency analysis that can be practiced in any part of the world.
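The SVM-R versus NLR comparison above can be sketched on synthetic data; the exponential relation, the noise level, and the single predictor are assumptions for illustration, not the thesis's catchment data.

```python
import numpy as np
from sklearn.svm import SVR
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
# Synthetic "catchment attribute" -> "drought quantile" relation.
x = rng.uniform(0, 3, 120)
y = np.exp(0.8 * x) + rng.normal(0, 0.3, 120)

# Traditional nonlinear regression (NLR): fit y = a * exp(b * x).
(a, b), _ = curve_fit(lambda t, a, b: a * np.exp(b * t), x, y, p0=(1.0, 1.0))
nlr_pred = a * np.exp(b * x)

# Support vector machine regression with an RBF kernel (SVM-R).
svr = SVR(kernel="rbf", C=10.0).fit(x.reshape(-1, 1), y)
svr_pred = svr.predict(x.reshape(-1, 1))

nlr_rmse = float(np.sqrt(np.mean((y - nlr_pred) ** 2)))
svr_rmse = float(np.sqrt(np.mean((y - svr_pred) ** 2)))
print(round(nlr_rmse, 3), round(svr_rmse, 3))
```

Unlike NLR, SVM-R needs no assumed functional form, which is the property that makes it attractive for ungauged-site quantile estimation.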
|
374 |
Applications of Soft Computing for Power-Quality Detection and Electric Machinery Fault Diagnosis Wu, Chien-Hsien 20 November 2008 (has links)
With the deregulation of the power industry and market competition, stable and reliable power supply is a major concern of the independent system operator (ISO). Power-quality (PQ) study has become an increasingly important subject. Harmonics, voltage swells, voltage sags, and power interruptions can downgrade service quality. In recent years, high speed railway (HSR) and mass rapid transit (MRT) systems have developed rapidly, with widespread applications of semiconductor technologies in auto-traction systems. The harmonic distortion level worsens with these increased uses of electronic equipment and non-linear loads. To ensure PQ, the detection of power-quality disturbances (PQD) becomes important. A detection method with classification capability is helpful for identifying disturbance locations and types.
Electric machinery fault diagnosis is another issue receiving considerable attention from utilities and customers. ISOs need to provide high quality service to retain their customers. Fault diagnosis of turbine-generators has a great effect on the profitability of power plants. A generator fault not only damages the generator itself, but also causes outages and loss of profits. Under high temperature, high pressure, and factors such as thermal fatigue, many components may fail, which not only leads to great economic loss but can sometimes threaten public safety. Therefore, it is necessary to detect generator faults and take immediate action to cut the loss. Besides, induction motors play a major role in a power system. To save cost, it is important to run periodic inspections to detect incipient faults inside the motor. Preventive techniques for early detection can find incipient faults and avoid outages. This dissertation developed various soft computing (SC) algorithms for detection, including power-quality disturbance (PQD) detection, turbine-generator fault diagnosis, and induction motor fault diagnosis. The proposed SC algorithms include the support vector machine (SVM), grey clustering analysis (GCA), and the probabilistic neural network (PNN). Integrating the proposed diagnostic procedures with existing monitoring instruments, a well-monitored power system can be constructed without extra devices. All the methods in the dissertation give reasonable and practical estimates. Compared with conventional methods, the test results showed high accuracy, good robustness, and faster processing performance.
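A toy sketch of SVM-based PQD classification (not the dissertation's data or feature set): synthetic normal/sag/swell voltage waveforms, simple windowed-RMS features, and an RBF-kernel SVM. The sampling rate, disturbance depths, and feature windows are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
fs, f0 = 1000, 50                      # assumed sampling rate / mains freq
t = np.arange(0, 0.2, 1 / fs)          # 200-sample, 0.2 s window

def waveform(kind):
    """kind 0 = normal, 1 = voltage sag, 2 = voltage swell."""
    v = np.sin(2 * np.pi * f0 * t) + 0.02 * rng.normal(size=t.size)
    if kind == 1:
        v[80:140] *= 0.5               # amplitude drop mid-window
    elif kind == 2:
        v[80:140] *= 1.5               # amplitude rise mid-window
    return v

X, y = [], []
for kind in (0, 1, 2):
    for _ in range(40):
        v = waveform(kind)
        # Features: RMS of three consecutive sub-windows.
        X.append([float(np.sqrt(np.mean(v[i:i + 66] ** 2)))
                  for i in (0, 67, 134)])
        y.append(kind)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)
print(round(acc, 3))
```

Real PQD detectors typically use richer features (e.g. wavelet coefficients), but the classify-from-features structure is the same.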
|
375 |
New support vector machine formulations and algorithms with application to biomedical data analysis Guan, Wei 13 June 2011 (has links)
The Support Vector Machine (SVM) classifier seeks the separating hyperplane w·x = r that maximizes the margin (equivalently, minimizes ‖w‖₂²). It can be formalized as an optimization problem that minimizes the hinge loss Σ_i (1 − y_i f(x_i))₊ plus the L₂-norm of the weight vector. SVM is now a mainstay method of machine learning. The goal of this dissertation is to solve different biomedical data analysis problems efficiently using extensions of SVM, in which we augment the standard SVM formulation according to the application requirements. The biomedical applications explored in this thesis include: cancer diagnosis, biomarker discovery, and energy function learning for protein structure prediction.
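As a numeric illustration of the formulation (toy values, not from the dissertation): for a fixed linear scorer f(x) = w·x − r, the hinge loss Σ_i (1 − y_i f(x_i))₊ and the margin width 2/‖w‖₂ between the planes w·x − r = ±1 can be computed directly.

```python
import numpy as np

# Toy scorer and data, chosen only to make the arithmetic visible.
w, r = np.array([3.0, 4.0]), 0.5
X = np.array([[1.0, 1.0], [-0.2, 0.1], [0.5, -0.5]])
y = np.array([1.0, -1.0, 1.0])

f = X @ w - r                              # f(x_i) = w.x_i - r
hinge = np.maximum(0.0, 1.0 - y * f).sum() # Σ_i (1 - y_i f(x_i))_+
margin = 2.0 / np.linalg.norm(w)           # distance between w.x - r = ±1
print(hinge, margin)                       # 2.3, 0.4
```

The correctly classified point well inside the margin contributes zero loss; the point on the wrong side contributes more than 1.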
Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We investigate a sample set consisting of 44 women diagnosed with serous papillary ovarian cancer and 50 healthy women or women with benign conditions. We profile the relative metabolite levels in the patient sera using a high throughput ambient ionization mass spectrometry technique, Direct Analysis in Real Time (DART). We then reduce the diagnostic classification on these metabolic profiles to a functional classification problem and solve it with the functional Support Vector Machine (fSVM) method. The assay distinguished between the cancer and control groups with an unprecedented 99% accuracy (100% sensitivity, 98% specificity) under leave-one-out cross-validation. This approach has significant clinical potential as a cancer diagnostic tool.
High throughput technologies provide simultaneous evaluation of thousands of potential biomarkers to distinguish different patient groups. In order to assist biomarker discovery from such low-sample-size, high-dimensional cancer data, we first explore a convex relaxation of the L₀-SVM problem and solve it using mixed-integer programming techniques. We further propose a more efficient L₀-SVM approximation, the fractional norm SVM, by replacing the L₂-penalty with an L_q-penalty (q ∈ (0,1)) in the optimization formulation. We solve it through the Difference of Convex functions (DC) programming technique. Empirical studies on synthetic data sets as well as real-world biomedical data sets support the effectiveness of our proposed L₀-SVM approximation methods over other commonly-used sparse SVM methods such as the L₁-SVM method.
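The L₁-SVM baseline mentioned above can be sketched with scikit-learn's LinearSVC (an illustration only; the dissertation's L₀-approximation methods are not implemented here). The L1 penalty drives many weights exactly to zero, selecting a small set of candidate biomarkers from high-dimensional, low-sample-size data.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# Synthetic stand-in for biomarker data: 60 samples, 500 features,
# only 10 of which carry class information.
X, y = make_classification(n_samples=60, n_features=500,
                           n_informative=10, random_state=0)

# L1-penalized linear SVM; penalty="l1" requires dual=False and
# the squared hinge loss in scikit-learn.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                C=0.5, max_iter=5000).fit(X, y)
n_selected = int(np.sum(clf.coef_ != 0))
print(n_selected, "of", X.shape[1], "features kept")
```

Smaller C strengthens the sparsity pressure and keeps fewer features; the thesis's L_q (q < 1) penalties push this sparsification further.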
A critical open problem in ab initio protein folding is protein energy function design. We reduce the problem of learning an energy function for ab initio folding to a standard machine learning problem, learning-to-rank. Based on the application requirements, we constrain the reduced ranking problem with non-negative weights and develop two efficient algorithms for non-negativity constrained SVM optimization. We conduct an empirical study on an energy data set for random conformations of 171 proteins that fall into the ab initio folding class. We compare our approach with the optimization approach used in the protein structure prediction tool TASSER. Numerical results indicate that our approach learns energy functions with improved rank statistics (evaluated by pairwise agreement) as well as improved correlation between the total energy and structural dissimilarity.
|
376 |
重疊法應用於蛋白質質譜儀資料 / Overlap Technique on Protein Mass Spectrometry Data 徐竣建, Hsu, Chun-Chien Unknown Date (has links)
癌症至今已連續蟬聯並高居國人十大死因之首,由於癌症初期病患接受適時治療的存活率較高,因此若能「早期發現,早期診斷,早期治療」則可降低死亡率。本文所引用的資料庫,是經由「表面強化雷射解吸電離飛行質譜技術」(SELDI-TOF-MS)所擷取建置的蛋白質質譜儀資料,包括兩筆高維度資料:一筆為攝護腺癌症,另一筆則為頭頸癌症。然而蛋白質質譜儀資料常因維度變數繁雜眾多,對於資料的存取容量及運算時間而言,往往造成相當沉重的負擔與不便;有鑑於此,本文之目的即在探討將高維度資料經由維度縮減後,找出分錯率最小化之分析方法,希冀提高癌症病例資料分類的準確性。
本研究分為實驗組及對照組兩部分,實驗組是以主成份分析(Principal Component Analysis,PCA)進行維度縮減,再利用支持向量機(Support Vector Machine,SVM)予以分類,最後藉由重疊法(Overlap)以期改善分類效果;對照組則是以支持向量機直接進行分類。分析結果顯示,重疊法對於攝護腺癌症具有顯著的改善效果,但對於頭頸癌症的改善效果卻不明顯。此外,本研究也探討關於蛋白質質譜儀資料之質量範圍,藉以確認專家學者所建議的質量範圍是否與分析結果相互一致。在攝護腺癌症中的原始資料,專家學者所建議的質量範圍以外,似乎仍隱藏著重要的相關資訊;在頭頸癌症中的原始資料,專家學者所建議的質量範圍以外,對於研究分析而言則並沒有實質上的幫助。 / Cancer has been the leading cause of death in Taiwan for the past 24 years. Early detection of the disease would significantly reduce the mortality rate. The database adopted in this study comes from protein mass spectrometry data sets acquired by the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique, including a Prostate Cancer and a Head/Neck Cancer data set. However, because of its high dimensionality, analyzing the raw data is not easy. Therefore, the purpose of this thesis is to find a feasible method that reduces dimensionality and minimizes classification errors at the same time.
The data sets are separated into an experimental and a controlled group. In the experimental group, dimension reduction is first performed by Principal Component Analysis (PCA), followed by a Support Vector Machine (SVM) for classification; finally, the Overlap Method is used to reduce classification errors. For comparison, the controlled group uses SVM alone for classification. The empirical results indicate that the improvement from the Overlap Method is significant in the Prostate Cancer case, but not in the Head/Neck case. We also study the mass range suggested by expert opinion. We find that important information is hidden outside the suggested range in the Prostate Cancer case, but not in the Head/Neck case.
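The experimental-group pipeline above (PCA for dimension reduction, then SVM; the Overlap step omitted) can be sketched as follows, with scikit-learn's built-in breast cancer data standing in for the SELDI-TOF-MS sets.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Stand-in high-dimensional binary cancer data (30 features here;
# real mass-spectrometry profiles have thousands).
X, y = load_breast_cancer(return_X_y=True)

# Scale, reduce to 10 principal components, classify with RBF-SVM.
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=10),
                     SVC(kernel="rbf"))
acc = cross_val_score(pipe, X, y, cv=5).mean()
print(round(acc, 3))
```

Fitting the PCA inside the pipeline ensures the projection is learned only from each fold's training data, avoiding leakage into the cross-validation score.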
|
378 |
應用探勘技術於社會輿情以預測捷運週邊房地產市場之研究 / A Study of Applying Public Opinion Mining to Predict the Housing Market Near the Taipei MRT Stations 吳佳芸, Wu, Chia Yun Unknown Date (has links)
因網際網路帶來的便利性與即時性,網路新聞成為社會大眾吸收與傳遞新聞資訊的重要管道之一,而累積的巨量新聞亦可反映出社會輿論對某特定新聞議題之即時反應、熱門程度以及情緒走向等。因此,本研究期望藉由意見探勘與情緒分析技術,從特定領域新聞中挖掘出有價值的關聯,並結合傳統機器學習建立一個房地產市場的預測模式,提供購屋決策的參考依據。
本研究搜集99年1月1日至103年6月30日共11,150筆房地產新聞,以及8,165件捷運週邊250公尺內房屋買賣交易資料,運用意見探勘萃取意見詞彙進行情緒分析,並建立房市情緒與成交價量時間序列,透過半年移動平均、二次移動平均及成長斜率,瞭解社會輿情對房市行情抱持樂觀或悲觀,分析社會情緒與實際房地產成交間關聯性,以期能找出房地產買賣時機點,並進一步結合情緒及房地產的環境影響因素,藉由支援向量機建立站點房市的預測模型。
實證結果中,本研究發現房市情緒與成交價量之波動有一定的週期與相關性,且新捷運開通前一年將連帶影響整體捷運房市波動,當成交線穿越情緒線且斜率同時向上時,可做為適當的房市進場時機點。而本研究針對站點情緒與環境變數所建立之預測模型,其預測新捷運線站點之平均準確率為69.2%,而預測新捷運線熱門站點之準確率為78%,顯示模型於預測熱門站點上具有不錯的預測能力。 / Nowadays, e-news has become an important way for people to get daily information. This enormous amount of news can reflect public attention to, and sentiment trends in, particular news topics. Therefore, how to use opinion mining and sentiment analysis technology to dig out valuable information from domain-specific news has become an important research issue.
In this study, we collected 11,150 housing news articles and 8,165 housing transaction records around MRT stations within 250 meters, covering January 2010 to June 2014. We extracted the emotion words from the news by opinion mining. Furthermore, we built moving average lines and computed the slope of the moving averages in order to explore the relationship, and the entry points, between public opinion and the housing market.
In conclusion, we find a high correlation between news sentiment and the housing market. We also use an SVM algorithm to construct a model to predict housing hotspots. The results show that the SVM model reaches an average accuracy of 69.2%, rising to 78% for predicting housing hotspots. Besides, we provide investors with a basis for timing entry into the housing market by utilizing moving average crossovers and slope analysis, together with a better way of predicting housing hotspots.
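The entry-point rule described above (the transaction line crossing above the sentiment line while both slopes are positive) can be sketched with synthetic series; the values below are invented for illustration.

```python
import numpy as np

# Synthetic smoothed series (e.g. half-year moving averages).
sentiment = np.array([1.0, 1.1, 1.2, 1.3, 1.35, 1.4])
deals     = np.array([0.8, 0.9, 1.1, 1.35, 1.5, 1.7])

signals = []
for t in range(1, len(deals)):
    # Crossover: deals was at/below sentiment, now strictly above.
    crossed_up = deals[t - 1] <= sentiment[t - 1] and deals[t] > sentiment[t]
    # Both slopes positive at the crossover step.
    rising = deals[t] > deals[t - 1] and sentiment[t] > sentiment[t - 1]
    if crossed_up and rising:
        signals.append(t)
print(signals)  # → [3]
```

On real data the same test would run on the moving-average series rather than raw values, which suppresses spurious single-period crossovers.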
|
379 |
對使用者評論之情感分析研究-以Google Play市集為例 / Research into App user opinions with Sentimental Analysis on the Google Play market 林育龍, Lin, Yu Long Unknown Date (has links)
全球智慧型手機的出貨量持續提升,且熱門市集的App下載次數紛紛突破500億次。而在iOS和Android手機App市集中,App的評價和評論對App在市集的排序有很大的影響;對於App開發者而言,透過評論確實可掌握使用者的需求,並在產生抱怨前能快速反應避免危機。然而,每日多達上百篇的評論,透過人力逐篇查看,不止耗費時間,更無法整合性的瞭解使用者的需求與問題。
文字情感分析通常會使用監督式或非監督式的方法分析文字評論,其中監督式方法被證實透過簡單的文件量化方法就可達到很高的正確率。但監督式方法有無法預期未知趨勢的限制,且需要進行耗費人力的文章類別標注工作。
本研究透過情感傾向和熱門關注議題兩個面向來分析App評論,提出一個混合非監督式與監督式的中文情感分析方法。我們先透過非監督式方法標注評論類別,並作視覺化整理呈現,最後再用監督式方法建立分類模型,並驗證其效果。
在實驗結果中,利用中文詞彙網路所建立的情感詞集,確實可用來判斷評論的正反情緒,唯判斷負面評論效果不佳需作改善。在議題擷取方面,嘗試使用兩種不同分群方法,其中使用NPMI衡量字詞間關係強度,再配合社群網路分析的Concor方法結果有不錯的成效。最後在使用監督式學習的分類結果中,情感傾向的分類正確率達到87%,關注議題的分類正確率達到96%,皆有不錯表現。
本研究利用中文詞彙網路與社會網路分析,來發展一個非監督式的中文類別判斷方法,並建立一個中文情感分析的範例。另外透過建立全面性的視覺化報告來瞭解使用者的正反回饋意見,並可透過分類模型來掌握新評論的內容,以提供App開發者在市場上之競爭智慧。 / While the number of smartphone shipment is continuesly growing, the number of App downloads from the popular app markets has been already over 50 billion. By Apple App Store and Google Play, ratings and reviews play a more important role in influencing app difusion. While app developers can realize users’ needs by app reviews, more than thousands of reviews produced by user everday become difficult to be read and collated.
Sentiment analysis research encompasses supervised and unsupervised methods for analyzing review text. Supervised learning has proven useful and can reach high accuracy, but it has limits: future trends cannot be recognized, and the labels of the individual classes must be assigned manually.
We concentrate on two issues, namely Sentiment Orientation and Popular Topic, and propose a Chinese sentiment analysis method that combines supervised and unsupervised learning. First, we use unsupervised learning to label the review articles and produce visualized reports. Second, we employ supervised learning to build a classification model and verify the result.
In the experiment, the Chinese WordNet is used to build a sentiment lexicon to determine each review's sentiment orientation, though the results show it is weak at identifying negative opinions. In the topic extraction phase, we apply two clustering methods to extract Popular Topic classes; using NPMI to measure the strength of association between words, combined with the social network analysis method Concor, gives good results. In the supervised learning phase, the accuracy for the Sentiment Orientation class is 87% and for the Popular Topic class 96%.
In this research, we develop an unsupervised method for determining review classes by means of the Chinese WordNet and social network analysis, and establish an exemplar of Chinese sentiment analysis. We also build a comprehensive visualized report to understand users' positive and negative feedback, and use the classification model to handle new comments. Last but not least, the Chinese sentiment analysis of this research can provide App developers with competitive intelligence in the App market.
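A minimal sketch of the supervised stage (not the thesis pipeline, which works on Chinese text with a WordNet-based lexicon): reviews are vectorized with TF-IDF and classified with a linear SVM. The toy corpus, English tokens, and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny labeled corpus standing in for orientation-labeled app reviews.
reviews = ["great app love it", "crashes all the time", "very useful love",
           "terrible update crashes", "love the design", "ads terrible"]
labels  = ["pos", "neg", "pos", "neg", "pos", "neg"]

# TF-IDF features + linear SVM classifier in one pipeline.
model = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(reviews, labels)
pred = model.predict(["love this useful app", "terrible crashes"])
print(list(pred))
```

For Chinese reviews a word segmenter would replace the default whitespace tokenizer, but the vectorize-then-classify structure is unchanged.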
|
380 |
以基因演算法優化最小二乘支持向量機於坐標轉換之研究 / Coordinate Transformation Using Genetic Algorithm Based Least Square Support Vector Machine 黃鈞義 Unknown Date (has links)
由於採用的大地基準不同,目前,台灣地區有兩種坐標系統存在:TWD67(Taiwan Datum 1967)和TWD97(Taiwan Datum 1997)。在應用上,必須進行不同大地基準間之坐標轉換。坐標轉換方面,有許多方法可供選擇,如六參數轉換、支持向量機(Support Vector Machine, SVM)轉換等。
最小二乘支持向量機(Least Square Support Vector Machine, LSSVM),為SVM的一種演算法,是一種非線性模型。LSSVM在運用上所需之參數少,能夠解決小樣本、非線性、高維度和局部極小點等問題。目前,LSSVM,已經被成功運用在影像分類和統計迴歸等領域上。
本研究將利用LSSVM採用不同之核函數:線性核函數(LIN)、多項式核函數(POLY)及徑向基核函數(RBF)進行TWD97和TWD67之坐標轉換。研究中並使用基因演算法來調整LSSVM的RBF核函數之系統參數(後略稱RBF+GA),找出較佳之系統參數組合以進行坐標轉換。模擬與實測之地籍資料,將被用以測試LSSVM及六參數坐標轉換方法的轉換精度。
研究結果顯示,RBF+GA在各實驗區之轉換精度優於參數優化前RBF之轉換精度,且RBF+GA之轉換精度也較六參數轉換之轉換精度高。
進行參數優化後,RBF+GA相對於RBF的精度提升率如下:(1)模擬實驗區:參考點與檢核點數量比分別為1:1、2:1、3:1、1:2及1:3時,精度提升率分別為15.2%、21.9%、33.2%、12.0%、11.7%;(2)真實實驗區:花蓮縣、台中市及台北市實驗區之精度提升率分別為20.1%、32.4%、22.5%。 / There are two coordinate systems with different geodetic datums in the Taiwan region, i.e., TWD67 (Taiwan Datum 1967) and TWD97 (Taiwan Datum 1997). In order to maintain the consistency of cadastral coordinates, it is necessary to transform from one coordinate system to another. There are many coordinate transformation methods, such as the 2-dimensional 6-parameter transformation and the support vector machine (SVM). The Least Square Support Vector Machine (LSSVM) is one type of SVM algorithm, and it is a non-linear model. LSSVM needs only a few parameters to solve non-linear, high-dimensional problems, and it has been successfully applied to the fields of image classification and statistical regression. The goal of this paper is to apply LSSVM with different kernel functions (POLY, LIN, RBF) to cadastral coordinate transformation between TWD67 and TWD97.
A genetic algorithm is used to find an appropriate set of system parameters for LSSVM with the RBF kernel to transform the cadastral coordinates. Simulated and real data sets are used to test the performance and coordinate transformation accuracy of LSSVM with different kernel functions and of the 6-parameter transformation.
According to the test results, it is found that after optimizing the RBF parameters with the genetic algorithm (RBF+GA), the transformation accuracies are better than those of RBF alone, and even better than those of the 6-parameter transformation.
Compared with the transformation accuracies using RBF, the accuracy improvement rates of RBF+GA are: (1) simulated data sets: when the ratio of reference points to check points is 1:1, 2:1, 3:1, 1:2 and 1:3, the improvement rates are 15.2%, 21.9%, 33.2%, 12.0% and 11.7%, respectively; (2) real data sets: the improvement rates for the Hualien, Taichung and Taipei data sets are 20.1%, 32.4% and 22.5%, respectively.
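LSSVM regression with an RBF kernel is closely related to kernel ridge regression, so the tuning idea can be sketched with scikit-learn's KernelRidge, with a grid search standing in for the genetic algorithm. The reference points and the datum-shift-plus-distortion model are synthetic assumptions, not the thesis's cadastral data.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
# Synthetic "TWD67-like" reference points (E, N in meters).
src = rng.uniform(0, 1000, (80, 2))
# Toy target easting: constant shift plus a mild nonlinear distortion.
dst_e = src[:, 0] + 828.0 + 1e-4 * src[:, 1] ** 1.5

# Grid search over (alpha, gamma) replaces the GA parameter search.
grid = GridSearchCV(KernelRidge(kernel="rbf"),
                    {"alpha": [1e-3, 1e-1], "gamma": [1e-6, 1e-4]},
                    cv=5)
grid.fit(src, dst_e)

# Residuals at the reference points (check points would be held out
# in a real evaluation, as in the thesis's reference/check splits).
resid = dst_e - grid.predict(src)
rmse = float(np.sqrt(np.mean(resid ** 2)))
print(round(rmse, 4))
```

A GA explores the same (regularization, kernel-width) space but with crossover/mutation instead of an exhaustive grid, which scales better when the parameter space is large.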
|