  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

應用資料採礦技術於信用卡使用行為及市場需求 / Applications of Data Mining Techniques to the Behavior of Using Credit Cards and Market Demand

游涵茵 Unknown Date (has links)
With the trends of financial liberalization and internationalization, rising national income, and the spread of electronic commerce, Taiwan's credit card market has flourished, and major domestic banks have entered the card-issuing business aggressively. The intensity of competition shows in the add-on services issuers offer consumers — sign-up gifts, waived annual fees, zero liability for lost cards, shopping discounts, and so on — which have become standard features of virtually every card. With the outbreak of the card-debt and "card slave" crisis, the banks' old credit card marketing strategies have failed, yet the economic benefits behind the credit card market cannot be ignored. Raising market share is no longer the marketing priority, since high share does not necessarily bring high returns. Instead, issuers should segment the credit card market, identify consumer groups with distinct characteristics, and build marketing strategies around the factors each group weighs when choosing a card, thereby strengthening their competitive position.

Using SPSS Clementine 12.0, this study builds four models — logistic regression, C5.0, CHAID, and a neural network — and compares them through a classification matrix. C5.0 outperforms the other three models on overall prediction accuracy, recall, and precision, and is therefore chosen as the final model.

The C5.0 model identifies seven variables with a substantial influence on whether a respondent uses a credit card: whether the respondent travels abroad, whether the respondent is financially self-supporting, gender, whether the respondent looks for a job after graduation, whether the respondent shops online, agreement with environmental awareness, and whether the respondent invests or buys insurance. The study closes with recommendations to card-issuing banks based on these seven variables. Keywords: Credit Card, Data Mining, C5.0, CHAID, Neural Net
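The model-comparison step this abstract describes — scoring each candidate model by overall accuracy, recall, and precision derived from a classification matrix — can be sketched as follows. The labels, predictions, and model names are hypothetical, not the thesis data, and since C5.0 is proprietary a generic "tree" stands in for it.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Overall accuracy, recall, and precision from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Hypothetical hold-out labels (1 = uses a credit card) and the
# predictions of two competing models.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]   # e.g. a C5.0-style tree
model_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]   # e.g. a logistic regression

scores = {name: evaluate(y_true, pred)
          for name, pred in [("tree", model_a), ("logit", model_b)]}
best = max(scores, key=lambda m: scores[m]["accuracy"])
```

With these invented predictions the tree scores 0.8 on all three metrics and the logistic model 0.6, so the tree is kept — the same selection logic the thesis applies to its four models.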
2

上呼吸道疾病的藥物交互作用 / Drug Interactions in Upper Respiratory Tract Diseases

連婉君 Unknown Date (has links)
Drug interactions can, at the mild end, cause side effects or reduce therapeutic effect and, at the severe end, lead to death or life-threatening danger, so the issue deserves wide attention. Taiwanese physicians tend to prescribe more drug items per visit than their counterparts abroad, which raises the probability of drug interactions accordingly. If data can expose the severity of this problem and countermeasures can be proposed, it would greatly help raise the quality of domestic medical care.

The data come from the Bureau of National Health Insurance database, restricted to respiratory tract diseases, which have the highest annual outpatient visit rate. Using statistical analysis and data mining, the study extracts useful figures on drug interactions from this large dataset and turns them into information everyone can understand.

The statistical analysis examines drug interactions in upper respiratory tract disease from several angles: patient attributes such as gender and age; visit behaviour such as month of visit; and prescribing behaviour such as days supplied and number of drug items. The data mining analysis uses a C5.0 decision tree to identify high-risk groups. Finally, for the drug combinations most likely to produce dangerous interactions, the study explains their effects and mechanisms, and summarizes the list of drugs with a tendency to induce interactions so that extra caution can be taken when these medications are administered.
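The screening idea behind this kind of study — checking each visit's prescription list against a reference table of known interacting pairs — might look like the sketch below. The drug names and the interaction table are illustrative only, not the study's actual reference data.

```python
from itertools import combinations

# Hypothetical reference table of interacting drug pairs (order-free).
INTERACTIONS = {
    frozenset({"erythromycin", "terfenadine"}),
    frozenset({"theophylline", "ciprofloxacin"}),
}

def interacting_pairs(drugs):
    """Return the known interacting pairs present in one prescription."""
    return [tuple(sorted(pair))
            for pair in map(frozenset, combinations(set(drugs), 2))
            if pair in INTERACTIONS]

# Hypothetical outpatient visits, each with its prescribed drug list.
visits = [
    {"id": 1, "drugs": ["erythromycin", "terfenadine", "acetaminophen"]},
    {"id": 2, "drugs": ["amoxicillin", "acetaminophen"]},
    {"id": 3, "drugs": ["theophylline", "ciprofloxacin"]},
]
flagged = [v["id"] for v in visits if interacting_pairs(v["drugs"])]
```

Visits 1 and 3 are flagged; aggregating such flags by patient attributes (gender, age, number of items prescribed) gives the kind of figures the abstract describes, before a classifier such as C5.0 is used to profile the high-risk groups.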
3

Automatizace generování stopslov / Automated Stopword Generation

Krupník, Jiří January 2014 (has links)
This diploma thesis focuses on the automation of stopword generation as one method of pre-processing textual documents. It analyses the influence of stopword removal on the results of data mining tasks (classification and clustering). First, text mining techniques and frequently used algorithms are described. Methods of creating domain-specific stopword lists are then described in detail. Finally, the implemented methods and the results of testing on large collections of text files are presented and discussed.
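One common approach to the automated, domain-specific stopword generation the thesis describes is document-frequency thresholding: terms that appear in nearly every document of the collection carry little discriminating power and can be treated as stopwords. A minimal sketch, with an illustrative corpus and a threshold chosen purely for demonstration:

```python
from collections import Counter

def generate_stopwords(documents, df_threshold=0.8):
    """Return terms whose document frequency exceeds df_threshold."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))   # count each term once per document
    n = len(documents)
    return {term for term, count in df.items() if count / n > df_threshold}

# Tiny illustrative corpus.
docs = [
    "the model classifies the documents",
    "the clustering of the corpus",
    "the terms in the index",
    "a ranking of the terms",
]
stopwords = generate_stopwords(docs, df_threshold=0.8)
```

Here only "the" exceeds the 80% document-frequency threshold. On a real collection the candidate list would be inspected (or validated against classification/clustering quality, as the thesis does) before being adopted.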
4

應用資料採礦技術於保險公司附加保單之增售 / Applications of Data Mining Techniques to Increasing Sales of Add-on Insurance Policies

李家旭 Unknown Date (has links)
This study applies data mining techniques — decision trees, neural networks, and association rules — to a life insurance company's customer database to model the behaviour of policyholders who buy add-on (rider) policies, with the aim of raising the proportion of customers who do so. The data were provided by a life insurer in Taiwan: 1,500,943 raw records, reduced to 92,581 records after cleaning, followed by descriptive statistical analysis and the mining techniques above. The results are as follows:

1. Main policies fall into three types — death insurance, endowment insurance, and health insurance — and holders of different policy types show different add-on purchasing habits. Death insurance buyers purchase mainly for protection; endowment insurance buyers purchase mainly for savings. Health insurance buyers are a special group: health cover used to be sold only as an add-on, but as insurers have followed the market trend and also sell it as a main policy, holders of health main policies tend not to buy add-on policies.

2. For new customers whose main policy is death insurance or endowment insurance, a classification and regression tree (CART) model predicts whether the customer is willing to buy an add-on policy, opening a window of opportunity for the insurer to promote its product lines to these customers.

3. Using the 8 rules produced by the association rule analysis, the insurer can re-promote its insurance products to existing customers.
5

Técnicas de extracción de características y clasificación de imágenes orientada a objetos aplicadas a la actualización de bases de datos de ocupación del suelo / Feature Extraction and Object-Based Image Classification Techniques Applied to Updating Land-Cover Databases

Recio Recio, Jorge Abel 08 January 2010 (has links)
The general objective of this thesis is the development of methodologies for updating cartographic land-cover databases based on Earth observation and geographic data. The update is approached by integrating and analysing vector cartographic information, high-resolution aerial images, the alphanumeric information contained in the database, and auxiliary information. The data are integrated through feature extraction and object-based image classification. First, the cartography supplies the spatial boundaries that delimit the objects of study. Second, each sub-parcel's use is assigned by analysing a set of features, such as those extracted from a high-resolution image or those defined by its shape, its previous use, and so on. Classes are assigned with the boosting multi-classifier over a set of decision trees built with the C5.0 algorithm from a set of training samples. Finally, the classification of the sub-parcels is compared with the class stored in the database so that discrepancies between the two sources are detected; these are reviewed by a photo-interpreter to determine whether a real change or a classification error has occurred. / Recio Recio, JA. (2009). Técnicas de extracción de características y clasificación de imágenes orientada a objetos aplicadas a la actualización de bases de datos de ocupación del suelo [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6848 / Palancia
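The boosting-over-decision-trees step can be illustrated with a toy AdaBoost over one-dimensional decision stumps. The thesis boosts C5.0 trees over real parcel features, so this is only a sketch of the mechanism, with made-up data and a deliberately tiny weak learner:

```python
import math

def stump_predict(threshold, sign, x):
    """A decision stump on one feature: sign if x > threshold, else -sign."""
    return sign if x > threshold else -sign

def adaboost(X, y, thresholds, rounds=3):
    """AdaBoost: reweight samples each round, keep (alpha, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for t in thresholds:
            for sign in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(t, sign, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = max(err, 1e-10)                       # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)     # stump weight
        ensemble.append((alpha, t, sign))
        # Up-weight misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, sign, xi))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(t, s, x) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy training data: one feature, labels in {-1, +1}.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, thresholds=[1.5, 2.5, 3.5, 4.5, 5.5])
preds = [predict(model, x) for x in X]
```

The weighted vote of the boosted stumps recovers the training labels; with C5.0 trees as the weak learners the same reweight-and-vote loop yields the multi-classifier the abstract describes.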
6

A Machine Learning Framework for the Classification of Natura 2000 Habitat Types at Large Spatial Scales Using MODIS Surface Reflectance Data

Sittaro, Fabian, Hutengs, Christopher, Semella, Sebastian, Vohland, Michael 02 June 2023 (has links)
Anthropogenic climate and land use change is causing rapid shifts in the distribution and composition of habitats, with profound impacts on ecosystem biodiversity. The sustainable management of ecosystems requires monitoring programmes capable of detecting shifts in habitat distribution and composition at large spatial scales. Remote sensing observations facilitate such efforts, as they enable cost-efficient modelling approaches that utilize publicly available datasets and can assess the status of habitats over extended periods of time. In this study, we introduce a modelling framework for habitat monitoring in Germany using readily available MODIS surface reflectance data. We developed supervised classification models that allocate (semi-)natural areas to one of 18 classes based on their similarity to Natura 2000 habitat types. Three machine learning classifiers, i.e., Support Vector Machines (SVM), Random Forests (RF), and C5.0, and an ensemble approach were employed to predict habitat type using spectral signatures from MODIS in the visible-to-near-infrared and short-wave infrared. The models were trained on homogeneous Special Areas of Conservation that are predominantly covered by a single habitat type, with reference data from 2013, 2014, and 2016, and tested against ground truth data from 2010 and 2019 for independent model validation. Individually, the SVM and RF methods achieved better overall classification accuracies (SVM: 0.72–0.93, RF: 0.72–0.94) than the C5.0 algorithm (0.66–0.93), while the ensemble classifier developed from the individual models gave the best performance, with overall accuracies of 94.23% for 2010 and 80.34% for 2019, and also allowed a robust detection of non-classifiable pixels. We detected strong variability in the cover of individual habitat types, which was reduced when classes were aggregated based on their similarity. Our methodology is capable of providing quantitative information on the spatial distribution of habitats, differentiating between disturbance events and gradual shifts in ecosystem composition, and could successfully allocate natural areas to Natura 2000 habitat types.
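The ensemble step — combining the per-pixel predictions of the three classifiers and marking pixels where no class wins a majority as non-classifiable — might be sketched like this; the habitat labels and predictions are invented, not MODIS output:

```python
from collections import Counter

def ensemble_vote(predictions, min_votes=2):
    """Per-pixel majority vote across classifiers.

    predictions: a list of equally long per-classifier label lists.
    Pixels where no label reaches min_votes come back as None
    (non-classifiable).
    """
    result = []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        result.append(label if count >= min_votes else None)
    return result

# Hypothetical per-pixel predictions from the three classifiers.
svm_pred = ["heath", "bog",          "beech_forest"]
rf_pred  = ["heath", "beech_forest", "beech_forest"]
c50_pred = ["bog",   "grassland",    "beech_forest"]

result = ensemble_vote([svm_pred, rf_pred, c50_pred])
```

Pixel 1 gets "heath" (two of three votes), pixel 3 gets a unanimous "beech_forest", and pixel 2 — where all three classifiers disagree — is flagged non-classifiable, which is how the ensemble gains its robustness over any single model.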
7

以資料採礦方法探討國內數位落差之現象 / Effect of Digital Divide in Taiwan: Data Mining Applications

林建宇, Lin,chien yu Unknown Date (has links)
With the arrival of the globalized, information-based society, computers and the Internet have become indispensable parts of daily life. Although by 2008 nearly seventy percent of Taiwan's population enjoyed the conveniences of network technology, a digital divide remains: the information-poor have difficulty obtaining information, which in turn affects their economic situation, human rights, and other aspects of life. This study therefore applies data mining, using SPSS Clementine 12.0, to examine the digital divide and to identify the factors behind it.

Demographic and lifestyle variables were used to build C5.0 decision tree, C&RT, and CHAID classification tree models. Across the three models, seven variables — age, level of education, geographic region, personal assets, main source of income: children, monthly personal disposable income, and source of income: salary — have the greatest influence on whether a person becomes one of the information-rich within the digital divide. The study closes with policy recommendations based on these seven variables for the reference of the relevant agencies.
8

應用商業智慧於汽車再購行為 / Applying Business Intelligence to Automobile Repurchase Behavior

林秀玲 Unknown Date (has links)
After Taiwan's formal accession to the WTO, the automobile industry faces severe competition, and compared with other countries Taiwan's car market is close to saturation. With supply exceeding demand, manufacturing and assembly add little value; the added value lies in the services a brand provides with its products. By building a business intelligence system and using its data mining and prediction functions, customers with different consumption habits can be segmented for accurate target marketing.

This study builds models with the C5.0 decision tree, the CART regression tree, and logistic regression on three databases: new-car satisfaction, owner purchase satisfaction, and owner maintenance satisfaction. From these it identifies, among existing customers, the characteristics of those more likely to repurchase the same brand, so that marketing strategies can be integrated to raise the chance of repurchase. By database, the objectives are:

1. Analyse the new-car satisfaction data, covering owners who bought new cars in 2005, to build a customer segmentation model and a repurchase model.

2. Analyse the owner purchase satisfaction data to build models of whether a customer will repurchase and whether the customer will recommend the brand to family and friends.

3. Analyse the owner maintenance satisfaction data to build models of whether a customer will repurchase, will recommend the brand to family and friends, and will return to the dealership for maintenance.
9

Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models

Al Tobi, Amjad Mohamed January 2018 (has links)
Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this thesis studied the adaptability of three well-known machine learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis. This thesis has demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation was undertaken to analyse the effects of feature selection and data balancing on a model's accuracy when evaluation traffic with different significant features was used. The effects of threshold adaptation on reducing the accuracy degradation of these models were statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates. This thesis then extended the analysis to apply threshold adaptation on sampled traffic subsets, using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold.
The Random Forest algorithm only needed a sample that was 0.05% of the original evaluation traffic to identify a discriminating threshold with an overall accuracy rate of nearly 90% of the optimal threshold.
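The thesis's central idea — replacing the default 0.5 cut-off with a discriminating threshold tuned on a (possibly tiny) sample of the evaluation traffic — can be sketched as follows. The scores and labels are illustrative, not the STA2018 data:

```python
def best_threshold(scores, labels, candidates=None):
    """Scan candidate cut-offs; return the (threshold, accuracy) pair
    that maximises accuracy on the sampled evaluation traffic."""
    if candidates is None:
        candidates = sorted(set(scores))
    best = (0.5, -1.0)
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best

# Hypothetical model scores on a sample of evaluation traffic
# (1 = attack). The score distribution has drifted downward, so the
# usual 0.5 cut-off would label everything benign.
scores = [0.10, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40]
labels = [0,    0,    0,    0,    1,    1,    1,    1]
threshold, acc = best_threshold(scores, labels)
```

Here the adapted threshold of 0.25 separates the drifted scores perfectly, while the static 0.5 cut-off would achieve only 50% accuracy — the degradation effect the thesis measures, mitigated here with only a small labelled sample.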
10

資料採礦於資訊流通業(B2B)之應用研究—以個案公司為例 / A Study of Data Mining Applications in the B2B Information Technology Distribution Industry: A Case Study

陳炳輝, Chen, Ping-Hui Unknown Date (has links)
Data mining means "the automatic extraction, by computer, of important and potentially useful patterns or knowledge from large amounts of data or large databases". Data mining techniques are now widely applied in many fields; this study uses them to mine the associations between customers and products from a large body of customer transaction data and to apply that knowledge to future sales activities.

In the distribution industry, data mining is mostly applied in B2C settings. This study instead applies it to B2B transaction analysis, using the actual sales records between a case company and its customers as the data source. Clementine serves as the mining tool, and its modules are applied to the case company's transactions according to the purpose of each analysis:

* The association web module is used to find the strong and weak relationships among product sales, picking out product combinations with high sales association, and the C5.0 decision tree algorithm is used to profile the customers behind those transactions.

* The Apriori algorithm is used to find association rules across all products for the different customer types — BZ (business district), DL (dealer), and SP (retail store) — over different data periods.

* The Apriori algorithm is applied to the first half-year of data to find purchase rules for the product categories IFAKMB (motherboards), IFDDLC (LCD monitors), and IFCOCP (CPUs), which are then validated against the second half-year to assess their practicality.

The mining results are then interpreted against the case company's actual situation and, more importantly, examined for their applicability to sales practice, such as product sales rules and the drafting of marketing strategies and promotional tactics. Finally, based on the results and experience of this study, suggestions are made for strengthening the data in the company's information management system and for directions in which the data mining work can be extended.
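In the pairwise case, the Apriori-style rule mining described above reduces to computing support and confidence over order "baskets". A sketch follows, in which the product codes mirror the abstract's categories but the transactions themselves are invented:

```python
from itertools import permutations

# Invented B2B order baskets; codes follow the abstract's categories
# (IFAKMB = motherboards, IFDDLC = LCD monitors, IFCOCP = CPUs).
transactions = [
    {"IFAKMB", "IFCOCP"},
    {"IFAKMB", "IFCOCP", "IFDDLC"},
    {"IFAKMB", "IFCOCP"},
    {"IFDDLC"},
    {"IFAKMB", "IFDDLC"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pairwise_rules(min_support=0.4, min_confidence=0.7):
    """All rules a -> b meeting the support and confidence thresholds."""
    items = set().union(*transactions)
    rules = []
    for a, b in permutations(items, 2):
        sup = support({a, b})
        if sup >= min_support:
            conf = sup / support({a})          # P(b | a)
            if conf >= min_confidence:
                rules.append((a, b, sup, conf))
    return sorted(rules)

rules = pairwise_rules()
```

On these made-up baskets the mining keeps "IFAKMB → IFCOCP" (confidence 0.75) and "IFCOCP → IFAKMB" (confidence 1.0): customers ordering CPUs always order motherboards. The thesis's analysis works the same way, but with full Apriori over real order data and per-customer-type subsets.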
