  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Empirical Evaluations of Different Strategies for Classification with Skewed Class Distribution

Ling, Shih-Shiung 09 August 2004 (has links)
Existing classification analysis techniques (e.g., decision tree induction) generally exhibit satisfactory classification effectiveness when dealing with data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed decision outcomes. Such a highly skewed class distribution, if not properly addressed, would imperil the resulting learning effectiveness. In this study, we empirically evaluate three approaches to classification with a highly skewed class distribution: under-sampling, over-sampling, and the multi-classifier committee. Because of its popularity, C4.5 is selected as the underlying classification analysis technique. Based on ten highly skewed datasets, our empirical evaluations suggest that the multi-classifier committee generally outperforms the under-sampling and over-sampling approaches on recall, precision, and the F1-measure. Furthermore, for applications aiming at a high recall rate, the over-sampling approach is suggested. On the other hand, if precision is the primary concern, the classification model induced directly from the original dataset is recommended.
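A minimal sketch of the two resampling strategies the abstract compares (function and parameter names are illustrative, not from the thesis): given a binary-labeled dataset, random under-sampling shrinks the majority class while random over-sampling grows the minority class with replacement.

```python
import random

def rebalance(records, labels, method="under", seed=42):
    """Balance a binary dataset by random under- or over-sampling.

    records: list of feature tuples; labels: parallel list of 0/1 labels.
    Returns new (records, labels) lists with equal class sizes.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if method == "under":
        # shrink the majority class down to the minority size
        majority = rng.sample(majority, len(minority))
    else:
        # "over": duplicate minority records (with replacement) up to majority size
        minority = minority + [rng.choice(minority)
                               for _ in range(len(majority) - len(minority))]
    keep = minority + majority
    rng.shuffle(keep)
    return [records[i] for i in keep], [labels[i] for i in keep]
```

Either output can then be fed to any classifier (C4.5 in the thesis); the multi-classifier committee approach instead trains several classifiers on different balanced subsets and combines their votes.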
62

Applications of Data Mining on Drug Safety: Predicting Proper Dosage of Vancomycin for Patients with Renal Insufficiency and Impairment

Yon, Chuen-huei 24 August 2004 (has links)
Drug misuse wastes medical resources and imposes significant costs on society. Because of the narrow therapeutic range of vancomycin, an appropriate vancomycin dosage is difficult to determine; when an inappropriate dosage is used, side effects such as poisoning reactions or drug resistance may occur. Clinically, medical professionals adjust vancomycin protocols based on Therapeutic Drug Monitoring (TDM) results. TDM is usually defined as the clinical use of drug blood concentration measurements as an aid in dosage finding and adjustment. However, TDM cannot be applied to first-time treatments, in which case dosage decisions must rely on medical professionals' clinical experience and judgment. Data mining has been applied in various medical and healthcare applications. In this study, we employ decision-tree induction (specifically, C4.5) and a backpropagation neural network to predict the appropriateness of vancomycin usage for patients with renal insufficiency and impairment. In addition, we evaluate whether the boosting and bagging algorithms improve predictive accuracy. Our empirical results suggest that they do: C4.5 in conjunction with the AdaBoost algorithm achieves an overall accuracy of 79.65%, significantly improving on the existing practice, which records an accuracy of 41.38%. For the appropriateness category ("Y") and the inappropriateness category ("N"), C4.5 with AdaBoost achieves recall rates of 78.75% and 80.25%, respectively. Hence, incorporating data mining techniques into decision support would enhance drug safety, which in turn would improve patient safety and reduce medical resource waste.
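The per-category recall rates and overall accuracy reported above can be computed from a confusion of true versus predicted labels. A small sketch (labels and data are illustrative only):

```python
def per_class_recall(y_true, y_pred):
    """Recall for each label (fraction of true instances of that label the
    classifier recovered), plus overall accuracy."""
    recalls = {}
    for label in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls[label] = hits / len(idx)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return recalls, accuracy
```

For example, if two of four "Y" cases are predicted "Y" while all "N" cases are caught, the "Y" recall is 0.5 even though overall accuracy looks acceptable; this is why the thesis reports recall per category rather than accuracy alone.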
63

A Framework for Designing Nursing Knowledge Management System and the Application to Pediatric Nursing

Chen, Wei-jen 17 March 2007 (has links)
With advances in technology, changes in the healthcare environment, and growing user needs, computerized support systems and expert systems can cut the cost of unnecessary procedures and achieve higher efficiency and productivity. Applied in a nursing department, they can improve quality of care, reduce the time nurses spend duplicating patient histories, lighten nurses' workload, and enhance their problem-solving ability. This research focuses on the nursing department of a pediatric ward. I propose a framework for nursing knowledge management based on subjective data, objective data, assessment, and plan (SOAP), which nursing staff use as a decision-making process. The method is to collect subjective and objective data, read relevant clinical practice guidelines, make clinical judgments about patients' actual or potential problems, and provide applicable nursing plans and interventions. The staff review these judgments, nursing plans, and interventions and make the final decision to accept or reject them. If the staff reject any judgment, plan, or intervention, the system raises an inquiry to the physician and nursing staff, and the staff then correct the inappropriate items. These clear, easy-to-follow processes help student nurses and beginning nurses cultivate their caring abilities, and the framework is intended to serve as a guide for nursing teaching and clinical patient care.
64

Overview Of Solutions To Prevent Liquid Loading Problems In Gas Wells

Binli, Ozmen 01 February 2010 (has links) (PDF)
Every gas well eventually ceases producing as reservoir pressure depletes. Liquids usually present in the reservoir can cause further problems by accumulating in the wellbore and reducing production even more. A number of well-completion options can prevent liquid loading before it becomes a problem; tubing-size and perforation-interval optimization are the two most common. Although completion optimization will prevent liquid accumulation in the wellbore for a time, the well will start loading as reservoir pressure declines further. Once liquid loading occurs, it is crucial to recognize the problem at an early stage and select a suitable prevention method, such as gas lift, plunger lift, pumping, or velocity string installation. This study set out to construct a decision tree for a possible expert system that determines the best option for a particular gas well. The findings are confirmed by testing the expert system against field applications.
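The expert-system decision tree described above could be encoded as a chain of rules. The sketch below is purely illustrative; the thresholds and criteria are placeholders, not the thesis's field criteria:

```python
def suggest_lift_method(gas_rate_mscfd, critical_rate_mscfd,
                        reservoir_pressure_psi, has_sand=False):
    """Toy decision rules for choosing a deliquification method.
    All numeric thresholds are hypothetical placeholders."""
    if gas_rate_mscfd >= critical_rate_mscfd:
        return "no action"        # gas velocity still lifts liquids out
    if has_sand:
        return "pumping"          # plunger lift tolerates solids poorly
    if reservoir_pressure_psi > 1500:
        return "velocity string"  # enough pressure to lift through smaller tubing
    if reservoir_pressure_psi > 500:
        return "plunger lift"
    return "gas lift"
```

A real expert system would branch on many more well parameters (water cut, depth, tubing size, economics), but the structure, a sequence of tested conditions ending in a recommendation, is the same.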
65

Improving Data Quality: Development and Evaluation of Error Detection Methods

Lee, Nien-Chiu 25 July 2002 (has links)
High-quality data are essential to decision support in organizations. However, estimates have shown that 15-20% of the data within an organization's databases can be erroneous. Some databases contain large numbers of errors, posing a serious problem if they are used for managerial decision-making. To improve data quality, many organizations have initiated data cleaning efforts. Broadly, data quality problems fall into three categories: incompleteness, inconsistency, and incorrectness. Among these, incorrectness is the major source of low-quality data, so this research focuses on error detection. In this study, we developed a set of error detection methods based on the semantic constraint framework: uniqueness detection, domain detection, attribute value dependency detection, attribute domain inclusion detection, and entity participation detection. Empirical evaluation showed that some of the proposed techniques (e.g., uniqueness detection) achieve low miss rates and low false alarm rates. Overall, our error detection methods together identified around 50% of the errors introduced by subjects during the experiments.
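Two of the simpler constraint checks named above, uniqueness detection and domain detection, can be sketched as follows (record layout and names are illustrative, not the thesis's implementation):

```python
def uniqueness_violations(records, key):
    """Indices of records whose key attribute duplicates an earlier record
    (a uniqueness-constraint violation)."""
    seen, bad = set(), []
    for i, rec in enumerate(records):
        if rec[key] in seen:
            bad.append(i)
        seen.add(rec[key])
    return bad

def domain_violations(records, attr, domain):
    """Indices of records whose attribute value falls outside its legal domain."""
    return [i for i, rec in enumerate(records) if rec[attr] not in domain]
```

Flagged indices are candidate errors; comparing them against known seeded errors gives the miss rate (errors not flagged) and false alarm rate (clean records flagged) used as evaluation criteria.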
66

Applying Data Mining Techniques to the Prediction of Marine Smuggling Behaviors

Lee, Chang-mou 26 July 2008 (has links)
none
67

A Study on Relationship between Metropolitan Population and Airport Yearly Enplanement-Based on the Airports in the Mainland of the United States

Yu, Heng-Tsung 19 January 2009 (has links)
Aviation technology has become far more reliable than ever, and air transportation is by far the best choice for long-distance travel. Airports serve as the nodes of air transportation, and their construction and development are often among the most important development plans of a country or local government. The huge cost of constructing an airport and its long life cycle (usually more than fifty years) demand comprehensive planning in the initial stage. Underestimating an airport's transportation demand may make future extension difficult and hamper subsequent operations; overestimating it may result in over-investment and poor operating performance. Around the world, airports are increasingly run as enterprises. Governments and airport administrators have begun to pay attention to airport operating performance and to adopt indicators for performance assessment, hoping to reduce operating costs, increase profit, and strengthen competitive advantage. Among these indicators, yearly enplanement is widely considered a key one. This research collected data on commercial airports in the mainland United States with yearly enplanements above 2,500 passengers. It employs statistical methods and decision trees to analyze the relationship between metropolitan population measures (population, population density, population change, etc.) and the change in airports' yearly enplanements. We also examine how the number of airports in a metropolitan area, the distance from an airport to the closest business center, and the distance to the nearest other airport relate to the change in an airport's yearly enplanement.
68

The application of machine learning methods in software verification and validation

Phuc, Nguyen Vinh, 1955- 04 January 2011 (has links)
Machine learning methods have been employed in data mining to discover useful, valid, and beneficial patterns in domains spanning business, medicine, agriculture, census work, and software engineering. Focusing on software engineering, this report investigates machine learning techniques that have been used to predict programming faults during the verification and validation of software. Artifacts such as program execution traces, test case coverage information, and data on execution failures are of special interest for the following concerns: completeness of test suite coverage; automation of test oracles to reduce human intervention in software testing; detection of faults causing program failures; and defect prediction in software. A survey of the verification and validation literature also revealed a novel concept for improving black-box testing using Category-Partition for test specifications and test suites. The report includes two experiments, using data extracted from source code available from the website (15), that demonstrate the application of a decision tree (C4.5) and a multilayer perceptron to fault prediction, along with an example of a potential candidate for the Category-Partition scheme. Results from several research projects show that machine learning applied to software testing has achieved varying degrees of success in helping software developers improve their test strategies for the verification and validation of software systems.
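To give a flavor of tree-based fault prediction, the toy sketch below learns a single threshold on one code metric (say, cyclomatic complexity) that best separates faulty from clean modules. This one-rule "stump" is a drastically simplified stand-in for the root split C4.5 would make; the metric and data are hypothetical:

```python
def learn_stump(xs, ys):
    """Find the threshold t on a numeric metric that maximizes accuracy of
    the rule 'faulty (1) if metric > t'. Returns (threshold, accuracy)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x > t) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def predict(t, x):
    """Apply the learned rule to a new module's metric value."""
    return 1 if x > t else 0
```

C4.5 recursively applies such splits (chosen by information gain rather than raw accuracy) and a multilayer perceptron instead learns a weighted nonlinear combination of all metrics; the report compares the two on the same fault data.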
69

Multivariate real options valuation

Wang, Tianyang 08 June 2011 (has links)
This dissertation research focuses on modeling and evaluating multivariate uncertainties and the dependency between the uncertainties. Managing risk and making strategic decisions under uncertainty is critically important for both individual and corporate success. In this dissertation research, we present two new methodologies, the implied binomial tree approach and the dependent decision tree approach, to modeling multivariate decision making problems with practical applications in real options valuation. First, we present the implied binomial tree approach to consolidate the representation of multiple sources of uncertainty into univariate uncertainty, while capturing the impact of these uncertainties on the project’s cash flows. This approach provides a nonparametric extension of the approaches in the literature by allowing the project value to follow a generalized diffusion process in which the volatility may vary with time and with the asset prices, therefore offering more modeling flexibility. This approach was motivated by the Implied Binomial Tree (IBT) approach that is widely used to value complex financial options. By constructing the implied recombining binomial tree in a way so as to be consistent with the simulated market information, we extended the finance-based IBT method for real options valuation — when the options are contingent on the value of one or more market related uncertainties that are not traded assets. Further, we present a general framework based on copulas for modeling dependent multivariate uncertainties through the use of a decision tree. The proposed dependent decision tree model allows multiple dependent uncertainties with arbitrary marginal distributions to be represented in a decision tree with a sequence of conditional probability distributions. This general framework could be naturally applied in decision analysis and real options valuations, as well as in more general applications of dependent probability trees. 
While this approach to modeling dependencies can be based on several popular copula families, as we illustrate, we focus on the normal copula and present an efficient computational method for multivariate decision and risk analysis that can be standardized for convenient application.
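The normal (Gaussian) copula underlying the dependent decision tree approach can be sampled with only elementary tools: draw correlated standard normals, then map each through the normal CDF to get dependent uniforms, which any pair of marginal inverse CDFs can transform into dependent uncertainties. A minimal bivariate sketch (names and seed are my own):

```python
import math
import random

def gaussian_copula_pair(rho, n, seed=0):
    """Draw n dependent uniform pairs (u1, u2) from a bivariate normal
    copula with correlation rho in (-1, 1)."""
    rng = random.Random(seed)
    # standard normal CDF via the error function
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is correlated with z1 (Cholesky factor of the 2x2 covariance)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((phi(z1), phi(z2)))
    return out
```

Each (u1, u2) pair preserves the dependence structure while leaving the marginals free, which is exactly what allows arbitrary marginal distributions to be attached to the branches of a probability tree.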
70

Applying Data Mining Techniques on Continuous Sensed Data : For daily living activity recognition

Li, Yunjie January 2014 (has links)
Nowadays, with the rapid development of the Internet of Things, the application field of wearable sensors has continuously expanded, especially in areas such as remote electronic medical treatment and smart homes. Recognizing human daily activities from the sensed data is one of the challenges. With a variety of data mining techniques, the activities can be recognized automatically, but due to the diversity and complexity of sensor data, not every data mining technique performs well without systematic analysis and improvement. In this thesis, several data mining techniques were applied to a continuously sensed dataset with the objective of recognizing human daily activities. The work surveyed several techniques and focuses on three of them, decision trees, naive Bayes, and neural networks, analyzing and comparing them by their classification results. The thesis also proposes improvements to these techniques tailored to the specific dataset. The comparison of the three classification results showed that each classifier has its own limitations and advantages. The proposed idea of combining the decision tree model with the neural network model significantly increased the classification accuracy in this experiment.
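One simple way to combine a decision tree with a neural network, a weighted blend of their class-probability outputs, can be sketched as follows (the thesis does not specify its combination scheme, so this is one plausible illustration; all names are hypothetical):

```python
def combine(p_tree, p_net, weight=0.5):
    """Blend class-probability dicts from two models and return the
    highest-scoring activity label. weight is the tree's share."""
    labels = set(p_tree) | set(p_net)
    scores = {c: weight * p_tree.get(c, 0.0) + (1.0 - weight) * p_net.get(c, 0.0)
              for c in labels}
    return max(scores, key=scores.get)
```

With equal weights the more confident model dominates each decision, which can lift accuracy when the two models make errors on different activities.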
