About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Discussion on Fifty Years of Classification and Regression Trees

Rusch, Thomas, Zeileis, Achim 12 1900 (has links) (PDF)
In this discussion paper, we argue that the literature on tree algorithms is very fragmented. We identify possible causes and discuss the good and bad sides of this situation. Among the latter is the lack of free open-source implementations for many algorithms. We argue that if the community adopts a standard of creating and sharing free open-source implementations of newly developed algorithms, and creates easy access to these programs, the downsides of this fragmentation will be actively combated, to the benefit of the whole scientific community. (authors' abstract)
2

Ovarian tumor risk factors study in a south medical center in Taiwan

Wu, Wei-Wen 05 July 2012 (has links)
This study examines the main risk factors of ovarian tumors in order to determine tumor type. Because ovarian tumors are located in the pelvis, their symptoms are not obvious and the tumors are difficult to detect. The symptoms are mostly stomach or lower-abdomen swelling, which is often ignored. The probability of ovarian cancer is lower than that of cervical cancer, but its mortality rate is the highest of all gynecologic diseases. The study uses statistical methods to analyze patients' risk factors so as to determine the tumor type and enable early treatment, thereby reducing the death rate. The data come from Kaohsiung Veterans General Hospital and are classified into different tumor cases based on ultrasound examinations and other relevant risk factors, such as age and tumor markers, so as to provide a method to distinguish among benign, borderline, and malignant ovarian tumors and to create appropriate classification criteria for follow-ups, surgeries, and tracking references. To differentiate between malignant and non-malignant (benign and borderline) cases, we use the risk factors to construct classification and regression trees that help physicians determine the tumor type. When a tumor is determined to be non-malignant, we use a logistic regression model, based on the degree of influence of the risk factors, to further classify it as benign or borderline. This process can determine tumor types precisely, and can also inform the choice of surgery and whether a patient needs a follow-up.
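The second stage described in this abstract, separating benign from borderline tumors with logistic regression, can be sketched in miniature. This is not the thesis's actual model: the single numeric feature, the toy data, and the plain gradient-descent fit below are hypothetical stand-ins for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a 1-D logistic regression p(y=1|x) = sigmoid(w*x + b)
    by stochastic gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x   # gradient of log-loss w.r.t. w
            b -= lr * (p - y)      # gradient of log-loss w.r.t. b
    return w, b
```

On linearly separable toy data the fitted model pushes the predicted probabilities toward 0 and 1 on the two sides of the boundary, which is how a "benign vs. borderline" threshold would be read off.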
3

Tree-based Models for Longitudinal Data

Liu, Dan 16 June 2014 (has links)
No description available.
4

Classifying natural forests using LiDAR data / Klassificering av nyckelbiotoper med hjälp av LiDAR-data

Arvidsson, Simon, Gullstrand, Marcus January 2019 (has links)
In forestry, natural forests are forest areas with high biodiversity that are in need of preservation. The current mapping of natural forests is a tedious task requiring manual labor that could possibly be automated. In this paper we explore the main features used by a random forest algorithm to classify natural forest and managed forest in northern Sweden. The goal was to create a model with substantial strength of agreement, meaning a Kappa value of 0.61 or higher, placing the model in the same range as models produced in previous research. We used raster data gathered from airborne LiDAR, combined with labeled sample areas, both supplied by the Swedish Forest Agency. Two experiments were performed with different features: Experiment 1 used features extracted using methods inspired by previous research, while Experiment 2 added further features. Of the total number of sample areas used (n=2882), 70% were used to train the models and 30% were used for evaluation. The result was a Kappa value of 0.26 for Experiment 1 and 0.32 for Experiment 2. The most prominent features were those derived from canopy height, where the supplied data also had the highest resolution; percentiles, kurtosis, and canopy crown areas derived from canopy height proved the most important for classification. The results fell short of our goal, possibly indicating a range of flaws in the data used. The size of the sample areas and the resolution of the raster data are likely important factors when extracting features, playing a large role in the produced model's performance.
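The Kappa values this abstract reports (0.26, 0.32, and the 0.61 target) refer to Cohen's kappa, which corrects raw classification agreement for the agreement expected by chance from the label frequencies. A minimal sketch on hypothetical label sequences (not the paper's data):

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    # observed agreement: fraction of exact matches
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # expected agreement from the marginal frequencies of each label
    pe = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (po - pe) / (1 - pe)
```

Perfect agreement gives kappa 1.0, while agreement no better than chance gives 0.0, which is why 0.61+ is conventionally read (after Landis and Koch) as "substantial" agreement.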
5

Statistical Methods In Credit Rating

Sezgin, Ozge 01 September 2006 (has links) (PDF)
Credit risk is one of the major risks banks and financial institutions are faced with. With the New Basel Capital Accord, banks and financial institutions have the opportunity to improve their risk management process by using the Internal Ratings-Based (IRB) approach. In this thesis, we focus on the internal credit rating process. First, a short overview of credit scoring and validation techniques is given. Using a real data set on manufacturing firms obtained from a Turkish bank, default prediction models were built with logistic regression, probit regression, discriminant analysis, and classification and regression trees. To improve the models' performance, the optimum sample for logistic regression was selected from the data set and taken as the model construction sample. In addition, we describe how to convert continuous variables into ordered scaled variables to avoid problems caused by differences in scale. After the models were built, their performance on the whole data set, both in sample and out of sample, was evaluated with the validation techniques suggested by the Basel Committee. In most cases the classification and regression trees model dominated the other techniques. After the credit scoring models were constructed and evaluated, the cut-off values used to map the probability of default obtained from logistic regression to rating classes were determined by dual-objective optimization: the cut-off values that gave the maximum area under the ROC curve and the minimum mean square error of the regression tree were taken as the optimum thresholds after 1000 simulations. Keywords: Credit Rating, Classification and Regression Trees, ROC Curve, Pietra Index
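One of the validation measures named in this abstract, the area under the ROC curve, can be computed directly as a Mann-Whitney rank statistic rather than by tracing the curve. A minimal sketch with hypothetical scores and default labels (this is not the thesis's code):

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen defaulter (label 1) receives a higher score than a randomly
    chosen non-defaulter (label 0); ties count as half."""
    pos = [s for s, c in zip(scores, labels) if c == 1]
    neg = [s for s, c in zip(scores, labels) if c == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that ranks every defaulter above every non-defaulter scores 1.0; a model whose scores carry no ranking information scores 0.5, the usual baseline against which scorecards are validated.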
6

Applying Classification and Regression Trees to manage financial risk

Martin, Stephen Fredrick 16 August 2012 (has links)
The goal of this project is to develop a set of business rules to mitigate risk related to a specific financial decision within the prepaid debit card industry. Under certain circumstances, issuers of prepaid debit cards may need to decide whether funds on hold can be released early for use by card holders prior to final transaction settlement. After a brief introduction to the prepaid card industry and the financial risk associated with the early release of funds on hold, the paper presents the motivation to apply the CART (Classification and Regression Trees) method. The paper provides a tutorial on the CART algorithms formally developed by Breiman, Friedman, Olshen and Stone in the monograph Classification and Regression Trees (1984), as well as a detailed explanation of the R programming code to implement the RPART function (Therneau 2010). Special attention is given to parameter selection and the process of finding an optimal solution that balances complexity against predictive classification accuracy, measured against an independent data set through a cross-validation process. Lastly, the paper presents an analysis of the financial risk mitigation based on the resulting business rules.
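The core step that CART repeats at every node, finding the split that most reduces impurity, can be sketched as follows. This is a toy single-feature version with hypothetical data, not the rpart implementation the paper discusses:

```python
def gini(labels):
    """Gini impurity of a label multiset: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Exhaustively search thresholds on one numeric feature for the split
    x <= t that minimizes the size-weighted Gini impurity of the children."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # skip degenerate splits
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best
```

Full CART applies this search over every feature at every node, then prunes the grown tree back using the complexity parameter the paper describes, trading tree size against cross-validated accuracy.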
7

Ekonometrický odhad očekávané úvěrové ztráty při selhání / Econometric Estimation of Loss Given Default

Jacina, Viktor January 2014 (has links)
One of the most frequently mentioned credit risk parameters in the banking sector is loss given default (LGD). The regulatory framework allows banks to use their own LGD estimation procedures after approval. Classification and regression trees are appropriate and flexible in this context, and they offer some advantages compared to traditional approaches such as the linear regression model. This work includes a theoretical background on tree-based methods. In the last section, loss given default for debit accounts is estimated using random forests, which showed the best performance in this case.
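A rough illustration of the bagging idea behind random forests, here with decision stumps rather than full trees and with random rather than optimized split points; the data are hypothetical and this is not the thesis's model:

```python
import random

def fit_bagged_stumps(xs, ys, n_trees=200, seed=0):
    """Bagging for regression with decision stumps as weak learners:
    each stump is fit on a bootstrap resample and splits at a threshold
    drawn from that resample (a crude stand-in for a full random forest)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]       # bootstrap sample
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        t = rng.choice(bx)                                # random split point
        left = [y for x, y in zip(bx, by) if x <= t] or by
        right = [y for x, y in zip(bx, by) if x > t] or by
        stumps.append((t, sum(left) / len(left), sum(right) / len(right)))
    return stumps

def predict(stumps, x):
    """Ensemble prediction: average the per-stump leaf means."""
    return sum(l if x <= t else r for t, l, r in stumps) / len(stumps)
```

Averaging many high-variance learners fit on resampled data is what smooths random forest predictions of a continuous target like LGD.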
8

Leveraging Artificial Intelligence to increase STEM Graduates Among Underrepresented Populations

Riep, Josette R. 05 October 2021 (has links)
No description available.
9

Forecasting Harmful Algal Blooms for Western Lake Erie using Data Driven Machine Learning Techniques

Reinoso, Nicholas L. 23 May 2017 (has links)
No description available.
10

Empirical Investigation of CART and Decision Tree Extraction from Neural Networks

Hari, Vijaya 27 April 2009 (has links)
No description available.
