1 |
The Approach-dependent, Time-dependent, Label-constrained Shortest Path Problem and Enhancements for the CART Algorithm with Application to Transportation Systems. Jeenanunta, Chawalit. 30 July 2004
In this dissertation, we consider two important problems pertaining to the analysis of transportation systems. The first is an approach-dependent, time-dependent, label-constrained shortest path problem that arises in the Route Planner Module of the Transportation Analysis Simulation System (TRANSIMS), developed by the Los Alamos National Laboratory for the Federal Highway Administration. This variant of the shortest path problem is defined on a transportation network consisting of a set of nodes and a set of directed arcs, where each arc carries a label designating a mode of transportation and a travel time function that depends on the time of arrival at its tail node as well as on the node via which that tail node was approached. This last feature is a new concept in the time-dependent, label-constrained shortest path problem and is used to model turn penalties in transportation networks: the time spent at an intersection before entering the next link depends on whether we travel straight through the intersection, turn right, or turn left. Accordingly, we model this situation by making each link's travel time function depend on the link via which its tail node was approached. We propose two effective algorithms for this problem by adapting two efficient existing algorithms, the Partitioned Shortest Path (PSP) algorithm and the Heap-Dijkstra (HP-Dijkstra) algorithm, to handle time dependency and label constraints, and we present related theoretical complexity results. We also explore heuristic methods that curtail the search, namely an Augmented Ellipsoidal Region Technique (A-ERT), a Distance-Based A-ERT, and some variants, which restrict the search for an optimal path between a given origin and destination to more promising subsets of the network.
This helps speed up computation without sacrificing optimality. We also incorporate an approach-dependent delay estimation function and, in concert with a search-tree level-based technique, derive a total estimated travel time that serves as the key for prioritizing node selections or sorting the elements of the heap. The search terminates as soon as the destination node is reached with a value within some p% of the minimum key value in the heap. We call the versions of PSP and HP-Dijkstra that use this rule the Early Terminated PSP (ET-PSP) and Early Terminated Heap-Dijkstra (ETHP-Dijkstra) algorithms. All of these procedures are compared with the original Route Planner Module within TRANSIMS, which is implemented under the Linux operating system in C++ with the GNU g++ compiler.
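The approach-dependent part of the problem can be illustrated with a small sketch. It is not the dissertation's PSP or HP-Dijkstra code (that work is in C++ within TRANSIMS); it is a minimal Python Dijkstra, assuming FIFO travel times, whose search states are (node, node it was approached from) pairs, so that each arc's travel time can depend on both the approach and the departure time. The arc tuple format, the `earliest_arrival` function, the mode names and all numbers are hypothetical, and the label constraint is simplified to a set of allowed modes; the dissertation's label constraints may be more general.

```python
import heapq
from itertools import count
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical arc format: (tail, head, mode, travel_time). travel_time(prev, t)
# returns the minutes needed to traverse the arc when leaving the tail at time t,
# having reached the tail from node `prev` (None at the origin). Letting the cost
# depend on `prev` is how a turn penalty is expressed.
TravelTime = Callable[[Optional[str], float], float]
Arc = Tuple[str, str, str, TravelTime]


def earliest_arrival(arcs: List[Arc], origin: str, dest: str, depart: float,
                     allowed_modes: Set[str]) -> Optional[float]:
    """Heap-based Dijkstra over (node, approached-from) states.

    Assumes FIFO travel times (leaving later never arrives earlier), under which
    a label-setting search stays exact for time-dependent arc costs.
    """
    out: Dict[str, List[Arc]] = {}
    for arc in arcs:
        if arc[2] in allowed_modes:  # simplified label constraint: allowed modes only
            out.setdefault(arc[0], []).append(arc)

    best: Dict[Tuple[str, Optional[str]], float] = {(origin, None): depart}
    tie = count()  # tie-breaker so the heap never compares None with a str
    heap = [(depart, next(tie), origin, None)]
    settled = set()
    while heap:
        t, _, node, prev = heapq.heappop(heap)
        if (node, prev) in settled:
            continue
        settled.add((node, prev))
        if node == dest:
            return t
        for _tail, head, _mode, travel_time in out.get(node, []):
            arrive = t + travel_time(prev, t)
            state = (head, node)  # remember how `head` was approached
            if arrive < best.get(state, float("inf")):
                best[state] = arrive
                heapq.heappush(heap, (arrive, next(tie), head, node))
    return None


arcs = [
    ("A", "B", "car", lambda prev, t: 5.0),
    ("A", "E", "car", lambda prev, t: 4.0),
    ("E", "B", "car", lambda prev, t: 2.0),
    # Leaving B toward C takes 4 min, plus a 6 min left-turn penalty if B was entered from A.
    ("B", "C", "car", lambda prev, t: 4.0 + (6.0 if prev == "A" else 0.0)),
    ("A", "C", "walk", lambda prev, t: 8.0),
]
print(earliest_arrival(arcs, "A", "C", 0.0, {"car"}))          # 10.0 (A-E-B-C)
print(earliest_arrival(arcs, "A", "C", 0.0, {"car", "walk"}))  # 8.0 (walk arc)
```

Keeping the approach in the state matters here: a search that labels nodes only would fix B at time 5 (via A), then pay the left-turn penalty and report 15 instead of 10.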
Extensive computational testing has been conducted using available data from the Portland, Oregon, and Blacksburg, Virginia, transportation networks to investigate the efficacy of the developed procedures. In particular, we have tested twenty-five different combinations of network curtailment and algorithmic strategies on three test networks: Blacksburg-light, Blacksburg-full, and BigNet. The results indicate that the Heap-Dijkstra implementations are much faster than the PSP approaches for solving the underlying problem exactly. Furthermore, among the curtailment schemes, ETHP-Dijkstra with p = 5% yields the best overall results: it produces solutions within 0.37-1.91% of optimality while decreasing CPU effort by 56.68% on average, compared with applying the best available exact algorithm.
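The early-termination idea behind ET-PSP and ETHP-Dijkstra can be sketched as follows. This is our reading of the stopping rule in the abstract, not the dissertation's code: the key is the time so far plus an estimate of the time remaining, and the search stops once the destination's label is within a fraction p of the smallest key still in the heap. The function name, graph format and data are hypothetical, and a zero estimate stands in for the dissertation's approach-dependent delay estimate and search-tree level technique.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, travel time)]


def early_terminated_search(graph: Graph, estimate: Dict[str, float], origin: str,
                            dest: str, p: float) -> Tuple[float, int]:
    """Best-first search keyed on (time so far + estimated time to go).

    Stops once `dest` carries a label no more than (1 + p) times the smallest key
    left in the heap. If `estimate` is consistent (never overstates the remaining
    time), that key is a lower bound on the optimum, so the result is within a
    factor (1 + p) of optimal; p = 0 gives the exact search.
    Returns (travel time found, number of heap pops).
    """
    g = {origin: 0.0}
    heap = [(estimate[origin], origin)]
    closed = set()
    pops = 0
    while heap:
        if dest in g and g[dest] <= (1.0 + p) * heap[0][0]:
            return g[dest], pops
        _, node = heapq.heappop(heap)
        pops += 1
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        for nxt, w in graph.get(node, []):
            cand = g[node] + w
            if cand < g.get(nxt, float("inf")):
                g[nxt] = cand
                heapq.heappush(heap, (cand + estimate[nxt], nxt))
    return g.get(dest, float("inf")), pops


graph = {
    "O": [("A", 10.0), ("B", 19.5)],
    "A": [("D", 10.0)],
    "B": [("D", 0.25)],
}
zero = {n: 0.0 for n in "OABD"}  # zero estimate: keys reduce to plain Dijkstra labels
print(early_terminated_search(graph, zero, "O", "D", p=0.0))   # (19.75, 3)
print(early_terminated_search(graph, zero, "O", "D", p=0.05))  # (20.0, 2)
```

With p = 0.05 the search stops one pop earlier and returns 20.0 instead of the optimal 19.75, a gap of about 1.3%. Such a guarantee depends on the estimate never overstating the remaining time; the optimality gaps reported for ETHP-Dijkstra are experimental results, not consequences of this sketch.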
The second part of this dissertation is concerned with the Classification and Regression Tree (CART) algorithm and its application to the Activity Generator Module of TRANSIMS. Transportation engineers and planners have widely used the CART algorithm to correlate a set of independent household demographic variables with dependent activity or travel time variables. However, the algorithm lacks an automated mechanism for deriving classification trees that optimize specified objective functions and satisfy desired side-constraints governing the structure of the tree and the statistical and demographic nature of its leaf nodes. Using a novel set partitioning formulation, we propose new tree development and, more importantly, optimal pruning strategies that accommodate such objective functions and side-constraints, and we establish the theoretical validity of our approach. This general enhancement of the CART algorithm is then applied to the Activity Generator Module of TRANSIMS. Computational results using real data from the Portland, Oregon, and Blacksburg, Virginia, transportation networks demonstrate the flexibility and effectiveness of the proposed approach in classifying data and examine its numerical performance. The results indicate that a variety of objective functions and constraints can be readily accommodated to efficiently control the structural information captured by the classification tree, as desired by the planner or analyst and depending on the scope of the application at hand. / Ph. D.
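To make "optimal pruning under side-constraints" concrete, here is a minimal sketch. It does not reproduce the dissertation's set partitioning formulation; it swaps in a simple bottom-up dynamic program over a fully grown tree that decides which internal nodes to collapse so as to minimize misclassification errors plus a per-leaf penalty `alpha`, subject to a hypothetical side-constraint that every leaf holds at least `min_leaf` observations. The `Node` class, the tree and its counts are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INF = float("inf")


@dataclass
class Node:
    name: str
    n: int        # training observations reaching this node
    errors: int   # observations misclassified if this node were a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune(node: Node, alpha: float, min_leaf: int) -> Tuple[float, List[str]]:
    """Cheapest pruned subtree rooted at `node`.

    Objective: misclassification errors + alpha * (number of leaves).
    Side-constraint: every leaf must hold at least `min_leaf` observations.
    Returns (cost, leaf names); cost is INF if no feasible pruning exists.
    """
    as_leaf = node.errors + alpha if node.n >= min_leaf else INF
    best = (as_leaf, [node.name])
    if node.left is not None and node.right is not None:
        left_cost, left_leaves = prune(node.left, alpha, min_leaf)
        right_cost, right_leaves = prune(node.right, alpha, min_leaf)
        if left_cost + right_cost < best[0]:
            best = (left_cost + right_cost, left_leaves + right_leaves)
    return best


# A small, made-up fully grown tree.
tree = Node("root", 100, 40,
            Node("L", 60, 10, Node("L1", 55, 4), Node("L2", 5, 0)),
            Node("R", 40, 12, Node("R1", 20, 3), Node("R2", 20, 2)))
print(prune(tree, alpha=1.0, min_leaf=10))  # (18.0, ['L', 'R1', 'R2'])
print(prune(tree, alpha=1.0, min_leaf=1))   # (13.0, ['L1', 'L2', 'R1', 'R2'])
```

Because this objective adds up over leaves and the constraint applies to each leaf separately, the best choice for a subtree does not depend on the rest of the tree, which is what makes the recursion valid. Constraints that couple several leaves are harder to express this way, which is one motivation for a more general formulation.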
|
2 |
Application of CART Decision Tree On the Evaluation of Mutual Fund. Hsu, Chiny-Yin. 04 August 2006
|
3 |
Determinantes do acesso ao crédito rural: um estudo a partir do levantamento das unidades produtivas agropecuárias (LUPA) do Estado de São Paulo / Determinants of access to rural credit: a study based on a survey of agricultural production units (LUPA, in Portuguese) of the State of São Paulo. Eusébio, Gabriela dos Santos. 22 February 2011
This study seeks to understand and measure the characteristics of farmers that increase their likelihood of accessing rural credit. Using data from the Survey of Agricultural Production Units (LUPA, its Portuguese acronym) of the State of São Paulo (2006/2007), which covers all agricultural production units (UPAs) in the state's 645 municipalities, it was possible to detail the observable characteristics of the producers and properties that accessed rural credit in 2007. The Classification and Regression Trees method was used for this purpose. Estimates for all UPAs in the State of São Paulo showed that the size of the production unit is the main determinant of access to credit. When access to credit is analyzed separately for small, medium and large production units, some variables have a greater impact on access to credit. For small units (up to ten hectares), crop diversification between temporary and perennial crops increases the likelihood that producers access credit. For medium-sized properties (up to five hundred hectares), institutional ties (to a cooperative, union or association), management improvements (computer use, access to official technical assistance) and crop diversification all raise the probability of access to credit. The analysis also shows that for large production units, the variables that affect the probability of access to rural credit relate to institutional participation (as a cooperative member or association member) and to management improvements, regardless of the type of crop the UPA cultivates.
|
5 |
Classification and Regression Trees in R. Nemčíková, Lucia. January 2014
Tree-based methods are a useful complement to traditional statistical methods for solving classification and regression problems. The aim of this master's thesis is not to judge which approach is better but rather to give an overview of these methods and apply them to real data using R. The focus is mainly on the basic methodology of tree-based models and their application in specific software, in order to give the reader a wide range of tools for using these methods. One part of the thesis covers advanced tree-based methods to provide a complete picture of the possibilities.
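The core step that tree-growing software repeats at every node is a search for the binary split that most reduces impurity. The thesis works in R; the sketch below uses Python and shows only the Gini criterion for a single numeric predictor (R's `rpart`, for instance, uses Gini by default for classification trees). The function names and data are hypothetical, and a full implementation would also search over all predictors, recurse on the children, and prune.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Best rule `x <= threshold` by weighted child Gini impurity.

    Returns (threshold, impurity); threshold is NaN if no split beats the parent.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (float("nan"), gini(list(y)))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, 0.0)
```

Placing the threshold halfway between adjacent distinct values is a common convention; any value in that gap gives the same partition of the training data.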
|
6 |
Analyses Of Crash Occurrence And Injury Severities On Multi Lane Highways Using Machine Learning Algorithms. Das, Abhishek. 01 January 2009
Reducing crash occurrence at the various roadway locations (mid-block segments, signalized intersections, unsignalized intersections) and mitigating injury severity in the event of a crash are the major concerns of transportation safety engineers. Multilane arterial roadways (excluding freeways and expressways) account for forty-three percent of fatal crashes in the state of Florida. Significant contributing causes fall under the broad categories of aggressive driver behavior, adverse weather and environmental conditions, and roadway geometric and traffic factors. The objective of this research was to implement innovative, state-of-the-art analytical methods to identify the factors contributing to crashes and injury severity. Advances in computational methods make it practical to use modern statistical and machine learning algorithms. Even though most of the contributing factors are known a priori, advanced methods can unearth changing trends. Heuristic evolutionary processes such as genetic programming, data mining methods such as conditional inference trees, and mathematical treatments in the form of sensitivity analyses are the major contributions of this research. Traditional statistical methods such as simultaneous ordered probit models, and the identification and resolution of crash data problems, are also key aspects of this study. To avoid an unrealistic uniform intersection influence radius of 250 ft, heuristic rules were developed for assigning crashes to roadway segments, signalized intersections and access points using parameters such as 'site location', 'traffic control' and node information. Using a Conditional Inference Forest instead of a Classification and Regression Tree to identify significant variables for the injury severity analysis removed the bias toward selecting continuous variables or variables with a large number of categories.
For the injury severity analysis of crashes on highways, the corridors were clustered into four optimal groups; the optimal number of clusters was found using the Partitioning Around Medoids (PAM) algorithm. Concepts from evolutionary biology such as crossover and mutation were used to develop models for classification and regression analyses based on the highest hit rate and the minimum error rate, respectively. A low crossover rate and a higher mutation rate reduce the chance of genetic drift and bring novelty into the model development process. Average daily traffic (ADT), the friction coefficient of pavements, on-street parking, curbed medians, surface and shoulder widths, and alcohol/drug use are some of the significant factors that played a role in both crash occurrence and injury severity. Relative sensitivity analyses were used to identify the effect of continuous variables on the variation of crash counts. This study improved the understanding of the significant factors that could play an important role in designing better safety countermeasures on multilane highways, and hence in enhancing their safety by reducing the frequency of crashes and the severity of injuries. Educating young people about alcohol and drug abuse, specifically at high schools and colleges, could potentially lead to lower driver aggression. Unilaterally removing on-street parking from high-speed arterials would likely reduce the number of crashes. Widening shoulders could give drivers greater maneuvering space. Improving pavement conditions to raise the friction coefficient would lead to improved crash recovery. Adding lanes to alleviate problems arising from increased ADT, and restricting trucks to the slower right lanes on highways, would not only reduce crash occurrence but also result in lower injury severity levels.
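Partitioning Around Medoids, used in this study to determine the corridor clusters, can be sketched on a precomputed dissimilarity matrix. This is a simplified Python version (greedy BUILD, then a first-improvement SWAP, whereas classic PAM applies the best swap in each pass), not the code used in the dissertation. The points are made-up one-dimensional scores, and choosing k by comparing solutions for several values (for example by average silhouette width, one common approach that the study may or may not have used) is left out.

```python
from itertools import product
from typing import List, Sequence


def total_cost(D: Sequence[Sequence[float]], medoids: List[int]) -> float:
    """Sum over all points of the dissimilarity to their nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D: Sequence[Sequence[float]], k: int) -> List[int]:
    """Simplified PAM on a dissimilarity matrix D; returns sorted medoid indices."""
    n = len(D)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        best = min((j for j in range(n) if j not in medoids),
                   key=lambda j: total_cost(D, medoids + [j]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for slot, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids.copy()
            trial[slot] = cand
            trial_cost = total_cost(D, trial)
            if trial_cost < cost:
                medoids, cost, improved = trial, trial_cost, True
    return sorted(medoids)


# Made-up 1-D corridor scores forming four obvious groups.
points = [1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32]
D = [[abs(a - b) for b in points] for a in points]
medoids = pam(D, k=4)
print(medoids, total_cost(D, medoids))  # [1, 4, 7, 10] 8
```

Working from a dissimilarity matrix rather than raw coordinates is what lets PAM handle mixed or categorical corridor attributes once a suitable dissimilarity is chosen.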
|
7 |
L'évaluation du risque de récidive chez les agresseurs sexuels adultes (Assessing the risk of recidivism among adult sexual offenders). Parent, Geneviève. January 2008
Master's thesis digitized by the Records Management and Archives Division (Division de la gestion de documents et des archives) of the Université de Montréal.
|
9 |
Credit Scoring Methods And Accuracy Ratio. Iscanoglu, Aysegul. 01 August 2005
Credit scoring based on classification techniques makes lending decisions quick and easy. However, no definite consensus has been reached on which method is best for credit scoring or under which conditions each method performs best. Although a wide range of classification techniques has been used in this area, logistic regression has been regarded as an important tool and used very widely in studies. This study uses Monte Carlo simulations to examine the accuracy and bias of logistic regression parameter estimates with respect to four aspects: the dimension of the data sets, their length, the percentage of defaults included in the data, and the effect of the variables on estimation. Moreover, several important statistical and non-statistical methods are applied to Turkish credit default data, and their accuracies are compared for the Turkish market. Finally, ratings are derived from the results of the best method using the receiver operating characteristic (ROC) curve.
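The accuracy ratio in the title is closely tied to the ROC curve: for a scorecard, AR = 2 * AUC - 1, and the AUC equals the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. A minimal Python sketch with made-up scores follows; the function names are hypothetical, and the pairwise loop is O(n * m), which is fine for illustration but slow for large portfolios.

```python
from typing import Sequence


def auc(defaulter_scores: Sequence[float], good_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic: the share of
    (defaulter, non-defaulter) pairs in which the defaulter has the higher risk
    score, counting ties as one half."""
    wins = 0.0
    for d in defaulter_scores:
        for g in good_scores:
            if d > g:
                wins += 1.0
            elif d == g:
                wins += 0.5
    return wins / (len(defaulter_scores) * len(good_scores))


def accuracy_ratio(defaulter_scores: Sequence[float], good_scores: Sequence[float]) -> float:
    """Accuracy ratio (also called the Gini coefficient) of a scorecard: 2 * AUC - 1."""
    return 2.0 * auc(defaulter_scores, good_scores) - 1.0


# Made-up risk scores (higher = riskier).
defaulters = [0.9, 0.8]
non_defaulters = [0.7, 0.85]
print(auc(defaulters, non_defaulters), accuracy_ratio(defaulters, non_defaulters))  # 0.75 0.5
```

An AR of 1 means every defaulter is ranked above every non-defaulter, while 0 corresponds to a random ranking.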
|
10 |
Establishment of a clinical algorithm for the diagnosis of P. falciparum malaria in children from an endemic area using a Classification and Regression Tree (CART) model. Vinnemeier, Christof David. 21 January 2015
The World Health Organization (WHO) estimated the number of people worldwide who fell ill with malaria in 2009 at 225 million. On the African continent, 85% of deaths caused by malaria occurred in children under five years of age. Although the incidence of P. falciparum malaria is declining in some parts of sub-Saharan Africa and other diseases with symptoms similar to those of malaria are gaining importance, presumptive drug treatment of suspected cases is still common practice. The aim of this work is to develop an age-specific clinical algorithm that allows the diagnosis of P. falciparum parasitemia from simple clinical symptoms.
The study was conducted in a rural hospital in the Ashanti Region of Ghana, an area that is holoendemic for malaria throughout the year. In total, 5447 outpatient visits by 3641 patients aged 2-60 months were analyzed. All children were clinically examined by a pediatrician, and a basic blood count and a malaria smear (thick blood film) were obtained. Using a Classification and Regression Tree (CART), a clinical decision tree for predicting Plasmodium parasitemia was generated, and predictive values were calculated for all recorded symptoms.
Parasitemia was found with a prevalence of 13.8% in children aged 2-12 months and 30.6% in children aged 12-60 months. The CART model revealed age-dependent differences in the ability of the variables to predict parasitemia. While 'palmar pallor' emerged as the most important symptom in children aged 2 to 12 months, the variables 'history of fever' and 'elevated body temperature ≥ 37.5°C' gained importance in children aged 12 to 60 months. 'Palmar pallor' was significantly (p<0.001) associated with lower hemoglobin levels in children of all ages. Compared with the algorithm of the Integrated Management of Childhood Illness (IMCI), the CART model had a markedly higher specificity and a higher positive predictive value for predicting parasitemia.
Using age-specific algorithms increases the specificity of predicting P. falciparum parasitemia. Even in a population with a high prevalence of anemia, the predictive value of 'palmar pallor' allows significantly lower Hb levels to be detected. The importance of 'palmar pallor' should therefore be emphasized in the training of health workers. Owing to insufficient sensitivity, however, treatment decisions cannot be based either on the best algorithm or on 'palmar pallor' as a single clinical sign. They are therefore no substitute for presumptive drug treatment and parasitological confirmation.
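The comparison with IMCI in this study is stated in terms of sensitivity, specificity and positive predictive value. The sketch below computes these from a 2x2 table of made-up counts and uses Bayes' rule to show how the positive predictive value of one fixed, hypothetical rule changes between the two age-group prevalences reported in the abstract (13.8% and 30.6%). The function names, counts and accuracy figures are illustrative only and do not come from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 measures for a rule that predicts parasitemia."""
    return {
        "sensitivity": tp / (tp + fn),  # share of parasitemic children flagged
        "specificity": tn / (tn + fp),  # share of non-parasitemic children not flagged
        "ppv": tp / (tp + fp),          # share of flagged children who are parasitemic
        "npv": tn / (tn + fn),          # share of unflagged children who are not
    }


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


print(diagnostic_metrics(tp=30, fp=20, fn=10, tn=140))
# Same hypothetical rule (sensitivity = specificity = 0.8) at both prevalences:
for prevalence in (0.138, 0.306):
    print(prevalence, round(ppv_from_prevalence(0.8, 0.8, prevalence), 3))  # PPV about 0.39, then 0.64
```

The second loop illustrates one reason predictive values can differ between age groups even when a rule's sensitivity and specificity stay the same.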
|