21.
Les crises économiques et financières et les facteurs favorisant leur occurrence / Empirical varieties and leading contexts of economic and financial crises. Cabrol, Sébastien, 31 May 2013.
Cette étude vise à mettre en lumière les différences et similarités existant entre les principales crises économiques et financières ayant frappé un échantillon de 21 pays avancés depuis 1981. Nous analyserons plus particulièrement la crise des subprimes, que nous rapprocherons d'épisodes antérieurs. Nous étudierons à la fois les années du déclenchement des turbulences (analyse typologique) ainsi que celles les précédant (prévision). Cette analyse sera fondée sur l'utilisation de la méthode CART (Classification And Regression Trees). Cette technique non linéaire et non paramétrique permet de prendre en compte les effets de seuil et les interactions entre variables explicatives de façon à révéler plusieurs contextes distincts explicatifs d'un même événement. Dans le cadre d'un modèle de prévision, l'analyse des années précédant les crises nous indique que les variables à surveiller sont : la variation et la volatilité du cours de l'once d'or, le déficit du compte courant en pourcentage du PIB, la variation de l'openness ratio et enfin la variation et la volatilité du taux de change. Dans le cadre de l'analyse typologique, l'étude des différentes variétés de crise (année du déclenchement de la crise) nous permettra d'identifier deux principaux types de turbulence d'un point de vue empirique. En premier lieu, nous retiendrons les crises globales, caractérisées par un fort ralentissement ou une baisse de l'activité aux États-Unis et une faible croissance du PIB dans les pays touchés. D'autre part, nous mettrons en évidence des crises idiosyncratiques, propres à un pays donné et caractérisées par une inflation et une volatilité du taux de change élevées.

/ The aim of this thesis is to analyze, from an empirical point of view, both the different varieties of economic and financial crises (typological analysis) and the characteristics of the contexts that could be associated with a likely occurrence of such events. Consequently, we analyze both the years in which a crisis occurs and the years preceding such events (leading-context analysis, forecasting). This study contributes to the empirical literature by focusing exclusively on crises in advanced economies over the last 30 years, by considering several theoretical types of crises, and by taking into account a large number of both economic and financial explanatory variables. As part of this research, we also analyze stylized facts related to the 2007/2008 subprime turmoil and our ability to foresee crises from an epistemological perspective. Our empirical results are based on the use of binary classification trees through the CART (Classification And Regression Trees) methodology. This nonparametric and nonlinear statistical technique allows us to manage large data sets and is suitable for identifying threshold effects and complex interactions among variables. Furthermore, this methodology characterizes crises (or the contexts preceding a crisis) by several distinct sets of independent variables. Thus, we identify as leading indicators of economic and financial crises: the variation and volatility of both gold prices and nominal exchange rates, as well as the current account balance (as % of GDP) and the change in the openness ratio. Regarding the typological analysis, we identify two main empirical varieties of crises. First, we highlight "global type" crises, characterized by a slowdown in US economic activity (stressing the role and influence of the USA in global economic conditions) and low GDP growth in the countries affected by the turmoil. Second, we find that country-specific high levels of both inflation and exchange rate volatility can be considered evidence of "idiosyncratic type" crises.
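As a hedged illustration of the kind of model this abstract describes, the sketch below fits a binary classification tree with scikit-learn's DecisionTreeClassifier (a CART implementation) on synthetic country-year data. The feature names echo the leading indicators listed above, but the data, thresholds, and tree settings are invented for illustration and are not the author's estimates.

```python
# Minimal sketch of a CART-style binary classification tree for crisis years.
# Feature names follow the leading indicators cited in the abstract; the data
# are synthetic, and scikit-learn's DecisionTreeClassifier stands in for the
# CART implementation used in the thesis.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
features = [
    "gold_price_change", "gold_price_volatility",
    "current_account_pct_gdp", "openness_ratio_change",
    "fx_rate_change", "fx_rate_volatility",
]
X = rng.normal(size=(500, len(features)))      # country-year observations
y = ((X[:, 2] < -1.0) | (X[:, 5] > 1.2)).astype(int)   # toy threshold-based "crisis" label

# CART handles threshold effects and interactions without assuming linearity.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25, random_state=0)
tree.fit(X, y)

# Each root-to-leaf path is one distinct "context" associated with a crisis,
# which is how CART can reveal several explanatory contexts for the same event.
print(export_text(tree, feature_names=features))
```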
22.
A Study on Comparison Websites in the Airline Industry and Using CART Methods to Determine Key Parameters in Flight Search Conversion / En studie av jämförelsehemsidor i flygbranschen och tillämpningen av CART-metoder för att analysera nyckelparametrar i konvertering av flygsökningar. Hansén, Jacob; Gustafsson, Axel, January 2019.
This bachelor thesis in applied mathematics and industrial engineering and management aimed to identify relationships between search parameters in flight comparison search engines and the exit conversion rate, while also investigating how the emergence of such comparison search engines has impacted the airline industry. To identify such relationships, several classification models were employed in conjunction with different sampling methods to produce a predictive model in the program R. To investigate the impact of the emergence of comparison websites, Porter's 5 forces and a SWOT analysis were employed to analyze the findings of a literature study and a qualitative interview. The classification models developed performed poorly with regard to several assessment metrics, which suggested that there was little to no significant relationship between the search parameters investigated and the exit conversion rate. Porter's 5 forces and the SWOT analysis suggested that the airline industry has become more competitive and that airlines which do not manage to adapt to this changing market environment will experience decreasing profitability.

/ Detta kandidatexamensarbete inriktat på tillämpad matematik och industriell ekonomi syftade till att identifiera samband mellan sökparametrar från flygsökmotorer och konverteringsgraden för utträde till ett flygbolags hemsida, och samtidigt undersöka hur uppkomsten av flygsökmotorer har påverkat flygbranschen. För att identifiera sådana samband tillämpades flera klassificeringsmodeller tillsammans med stickprovsmetoder för att bygga en prediktiv modell i programmet R. För att undersöka påverkan av flygsökmotorer tillämpades Porters 5 krafter och SWOT-analys som teoretiska ramverk för att analysera information insamlad genom en litteraturstudie och en intervju. Klassificeringsmodellerna som byggdes presterade undermåligt med avseende på flera utvärderingsmått, vilket antydde att det fanns litet eller inget samband mellan de undersökta sökparametrarna och konverteringsgraden för utträde. Porters 5 krafter och SWOT-analysen visade att flygindustrin hade blivit mer konkurrensutsatt och att flygbolag som inte lyckas anpassa sig till en omgivning i förändring kommer att uppleva minskande lönsamhet.
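The thesis builds its models in R; the sketch below is a rough Python analogue of combining a sampling method with a CART-style classifier for a rare exit-conversion label and judging the fit with imbalance-aware metrics. The search parameters, data, and oversampling choice are assumptions for illustration only, not the authors' actual pipeline.

```python
# Rough sketch: oversample a rare "exit conversion" label, fit a CART-style
# classifier, and evaluate with metrics that remain informative under class
# imbalance. All features and data are invented stand-ins.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([
    rng.integers(0, 30, n),   # hypothetical: days between search and departure
    rng.integers(1, 5, n),    # hypothetical: number of passengers
    rng.integers(0, 2, n),    # hypothetical: round-trip flag
])
y = (rng.random(n) < 0.03).astype(int)   # exit conversions are rare

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Sampling step: oversample the minority class in the training split only.
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
boosted = resample(minority, replace=True, n_samples=len(majority), random_state=1)
idx = np.concatenate([majority, boosted])

# CART-style classifier on the rebalanced training data.
clf = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr[idx], y_tr[idx])
proba = clf.predict_proba(X_te)[:, 1]

# Accuracy alone is misleading at a 3% base rate, so report AUC and F1 instead.
print("AUC:", round(roc_auc_score(y_te, proba), 3))
print("F1 :", round(f1_score(y_te, (proba > 0.5).astype(int)), 3))
```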
23.
[pt] DESENVOLVIMENTO DE MODELOS PARA PREVISÃO DE QUALIDADE DE SISTEMAS DE RECONHECIMENTO DE VOZ / [en] DEVELOPMENT OF PREDICTION MODELS FOR THE QUALITY OF SPOKEN DIALOGUE SYSTEMS. BERNARDO LINS DE ALBUQUERQUE COMPAGNONI, 12 November 2021.
[pt] Spoken Dialogue Systems (SDSs) são sistemas baseados em computadores desenvolvidos para fornecer informações e realizar tarefas utilizando o diálogo como forma de interação. Eles são capazes de reconhecimento de voz, interpretação e gerenciamento de diálogo, e têm uma voz como saída de dados, tentando reproduzir uma interação natural falada entre um usuário humano e um sistema. SDSs proveem diferentes serviços, todos através de linguagem falada com um sistema. Mesmo com todo o desenvolvimento nesta área, há escassez de informações sobre como avaliar a qualidade de tais sistemas com o propósito de otimizá-los. Com dois destes sistemas, BoRIS e INSPIRE, usados para reservas de restaurantes e gerenciamento de casas inteligentes, diversos experimentos foram conduzidos no passado, nos quais tais sistemas foram utilizados para resolver tarefas específicas. Os participantes avaliaram a qualidade do sistema em uma série de questões. Além disso, todas as interações foram gravadas e anotadas por um especialista. O desenvolvimento de métodos para avaliação de performance é um tópico aberto de pesquisa na área de SDSs. Seguindo a ideia do modelo PARADISE (PARAdigm for DIalogue System Evaluation, desenvolvido por Walker e colaboradores na AT&T em 1998), diversos experimentos foram conduzidos para desenvolver modelos de previsão de performance de sistemas de reconhecimento de voz e linguagem falada. O objetivo desta dissertação de mestrado é desenvolver modelos que permitam a previsão de dimensões de qualidade percebidas por um usuário humano, com base em parâmetros instrumentalmente mensuráveis, utilizando dados coletados nos experimentos realizados com os sistemas BoRIS e INSPIRE, dois sistemas de reconhecimento de voz (o primeiro para busca de restaurantes e o segundo para Smart Homes). Diferentes algoritmos serão utilizados para análise (regressão linear, árvores de regressão, árvores de classificação e redes neurais) e, para cada um dos algoritmos, uma ferramenta diferente será programada em MATLAB, podendo servir de base para a análise de experimentos futuros e sendo facilmente modificável para novos sistemas e parâmetros em estudos subsequentes. A ideia principal é desenvolver ferramentas que possam ajudar na otimização de um SDS sem o envolvimento direto de um usuário humano ou servir de ferramenta para estudos futuros na área.

/ [en] Spoken Dialogue Systems (SDSs) are computer-based systems developed to provide information and carry out tasks using speech as the interaction mode. They are capable of speech recognition, interpretation and dialogue management, and have speech output capabilities, trying to reproduce a more or less natural spoken interaction between a human user and the system. SDSs provide several different services, all through spoken language. Even with all this development, there is a scarcity of information on ways to assess and evaluate the quality of such systems for the purpose of optimization. With two of these SDSs, BoRIS and INSPIRE (used for Restaurant Booking Services and Smart Home Systems), extensive experiments were conducted in the past, in which the systems were used to solve specific tasks. The evaluators rated the quality of the system on a multitude of scales. In addition, the interactions were recorded and annotated by an expert. The development of methods for performance evaluation is an open research issue in the SDS field. Following the idea of the PARADISE model (PARAdigm for DIalogue System Evaluation, the most well-known model for this purpose, developed by Walker and co-workers at AT&T in 1998), several experiments were conducted to develop predictive models of spoken dialogue performance. The objective of this dissertation is to develop and assess models which allow the prediction of quality dimensions as perceived by the human user, based on instrumentally measurable variables, using all the data collected from the BoRIS and INSPIRE systems. Different types of algorithms will be compared with respect to their prediction performance and to how generic they are. Four different approaches will be used for these analyses: linear regression, regression trees, classification trees and neural networks. For each of these methods, a different tool will be programmed in MATLAB that can carry out all the experiments from this work and be easily modified for new experiments with data from new systems or new variables in future studies. All MATLAB programs used will be made available on the attached CD, together with an operation manual for future users and a guide to modifying the existing programs to work with new data. The main idea is to develop tools that can help in the optimization of a spoken dialogue system without the direct involvement of a human user, or serve as tools for future studies in this area.
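A minimal sketch of the PARADISE-style prediction task described above: regress a user-perceived quality score on instrumentally measurable interaction parameters and compare a linear model with a regression tree. The thesis implements its tools in MATLAB; this Python version with synthetic interaction parameters (duration, word error rate, turn counts) is only meant to make the modelling setup concrete and is not the dissertation's code.

```python
# Sketch: predict a user quality judgment from instrumentally measurable
# interaction parameters, comparing linear regression with a regression tree.
# The data-generating process below is an invented stand-in.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([
    rng.uniform(30, 300, n),    # dialogue duration in seconds
    rng.uniform(0.0, 0.5, n),   # word error rate of the recognizer
    rng.integers(3, 25, n),     # number of system/user turns
    rng.integers(0, 5, n),      # number of confirmation requests
])
# Toy "perceived quality" on a 1-5 scale, driven mostly by recognition errors.
quality = np.clip(5 - 6 * X[:, 1] - 0.1 * X[:, 3] + rng.normal(0, 0.4, n), 1, 5)

for name, model in [("linear regression", LinearRegression()),
                    ("regression tree", DecisionTreeRegressor(max_depth=3))]:
    r2 = cross_val_score(model, X, quality, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```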
24.
Predicting Stock Price Direction for Asian Small Cap Stocks with Machine Learning Methods / Prediktering av Aktiekursriktningen för Asiatiska Småbolagsaktier med Maskininlärning. Abazari, Tina; Baghchesara, Sherwin, January 2021.
Portfolio managers have a great interest in detecting high-performing stocks early on. Detecting outperforming stocks has long been of interest from both a research and a financial point of view. Quantitative methods to predict stock movements have been widely studied in diverse contexts, some presenting promising results. The quantitative algorithms for such prediction models include, to name a few, support vector machines, tree-based methods, and regression models, each of which can carry different predictive power. Most previous research focuses on indices such as the S&P 500 or large-cap stocks, while small- and micro-cap stocks have been examined to a lesser extent. These types of stocks also commonly share the characteristic of high volatility, with prospects that can be difficult to assess. This study examines to what extent widely studied quantitative methods such as random forest, support vector machine, and logistic regression can produce accurate predictions of stock price direction on a quarterly and yearly basis. The problem is modeled as a binary classification task, where the aim is to predict whether a stock achieves a return above or below a benchmark index. The focus lies on Asian small- and micro-cap stocks. The study concludes that the random forest method for a binary yearly prediction produces the highest accuracy, 69.64%, and that all three models produce higher accuracy for yearly than for quarterly predictions. Although the statistical power of the models can be deemed adequate, more extensive studies are desirable to examine whether other models or variables can increase the prediction accuracy for small- and micro-cap stocks.

/ Portföljförvaltare har ett stort intresse av att upptäcka högpresterande aktier tidigt. Detektering av högavkastande aktier har länge varit av stort intresse dels i forskningssyfte, dels ur ett finansiellt perspektiv. Kvantitativa metoder för att förutsäga aktiekursens riktning har studerats i stor utsträckning, där vissa presenterar lovande resultat. De kvantitativa algoritmerna för sådana prediktionsmodeller kan vara, för att nämna ett fåtal, support vector machines, trädbaserade metoder och regressionsmodeller, där var och en kan bära olika prediktiv kraft. Majoriteten av tidigare studier fokuserar på index såsom S&P 500 eller storbolagsaktier, medan små- och mikrobolagsaktier har undersökts i mindre utsträckning. Dessa typer av aktier har ofta en hög volatilitet, med framtidsutsikter som kan vara svåra att bedöma. Denna studie undersöker i vilken utsträckning väletablerade kvantitativa modeller såsom random forest, support vector machine och logistisk regression kan ge korrekta förutsägelser av små- och mikrobolags aktiekursriktningar på kvartals- och årsbasis. I avhandlingen modelleras detta som ett binärt klassificeringsproblem, där avkastningen för varje aktie antingen är över eller under jämförelseindex. Fokuset ligger på asiatiska små- och mikrobolag. Studien drar slutsatsen att random forest för en binär årlig prediktion ger den högsta noggrannheten, 69,64 %, och att samtliga tre modeller ger högre noggrannhet än en binär kvartalsprediktion. Även om modellerna bedöms vara statistiskt säkerställda är det önskvärt med fler omfattande studier för att undersöka om andra modeller eller variabler kan öka noggrannheten i prediktionen för små- och mikrobolags aktiekursriktning.
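To make the binary set-up concrete, the sketch below labels each stock-year by whether its return beats a benchmark and fits a random forest, reporting out-of-sample accuracy. The features, benchmark, and data are synthetic stand-ins, not the thesis dataset or its 69.64% result.

```python
# Hedged sketch of the binary classification task: 1 if the stock's return
# beats the benchmark, 0 otherwise, predicted with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([
    rng.normal(size=n),   # hypothetical: earnings growth
    rng.normal(size=n),   # hypothetical: price momentum
    rng.normal(size=n),   # hypothetical: valuation ratio
])
stock_return = 0.4 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)
benchmark_return = 0.0
y = (stock_return > benchmark_return).astype(int)   # 1 = beats the benchmark

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)
rf = RandomForestClassifier(n_estimators=300, max_depth=5, random_state=3)
rf.fit(X_tr, y_tr)
print("Out-of-sample accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```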
25.
[en] ESSAYS IN ECONOMETRICS: ONLINE LEARNING IN HIGH-DIMENSIONAL CONTEXTS AND TREATMENT EFFECTS WITH COMPLEX AND UNKNOWN ASSIGNMENT RULES / [pt] ESTUDOS EM ECONOMETRIA: APRENDIZADO ONLINE EM AMBIENTES DE ALTA DIMENSÃO E EFEITOS DE TRATAMENTO COM REGRAS DE ALOCAÇÃO COMPLEXAS E DESCONHECIDAS. CLAUDIO CARDOSO FLORES, 04 October 2021.
[pt] Essa tese é composta por dois capítulos. O primeiro deles refere-se ao problema de aprendizado sequencial, útil em diversos campos de pesquisa e aplicações práticas. Exemplos incluem problemas de apreçamento dinâmico, desenhos de leilões e de incentivos, além de programas e tratamentos sequenciais. Neste capítulo, propomos a extensão de uma das mais populares regras de aprendizado, epsilon-greedy, para contextos de alta dimensão, levando em consideração uma diretriz conservadora. Em particular, nossa proposta consiste em alocar parte do tempo que a regra original utiliza na adoção de ações completamente novas em uma busca focada em um conjunto restrito de ações promissoras. A regra resultante pode ser útil para aplicações práticas nas quais existem restrições suaves à adoção de ações não usuais, mas que, eventualmente, valorizam surpresas positivas, ainda que a uma taxa decrescente. Como parte dos resultados, encontramos limites plausíveis, com alta probabilidade, para o remorso cumulativo da regra epsilon-greedy conservadora em alta dimensão. Também mostramos a existência de um limite inferior para a cardinalidade do conjunto de ações viáveis que implica um limite superior menor para o remorso da regra conservadora, comparativamente à sua versão não conservadora. Adicionalmente, usuários finais possuem suficiente flexibilidade para estabelecer o nível de segurança que desejam, uma vez que tal nível não impacta as propriedades teóricas da regra de aprendizado proposta. Ilustramos nossa proposta tanto por meio de simulação quanto por meio de um exercício utilizando a base de dados de um problema real de sistemas de classificação. Por sua vez, no segundo capítulo, investigamos efeitos de tratamento determinísticos quando a regra de alocação é complexa e desconhecida, talvez por razões éticas, ou para evitar manipulação ou competição desnecessária. Mais especificamente, com foco na metodologia de regressão descontínua sharp, superamos a falta de conhecimento dos pontos de corte na alocação de unidades pela implementação de uma floresta de árvores de classificação, que também utiliza aprendizado sequencial na sua construção, para garantir que, assintoticamente, as regras de alocação desconhecidas sejam identificadas corretamente. A estrutura de árvore também é útil nos casos em que a regra de alocação desconhecida é mais complexa que as tradicionais univariadas. Motivados por exemplos da vida prática, mostramos nesse capítulo que, com alta probabilidade e com base em premissas razoáveis, é possível estimar consistentemente os efeitos de tratamento sob esse cenário. Propomos ainda um algoritmo útil para usuários finais, que se mostrou robusto para diferentes especificações e que revela com relativa confiança a regra de alocação anteriormente desconhecida. Ainda, exemplificamos os benefícios da metodologia proposta pela sua aplicação em parte do P900, um programa governamental chileno de suporte a escolas, que se mostrou adequado ao cenário aqui estudado.

/ [en] Sequential learning problems are common in several fields of research and practical applications. Examples include dynamic pricing and assortment, the design of auctions and incentives, and a large number of sequential treatment experiments. In this essay, we extend one of the most popular learning solutions, the epsilon-greedy heuristic, to high-dimensional contexts under a conservative directive. We do this by allocating part of the time the original rule uses to adopt completely new actions to a more focused search within a restricted set of promising actions. The resulting rule might be useful for practical applications that still value surprises, although at a decreasing rate, while also placing restrictions on the adoption of unusual actions. With high probability, we find reasonable bounds for the cumulative regret of a conservative high-dimensional decaying epsilon-greedy rule. We also provide a lower bound for the cardinality of the set of viable actions that implies an improved regret bound for the conservative version when compared to its non-conservative counterpart. Additionally, we show that end-users have sufficient flexibility when establishing how much safety they want, since this level can be tuned without impacting theoretical properties. We illustrate our proposal both in a simulation exercise and using a real dataset. The second essay studies deterministic treatment effects when the assignment rule is both more complex than traditional ones and unknown to the public, perhaps, among many possible causes, for ethical reasons, to avoid data manipulation, or to prevent unnecessary competition. More specifically, sticking to the well-known sharp RDD methodology, we circumvent the lack of knowledge of the true cutoffs by employing a forest of classification trees which also uses sequential learning, as in the first essay, to guarantee that, asymptotically, the true unknown assignment rule is correctly identified. The tree structure also turns out to be suitable when the program's rule is more sophisticated than traditional univariate ones. Motivated by real-world examples, we show in this essay that, with high probability and based on reasonable assumptions, it is possible to consistently estimate treatment effects under this setup. For practical implementation we propose an algorithm that not only sheds light on the previously unknown assignment rule but is also capable of robustly estimating treatment effects across different specifications supplied by end-users. Moreover, we exemplify the benefits of our methodology by applying it to part of the Chilean P900 school assistance program, which proves to be well suited to our framework.
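A schematic toy version of the conservative epsilon-greedy idea sketched in the first essay: the exploration budget of a decaying epsilon-greedy rule is split between unrestricted draws over all arms and draws restricted to a small set of currently promising arms. The constants, the decay schedule, and the 70/30 split below are illustrative assumptions, not the dissertation's actual rule or its regret bounds.

```python
# Toy conservative epsilon-greedy: exploration is mostly confined to the
# current top-k arms, with occasional fully random draws over all arms.
import numpy as np

rng = np.random.default_rng(4)
n_arms, horizon, top_k, conservative_share = 50, 5000, 5, 0.7
true_means = rng.normal(size=n_arms)

counts = np.zeros(n_arms)
estimates = np.zeros(n_arms)

for t in range(1, horizon + 1):
    eps = min(1.0, 10.0 / t)              # decaying exploration rate
    if rng.random() < eps:
        if rng.random() < conservative_share:
            # Conservative exploration: only among the current top-k arms.
            candidates = np.argsort(estimates)[-top_k:]
            arm = int(rng.choice(candidates))
        else:
            # Unrestricted exploration: any arm may be tried.
            arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(estimates))   # exploitation

    reward = true_means[arm] + rng.normal()
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # running mean

print("best true arm:", int(np.argmax(true_means)),
      "| most played arm:", int(np.argmax(counts)))
```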