351

Deep Scenario Generation of Financial Markets / Djup scenario generering av finansiella marknader

Carlsson, Filip, Lindgren, Philip January 2020 (has links)
The goal of this thesis is to explore a new clustering algorithm, VAE-clustering, and to examine whether it can be applied to find differences in the distribution of stock returns, augment the distribution of a current portfolio of stocks, and see how that portfolio performs under different market conditions. VAE-clustering is a newly introduced method that has not been widely tested, especially not on time series. The first step is therefore to see whether, and how well, the clustering works. We first apply the algorithm to a dataset of monthly time series of power demand in Italy; the purpose of this part is to assess how well the method works technically. Once the model works well and produces proper results on the Italian power-demand data, we apply it to stock-return data. In the latter application we are unable to find meaningful clusters and are therefore unable to move toward the goal of the thesis. The results show that the VAE-clustering method is applicable to time series: power demand differs clearly from season to season, and the model successfully identifies those differences. For the financial data we hoped that the model would be able to find different market regimes based on time periods, but it is not able to distinguish different time periods from each other. We therefore conclude that VAE-clustering is applicable to time-series data, but that the structure and setting of the financial data in this thesis make it too hard to find meaningful clusters. The major finding is that the VAE-clustering method can be applied to time series. We strongly encourage further research into whether the method can be used successfully on financial data in settings other than those tested in this thesis.
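As a rough illustration of the pipeline described above (a sketch, not the authors' code), the snippet below fits a small variational autoencoder to fixed-length return windows and clusters the latent means with k-means; the window length, layer sizes and number of clusters are assumptions made for the example.

```python
# Minimal sketch (not the thesis code): VAE on fixed-length time-series windows,
# followed by k-means on the latent means. Window length, architecture sizes and
# the number of clusters are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

WINDOW = 30   # assumed length of each return window
LATENT = 4    # assumed latent dimension

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(WINDOW, 64), nn.ReLU())
        self.mu = nn.Linear(64, LATENT)
        self.logvar = nn.Linear(64, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, WINDOW))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def train(model, windows, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x = torch.tensor(windows, dtype=torch.float32)
    for _ in range(epochs):
        recon, mu, logvar = model(x)
        recon_loss = ((recon - x) ** 2).sum(dim=1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        loss = recon_loss + kl
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# Usage with synthetic data: 200 windows of 30 "returns" each, clustered into 3 assumed regimes.
windows = np.random.randn(200, WINDOW).astype(np.float32)
model = train(VAE(), windows)
with torch.no_grad():
    _, mu, _ = model(torch.tensor(windows))
labels = KMeans(n_clusters=3, n_init=10).fit_predict(mu.numpy())
```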
352

Monitoring Vehicle Suspension Elements Using Machine Learning Techniques / Tillståndsövervakning av komponenter i fordonsfjädringssystem genom maskininlärningstekniker

Karlsson, Henrik January 2019 (has links)
Condition monitoring (CM) is widely used in industry, and there is a growing interest in applying CM to rail vehicle systems. Condition-based maintenance has the potential to increase system safety and availability while at the same time reducing total maintenance costs. This thesis investigates the feasibility of condition monitoring of suspension elements, in this case dampers, in rail vehicles. Different methods are used to detect degradations, ranging from mathematical modelling of the system to purely "knowledge-based" methods that use only large amounts of data to detect patterns on a larger scale. This thesis explores the latter approach: acceleration signals are evaluated at several places on the axleboxes, bogie frames and carbody of a rail vehicle simulation model. These signals are picked close to the dampers that are monitored in this study, and frequency response functions (FRFs) are computed between axleboxes and bogie frames as well as between bogie frames and carbody. The idea is that the FRFs change as the condition of the dampers changes and can thus act as fault indicators. The FRFs are then fed to different classification algorithms that are trained and tested to distinguish between the different damper faults. The thesis further investigates which classification algorithm shows promising results for the problem and which performs best in terms of classification accuracy as well as two other measures. Another aspect explored is the possibility of applying dimensionality reduction to the extracted indicators (features). The thesis also looks into how the three performance measures are affected by typical variations in operational conditions for a rail vehicle, such as varying excitation and carbody mass. The Linear Support Vector Machine classifier using the whole feature space, and the Linear Discriminant Analysis classifier combined with Principal Component Analysis dimensionality reduction of the feature space, both show promising results for the task of correctly classifying upcoming damper degradations.
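The sketch below illustrates the general idea of FRF-based fault indicators under assumed data: an H1 frequency-response estimate between two acceleration channels, whose magnitudes feed the two classifiers highlighted above (a linear SVM on the full feature space, and LDA after PCA). The sampling frequency, window length and labels are placeholders, not values from the thesis.

```python
# Minimal sketch (assumed signals and labels, not the thesis data): estimate a
# frequency response function between two acceleration signals and classify
# damper condition from FRF magnitudes.
import numpy as np
from scipy.signal import csd, welch
from sklearn.svm import LinearSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

FS = 200  # assumed sampling frequency [Hz]

def frf_features(acc_in, acc_out, nperseg=256):
    """H1 estimate: cross-spectrum divided by input auto-spectrum, magnitude only."""
    _, Pxy = csd(acc_in, acc_out, fs=FS, nperseg=nperseg)
    _, Pxx = welch(acc_in, fs=FS, nperseg=nperseg)
    return np.abs(Pxy / Pxx)

# Synthetic example: 100 runs, each a pair of axlebox/bogie-frame accelerations,
# with an integer label encoding an (assumed) damper fault class.
rng = np.random.default_rng(0)
X = np.array([frf_features(rng.standard_normal(4096), rng.standard_normal(4096))
              for _ in range(100)])
y = rng.integers(0, 3, size=100)

svm = LinearSVC(max_iter=5000).fit(X, y)                     # full feature space
lda = make_pipeline(PCA(n_components=10),
                    LinearDiscriminantAnalysis()).fit(X, y)  # reduced feature space
```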
353

Towards Scalable Machine Learning with Privacy Protection

Fay, Dominik January 2023 (has links)
The increasing size and complexity of datasets have accelerated the development of machine learning models and exposed the need for more scalable solutions. This thesis explores challenges associated with large-scale machine learning under data privacy constraints. With the growth of machine learning models, traditional privacy methods such as data anonymization are becoming insufficient, so we delve into alternative approaches such as differential privacy. Our research addresses the following core areas in the context of scalable privacy-preserving machine learning. First, we examine the implications of data dimensionality on privacy for the application of medical image analysis. We extend the classification algorithm Private Aggregation of Teacher Ensembles (PATE) to deal with high-dimensional labels, and demonstrate that dimensionality reduction can be used to improve privacy. Second, we consider the impact of hyperparameter selection on privacy. Here, we propose a novel adaptive technique for hyperparameter selection in differentially private gradient-based optimization. Third, we investigate sampling-based solutions for scaling differentially private machine learning to datasets with a large number of records. We study the privacy-enhancing properties of importance sampling, highlighting that it can outperform uniform sub-sampling not only in terms of sample efficiency but also in terms of privacy. The three techniques developed in this thesis improve the scalability of machine learning while ensuring robust privacy protection, and aim to offer solutions for the effective and safe application of machine learning to large datasets.
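As context for the gradient-based methods mentioned above, the snippet below sketches a single clip-and-noise differentially private gradient step in numpy; the clipping norm, noise multiplier and learning rate are illustrative assumptions rather than thesis settings.

```python
# Minimal numpy sketch of one differentially private gradient step (clip-and-noise),
# the basic mechanism behind differentially private gradient-based optimization.
# Clipping norm, noise multiplier and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    # Clip each example's gradient to bound its influence (sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Average and add Gaussian noise calibrated to the clipping norm.
    noisy_mean = clipped.mean(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm / len(per_example_grads),
        size=per_example_grads.shape[1])
    return -lr * noisy_mean  # parameter update

# Usage: 256 per-example gradients of a 10-dimensional model.
update = dp_gradient_step(np.random.randn(256, 10))
```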
354

Heart- and Sapwood Segmentation on Hyperspectral Images using Deep Learning

Hallin, Samuel, Samnegård, Simon January 2023 (has links)
For manufacturers in the wood industry, an important way to make production more efficient is to automate the detection of defects and other attributes on boards. One important attribute on most boards is heartwood and sapwood. This thesis project was conducted at the company MiCROTEC and investigates methods to classify heartwood and sapwood on boards. The dataset used in this project consisted of oak boards. In order to increase the amount of information retrieved from the boards, hyperspectral imaging was used instead of conventional RGB cameras. Based on this data, deep learning models with U-Net and U-within-U-Net architectures, together with different spectral dimensionality reduction methods, were developed to segment boards into heartwood and sapwood. The performance of these deep learning models was compared to PLS-DA and SVM; PLS-DA is already used at MiCROTEC and served as the baseline model in this work. The results showed that a deep learning approach could increase the F1-score from 0.730 for the baseline PLS-DA classifier to 0.918, and that the different spectral reduction methods had only a small impact on the result. The increase in F1-score was mainly due to an increase in precision, since PLS-DA had a recall similar to that of the deep learning models.
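For readers unfamiliar with the baseline, the sketch below shows one common way to set up a PLS-DA pixel classifier (PLS regression onto one-hot targets, argmax over the predictions); the band count, number of components and synthetic data are assumptions, and this is not MiCROTEC's implementation.

```python
# Minimal sketch of a PLS-DA pixel baseline: PLS regression onto one-hot class
# targets, argmax for the predicted class. Band count, component count and the
# synthetic data are assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

BANDS, COMPONENTS = 120, 10          # assumed hyperspectral band count / PLS components

rng = np.random.default_rng(1)
X = rng.random((5000, BANDS))        # pixels x spectral bands
y = rng.integers(0, 2, size=5000)    # 0 = sapwood, 1 = heartwood (assumed encoding)
Y_onehot = np.eye(2)[y]

pls = PLSRegression(n_components=COMPONENTS).fit(X, Y_onehot)
pred = pls.predict(X).argmax(axis=1)  # class with the largest regressed score
```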
355

Estimating Poolability of Transport Demand Using Shipment Encoding : Designing and building a tool that estimates different poolability types of shipment groups using dimensionality reduction. / Uppskattning av Poolbarhet av Transportefterfrågan med Försändelsekodning : Designa och bygga ett verktyg som uppskattar olika typer av poolbarhetstyper av försändelsegrupper med hjälp av dimensionsreduktion och mätvärden för att mäta poolbarhetsegenskaper.

Kërçini, Marvin January 2023 (has links)
Dedicating fewer transport resources by grouping goods to be shipped together, or pooling as we name it, plays a crucial role in saving costs in transport networks. Nonetheless, it is not easy to estimate pooling among different groups of shipments, or to understand why these groups are poolable. The typical solution would be to consider all shipments of both groups as one and use Vehicle Routing Problem (VRP) software to estimate the costs of the new combined group. However, this brings drawbacks such as high computational cost and no explainability of the pooling. In this work we build a tool that estimates the different types of pooling using demand data. The solution maps shipment data to a lower-dimensional space in which each poolability trait corresponds to a latent dimension. We tested different dimensionality reduction techniques and found that the best performing are autoencoder models based on neural networks. Nevertheless, comparing shipments in the latent space turns out to be more challenging than expected, because distances in these latent dimensions are sometimes uncorrelated with distances in the real shipment features. Although this limits the use cases of the approach, we still manage to build the full poolability tool, which incorporates the autoencoders and uses metrics we designed to measure each poolability trait. The tool is then compared to VRP software and proves to have comparable accuracy while being much faster and more explainable.
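A minimal sketch of the latent-space idea, under assumed shipment features: train a small autoencoder on shipment vectors and compare groups by distances between their latent representations. The feature count, latent size and the crude group-distance proxy are illustrative, not the thesis metrics.

```python
# Minimal sketch (not the thesis tool): encode shipment feature vectors with a small
# autoencoder and compare two groups by their distance in the latent space.
import torch
import torch.nn as nn

N_FEATURES, LATENT = 12, 3  # assumed shipment feature count and latent size

class ShipmentAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(), nn.Linear(32, LATENT))
        self.decoder = nn.Sequential(nn.Linear(LATENT, 32), nn.ReLU(), nn.Linear(32, N_FEATURES))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ShipmentAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
shipments = torch.rand(500, N_FEATURES)          # stand-in for normalized shipment data
for _ in range(200):                             # plain reconstruction training
    recon, _ = model(shipments)
    loss = nn.functional.mse_loss(recon, shipments)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, z = model(shipments)
group_a, group_b = z[:250].mean(dim=0), z[250:].mean(dim=0)
latent_gap = torch.norm(group_a - group_b)       # crude proxy for group (dis)similarity
```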
356

Accelerated Discovery of Multi-Principal Element Alloys and Wide Bandgap Semiconductors under Extreme Conditions

Saswat Mishra (19185079) 22 July 2024 (has links)
Advancements in material science are accelerating technological evolution, driven by initiatives like the Materials Genome Project, which integrates computational and experimental strategies to expedite material discovery. In this work, we focus on the reliability of advanced materials under extreme conditions, a critical area for enhancing their technological applications.

Multi-principal element alloys (MPEAs) exhibit remarkable properties under extreme conditions. However, their vast compositional space makes a brute-force exploration of potential alloys prohibitive. We address this challenge by employing a Bayesian approach to explore the oxidation resistance of hundreds of alloys, applying computational techniques to accurately calculate and quantify errors in the melting temperatures of MPEAs, and investigating the compositional biases and short-range order in their nucleation behaviors.

Furthermore, we scrutinize the role of wide bandgap semiconductors, which are essential in high-power applications due to their superior breakdown voltage, drift velocity, and sheet charge density. The lack of lattice-matched substrates often results in strained films, which enhances piezoelectric effects crucial for device reliability. Our research advances the prediction of piezoelectric and dielectric responses as influenced by biaxial strain and doping in gallium nitride (GaN). Additionally, we delve into how various common defects affect the formation of trap states, significantly impacting the electronic properties of these materials. These studies offer significant advancements in understanding MPEAs and wide bandgap semiconductors under extreme conditions. We also provide foundational insights for developing robust and efficient materials essential for next-generation applications.
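The abstract mentions a Bayesian approach for exploring a vast compositional space; the toy loop below shows what such an exploration can look like in principle (a Gaussian-process surrogate plus an upper-confidence-bound acquisition rule). The three-element composition space and the toy objective are assumptions and do not reflect the thesis workflow.

```python
# Highly simplified sketch of a Bayesian exploration loop (not the thesis workflow):
# a Gaussian-process surrogate over alloy compositions and a UCB rule to pick the
# next candidate. The 3-element composition space and toy objective are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def toy_objective(x):                        # stand-in for a costly property evaluation
    return -np.sum((x - np.array([0.5, 0.3, 0.2])) ** 2)

def random_compositions(n, rng):
    w = rng.random((n, 3))
    return w / w.sum(axis=1, keepdims=True)  # element fractions summing to 1

rng = np.random.default_rng(0)
X = random_compositions(5, rng)              # initial evaluated compositions
y = np.array([toy_objective(x) for x in X])

for _ in range(20):                          # acquisition loop
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    candidates = random_compositions(500, rng)
    mean, std = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(mean + 1.0 * std)]   # upper-confidence-bound pick
    X = np.vstack([X, nxt])
    y = np.append(y, toy_objective(nxt))
```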
357

Multi-defect detection in hardwood using AI on hyperspectral images

Ytterberg, Kalle January 2024 (has links)
With the evolution of GPU performance, interest in using AI for all kinds of purposes has risen. Companies today put a great amount of resources into finding new ways of using AI to increase the value of their products or to automate processes. One area of the wood industry where AI is widely used and studied is defect detection. In this thesis, the combination of AI and hyperspectral images is studied and evaluated for segmenting defects in hardwood with a U-Net network structure. The performance is compared to another method commonly used for high-dimensional data: PLS-DA. The thesis also compares the use of RGB image data in combination with AI, to further analyze the usefulness that the hyperspectral data provide. The results showed signs of improvement when using hyperspectral images compared to RGB images for detecting blue stain and red heartwood defects. Detection of rot and knot defects, however, showed no sign of improvement. Since the annotations are more accurate in the RGB data, the results from the networks fed with hyperspectral data suggest that blue stain and red heartwood could be of interest for further investigation. Computational performance is shown to vary across the different reduction methods, and the results from this thesis provide some insight that might aid in choosing an appropriate reduction method.
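One of the reduction methods alluded to above can be as simple as per-pixel PCA over the spectral bands before the segmentation network; the sketch below shows that step under assumed cube dimensions.

```python
# Minimal sketch of a spectral reduction step: compress each pixel's band spectrum
# to a few PCA components before feeding a segmentation network. Image size, band
# count and number of components are assumptions.
import numpy as np
from sklearn.decomposition import PCA

H, W, BANDS, N_COMPONENTS = 64, 64, 120, 3   # assumed cube shape and target depth

cube = np.random.rand(H, W, BANDS)           # stand-in for a hyperspectral image
pixels = cube.reshape(-1, BANDS)
reduced = PCA(n_components=N_COMPONENTS).fit_transform(pixels)
reduced_cube = reduced.reshape(H, W, N_COMPONENTS)  # input to a U-Net-style model
```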
358

PHYSICS INFORMED MACHINE LEARNING METHODS FOR UNCERTAINTY QUANTIFICATION

Sharmila Karumuri (14226875) 17 May 2024 (has links)
The need to carry out uncertainty quantification (UQ) is ubiquitous in science and engineering. However, carrying out UQ for real-world problems is not straightforward and requires a large computational budget and resources. The objective of this thesis is to develop computationally efficient approaches based on machine learning to carry out UQ. Specifically, we address two problems.

The first problem is that it is difficult to carry out uncertainty propagation (UP) in systems governed by elliptic PDEs with spatially varying uncertain fields in the coefficients and boundary conditions. Because the uncertainties are functional, the number of uncertain parameters is large. In these situations, UP requires solving the PDE a large number of times to obtain convergent statistics of the quantity governed by the PDE, and solving the PDE repeatedly with a numerical solver leads to a computational burden. To address this, we propose to learn a surrogate of the PDE solution in a data-free manner by utilizing the physics available in the form of the PDE. We represent the solution of the PDE as a deep neural network parameterized in space and in the uncertain parameters, and introduce a physics-informed loss function derived from variational principles to learn the parameters of the network. The accuracy of the learned surrogate is validated against the corresponding ground-truth estimate from the numerical solver. We demonstrate the merit of this approach by solving UP problems and inverse problems faster than with a standard numerical solver.

The second problem addressed in this thesis relates to inverse problems. The state-of-the-art approach to solving inverse problems is to pose the inverse problem as a Bayesian inference task and estimate the distribution of the input parameters conditioned on the observed data (the posterior). Markov chain Monte Carlo (MCMC) methods and variational inference methods provide ways to estimate the posterior. However, these inference techniques must be re-run whenever a new set of observed data is given, which leads to a computational burden. To address this, we propose to learn a Bayesian inverse map, i.e., the map from the observed data to the posterior. This map enables on-the-fly inference. We demonstrate the approach on various examples and validate the posteriors learned from our approach against the corresponding ground-truth posteriors from the MCMC method.
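A minimal sketch of a physics-informed surrogate in the spirit described above, for a 1-D Poisson problem with an uncertain source term; note that it uses a strong-form residual penalty rather than the variational-principle loss the thesis derives, and the architecture and training settings are assumptions.

```python
# Minimal sketch (not the thesis implementation): a network u(x, a) trained on the
# 1-D Poisson problem -u'' = a with u(0) = u(1) = 0, where a plays the role of an
# uncertain parameter. Uses a strong-form residual loss; settings are assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def u(x, a):
    # Multiply by x(1 - x) so the boundary conditions hold by construction.
    return x * (1.0 - x) * net(torch.cat([x, a], dim=1))

for _ in range(500):
    x = torch.rand(256, 1, requires_grad=True)     # collocation points in space
    a = torch.rand(256, 1) * 2.0                   # samples of the uncertain source term
    out = u(x, a)
    du = torch.autograd.grad(out, x, torch.ones_like(out), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    loss = ((-d2u - a) ** 2).mean()                # PDE residual of -u'' = a
    opt.zero_grad(); loss.backward(); opt.step()

# After training, UQ statistics of u at any x can be estimated by sampling a and
# evaluating the surrogate instead of re-running a numerical solver.
```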
359

Visualisierung und Analyse multivariater Daten in der gartenbaulichen Beratung - Methodik, Einsatz und Vergleich datenanalytischer Verfahren / Visualization and Analysis of Multivariate Data in Horticultural Consultancy - Methodology, Application and Comparison of Data-Analytical Methods

Krusche, Stefan 16 December 1999 (has links)
In order to interpret large data sets in the context of consultancy and extension in horticulture, this thesis attempts to find ways to visually explore multivariate horticultural data, both to obtain a concise graphical summary of the information available in the data and to develop possibilities for interactively analysing survey data. The work is an exercise in exploratory data analysis: it analyses data without specific model assumptions, is predominantly descriptive, proceeds step by step in a highly interactive setting, and makes full use of all kinds of graphical displays. The methods used comprise various dimensionality reduction techniques (principal components analysis, correspondence analysis, multidimensional scaling), biplots, the multivariate analysis of grouped data (Procrustes rotation and groupwise principal components), graphical models, CART, and line diagrams of formal concept analysis. In addition, further graphical methods are used, e.g. trellis displays. Data from an on-site investigation of the production process of Cyclamen in 20 nurseries and from the microeconomic indicators of 297 growers in Germany (so-called Kennzahlen) for the years 1992 to 1994 are used to demonstrate the analytical capabilities of these methods. The data are a perfect example of imperfect data and thus represent the majority of the data sets that horticultural consultancy has to work with. It becomes clear that, despite the variety of results that help to enhance the understanding of the data at hand, both the complexity of the processes observed and the low data quality make it fairly difficult to arrive at clear-cut conclusions. The most helpful tools in the graphical data analysis are biplots, hierarchical line diagrams and trellis displays. Finding an empirical grouping of objects is best solved by classification and regression trees, which provide both the data segmentation and an intuitively appealing visualisation and explanation of the derived groups. Discrete graphical models are well suited to better understanding multivariate relationships. Procedures for a number of the methods that cannot be found in general statistics packages are provided in the form of Genstat code.
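As a small illustration of one display type discussed above, the snippet below draws a principal-component biplot (observations as points, variables as arrows) from synthetic data; the thesis itself works with Genstat rather than Python, and the variable names are placeholders.

```python
# Minimal sketch of a principal-component biplot: observations as points, variables
# as loading arrows. Data and variable names are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                     # stand-in for survey variables
Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize before PCA

pca = PCA(n_components=2).fit(Xs)
scores = pca.transform(Xs)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

plt.scatter(scores[:, 0], scores[:, 1], s=12)
for j, name in enumerate(["var1", "var2", "var3", "var4"]):
    plt.arrow(0, 0, loadings[j, 0] * 2, loadings[j, 1] * 2, head_width=0.05)
    plt.annotate(name, loadings[j, :2] * 2.2)
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()
```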
360

Efficient estimation using the characteristic function : theory and applications with high frequency data

Kotchoni, Rachidi 05 1900 (has links)
The attached file is created with Scientific Workplace Latex.

In estimating the integrated volatility of financial assets using noisy high-frequency data, the time-series properties assumed for the microstructure noise determine the proper choice of volatility estimator. In the first chapter of this thesis, we propose a new model for the microstructure noise with three important features. First, the noise is assumed to be L-dependent. Second, the memory lag L is allowed to increase with the sampling frequency. Third, the noise may include an endogenous part, that is, a piece that is correlated with the latent returns. The main difference between this microstructure model and existing ones is that it implies a first-order autocorrelation that converges to 1 as the sampling frequency goes to infinity. We use this semi-parametric model to derive a new shrinkage estimator of the integrated volatility. The proposed estimator makes an optimal signal-to-noise trade-off by combining a consistent estimator with an inconsistent one. Simulation results show that the shrinkage estimator behaves better than the best of the two estimators it combines. We also propose estimators for the parameters of the noise model. An empirical study based on stocks listed in the Dow Jones Industrials shows the relevance of accounting for possible time dependence in the noise process.

Chapters 2, 3 and 4 pertain to the generalized method of moments based on the characteristic function. The likelihood functions of many financial econometrics models are not known in closed form; this is the case, for example, for the stable distribution and for discretely observed continuous-time models. In these cases, one may estimate the parameter of interest by specifying a moment condition based on the difference between the theoretical (conditional) characteristic function and its empirical counterpart. The challenge is then to exploit the whole continuum of moment conditions thus defined to achieve maximum-likelihood efficiency. This problem was solved by Carrasco and Florens (2000), who propose the CGMM (continuum GMM) procedure. The objective function of the CGMM is a quadratic form on the Hilbert space defined by the moment function, and it depends on a Tikhonov-type regularized inverse of the covariance operator associated with the moment function. Carrasco and Florens (2000) show that the estimator obtained by minimizing this objective function is asymptotically as efficient as the maximum likelihood estimator provided the regularization parameter (α) converges to zero as the sample size goes to infinity. However, the nature of this objective function raises two important questions: first, how do we select α in practice, and second, how do we implement the CGMM when the multiplicity (d) of the integrals embedded in the objective function is large? These questions are tackled in the last three chapters of the thesis.

In Chapter 2, we propose to choose α by minimizing the approximate mean squared error (MSE) of the estimator. Following an approach similar to Newey and Smith (2004), we derive a higher-order expansion of the estimator from which we characterize the finite-sample dependence of the MSE on α. We provide two data-driven methods for selecting the regularization parameter in practice: the first relies on the higher-order expansion of the MSE, whereas the second uses only simulations. We show that the simulation technique delivers a consistent estimator of the optimal α. Our Monte Carlo simulations confirm the importance of selecting α optimally.

The goal of Chapter 3 is to illustrate how to efficiently implement the CGMM for d ≤ 2. We first review the consistency and asymptotic normality properties of the CGMM estimator, then suggest numerical recipes for its implementation, and finally carry out a simulation study with the stable distribution that confirms the accuracy of the CGMM as an inference method. An empirical application based on the autoregressive variance Gamma model leads to a well-known conclusion: investors require a positive premium for bearing the expected risk, while a negative premium is attached to the unexpected risk.

In implementing the characteristic-function-based CGMM, a major difficulty lies in the evaluation of the multiple integrals embedded in the objective function. Numerical quadratures are among the most accurate methods that can be used in this context, but the number of quadrature points grows exponentially with d, so accurate implementation of the CGMM becomes practically unfeasible for multivariate or non-Markovian models with d ≥ 3. In Chapter 4, we propose a strategy that consists in creating univariate samples by taking linear combinations of the elements of the original vector process, with the weights of the linear combinations drawn from a normalized subset of ℝ^{d}. Each univariate index generated in this way is called a frequency-domain bootstrap sample and can be used to compute an estimator of the parameter of interest. All the estimators obtained in this fashion are then aggregated to obtain the final estimator; the optimal aggregation rule is discussed in the paper. The overall method is illustrated by a simulation study and an empirical application based on autoregressive Gamma models.

This thesis makes extensive use of the bootstrap, a technique by which the statistical properties of an unknown distribution can be estimated from an estimate of that distribution. Our simulations and empirical results can therefore, in principle, be improved by using state-of-the-art refinements of the bootstrap methodology.
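As a schematic reminder of the two objects described above (the notation is illustrative, not the thesis notation): the shrinkage estimator is a variance-minimizing combination of a consistent and an inconsistent integrated-volatility estimator, and the CGMM minimizes a Tikhonov-regularized quadratic form in the characteristic-function moment.

```latex
% Schematic forms only; symbols are illustrative and not the thesis notation.
% Shrinkage estimator: variance-minimizing combination of a consistent estimator
% IV_1 and an inconsistent one IV_2 of the integrated volatility.
\widehat{IV}_{\mathrm{shrink}} \;=\; w^{*}\,\widehat{IV}_{1} + (1-w^{*})\,\widehat{IV}_{2},
\qquad
w^{*} \;=\; \arg\min_{w}\; \operatorname{Var}\!\left( w\,\widehat{IV}_{1} + (1-w)\,\widehat{IV}_{2} \right).

% CGMM: quadratic form in the empirical characteristic-function moment h_n, with
% K_alpha a Tikhonov-regularized covariance operator (alpha -> 0 with the sample size).
\hat{\theta}_{\mathrm{CGMM}} \;=\; \arg\min_{\theta}\;
\left\langle K_{\alpha}^{-1}\, \hat{h}_{n}(\cdot,\theta),\; \hat{h}_{n}(\cdot,\theta) \right\rangle,
\qquad
\hat{h}_{n}(t,\theta) \;=\; \frac{1}{n}\sum_{j=1}^{n}\left( e^{\,\mathrm{i}\, t^{\prime} x_{j}} - \varphi_{\theta}(t) \right).
```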
