About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Robust estimation of mean electricity consumption curves by sampling for small areas in the presence of missing values

De Moliner, Anne, 05 December 2017
In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population, for the entire population as well as for small areas, and of estimating mean curves in the presence of partially missing trajectories. Many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each of the 35 million French residential and professional customers, so these mean curves are estimated from sampled panels. We extend the work of Lardin (2012) on mean curve estimation by sampling, focusing on specific aspects of this problem: robustness to influential units, small area estimation, and estimation in the presence of partially or totally unobserved curves.

In order to build robust estimators of mean curves, we adapt the unified approach to robust estimation in finite populations based on conditional bias, proposed by Beaumont et al. (2013), to the context of functional data. To that end we propose and compare three approaches on real datasets: application of the usual methods for real-valued variables to discretised curves, projection onto finite-dimensional bases (wavelets or the principal components of functional spherical PCA in particular), and functional truncation of conditional biases based on the notion of the depth of a curve within a functional dataset. Explicit and bootstrap estimators of the instantaneous mean squared error are also proposed.

Secondly, we address small area estimation for functional means or totals. We introduce three methods: unit-level linear mixed models applied to the scores of functional principal components analysis or to wavelet coefficients, functional regression, and aggregation of individual curve predictions obtained with regression trees or random forests for a functional target variable. Robust versions of these estimators are then derived by applying the conditional-bias-based robust estimation approach presented above.

Finally, we propose four estimators of mean curves in the presence of partially or totally unobserved trajectories. The first is a reweighting estimator whose weights are determined by nonparametric temporal kernel smoothing adapted to the context of finite populations and nonresponse; the others rely on imputation. The missing parts of the curves are determined either with the smoothing estimator just mentioned, by nearest-neighbour imputation adapted to functional data, or by a variant of linear interpolation that takes into account the mean behaviour of all sampled units. Variance approximations are proposed for each method, and all the estimators are compared on real datasets under various missing data scenarios.
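To make the conditional-bias idea concrete, here is a minimal sketch of the first of the three approaches (pointwise application of the usual robust estimator to discretised curves), assuming simple random sampling without replacement and synthetic data in place of real consumption curves; the exact form of the conditional bias and of the robust correction used in the thesis may differ.

```python
import numpy as np

def robust_mean_curve(Y, N):
    """Robust estimate of the finite-population mean curve.

    Pointwise adaptation of the conditional-bias approach of
    Beaumont et al. (2013) to discretised curves, assuming simple
    random sampling without replacement (SRSWOR).

    Y : (n, T) array of sampled consumption curves
    N : population size
    """
    n = Y.shape[0]
    y_bar = Y.mean(axis=0)  # Horvitz-Thompson mean curve under SRSWOR
    # Estimated conditional bias of each sampled unit, at each instant
    b = (1.0 / (N - 1)) * (N / n - 1.0) * (Y - y_bar)
    # Robust correction: subtract the midrange of conditional biases, pointwise
    delta = 0.5 * (b.min(axis=0) + b.max(axis=0))
    return y_bar - delta

# Toy usage: 200 sampled curves over 48 half-hourly instants, population 10000,
# with a handful of influential units (very large consumers)
rng = np.random.default_rng(0)
Y = rng.gamma(2.0, 1.0, size=(200, 48))
Y[:5] *= 20.0  # influential units
print(robust_mean_curve(Y, N=10_000)[:5])
```

The correction shifts the estimate by the midrange of the estimated conditional biases at each time instant, so only the most influential units affect the adjustment.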
62

Statistical methods for the lifespan modeling of electrical engineering components

Salameh, Farah, 07 November 2016
Reliability has become an important issue as critical industries such as aeronautics, space and nuclear move towards the design of more-electric systems. The objective is to understand, model and predict the aging mechanisms that can lead to component and system failure. Studying the effects of operational stresses on component degradation is essential for predicting their lifetime. Numerous lifespan models have been developed in the electrical engineering literature. However, these models have limitations: they depend on the studied material and its physical properties, they are often restricted to one or two stress factors, and they do not capture the interactions that may exist between these factors. This thesis presents a new methodology for the lifespan modeling of electrical engineering components. The methodology is general: it applies to various components without prior information on their physical properties. The developed models are statistical models estimated from experimental data obtained in accelerated aging tests in which several types of stress are considered. The models aim to study the effects of the different stress factors and of their interactions. The number and configuration of the aging tests needed to build the models (learning sets) are optimized so as to minimize the experimental cost while maximizing model accuracy. Additional, randomly configured experiments are carried out to validate the models (test sets). Two categories of components are tested: two types of insulation materials commonly used in electrical machines, and OLED light sources. Different forms of lifespan models are presented: parametric, non-parametric and hybrid. All models are evaluated with statistical tools that assess both their relevance and their predictive quality on the test set points. Parametric models quantify the effects of stress factors and their interactions on the lifespan through a predefined analytical expression; a statistical test then assesses the significance of each parameter in the model, and these models show good prediction quality on their test sets. The relationship between lifespan and stresses is also modeled with regression trees, as an alternative to parametric models. Regression trees are non-parametric models that graphically partition the experimental points into zones in which the stresses are ranked according to their effects on the lifespan, yielding a simple, graphical and direct relationship between lifespan and stress factors. However, unlike parametric models, which are continuous over the studied experimental domain, regression trees are piecewise constant, which degrades their predictive quality. To overcome this drawback, a third approach assigns a linear model to each of the zones identified by the regression tree. The resulting model, called a hybrid model, is piecewise linear: it refines the parametric models by evaluating the factor effects within each zone, while improving the prediction quality of the regression trees.
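The hybrid (piecewise-linear) approach lends itself to a short illustration: a shallow regression tree first partitions the experimental domain into zones, then a linear model is fitted within each zone. The sketch below uses scikit-learn and synthetic two-factor aging data; the variable names and model settings are illustrative, not those of the thesis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic stand-in for accelerated aging data: two stress factors
# (e.g. temperature in K and voltage in kV) and a log-lifespan response.
X = rng.uniform([300.0, 1.0], [500.0, 10.0], size=(300, 2))
y = 20.0 - 0.02 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.3, 300)

# Step 1: a shallow regression tree partitions the experimental
# domain into zones where the stresses are hierarchized.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=30).fit(X, y)
zones = tree.apply(X)  # leaf index of each experimental point

# Step 2: fit one linear model per zone -> piecewise-linear hybrid model.
models = {z: LinearRegression().fit(X[zones == z], y[zones == z])
          for z in np.unique(zones)}

def hybrid_predict(X_new):
    leaves = tree.apply(X_new)
    return np.array([models[z].predict(x[None, :])[0]
                     for z, x in zip(leaves, X_new)])

print(hybrid_predict(X[:3]), y[:3])
```

The tree supplies the zone boundaries and the per-zone regressions restore continuity of the response within each zone, which is what lets the hybrid model outperform the piecewise-constant tree on held-out points.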
63

Estimating the load weight of freight trains using machine learning

Kongpachith, Erik, January 2023
Accurate estimation of the load weight of freight trains is crucial for ensuring safe, efficient and sustainable rail freight transport. Traditional methods for estimating load weight often suffer from limited accuracy and efficiency. In recent years, machine learning algorithms have gained significant attention within the railway industry owing to their strong predictive capabilities in classification and regression tasks. This study presents a proof of concept in the form of a comparative analysis of five machine learning regression algorithms for estimating the load weight of freight trains from simulation data: polynomial regression, k-nearest neighbors, regression trees, random forest regression, and support vector regression. The study uses two comprehensive datasets derived from train simulations in GENSYS, a simulation software package for modeling rail vehicles. The datasets cover various driving condition factors such as train speed, track conditions and running gear configurations. The algorithms are trained on these datasets and their performance is evaluated with the root mean squared error (RMSE) and R² metrics. The experiments show that all five algorithms achieve promising performance for estimating the load weight. Polynomial regression achieves the best results on both datasets when many features are considered, while random forest regression performs best on both datasets when only a small number of features is considered. It is further suggested that the methodology of this study be applied to real-world data from operating freight trains to confirm the proof of concept in a realistic setting.
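A compact sketch of the comparative protocol follows, using scikit-learn implementations of the five regressors and synthetic data standing in for the GENSYS simulation output; the feature semantics, hyperparameters and train/test split are assumptions, not the thesis configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)

# Synthetic stand-in for the simulation data: features such as speed,
# track condition indicators, etc.; target is the load weight (tonnes).
X = rng.normal(size=(1000, 6))
w = rng.normal(size=6)
y = 40 + X @ w + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=7)),
    "tree": DecisionTreeRegressor(max_depth=8, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
}

# Train each regressor and report the two metrics used in the study.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:14s} RMSE={rmse:6.3f}  R2={r2_score(y_te, pred):5.3f}")
```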
64

Development of prediction models for the quality of spoken dialogue systems

BERNARDO LINS DE ALBUQUERQUE COMPAGNONI, 12 November 2021
Spoken Dialogue Systems (SDSs) are computer-based systems developed to provide information and carry out tasks using speech as the interaction mode. They are capable of speech recognition, interpretation and dialogue management, and have speech output capabilities, attempting to reproduce a more or less natural spoken interaction between a human user and the system. SDSs provide several different services, all through spoken language. Despite this development, there is a scarcity of information on how to assess the quality of such systems for the purpose of optimization. With two of these SDSs, BoRIS and INSPIRE (used for restaurant booking and smart home control, respectively), extensive experiments were conducted in the past, in which the systems were used to solve specific tasks. The evaluators rated the quality of the system on a multitude of scales, and in addition the interactions were recorded and annotated by an expert. The development of methods for performance evaluation is an open research issue for SDSs.
Following the idea of the PARADISE model (PARAdigm for DIalogue System Evaluation), the most well-known model for this purpose, developed by Walker and co-workers at AT&T in 1998, several experiments were conducted to develop predictive models of spoken dialogue performance. The objective of this dissertation is to develop and assess models which allow the prediction of quality dimensions as perceived by the human user, based on instrumentally measurable variables, using all the data collected with the BoRIS and INSPIRE systems. Different types of algorithms are compared with respect to their prediction performance and their generality. Four approaches are used: linear regression, regression trees, classification trees and neural networks. For each method, a separate tool is programmed in MATLAB that can carry out all the experiments of this work and be easily modified for new experiments with data from new systems or new variables in future studies. All the MATLAB programs are made available on the attached CD, together with an operation manual and a guide to adapting the existing programs to new data. The main idea is to develop tools that can help optimize a spoken dialogue system without the direct involvement of a human user, or serve as tools for future studies in this area.
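Although the thesis tools are written in MATLAB, the PARADISE-style regression it builds on is easy to sketch. The example below uses Python and scikit-learn with hypothetical interaction parameters (task success, dialogue duration, ASR error rate, turn count) and simulated satisfaction ratings, so the variables and weights are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 120  # number of annotated dialogues

# Hypothetical instrumentally measurable interaction parameters.
X = np.column_stack([
    rng.integers(0, 2, n),      # task success (0/1)
    rng.normal(180, 40, n),     # dialogue duration (s)
    rng.uniform(0.0, 0.4, n),   # ASR error rate
    rng.poisson(12, n),         # number of user turns
])
# Simulated user judgments on a 1-5 satisfaction scale.
y = np.clip(2.5 + 1.2 * X[:, 0] - 0.004 * X[:, 1] - 3.0 * X[:, 2]
            + rng.normal(0, 0.4, n), 1, 5)

# PARADISE-style model: user satisfaction as a weighted sum of task
# success and dialogue cost measures, fitted by multiple linear regression.
model = LinearRegression().fit(X, y)
print("weights:", model.coef_.round(3), "intercept:", round(model.intercept_, 3))
print("R^2 on training data:", round(model.score(X, y), 3))
```

The fitted weights indicate which interaction parameters drive perceived quality, which is exactly the information needed to optimize a system without running new user studies.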
65

Development of modern acoustic features quantifying hypokinetic dysarthria

Kowolowski, Alexander, January 2019
This work deals with the design and testing of new acoustic features for the analysis of dysprosodic speech in patients with hypokinetic dysarthria. 41 new features quantifying dysprosody (describing melody, loudness, rhythm and pace) are presented and tested. The new features can be divided into seven groups; within each group, the features differ in the statistical values used. The first four groups are based on absolute differences and cumulative sums of the fundamental frequency and the short-time energy of the signal. The fifth group contains features based on multiples of the fundamental frequency and short-time energy, combined into one global intonation feature. The sixth group contains global time features, constructed as ratios of conventional rhythm and pace features. The last group contains global features quantifying overall dysprosody, constructed as ratios of the global intonation and global time features. All features were tested on the Czech Parkinsonian speech database PARCZ. First, a kernel density estimate was computed and plotted for every feature. Then a correlation analysis with clinical metadata was performed, first for all the features and then for the global features only. Next, classification and regression analyses were carried out using the classification and regression trees (CART) algorithm, first for each feature separately and then for all the data at once; finally, sequential floating feature selection was applied to find the best-performing combination of features for the task at hand. Although no single feature emerged as universally best, a few features repeatedly ranked among the best, and in several cases there was a marked drop in performance between the best and the second-best feature, singling the former out as clearly superior for the given task. The results are presented and discussed in the conclusion.
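The per-feature CART analysis can be sketched as follows; the code uses scikit-learn with a random matrix standing in for the 41 PARCZ dysprosody features, so the data, labels and scoring choices are assumptions rather than the thesis setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

# Hypothetical stand-in for the PARCZ data: rows are speakers, columns
# are acoustic features (e.g. the 41 dysprosody features); labels mark
# Parkinsonian vs. healthy speakers.
X = rng.normal(size=(150, 41))
y = (X[:, 0] + 0.8 * X[:, 5] + rng.normal(0, 1.0, 150)) > 0

# Evaluate each feature separately with a CART classifier and rank the
# features by cross-validated accuracy, mimicking the per-feature analysis.
scores = [cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                          X[:, [j]], y, cv=5).mean()
          for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]
print("best features:", ranking[:5])
print("their scores: ", np.round(np.sort(scores)[::-1][:5], 3))
```

Ranking single-feature scores this way also exposes the drop between the best and second-best feature that the abstract highlights.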
