Global ETD Search

1	Multi-layer perceptron networks for ordinal data analysis : order independent online learning by sequential estimation / Dlugosz, Stephan. January 2008 (has links) Also submitted as: doctoral dissertation, University of Münster (Westfalen), 2008.
2	Développement des indicateurs de la qualité de vie urbaine à l'aide de la télédétection à très haute résolution spatiale : cas de la ville de Hanoi Pham, Thi Thanh Hien January 2010 (has links) In studies of urban quality of life, the information that can be extracted from satellite images is limited by image resolution and by the standard method of pixel classification. Recently, very high spatial resolution (VHSR) satellite images have allowed the development of new remote sensing applications, especially for complex urban areas. Despite the numerous advantages of the object-oriented approach for VHSR image processing, the parameters used to carry it out, especially at the object creation stage, are not well documented. Moreover, evaluations of urban quality of life have never considered the perception of the inhabitants of the zones under study. This dissertation therefore addresses these two issues and aims 1) to test a systematic way of obtaining the best parameters for object-oriented classification with the software Definiens and 2) to quantify the relation between objective indicators and perceived satisfaction. Hoàn Kiếm district, in Hanoi, Vietnam, was chosen as our zone of interest. The image used for this study is a QuickBird image with a spatial resolution of 0.7 m. In the first part of the dissertation, we identify eight land occupation classes on the image: lakes, river, parks, groups of trees along streets, isolated trees, large road and residential blocks. Using these classes and additional cartographic information, we calculate nine quality of life indicators that correspond to two central aspects of urban life: commodity (urban services) and amenity (urban landscape). For each group of indicators, we carried out a principal components analysis to obtain non-correlated components.
We then conducted a survey with eight city planning experts who live and work in the zone under study to obtain an assessment of the satisfaction of inhabitants with their area of residence. The weight of each component in the determination of quality of life was obtained through an ordinal regression in which the independent variables are the components and the dependent variable is the level of satisfaction as evaluated by the experts. The weights were then used to interpret the importance of our indicators for quality of life. Our results show that it is possible to classify land occupation types with good accuracy: our average accuracy rate is 80.5%. As for the weight of quality of life indicators, our results allow us to make methodological and interpretative contributions. Contrary to previous work, our method allows us to evaluate the explanatory power of our model. Our regression shows that 22% of the variation in satisfaction with commodity and nearly 54% of the variation in satisfaction with amenity can be attributed to our indicators. As for the nature of the factors playing a role in quality of life, our results show that the relation between indicators and perceived satisfaction is not linear, which had never been shown in previous studies. Satisfaction with commodity increases when transportation and health care are both sufficient. Satisfaction with amenity, on the other hand, is largely determined by residential space, while vegetation plays a minor role, contrary to what was found in the urban zones of developed countries. Ordinal regression Urban environment Quality of life Very high resolution images
3	Contributions statistiques à l'analyse de mégadonnées publiques / Statical contributions to the analysis of public big data Sainct, Benoît 12 June 2018 (has links) The aim of this thesis is to provide a set of methodological tools to address two problems: the prediction of the payroll of local authorities, and the analysis of their tax data. For the first, the work again revolves around two statistical themes: time series model selection and functional data analysis. Because of the complexity of the data and the heavy computation time constraints, a clustering approach was favored. In particular, we used Functional Principal Component Analysis and a Gaussian mixture model to perform unsupervised classification of remuneration profiles. These methods were applied in two prototype tools that represent one of the achievements of this thesis. For the second problem, the work was done in three stages. First, innovative methods for classifying an ordinal target variable were compared on public data already analyzed in the literature, notably by exploiting random forests, SVM and gradient boosting. Then, these methods were adapted to anomaly detection in a targeted, ordinal, unsupervised and non-parametric context, and their efficiency was compared mainly on synthetic datasets. Our ordinal random forest by class separation seems to give the best results. Finally, this method was applied to real tax base data, where concerns about the size and complexity of the data are greater. Aimed at the management of local authorities, this new approach to examining their databases is the second outcome of this thesis. Anomaly detection Classification Random forest SVM XGBoost Ordinal variable
4	Analysis of systematic and random differences between paired ordinal categorical data / Svensson, Elisabeth. January 1993 (has links) Thesis (doctoral)--Göteborgs Universitet, 1993. / Errata sheet laid in. Includes bibliographical references.
5	Feature extraction and supervised learning on fMRI : from practice to theory / Estimation de variables et apprentissage supervisé en IRMf : de la pratique à la théorie Pedregosa-Izquierdo, Fabian 20 February 2015 (has links) Until the advent of non-invasive neuroimaging modalities, knowledge of the human brain came from the study of its lesions, post-mortem analyses and invasive experiments. Nowadays, modern imaging techniques such as fMRI are revealing several aspects of the human brain with progressively higher spatio-temporal resolution. However, in order to answer increasingly complex neuroscientific questions, the technical improvements in acquisition must be matched with novel data analysis methods. In this thesis we examine different applications of machine learning to the processing of fMRI data. We propose novel extensions and investigate the theoretical properties of different models. The goal of an fMRI experiment is to answer a neuroscientific question. However, it is usually not possible to perform hypothesis testing directly on the data output by the fMRI scanner. Instead, fMRI data enters a processing pipeline in which it undergoes several transformations before conclusions are drawn. Often the data acquired through the fMRI scanner goes through a feature extraction step in which time-independent activation coefficients are extracted from the fMRI signal. The first contribution of this thesis is the introduction of a model named Rank-1 GLM (R1-GLM) for the joint estimation of time-independent activation coefficients and the hemodynamic response function (HRF). We quantify the improvement of this approach with respect to existing procedures on different fMRI datasets. The second part of this thesis is devoted to the problem of fMRI-based decoding, i.e., the task of predicting some information about the stimuli from brain activation maps. From a statistical standpoint, this problem is challenging because of the high dimensionality of the data, often thousands of variables, while the number of images available for training is small, typically a few hundred. We examine the case in which the target variable consists of discrete, ordered values. The second contribution of this thesis is to propose the following two metrics to assess the performance of a decoding model: the absolute error and the pairwise disagreement. We describe several models that optimize a convex surrogate of these loss functions and examine their performance on different fMRI datasets. Motivated by the success of some ordinal regression models for the task of fMRI-based decoding, we turn to the study of some theoretical properties of these methods. The property that we investigate is known as consistency or Fisher consistency and relates the minimization of a loss to the minimization of its surrogate. The third, and most theoretical, contribution of this thesis is to examine the consistency properties of a rich family of surrogate loss functions that are used in the context of ordinal regression. We give sufficient conditions for the consistency of the surrogate loss functions considered. This allows us to give theoretical reasons for some empirically observed differences in performance between surrogates. fMRI BOLD Feature extraction Supervised learning Ordinal regression Decoding 004
6	Nichtparametrische relative Effekte Domhof, Sebastian 02 May 2001 (has links) No description available. 15 Statistics EGCG EGCP MED 241 MED 242 Economics and Management Science Nonparametric statistics factorial designs ordinal data 31.73 42.11 44.32
7	Die statistische Auswertung von ordinalen Daten bei zwei Zeitpunkten und zwei Stichproben / The Statistical Analysis of Ordinal Data at two Timepoints and two Groups Siemer, Alexander 03 April 2002 (has links) No description available. 15 Statistics EGCG EGCP MED 240 MED 241 MED 242 Mathematics and Natural Science ordinal data nonparametric analysis repeated measures 44.32 31.73
8	Une nouvelle famille de modèles linéaires généralisés (GLMs) pour l'analyse de données catégorielles ; application à la structure et au développement des plantes. Peyhardi, Jean 09 December 2013 (has links) (PDF) The aim of this thesis is to propose a new class of GLMs for a hierarchically structured categorical response variable, such as a partially ordered variable. A first step was to highlight the differences and the commonalities between GLMs for nominal and ordinal response variables. On this basis we introduced a new specification of GLMs for a categorical response variable, whether ordinal or nominal, based on three components: the ratio of probabilities r, the cumulative distribution function F and the design matrix Z. This framework allowed us to define a new family of models for nominal data, comparable to the families of cumulative, sequential and adjacent models for ordinal data. We then defined the class of partitioned conditional generalized linear models (PCGLMs) using directed trees and the (r,F,Z) specification. In our biological context, the data are multivariate sequences composed of a categorical response variable (the type of axillary production) and explanatory variables (internode length, for example). In the semi-Markov combinations of partitioned conditional generalized linear models (SMS-PCGLM) estimated from these sequences, the underlying semi-Markov chain represents the succession and lengths of the branching zones, while the PCGLMs represent the influence of the growth explanatory variables on the axillary productions within each branching zone. Using these integrative statistical models, we showed that shoot growth influenced particular branching events.
[MATH:MATH_ST] Mathematics/Statistics [STAT:TH] Statistics/Statistics Theory [STAT:ME] Statistics/Methodology [STAT:ML] Statistics/Machine Learning link function nominal variable ordinal variable hierarchically structured variable model reparametrization branching pattern
9	Prédiction d’états mentaux futurs à partir de données de phénotypage numérique Jean, Thierry 12 1900 (has links) Digital phenotyping leverages the numerous sensors of smartphones (e.g., accelerometer, GPS, Bluetooth, call metadata) to measure daily human behavior without interference and link it to psychiatric symptoms and mental health indicators. Machine learning is an integral component of processing raw signals into intelligible information for clinicians. This approach emerges from a desire to characterize symptom profiles and their temporal variations at the individual level. This project consisted of predicting mental health variables (e.g., stress, mood, sociability, hallucination) up to seven days in the future from smartphone data for patients with a diagnosis of schizophrenia. The CrossCheck dataset, which has a sample of 62 participants, was used. It includes 23,551 days of phone sensor data with 29 features, and 6,364 mental state self-reports on 4-point ordinal scales. Ordinal predictive models were used to generate discrete predictions that can be interpreted on the scale used for data collection. In total, 240 machine learning models were trained, i.e., the combinations of 10 mental health variables, 3 forecast horizons (same day, next day, next week), 2 algorithms (XGBoost, LSTM), and 4 learning tasks (binary classification, continuous regression, multiclass classification, ordinal regression). The ordinal and binary models performed significantly better than the baseline and the two other tasks, with a macro-averaged mean absolute error between 0.767 and 1.436 and a balanced accuracy between 58% and 73%. Results showed a dominant effect of class imbalance on predictive performance and highlighted that metrics not accounting for it lead to systematic overestimation of performance. This analysis anchors a series of broader considerations about the use of artificial intelligence in healthcare. In particular, assessing the clinical value of machine learning solutions presents distinctive challenges when compared to conventional treatments. The growing role of digital technologies in mental health has implications for autonomy, sense-making, and agency over one’s experience. Digital health Quantified self Artificial intelligence Ordinal regression Forecast Explainability Clinical scale Machine learning Class imbalance
10	Das nichtparametrische Behrens-Fisher-Problem: ein studentisierter Permutationstest und robuste Konfidenzintervalle für den Shift-Effekt / The non-parametric Behrens-Fisher Problem: A Studentized Permutation Test and Robust Confidence Intervals for the Shift Effect Neubert, Karin 07 July 2006 (has links) No description available. 310 Statistics EGCG 090 EGCG 100 EGCG 150 EGCG 200 Mathematics and Natural Science Studentized Statistic Asymptotic Distribution Brunner-Munzel Test Ordered Categorical Data Rank Test Count Data 42.11 Biomathematics Biocybernetics
