1 |
Locally Optimized Mapping of Slum Conditions in a Sub-Saharan Context: A Case Study of Bamenda, Cameroon
Anchang, Julius, 18 November 2016
Despite being an indicator of modernization and macroeconomic growth, urbanization in regions such as Sub-Saharan Africa is tightly interwoven with poverty and deprivation. Physically, this has manifested as slums, the worst residential urban areas, marked by a lack of access to good-quality housing and basic services. To combat the slum phenomenon effectively, local slum conditions must be captured in quantitative and spatial terms. However, there are significant hurdles to this: slum detection and mapping require readily available and reliable data, as well as a proper conceptualization of measurement and scale. Using Bamenda, Cameroon, as a test case, this dissertation research was designed as a three-pronged attack on the slum mapping problem. The overall goal was to investigate locally optimized slum mapping strategies and methods that use high-resolution satellite image data, household survey data, simple machine learning, and regionalization theory.
The first major objective of the study was to tackle a "measurement" problem: to explore a multi-index approach to measuring and mapping local slum conditions. The rationale was that prior Sub-Saharan slum research too often used simplified measurement techniques, such as a single unweighted composite index, to represent diverse local slum conditions. In this study, six household indicators relevant to the United Nations criteria for defining slums were extracted from a 2013 Bamenda household survey data set and aggregated for 63 local statistical areas. The extracted variables were the percentages of households with the following attributes: more than two residents per room, non-owner, occupying a single room or studio, no flush toilet, no piped water, and no drainage. Hierarchical variable clustering was used as a surrogate for exploratory factor analysis to derive a smaller number of latent slum factors from these six variables. Variables were grouped so that the most correlated variables fell in the same group and uncorrelated variables fell in separate groups. Each group was then examined to see whether it suggested a conceptually meaningful slum factor that could be quantified as a stand-alone binary ("high"/"low") slum index. Results showed that the slum indicators in the study area could be replaced by at least two meaningful and statistically uncorrelated latent factors. One factor reflected home occupancy conditions (tenancy status, overcrowding, and living space) and was quantified, using K-means clustering of the areal units, as an "occupancy disadvantage index" (Occ_D). The other reflected access to utilities (piped water and flush toilet) and was quantified as a "utilities disadvantage index" (UT_D). Location attributes were used to examine and validate both indices.
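The variable-grouping and binary-index steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's exact procedure: the dissimilarity between indicators is assumed to be 1 − r² with average linkage, the "high"/"low" split is a plain two-means clustering on z-scores, and the data are synthetic stand-ins for the 63 areas × 6 indicators. The function names `cluster_variables` and `binary_index` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X, n_groups):
    """Group the columns (indicators) of X so correlated indicators share a group.

    Assumed dissimilarity between indicators j and k: 1 - r_jk**2.
    """
    r = np.corrcoef(X, rowvar=False)
    d = 1.0 - r ** 2
    np.fill_diagonal(d, 0.0)
    tree = linkage(squareform(d, checks=False), method="average")
    return fcluster(tree, t=n_groups, criterion="maxclust")


def binary_index(X, n_iter=100):
    """Label areas 1 ("high") or 0 ("low") with two-means on z-scored columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    score = Z.sum(axis=1)
    # Deterministic start: the least and the most disadvantaged areas.
    centers = np.vstack([Z[score.argmin()], Z[score.argmax()]])
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.vstack([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in (0, 1)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    high = centers.mean(axis=1).argmax()  # centroid with the larger mean z-score
    return (labels == high).astype(int)


# Synthetic stand-in: 63 areas driven by two independent latent factors.
rng = np.random.default_rng(0)
occupancy = rng.normal(size=63)
utilities = rng.normal(size=63)


def noisy(factor):
    return factor + 0.3 * rng.normal(size=factor.size)


X = np.column_stack([
    noisy(occupancy),  # crowding (> 2 residents per room)
    noisy(occupancy),  # non-owner
    noisy(occupancy),  # single room / studio
    noisy(utilities),  # no flush toilet
    noisy(utilities),  # no piped water
    noisy(utilities),  # no drainage (assumed to follow utilities here)
])

groups = cluster_variables(X, n_groups=2)
occ_d = binary_index(X[:, groups == groups[0]])  # index for the crowding group
print("variable groups:", groups, "areas with high Occ_D:", occ_d.sum())
```

Using 1 − r² rather than 1 − r treats strongly negative correlations as "close", which is one common convention for variable clustering; with real survey percentages the choice of linkage and dissimilarity may change the groups.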
Independent t-tests showed that units with high Occ_D were, on average, closer to the nearest town markets and major roads than units with low Occ_D. This is consistent with theory, since typical slum residents (in this case, overcrowded and non-owner households) are expected to favor access to areas of high economic activity. No comparably strong pattern was found for UT_D.
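A minimal sketch of such a comparison, using `scipy.stats.ttest_ind` on synthetic distances. The values, group sizes and the helper name `compare_distance` are assumptions for illustration only; passing `equal_var=False` gives Welch's test, which may be the safer choice when the two groups have unequal variances.

```python
import numpy as np
from scipy import stats


def compare_distance(distance_km, high_flag, equal_var=True):
    """Independent two-sample t-test: distance for high- vs low-index areas.

    A negative t statistic means high-index areas are closer on average.
    """
    distance_km = np.asarray(distance_km, dtype=float)
    high = np.asarray(high_flag).astype(bool)
    result = stats.ttest_ind(distance_km[high], distance_km[~high], equal_var=equal_var)
    return result.statistic, result.pvalue


# Hypothetical data: 30 high-Occ_D areas nearer to markets than 33 low-Occ_D areas.
rng = np.random.default_rng(1)
high_flag = np.r_[np.ones(30, dtype=int), np.zeros(33, dtype=int)]
distance_km = np.where(high_flag == 1,
                       rng.normal(1.2, 0.4, size=63),
                       rng.normal(2.5, 0.6, size=63))
t, p = compare_distance(distance_km, high_flag)
print(f"t = {t:.2f}, p = {p:.3g}")
```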
The second major objective was to tackle a "learning" problem: to explore the potential of unsupervised machine learning to detect, or "learn", slum conditions from image data. The rationale was that such an approach would be efficient and less reliant on prior knowledge and expertise. A 2012 GeoEye image scene of the study area was classified, and the following physical settlement attributes were quantified for each of the 63 statistical areas: percent roof area, percent open space, percent bare soil, percent paved road surface, percent dirt road surface, and the ratio of building shadow area to roof area. The shadow-roof ratio was a novel measure used to capture the size and density of buildings. In addition to the six image-derived variables, the mean slope of each area was calculated from a digital elevation dataset. All seven attributes were subjected to principal component analysis; the first two components were extracted and used for hierarchical clustering of the statistical areas to derive physical types. Results showed that the area units could be optimally classified into four physical types, labeled generically as Categories 1–4, each with at least one defining physical characteristic. Kruskal–Wallis tests comparing the physical types on household and location attributes showed that at least two physical types differed in aggregated household slum conditions and location attributes. Category 4 areas, located on steep slopes and with high shadow-to-roof ratios, had the highest proportions of non-owner households and were also located close to the nearest town markets. They were thus the most likely slum candidates in the city. Category 1 units, on the other hand, located on the outskirts and with abundant open space, were the least likely to have slum conditions.
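The PCA-then-cluster pipeline and the Kruskal–Wallis check could look roughly like the sketch below. Assumptions not stated in the text: attributes are z-scored before PCA, Ward linkage on the first two component scores stands in for the hierarchical clustering method, and the seven attributes and the household variable are simulated. `pca_scores` and `physical_types` are hypothetical names.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import kruskal


def pca_scores(X, n_components=2):
    """Scores on the first principal components of the z-scored attributes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def physical_types(X, n_types=4, n_components=2):
    """Ward clustering of areas on their leading principal component scores."""
    tree = linkage(pca_scores(X, n_components), method="ward")
    return fcluster(tree, t=n_types, criterion="maxclust")


# Hypothetical attributes for 63 areas: % roof, % open space, % bare soil,
# % paved road, % dirt road, shadow/roof ratio, mean slope.
rng = np.random.default_rng(2)
true_type = np.repeat([0, 1, 2, 3], [16, 16, 16, 15])
u = np.array([0, 4, 0, 4])[true_type]   # two latent contrasts between types
v = np.array([0, 0, 4, 4])[true_type]
w_u = np.array([1, -1, 1, 0, 1, 0, 0])
w_v = np.array([0, 1, 0, -1, 0, 1, 1])
X = np.outer(u, w_u) + np.outer(v, w_v) + 0.5 * rng.normal(size=(63, 7))

types = physical_types(X)
# Does a household attribute (e.g. % non-owner) differ across physical types?
non_owner = 20 + 10 * true_type + rng.normal(0, 3, size=63)
H, p = kruskal(*[non_owner[types == k] for k in np.unique(types)])
print(types[:5], f"H = {H:.1f}, p = {p:.2g}")
```

Kruskal–Wallis is rank-based, so it does not assume normally distributed household percentages within each physical type.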
The third major objective was to tackle the problem of "spatial scale". Neighborhoods, by their contiguity and homogeneity, represent an ideal scale for urban spatial analysis and mapping. Unfortunately, in most areas neighborhoods are not objectively defined, and slum mapping often relies on arbitrary spatial units that do not capture the true extent of the phenomenon. The objective was thus to explore the use of analytic regionalization to derive neighborhood units for slum mapping quantitatively. Analytic neighborhoods were created by spatially constrained clustering of statistical areas using the minimum spanning tree algorithm. Unlike previous studies that relied on socio-economic and/or demographic information, this study took the novel step of using multiple land cover and terrain attributes as neighborhood homogenizing factors. Five analytic neighborhoods (labeled Regions 1–5) were created in this way and compared using Kruskal–Wallis tests for differences in household slum attributes, in order to determine the largest possible contiguous areas that could be labeled as slum or non-slum neighborhoods. The results revealed that at least two analytic regions differed significantly in aggregated household indicators. Region 1 stood apart, with significantly higher proportions of overcrowded and non-owner households; it could thus be viewed as the largest potential slum neighborhood in the city. In contrast, Region 3 (located at higher elevation and separated from the rest of the city by a steep escarpment) was generally associated with low levels of household slum attributes and could be considered the strongest model of a non-slum, or formal, neighborhood. Regions 1 and 3 also corresponded qualitatively to two locally recognized (vernacular) neighborhoods.
These neighborhoods, "Sisia" (Region 1) and "Up Station" (Region 3), are commonly perceived by local residents as occupying opposite ends of the socio-economic spectrum.
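One way to picture minimum-spanning-tree regionalization is the simplified sketch below: build a contiguity graph whose edge weights are attribute dissimilarities, take its minimum spanning tree, and cut the k − 1 heaviest tree edges so that each remaining connected component is a contiguous region. The heaviest-edge rule is a simplification assumed here; SKATER-style methods instead choose the cuts that most reduce within-region heterogeneity. The toy chain of areas and the name `mst_regions` are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree


def mst_regions(features, edges, n_regions):
    """Contiguous regions by pruning a minimum spanning tree (simplified).

    features: (n_areas, n_attributes) land cover / terrain attributes.
    edges: iterable of (i, j) pairs of contiguous areas.
    """
    Z = (features - features.mean(axis=0)) / features.std(axis=0)
    n = Z.shape[0]
    i, j = np.asarray(edges).T
    # Attribute dissimilarity on contiguity edges; a tiny offset keeps
    # zero-weight edges, which csgraph would otherwise treat as absent.
    w = np.linalg.norm(Z[i] - Z[j], axis=1) + 1e-9
    graph = coo_matrix((w, (i, j)), shape=(n, n)).tocsr()
    tree = minimum_spanning_tree(graph).tocoo()
    # Cut the (n_regions - 1) heaviest tree edges.
    keep = np.argsort(tree.data)[: tree.nnz - (n_regions - 1)]
    pruned = coo_matrix((tree.data[keep], (tree.row[keep], tree.col[keep])), shape=(n, n))
    _, labels = connected_components(pruned, directed=False)
    return labels


# Toy chain of 15 areas: a block, a contrasting block, then a block like the first.
rng = np.random.default_rng(3)
level = np.repeat([0.0, 5.0, 0.0], 5)
features = np.column_stack([level, -level]) + 0.1 * rng.normal(size=(15, 2))
edges = [(k, k + 1) for k in range(14)]
labels = mst_regions(features, edges, n_regions=3)
print(labels)
```

Because regions can only grow along tree edges, the first and last blocks end up in different regions even though their attributes are alike, which is the contiguity property that distinguishes regionalization from ordinary clustering.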
The results of the three major objectives have important implications for future research and policy. The multi-index analysis of slum conditions affirms the notion that the slum phenomenon is diverse in the local context and that remediation efforts must be compartmentalized to be effective. The results of unsupervised, image-based slum mapping show that it is a tool with high potential for rapid slum assessment, even when no supporting field data are available. Finally, the results of analytic regionalization showed that the true extent of contiguous slum neighborhoods can be delineated objectively using land cover and terrain attributes. This presents an opportunity for local planning and policy actors to consider redesigning the city's neighborhood districts as analytic units. Quantitatively derived neighborhoods are likely to be more useful in the long term, whether for spatial sampling, mapping, or planning purposes.
|
2 |
Contributions à la réduction de dimension / Contributions to dimension reduction
Kuentz, Vanessa, 20 November 2009
This thesis concentrates on dimension reduction, a central topic in statistics that seeks lower-dimensional subspaces while minimizing the loss of the information contained in the data. First, we focus on multivariate analysis of categorical (qualitative) data. We treat the rotation problem in Multiple Correspondence Analysis (MCA) and give the analytic expression of the optimal planar rotation angle for the chosen rotation criterion. When more than two principal components are retained, this planar solution is used in a practical algorithm that applies successive pairwise planar rotations of factors. We also propose different algorithms for clustering categorical variables that optimize a partitioning criterion based on correlation ratios. A real data application highlights the benefits of rotation in MCA and provides an empirical comparison of the proposed algorithms for categorical variable clustering. Then we study the semiparametric regression method SIR (Sliced Inverse Regression). We propose an extension based on a partitioning of the predictor space that can be used when the crucial linearity condition on the predictors does not hold. We also introduce bagging (bootstrap aggregating) versions of SIR to improve the estimation of the basis of the dimension reduction subspace. Asymptotic properties of the estimators are obtained, and a simulation study shows the good numerical behaviour and the superiority of the proposed approaches. Finally, the various applications and interdisciplinary collaborations carried out during the thesis, involving multivariate data analysis in several fields, are described.
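For readers unfamiliar with SIR, the basic estimator can be sketched as below: whiten the predictors, slice the sorted response, take the weighted covariance of the slice means, and map its leading eigenvectors back to the original scale. The `bagged_sir` function shows one plausible bagging scheme (averaging the orthogonal projectors of bootstrap estimates); it is an assumption for illustration, not necessarily the version proposed in the thesis, and the predictor-space partitioning extension is not shown.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Basic Sliced Inverse Regression estimate of the EDR directions."""
    n, p = X.shape
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = (X - X.mean(axis=0)) @ inv_sqrt                    # whitened predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):    # slices of sorted y
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)               # Cov(E[Z | slice])
    w, v = np.linalg.eigh(M)
    eta = v[:, np.argsort(w)[::-1][:n_directions]]
    B = inv_sqrt @ eta                                      # back to X scale
    return B / np.linalg.norm(B, axis=0)


def bagged_sir(X, y, n_boot=50, seed=0, n_directions=1, **kwargs):
    """Bootstrap-aggregated SIR: average projectors, keep leading eigenvectors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    P = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        B = sir_directions(X[idx], y[idx], n_directions=n_directions, **kwargs)
        Q, _ = np.linalg.qr(B)                # orthonormal basis of the estimate
        P += Q @ Q.T / n_boot
    w, v = np.linalg.eigh(P)
    return v[:, np.argsort(w)[::-1][:n_directions]]


# Single-index model y = (x'beta)^3 + noise, with beta known for checking.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.5 * rng.normal(size=500)
b_hat = sir_directions(X, y)[:, 0]
b_bag = bagged_sir(X, y)[:, 0]
print(abs(b_hat @ beta), abs(b_bag @ beta))  # |cosine| with the true direction
```

Averaging projectors rather than raw direction vectors sidesteps the sign ambiguity of eigenvectors across bootstrap replicates.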
|
3 |
Méthodes de réduction de dimension pour la construction d'indicateurs de qualité de vie / Dimension reduction methods to construct quality of life indicators
Labenne, Amaury, 20 November 2015
The purpose of this thesis is to develop and propose new dimension reduction methods for constructing composite quality-of-life indicators at the municipal scale. The statistical methodology developed emphasizes the multidimensionality of the quality-of-life concept, with particular attention to the treatment of mixed data (quantitative and qualitative variables) and the inclusion of environmental conditions. We opt for a variable clustering approach and for a multi-table method (multiple factor analysis for mixed data). Both methods make it possible to build composite indicators, which we propose as measures of living conditions at the municipal scale. To facilitate the interpretation of the resulting composite indicators, we introduce a bootstrap-based variable selection method in multiple factor analysis. Finally, we propose hclustgeo, a method for clustering observations that integrates geographical proximity constraints into the clustering procedure in order to better capture the spatial specificities of the phenomena involved.
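The idea behind hclustgeo, mixing a feature dissimilarity matrix D0 with a geographic one D1 through a parameter α in [0, 1], can be approximated with SciPy's Ward linkage as below. This is a sketch assuming max-normalized Euclidean dissimilarities combined as √((1 − α)·D0² + α·D1²); it is not the reference R implementation of hclustgeo, and `geo_ward` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def geo_ward(features, coords, alpha, n_clusters):
    """Ward clustering on a convex mix of feature and geographic dissimilarities.

    alpha = 0 uses features only; alpha = 1 uses geography only.
    """
    d0 = pdist(features)        # condensed feature distances (D0)
    d1 = pdist(coords)          # condensed geographic distances (D1)
    d0 = d0 / d0.max()          # put both on a common [0, 1] scale
    d1 = d1 / d1.max()
    d_alpha = np.sqrt((1 - alpha) * d0 ** 2 + alpha * d1 ** 2)
    tree = linkage(d_alpha, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy municipalities: a composite indicator alternates high/low, while
# locations form two distant spatial groups (west = 0-9, east = 10-19).
rng = np.random.default_rng(5)
idx = np.arange(20)
features = (10.0 * (idx % 2) + rng.normal(0, 0.5, size=20)).reshape(-1, 1)
coords = rng.normal(0, 1, size=(20, 2)) + np.where(idx[:, None] >= 10, [100.0, 0.0], [0.0, 0.0])
for alpha in (0.0, 0.5, 1.0):
    print(alpha, geo_ward(features, coords, alpha, n_clusters=2))
```

At α = 0 the partition follows the indicator alone; at α = 1 it follows geography alone; intermediate values trade feature homogeneity against spatial compactness, which is the tuning decision hclustgeo leaves to the analyst.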
|
4 |
Similarity Measures for Nominal Data in Hierarchical Clustering / Míry podobnosti pro nominální data v hierarchickém shlukování
Šulc, Zdeněk, January 2013
This dissertation deals with similarity measures for nominal data in hierarchical clustering that can cope with variables with more than two categories and that aim to replace the simple matching approach commonly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The thesis has three main aims. The first is to examine and evaluate the clustering performance of selected similarity measures for nominal data in hierarchical clustering of objects and variables. To achieve this goal, four experiments covering both object and variable clustering were performed. They assess the clustering quality of the examined similarity measures for nominal data in comparison with commonly used similarity measures based on a binary transformation, and with several alternative methods for clustering nominal data. The comparison and evaluation are performed on real and generated datasets. The experiments show which similarity measures can be used in general, which perform well in particular situations, and which are not recommended for object or variable clustering. The second aim is to propose a theory-based similarity measure, evaluate its properties, and compare it with the other examined measures. To this end, two novel similarity measures, Variable Entropy and Variable Mutability, are proposed; the former performs particularly well on datasets with fewer variables. The third aim is to provide a convenient software implementation of the examined similarity measures for nominal data that covers the whole clustering process, from computing a proximity matrix to evaluating the resulting clusters.
This aim was achieved by creating the nomclust package for R, which is freely available.
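To make the contrast with simple matching concrete, the sketch below computes object similarities with simple matching and with two measures from the nominal-similarity literature that use extra dataset characteristics: Eskin (a mismatch scores n²/(n² + 2), with n the number of categories of the variable) and inverse occurrence frequency (IOF, a mismatch scores 1/(1 + log f(a)·log f(b)), with f the category frequencies). Objects are then clustered with average linkage on 1 − S. This is an illustrative Python sketch, not the nomclust API; the thesis's Variable Entropy and Variable Mutability measures are not reproduced, and the toy records are hypothetical.

```python
import math
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mismatch_similarity(a, b, counts, measure):
    """Similarity contributed by one variable when categories a != b."""
    if measure == "sm":          # simple matching: a mismatch contributes 0
        return 0.0
    if measure == "eskin":       # grows with the number of categories n
        n_cat = len(counts)
        return n_cat ** 2 / (n_cat ** 2 + 2)
    if measure == "iof":         # inverse occurrence frequency
        return 1.0 / (1.0 + math.log(counts[a]) * math.log(counts[b]))
    raise ValueError(f"unknown measure: {measure}")


def similarity_matrix(rows, measure="sm"):
    """Average per-variable similarity between all pairs of objects."""
    n, d = len(rows), len(rows[0])
    counts = [Counter(row[k] for row in rows) for k in range(d)]
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = sum(
                1.0 if rows[i][k] == rows[j][k]
                else mismatch_similarity(rows[i][k], rows[j][k], counts[k], measure)
                for k in range(d)
            )
            S[i, j] = S[j, i] = s / d
    return S


def cluster_objects(rows, measure, n_clusters):
    """Average-linkage clustering on the dissimilarity 1 - S."""
    D = 1.0 - similarity_matrix(rows, measure)
    np.fill_diagonal(D, 0.0)
    tree = linkage(squareform(D, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


records = [
    ["a", "x", "p"],
    ["a", "x", "q"],
    ["b", "y", "q"],
    ["b", "z", "q"],
]
for m in ("sm", "eskin", "iof"):
    print(m, np.round(similarity_matrix(records, m)[0, 1], 3), cluster_objects(records, m, 2))
```

Note how IOF treats a mismatch involving a category seen only once as a full match, one of the behaviours that make such measures sensitive to the frequency structure of the data.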
|