1 |
Canopy Change Assessment and Water Resources Utilization in the Civano Community, Arizona / Pan, Yajuan 12 1900 (has links)
The Civano community of Tucson, Arizona, is built for sustainability. Trees and plants are precious resources in the community, and the community must balance human needs against natural resources. Rainwater harvesting systems and the use of reclaimed water inside the community irrigate plants effectively and save drinking water. This project estimates canopy change over time and explores the effect of water resources on plant growth in developed areas and natural areas, respectively. It generates land cover classifications for 2007, 2010, and 2015 using a supervised classification method and measures canopy cover change over time. Based on the City of Tucson Water “harvesting rainwater guide to water-efficient landscaping”, the project discusses whether the water supply meets plant water demand in the developed areas of the community. Additionally, normalized difference vegetation index (NDVI) data for the developed area and the natural area over ten years are compared and correlated with water sources. The results show that canopy cover across the entire community decreased from 2007 to 2010, then increased from 2010 to 2015. The water supply in the developed areas is sufficient for plant water demand. In natural areas, plant growth changes dramatically with precipitation fluctuations. In addition, the 2011 National Land Cover Database (NLCD) tree canopy layer is shown to underestimate canopy cover in the Civano community. The final products not only provide fundamental canopy cover data for other studies but also serve as a reference for water-efficient landscaping within a community.
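The NDVI comparison described above rests on a simple band ratio of near-infrared and red reflectance. A minimal sketch, with made-up reflectance values rather than the study's imagery:

```python
def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red); returned as 0 for empty pixels
    # to avoid dividing by zero.
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

# Hypothetical per-pixel (NIR, Red) reflectances: dense canopy first,
# bare soil last. Healthy vegetation reflects strongly in NIR.
pixels = [(0.5, 0.1), (0.4, 0.1), (0.3, 0.2), (0.2, 0.2)]
values = [ndvi(n, r) for n, r in pixels]
print(values)
```

Values near 1 indicate dense canopy; values near 0 indicate bare soil, which is why a multi-year NDVI series can track plant growth against water availability.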
2 |
Machine Learning Approach on Evaluating Predictive Factors of Fall-Related Injuries / Ateeq, Sameen January 2018 (has links)
According to the Public Health Agency of Canada, falls account for 95% of all hip fractures in Canada, and 20% of fall-related injury cases end in death. This thesis evaluates the predictive power of many variables for predicting fall-related injuries. The dataset chosen was the Canadian Community Health Survey (CCHS), which is high-dimensional and diverse. Principal Component Analysis (PCA) and random forests were used to determine the highest-priority risk factors to include in the predictive model. The results show that it is possible to predict fall-related injuries with a sensitivity of 80% or higher using four predictors (frequency of consultations with a medical doctor, food and vegetable consumption, height, and monthly physical activity lasting over 15 minutes). Alternatively, the same sensitivity can be reached using age, frequency of walking for exercise per three months, alcohol consumption, and personal income. None of the predictive models reached an accuracy of 70% or higher.
Further work studying nutritional diets that offer protection against fall-related injuries is also recommended. Since the predictors are behavioural determinants of health and yield high sensitivity but low accuracy, population-level health interventions are recommended rather than individual-level interventions. Suggestions for improving the accuracy of the built models are also proposed. / Thesis / Master of Science (MSc)
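The sensitivity/accuracy trade-off described above can be made concrete from a confusion matrix. A minimal sketch with hypothetical counts chosen to mirror the regime the thesis reports (80% sensitivity, accuracy below 70%); these are not the thesis's actual numbers:

```python
def sensitivity(tp, fn):
    # Sensitivity (recall): share of actual injury cases the model flags.
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    # Overall share of correct predictions, positive and negative.
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts: 100 injury cases, 100 non-cases.
tp, fn, fp, tn = 80, 20, 45, 55
sens = sensitivity(tp, fn)
acc = accuracy(tp, tn, fp, fn)
print(sens, acc)   # 0.8 0.675
```

A model can catch most true cases (high sensitivity) while misclassifying many non-cases, which is exactly why the thesis argues for population-level rather than individual-level interventions.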
3 |
Novel Measures on Directed Graphs and Applications to Large-Scale Within-Network Classification / Mantrach, Amin 25 October 2010 (has links)
In recent years, networks have become a major data source in fields ranging from the social sciences to the mathematical and physical sciences. Moreover, the size of available networks has grown substantially as well. This has brought a number of new challenges, such as the need for precise and intuitive measures to characterize and analyze large-scale networks in a reasonable time.
The first part of this thesis introduces a novel measure between two nodes of a weighted directed graph: the sum-over-paths covariance. It has a clear and intuitive interpretation: two nodes are considered highly correlated if they often co-occur on the same, preferably short, paths. This measure depends on a probability distribution over the (usually infinite) countable set of paths through the graph, obtained by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. The entropy parameter biases the probability distribution over a wide spectrum, from natural random walks (where all paths are equiprobable) to walks biased towards shortest paths. This measure is then applied to semi-supervised classification problems on medium-size networks and compared to state-of-the-art techniques.
The second part introduces three novel algorithms for within-network classification in large-scale networks, i.e., classification of nodes in partially labeled graphs. The algorithms have a computing time linear in the number of edges, classes, and steps, and hence can be applied to large-scale networks. They obtained competitive results in comparison to state-of-the-art techniques on the large-scale U.S. patent citation network and on eight other data sets. Furthermore, during the thesis, we collected a novel benchmark data set: the U.S. patent citation network. This data set is now available to the community for benchmarking purposes.
The final part of the thesis concerns the combination of a citation graph with information on its nodes. We show empirically that citation-based data provide better classification results than content-based data. We also show empirically that combining both sources of information (content-based and citation-based) should be considered when facing a text categorization problem. For instance, when classifying journal papers, extracting a citation graph beforehand may considerably boost performance. However, in another context, when directly classifying the nodes of the citation network, node features will not necessarily improve the results.
The theory, algorithms and applications presented in this thesis provide interesting perspectives in various fields.
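The thesis's three algorithms are not reproduced here, but the idea of within-network classification with a per-step cost linear in the number of edges can be illustrated by a generic majority-vote label propagation sketch (the toy graph, seed labels, and tie-breaking rule are all invented for illustration):

```python
from collections import defaultdict

def propagate(edges, seed_labels, n_steps=10):
    # Build an undirected adjacency list; the cost of each step below is
    # linear in the number of edges, which is what makes this family of
    # methods usable on large graphs.
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    current = dict(seed_labels)
    for _ in range(n_steps):
        updated = dict(current)
        for node, nbrs in graph.items():
            if node in seed_labels:   # seed labels stay fixed
                continue
            votes = defaultdict(int)
            for nb in nbrs:
                if nb in current:
                    votes[current[nb]] += 1
            if votes:
                best = max(votes.values())
                winners = [lab for lab in sorted(votes) if votes[lab] == best]
                # Keep the current label on ties to avoid oscillation.
                if current.get(node) not in winners:
                    updated[node] = winners[0]
        current = updated
    return current

# Toy chain graph with seeds at both ends: labels spread inwards.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
result = propagate(edges, {0: "A", 5: "B"})
print(result)
```

Each partially labeled node ends up with the label of its nearest seed, the basic behaviour that more refined within-network classifiers build on.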
4 |
On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution / Gallaugher, Michael P.B. January 2017 (has links)
Recent work on fractionally-supervised classification (FSC), an approach that allows classification to be carried out with a fractional amount of weight given to the unlabelled points, is extended in two important ways. First, and of fundamental importance, the question of how to choose the amount of weight given to the unlabelled points is addressed. Then, the FSC approach is extended to mixtures of multivariate t-distributions. The first extension is essential because it makes FSC more readily applicable to real problems. The second, although less fundamental, demonstrates the efficacy of FSC beyond Gaussian mixture models. / Thesis / Master of Science (MSc)
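The fractional-weighting idea can be sketched in its simplest form: a weighted log-likelihood in which labeled points contribute their own component's likelihood and unlabelled points contribute the mixture likelihood, downweighted by a weight alpha in [0, 1]. This is a toy univariate-Gaussian illustration of the weighting mechanism, not the thesis's actual model or weight-selection procedure:

```python
import math

def gauss_logpdf(x, mu, sigma):
    # Log-density of a univariate Gaussian.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def fsc_loglik(labeled, unlabeled, comps, alpha):
    # Labeled points (x, g) use component g directly; unlabeled points
    # use the full mixture, scaled by the fractional weight alpha.
    ll = 0.0
    for x, g in labeled:
        pi, mu, sigma = comps[g]
        ll += math.log(pi) + gauss_logpdf(x, mu, sigma)
    for x in unlabeled:
        mix = sum(pi * math.exp(gauss_logpdf(x, mu, s)) for pi, mu, s in comps)
        ll += alpha * math.log(mix)
    return ll

comps = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]  # (weight, mean, sd) per class
labeled = [(-1.2, 0), (0.9, 1)]
unlabeled = [0.1, -0.4]
print(fsc_loglik(labeled, unlabeled, comps, alpha=0.5))
```

With alpha = 0 the objective reduces to fully supervised estimation, and with alpha = 1 to ordinary semi-supervised estimation; choosing alpha between these extremes is precisely the weight-selection question the thesis addresses.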
5 |
Plug-in methods in classification / Méthodes de type plug-in en classification / Chzhen, Evgenii 25 September 2019 (has links)
This manuscript studies several problems of constrained classification, where the goal is to construct an algorithm that performs as well as the best classification rule satisfying some desired property. Plug-in type classifiers are well suited to this goal. Interestingly, it is shown that in several setups these classifiers can leverage unlabeled data, that is, they are constructed in a semi-supervised manner.
Chapter 2 describes two particular settings of binary classification: classification with the F-score as performance measure, and fair classification under the equal opportunity constraint. For both problems, semi-supervised procedures are proposed and their theoretical properties are established. In the case of the F-score, the proposed procedure is shown to be minimax optimal over a standard non-parametric class of distributions. In the case of equal opportunity, the proposed algorithm is shown to be consistent in terms of misclassification risk, and its asymptotic fairness is established; moreover, the proposed procedure outperforms state-of-the-art algorithms in practice.
Chapter 3 describes the setting of confidence-set multi-class classification. Again, a semi-supervised procedure is proposed and its nearly minimax optimality is established. It is additionally shown that no supervised algorithm can achieve a so-called fast rate of convergence, whereas the proposed semi-supervised procedure can achieve fast rates provided that the unlabeled sample is sufficiently large.
Chapter 4 describes a multi-label classification setting in which one aims at minimizing the false negative error subject to almost-sure constraints on the classification rules. Two specific constraints are considered: sparse predictions and control over false negative errors. For the former, a supervised algorithm is provided and shown to achieve fast rates of convergence. For the latter, it is shown that extra assumptions are necessary to obtain theoretical guarantees on the classification risk.
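A plug-in rule thresholds an estimate of the regression function eta(x) = P(Y = 1 | X = x), and the semi-supervised flavour can be illustrated by calibrating the threshold on an unlabeled sample. The estimate, calibration rule, and numbers below are illustrative inventions, not the manuscript's procedures:

```python
import math

def plugin_classifier(eta_hat, theta):
    # Plug-in rule: predict 1 when the estimated regression function
    # eta_hat(x) ~ P(Y = 1 | X = x) exceeds the threshold theta.
    return lambda x: int(eta_hat(x) >= theta)

def calibrate_theta(eta_hat, unlabeled, budget):
    # Semi-supervised calibration: choose theta so that roughly a
    # `budget` fraction of the unlabeled sample is classified positive.
    scores = sorted((eta_hat(x) for x in unlabeled), reverse=True)
    k = max(1, int(budget * len(scores)))
    return scores[k - 1]

eta_hat = lambda x: 1 / (1 + math.exp(-x))   # toy estimate of eta
unlabeled = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
theta = calibrate_theta(eta_hat, unlabeled, budget=0.25)
g = plugin_classifier(eta_hat, theta)
preds = [g(x) for x in unlabeled]
print(theta, preds)
```

The unlabeled data enter only through the threshold, which is the structural reason plug-in rules for constrained problems can exploit unlabeled samples.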
6 |
Geographic information science: contribution to understanding salt and sodium affected soils in the Senegal River valley / Ndiaye, Ramatoulaye January 1900 (has links)
Doctor of Philosophy / Department of Geography / John A. Harrington Jr / The Senegal River valley and delta (SRVD) are affected by long-term climate variability. Indicators of these climatic shifts include a rainfall deficit, warmer temperatures, sea level rise, floods, and drought. These shifts have led to environmental degradation, water deficits, and profound effects on human life and activities in the area. Geographic Information Science (GIScience), including satellite-based remote sensing methods, offers several advantages over conventional ground-based methods for mapping and monitoring salt-affected soil (SAS) features. This study was designed to assess the accuracy of information on soil salinization extracted from Landsat satellite imagery. Would available imagery and GIScience data analysis make it possible to discriminate natural soil salinization from soil sodication and to characterize the SAS trend and pattern over 30 years? A set of Landsat MSS (June 1973 and September 1979), Landsat TM (November 1987, April 1994 and November 1999), and ETM+ (May 2001 and March 2003) images was used to map and monitor salt-impacted soil distribution. Supervised classification, unsupervised classification, and post-classification change detection methods were used. Supervised classifications of the May 2001 and March 2003 images were made in conjunction with field data characterizing soil surface chemistry, including exchangeable sodium percentage (ESP), cation exchange capacity (CEC), and electrical conductivity (EC). With this supervised information extraction method, the distribution of three types of SAS (saline, saline-sodic, and sodic) was mapped with an accuracy of 91.07% for the 2001 image and 73.21% for the 2003 image. Change detection results confirmed a decreasing trend in non-saline and saline soils and an increase in saline-sodic and sodic soils. All seven Landsat images were subjected to the unsupervised classification method, which resulted in maps that separate SAS according to degree of salinity. The spatial distribution of sodic and saline-sodic soils has a strong relationship with areas of irrigated rice crop management. This study documented that human-induced salinization is progressively replacing natural salinization in the SRVD. These pedologic parameters, obtained using GIScience remote sensing techniques, can be used as a scientific tool for sustainable management and to assist with the implementation of environmental policy.
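Post-classification change detection of the kind used in this study compares per-pixel class labels between two dates and tabulates the transitions. A generic sketch with invented labels (not the study's classifications):

```python
from collections import Counter

def change_matrix(map_t1, map_t2):
    # Cross-tabulate per-pixel class labels from two dates into
    # "from -> to" transition counts.
    assert len(map_t1) == len(map_t2)
    return Counter(zip(map_t1, map_t2))

# Invented classified maps, flattened to 1-D
# (N = non-saline, S = saline, D = sodic).
t1 = ["N", "N", "S", "S", "S", "N", "D", "S"]
t2 = ["N", "S", "S", "D", "D", "N", "D", "S"]
cm = change_matrix(t1, t2)
for (a, b), count in sorted(cm.items()):
    print(f"{a} -> {b}: {count}")
```

The off-diagonal cells (here, saline pixels becoming sodic) are what reveal trends such as the shift from natural to human-induced salinization reported above.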
7 |
Détection et caractérisation du cancer de la prostate par images IRM 1.5T multiparamétriques / Computer-aided decision system for prostate cancer detection and characterization based on multi-parametric 1.5T MRI / Lehaire, Jérôme 04 October 2016 (has links)
Prostate cancer is the most frequent cancer in France and the fourth leading cause of cancer mortality. Current reference diagnostic methods are often insufficient to detect and precisely locate lesions. Multi-parametric MRI is now the most promising technique for diagnosis and follow-up of the disease. However, visual interpretation of the multiple MR sequences is not easy, and there is strong variability among expert radiologists, especially when the sequences are contradictory. Under these circumstances, there is strong interest in computer-aided diagnosis (CAD) systems aimed at assisting the radiologist in the final decision. This thesis presents the design of a computer-aided detection system (CADe) whose final goal is to provide the radiologist with a cancer probability map for the peripheral zone of the prostate. The system is based on 1.5T multi-parametric MRI (T2-weighted, dynamic contrast-enhanced, and diffusion-weighted sequences) from a dataset of 49 annotated patients, with ground truth obtained through a strict process of annotation and correlation between histological slices of prostatectomy specimens and MRI. The thesis addresses both cancer detection and characterization, aiming to produce a probability map correlated with tumour aggressiveness (Gleason score). We used a dictionary learning method to extract new descriptive features designed to discriminate cancer aggressiveness signatures. These features are then fed to two classifiers, logistic regression (LR) and support vector machines (SVM), to produce a cancer probability map. We concentrated on discriminating aggressive cancers (Gleason score > 6) from other tissues and provided an analysis of the correlation between probabilities and Gleason scores. The results show very good detection performance for aggressive cancers; the probability analysis concludes that the system separates aggressive cancers from other tissues well, but cannot easily distinguish each cancer grade.
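A probability map of the kind described is obtained by turning each voxel's classifier score into a probability, for instance through a logistic (Platt-style) link on a linear decision score. This is a generic sketch, not the thesis's pipeline; the feature values and weights are invented:

```python
import math

def probability_map(features, w, b):
    # Map per-voxel feature vectors to probabilities with a logistic
    # link applied to a linear decision score w.x + b.
    def prob(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 / (1 + math.exp(-score))
    return [[prob(x) for x in row] for row in features]

# Invented 2x2 "image": two features per voxel (think of two MR-derived
# values; the numbers and weights are purely illustrative).
features = [[(0.2, 0.1), (1.5, 1.2)],
            [(0.1, 0.0), (2.0, 1.8)]]
w, b = (1.0, 1.5), -2.0
pmap = probability_map(features, w, b)
print(pmap)
```

Each entry lies in (0, 1), so the result can be overlaid on the MR image as a heat map for the radiologist to inspect.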
8 |
ANÁLISE ORIENTADA A OBJETO GEOGRÁFICO NA CARACTERIZAÇÃO DO USO E OCUPAÇÃO DA TERRA EM SEGMENTOS DO RIO PITANGUI, PARANÁ: AVALIAÇÕES PRELIMINARES / Geographic object-based analysis for characterizing land use and land cover in segments of the Pitangui River, Paraná: preliminary assessments / Antunes, Dinameres Aparecida 04 March 2015 (has links)
Our major goal is to use GEOBIA (Geographic Object-Based Image Analysis) to characterize land use and land occupation in two zones near the Pitangui River, situated among the municipalities of Castro, Carambeí and Ponta Grossa, Paraná State. There is nowadays a growing demand for remote sensing images with high spatial resolution and, as a result, a need for new digital image processing methodologies. GEOBIA offers several advantages, such as the generation of a polygon for each geographic object created, a relational database with several descriptors, and the possibility of using spectral, spatial and texture information. We used Principal Component Analysis (PCA) and Cluster Analysis (CA) to reduce the dimensionality of the relational database of training samples used for supervised classification, so as to select the descriptors that yield the best classification results. We produced geographic object-oriented supervised classifications using descriptors selected by PCA, descriptors selected by PCA together with CA, and all descriptors generated by GEOBIA. The best results for both work zones were obtained with the descriptors selected by PCA together with CA. In work zone 1 we obtained 95.68% overall accuracy in the confusion matrix and a kappa index of 0.95, while in work zone 2 we obtained 93.85% and 0.93, values considered excellent in the literature. This shows that descriptor selection for geographic object-oriented classification matters, since descriptors with redundant information can mislead the classification result.
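The kappa index reported above is a standard chance-corrected agreement statistic computed from the confusion matrix. A minimal implementation with toy counts (not the study's matrices):

```python
def kappa(confusion):
    # Cohen's kappa from a square confusion matrix
    # (rows: reference classes, columns: classified classes).
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy 3-class confusion matrix with strong agreement.
cm = [[50, 2, 1],
      [3, 40, 2],
      [1, 1, 30]]
k = kappa(cm)
print(round(k, 3))
```

Unlike overall accuracy, kappa discounts the agreement expected by chance, which is why classification studies commonly report both figures side by side.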
9 |
Mapping land-use in north-western Nigeria (Case study of Dutse) / Anavberokhai, Isah January 2007 (has links)
This project analyzes satellite images from 1976, 1985 and 2000 of Dutse, Jigawa State, in north-western Nigeria. The analyzed satellite images were used to determine land-use and vegetation changes that occurred between 1976 and 2000; the results help recommend possible planning measures to protect the vegetation from further deterioration.

Studying land-use change in north-western Nigeria is essential for analyzing various ecological and developmental consequences over time. The north-western region of Nigeria is of great environmental and economic importance, having land cover rich in agricultural production and livestock grazing. Population growth over time has affected land-use, and hence agricultural and livestock production.

On completion of this project, the land-use changes that have taken place in Dutse are analyzed for future recommendations. The use of supervised classification and change detection on satellite images provides an economical way to quantify different types of land-use and the changes that have occurred over time.

The percentage difference in land-use between 1976 and 2000 was 37%, which is considered a high degree of land-use change for the period of study. The results of this project are being used to propose planning strategies that could help in planning sustainable land-use and diversity in Dutse.
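Percentage figures like the 37% quoted above come from a simple relative-change computation on per-class areas extracted from the classified images. A sketch with hypothetical class names and areas, not the study's data:

```python
def percent_change(area_start, area_end):
    # Relative change (%) of a land-use class area between two dates.
    return 100.0 * (area_end - area_start) / area_start

# Hypothetical class areas in hectares, not the study's actual figures.
areas_1976 = {"cropland": 400.0, "shrubland": 350.0, "built-up": 50.0}
areas_2000 = {"cropland": 520.0, "shrubland": 210.0, "built-up": 170.0}
changes = {cls: percent_change(areas_1976[cls], areas_2000[cls])
           for cls in areas_1976}
print(changes)
```

Negative values indicate a class that lost area (here, shrubland), which is the kind of signal that motivates the vegetation-protection measures the project recommends.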
10 |
Detecting Land Cover Change over a 20 Year Time Period in the Niagara Escarpment Plan Using Satellite Remote Sensing / Waite, Holly January 2009 (has links)
The Niagara Escarpment is one of Southern Ontario’s most important landscapes. Due to the nature of the landform and its location, the Escarpment is subject to various development pressures, including urban expansion, mineral resource extraction, agricultural practices and recreation. In 1985, Canada’s first large-scale environmentally based land use plan was put in place to ensure that only development compatible with the Escarpment occurred within the Niagara Escarpment Plan (NEP). The southern extent of the NEP is of particular interest in this study, since a portion of the Plan is located within the rapidly expanding Greater Toronto Area (GTA). The Plan areas located in the Regional Municipalities of Hamilton and Halton represent urban and rural geographical areas, respectively, and both are experiencing development pressures and subsequent changes in land cover. Monitoring initiatives on the NEP have been established, but have done little to identify consistent techniques for monitoring land cover on the Niagara Escarpment. Land cover information is an important part of planning and environmental monitoring initiatives, and remote sensing has the potential to provide frequent and accurate land cover information over various spatial scales. The goal of this research was to examine land cover change in the Regional Municipalities of Hamilton and Halton portions of the NEP. This was achieved through the creation of land cover maps for each region using Landsat 5 Thematic Mapper (TM) remotely sensed data. These maps aided in determining the qualitative and quantitative changes that occurred in the Plan area over a 20-year period from 1986 to 2006. Change was also examined based on the NEP’s land use designations, to determine whether Plan policy has been effective in protecting the Escarpment.
To obtain land cover maps, five different supervised classification methods were explored: Minimum Distance, Mahalanobis Distance, Maximum Likelihood, Object-oriented and Support Vector Machine. Seven land cover classes were mapped (forest, water, recreation, bare agricultural fields, vegetated agricultural fields, urban and mineral resource extraction areas) at a regional scale. SVM proved most successful at mapping land cover on the Escarpment, providing classification maps with an average accuracy of 86.7%. Land cover change analysis showed promising results with an increase in the forested class and only slight increases to the urban and mineral resource extraction classes. Negatively, there was a decrease in agricultural land overall. An examination of land cover change based on the NEP land use designations showed little change, other than change that is regulated under Plan policies, proving the success of the NEP for protecting vital Escarpment lands insofar as this can be revealed through remote sensing. Land cover should be monitored in the NEP consistently over time to ensure changes in the Plan area are compatible with the Niagara Escarpment. Remote sensing is a tool that can provide this information to the Niagara Escarpment Commission (NEC) in a timely, comprehensive and cost-effective way. The information gained from remotely sensed data can aid in environmental monitoring and policy planning into the future.