31

Do Economic Factors Help Forecast Political Turnover? Comparing Parametric and Nonparametric Approaches

Burghart, Ryan A. 22 April 2021
No description available.
32

Predicting base conservation scores in RNA 3D structures

Bulbul, Gul Bahar 11 August 2023
No description available.
33

Stock market estimation : Using Linear Regression and Random Forest

Kastberg, Daniel January 2022
Stock market speculation captivates many people, and millions worldwide buy and sell stocks in the hope of turning a profit. This thesis asks whether machine learning, specifically Random Forest or Linear Regression, can estimate the direction in which the stock market is trending, and whether Random Forest outperforms Linear Regression because it is the more complex method. To explore the question, several stocks from Nasdaq and the Swedish OMX index were studied and used to evaluate the models. The data was transformed to percentage changes to accommodate Random Forest's inability to extrapolate, and the percentage return on investment was chosen as the dependent variable. Without technical analysis both models performed poorly, but when RSI 14, EMA 10 and SMA 10 were added, both models proved significant, with Random Forest the superior of the two. Hyperparameter optimization was applied to Random Forest to test whether its advantage over Linear Regression could be widened, but it improved results in only half of the datasets, which made the outcome inconclusive. This thesis adds to the existing literature on predicting stock prices and examines whether Random Forest and Linear Regression differ clearly in their ability to estimate the direction of a stock's price in the near future.
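The preprocessing this abstract describes (percentage changes plus moving-average indicators) can be sketched in a few lines. This is a minimal illustration, not the thesis's code: the function names, the smoothing factor 2 / (window + 1) and the choice to seed the EMA with the first value are assumptions, and RSI is left out for brevity.

```python
def pct_change(prices):
    """Percentage change between consecutive prices."""
    return [100.0 * (b - a) / a for a, b in zip(prices, prices[1:])]


def sma(values, window):
    """Simple moving average: one value per full window."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


def ema(values, window):
    """Exponential moving average.

    Assumption: smoothing factor 2 / (window + 1), seeded with the
    first value. Other seeding conventions exist.
    """
    alpha = 2.0 / (window + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


# Hypothetical closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0,
          108.0, 110.0, 109.0, 111.0, 113.0, 112.0]
returns = pct_change(closes)
print(returns[:3])
print(sma(closes, 10))
print(ema(closes, 10)[-1])
```

Working with percentage changes rather than raw prices matters for tree ensembles because a regression forest predicts averages of training targets, so it cannot output a value outside the range it saw during training.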
34

Maskininlärningsklassificering av fordonsstatus för minskade reparationskostnader och avbrott inom kollektivtrafiken : Applicering av Random Forest-klassificering på fordonssignaler / Machine Learning Classification of Vehicle Status for Reduction of Cost and Downtime in Public Transport

Stopner, Julia, Willberg, Carl-Åke January 2022
As the modern, data-driven world continues to evolve, many institutions and corporations are eager to capitalize on their own data streams to optimize their operations. At the same time, a globalized world is searching for ways to reconcile a growing population that needs to move flexibly through sprawling cities with the urgent need to mitigate the climate impact of that mobility. The future of public transport stands at the intersection of these two trends, and much can be gained by letting machine learning uncover previously unseen patterns and obstacles in daily operations. This study explores whether a Random Forest classifier trained on historical data can be used to predict and prevent vehicle downtime in public transport caused by repair needs. The implemented classifier achieved an accuracy of 63.1% and a recall of 59.9%. The study concludes that the method has potential, although the quality and range of the signal data need to improve to raise the effectiveness of the model. Given further research and better internal data handling, this implies that a Random Forest model could deliver commercially measurable benefits with regard to downtime and repair costs.
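The two figures reported here, accuracy and recall, measure different things, and a small sketch makes the distinction concrete. The labels below are hypothetical and do not come from the thesis; the function name and the encoding (1 = vehicle needed repair) are assumptions for illustration.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels.

    accuracy = correct predictions / all predictions
    recall   = true positives / (true positives + false negatives)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical labels: 1 = vehicle needed repair, 0 = no repair needed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

For downtime prediction, recall is arguably the more operationally relevant number, since a missed repair need (a false negative) is what leads to an unplanned stop.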
35

EVALUATING THE PERFORMANCE OF PROCESS-BASED AND MACHINE LEARNING MODELS FOR RAINFALL-RUNOFF SIMULATION WITH APPLICATION OF SATELLITE AND RADAR PRECIPITATION PRODUCTS

Bhusal, Amrit 01 May 2023
Hydrologic modeling using HEC-HMS (Hydrologic Engineering Center's Hydrologic Modeling System) is accepted globally for event-based or continuous simulation of the rainfall-runoff process. Machine learning, meanwhile, is a fast-growing discipline that offers numerous alternatives suited to the high demands and limitations of hydrology research. Conventional process-based models such as HEC-HMS are typically created at specific spatiotemporal scales and do not easily accommodate diverse and complex input parameters. Therefore, this research compared the effectiveness of Random Forest, a machine learning model, with that of HEC-HMS for the rainfall-runoff process. In addition, point gauge observations have historically been the primary source of rainfall data for hydrologic models, but they do not accurately capture the spatial and temporal variability of rainfall, which is vital for hydrological models. This study therefore also evaluates the performance of satellite and radar precipitation products for hydrological analysis. The results revealed that an integrated machine learning and physically based model could provide more confidence in rainfall-runoff and flood depth prediction. The study also revealed that radar data outperformed rainfall data from gauging stations for hydrologic analysis in large watersheds. The discussion in this research should encourage researchers and system managers to improve current rainfall-runoff simulation models by applying machine learning and radar rainfall data.
36

Modelagem da produtividade da cultura da cana de açúcar por meio do uso de técnicas de mineração de dados / Modeling sugarcane yield through Data Mining techniques

Hammer, Ralph Guenther 27 July 2016
Understanding the hierarchy of importance of the factors that influence sugarcane yield can support its modeling, thus contributing to the optimization of agricultural planning at the sector's production units and to better crop yield estimates. The objectives of this study were to rank the variables that determine sugarcane yield according to their relative importance and to develop mathematical models for predicting sugarcane yield. To this end, three data mining techniques were applied to databases from several sugar mills in the State of São Paulo, Brazil. Meteorological and crop management variables were analyzed with Random Forest, Boosting and Support Vector Machines, and the resulting models were tested against an independent data set using the correlation coefficient (r), the Willmott index (d), the Camargo confidence index (c), the mean absolute error (MAE) and the root mean square error (RMSE). Finally, the predictive performance of these models was compared with that of an agrometeorological model applied to the same data set. Of all the variables analyzed, the number of cuts was the most important factor in every data mining technique. Comparing the observed yields with those estimated by the data mining techniques gave an RMSE ranging from 19.70 to 20.03 t ha⁻¹ in the most general approach, which covers all regions in the database. The predictive performance of the data mining algorithms was thus superior to that of the agrometeorological model, whose RMSE was ≈ 70% higher (≈ 34 t ha⁻¹).
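The five evaluation statistics named in this abstract can be computed from paired observed and predicted yields as sketched below. This is an illustration under stated assumptions rather than the thesis's implementation: it uses the original Willmott index of agreement, takes the Camargo confidence index as the product c = r · d, and the function name and sample values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return r, Willmott d, Camargo c, MAE and RMSE for paired series.

    Assumes both series vary (a constant series gives a zero denominator).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation coefficient.
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy)

    # Willmott index of agreement (original form).
    denom = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for o, p in zip(observed, predicted)
    )
    d = 1.0 - sum(e * e for e in errors) / denom

    # Camargo confidence index, taken here as c = r * d.
    c = r * d
    return {"r": r, "d": d, "c": c, "MAE": mae, "RMSE": rmse}


# Hypothetical yields in t/ha, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print(evaluation_metrics(obs, pred))
```

The sample data show why r alone is not enough: a prediction that is consistently one unit too high still has perfect correlation, while d, MAE and RMSE all register the bias.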
38

Análise e predição de bilheterias de filmes / Analysis and prediction of movie box office

FLORÊNCIO, João Carlos Procópio 29 February 2016
Predicting the success of a movie and, consequently, its box office performance is of great importance to the motion picture industry, from the pre-production phase, when investors want to know which movies are the most promising, to the weeks after release, when exhibitors want to predict the box office for the remaining weeks of exhibition. As a result, this area has been the subject of many studies that use different prediction approaches, in both feature selection and learning methods, to better predict movie success. In this master's thesis, the behavior of the main movie features (genre, MPAA rating, production budget, etc.) was investigated, focusing on box office results and their relation to those features, in order to get a clearer view of how a movie's features can influence its success, whether that success is interpreted as profit or as box office revenue. Then, using a movie database extracted from Box-Office Mojo and IMDb, a new box office prediction model was proposed based on the data available in that database: movie metadata, keywords and box office data. Some of these features are hybridized to highlight the most important feature combinations. A feature selection process is also applied to exclude irrelevant features. The model uses Random Forest as its learning algorithm. The results obtained with the proposed method suggest that, besides simplifying the model relative to previous studies, the method achieves a hit rate above 90% when classification is measured with the 1-away metric (a sample counts as correct when it is classified within one class of the right class), and that it improves prediction quality over previous studies when tested on the available database.
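The 1-away metric mentioned in this abstract is easy to state in code. The sketch below assumes the box office classes are ordered revenue bins encoded as consecutive integers; that encoding, the function name and the sample labels are hypothetical and not taken from the thesis.

```python
def exact_and_one_away_accuracy(y_true, y_pred):
    """Return (exact accuracy, 1-away accuracy) for ordinal class labels.

    A prediction counts as a 1-away hit when it lands on the true class
    or on a class directly adjacent to it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n
    one_away = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / n
    return exact, one_away


# Hypothetical ordinal classes (1 = lowest revenue bin).
true_classes = [1, 2, 3, 4]
predicted_classes = [1, 3, 5, 4]
exact, one_away = exact_and_one_away_accuracy(true_classes, predicted_classes)
print(f"exact={exact:.2f} one_away={one_away:.2f}")
```

Because 1-away accuracy forgives near misses, it is always at least as high as exact accuracy, which is worth keeping in mind when comparing a hit rate above 90% with figures from studies that report exact classification.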
39

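Uso de random forests e redes biológicas na associação de poliformismos à doença de Alzheimer / Use of random forests and biological networks in the association of polymorphisms with Alzheimer's disease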

ARAÚJO, Gilderlanio Santana de 07 March 2013
FACEPE / The development of low-cost genotyping techniques (SNP arrays) and the annotation of thousands of single nucleotide polymorphisms (SNPs) in public databases have led to an increasing number of genome-wide association studies (GWAS). In these studies, a very large number of SNPs (hundreds of thousands) are evaluated with univariate statistical methods in order to find SNPs associated with a particular phenotype. Univariate tests cannot capture high-order relationships among SNPs, which are common in complex genetic diseases, and they are affected by the high correlation between SNPs in the same genomic region. Machine learning methods such as Random Forest (RF) have been applied to GWAS data to predict disease risk and to identify the SNPs associated with those diseases. Although RF is a method with recognized performance on high-dimensional data and the capacity to capture non-linear relationships, using all the SNPs present in a GWAS is computationally infeasible. In this study we propose using biological networks for the initial selection of candidate SNPs to be used by RF. Starting from an initial set of genes already related to the disease in the literature, we use tools that construct gene-gene interaction networks to find novel genes that might be associated with the disease. This makes it possible to extract a reduced number of SNPs, which makes applying RF feasible. The experiments conducted in this study focus on investigating which polymorphisms may influence susceptibility to Alzheimer’s disease (AD) and mild cognitive impairment (MCI). The final result is the outline of a methodology for using RF to analyze GWAS data, together with the characterization of potential risk factors for AD.
40

Le rôle des facteurs environnementaux sur la concentration des métaux-traces dans les lacs urbains - Lac de Pampulha, Lac de Créteil et 49 lacs péri-urbains d’Ile de France / The role of environmental factors on trace-metal concentrations in urban lakes - Lake Pampulha, Lake Créteil and 49 lakes in the Ile-de-France region

Tran khac, Viet 19 December 2016
Lakes have a particular influence on the water cycle in urban catchments. Thermal stratification and a long water residence time favor phytoplankton growth. Most metals are naturally found in the environment in trace amounts. Trace metals are essential to the growth and reproduction of organisms, but some are also well known for their toxic effects on animals and humans. Total metal concentrations do not reflect their ecotoxicity, which depends on their properties and speciation (particulate and dissolved fractions, the dissolved fraction being either labile, that is potentially bioavailable, or inert). In aquatic systems, trace metals can be adsorbed to various components, including inorganic and organic ligands. Their ability to bind to dissolved organic matter (DOM), in particular humic substances, has been widely studied. In urban lakes, phytoplankton development can produce autochthonous, non-humic DOM that can also bind metals. Yet there are few studies of trace metal speciation in the water column of urban lakes.
The main objectives of this thesis are (1) to obtain a consistent database of trace metal concentrations in the water column of representative urban lakes; (2) to assess their bioavailability with a suitable speciation technique; (3) to analyze the seasonal and spatial evolution of the metals and their speciation; (4) to study the potential impact of environmental variables, particularly dissolved organic matter related to phytoplankton production, on metal bioavailability; and (5) to link metal concentrations to land use in the lake watershed.
Our methodology is based on a dense field survey of the water bodies, complemented by specific laboratory analyses. The research was conducted at three study sites: Lake Créteil (France), Lake Pampulha (Brazil) and a panel of 49 peri-urban lakes (Ile-de-France). Lake Créteil is an urban lake affected by anthropogenic pollution. It is equipped with a large number of monitoring instruments, which allowed us to collect part of the data set. In the Lake Pampulha catchment, anthropogenic pressure is high, and the lake faces many point and non-point pollution sources. The climate and limnological characteristics of the two lakes are also very different. The panel of 49 lakes in Ile-de-France was sampled once during three successive summers (2011-2013); these lakes provided a synoptic data set representative of regional metal contamination in a densely anthropized region.
To explain the role of environmental variables in metal concentrations, we applied the Random Forest (RF) model to the Lake Pampulha data set and to the 49-lake data set with two specific objectives: (1) in Lake Pampulha, to understand the role of environmental variables in the labile trace metal concentration, considered potentially bioavailable; and (2) in the 49 lakes, to understand the relationship between environmental variables, particularly watershed variables, and dissolved metal concentrations. The analysis of the relationships between trace metal speciation and environmental variables provided the key results of this thesis.
In Lake Pampulha, around 80% of the variance of labile cobalt is explained by limnological variables: Chl a, O2, pH and total phosphorus. For the other metals, the RF model did not explain more than 50% of the relationship between the metals and the limnological variables.
In the 49 urban lakes in Ile-de-France, the RF model gave a good result for Co (66% of explained variance) and a very good result for Ni (86% of explained variance). For Ni, the best explanatory variables are land-use variables such as “activities” (facilities for water, sanitation and energy, logistics warehouses, shops, offices…) and “landfill”. This result is consistent with Lake Créteil, where the dissolved Ni concentration is particularly high and where the “activities” and “landfill” land-use categories are dominant.
