71

Machine Learning Approaches to Refining Post-translational Modification Predictions and Protein Identifications from Tandem Mass Spectrometry

Chung, Clement 11 December 2012 (has links)
Tandem mass spectrometry (MS/MS) is the dominant approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. The computational analysis of MS/MS spectra involves the identification of peptides from experimental spectra, especially those with post-translational modifications (PTMs), as well as the inference of protein composition based on the putatively identified peptides. In this thesis, we tackled two major challenges associated with an MS/MS analysis: 1) the refinement of PTM predictions from MS/MS spectra and 2) the inference of protein composition based on peptide predictions. We proposed two PTM prediction refinement algorithms, PTMClust and its Bayesian nonparametric extension iPTMClust, and a protein identification algorithm, pro-HAP, which is based on a novel two-layer hierarchical clustering approach that leverages prior knowledge about protein function. Individually, we show that our two PTM refinement algorithms outperform the state-of-the-art algorithms and that our protein identification algorithm performs on par with the state of the art. Collectively, as a demonstration of our end-to-end MS/MS computational analysis in a study of a human chromatin protein complex, we show that our analysis pipeline can find high-confidence putative novel protein complex members. Moreover, it can provide valuable insights into the formation and regulation of protein complexes by detailing the specificity of different PTMs for the members of each complex.
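The refinement step can be pictured as pooling site-level PTM predictions across spectra and keeping the best-supported site per peptide. The sketch below is not the PTMClust model itself (which is a model-based clustering approach); it is a minimal, hypothetical consensus heuristic in the same spirit, with invented peptide names and confidence values.

```python
from collections import defaultdict

def refine_ptm_sites(predictions):
    """Consensus refinement of PTM site predictions.

    predictions: list of (peptide, site, confidence) tuples, one per
    spectrum. Predictions for the same peptide are pooled, and the
    site with the highest total confidence is kept as the consensus.
    """
    pooled = defaultdict(lambda: defaultdict(float))
    for peptide, site, conf in predictions:
        pooled[peptide][site] += conf
    return {pep: max(sites, key=sites.get) for pep, sites in pooled.items()}

# Three spectra vote on the modification site of one peptide.
preds = [("PEPTIDEK", 3, 0.9), ("PEPTIDEK", 3, 0.8), ("PEPTIDEK", 5, 0.4)]
print(refine_ptm_sites(preds))  # {'PEPTIDEK': 3}
```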
72

Intra-metropolitan agglomerations of producer services firms: the case of graphic design firms in metropolitan Melbourne, 1981-2001

Elliott, Peter Vincent Unknown Date (has links) (PDF)
Graphic design is one part of the producer services sector of the modern metropolitan region. It is a sector that has experienced considerable growth in the number of firms through demand created by the expansion of advertising and multimedia. To date, research has established that producer services, particularly finance-related ones, agglomerate in the central city to take advantage of the agglomeration economies available in large metropolitan areas. This thesis argues that one of the key factors in the agglomeration of graphic design is the need for face-to-face communication with clients and other firms. Some work has looked at the location of non-finance producer services, such as design, although these studies have been presented as snapshots at a point in time. This thesis extends this understanding through an analysis of agglomerations of graphic design firms over a twenty-year time horizon. Using details of firm locations in Melbourne every five years from 1981 to 2001, the thesis applies a geospatial analytical technique to identify agglomerations and explores the change in the size, location and density of agglomerations of firms. This research shows that the initial agglomeration of 1981 was still present by 2001 and had been joined by a number of new agglomerations ringing the Melbourne CBD, while at the same time there has also been a dispersal of firms to the middle suburbs. To provide some insight into the agglomeration of graphic design firms, this research also examines the geography of two industries allied to graphic design: advertising and printing. It shows that graphic designers and advertising agencies tend to locate in similar parts of inner Melbourne, which may be due to the need for face-to-face contact between firms in these two industries. (For complete abstract open document)
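Identifying agglomerations from point locations can be illustrated with a toy density scan: flag grid cells that contain enough firm locations. This is only an illustrative stand-in, not the geospatial technique used in the thesis; the coordinates, cell size, and threshold are invented.

```python
from collections import Counter

def find_agglomerations(coords, cell=1.0, min_firms=3):
    """Grid-based density scan: bin firm locations into square cells
    of side `cell`, and flag cells holding at least `min_firms` firms
    as agglomerations."""
    cells = Counter((int(x // cell), int(y // cell)) for x, y in coords)
    return {c for c, n in cells.items() if n >= min_firms}

# Three firms cluster near the origin; one outlier sits far away.
firms = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.1), (5.0, 5.0)]
print(find_agglomerations(firms))  # {(0, 0)}
```

Re-running the scan on successive five-year snapshots would show agglomerations appearing, growing, or dispersing, which is the kind of longitudinal comparison the thesis performs.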
73

Bio-optical characterization of the Salish Sea, Canada, towards improved chlorophyll algorithms for MODIS and Sentinel-3

Phillips, Stephen Robert 22 December 2015 (has links)
The goal of this research was to improve ocean colour chlorophyll a (Chla) retrievals in the coastal Case 2 waters of the Salish Sea by characterizing the main drivers of optical variability and using this information to parameterize empirical algorithms based on an optical classification. This was addressed with three specific objectives: (1) build a comprehensive spatio-temporal data set of in situ optical and biogeochemical parameters, (2) apply a hierarchical clustering analysis to classify above-water remote sensing reflectance (Rrs) and associated bio-optical regimes, and (3) optimize and validate class-specific empirical algorithms for improved Chla retrievals. Biogeochemical and optical measurements, acquired at 145 sites, showed considerable variation in Chla (mean = 1.64, range: 0.10 - 7.20 µg l-1), total suspended matter (TSM) (3.09, 0.82 - 20.69 mg l-1), and absorption by chromophoric dissolved organic matter (a_cdom(443)) (0.525, 0.007 - 3.072 m-1), thus representing the spatial and temporal variability of the Salish Sea. A comparable range was found in the measured optical properties: particulate scattering (b_p(650)) (1.316, 0.250 - 7.450 m-1), particulate backscattering (b_bp(650)) (0.022, 0.005 - 0.097 m-1), total beam attenuation coefficient (c_t(650)) (1.675, 0.371 - 9.537 m-1), and particulate absorption coefficient (a_p(650)) (0.345, 0.048 - 2.020 m-1). Empirical orthogonal function (EOF) analysis revealed that 95% of the Rrs variance was highly correlated with b_p (r = 0.90), b_bp (r = 0.82), and TSM concentration (r = 0.80), suggesting a strong influence from riverine systems in this region. Hierarchical clustering on the normalized Rrs revealed four spectral classes.
Class 1 is defined by high overall Rrs magnitudes in the red, indicating more turbid waters; Class 2 showed high Rrs values in the red and well-defined fluorescence and absorption features, indicating a high Chla and TSM presence; Class 3 showed low TSM influence and more defined Chla signatures; and Class 4 is characterized by overall low Rrs values, suggesting optically clearer oceanic waters. Spectral similarities justified a simplification of this classification into two dominant water classes, (1) an estuarine class (Classes 1 and 2) and (2) an oceanic class (Classes 3 and 4), representing the dominant influences seen here. In situ Chla and above-water remote sensing reflectance measurements, used to validate and parameterize the OC3M/OC3S3, two-band ratio, FLH and modified FLH (ModFLH) empirical algorithms, showed a systematic overestimation of low Chla concentrations and underestimation of higher Chla values for all four algorithms when tuned to regional data. The FLH and ModFLH algorithms performed best for these data (R2 ~ 0.40; RMSE ~ 0.32). Algorithm accuracy was significantly improved for the class-specific parameterizations, with the two-band ratio showing a strong correlation to the Chla concentrations in the estuarine class (R2 ~ 0.71; RMSE ~ 0.33) and the ModFLH algorithm in the oceanic class (R2 ~ 0.70; RMSE ~ 0.26). These results demonstrate the benefit of applying an optical classification as a necessary first step in improving Chla retrievals from remotely sensed data in the contrasted coastal waters of the Salish Sea. With accurate Chla information, the health of the Salish Sea can be viably monitored at spatial and temporal scales suitable for ecosystem management.
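A class-specific empirical algorithm of the two-band-ratio kind has a simple functional form: a polynomial in the log of a blue-to-green reflectance ratio, with coefficients fitted separately per optical class. The sketch below uses a first-order polynomial with placeholder coefficients, not the values tuned to the Salish Sea data.

```python
import math

def two_band_chla(rrs_blue, rrs_green, coeffs=(0.3, -2.5)):
    """Two-band empirical Chla retrieval:
    log10(Chla) = a0 + a1 * log10(Rrs_blue / Rrs_green).
    `coeffs` are illustrative placeholders; in a class-specific scheme,
    a separate (a0, a1) pair would be fitted for the estuarine and
    oceanic classes."""
    r = math.log10(rrs_blue / rrs_green)
    a0, a1 = coeffs
    return 10 ** (a0 + a1 * r)

# Equal blue and green reflectance gives the baseline 10**a0 value;
# a higher blue/green ratio maps to lower Chla (a1 < 0).
print(round(two_band_chla(0.004, 0.004), 3))
```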
74

Analysis of today's train division: An assignment from Trafikverket

Grek, Viktoria, Gabrielsson, Molinia January 2018 (has links)
The information used in this paper comes from Trafikverket's delivery monitoring system. It consists of information about planned train missions on the Swedish railways for the years 2014 to 2017 during week four (except planned train missions on Roslagsbanan and Saltsjöbanan). Trafikanalys, with help from Trafikverket, presents official statistics for short-distance trains, middle-distance trains and long-distance trains on the Trafikanalys website. These three classes of trains have no scientific basis. The purpose of this study is therefore to analyze whether today's classes of trains can be used and which variables are important for the classification. The purpose is also to analyze whether there is a better way to categorize the classes of trains when Trafikanalys publishes official statistics. The statistical methods used in this study are decision trees, neural networks and hierarchical clustering. The decision tree achieved a 92.51 percent accuracy for the classification of Train type. The most important variables for Train type were Train length, Planned train kilometers and Planned km/h. Neural networks were used to investigate whether this method could provide a similar result to the decision tree and thereby strengthen the reliability; they achieved an 88 percent accuracy when classifying Train type. Together, these two results indicate that the large majority of train missions could be classified to the correct Train type, which means that the current classification of Train type works when Trafikanalys presents official statistics. For the new train classification, three groups were analyzed using hierarchical clustering. These three groups were not the same as the existing groups of short-distance, middle-distance and long-distance trains.
Because the new divisions mixed the different passenger trains, this result does not help to find a better subdivision that can be used when Trafikanalys presents official statistics.
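The rule set a decision tree learns for Train type can be caricatured as a few threshold tests on the most important variables (Train length, Planned train kilometers, Planned km/h). The thresholds below are invented for illustration, not those learned from the Trafikverket data.

```python
def classify_train(train_length_m, planned_km, planned_kmh):
    """Toy rule-based stand-in for a fitted decision tree over the
    three most important variables. All thresholds are hypothetical."""
    if train_length_m > 200 or planned_km > 300:
        return "long-distance"
    if planned_kmh > 90 or planned_km > 80:
        return "middle-distance"
    return "short-distance"

# A long, fast mission; a fast regional mission; a short local mission.
print(classify_train(250, 500, 120))  # long-distance
print(classify_train(120, 100, 100))  # middle-distance
print(classify_train(60, 30, 70))     # short-distance
```

A real tree would learn such splits from labeled missions; accuracy is then the share of held-out missions routed to the correct class, which is how the 92.51 percent figure is read.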
75

Risk-based modeling, simulation and optimization for the integration of renewable distributed generation into electric power networks

Mena, Rodrigo 30 June 2015 (has links)
Renewable distributed generation (DG) is expected to continue playing a fundamental role in the development and operation of sustainable, efficient and reliable electric power systems, by virtue of offering a practical alternative to diversify and decentralize the overall power generation, benefiting from cleaner and safer energy sources. The integration of renewable DG into existing electric power networks poses socio-techno-economic challenges, which have attracted substantial research and advancement. In this context, the focus of the present thesis is the design and development of a modeling, simulation and optimization framework for the integration of renewable DG into electric power networks. The specific problem considered is that of selecting the technology, size and location of renewable generation units, under technical, operational and economic constraints. Within this problem, the key research questions to be addressed are: (i) the representation and treatment of the uncertain physical variables (like the availability of diverse primary renewable energy sources, bulk-power supply, power demands and the occurrence of component failures) that dynamically determine the DG-integrated network operation, (ii) the propagation of these uncertainties onto the system operational response and the control of the associated risk, and (iii) the intensive computational efforts resulting from the complex combinatorial optimization problem of renewable DG integration. For the evaluation of the system with a given plan of renewable DG, a non-sequential Monte Carlo simulation and optimal power flow (MCS-OPF) computational model has been designed and implemented that emulates the DG-integrated network operation. Random realizations of operational scenarios are generated by sampling from the distributions of the different uncertain variables, and for each scenario the system performance is evaluated in terms of economics and reliability of power supply, represented by the global cost (CG) and the energy not supplied (ENS), respectively. To measure and control the risk relative to system performance, two indicators are introduced: the conditional value-at-risk (CVaR) and the CVaR deviation (DCVaR). For the optimal selection of the technology, size and location of the renewable DG units, two distinct multi-objective optimization (MOO) approaches have been implemented with heuristic optimization (HO) search engines. The first approach is based on the fast non-dominated sorting genetic algorithm (NSGA-II) and aims at the concurrent minimization of the expected values of CG and ENS, denoted ECG and EENS, respectively, combined with their corresponding CVaR(CG) and CVaR(ENS) values; the second approach carries out a MOO differential evolution (DE) search to minimize simultaneously ECG and its associated deviation DCVaR(CG). Both optimization approaches embed the MCS-OPF computational model to evaluate the performance of each DG-integrated network proposed by the HO search engine. The challenge arising from the large computational efforts required by the proposed simulation and optimization frameworks has been addressed by introducing an original technique that nests hierarchical clustering analysis (HCA) within a DE search engine. Examples of application of the proposed frameworks have been worked out, regarding an adaptation of the IEEE 13-bus distribution test feeder and a realistic setting of the IEEE 30-bus sub-transmission and distribution test system. The results show that these frameworks are effective in finding optimal DG-integrated network solutions, while controlling risk from two distinct perspectives: directly through the use of CVaR and indirectly by targeting uncertainty in the form of DCVaR. Moreover, CVaR acts as an enabler of trade-offs between optimal expected performance and risk, while DCVaR also integrates uncertainty into the analysis, providing a wider spectrum of information for well-supported and confident decision making.
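CVaR itself is straightforward to estimate from sampled scenario outcomes: it is the mean of the worst (1 - alpha) fraction of the loss distribution. A minimal sketch over hypothetical sampled losses (e.g. CG values from MCS-OPF scenarios):

```python
def cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk: the mean of the worst
    (1 - alpha) fraction of sampled losses."""
    s = sorted(losses)
    k = max(1, int(round((1 - alpha) * len(s))))  # tail sample count
    tail = s[-k:]
    return sum(tail) / len(tail)

# With losses 1..100 and alpha = 0.95, CVaR is the mean of the 5 worst.
print(cvar(list(range(1, 101)), alpha=0.95))  # 98.0
```

DCVaR would then be a deviation-type companion measure (roughly, CVaR of the loss minus its expectation), which is why minimizing it targets uncertainty rather than expected cost.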
76

Selecting candidate labels for hierarchical document clusters using association rules

Fabiano Fernandes dos Santos 17 September 2010 (has links)
One way to organize knowledge, which has received much attention in recent years, is through a structural representation divided into hierarchically related topics. Once the hierarchical structure is built, it is necessary to find labels for each of the obtained clusters, since most algorithms do not produce simple conceptual descriptions and the interpretation of these clusters is a difficult task for users.
Related works consider each document as a bag-of-words and do not explicitly explore the relationships between the terms of the documents in a cluster. However, these relationships can provide important information for deciding which terms should be chosen as descriptors of the nodes, and they can be represented by association rules. This work aims to evaluate the use of association rules to support the identification of labels for hierarchical document clusters. To this end, it presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules to select good candidate labels for hierarchical clusters of documents. The method generates association rules based on transactions built from each document in the collection, and uses the relationship information between the nodes of the hierarchical clustering to select candidate labels. The experimental results show that it is possible to obtain a significant improvement in precision and recall over traditional methods.
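The rule-generation step can be sketched as mining pairwise rules from document transactions under support and confidence thresholds. This is a simplified, pairs-only illustration rather than the actual SeCLAR implementation; the transactions and thresholds are invented.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.5, min_conf=0.8):
    """Mine pairwise association rules ant -> con from transactions
    (sets of terms, one per document). A rule survives if the term
    pair is frequent enough (support) and the consequent usually
    co-occurs with the antecedent (confidence)."""
    n = len(transactions)

    def support(items):
        return sum(1 for t in transactions if items <= t) / n

    terms = set().union(*transactions)
    rules = []
    for a, b in combinations(sorted(terms), 2):
        for ant, con in ((a, b), (b, a)):
            s = support({ant, con})
            conf = s / support({ant}) if support({ant}) else 0.0
            if s >= min_support and conf >= min_conf:
                rules.append((ant, con, conf))
    return rules

# "label" always co-occurs with "cluster", so label -> cluster survives;
# the reverse rule has confidence 2/3 and is filtered out.
docs = [{"cluster", "label"}, {"cluster", "label"}, {"cluster"}]
print(mine_rules(docs))
```

Terms appearing as antecedents of surviving rules are natural label candidates, since they imply the rest of the cluster's vocabulary.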
77

Clustering the Web: Comparing Clustering Methods in Swedish

Hinz, Joel January 2013 (has links)
Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. Stop word and stemmer usage results are not significant, and appear to not affect the clustering by any considerable magnitude.
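Internal evaluation metrics score a clustering without reference labels. The thesis does not name its four metrics here, so the following is a generic sketch of one common choice, the silhouette coefficient, reduced to 1-D points for brevity.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points: for each point,
    compare its mean distance to its own cluster (a) against its mean
    distance to the nearest other cluster (b); (b - a) / max(a, b)
    is near 1 for tight, well-separated clusters."""
    def dist(a, b):
        return abs(a - b)

    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own) if own else 0.0
        b = min(
            sum(dist(p, q) for j, q in enumerate(points)
                if labels[j] == lab) / labels.count(lab)
            for lab in set(labels) if lab != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated clusters score close to 1.
print(round(silhouette([0, 1, 10, 11], [0, 0, 1, 1]), 3))
```

Running such a metric over the outputs of k-means, bisecting k-means, and agglomerative clustering on the same corpus is the comparison pattern the thesis applies across its 32 corpora.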
78

Statistical study of the variability of atmospheric desert aerosol concentrations over West Africa

Kaly, François 13 April 2016 (has links)
The aim of this thesis is to document the variability of the atmospheric mineral dust content in West Africa and to understand the mechanisms that control it. Analysis of concentrations measured at ground level in the Sahel shows that the transport of Saharan aerosols is responsible for the maximum concentrations observed in the dry season (Marticorena et al., 2010).
These events show very high variability on intra-seasonal and interannual scales, with some persistence at the regional scale. What are the weather conditions that explain this variability? The answer to this question can only be obtained through a systematic, joint analysis of regional meteorological conditions and desert aerosol measurements over the continent (surface concentration, aerosol optical thickness) and off the coast of West Africa. Aerosol optical thickness (AOT) measurements allow global monitoring of the atmospheric particulate content. However, the relationship used to retrieve mass concentrations (PM10) from optical thickness measurements is not always simple. First, the analysis of the variability of PM10 concentrations and of local meteorological conditions measured at ground level confirmed the regional seasonal cycle presented by Marticorena et al. (2010), characterized by a maximum concentration during the dry season and a minimum during the rainy season at all three stations. Analysis of the diurnal cycle of concentrations showed that it differs with the season. In the dry season, the Saharan nocturnal low-level jet (NLLJ) appears to modulate the diurnal cycle of particulate concentrations measured in the Sahel, while in the rainy season the diurnal evolution of dust concentrations appears to be modulated by the evolution of convective systems at the regional scale. Second, a methodology based on a classification into weather types was proposed. It highlights typical meteorological situations in which the AOT-PM10 relationship is simplified, allowing PM10 to be modeled from the aerosol optical thickness. Two approaches were adopted. First, a family of six raw weather regimes was defined, describing the climatic seasonality. Relating AOT to PM10 shows that the weather regimes characteristic of the harmattan flow yield a better relationship, in terms of correlation, than the relationship obtained before separation into weather types. Then, six weather types affecting the interannual variability of aerosols were defined. The resulting weather regimes show high interannual variability and are less persistent than the raw weather regimes. They are characterized by atmospheric circulation anomalies, either cyclonic or anticyclonic, at the synoptic scale and centered on the northern part of the domain. Integrating this weather-type approach yields satisfactory results in forecasting PM10 values, opening perspectives for an early warning system.
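The class-specific AOT-PM10 relationship amounts to fitting one regression per weather type instead of a single global one. Below is a minimal ordinary-least-squares sketch; the weather-type labels and sample values are invented for illustration.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def class_specific_pm10(samples):
    """Fit one AOT -> PM10 line per weather type. `samples` is a list
    of (weather_type, aot, pm10) tuples; the weather-type labels are
    assumed to come from a prior classification step."""
    fits = {}
    for wt in {w for w, _, _ in samples}:
        xs = [a for w, a, _ in samples if w == wt]
        ys = [p for w, _, p in samples if w == wt]
        fits[wt] = fit_line(xs, ys)
    return fits

obs = [("harmattan", 0.1, 10.0), ("harmattan", 0.2, 20.0),
       ("harmattan", 0.3, 30.0), ("monsoon", 0.1, 5.0),
       ("monsoon", 0.3, 9.0)]
print(class_specific_pm10(obs))
```

Predicting PM10 then means classifying the day's meteorology into a weather type and applying that type's fitted line, which is the forecasting chain the early-warning perspective relies on.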
79

Approaches to detect and classify Megavirales

Sharma, Vikas 23 October 2015 (has links)
Nucleocytoplasmic large DNA viruses (NCLDVs), representatives of the order Megavirales, belong to families of giant viruses that infect a large number of eukaryotic hosts.
These viruses genomes size ranges from 100 kb to 2.5 mb and compose surprising features, which raised various questions about their origin and evolution. Environmental metagenomic studies showed that there is a “dark matter”, composed of sequences not linked to any known organism. However sequence identification was mainly determined using ribosomal DNA (rDNA) sequences, which led therefore to ignore viruses, because they are devoid of such genes. Informational genes, including DNA-dependant RNA polymerase (RNAP), are other markers that appear as more appropriate for a comprehensive classification as they are conserved in cellular organisms (Bacteria, Archaea and Eukarya) and in Megavirales. We used a small set of universally conserved genes that included RNAP and reconstructed ancestral sequences to search for megavirus relatives in sequence databases and to perform phylogeny reconstructions. This allowed identified three megaviral sequences that were misannotated as cellular orgainsms, and new viral clades in environmental databases. In addition, we delineated Megavirales as a fourth monophylogenetic TRUC (things resisting uncompleted classification) aside cellular organisms. Moreover, we classified by phylogenetic and phyletic analyses based on informational genes new giant viruses as new bona fide members of the fourth TRUC. Our analyses shows that RNAP as well as a few other genes used in our studies allow a more comprehensive overview and classification of the biological diversity than rDNA.
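As a purely illustrative sketch (not the thesis pipeline), the idea of deriving a representative query sequence from conserved marker genes can be approximated with a majority-rule consensus over aligned sequences; real ancestral sequence reconstruction uses phylogeny-aware probabilistic models, and the sequences below are hypothetical toy data.

```python
# Illustrative sketch only: a majority-rule consensus of aligned sequences,
# a crude stand-in for ancestral sequence reconstruction of a conserved
# marker gene (e.g. RNAP) used to query databases for distant relatives.
from collections import Counter

def consensus(aligned_seqs):
    """Return the most frequent residue at each alignment column."""
    assert len({len(s) for s in aligned_seqs}) == 1, "sequences must be aligned"
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_seqs)
    )

# Toy aligned fragment of a conserved marker gene (hypothetical data).
seqs = ["MKTAYI", "MKSAYI", "MKTAFI"]
print(consensus(seqs))  # prints "MKTAYI"
```

A consensus built this way smooths over lineage-specific variation, which is why ancestral or consensus queries can retrieve relatives that individual modern sequences miss.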
80

High-dimensional VAR analysis of regional house prices in United States / Analýza regionálních cen nemovitostí ve Spojených státech pomocí vysokodimenzionálního VAR modelu

Krčál, Adam January 2015 (has links)
This thesis investigates the heterogeneity of regional real estate prices in the United States. A high-dimensional VAR model with additional exogenous predictors, originally introduced by \cite{fan11}, is adopted. In this framework, the common factor in regional house price dynamics is explained by exogenous predictors, while spatial dependencies are captured by lagged house prices in other regions. For estimation and variable selection in a high-dimensional setting, the concept of penalized least squares (PLS) with different penalty functions (e.g. the LASSO penalty) is studied in detail and implemented. Moreover, clustering methods are employed to identify subsets of statistical regions with similar house price dynamics. It is demonstrated that these clusters are geographically well defined and contribute to a better interpretation of the VAR model. Next, we make use of the variable-selection property of the LASSO to construct impulse response functions and to simulate price behavior when a shock occurs. Finally, one-period-ahead forecasts from the VAR model are compared with those from the Diffusion Index Factor Model of \cite{stock02}, a commonly used forecasting model.
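A minimal sketch of the estimation idea (not the thesis implementation, and without exogenous predictors): each VAR equation is fit separately by LASSO-penalized least squares, so the L1 penalty performs variable selection by zeroing out weak cross-region links. The data, penalty level, and dimensions here are all hypothetical.

```python
# Hypothetical sketch: equation-by-equation LASSO estimation of a sparse
# VAR(1), illustrating penalized least squares with variable selection.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_regions, T = 5, 400

# Simulate a sparse VAR(1): each region depends on itself and one neighbor.
A = np.zeros((n_regions, n_regions))
np.fill_diagonal(A, 0.5)
A[np.arange(1, n_regions), np.arange(n_regions - 1)] = 0.3
Y = np.zeros((T, n_regions))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.standard_normal(n_regions)

# Penalized least squares: for each region i, regress y_{i,t} on all
# lagged prices y_{j,t-1}; the L1 penalty shrinks spurious links to zero.
X, targets = Y[:-1], Y[1:]
A_hat = np.vstack([
    Lasso(alpha=0.05, fit_intercept=False).fit(X, targets[:, i]).coef_
    for i in range(n_regions)
])

print("estimated coefficient matrix:\n", A_hat.round(2))
```

The resulting sparsity pattern of `A_hat` is what makes the subsequent steps tractable: impulse responses only need to propagate through the retained links, and regions can be clustered by the similarity of their estimated dynamics.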
