
EFFECT OF THE PROBABILISTIC MODELING OF PARAMETERS IN THE ANALYSIS OF THE INTERFERENCE PRODUCED BY MULTIPLE GEOSTATIONARY SATELLITES IN FIXED SERVICE RECEIVERS

HYGSON ASSEF PEREIRA DA ROCHA, 17 November 2016
The sharing of frequencies between satellite communication systems and terrestrial systems has been studied since the appearance of the first commercial satellite communication systems. This dissertation analyzes the protection of Fixed Service (FS) receivers from the aggregate interference produced by the transmissions of multiple Fixed Satellite Service (FSS) geostationary satellites in the 7 GHz band. Recent studies within the Radiocommunication Sector of the International Telecommunication Union (ITU-R) have considered conservative deterministic mathematical models. This work develops a probabilistic analysis of this aggregate interference, based on a mathematical model in which the power flux densities produced by the satellites on the Earth's surface and the side-lobe gains of the FS receiving antenna are characterized as random variables. The proposed mathematical model was applied to several specific scenarios, and the results were compared to those obtained from the deterministic calculation.
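As a rough illustration of the approach this abstract describes, the aggregate interference can be estimated by Monte Carlo simulation once the per-satellite power flux-density and the receiver side-lobe gain are treated as random variables. The sketch below is not the dissertation's model: the satellite count, distributions, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAT = 8            # number of interfering GSO satellites (assumption)
N_TRIALS = 100_000   # Monte Carlo trials

# Illustrative distributions (assumptions, not values from the dissertation):
# power flux-density produced by each satellite at the FS site, dBW/m^2
pfd_db = rng.normal(loc=-150.0, scale=3.0, size=(N_TRIALS, N_SAT))
# FS receiving-antenna side-lobe gain toward each satellite, dBi
gain_db = rng.uniform(low=-10.0, high=5.0, size=(N_TRIALS, N_SAT))

# Aggregate interference: linear-domain sum of the per-satellite terms
i_lin = np.sum(10.0 ** ((pfd_db + gain_db) / 10.0), axis=1)
i_db = 10.0 * np.log10(i_lin)

# Deterministic worst case: every satellite simultaneously at a high pfd
# (mean + 3 sigma) and at the maximum side-lobe gain
worst_db = 10.0 * np.log10(N_SAT) + (-150.0 + 3 * 3.0) + 5.0

p_exceed = np.mean(i_db > worst_db)
print(f"median aggregate interference: {np.median(i_db):.1f} dB")
print(f"P(aggregate > deterministic worst case) = {p_exceed:.5f}")
```

Replacing worst-case point values with distributions turns the aggregate interference into a random variable, so protection criteria can be checked as exceedance probabilities rather than single conservative numbers.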

EVALUATION OF INTERFERENCE AMONG SATELLITE COMMUNICATIONS NETWORKS: JOINT EFFECT OF THE PROBABILISTIC MODELING OF EARTH STATION ANTENNA LOCATIONS AND ANTENNA SIDELOBE GAINS

ALBERTH RONAL TAMO CALLA, 25 August 2014
When multiple communication systems share a particular frequency band, each system operates subject to the interference generated by the others. In this environment, an accurate assessment of the effects of interference is very important. Given the complexity of the problem, interference calculations are usually done considering a number of worst-case conditions and simplifying assumptions. These include, for example, the hypothesis that degradation due to rain is present only in the victim link and does not affect any of the interfering links, the hypothesis that the earth stations involved are located at the most unfavorable sites (in terms of interference) within their service areas, and the use of a reference diagram for the antenna radiation patterns. These hypotheses imply a conservative interference calculation, in which the obtained interference levels are higher than the actual ones. In this work, as an alternative to the assumption that the earth stations are located at the most unfavorable sites of their service areas, a probabilistic model is used, in which the geographic locations of the earth stations are characterized by random vectors. As a consequence, the resulting carrier-to-interference ratio is also a random variable, and its statistical behavior is evaluated in this work. Numerical results are obtained for situations involving multiple multi-beam satellite communication networks operating in the Ka band. These results are compared to those obtained under the assumption that the earth stations are located at the most unfavorable sites of their service areas. Finally, the joint effect of modeling both the earth station locations and their antenna side-lobe gains as random quantities is evaluated.
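The effect of randomizing the earth-station location can be sketched with a toy Monte Carlo model. Everything below is an illustrative assumption (a single interfering beam, a parabolic main-beam roll-off, made-up link levels), not the thesis's model; it only shows how a random position turns the carrier-to-interference ratio into a random variable that can be compared against the worst-case-site value.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Illustrative parameters (assumptions, not values from the thesis)
G_MAX = 45.0    # interfering-beam peak gain, dBi
R3DB = 150.0    # beam radius scale, km
R_SERV = 300.0  # victim service-area radius, km
C_DB = -120.0   # received carrier level, dBW
I0_DB = -165.0  # interference level referenced to 0 dBi beam gain, dBW
D_OFF = 400.0   # offset of the service-area center from the beam center, km

# Earth-station position: uniform over the circular service area
r = R_SERV * np.sqrt(rng.uniform(size=N))
phi = rng.uniform(0.0, 2 * np.pi, size=N)
d = np.hypot(D_OFF + r * np.cos(phi), r * np.sin(phi))  # distance to beam center

# Parabolic main-beam roll-off (a common single-beam approximation)
g_db = G_MAX - 12.0 * (d / R3DB) ** 2
ci_db = C_DB - (I0_DB + g_db)

# Worst case: the service-area point closest to the interfering beam center
d_min = D_OFF - R_SERV
ci_worst = C_DB - (I0_DB + G_MAX - 12.0 * (d_min / R3DB) ** 2)

print(f"worst-case C/I: {ci_worst:.1f} dB")
print(f"median C/I:     {np.median(ci_db):.1f} dB")
```

The worst-case site bounds the distribution from below, and the gap between it and the median shows how pessimistic the worst-case assumption can be.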

Modelling traveler information and its effects on traffic

Nguyen, Thai Phu, 29 June 2010
Traffic conditions on a road network often suffer from congestion. According to its source, traffic congestion can be classified into two categories: recurrent congestion, determined by the physical laws of traffic, and non-recurrent congestion, due to incidents, accidents, or other hazards on the road. Thanks to advances in technology, including computing, communications, and data processing, the traffic operator is now able to detect disturbances, measure their effects, and even anticipate traffic conditions in order to better adapt traffic management activities. Dynamic information on traffic conditions enables users to reduce discomfort and make more reasonable route choices. For the operator, the traveler information service can be used as a traffic management tool. We investigated the potential contribution of dynamic traffic information to the benefit of individual users and to overall system performance, taking into account: i) recurrent and non-recurrent congestion; ii) different route choice behaviours depending on access to the information service; iii) other traffic management actions taken by the operator.
A theoretical model, with an analytical application on a simple network of two parallel roads, one origin-destination pair, and two user classes (informed and non-informed), yielded several conclusions: i) excessive distribution of traffic information with a "neutral" content degrades both individual profit and system performance; ii) traffic information with a certain "cooperative" content may help optimize system performance without causing acceptability problems; and iii) dynamic information and other dynamic traffic management tools interact in a complementary manner to optimize traffic.
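The two-parallel-road setting above can be sketched numerically. The model below is a deliberately crude stand-in for the thesis's analytical model: linear travel-time functions, a habitual split for uninformed users, and informed users who all herd onto the currently faster route. All numbers are assumptions chosen only to reproduce the qualitative conclusion that some information helps while full penetration can overreact.

```python
import numpy as np

DEMAND = 100.0
T0 = np.array([10.0, 12.0])  # free-flow times of routes 1 and 2 (assumed)
A = np.array([0.10, 0.08])   # congestion slopes (assumed)

def travel_time(flow, t0, a):
    # linear (BPR-like) congestion function: time grows with flow
    return t0 + a * flow

def total_time(p_informed, incident_extra=0.0):
    """Mean travel time when a fraction p_informed of users is informed.

    Uninformed users split by habit according to free-flow times only;
    informed users all take the route that looks faster given an incident
    adding `incident_extra` minutes to route 1 (a herding simplification).
    """
    t0 = T0 + np.array([incident_extra, 0.0])
    unin = DEMAND * (1.0 - p_informed)
    # habitual split: proportional to inverse free-flow time (assumption)
    w = (1.0 / T0) / np.sum(1.0 / T0)
    flows = unin * w
    times = travel_time(flows, t0, A)
    best = int(np.argmin(times))
    flows[best] += DEMAND * p_informed   # informed users herd onto it
    times = travel_time(flows, t0, A)
    return float(np.dot(flows, times) / DEMAND)

# With an incident on route 1, moderate information penetration helps,
# but full penetration causes overreaction onto route 2.
for p in (0.0, 0.3, 1.0):
    print(f"p_informed={p:.1f}: mean travel time {total_time(p, incident_extra=8.0):.2f}")
```

Even this toy model reproduces the qualitative finding: a moderately informed population beats both extremes when "neutral" information is broadcast to everyone.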

A geomatics approach to the spatio-temporal variability of microbial contamination of recreational waters

Nzang Essono, Francine, January 2016
The aim of this study was to predict faecal water contamination using a Bayesian probabilistic model, at the watershed scale in a farming area and at the event scale. The project seeks to better understand the influence of the hydrological, environmental, and temporal factors involved in explaining episodes of microbial contamination of recreational waters. First, a Bayesian probabilistic model (weights of evidence) was developed to identify and map the probability of waters being contaminated by agricultural effluents, on the basis of spectral data and geomorphological variables. With this method, we were able to compute weighted relationships between concentrations of Escherichia coli and the distribution of the key agronomic, pedologic, and climatic parameters that govern the spread of these microorganisms. The results showed that the Bayesian model can be used to predict microbial contamination of recreational waters. The model, with a success rate of 71%, highlighted the significant role played by rain, which is the main driver of pollutant transport. Second, the Bayesian model was subjected to a sensitivity analysis of its spatial parameters using Sobol indices. This allowed (1) the quantification of the uncertainties in the soil, land-use, and distance variables, and (2) the propagation of these uncertainties through the probabilistic model, that is, the calculation of the error induced in the output by the uncertainties of the spatial inputs. Finally, a sensitivity analysis of the simulations to the various sources of uncertainty was performed to assess the contribution of each factor to the overall uncertainty, taking their interactions into account.
Across all scenarios, the uncertainty of the microbial contamination depends directly on the variability of clay soils. The first-order Sobol indices showed that, among the factors most likely to influence microbial contamination, the area of farmland is the most important factor in assessing coliform levels; attention should therefore focus on this parameter when forecasting microbial contamination. The second most important variable is the urban area, with sensitivity shares of approximately 30%. Furthermore, the total-order indices exceed the first-order indices, which means that the impact of parametric interactions is clearly significant for modelling microbial contamination. Third, we propose a model of the temporal variability of microbiological contamination of the Lake Massawippi watershed, based on the AVSWAT model. This modelling couples the temporal and spatial components that characterize coliform dynamics. The main results show that coliform concentrations in the different sub-watersheds are influenced by rain intensity. The research also concluded that the best calibration performance is obtained with multi-objective optimization. These results open encouraging operational prospects by providing a comprehensive understanding of the dynamics of microbial contamination of surface waters.
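The comparison between first-order and total-order Sobol indices mentioned above can be illustrated with a standard pick-freeze Monte Carlo estimator. The model function below is an assumption standing in for the contamination model (one dominant input, one weak input, and an interaction term), used only to show how a total-order index exceeding the first-order index reveals interactions.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

# Illustrative model (an assumption, not the thesis's model):
# output depends strongly on x0, weakly on x1, with an x0*x2 interaction.
def model(x):
    return 4.0 * x[:, 0] + 1.0 * x[:, 1] + 2.0 * x[:, 0] * x[:, 2]

D = 3
A = rng.uniform(size=(N, D))
B = rng.uniform(size=(N, D))
fA, fB = model(A), model(B)
var = np.var(np.concatenate([fA, fB]))

S1, ST = np.zeros(D), np.zeros(D)
for i in range(D):
    ABi = A.copy()
    ABi[:, i] = B[:, i]          # "pick-freeze": resample only input i
    fABi = model(ABi)
    # Saltelli estimator (first-order) and Jansen estimator (total-order)
    S1[i] = np.mean(fB * (fABi - fA)) / var
    ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var

for i in range(D):
    print(f"x{i}: S1={S1[i]:.3f}  ST={ST[i]:.3f}")
```

For this function the exact values are S1 ≈ (0.915, 0.037, 0.037); the gap between ST and S1 for the interacting inputs is exactly the interaction share of the variance.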

INTERFERENCE ANALYSIS INVOLVING HIGHLY INCLINED ELLIPTICAL ORBIT SATELLITES AND FIXED SERVICE RECEIVERS: PROBABILISTIC MODELING OF THE FIXED SERVICE RECEIVER ANTENNA ELEVATION ANGLE

ANA GABRIELA CORREA MENA, 12 May 2014
Frequency sharing between satellite systems and terrestrial Fixed Service (FS) systems has been the object of studies since the onset of commercial satellite communication systems. A particular case of interest refers to the interference produced by the downlinks of highly elliptical orbit (HEO) satellite systems into fixed service receivers. In the literature, studies involving this type of interference have presented analyses that rely on the simplifying assumption that all victim fixed service receivers have receiving antennas with a zero-degree elevation angle. In this work, a probabilistic analysis is used to evaluate the aggregate interference produced by the satellites of multiple HEO systems into fixed service receivers operating in the 18 GHz band. The FS receiving antenna elevation angle is modeled as a random variable with a known probability density function. The proposed mathematical model is applied to two scenarios involving multiple interfering HEO systems. More specifically, the first scenario considers three interfering HEO systems whose satellites operate only in the northern hemisphere; the second considers three HEO systems with satellites operating in both the northern and southern hemispheres. The results are compared to those of analyses that assume the receiving antennas of all victim FS systems have a zero-degree elevation angle.
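The mechanics of randomizing the elevation angle can be sketched as follows. The geometry, the elevation-angle pdf, and the side-lobe envelope below are all illustrative assumptions (not the dissertation's parameters); the point is only that a random boresight elevation changes the off-axis angle toward the satellite, and hence the antenna gain entering the interference budget.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

# Illustrative geometry (assumption): the HEO satellite is seen at a
# fixed apparent elevation; the FS antenna elevation is random, not 0.
SAT_ELEV = 55.0

# Assumed pdf for the FS antenna elevation: exponential with scale 3 deg,
# truncated to [0, 10] deg (fixed links mostly point near the horizon).
u = rng.uniform(size=N)
SCALE = 3.0
fs_elev = -SCALE * np.log(1.0 - u * (1.0 - np.exp(-10.0 / SCALE)))

def sidelobe_gain_db(off_axis_deg):
    """ITU-R-style reference envelope G = 32 - 25 log10(phi) dBi, phi in
    degrees (used here only as an illustrative side-lobe model)."""
    return 32.0 - 25.0 * np.log10(np.maximum(off_axis_deg, 1.0))

# Off-axis angle between the FS boresight and the satellite direction
off_axis = SAT_ELEV - fs_elev
g_random = sidelobe_gain_db(off_axis)       # relative coupling, dB
g_fixed = sidelobe_gain_db(SAT_ELEV - 0.0)  # zero-elevation assumption

print(f"gain toward satellite, zero-elevation assumption: {g_fixed:.2f} dBi")
print(f"mean gain with random elevation angle:            {np.mean(g_random):.2f} dBi")
```

In this toy geometry the random elevation shifts the whole gain distribution relative to the single zero-elevation value, which is precisely why the interference statistics differ from the fixed-angle analysis.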

Time series forecasting with applications in macroeconomics and energy

Arora, Siddharth, January 2013
The aim of this study is to develop novel forecasting methodologies. The applications of our proposed models lie in two different areas: macroeconomics and energy. Though we consider two very different applications, the common underlying theme of this thesis is to develop novel methodologies that are not only accurate, but are also parsimonious. For macroeconomic time series, we focus on generating forecasts for the US Gross National Product (GNP). The contribution of our study on macroeconomic forecasting lies in proposing a novel nonlinear and nonparametric method, called weighted random analogue prediction (WRAP) method. The out-of-sample forecasting ability of WRAP is evaluated by employing a range of different performance scores, which measure its accuracy in generating both point and density forecasts. We show that WRAP outperforms some of the most commonly used models for forecasting the GNP time series. For energy, we focus on two different applications: (1) Generating accurate short-term forecasts for the total electricity demand (load) for Great Britain. (2) Modelling Irish electricity smart meter data (consumption) for both residential consumers and small and medium-sized enterprises (SMEs), using methods based on kernel density (KD) and conditional kernel density (CKD) estimation. To model load, we propose methods based on a commonly used statistical dimension reduction technique, called singular value decomposition (SVD). Specifically, we propose two novel methods, namely, discount weighted (DW) intraday and DW intraweek SVD-based exponential smoothing methods. We show that the proposed methods are competitive with some of the most commonly used models for load forecasting, and also lead to a substantial reduction in the dimension of the model. The load time series exhibits a prominent intraday, intraweek and intrayear seasonality. 
However, most existing studies accommodate only a ‘double seasonality’ when modelling short-term load, focussing on the intraday and intraweek seasonal effects. The methods considered in this study accommodate the ‘triple seasonality’ in load, by capturing not only intraday and intraweek seasonal cycles, but also intrayear seasonality. For modelling load, we also propose a novel rule-based approach, with emphasis on special days. The load observed on special days, e.g. public holidays, is substantially lower than the load observed on normal working days. Special day effects have often been ignored during the modelling process, which leads to large forecast errors on special days, and also on normal working days that lie in the vicinity of special days. The contribution of this study lies in adapting some of the most commonly used seasonal methods to model load for both normal and special days in a coherent and unified framework, using a rule-based approach. We show that the post-sample error across special days for the rule-based methods is less than half that of their original counterparts that ignore special day effects. For modelling electricity smart meter data, we investigate a range of different methods based on KD and CKD estimation. Over the coming decade, electricity smart meters are scheduled to replace the conventional electronic meters, in both the US and Europe. Future estimates of consumption can help the consumer identify and reduce excess consumption, while such estimates can help the supplier devise innovative tariff strategies. To the best of our knowledge, there are no existing studies which focus on generating density forecasts of electricity consumption from smart meter data. In this study, we evaluate the density, quantile and point forecast accuracy of different methods across one thousand consumption time series, recorded from both residential consumers and SMEs.
We show that the KD and CKD methods accommodate the seasonality in consumption, and correctly distinguish weekdays from weekends. For each application, our comprehensive empirical comparison of the existing and proposed methods was undertaken using multiple performance scores. The results show strong potential for the models proposed in this thesis.
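The idea of a conditional kernel density forecast that respects the weekly cycle can be sketched briefly. The synthetic consumption series, the phase-of-week conditioning rule, and the bandwidth below are all assumptions for illustration; they are not the thesis's data or its exact CKD estimator.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic half-hourly consumption with intraday and intraweek cycles
# (an assumption, standing in for real smart-meter records).
P_DAY, P_WEEK = 48, 48 * 7
t = np.arange(8 * P_WEEK)                  # eight weeks of history
load = (1.0
        + 0.5 * np.sin(2 * np.pi * t / P_DAY)
        + 0.3 * (t % P_WEEK < 5 * P_DAY)   # weekday level shift
        + 0.1 * rng.standard_normal(t.size))

def ckd_forecast_density(history, period, h, grid, bandwidth=0.15):
    """Density forecast h steps ahead, conditioning on the period-of-week:
    a Gaussian kernel density over past observations sharing the target's
    phase (a minimal sketch of conditional kernel density estimation)."""
    phase = (history.size + h - 1) % period
    past = history[phase::period]
    z = (grid[:, None] - past[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)        # normalize to integrate to 1

grid = np.linspace(-1.0, 3.0, 400)
dens = ckd_forecast_density(load, P_WEEK, h=1, grid=grid)
dx = grid[1] - grid[0]
point = float((grid * dens).sum() * dx)    # density mean as a point forecast
print(f"one-step density forecast mean: {point:.3f}")
```

Because the estimator conditions on the phase within the week, weekday and weekend periods automatically get different forecast densities, which is the behaviour the abstract attributes to the KD and CKD methods.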

Evaluation of Probabilistic Programming Frameworks

Munkby, Carl, January 2022
In recent years, significant progress has been made in the area of Probabilistic Programming, contributing to a considerably easier workflow for quantitative research in many fields. However, as new Probabilistic Programming Frameworks (PPFs) are continuously being created and developed, there is a need for ways of evaluating and benchmarking these frameworks. To this end, this thesis explored a range of evaluation measures to evaluate and better understand the performance of three PPFs: Stan, NumPyro and TensorFlow Probability (TFP). Their respective Hamiltonian Monte Carlo (HMC) samplers were benchmarked on three different hierarchical models using both centered and non-centered parametrizations. The results showed that even when the same inference algorithms were used, the PPFs' samplers still exhibited different behaviours, which led to non-negligible differences in their statistical efficiency. Furthermore, the sampling behaviour of the PPFs indicated that the observed differences can possibly be attributed to how the warm-up phase used in HMC sampling is constructed. Finally, this study concludes that the computational speed of the numerical library used was the primary deciding factor of performance in this benchmark. This was demonstrated by NumPyro's superior computational speed, which yielded up to 10x higher minimum effective sample size per second (ESSmin/s) than Stan and 4x higher ESSmin/s than TFP.
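The ESS/s metric used in such benchmarks combines a statistical quantity (effective sample size, from the chain's autocorrelation) with wall-clock time. The sketch below computes ESS for a synthetic AR(1) chain with known autocorrelation; the truncation rule is a simplified variant of what MCMC diagnostics use in practice, not the thesis's exact estimator.

```python
import time

import numpy as np

rng = np.random.default_rng(5)

def ess(chain):
    """Effective sample size from the empirical autocorrelation function,
    truncated at the first negative estimate (a simplified variant of
    the truncation rules used by common MCMC diagnostics)."""
    x = chain - chain.mean()
    n = x.size
    f = np.fft.rfft(x, 2 * n)                 # FFT-based autocovariance
    acov = np.fft.irfft(f * np.conj(f))[:n]
    acf = acov / acov[0]
    tau = 1.0                                  # integrated autocorrelation time
    for k in range(1, n):
        if acf[k] < 0.0:
            break
        tau += 2.0 * acf[k]
    return n / tau

# AR(1) chain with autocorrelation rho: the true ESS is n*(1-rho)/(1+rho),
# so the estimate should land near n/19 for rho = 0.9.
n, rho = 100_000, 0.9
eps = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = eps[0]
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + eps[i]

t0 = time.perf_counter()
e = ess(chain)
dt = time.perf_counter() - t0
print(f"ESS = {e:.0f} (theory ~ {n * (1 - rho) / (1 + rho):.0f})")
print(f"ESS/s for this chain: {e / dt:.0f}")
```

Dividing the minimum ESS across parameters by sampling time gives the ESSmin/s figure the abstract reports, which is why raw numerical speed can dominate the comparison even when the samplers are statistically similar.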

Multi-layer Perceptron Error Surfaces: Visualization, Structure and Modelling

Gallagher, Marcus Reginald, Unknown Date
The Multi-Layer Perceptron (MLP) is one of the most widely applied and researched Artificial Neural Network models. MLP networks are normally applied to supervised learning tasks, which involve iterative training methods to adjust the connection weights within the network. This is commonly formulated as a multivariate non-linear optimization problem over a very high-dimensional space of possible weight configurations. By analogy with the field of mathematical optimization, training an MLP is often described as the search of an error surface for a weight vector which gives the smallest possible error value. Although this presents a useful notion of the training process, there are many problems associated with using the error surface to understand the behaviour of learning algorithms and the properties of MLP mappings themselves. Because of the high dimensionality of the system, many existing methods of analysis are not well suited to this problem. Visualizing and describing the error surface are also nontrivial and problematic. These problems are specific to complex systems such as neural networks, which contain large numbers of adjustable parameters, and the investigation of such systems in this way is largely a developing area of research. In this thesis, the concept of the error surface is explored using three related methods. Firstly, Principal Component Analysis (PCA) is proposed as a method for visualizing the learning trajectory followed by an algorithm on the error surface. It is found that PCA provides an effective method for performing such a visualization, as well as providing an indication of the significance of individual weights to the training process. Secondly, sampling methods are used to explore the error surface and to measure certain of its properties, providing the necessary data for an intuitive description of the error surface.
A number of practical MLP error surfaces are found to contain a high degree of ultrametric structure, in common with other known configuration spaces of complex systems. Thirdly, a class of global optimization algorithms is developed, focused on the construction and evolution of a model of the error surface (or search space) as an integral part of the optimization process. The relationships between this algorithm class, the Population-Based Incremental Learning algorithm, evolutionary algorithms and cooperative search are discussed. The work provides important practical techniques for the exploration of the error surfaces of MLP networks. These techniques can be used to examine the dynamics of different training algorithms and the complexity of MLP mappings, and to give an intuitive description of the nature of the error surface. The configuration spaces of other complex systems are also amenable to many of these techniques. Finally, the algorithmic framework provides a powerful paradigm for visualization of the optimization process and for the development of parallel coupled optimization algorithms which apply knowledge of the error surface to solving the optimization problem.
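The PCA-based trajectory visualization described above can be sketched on a tiny stand-in network. The task, network size, and training settings below are illustrative assumptions, not the thesis's experiments; the point is the mechanics of recording the weight vector at each step and projecting the trajectory onto its leading principal components.

```python
import numpy as np

rng = np.random.default_rng(6)

# Tiny synthetic task and a minimal one-hidden-layer MLP trained by
# full-batch gradient descent; the weight vector is recorded every step.
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like labels (assumption)

H = 4
W1 = rng.standard_normal((2, H)) * 0.5
W2 = rng.standard_normal(H) * 0.5
lr = 0.5
traj = []
for step in range(300):
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))
    g = (p - y) / X.shape[0]                # gradient of mean cross-entropy
    gW2 = h.T @ g
    gW1 = X.T @ ((g[:, None] * W2) * (1 - h ** 2))
    W1 -= lr * gW1
    W2 -= lr * gW2
    traj.append(np.concatenate([W1.ravel(), W2]))

T = np.array(traj)                          # (steps, n_weights)
Tc = T - T.mean(axis=0)
U, S, Vt = np.linalg.svd(Tc, full_matrices=False)
proj = Tc @ Vt[:2].T                        # 2-D view of the trajectory
explained = float((S[:2] ** 2).sum() / (S ** 2).sum())
print(f"variance captured by first two PCs: {explained:.3f}")
```

Plotting `proj` gives a low-dimensional picture of the path through weight space, and the loadings in `Vt` indicate which weights move most during training.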

Machine Learning methods in shotgun proteomics

Truong, Patrick, January 2023
As high-throughput biology experiments generate increasing amounts of data, the field is naturally turning to data-driven methods for the analysis and extraction of novel insights. These insights into biological systems are crucial for understanding disease progression, drug targets, treatment development, and diagnostic methods, ultimately leading to improved human health and well-being, as well as deeper insight into cellular biology. Biological data sources such as the genome, transcriptome, proteome, metabolome, and metagenome provide critical information about biological system structure, function, and dynamics. The focus of this licentiate thesis is on proteomics, the study of proteins, which is a natural starting point for understanding biological functions, as proteins are crucial functional components of cells. Proteins play a central role in enzymatic reactions, structural support, transport, storage, cell signaling, and immune system function. In addition, proteomics has vast data repositories, and technical and methodological improvements are continually being made to yield even more data. However, generating proteomic data involves multiple steps, which are prone to errors, making sophisticated models essential to handle technical and biological artifacts and to account for uncertainty in the data. In this licentiate thesis, the use of machine learning and probabilistic methods to extract information from mass-spectrometry-based proteomic data is investigated. The thesis starts with an introduction to proteomics, including a basic biological background, followed by a description of how mass-spectrometry-based proteomics experiments are performed, and of challenges in proteomic data analysis. The statistics of proteomic data analysis are also explored, and state-of-the-art software and tools related to each step of the proteomics data analysis pipeline are presented.
The thesis concludes with a discussion of future work and the presentation of two original research works. The first research work focuses on adapting Triqler, a probabilistic graphical model for protein quantification developed for data-dependent acquisition (DDA) data, to data-independent acquisition (DIA) data. Challenges in this study included verifying that DIA data conformed with the model used in Triqler, addressing benchmarking issues, and modifying Triqler's missing-value model for DIA data. The study showed that DIA data conformed with the properties required by Triqler, implemented a protein inference harmonization strategy, and adapted the missing-value model to DIA data. The study concluded by showing that Triqler outperformed current protein quantification techniques. The second research work focused on developing a novel deep-learning-based MS2-intensity predictor by incorporating the self-attention mechanism of the transformer architecture into Prosit, an established recurrent neural network (RNN)-based deep learning framework for MS2 spectrum intensity prediction. RNNs are a type of neural network that processes sequential data efficiently by carrying information forward from previous steps, one step at a time. The transformer's self-attention mechanism instead allows a model to attend to different parts of its input sequence independently during processing, enabling it to capture dependencies and relationships between elements more effectively. Transformers therefore remedy some of the drawbacks of RNNs, so we hypothesized that implementing the MS2-intensity predictor with transformers rather than RNNs would improve its performance. Hence, Prosit-transformer was developed, and the study showed that both the model training time and the similarity between the predicted MS2 spectrum and the observed spectrum improved.
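The self-attention mechanism described above can be illustrated with a minimal sketch of scaled dot-product attention, the core operation in transformer architectures. This is a generic textbook construction, not code from the thesis or from Prosit-transformer; the toy dimensions and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single sequence.

    X:           (seq_len, d_model) input embeddings, e.g. one vector
                 per peptide residue (hypothetical setting).
    Wq, Wk, Wv:  (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position attends to every other position in one step,
    # rather than passing information sequentially as an RNN does.
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)           # rows sum to 1
    return weights @ V                           # (seq_len, d_k)

# Toy example: a "peptide" of 5 positions embedded in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Because the attention weights couple all positions at once, long-range dependencies cost a single matrix product here, which is the advantage over step-by-step RNN processing that motivated the hypothesis above.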
These original research works address various challenges in computational proteomics and contribute to the development of data-driven life science.
20

Reparametrization in deep learning

Dinh, Laurent 02 1900 (has links)
No description available.
