301

Estimation of Pareto Distribution Functions from Samples Contaminated by Measurement Errors

Kondlo, Lwando Orbet January 2010 (has links)
Magister Scientiae - MSc / Estimation of population distributions, from samples that are contaminated by measurement errors, is a common problem. This study considers the problem of estimating the population distribution of independent random variables Xi from error-contaminated samples Yi (i = 1, ..., n) such that Yi = Xi + εi, where ε is the measurement error, which is assumed independent of X. The measurement error ε is also assumed to be normally distributed. Since the observed distribution function is a convolution of the error distribution with the true underlying distribution, estimation of the latter is often referred to as a deconvolution problem. A thorough study of the relevant deconvolution literature in statistics is reported. We also deal with the specific case when X is assumed to follow a truncated Pareto form. If observations are subject to Gaussian errors, then the observed Y is distributed as the convolution of the finite-support Pareto and Gaussian error distributions. The convolved probability density function (PDF) and cumulative distribution function (CDF) of the finite-support Pareto and Gaussian distributions are derived. The intention is to draw more specific connections between certain deconvolution methods and also to demonstrate the application of the statistical theory of estimation in the presence of measurement error. A parametric methodology for deconvolution when the underlying distribution is of the Pareto form is developed. Maximum likelihood estimation (MLE) of the parameters of the convolved distributions is considered. Standard errors of the estimated parameters are calculated from the inverse Fisher information matrix and a jackknife method. Probability-probability (P-P) plots and Kolmogorov-Smirnov (K-S) goodness-of-fit tests are used to evaluate the fit of the posited distribution. A bootstrapping method is used to calculate the critical values of the K-S test statistic, which are not available. Simulated data are used to validate the methodology. A real-life application of the methodology is illustrated by fitting convolved distributions to astronomical data.
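As a rough illustration of the convolution structure described in this abstract (not the thesis's own closed-form derivation), the sketch below numerically convolves a finite-support Pareto density with a Gaussian error density on a grid and recovers the parameters by maximum likelihood from simulated contaminated data; all parameter values and the grid-based integration are illustrative assumptions.

```python
import numpy as np
from scipy import stats, optimize
from scipy.integrate import trapezoid

def truncated_pareto_pdf(x, alpha, L, U):
    """Density of a Pareto(alpha) law restricted to the finite support [L, U]."""
    c = alpha * L**alpha / (1.0 - (L / U)**alpha)   # normalising constant
    x = np.asarray(x, dtype=float)
    return np.where((x >= L) & (x <= U), c * x**(-alpha - 1.0), 0.0)

def convolved_pdf(y, alpha, L, U, sigma, n_grid=400):
    """f_Y(y) = integral of f_X(x) * N(y - x; 0, sigma^2) dx, evaluated on a grid."""
    xs = np.linspace(L, U, n_grid)
    fx = truncated_pareto_pdf(xs, alpha, L, U)
    kern = stats.norm.pdf(np.atleast_1d(y)[:, None], loc=xs[None, :], scale=sigma)
    return trapezoid(fx[None, :] * kern, xs, axis=1)

def neg_log_lik(params, y):
    alpha, L, U, sigma = params
    if not (alpha > 0 and 0 < L < U and sigma > 0):
        return np.inf
    f = convolved_pdf(y, alpha, L, U, sigma)
    return np.inf if np.any(f <= 0) else -np.sum(np.log(f))

# Simulate contaminated data Y = X + eps with X ~ truncated Pareto, eps ~ N(0, sigma^2).
rng = np.random.default_rng(0)
alpha0, L0, U0, sigma0 = 2.0, 1.0, 10.0, 0.5
u = rng.uniform(size=2000)
x = (L0**-alpha0 - u * (L0**-alpha0 - U0**-alpha0)) ** (-1.0 / alpha0)   # inverse-CDF draw
y = x + rng.normal(0.0, sigma0, size=x.size)

res = optimize.minimize(neg_log_lik, x0=[1.5, 0.9, 12.0, 0.4], args=(y,),
                        method="Nelder-Mead")
print("MLE of (alpha, L, U, sigma):", np.round(res.x, 3))
```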
302

Modeling and Simulation of Spatial Extremes Based on Max-Infinitely Divisible and Related Processes

Zhong, Peng 17 April 2022 (has links)
The statistical modeling of extreme natural hazards is becoming increasingly important due to climate change, whose effects have been increasingly visible throughout the last decades. It is thus crucial to understand the dependence structure of rare, high-impact events over space and time for realistic risk assessment. For spatial extremes, max-stable processes have played a central role in modeling block maxima. However, the spatial tail dependence strength is persistent across quantile levels in those models, which is often not realistic in practice. This lack of flexibility implies that max-stable processes cannot capture weakening dependence at increasingly extreme levels, resulting in a drastic overestimation of joint tail risk. To address this, we develop new dependence models in this thesis from the class of max-infinitely divisible (max-id) processes, which contain max-stable processes as a subclass and are flexible enough to capture different types of dependence structures. Furthermore, exact simulation algorithms for general max-id processes are typically not straightforward due to their complex formulations. Both simulation and inference can be computationally prohibitive in high dimensions. Fast and exact simulation algorithms to simulate max-id processes are provided, together with methods to implement our models in high dimensions based on the Vecchia approximation method. These proposed methodologies are illustrated through various environmental datasets, including air temperature data in South-Eastern Europe in an attempt to assess the effect of climate change on heatwave hazards, and sea surface temperature data for the entire Red Sea. In another application focused on assessing how the spatial extent of extreme precipitation has changed over time, we develop new time-varying $r$-Pareto processes, which are the counterparts of max-stable processes for high threshold exceedances.
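The weakening tail dependence that motivates moving beyond max-stable models can be illustrated with the conditional exceedance probability chi(u) = P(U2 > u | U1 > u). In the hypothetical sketch below (not the thesis's data or models), a Student-t copula stands in for an asymptotically dependent, max-stable-like structure whose chi(u) stabilises at a positive value, while a Gaussian copula shows chi(u) decaying towards zero at extreme levels.

```python
import numpy as np
from scipy import stats

def chi_u(u_levels, v1, v2):
    """Empirical chi(u) = P(V2 > u | V1 > u) from pseudo-uniform scores v1, v2."""
    return np.array([np.mean(v2[v1 > u] > u) for u in u_levels])

rng = np.random.default_rng(1)
n, rho, nu = 200_000, 0.7, 3

# Gaussian copula sample (asymptotically independent: chi(u) -> 0).
L = np.linalg.cholesky([[1.0, rho], [rho, 1.0]])
z = rng.standard_normal((n, 2)) @ L.T
g1, g2 = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])

# Student-t copula sample (asymptotically dependent: chi(u) -> chi > 0), used here
# as a stand-in for the persistent tail dependence of max-stable models.
w = rng.chisquare(nu, size=n) / nu
t = z / np.sqrt(w)[:, None]
t1, t2 = stats.t.cdf(t[:, 0], df=nu), stats.t.cdf(t[:, 1], df=nu)

u_levels = np.array([0.90, 0.95, 0.99, 0.995])
print("chi(u), Gaussian copula: ", np.round(chi_u(u_levels, g1, g2), 3))
print("chi(u), Student-t copula:", np.round(chi_u(u_levels, t1, t2), 3))
```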
303

Risk–constrained stochastic economic dispatch and demand response with maximal renewable penetration under renewable obligation

Hlalele, Thabo Gregory January 2020 (has links)
In recent years, a great deal of attention has been paid to optimal demand- and supply-side strategies. The increase in renewable energy sources and the expansion of demand response programmes have shown the need for a robust power system. These changes require controlling uncertain generation and load at the same time. It is therefore important to provide an optimal scheduling strategy that can meet an adequate energy mix under demand response without affecting system reliability and economic performance. This thesis addresses these changes in the following four aspects. First, a renewable obligation model is proposed to maintain an adequate energy mix in the economic dispatch model while minimising the operational costs of the allocated spinning reserves. This method considers a minimum renewable penetration that must be achieved daily in the energy mix. If the renewable quota is not achieved, the generation companies are penalised by the system operator. The uncertainty of renewable energy sources is modelled using probability density functions, which are used for scheduling the output power of these generators. The overall problem is formulated as a security-constrained economic dispatch problem. Second, a combined economic and demand response optimisation model under a renewable obligation is presented. Real data from a large-scale demand response programme are used in the model. The model finds an optimal power dispatch strategy that takes advantage of demand response to minimise generation cost and maximise renewable penetration. The optimisation model is applied to a South African large-scale demand response programme in which the system operator can directly control the participation of electrical water heaters at substation level. Actual load profiles before and after demand reduction are used to assist the system operator in making optimal decisions on whether a substation should participate in the demand response programme. The use of these real demand response data avoids traditional approaches that assume arbitrary controllability of flexible loads. Third, a stochastic multi-objective economic dispatch model under a renewable obligation is presented. This approach minimises the total operating costs of generators and spinning reserves under the renewable obligation while maximising renewable penetration. The intermittent nature of the renewable energy sources is modelled using dynamic scenarios, and the proposed model shows the effectiveness of the renewable obligation policy framework. Due to the computational complexity of considering all possible scenarios, a scenario reduction method is applied to reduce the number of scenarios and solve the model. A Pareto optimal solution is presented for the renewable obligation, and further decision making is conducted to assess the trade-offs associated with the Pareto front. Fourth, a combined risk-constrained stochastic economic dispatch and demand response model under a renewable obligation is presented. An incentive-based optimal power dispatch strategy is implemented to minimise generation costs and maximise renewable penetration. In addition, a risk-constrained approach is used to control the financial risks of the generation company under the demand response programme. A coordination strategy is presented for the generation companies to dispatch power using thermal generators and renewable energy sources while maintaining an adequate spinning reserve.
The proposed model is robust and can achieve significant demand reduction while increasing renewable penetration and decreasing the financial risks for generation companies. / Thesis (PhD (Electrical Engineering))--University of Pretoria, 2020. / Electrical, Electronic and Computer Engineering / PhD (Electrical Engineering) / Unrestricted
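As a toy illustration of the renewable-obligation mechanism described in this abstract (all figures, bounds and the penalty value are hypothetical and not taken from the thesis), a single-period dispatch can be written as a linear programme in which any shortfall below the renewable quota is penalised:

```python
import numpy as np
from scipy.optimize import linprog

# Single-period dispatch with a renewable-obligation penalty (all numbers hypothetical).
demand  = 900.0                               # MW
quota   = 0.30                                # required renewable share of the energy mix
penalty = 80.0                                # $/MWh charged on any shortfall below the quota
cost    = np.array([30.0, 45.0, 5.0])         # $/MWh: thermal G1, thermal G2, renewable
p_min   = np.array([100.0, 50.0, 0.0])
p_max   = np.array([600.0, 500.0, 200.0])     # renewable forecast caps its dispatch

# Decision vector: [p_G1, p_G2, p_renewable, shortfall]
c = np.append(cost, penalty)
A_eq = [[1.0, 1.0, 1.0, 0.0]]; b_eq = [demand]      # power balance
A_ub = [[0.0, 0.0, -1.0, -1.0]]                      # p_renewable + shortfall >= quota * demand
b_ub = [-quota * demand]
bounds = list(zip(p_min, p_max)) + [(0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
p = res.x
print("dispatch [G1, G2, RES] =", np.round(p[:3], 1), "MW")
print("renewable share = %.1f%%, quota shortfall = %.1f MW, cost = %.0f $/h"
      % (100 * p[2] / demand, p[3], res.fun))
```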
304

Aplicación de la teoría de los sistemas complejos y la autoorganización al estudio de la distribución del tamaño de las empresas [Application of complex systems theory and self-organization to the study of the firm size distribution]

Llorca Ponce, Alicia 01 February 2020 (has links)
[EN] The research work presented below seeks to advance the explanation of an empirically established phenomenon: the asymmetric behaviour of the firm size distribution. The empirical evidence shows that, in most cases, economies are supplied by companies of all sizes. The asymmetry of the distribution indicates that markets are generally made up of very few large companies alongside a large number of small ones. Far from being exclusive to the firm size distribution, this behaviour is present in other phenomena, both economic (for instance, the distribution of a population's income or the location of activities in space) and from very different fields. In 1949, the linguist George Kingsley Zipf published a work describing various asymmetrically distributed phenomena in which a mathematical relation could be observed between the size of an event and the frequency of its occurrence. This relation, today known as Zipf's law, states that the frequency of occurrence of an event is inversely related to its size or intensity. Applied to the firm size distribution, compliance with the law implies that the frequency of occurrence of companies of a given size is inversely proportional to a power of that size. This behaviour had already been discovered by Pareto in 1896 in a controversial setting: the distribution of income in a population. Since Zipf's work was published, many others have found power laws in the distribution of diverse phenomena: the intensity of earthquakes, the frequency of word occurrence, species-extinction avalanches, or visits to web pages, among others. The ubiquity of this behaviour, known as power-law distributions, is now widely recognised. Despite the substantial empirical evidence, theoretical explanations for the abundance of phenomena distributed as power laws have not been very successful. This research, centred on fitting the firm size distribution to Zipf's law, extends the empirical evidence by verifying its compliance for Spanish companies. Beyond the empirical work, however, the aim of the research is to advance possible theoretical explanations of the phenomenon. In this sense, the research carried out considers the paradigm of complexity and self-organization to be the most suitable approach to the question. The conclusion is that the power laws observed in complex systems are a characteristic of the architecture of self-organized systems. Specifically, the Zipf law observed for the firm size distribution is a sign of the self-organization of the system, in our case the market. Power laws are regarded as a macro-behaviour that emerges spontaneously in systems and derives from the multiple interactions between the agents involved; from these interactions a statistical pattern arises that can only be observed at the level of the system. The research recognizes the relation between the appearance of power laws and self-organization processes; from here on, the challenge is to determine which type of processes give rise to the emergence of these laws. Although some theoretical explanations and models exist, they do not seem sufficiently satisfactory; much remains to be done in the search for the underlying mechanisms that generate the appearance of power laws in the firm size distribution.
Llorca Ponce, A. (2007). Aplicación de la teoría de los sistemas complejos y la autoorganización al estudio de la distribución del tamaño de las empresas [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/136194 / TESIS
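The rank-size form of Zipf's law discussed above can be checked with a short simulation (an illustrative sketch on synthetic data, not the thesis's Spanish firm data): firm sizes are drawn from a Pareto law and the exponent is recovered from the slope of log(rank) against log(size), with a Hill estimate as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw hypothetical "firm sizes" from a Pareto law with tail index alpha.
alpha, x_min, n = 1.1, 1.0, 50_000
sizes = x_min * (1.0 - rng.uniform(size=n)) ** (-1.0 / alpha)   # inverse-CDF sampling

# Zipf / rank-size check: log(rank) should fall roughly on a line in log(size),
# with slope approximately -alpha.
sizes_sorted = np.sort(sizes)[::-1]
ranks = np.arange(1, n + 1)
slope, intercept = np.polyfit(np.log(sizes_sorted), np.log(ranks), 1)
print(f"tail exponent from rank-size regression: {-slope:.2f} (true value {alpha})")

# Hill estimator over the k largest observations, as a cross-check.
k = 2000
tail = sizes_sorted[:k]
hill = 1.0 / np.mean(np.log(tail[:-1] / tail[-1]))
print(f"Hill estimate of alpha: {hill:.2f}")
```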
305

Characteristics of Electricity Storage Technologies for Maintaining Reliability of Grid with High Amounts of Intermittent Energy

Sundararagavan, Sandhya 01 January 2010 (has links) (PDF)
For the grid to be stable, the supply of power must equal the demands of the consumer at every moment during the day. The unpredictable, intermittent nature of wind results in inconsistent power generation. Energy storage technologies coupled with a wind farm can not only provide power during fluctuations but also help maintain a stable and reliable grid. The objective of the thesis is to perform a comprehensive analysis of different types of energy storage technologies that can be coupled with a wind farm. The analysis is performed on the basis of multiple characteristics which affect their viability. We identified key characteristics for a range of storage technologies, including lead-acid, sodium-sulphur, nickel-cadmium, lithium-ion, superconducting magnetic energy storage, electrochemical capacitors, flywheels, flow batteries, pumped hydro and compressed air energy storage systems. We performed a comparison study to analyze trade-offs and assessed potential improvement areas that will make them more competitive in the electric power industry. Finally, we suggested viable energy storage systems that are well suited to different applications in an electric grid integrated with a wind farm.
306

Bayesian Modeling of Sub-Asymptotic Spatial Extremes

Yadav, Rishikesh 04 1900 (has links)
In many environmental and climate applications, extreme data are spatial by nature, and hence statistics of spatial extremes is currently an important and active area of research dedicated to developing innovative and flexible statistical models that determine the location, intensity, and magnitude of extreme events. In particular, flexible sub-asymptotic models are increasingly popular because they can model spatial high-threshold exceedances in larger spatial dimensions with little or no sensitivity to the choice of threshold, something that is difficult to achieve with classical extreme-value processes such as Pareto processes. In this thesis, we develop new flexible sub-asymptotic extreme value models for modeling spatial and spatio-temporal extremes that are combined with carefully designed gradient-based Markov chain Monte Carlo (MCMC) sampling schemes and that can be exploited to address important scientific questions related to risk assessment in a wide range of environmental applications. The methodological developments are centered around two distinct themes, namely (i) sub-asymptotic Bayesian models for extremes; and (ii) flexible marked point process models with sub-asymptotic marks. In the first part, we develop several types of new flexible models for light-tailed and heavy-tailed data, which extend a hierarchical representation of the classical generalized Pareto (GP) limit for threshold exceedances. Spatial dependence is modeled through latent processes. We study the theoretical properties of our new methodology and demonstrate it by simulation and applications to precipitation extremes in both Germany and Spain. In the second part, we construct new marked point process models, where interest mostly lies in the extremes of the mark distribution. Our proposed joint models exploit intrinsic CAR priors to capture the spatial effects in landslide counts and sizes, while the mark distribution is assumed to take various parametric forms. We demonstrate that having a sub-asymptotic distribution for landslide sizes provides extra flexibility to accurately capture small to large and especially extreme, devastating landslides.
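As background to the generalized Pareto (GP) building block mentioned above (a generic peaks-over-threshold illustration on synthetic data, not the thesis's hierarchical Bayesian model), exceedances over a high threshold can be fitted with a GP distribution and used for tail estimation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical daily precipitation-like data with a heavy right tail.
data = rng.gamma(shape=0.8, scale=8.0, size=20_000)

# Peaks-over-threshold: keep exceedances above a high empirical quantile.
u = np.quantile(data, 0.95)
exceed = data[data > u] - u

# Fit a generalized Pareto distribution to the exceedances (location fixed at 0).
shape, loc, scale = stats.genpareto.fit(exceed, floc=0.0)
print(f"threshold u = {u:.2f}, GP shape (xi) = {shape:.3f}, scale = {scale:.3f}")

# Tail estimate: P(X > x) is approximately p_u * GP survival function at x - u.
p_u = np.mean(data > u)
x = u + 30.0
tail_prob = p_u * stats.genpareto.sf(x - u, shape, loc=0.0, scale=scale)
print(f"estimated P(X > {x:.1f}) = {tail_prob:.2e}")
```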
307

A Pareto-Frontier Analysis of Performance Trends for Small Regional Coverage LEO Constellation Systems

Hinds, Christopher Alan 01 December 2014 (has links) (PDF)
As satellites become smaller, cheaper, and quicker to manufacture, constellation systems will be an increasingly attractive means of meeting mission objectives. Optimizing satellite constellation geometries is therefore a topic of considerable interest. As constellation systems become more achievable, providing coverage to specific regions of the Earth will become more commonplace. Small countries or companies that are currently unable to afford large and expensive constellation systems will, now or in the near future, be able to afford their own constellation systems to meet their individual requirements for small coverage regions. The focus of this thesis was to optimize constellation geometries for small coverage regions, with the constellation design limited to 1-6 satellites in a Walker-delta configuration at an altitude of 200-1500 km, providing remote sensing coverage with a minimum ground elevation angle of 60 degrees. Few Pareto-frontiers have been developed and analyzed to show the trade-offs among various performance metrics, especially for this type of constellation system. The performance metrics focus on geometric coverage and include revisit time, daily visibility time, constellation altitude, ground elevation angle, and the number of satellites. The objective space containing these performance metrics was characterized for five different regions at latitudes of 0, 22.5, 45, 67.5, and 90 degrees. In addition, the effect of the minimum ground elevation angle on the achievable performance of this type of constellation system was studied. Finally, the traditional Walker-delta pattern constraint was relaxed to allow for asymmetrical designs, which were compared against the Walker-delta designs to assess how the symmetric pattern performs relative to a more relaxed design space. The goal of this thesis was to provide a framework as well as to obtain and analyze Pareto-frontiers for constellation performance relating to small regional coverage LEO constellation systems. This work provided an in-depth analysis of the trends in both the design and objective space of the obtained Pareto-frontiers. A variation on the εNSGA-II algorithm, an evolutionary algorithm developed by Kalyanmoy Deb to solve complex multi-objective optimization problems, was utilized along with a MATLAB/STK interface to produce these Pareto-frontiers. The algorithm used in this study proved to be very efficient at obtaining the various Pareto-frontiers. This study was also successful in characterizing the design and solution space of small LEO remote sensing constellation systems providing small regional coverage.
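A Pareto frontier of the kind analysed in this thesis can be extracted from any finite set of candidate designs with a simple non-dominated filter. The sketch below assumes both objectives are to be minimised (for example, mean revisit time and number of satellites); the candidate values are hypothetical.

```python
import numpy as np

def pareto_front(points):
    """Return a boolean mask of non-dominated rows, assuming every column is minimised."""
    points = np.asarray(points, dtype=float)
    mask = np.ones(points.shape[0], dtype=bool)
    for i in range(points.shape[0]):
        if not mask[i]:
            continue
        # j dominates i if it is <= in every objective and strictly < in at least one.
        dominates_i = (np.all(points <= points[i], axis=1)
                       & np.any(points < points[i], axis=1))
        if np.any(dominates_i):
            mask[i] = False
    return mask

# Hypothetical candidate constellations: (mean revisit time [min], number of satellites).
designs = np.array([[95, 2], [60, 3], [48, 4], [50, 4], [35, 5], [34, 6], [120, 1]])
front = pareto_front(designs)
print("Pareto-optimal designs:\n", designs[front])
```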
308

Using Pareto points for model identification in predictive toxicology

Palczewska, Anna Maria, Neagu, Daniel, Ridley, Mick J. January 2013 (has links)
no / Predictive toxicology is concerned with the development of models that are able to predict the toxicity of chemicals. A reliable prediction of toxic effects of chemicals in living systems is highly desirable in cosmetics, drug design or food protection to speed up the process of chemical compound discovery while reducing the need for lab tests. There is an extensive literature associated with the best practice of model generation and data integration but management and automated identification of relevant models from available collections of models is still an open problem. Currently, the decision on which model should be used for a new chemical compound is left to users. This paper intends to initiate the discussion on automated model identification. We present an algorithm, based on Pareto optimality, which mines model collections and identifies a model that offers a reliable prediction for a new chemical compound. The performance of this new approach is verified for two endpoints: IGC50 and LogP. The results show a great potential for automated model identification methods in predictive toxicology.
309

Multi-objective day-ahead scheduling of microgrids using modified grey wolf optimizer algorithm

Javidsharifi, M., Niknam, T., Aghaei, J., Mokryani, Geev, Papadopoulos, P. 10 August 2018 (has links)
Yes / Investigation of the environmental/economic optimal operation management of a microgrid (MG), as a case study for applying a novel modified multi-objective grey wolf optimizer (MMOGWO) algorithm, is presented in this paper. MGs can be considered a fundamental solution for the management of distributed generators (DGs) in future smart grids. In multi-objective problems, since the objective functions conflict, the best compromise solution should be extracted through an efficient approach; accordingly, a suitable method is applied for exploring the best compromise solution. Additionally, a novel distance-based method is proposed to control the size of the repository within a target limit, which leads to fast and precise convergence along with a well-distributed Pareto optimal front. The proposed method is implemented in a typical grid-connected MG with non-dispatchable units including renewable energy sources (RESs), along with a hybrid power source (micro-turbine, fuel cell and battery) as dispatchable units, to accumulate excess energy or to equalize power mismatch, by optimally scheduling the DGs and the power exchange between the utility grid and the storage system. The efficiency of the suggested algorithm in satisfying the load and optimizing the objective functions is validated through comparison with different methods, including PSO and the original GWO. / Supported in part by Royal Academy of Engineering Distinguished Visiting Fellowship under Grant DVF1617\6\45
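A distance-based repository control of the sort mentioned above can be sketched as a crowding-style pruning rule (a generic illustration under our own assumptions, not the authors' exact method): while the archive exceeds its size limit, the member whose nearest neighbour in objective space is closest is discarded, which tends to keep the retained front well spread out.

```python
import numpy as np

def prune_repository(objectives, max_size):
    """Drop the most crowded members (smallest nearest-neighbour distance in
    objective space) until the archive is no larger than max_size."""
    objs = np.asarray(objectives, dtype=float)
    keep = list(range(len(objs)))
    while len(keep) > max_size:
        pts = objs[keep]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        nearest = d.min(axis=1)               # distance to each member's closest neighbour
        keep.pop(int(np.argmin(nearest)))     # remove the most crowded member
    return keep

# Hypothetical archive of non-dominated (cost, emission) pairs.
archive = np.array([[0.0, 1.0], [0.1, 0.9], [0.12, 0.88],
                    [0.5, 0.5], [0.9, 0.1], [1.0, 0.0]])
kept = prune_repository(archive, max_size=4)
print("kept indices:", kept)
print("pruned archive:\n", archive[kept])
```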
310

Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology.

Palczewska, Anna Maria January 2014 (has links)
This thesis is concerned with developing methodologies that enable existing models to be effectively reused. The results of this thesis are presented in the framework of Quantitative Structure-Activity Relationship (QSAR) models, but their application is much more general. QSAR models relate chemical structures to their biological, chemical or environmental activity. There are many applications that offer an environment to build and store predictive models. Unfortunately, they do not provide advanced functionalities that allow for efficient model selection and for interpretation of model predictions for new data. This thesis aims to address these issues and proposes methodologies for dealing with three research problems: model governance (management), model identification (selection), and interpretation of model predictions. The combination of these methodologies can be employed to build more efficient systems for model reuse in QSAR modelling and other areas. The first part of this study investigates toxicity data and model formats and reviews some of the existing toxicity systems in the context of model development and reuse. Based on the findings of this review and the principles of data governance, a novel concept of model governance is defined. Model governance comprises model representation and model governance processes. These processes are designed and presented in the context of model management. As an application, minimum information requirements and an XML representation for QSAR models are proposed. Once a collection of validated, accepted and well annotated models is available within a model governance framework, they can be applied to new data. It may happen that there is more than one model available for the same endpoint; which one to choose? The second part of this thesis proposes a theoretical framework and algorithms that enable automated identification of the most reliable model for new data from a collection of existing models. The main idea is based on partitioning the search space into groups and assigning a single model to each group. The construction of this partitioning is difficult because it is a bi-criteria problem. The main contribution in this part is the application of Pareto points to the search space partition. The proposed methodology is applied to three endpoints in chemoinformatics and predictive toxicology. After having identified a model for the new data, we would like to know how the model obtained its prediction and how trustworthy it is. The interpretation of model predictions is straightforward for linear models thanks to the availability of model parameters and their statistical significance; for non-linear models this information can be hidden inside the model structure. This thesis proposes an approach for the interpretation of random forest classification models, which allows the influence (called the feature contribution) of each variable on the model prediction for an individual data point to be determined. In this part, three methods are proposed that allow analysis of feature contributions. Such analysis might lead to the discovery of new patterns that represent the standard behaviour of the model and allow additional assessment of the model's reliability for new data. The application of these methods to two standard benchmark datasets from the UCI machine learning repository shows the great potential of this methodology.
The algorithm for calculating feature contributions has been implemented and is available as an R package called rfFC. / BBSRC and Syngenta (International Research Centre at Jealott’s Hill, Bracknell, UK).
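A comparable feature-contribution decomposition can be reproduced in Python with the third-party treeinterpreter package, which splits each random-forest prediction into a bias term plus one additive contribution per feature (the thesis's own implementation is the R package rfFC; the sketch below is an assumed Python analogue, not that package):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti   # third-party package

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Decompose the predictions for a few instances: prediction = bias + sum(contributions).
instances = X[:3]
prediction, bias, contributions = ti.predict(rf, instances)

for i in range(len(instances)):
    print(f"instance {i}: predicted class probabilities = {np.round(prediction[i], 3)}")
    # contributions[i] has shape (n_features, n_classes); the column sums plus the
    # bias term reproduce the predicted probabilities.
    recon = bias[i] + contributions[i].sum(axis=0)
    print("  bias + summed feature contributions =", np.round(recon, 3))
```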
