141 |
Chemometric Approaches for Systems Biology; Folch Fortuny, Abel; 23 January 2017
This Ph.D. thesis is devoted to studying, developing and applying approaches commonly used in chemometrics to the emerging field of systems biology. Existing procedures and new methods are applied to answer research and industrial questions within different multidisciplinary teams. The methodologies developed in this document will enrich the plethora of procedures employed within the omic sciences to understand biological organisms. They will also improve processes in the biotechnological industries, by integrating biological knowledge at different levels and by exploiting the software packages derived from the thesis.
This dissertation is structured in four parts. The first block describes the framework on which the contributions presented here are based. It outlines the objectives of the two research projects related to this thesis. It also introduces the specific topics addressed in this document through conference presentations and research articles. This part gives a comprehensive description of the omic sciences and their relationships within the systems biology paradigm. It is accompanied by a review of the most widely applied multivariate methods in chemometrics, on which the novel approaches proposed here are founded.
The second part addresses several data-understanding problems in metabolomics, fluxomics, proteomics and genomics. This block proposes different alternatives for understanding flux data under steady-state conditions. Some are applications of multivariate methods previously used in other areas of chemometrics. Others are novel approaches based on a bilinear decomposition using elementary metabolic pathways, for which a GNU-licensed toolbox is made freely available to the scientific community. In addition, a framework for understanding metabolic data under non-steady-state conditions is proposed. It uses the same bilinear decomposition as for steady-state data but models the dynamics of the experiments with novel two- and three-way data analysis procedures. This part also assesses the relationships between different omic levels by integrating different sources of information on plant viruses in data fusion models. Finally, the interaction between two organisms, oranges and fungi, is studied with multivariate image analysis techniques, with future applications in the food industry.
The third block of this thesis is a thorough study of different missing data problems related to chemometrics, systems biology and industrial bioprocesses. Its theoretical chapters propose new algorithms for fitting multivariate exploratory and regression models in the presence of missing data. These algorithms also serve as preprocessing steps for any other methodology used by practitioners. Regarding applications, this block explores the reconstruction of networks in omic sciences when missing and faulty measurements appear in databases. It also examines how calibration models can be transferred between near-infrared instruments, avoiding costly and time-consuming full recalibrations in bioindustries and research laboratories. Finally, another software package, including a graphical user interface, is made freely available for missing data imputation.
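The missing-data algorithms summarized above share a simple loop:

1. Fill the gaps.
2. Fit a model.
3. Re-impute the gaps from the fit.
4. Repeat.

The following Python sketch illustrates that loop under strong simplifying assumptions: an uncentred rank-1 model fitted by alternating least squares. It is not one of the thesis's algorithms, and the helper name `rank1_impute` is hypothetical.

```python
def rank1_impute(X, n_iter=2000):
    """Fill the None cells of a numeric matrix (list of rows) using a rank-1
    model u v^T; observed cells are never modified."""
    n, p = len(X), len(X[0])
    # Initial fill: column means of the observed values.
    col_means = []
    for j in range(p):
        observed = [X[i][j] for i in range(n) if X[i][j] is not None]
        col_means.append(sum(observed) / len(observed))
    Y = [[X[i][j] if X[i][j] is not None else col_means[j] for j in range(p)]
         for i in range(n)]
    v = [1.0] * p
    for _ in range(n_iter):
        # One alternating least-squares sweep: scores u given v, then v given u.
        vv = sum(vj * vj for vj in v)
        u = [sum(Y[i][j] * v[j] for j in range(p)) / vv for i in range(n)]
        uu = sum(ui * ui for ui in u)
        v = [sum(Y[i][j] * u[i] for i in range(n)) / uu for j in range(p)]
        # Re-impute only the missing cells from the current rank-1 fit.
        for i in range(n):
            for j in range(p):
                if X[i][j] is None:
                    Y[i][j] = u[i] * v[j]
    return Y


# Hypothetical rank-1 matrix (outer product of [1, 2, 3, 4] and [1, 2, 3])
# with the last cell removed; the imputed value should approach 12.
X = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, None]]
print(round(rank1_impute(X)[3][2], 3))
```

Iterative PCA-type imputation typically centres the columns and retains more components. It keeps the same alternation between model fitting and re-imputation.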
The last part discusses the relevance of this dissertation for research and biotechnology, including proposals for future research. / Folch Fortuny, A. (2016). Chemometric Approaches for Systems Biology [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/77148 / Premios Extraordinarios de tesis doctorales (Extraordinary Doctoral Thesis Awards)
|
142 |
應用資料採礦技術於資料庫加值中的插補方法比較 (A comparison of imputation methods for adding value to databases using data mining techniques) / Imputation of value-added database in data mining; 黃雅芳; Unknown Date
Data play a vital role as a source of information for organizations, especially in the era of information and technology. Properly handling the missing values in a meaningful, high-quality and representative database could mean a promising breakthrough for organizations.
However, databases are not always complete: some of the data they contain may be missing, and results obtained from an incomplete database may be biased or misleading. This research therefore focuses on imputing the missing values to add value to the database, building imputation models according to the type of the missing data.
If the missing data are continuous, a regression model and a back-propagation neural network (BPNN) are applied. If the missing data are categorical, logistic regression, a BPNN and decision trees are used. Simulation results show that the regression model delivers the best estimates for continuous missing data. For categorical missing data, the C5.0 decision tree is the best choice. For rare data missing from the database, the BPNN delivers the best estimates for continuous missing data, and the C5.0 decision tree is again the best choice for categorical missing data.
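For continuous variables, regression imputation fits a model on the complete cases and then predicts the missing values. Below is a minimal Python sketch of that idea. It assumes a single fully observed predictor `x` and ordinary least squares. The helper `regression_impute` is hypothetical, and the study's BPNN and C5.0 models are not reproduced.

```python
def regression_impute(x, y):
    """Replace None entries of y by ordinary least-squares predictions from x,
    with the line fitted on the complete cases only."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi if yi is not None else intercept + slope * xi
            for xi, yi in zip(x, y)]


# Hypothetical data lying on y = 2x + 1 with one value missing.
print(regression_impute([0, 1, 2, 3], [1, 3, None, 7]))
```

For a categorical variable, the same pattern applies with a classifier, such as logistic regression or a decision tree, in place of the least-squares line.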
|
143 |
Bayesian Cluster Analysis: Some Extensions to Non-standard Situations; Franzén, Jessica; January 2008
The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions, as well as the associated cluster probabilities for all objects. We extend this method in several directions to some common but non-standard situations. The first extension covers the case of a few deviant observations that do not belong to any of the normal clusters. An extra component/cluster is created for them, with a larger variance or a different distribution, e.g. a uniform distribution over the whole range. The second extension is clustering of longitudinal data. All units are clustered separately at each time point, and the movements between time points are modeled by Markov transition matrices. This means that the clustering at one time point is affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm. Bayesian inference for mixture models has many advantages over the classical approach, but it is not without computational difficulties. A software package for Bayesian inference of mixture models, written in Matlab, is introduced. The programs in the package handle the basic cases of clustering data assumed to arise from mixtures of multivariate normal distributions, as well as the non-standard situations.
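The imputation step inside the Gibbs sampler can be sketched in a few lines. The Python code below is a univariate, two-component illustration, not the Matlab package described above. The following are all assumptions made for brevity:

- the known common standard deviation;
- the N(0, 100) priors on the means;
- the Beta(1, 1) prior on the weight;
- the function name `gibbs_mixture_impute`.

Each sweep allocates objects to clusters, draws the parameters from their conjugate conditionals, and then redraws every missing value from its currently allocated cluster.

```python
import math
import random


def gibbs_mixture_impute(data, n_iter=1000, burn_in=300, sigma=1.0,
                         prior_var=100.0, seed=0):
    """Gibbs sampler for a two-component univariate normal mixture in which
    missing values (None) are re-imputed in an extra step of every sweep.

    Illustrative assumptions: known common standard deviation `sigma`,
    N(0, prior_var) priors on the component means and a Beta(1, 1) prior on
    the weight of component 0. Returns the posterior means of both component
    means and of that weight.
    """
    rng = random.Random(seed)
    observed = [v for v in data if v is not None]
    missing = [i for i, v in enumerate(data) if v is None]
    # Working copy of the data; missing cells start at the observed mean.
    x = [v if v is not None else sum(observed) / len(observed) for v in data]
    mu = [min(observed), max(observed)]  # crude start that also fixes labels
    w = 0.5
    acc_mu0 = acc_mu1 = acc_w = 0.0
    kept = 0
    for it in range(n_iter):
        # 1) Allocation step: draw a cluster label for every object.
        z = []
        for xi in x:
            l0 = math.log(w) - 0.5 * ((xi - mu[0]) / sigma) ** 2
            l1 = math.log(1.0 - w) - 0.5 * ((xi - mu[1]) / sigma) ** 2
            top = max(l0, l1)  # subtract the max to avoid underflow
            p0 = math.exp(l0 - top) / (math.exp(l0 - top) + math.exp(l1 - top))
            z.append(0 if rng.random() < p0 else 1)
        # 2) Parameter step: conjugate draws of the weight and the means.
        n1 = sum(z)
        n0 = len(z) - n1
        w = rng.betavariate(1 + n0, 1 + n1)
        for k, nk in ((0, n0), (1, n1)):
            total = sum(xi for xi, zi in zip(x, z) if zi == k)
            precision = nk / sigma ** 2 + 1.0 / prior_var
            mu[k] = rng.gauss(total / sigma ** 2 / precision,
                              math.sqrt(1.0 / precision))
        # 3) Imputation step: redraw each missing value from its cluster.
        for i in missing:
            x[i] = rng.gauss(mu[z[i]], sigma)
        if it >= burn_in:
            acc_mu0 += mu[0]
            acc_mu1 += mu[1]
            acc_w += w
            kept += 1
    return acc_mu0 / kept, acc_mu1 / kept, acc_w / kept


# Hypothetical data: two well-separated groups, every tenth value missing.
gen = random.Random(42)
data = [gen.gauss(-3, 1) for _ in range(100)] + [gen.gauss(3, 1) for _ in range(100)]
for i in range(0, len(data), 10):
    data[i] = None
print(gibbs_mixture_impute(data))
```

The missing values are redrawn on every sweep. As a result, their uncertainty is propagated into the posterior of the cluster parameters rather than being fixed by a single imputation.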
|
144 |
COMPARISON OF LONGITUDINAL AND CONVENTIONAL DATA ANALYSIS METHODS FOR ASSESSING EFFECTIVENESS; Jadhav, Pravin R; 01 January 2006
Pharmaceutical drug development is a costly and time-consuming process. Reportedly, it takes about 10-15 years and ~900 million dollars of investment to launch a new drug on the world market. Any measure that increases statistical power, and also decreases uncertainty about that power, increases a drug's net present value. It has been argued for some time that judicious use of the available data might lead to more efficient use of resources during drug development. Conventionally, effectiveness has been assessed by comparing the change from baseline at some pre-specified time between the control and test treatments (SPA). Last observation carried forward (LOCF) is a widely used technique when data are missing for any reason. Although LOCF is known to introduce bias, the direction and magnitude of that bias are debatable.
The primary aim of the proposed simulation experiments was to assess the properties of the random effects model (REM) and mixed model repeated measures (MMRM) methods, which use all the data collected during pivotal trials. A total of 43 scenarios were analyzed, based on disease progression, magnitude of drug effect, between- and within-subject variability, and patient drop-outs. Three analysis methods, viz. SPA, REM and MMRM, were investigated. For the SPA method, missing data were imputed with four different methods: LOCF, mean imputation, population regression and individual regression. False-positive and false-negative inferences, and the bias in estimating the effect size, were assessed for each method.
The most important finding of this report is that the REM and MMRM methods are efficient alternatives to the SPA methods, with ~50% savings in sample size. These methods are based on sound scientific principles and provide stronger evidence against the null hypothesis. The choice between the REM and MMRM methods depends on the purpose of the analysis and on the data gathered from the experimental design.
The results support the use of likelihood-based MMRM methods for regulatory decision making. The REM methods are useful for understanding the time course of the disease and drug effect, for making predictions based on the data, and for gaining insight into the time to steady-state effect for rational decision making. The SPA methods are less powerful across all scenarios. SPA-LOCF yielded anti-conservative results in some cases, with the type I error rate exceeding 15% when data were missing due to toxicity. On the other hand, the drug effect was consistently underestimated (by ~40%) when data were missing due to lack of effectiveness. The results demonstrate that with SPA-LOCF methods it is practically impossible to establish effectiveness in these areas with a reasonable sample size.
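LOCF itself is easy to state in code. In this Python sketch (the function name `locf` and the scores are hypothetical), a patient whose score is still improving drops out after the third visit. Carrying the last value forward freezes the change from baseline at 4.0, whatever the later trajectory would have been. This is one way LOCF can bias an end-of-study comparison.

```python
def locf(series):
    """Last observation carried forward: each None becomes the most recent
    observed value; a None before the first observation stays None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical improvement scores at five visits; dropout after visit 3.
scores = [0.0, 2.0, 4.0, None, None]
completed = locf(scores)
print(completed)                      # [0.0, 2.0, 4.0, 4.0, 4.0]
print(completed[-1] - completed[0])   # 4.0: change from baseline at the last visit
```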
|
145 |
Apport de la reconstruction virtuelle du bassin Regourdou 1 (Dordogne, France) à la connaissance des mécaniques obstétricales néandertaliennes. / Contribution of the virtual reconstruction of the pelvis Regourdou 1 (Dordogne, France) in the knowledge of the Neandertal obstetrical mechanics; Meyer, Valentine; 04 October 2013
The discovery of a new Neandertal pelvis, Regourdou 1, makes it possible to discuss the functional implications of Neandertal pelvic morphology. First, the specimen is described, which confirms its affiliation with the Neandertals and highlights specific Neandertal features. Although none of these features is autapomorphic, their combination characterizes the Neandertal pelvic girdle. The Regourdou 1 pelvis is then reconstructed by estimating its missing data with the thin-plate spline method, using Kebara 2 as the reference.
The dimensions of the Regourdou 1 birth canal are compared with those of two other Neandertal individuals (Tabun C1 and Kebara 2) and with a modern population (n=151). The morphology of the obstetrical planes is analyzed by geometric morphometrics, together with the cephalo-pelvic relationship. This analysis reveals traits that, in anatomically modern humans, are associated with rotational birth. Our work confirms the existence of modern-type obstetrical mechanics in Neandertals. This interpretation enriches our biological and cultural knowledge of this population.
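Estimating missing landmarks with a thin-plate spline works in two steps. First, a smooth mapping is fitted from a complete reference specimen (here Kebara 2) to the incomplete one, using the landmarks both share. That mapping is then applied to the reference positions of the missing landmarks. The Python sketch below is two-dimensional for brevity and uses the 2D kernel U(r) = r^2 log r^2. A 3D reconstruction such as Regourdou 1 works with three-dimensional coordinates, commonly with U(r) = r. The function names and landmark coordinates are hypothetical.

```python
import math


def _tps_kernel(r2):
    """2D thin-plate spline radial basis U = r^2 log(r^2), with U(0) = 0."""
    return r2 * math.log(r2) if r2 > 0.0 else 0.0


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def tps_predict(src, dst, query):
    """Map `query` from the reference configuration `src` to the target
    configuration `dst` (lists of (x, y) landmarks present in both) with a
    2D thin-plate spline, e.g. to estimate a missing target landmark."""
    n = len(src)
    size = n + 3
    L = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            L[i][j] = _tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        L[i][n], L[i][n + 1], L[i][n + 2] = 1.0, xi, yi
        L[n][i], L[n + 1][i], L[n + 2][i] = 1.0, xi, yi
    result = []
    for d in range(2):  # one spline per target coordinate
        coef = _solve(L, [p[d] for p in dst] + [0.0, 0.0, 0.0])
        weights, (a0, ax, ay) = coef[:n], coef[n:]
        value = a0 + ax * query[0] + ay * query[1]
        for w_i, (xi, yi) in zip(weights, src):
            value += w_i * _tps_kernel((query[0] - xi) ** 2 + (query[1] - yi) ** 2)
        result.append(value)
    return tuple(result)


# Hypothetical reference landmarks and a target that differs by an affine map.
ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
tgt = [(2 * x + 0.5 * y + 1, -x + 3 * y - 2) for x, y in ref]
print(tps_predict(ref, tgt, (0.3, 0.7)))  # approximately (1.95, -0.2)
```

A thin-plate spline reproduces affine transformations exactly and interpolates the shared landmarks. The estimate for a missing landmark therefore departs from a purely affine prediction only through local bending between the two specimens.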
|
146 |
Modely a statistická analýza procesu rekordů / Models and statistical analysis of record processes; Tůmová, Alena; January 2011
In this work we model the historical development of the best performances in the men's 100, 200, 400 and 800 m running events. We assume that the annual best performances are independent random variables with a generalized extreme value distribution for minima, with a decreasing trend in location. The model parameters are estimated by maximum likelihood. Annual best performances are missing for some years. We treat these as right-censored observations, censored at the world record valid at that time. The graphical tools used for model diagnostics are adjusted for censoring. The fitted models are used to estimate the ultimate records and to predict new records in the coming years. Finally, we estimate several models describing the historical development of annual best performances for several events simultaneously.
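The censored likelihood described above can be written down directly. In this Python sketch, the annual best Y in year t follows a GEV distribution for minima with location a + b*t (b < 0 gives the decreasing trend), scale sigma and nonzero shape xi. An observed best contributes its log density. A missing best contributes log P(Y > record), where the record is the world record valid that year. The function names, the parameterization and the data are assumptions, and the Gumbel case xi = 0 is omitted.

```python
import math


def gev_min_logpdf_logsf(y, mu, sigma, xi):
    """Log density and log survival function log P(Y > y) of a generalized
    extreme value distribution for minima (so -Y is GEV for maxima), xi != 0."""
    t = 1.0 - xi * (y - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper end point when xi > 0
        # (P(Y > y) = 0), below the lower end point when xi < 0 (P(Y > y) = 1).
        return float("-inf"), (float("-inf") if xi > 0 else 0.0)
    s = t ** (-1.0 / xi)
    log_pdf = -math.log(sigma) + (-1.0 / xi - 1.0) * math.log(t) - s
    return log_pdf, -s


def censored_loglik(years, best, record, a, b, sigma, xi):
    """Log-likelihood of annual best performances with a linear trend
    mu_t = a + b * t in location. A missing best (None) is right-censored at
    the world record valid that year: it is only known to exceed it."""
    total = 0.0
    for t, y, c in zip(years, best, record):
        mu = a + b * t
        if y is not None:
            total += gev_min_logpdf_logsf(y, mu, sigma, xi)[0]
        else:
            total += gev_min_logpdf_logsf(c, mu, sigma, xi)[1]
    return total


# Hypothetical times (s) over four years, one annual best missing.
years = [0, 1, 2, 3]
best = [10.05, None, 10.02, 9.98]
record = [10.0, 10.0, 10.0, 9.95]
print(censored_loglik(years, best, record, a=10.1, b=-0.02, sigma=0.05, xi=0.1))
```

Maximum likelihood estimates would then be obtained by maximizing `censored_loglik` over (a, b, sigma, xi) with a numerical optimizer.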
|
147 |
Traitement des données manquantes en épidémiologie : application de l’imputation multiple à des données de surveillance et d’enquêtes / Missing data management in epidemiology: Application of multiple imputation to data from surveillance systems and surveys; Héraud Bousquet, Vanina; 06 April 2012
The management of missing values is a common and widespread problem in epidemiology.
The most commonly used technique restricts the analysis to subjects with complete information on the variables of interest. This can substantially reduce statistical power and precision and may also bias the estimates. This thesis investigates the application of multiple imputation methods to handle missing values in epidemiological studies and infectious disease surveillance systems. The study designs to which multiple imputation was applied were diverse:

- a risk analysis of HIV transmission through blood transfusion;
- a case-control study of risk factors for Campylobacter infection;
- a capture-recapture study estimating the number of new HIV diagnoses among children.

We then performed a multiple imputation analysis of data from a surveillance system for chronic hepatitis C (HCV). Its aim was to assess risk factors for severe liver disease among HCV-infected patients who reported drug use. Within this HCV study, we proposed guidelines for applying a sensitivity analysis to test the hypotheses underlying multiple imputation. Finally, we describe how we developed and applied an ongoing multiple imputation process for the French national HIV surveillance database, and how we evaluated and attempted to validate the multiple imputation procedures.
Based on these practical applications, we worked out a strategy for handling missing data in surveillance databases. It includes a thorough examination of the incomplete database, the building of the imputation model, and procedures to validate imputation models and examine the hypotheses underlying multiple imputation.
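After each of the m imputed datasets has been analysed, the results are usually combined with Rubin's rules. Below is a minimal Python sketch. The function name `rubin_pool` and the numbers are hypothetical. It assumes a scalar estimate and a non-zero between-imputation variance.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, total variance, degrees of freedom).
    Assumes m >= 2 and a non-zero between-imputation variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)               # pooled point estimate
    w = mean(variances)                   # average within-imputation variance
    b = variance(estimates)               # between-imputation variance (m - 1 divisor)
    total = w + (1 + 1 / m) * b           # total variance of q_bar
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical estimates and variances from m = 3 imputed datasets.
print(rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The total variance inflates the average within-imputation variance by the between-imputation spread. This is how multiple imputation reflects the uncertainty about the missing values, which a single imputation would ignore.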
|
148 |
Devenir à long terme de couples traités par fécondation in vitro dans la cohorte DAIFI / Long-term outcome of couples treated by in vitro fertilization in the DAIFI cohort; Troude, Pénélope; 21 June 2013
Until now, most studies of couples treated by in vitro fertilization (IVF) have centred on IVF success. Very few data are available on the long-term outcome of these couples, including spontaneous conceptions and adoptions. This work aimed to estimate the long-term cumulative parenthood rate, and to study factors associated with early IVF discontinuation and with spontaneous live births. The DAIFI study is a retrospective cohort of 6,507 couples who began IVF in 2000-2002 in one of the eight participating French IVF centres. Medical data on all couples were obtained from the centres' databases. Information on the long-term outcome after leaving the IVF centre was collected by a postal questionnaire sent to the couples in 2008-2010, 7 to 9 years after IVF initiation (participation rate 38%).
The study of factors associated with participation in the postal survey suggested that a cumulative parenthood rate estimated from participants only might be biased. Two methods were used to correct for non-response bias: inverse probability weighting and multiple imputation. Neither modified the estimate of the cumulative parenthood rate obtained with the complete-case approach. Overall, 7 to 9 years after IVF initiation, the cumulative parenthood rate was estimated at 60%, including live births following IVF, other treatment or spontaneous conception. When adoptions were also considered, the cumulative parenthood rate reached 71%. After a first failed IVF cycle, just over one couple in four (26%) discontinued IVF treatment at the inclusion centre. In general, couples with poor prognostic factors had a higher risk of early discontinuation of IVF treatment. However, the higher proportion of early discontinuation observed among couples with unexplained infertility could be linked to a higher chance of spontaneous pregnancy in this subgroup. Among couples who remained childless after treatment, 24% later had a spontaneous live birth (SLB), at a median of 28 months after the first IVF attempt. Among couples who had a child following medical treatment, 17% later had an SLB, at a median of 33 months after the birth of that child. The factors associated with SLB can be viewed as indicators of a better fertility prognosis, especially among unsuccessfully treated couples.
The DAIFI study has provided information on the long-term outcome of couples treated by IVF. This topic had until now been little studied, often in small samples and with shorter follow-up. These results should give hope to infertile couples: nearly three couples out of four eventually became parents, even if it may take many years.
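Inverse probability weighting corrects for non-response by giving more weight to respondents who resemble the non-respondents. The Python sketch below estimates each couple's response probability by the response rate in its stratum. This is a simplification of the usual logistic response model. The strata, the records and the function name `ipw_proportion` are hypothetical. In this made-up example the weighted estimate (0.5) differs from the respondents-only proportion (2/3). In the DAIFI data, by contrast, the correction left the estimate unchanged.

```python
from collections import defaultdict


def ipw_proportion(records):
    """Proportion of a 0/1 outcome among respondents, weighted by the inverse
    of each respondent's estimated response probability.

    records: (stratum, responded, outcome) triples; outcome is ignored for
    non-respondents. The response probability of a stratum is estimated by
    its observed response rate (a simplification of a logistic model).
    """
    sent = defaultdict(int)
    answered = defaultdict(int)
    for stratum, responded, _ in records:
        sent[stratum] += 1
        answered[stratum] += int(responded)
    num = den = 0.0
    for stratum, responded, outcome in records:
        if responded:
            weight = sent[stratum] / answered[stratum]  # 1 / P(response)
            num += weight * outcome
            den += weight
    return num / den


# Hypothetical: good-prognosis couples answer more often and more became parents.
records = [("good", 1, 1), ("good", 1, 1), ("good", 0, None), ("good", 0, None),
           ("poor", 1, 0), ("poor", 0, None), ("poor", 0, None), ("poor", 0, None)]
print(ipw_proportion(records))  # 0.5, versus 2/3 among respondents only
```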
|
150 |
A Note on the Generalization Performance of Kernel Classifiers with Margin; Evgeniou, Theodoros; Pontil, Massimiliano; 01 May 2000
We present distribution-independent bounds on the generalization (misclassification) performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers belong to this class of machines. The bounds are derived by computing the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of the margin distribution (i.e. functions of the SVM slack variables) are also derived.
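The slack variables mentioned above measure how far each training point falls short of the margin. Take a linear classifier f(x) = w·x + b with labels y_i in {-1, +1}. Its standard soft-margin slack is xi_i = max(0, 1 - y_i f(x_i)). The Python sketch below computes these slacks; the function name `slacks` and the data are hypothetical. It does not reproduce the bounds of the note.

```python
def slacks(w, b, X, y):
    """Soft-margin slack variables xi_i = max(0, 1 - y_i * f(x_i)) of the
    linear classifier f(x) = w . x + b, for labels y_i in {-1, +1}."""
    result = []
    for x_i, y_i in zip(X, y):
        f = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        result.append(max(0.0, 1.0 - y_i * f))
    return result


# Points on or beyond the margin get zero slack, a correctly classified point
# inside the margin gets a slack in (0, 1], a misclassified point a slack > 1.
print(slacks([1.0, 0.0], 0.0, [[2, 0], [0.5, 0], [-1, 0], [0.25, 0]], [1, 1, -1, -1]))
# [0.0, 0.5, 0.0, 1.25]
```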
|