321 |
Multivariate semiparametric regression models for longitudinal dataLi, Zhuokai January 2014 (has links)
Multiple-outcome longitudinal data are abundant in clinical investigations. For example, infections with different pathogenic organisms are often tested concurrently, and assessments are usually taken repeatedly over time. It is therefore natural to consider a multivariate modeling approach to accommodate the underlying interrelationship among the multiple longitudinally measured outcomes. This dissertation proposes a multivariate semiparametric modeling framework for such data. Relevant estimation and inference procedures as well as model selection tools are discussed within this modeling framework. The first part of this research focuses on the analytical issues concerning binary data. The second part extends the binary model to a more general situation for data from the exponential family of distributions. The proposed model accounts for the correlations across the outcomes as well as the temporal dependency among the repeated measures of each outcome within an individual. An important feature of the proposed model is the addition of a bivariate smooth function for the depiction of concurrent nonlinear and possibly interacting influences of two independent variables on each outcome. For model implementation, a general approach for parameter estimation is developed by using the maximum penalized likelihood method. For statistical inference, a likelihood-based resampling procedure is proposed to compare the bivariate nonlinear effect surfaces across the outcomes. The final part of the dissertation presents a variable selection tool to facilitate model development in practical data analysis. Using the adaptive least absolute shrinkage and selection operator (LASSO) penalty, the variable selection tool simultaneously identifies important fixed effects and random effects, determines the correlation structure of the outcomes, and selects the interaction effects in the bivariate smooth functions. Model selection and estimation are performed through a two-stage procedure based on an expectation-maximization (EM) algorithm. Simulation studies are conducted to evaluate the performance of the proposed methods. The utility of the methods is demonstrated through several clinical applications.
|
322 |
Dynamic Graph Embedding on Event Streams with Apache FlinkPerini, Massimo January 2019 (has links)
Graphs are often considered an excellent way of modeling complex real-world problems since they allow to capture relationships between items. Because of their ubiquity, graph embedding techniques have occupied research groups, seeking how vertices can be encoded into a low-dimensional latent space, useful to then perform machine learning. Recently Graph Neural Networks (GNN) have dominated the space of embeddings generation due to their inherent ability to encode latent node dependencies. Moreover, the newly introduced Inductive Graph Neural Networks gained much popularity for inductively learning and representing node embeddings through neighborhood aggregate measures. Even when an entirely new node, unseen during training, appears in the graph, it can still be properly represented by its neighboring nodes. Although this approach appears suitable for dynamic graphs, available systems and training methodologies are agnostic of dynamicity and solely rely on re-processing full graph snapshots in batches, an approach that has been criticized for its high computational costs. This work provides a thorough solution to this particular problem via an efficient prioritybased method for selecting rehearsed samples that guarantees low complexity and high accuracy. Finally, a data-parallel inference method has been evaluated at scale using Apache Flink, a data stream processor for real-time predictions on high volume graph data streams. / Molti problemi nel mondo reale possono essere rappresentati come grafi poichè queste strutture dati consentono di modellare relazioni tra elementi. A causa del loro vasto uso, molti gruppi di ricerca hanno tentato di rappresentare i vertici in uno spazio a bassa dimensione, utile per poi poter utilizzare tecniche di apprendimento automatico. Le reti neurali per grafi sono state ampiamente utilizzate per via della loro capacità di codificare dipendenze tra vertici. Le reti neurali induttive recentemente introdotte, inoltre, hanno guadagnato popolarità poichè consentono di generare rappresentazioni di vertici aggregando altri vertici. In questo modo anche un nodo completamente nuovo può comunque essere rappresentato utilizzando i suoi nodi vicini. Sebbene questo approccio sia adatto per grafici dinamici, i sistemi ad oggi disponibili e gli algoritmi di addestramento si basano esclusivamente sulla continua elaborazione di grafi statici, un approccio che è stato criticato per i suoi elevati costi di calcolo. Questa tesi fornisce una soluzione a questo problema tramite un metodo efficiente per l’allenamento di reti neurali induttive basato su un’euristica per la selezione dei vertici. Viene inoltre descritto un metodo per eseguire predizioni in modo scalabile in tempo reale utilizzando Apache Flink, un sistema per l’elaborazione di grandi quantità di flussi di dati in tempo reale. / Grafer anses ofta vara ett utmärkt sätt att modellera komplexa problem i verkligheten eftersom de gör det möjligt att fånga relationer mellan objekt. På grund av deras allestädes närhet har grafinbäddningstekniker sysselsatt forskningsgrupper som undersöker hur hörn kan kodas in i ett lågdimensionellt latent utrymme, vilket är användbart för att sedan utföra maskininlärning. Nyligen har Graph Neural Networks (GNN) dominerat utrymmet för inbäddningsproduktion tack vare deras inneboende förmåga att koda latenta nodberoenden. Dessutom fick de nyinförda induktiva grafiska nervnäten stor popularitet för induktivt lärande och representerande nodbäddningar genom sammanlagda åtgärder i grannskapet. Även när en helt ny nod, osynlig under träning, visas i diagrammet, kan den fortfarande representeras ordentligt av dess angränsande noder. Även om detta tillvägagångssätt tycks vara lämpligt för dynamiska grafer, är tillgängliga system och träningsmetodologier agnostiska för dynamik och förlitar sig bara på att behandla fullständiga ögonblicksbilder i partier, en metod som har kritiserats för dess höga beräkningskostnader. Detta arbete ger en grundlig lösning på detta specifika problem via en effektiv prioriteringsbaserad metod för att välja repeterade prover som garanterar låg komplexitet och hög noggrannhet. Slutligen har en dataparallell inferensmetod utvärderats i skala med Apache Flink, en dataströmprocessor för realtidsprognoser för grafiska dataströmmar med hög volym.
|
Page generated in 0.1169 seconds