201

TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA

Weng, Jiaying 01 January 2019 (has links)
The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is one such important tool in modern data analysis and has received extensive attention in both academia and industry. In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to high dimensional data problems. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum discrepancy approaches to achieve optimal solutions, and we develop a novel and efficient optimization algorithm to obtain sparse estimates. We derive asymptotic properties of the proposed estimators and demonstrate their efficiency gains compared to traditional estimators. The oracle properties of the sparse estimates are also derived. Simulation studies and real data examples illustrate the effectiveness of the proposed methods. The wavelet transform is another tool that effectively captures time-localized, high-frequency information. In parallel with the proposed Fourier transform methods, we also develop a wavelet-based approach and derive the asymptotic properties of the resulting estimators.
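To make the slicing-free idea concrete, here is a minimal numerical sketch of inverse regression via Fourier transforms of the response. It illustrates the general approach rather than the dissertation's exact estimator; the frequency grid, the whitening step, and the toy data are assumptions made for the example.

```python
import numpy as np

def fourier_inverse_regression(X, y, n_dirs=2, freqs=None):
    """Sketch of slicing-free inverse regression: m(t) = E[(X - mean) exp(i t y)]
    replaces the slice means of classical SIR, and the leading eigenvectors of
    the accumulated candidate matrix estimate central-subspace directions."""
    if freqs is None:
        freqs = np.linspace(0.1, 2.0, 20)       # assumed frequency grid
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    W = np.linalg.inv(L).T                      # whitening transform
    Z = Xc @ W                                  # whitened predictors
    M = np.zeros((p, p))
    for t in freqs:
        m_t = (Z * np.exp(1j * t * y)[:, None]).mean(axis=0)
        M += np.outer(m_t.real, m_t.real) + np.outer(m_t.imag, m_t.imag)
    eigval, eigvec = np.linalg.eigh(M)          # ascending eigenvalues
    V = eigvec[:, ::-1][:, :n_dirs]             # leading directions, whitened scale
    return W @ V                                # map back to the original X scale

# Example: y depends on X only through the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
print(fourier_inverse_regression(X, y, n_dirs=1))
```

Replacing the slice indicators of classical sliced inverse regression with exp(i t y) removes the need to choose a number of slices, which is the property the abstract highlights.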
202

A NEW INDEPENDENCE MEASURE AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA ANALYSIS

Ke, Chenlu 01 January 2019 (has links)
This dissertation comprises three connected topics. First, we propose a novel class of independence measures for testing independence between two random vectors, based on the discrepancy between the conditional and the marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends classical ANOVA to a kernel ANOVA that can test the more general hypothesis of equal distributions among groups. The index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large-p-small-n setting. Our approach incorporates marginal information between each predictor and the response as well as joint information among predictors. As a result, it is more capable of selecting all truly active variables than marginal selection methods. Furthermore, our procedure can handle both continuous and discrete responses with mixed-type predictors. We establish the sure screening property of the proposed approach under mild conditions. Third, we focus on a model-free sufficient dimension reduction approach using the new measure. Our method does not require strong assumptions on predictors or responses. An algorithm is developed to find dimension reduction directions using sequential quadratic programming. We illustrate the advantages of the new measure and its two applications in high dimensional data analysis through numerical studies across a variety of settings.
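As a rough illustration of the kernel-ANOVA view for a categorical variable, the sketch below compares each group's kernel mean embedding with the overall embedding using a Gaussian kernel. It conveys the flavour of an index that detects any difference in distribution across groups; the kernel, bandwidth, and statistic are assumptions for the example and do not reproduce the dissertation's characteristic-function-based measure.

```python
import numpy as np

def kernel_anova_stat(X, groups, sigma=1.0):
    """Toy kernel ANOVA statistic: with a Gaussian kernel k, compare each
    group's mean embedding to the overall mean embedding, weighted by group
    proportions.  Large values suggest the distribution of X differs across
    groups (not the dissertation's exact index)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # Gram matrix
    overall = K.mean()                            # estimate of <mu, mu>
    stat = 0.0
    for g in np.unique(groups):
        idx = groups == g
        p_g = idx.mean()
        within = K[np.ix_(idx, idx)].mean()       # estimate of <mu_g, mu_g>
        cross = K[idx, :].mean()                  # estimate of <mu_g, mu>
        stat += p_g * (within - 2.0 * cross + overall)
    return stat

# Example: group 1 is shifted relative to group 0.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
groups = np.repeat([0, 1], 100)
print(kernel_anova_stat(X, groups))
```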
203

Bayesian models for DNA microarray data analysis

Lee, Kyeong Eun 29 August 2005 (has links)
Selection of significant genes via expression patterns is important in a microarray problem. Owing to the small sample size and large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, so we use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) computation techniques to simulate the posterior distributions. The Bayesian model is flexible enough to identify the significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of significant genes that classify BRCA1 versus others. Microarray data can also be applied to survival models. We address how to reduce the dimension in model building by selecting significant genes as well as assessing the estimated survival curves. Additionally, we consider the well-known Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Specifically, for a given vector of response values, which are times to event (death or censored times), and p gene expressions (covariates), we address how to reduce the dimension by selecting the responsible genes, which control the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) breast carcinoma data. Lastly, we propose a mixture of Dirichlet process models using the discrete wavelet transform for curve clustering. In order to characterize these time-course gene expressions, we consider them as trajectory functions of time and gene-specific parameters and obtain their wavelet coefficients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors.
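The selection mechanism described above, latent variables plus a Bayesian mixture prior explored by MCMC, can be illustrated with a simplified Gibbs sampler. The sketch below uses a continuous spike-and-slab prior in a linear model rather than the latent-variable probit model with truncated sampling used in the dissertation; the hyperparameters and toy data are assumptions for the example.

```python
import numpy as np
from scipy import stats

def spike_slab_gibbs(X, y, n_iter=2000, tau0=0.01, tau1=1.0, pi=0.2, seed=0):
    """Simplified Gibbs sampler with a continuous spike-and-slab prior:
    beta_j | gamma_j ~ N(0, tau1^2) if gamma_j = 1 (slab), else N(0, tau0^2).
    Returns posterior inclusion probabilities from the second half of the chain."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.zeros(p, dtype=int)
    sigma2 = 1.0
    keep = np.zeros(p)
    for it in range(n_iter):
        # beta | gamma, sigma2: multivariate normal
        d_inv = 1.0 / np.where(gamma == 1, tau1 ** 2, tau0 ** 2)
        cov = np.linalg.inv(X.T @ X / sigma2 + np.diag(d_inv))
        beta = rng.multivariate_normal(cov @ X.T @ y / sigma2, cov)
        # gamma_j | beta_j: Bernoulli from the two-component mixture
        slab = pi * stats.norm.pdf(beta, 0.0, tau1)
        spike = (1.0 - pi) * stats.norm.pdf(beta, 0.0, tau0)
        gamma = (rng.random(p) < slab / (slab + spike)).astype(int)
        # sigma2 | beta: inverse gamma with a vague IG(1, 1) prior
        rss = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma(0.5 * n + 1.0, 1.0 / (0.5 * rss + 1.0))
        if it >= n_iter // 2:
            keep += gamma
    return keep / (n_iter - n_iter // 2)

# Example: only the first two of ten predictors are active.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=100)
print(np.round(spike_slab_gibbs(X, y), 2))
```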
204

Modelling and forecasting economic time series with single hidden-layer feedforward autoregressive artificial neural networks

Rech, Gianluigi January 2001 (has links)
This dissertation consists of three essays. In the first essay, A Simple Variable Selection Technique for Nonlinear Models, written in cooperation with Timo Teräsvirta and Rolf Tschernig, I propose a variable selection method based on a polynomial expansion of the unknown regression function and an appropriate model selection criterion. The hypothesis of linearity is tested by a Lagrange multiplier test based on this polynomial expansion. If rejected, a kth-order general polynomial is used as a base for estimating all submodels by ordinary least squares. The combination of regressors leading to the lowest value of the model selection criterion is selected. The second essay, Modelling and Forecasting Economic Time Series with Single Hidden-layer Feedforward Autoregressive Artificial Neural Networks, proposes a unified framework for artificial neural network modelling. Linearity is tested and the selection of regressors performed by the methodology developed in essay I. The number of hidden units is determined by a procedure based on a sequence of Lagrange multiplier (LM) tests. Serial correlation of the errors and parameter constancy are checked by LM tests as well. A Monte Carlo study, the two classical series of the lynx and the sunspots, and an application to the monthly S&P 500 index return series are used to demonstrate the performance of the overall procedure. In the third essay, Forecasting with Artificial Neural Network Models (in cooperation with Marcelo Medeiros), the methodology developed in essay II, the most popular methods for artificial neural network estimation, and the linear autoregressive model are compared by forecasting performance on 30 time series from different subject areas. Early stopping, pruning, information criterion pruning, cross-validation pruning, weight decay, and Bayesian regularization are considered. The findings are that 1) the linear models very often outperform the neural network ones and 2) the modelling approach to neural networks developed in this thesis stands up well in comparison with the other neural network modelling methods considered here. / Diss. Stockholm: Handelshögskolan, 2002. (Spikblad missing.)
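A minimal sketch of the essay-I selection step, under assumed choices of polynomial degree and information criterion, could look as follows; the dissertation's Lagrange multiplier linearity test is not reproduced here.

```python
import numpy as np
from itertools import combinations
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def polynomial_subset_selection(X, y, degree=3, criterion="bic"):
    """For every subset of candidate regressors, approximate the unknown
    regression function by a degree-k polynomial in that subset, fit by OLS,
    and keep the subset with the lowest information criterion."""
    n, p = X.shape
    best = (np.inf, None)
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            Z = PolynomialFeatures(degree, include_bias=True).fit_transform(X[:, subset])
            fit = LinearRegression(fit_intercept=False).fit(Z, y)
            rss = np.sum((y - fit.predict(Z)) ** 2)
            k = Z.shape[1]
            penalty = k * np.log(n) if criterion == "bic" else 2 * k
            score = n * np.log(rss / n) + penalty
            if score < best[0]:
                best = (score, subset)
    return best

# Example: y depends nonlinearly on x0 and x2 only.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + np.sin(X[:, 2]) + 0.1 * rng.normal(size=300)
print(polynomial_subset_selection(X, y))
```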
205

Robust inference of gene regulatory networks : System properties, variable selection, subnetworks, and design of experiments

Nordling, Torbjörn E. M. January 2013 (has links)
In this thesis, inference of biological networks from in vivo data generated by perturbation experiments is considered, i.e. deduction of the causal interactions that exist among the observed variables. Knowledge of such regulatory influences is essential in biology. A system property, interampatteness, is introduced that explains why the variation in existing gene expression data is concentrated to a few "characteristic modes" or "eigengenes", and why previously inferred models have a large number of false positive and false negative links. An interampatte system is characterized by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals, and we show that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation, and their existence hampers network reconstruction. The excitation problem is solved by iterative design of correlated multi-gene perturbation experiments that counteract the intrinsic signal attenuation of the system. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is numerically demonstrated for the Snf1 signalling pathway in S. cerevisiae. The impact of unperturbed and unobserved latent state variables, which exist in any real biological system, on the inferred network and on the required experimental set-up for network inference is analysed. Their existence implies that, in general, a subnetwork of pseudo-direct causal regulatory influences, accounting for all environmental effects, is inferred. In principle, the number of latent states and different paths between the nodes of the network can be estimated, but their identity cannot be determined unless they are observed or perturbed directly. Network inference is recognized as a variable/model selection problem and solved by considering all possible models of a specified class that can explain the data at a desired significance level, and by classifying only the links present in all of these models as existing. As shown, these links can be determined without any parameter estimation by reformulating the variable selection problem as a robust rank problem. Solution of the rank problem enables assignment of confidence to individual interactions, without resorting to any approximation or asymptotic results. This is demonstrated by reverse engineering of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven to exist, which serves to illustrate that even the accumulated knowledge of well studied genes is incomplete.
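The design idea, choosing the next multi-gene perturbation so that the expected response excites a direction that is currently poorly spanned, can be sketched roughly as below. The linear response model, the use of the weakest singular direction, and the synthetic system are assumptions for illustration and are much simpler than the thesis's iterative design procedure.

```python
import numpy as np

def next_perturbation(Y, P):
    """Rough sketch: given perturbations applied so far (columns of P) and the
    measured responses (columns of Y), pick the next multi-gene perturbation so
    that the expected response boosts the currently weakest (least excited)
    mode of the response data."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    weak_dir = U[:, -1]                        # response direction with least excitation
    # Rough linear model Y ~ G P  =>  G_hat = Y P^+ ; we want G p_next ~ weak_dir,
    # so solve G_hat p_next = weak_dir in the least-squares sense.
    G_hat = Y @ np.linalg.pinv(P)
    p_next = np.linalg.lstsq(G_hat, weak_dir, rcond=None)[0]
    return p_next / np.linalg.norm(p_next)

# Example with a synthetic 5-gene system and 3 single-gene perturbations.
rng = np.random.default_rng(4)
G = rng.normal(size=(5, 5))
P = np.eye(5)[:, :3]
Y = G @ P + 0.01 * rng.normal(size=(5, 3))
print(next_perturbation(Y, P))
```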
206

Alignment and Variable Selection Tools for Gas Chromatography – Mass Spectrometry Data

Sinkov, Nikolai Unknown Date
No description available.
207

Improving the performance of micro-machined metal oxide gas sensors: Optimization of the temperature modulation mode via pseudorandom sequences.

Vergara Tinoco, Alexander 25 July 2006 (has links)
One of the major problems in gas sensing systems that use metal oxide devices is their lack of reproducibility, stability and selectivity. To tackle these problems, different strategies have been developed in parallel. Some are related to the improvement of materials or the use of sample conditioning and pre-treatment methods. Other widely used techniques take advantage of the unavoidable partially overlapping sensitivities by using sensor arrays and pattern recognition techniques, or exploit dynamic features of the gas sensor response. In recent years, modulating the working temperature of metal oxide gas sensors has become one of the most used methods to enhance sensor selectivity. Since the sensor response differs at different working temperatures, measuring the sensor response at n different temperatures is, in some cases, similar to using an array of n different sensors. This provides multivariate information from every single sensor and helps keep low the dimensionality of the measurement system needed to solve a specific application. Despite the good results reported, until now the selection of the frequencies used to modulate the working temperature has remained an empirical process, which does not ensure that the best results are reached for a given application. In this context, the principal objective of this doctoral thesis was to develop a systematic method to determine the optimal temperature modulation frequencies for a given gas analysis problem. This method, borrowed from the field of system identification, has been developed and introduced for the first time in the area of gas sensors. It consists of studying the sensor response to gases when the operating temperature is modulated via maximum-length pseudo-random sequences. Such signals share some properties with white noise and therefore can help estimate the linear response of a system with non-linearity (e.g., the impulse response of a sensor-gas system). The optimization process selects, among the spectral components of the impulse response estimates, the few that best help either discriminate or quantify the target gases of a given gas analysis application. Since spectral components are directly related to modulating frequencies, the selection of spectral components results in the determination of the optimal temperature modulating frequencies. In the first experiments, pseudo-random binary signals (PRBS) were employed to modulate the working temperature of micro-machined metal oxide gas sensors in a frequency range from 0 up to 112.5 Hz. The upper frequency is slightly higher than the cut-off frequency of the sensor membranes. The outcome of this initial study was that the important modulating frequencies lie in the range between 0 and 1 Hz. This is understandable, since the kinetics of the reaction and adsorption processes taking place at the sensor surface (i.e., physisorption/chemisorption/ionosorption) are slow, and if these are to be altered by the thermal modulation, low-frequency modulating signals are needed. This explains why low-frequency temperature-modulating signals (i.e., in the mHz range) have been used with micro-hotplate gas sensors, even though the thermal response of their membranes is much faster (typically, near 100 Hz). In the experiments that followed, an evolved method to determine the optimal temperature modulating frequencies for micro-hotplate gas sensors was introduced, based on the use of maximum-length multilevel pseudo-random sequences (MLPRS). Multilevel signals were considered instead of binary ones because the former provide a better estimate of the linear dynamics of a process with non-linearity, and it is well known that temperature-modulated metal oxide gas sensors present non-linearity in their response. These systematic studies were fully validated by synthesizing multi-sinusoidal signals at the optimal frequencies previously identified using pseudo-random sequences. When the sensors had their operating temperatures modulated by a signal whose frequency content corresponded to the optimum, the gases and gas mixtures considered could be perfectly discriminated, and building accurate calibration models to predict gas concentration was found to be possible. In some cases, the validation process was conducted on sensors that had not been used for optimization purposes (e.g., a different sensor array from the same fabrication batch). Summarizing, the new method developed in this thesis for selecting the optimal modulating frequencies is shown to be consistent and effective. The method applies generally and could be used in any gas analysis problem or extended to other types of sensors (e.g., conducting polymer sensors). The scientific contributions of this thesis are collected in four journal papers and thirteen conference proceedings.
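A small simulation can illustrate the identification step described above: modulate with a maximum-length pseudo-random sequence, estimate the impulse response by circular cross-correlation, and inspect its spectrum for candidate modulating frequencies. The first-order sensor dynamics, noise level, and sampling rate below are assumptions for the toy example, not measurements from the thesis.

```python
import numpy as np
from scipy.signal import max_len_seq, lfilter

fs = 225.0                                   # assumed sampling rate, Hz
mls, _ = max_len_seq(10)                     # 1023-sample maximum-length sequence (0/1)
u = 2.0 * mls - 1.0                          # two-level +/- excitation of the heater
# Toy first-order "sensor-gas" dynamics standing in for the real response.
b, a = [0.05], [1.0, -0.95]
y = lfilter(b, a, u) + 0.01 * np.random.default_rng(5).normal(size=u.size)

# For an MLS input, circular cross-correlation of input and output
# approximates the impulse response (up to a scale factor).
n = u.size
h_est = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(u)))) / n

# Spectral components of the estimated impulse response; in the thesis the
# few components that best discriminate or quantify the target gases are
# kept and turned into a multi-sinusoidal modulating signal.
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
spectrum = np.abs(np.fft.rfft(h_est))
top = freqs[np.argsort(spectrum)[::-1][:5]]
print("candidate modulating frequencies (Hz):", np.round(top, 2))
```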
208

Bayesian Inference in Structural Second-Price Auctions

Wegmann, Bertil January 2011 (has links)
The aim of this thesis is to develop efficient and practically useful Bayesian methods for statistical inference in structural second-price auctions. The models are applied to a carefully collected coin auction dataset with bids and auction-specific characteristics from one thousand Internet auctions on eBay. Bidders are assumed to be risk-neutral and symmetric, and to compete for a single object using the same game-theoretic strategy. A key contribution of the thesis is the derivation of very accurate approximations of the otherwise intractable equilibrium bid functions under different model assumptions. These easily computed and numerically stable approximations are shown to be crucial for statistical inference, where the inverse bid functions typically need to be evaluated several million times. In the first paper, the approximate bid is a linear function of a bidder's signal, and a Gaussian common value model is estimated. We find that the publicly available book value and the condition of the auctioned object are important determinants of bidders' valuations, while eBay's detailed seller information is essentially ignored by the bidders. In the second paper, the Gaussian model of the first paper is contrasted with a Gamma model that allows intrinsically non-negative common values. The Gaussian model performs slightly better than the Gamma model on the eBay data, which we attribute to an almost normal, or at least symmetrical, distribution of valuations. The third paper compares the model in the first paper to a directly comparable model for private values. We find many interesting empirical regularities between the models, but no strong and consistent evidence in favor of one model over the other. In the last paper, we consider auctions with both private-value and common-value bidders. The equilibrium bid function is given as the solution to an ordinary differential equation, from which we derive an approximate inverse bid as an explicit function of a given bid. The paper proposes an elaborate model where the probability of being a common-value bidder is a function of covariates at the auction level. The model is estimated by a Metropolis-within-Gibbs algorithm, and the results point strongly to an active influx of both private-value and common-value bidders. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 1: Epub ahead of print. Paper 2: Manuscript. Paper 3: Manuscript. Paper 4: Manuscript.
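The computational point about inverse bid functions can be illustrated with a small sketch: tabulate a monotone (approximate) bid function once and invert it by interpolation so that it can be evaluated cheaply inside the likelihood. The linear toy bid function and its parameters are assumptions; the thesis derives model-specific approximations.

```python
import numpy as np

def make_inverse_bid(bid_fn, s_grid):
    """Tabulate a strictly increasing equilibrium bid function on a grid of
    signals once, then recover signals from observed bids by monotone
    interpolation.  bid_fn is a stand-in for a model-specific approximation."""
    b_grid = bid_fn(s_grid)
    def inverse_bid(bids):
        return np.interp(bids, b_grid, s_grid)   # fast, reusable millions of times
    return inverse_bid

# Example with a toy bid function that is linear in the signal (as in the
# first paper's approximation): b(s) = a + c * s.
a, c = 10.0, 0.8
inv = make_inverse_bid(lambda s: a + c * s, np.linspace(0.0, 100.0, 1001))
print(inv(np.array([26.0, 50.0])))   # recovered signals, roughly [20., 50.]
```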
209

Some Contributions to Design Theory and Applications

Mandal, Abhyuday 13 June 2005 (has links)
The thesis focuses on the development of statistical theory in experimental design with applications in global optimization. It consists of four parts. In the first part, a criterion of design efficiency, under model uncertainty, is studied with reference to possibly nonregular fractions of general factorials. The results are followed by a numerical study, and the findings are compared with those based on other design criteria. In the second part, optimal designs are identified using Bayesian methods. This work is linked with response surface methodology, where the first step is to perform factor screening, followed by response surface exploration using different experiment plans. A Bayesian analysis approach is used that aims to achieve both goals with one experiment design. In addition, we use a Bayesian design criterion based on the priors for the analysis approach. This creates an integrated design and analysis framework. To distinguish between competing models, the HD criterion is used, which is based on the pairwise Hellinger distance between predictive densities. Mixed-level fractional factorial designs are commonly used in practice, but their aliasing relations have not been studied in full rigor. These designs take the form of a product array. Aliasing patterns of mixed-level factorial designs are discussed in the third part. In the fourth part, design-of-experiment ideas are used to introduce a new global optimization technique called SELC (Sequential Elimination of Level Combinations), which is motivated by genetic algorithms but finds the optimum faster. The two key features of the SELC algorithm, namely the forbidden array and weighted mutation, enhance the performance of the search procedure. An illustration is given with the optimization of three functions, one of which is from Shekel's family. A real example on compound optimization is also given.
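A toy sketch of the two SELC ingredients named above, the forbidden array and weighted mutation, is given below. The population size, crossover rule, and mutation weighting are assumptions made for illustration and do not reproduce the thesis's exact algorithm.

```python
import numpy as np

def selc_toy(objective, levels, n_factors, pop_size=10, n_gen=30, seed=0):
    """Toy SELC-style search: maximize `objective` over level combinations.
    The worst run of each generation enters a forbidden array and is never
    revisited, and mutation favours levels that have performed well so far."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, levels, size=(pop_size, n_factors))
    forbidden = set()
    score_sum = np.zeros((n_factors, levels))   # running score per (factor, level)
    count = np.ones((n_factors, levels))
    best = (-np.inf, None)
    for _ in range(n_gen):
        scores = np.array([objective(x) for x in pop])
        for x, s in zip(pop, scores):
            for j, lev in enumerate(x):
                score_sum[j, lev] += s
                count[j, lev] += 1
            if s > best[0]:
                best = (s, x.copy())
        forbidden.add(tuple(pop[np.argmin(scores)]))        # eliminate the worst combination
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = pop[np.argsort(scores)[-2:]]           # two best parents
            child = np.where(rng.random(n_factors) < 0.5, p1, p2)   # crossover
            j = rng.integers(n_factors)
            weights = np.exp(score_sum[j] / count[j] - (score_sum[j] / count[j]).max())
            child[j] = rng.choice(levels, p=weights / weights.sum())  # weighted mutation
            if tuple(child) not in forbidden:
                new_pop.append(child)
        pop = np.array(new_pop)
    return best

# Example: four three-level factors; the optimum is all levels equal to 2.
print(selc_toy(lambda x: -np.sum((x - 2) ** 2), levels=3, n_factors=4))
```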
