581 |
Stochastic Stepwise Ensembles for Variable Selection. Xin, Lu. 30 April 2009 (has links)
Ensemble methods such as AdaBoost, Bagging, and Random Forests have attracted much attention in the statistical learning community over the last 15 years. Zhu and Chipman (2006) proposed the idea of using ensembles for variable selection; their implementation used a parallel genetic algorithm (PGA). In this thesis, I propose a stochastic stepwise ensemble for variable selection, which improves upon PGA.
Traditional stepwise regression (Efroymson 1960) combines forward and backward selection: one step of forward selection is followed by one step of backward selection. In the forward step, each variable not already included is added to the current model, one at a time, and the one that most improves the objective function is retained. In the backward step, each variable already included is deleted from the current model, one at a time, and the one whose removal most improves the objective function is discarded. The algorithm continues until no improvement can be made by either the forward or the backward step.
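Since this procedure is the building block for what follows, a minimal sketch may help; the choice of BIC as the objective and the use of statsmodels are illustrative assumptions, not details from the thesis.

```python
import numpy as np
import statsmodels.api as sm

def bic(X, y, cols):
    """BIC of an OLS fit with intercept on the chosen feature columns
    (lower is better). BIC is an assumed stand-in for the objective."""
    design = np.ones((len(y), 1))
    if cols:
        design = sm.add_constant(X[:, sorted(cols)], has_constant="add")
    return sm.OLS(y, design).fit().bic

def stepwise(X, y):
    """Efroymson-style stepwise selection: alternate single-variable
    forward and backward steps until neither improves the objective."""
    selected = set()
    best = bic(X, y, selected)
    improved = True
    while improved:
        improved = False
        # Forward step: try adding each excluded variable; keep the best.
        add = None
        for j in set(range(X.shape[1])) - selected:
            score = bic(X, y, selected | {j})
            if score < best:
                best, add = score, j
        if add is not None:
            selected.add(add)
            improved = True
        # Backward step: try deleting each included variable; keep the best.
        drop = None
        for j in selected:
            score = bic(X, y, selected - {j})
            if score < best:
                best, drop = score, j
        if drop is not None:
            selected.discard(drop)
            improved = True
    return sorted(selected)
```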
Instead of adding or deleting one variable at a time, the Stochastic Stepwise Algorithm (STST) adds or deletes a group of variables at a time, where the group size is randomly decided. In traditional stepwise regression, the group size is one and every candidate variable is assessed. When the group size is larger than one, as is often the case for STST, the total number of possible variable groups can be quite large; instead of evaluating all of them, only a few randomly selected groups are assessed and the best one is chosen.
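The abstract does not specify how the group size or the number of candidate groups is drawn, so both are placeholders in this sketch of one stochastic forward step (the bic helper above is reused):

```python
import random

def stst_forward(X, y, selected, best, rng, n_groups=20):
    """One stochastic forward step of STST: draw a random group size,
    score a few randomly sampled groups of that size, and add the best
    group if it improves the objective."""
    excluded = list(set(range(X.shape[1])) - selected)
    if not excluded:
        return selected, best
    size = rng.randint(1, len(excluded))   # placeholder size distribution
    best_group = None
    for _ in range(n_groups):              # assess only a few random groups
        group = set(rng.sample(excluded, size))
        score = bic(X, y, selected | group)
        if score < best:
            best, best_group = score, group
    if best_group is not None:
        selected = selected | best_group
    return selected, best

# The backward step mirrors this with groups drawn from `selected`; an
# ensemble reruns the whole algorithm with different seeds, e.g.
# rng = random.Random(seed), and aggregates selection frequencies.
```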
From a methodological point of view, the improvement of the STST ensemble over PGA comes from a more structured way of constructing the ensemble, which gives us better control over the strength-diversity tradeoff established by Breiman (2001); PGA has no mechanism to control this fundamental tradeoff. Empirically, the improvement is most prominent when a true variable in the model has a relatively small coefficient (relative to the other true variables); I show empirically that PGA has a much higher probability of missing such a variable.
|
582 |
Framework for Calibration of a Traffic State Space Model. Sandin, Mats; Fransson, Magnus. January 2012 (has links)
To evaluate the traffic state over time and space, several models can be used. A typical model for estimating the state of the traffic on a stretch of road or a road network is the cell transmission model, a form of state space model. This kind of model typically needs to be calibrated, since different roads have different properties. This thesis presents a calibration framework for the velocity-based cell transmission model, the CTM-v.

The cell transmission model for velocity is a discrete-time dynamical system that models the evolution of the velocity field on highways. Such a model can be fused with an ensemble Kalman filter update algorithm for the purpose of velocity data assimilation; indeed, enabling velocity data assimilation was the purpose for developing the model in the first place, and it is an essential part of the Mobile Millennium research project. The output produced by this system is highly dependent on the values of its characterising parameters, which must be calibrated so as to make the model a valid representation of reality. Model calibration and validation is a process of its own, most often tailored to the researcher's models and purposes, so a systematic methodology for calibrating the cell transmission model is needed.

The framework consists of two separate methods. One is a statistical approach to calibration of the fundamental diagram. The other is a black-box optimization method, a simplification of the complex method, which can solve inequality-constrained optimization problems with non-differentiable objective functions. Both methods are integrated with the existing system, yielding a calibration framework for highways where stationary detectors are part of the infrastructure. The combination of the two methods is tested in a suite of experiments on two separate highway models, of Interstates 880 and 15, CA, which are evaluated against travel time and space mean speed estimates given by Bluetooth detectors, with errors between 7.4 and 13.4% for the validation time periods, depending on the parameter set and model.
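The abstract names a simplification of the complex method without detailing it; to fix ideas, here is a generic Box-style sketch handling only simple bound constraints, not the thesis's variant:

```python
import numpy as np

def complex_method(f, lo, hi, alpha=1.3, iters=200, seed=0):
    """Box-style complex method for box-constrained minimization of a
    possibly non-differentiable objective: repeatedly reflect the worst
    point of a random 'complex' of points through the centroid of the rest."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    n_pts = 2 * lo.size                                  # common size heuristic
    pts = lo + rng.random((n_pts, lo.size)) * (hi - lo)  # random initial complex
    vals = np.array([f(p) for p in pts])
    for _ in range(iters):
        w = int(np.argmax(vals))                         # worst point
        centroid = (pts.sum(axis=0) - pts[w]) / (n_pts - 1)
        trial = np.clip(centroid + alpha * (centroid - pts[w]), lo, hi)
        while f(trial) >= vals[w] and not np.allclose(trial, centroid):
            trial = (trial + centroid) / 2               # retreat halfway
        pts[w], vals[w] = trial, f(trial)
    return pts[int(np.argmin(vals))]
```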
|
583 |
Cooperative Training in Multiple Classifier Systems. Dara, Rozita Alaleh. January 2007 (has links)
Multiple classifier systems have been shown to be an effective technique for classification. The success of multiple classifiers does not depend entirely on the base classifiers and/or the aggregation technique. Other parameters, such as training data, feature attributes, and correlation among the base classifiers, may also contribute to their success, and the interaction of these parameters with each other may have an impact on multiple classifier performance. In the present study, we examine some of these interactions and investigate their effects on the performance of classifier ensembles.

The proposed research introduces a different direction in the field of multiple classifier systems: we attempt to understand and compare ensemble methods from the cooperation perspective. In this thesis, we narrow our focus to cooperation at the training level. We first develop measures to estimate the degree and type of cooperation among training data partitions; these evaluation measures enable us to evaluate the diversity and correlation among a set of disjoint and overlapped partitions. With the aid of properly selected measures and training information, we propose two new data partitioning approaches: Cluster, De-cluster, and Selection (CDS) and Cooperative Cluster, De-cluster, and Selection (CO-CDS). Finally, a comprehensive comparative study is conducted in which we compare the proposed training approaches with several others in terms of robustness of usage, resultant classification accuracy, and classification stability.

Experimental assessment of the CDS and CO-CDS training approaches validates their robustness compared to other training approaches. In addition, this study suggests that: 1) cooperation is generally beneficial, and 2) classifier ensembles that cooperate through sharing information have higher generalization ability than those that do not share training information.
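The measures themselves are not defined in the abstract; as a purely illustrative stand-in, the sketch below scores a set of training partitions by their average pairwise Jaccard overlap, where low overlap indicates diversity and high overlap indicates shared (cooperative) information.

```python
from itertools import combinations

def mean_pairwise_jaccard(partitions):
    """Average Jaccard overlap between all pairs of index partitions:
    0 means fully disjoint partitions, 1 means identical partitions."""
    sets = [set(p) for p in partitions]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Example: three overlapped training partitions of a 10-sample dataset.
print(mean_pairwise_jaccard([range(0, 6), range(3, 9), range(5, 10)]))
```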
|
585 |
Musik für Holzinstrumente. Drude, Matthias. 19 November 2012 (has links) (PDF)
Score of a chamber music work by Matthias Drude. The work was composed in 2010 for oboe, clarinet, bassoon, marimba, and string quintet.
|
586 |
Develop Microchip with Gold Nanoelectrode Ensemble Electrodes for Electrochemical Detection of Verapamil. Chuang, Jui-Fen. 11 August 2011 (has links)
Verapamil is a commonly used medicine for the treatment of supraventricular arrhythmias, angina, and hypertension. Recently, some newly developed applications of Verapamil, such as treating hypomania and use in cancer chemotherapy, have been reported. Thus, accurately monitoring the concentration of Verapamil is very important. The major clinical methods for determining Verapamil concentration are high performance liquid chromatography (HPLC) with UV or fluorescence detection. However, these methods have disadvantages such as expensive instrumentation, complex operation, and time-consuming procedures.
The chemical structure and properties of Verapamil are very stable, and preliminary electrochemical analysis doesn't show any electrochemical activity. In this study, we developed an innovative ozone pre-treatment method to oxidize Verapamil into smaller molecules and change its structure; after ozone pre-treatment, Verapamil shows excellent electrochemical activity. Spectroscopy and mass spectrometry confirm the changes in Verapamil's structure, and the products of the ozone treatment are also predicted by mass spectrometry.
Gold nanoelectrode ensemble electrodes (GNEE) are used as the working electrode for their good catalytic activity in electrochemical reactions, high sensitivity, and high selectivity. The overall experimental framework of this study is a microchip with a GNEE working electrode, operated with cyclic voltammetry as the electrochemical analytical method. Compared with traditional analytical methods, the system has advantages such as small size, micro sample volume, easy operation, rapid detection, and low cost.
The lowest Verapamil concentration the system can detect stably is 10 ng/mL, and a linear dynamic range with a high correlation factor from 10 ng/mL to 100 μg/mL was obtained. For serum samples, Verapamil shows excellent electrochemical activity at 1 ng/mL, and a linear dynamic range with a high correlation factor from 1 ng/mL to 100 μg/mL was obtained. These results indicate that the system is feasible for practical clinical analysis of Verapamil concentration.
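For illustration only, with hypothetical peak-current data, a linear dynamic range spanning several orders of magnitude is typically characterized by regressing the response on log-transformed concentration:

```python
import numpy as np

# Hypothetical calibration data: concentrations in ng/mL and measured
# peak currents (arbitrary units); all values are illustrative only.
conc = np.array([10, 100, 1e3, 1e4, 1e5])        # 10 ng/mL .. 100 ug/mL
current = np.array([0.8, 1.9, 3.1, 4.0, 5.2])

slope, intercept = np.polyfit(np.log10(conc), current, 1)
r = np.corrcoef(np.log10(conc), current)[0, 1]   # correlation factor
print(f"i = {slope:.2f} log10(c) + {intercept:.2f}, r = {r:.3f}")
```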
|
587 |
Ensemble Statistics and Error Covariance of a Rapidly Intensifying Hurricane. Rigney, Matthew C. 16 January 2010 (has links)
This thesis presents an investigation of ensemble Gaussianity, the effect of non-Gaussianity on covariance structures, storm-centered data assimilation techniques, and the relationship between commonly used data assimilation variables and the underlying dynamics for the case of Hurricane Humberto. Using an Ensemble Kalman Filter (EnKF), a comparison of data assimilation results in storm-centered and Eulerian coordinate systems is made. In addition, the extent of the non-Gaussianity of the model ensemble is investigated and quantified, and the effect of this non-Gaussianity on covariance structures, which play an integral role in the EnKF data assimilation scheme, is explored. Finally, the correlation structures calculated from a Weather Research and Forecasting (WRF) ensemble forecast of several state variables are investigated in order to better understand the dynamics of this rapidly intensifying cyclone.
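For reference, a textbook stochastic EnKF analysis step is sketched below; localization, inflation, and the WRF state vector used in the thesis are omitted, and the linear observation operator is an assumption.

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step. X: forecast ensemble, shape
    (n_state, n_ens); y: observation vector; H: linear observation
    operator; R: observation-error covariance."""
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                        # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(        # perturbed observations
        np.zeros(len(y)), R, size=n_ens).T
    return X + K @ (Y - H @ X)                       # analysis ensemble
```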
Hurricane Humberto rapidly intensified in the northwestern Gulf of Mexico from a tropical disturbance to a strong category one hurricane with 90 mph winds in 24 hours. Numerical models did not capture the intensification of Humberto well, likely due in large part to initial condition error, which data assimilation schemes can address. Because the EnKF is a linear theory developed under the assumption that the ensemble distribution is normal, non-Gaussianity in the ensemble distribution could affect the EnKF update. It is shown, through an inspection of statistical moments, that multiple state variables do indeed exhibit significant non-Gaussianity.
In addition, storm-centered data assimilation schemes present an alternative to traditional Eulerian schemes by emphasizing the centrality of the cyclone to the assimilation window. This allows for an update that is most effective in the vicinity of the storm center, which is of most concern in mesoscale events such as Humberto.
Finally, the effect of non-Gaussian distributions on covariance structures is examined through transformations of normal distributions. Various standard transformations of two Gaussian distributions are made, and the skewness, kurtosis, and correlation between the two distributions are measured before and after each transformation. A relationship is observed between changes in skewness and kurtosis and the correlation between the distributions. These effects are then taken into consideration as the dynamics contributing to the rapid intensification of Humberto are explored through correlation structures.
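As an illustrative recreation (the thesis's exact transformations are not listed in the abstract), one can draw a correlated Gaussian pair, transform one margin, and compare moments:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=5000)
x, y = z[:, 0], z[:, 1]

# Apply standard nonlinear transformations to one variable and compare
# skewness, kurtosis, and correlation before and after.
for name, t in [("identity", y), ("exp", np.exp(y)), ("square", y**2)]:
    print(f"{name:8s} skew={skew(t):+.2f} kurt={kurtosis(t):+.2f} "
          f"corr={np.corrcoef(x, t)[0, 1]:+.2f}")
```

Note how the squaring transformation, for instance, nearly destroys the linear correlation of a mean-zero Gaussian pair while sharply increasing skewness and kurtosis.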
|
588 |
Upscaling methods for multi-phase flow and transport in heterogeneous porous media. Li, Yan. December 2009 (has links)
In this dissertation we discuss upscaling methods for flow and transport in heterogeneous reservoirs. We study realization-based multi-phase flow and transport upscaling and ensemble-level flow upscaling. Multi-phase upscaling is more accurate than single-phase upscaling and is often required for high levels of coarsening. In multi-phase upscaling, the upscaled transport parameters are time-dependent functions and are challenging to compute: due to the hyperbolic nature of the saturation equation, the nonlocal effects evolve in both space and time, and standard local two-phase upscaling gives significantly biased results relative to fine-scale solutions. In this work, we propose two types of multi-phase upscaling methods: time-of-flight (TOF)-based two-phase upscaling and local-global two-phase upscaling. Both incorporate global flow information into local two-phase upscaling calculations, using, respectively, a linear function of time and time-of-flight and a (time-dependent) global coarse-scale two-phase solution. The local boundary condition therefore captures the global flow effects both spatially and temporally. The two methods are applied to permeability distributions with various correlation lengths; numerical results show that they consistently improve on existing two-phase upscaling methods and provide accurate coarse-scale solutions for both flow and transport.
We also study ensemble-level flow upscaling, i.e., upscaling over multiple geological realizations, which is often required for uncertainty quantification. Solving the flow problem for every realization is time-consuming; in recent years, stochastic procedures have been combined with upscaling methods to efficiently compute the upscaled coefficients for a large set of realizations. We propose a fast perturbation approach to ensemble-level upscaling: based on the Karhunen-Loève expansion (KLE), we introduce a correction scheme that rapidly computes the upscaled permeability for each realization, and we couple sparse grid collocation and adaptive clustering with this correction scheme. When solving the local problem, the solution can be represented as a product of a Green's function and a source term. Using the collocation and clustering techniques, one can avoid computing the Green's function for every realization: we compute it at the interpolation nodes only, and for any other realization the Green's function is obtained by interpolation. These techniques allow us to compute the upscaled permeability rapidly for all realizations in the stochastic space.
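For readers unfamiliar with the KLE, the standard truncated expansion reads as follows; the notation is generic and assumed rather than quoted from the dissertation.

```latex
% Truncated Karhunen-Loève expansion of the log-permeability field, with
% (\lambda_k, \phi_k) the eigenpairs of the covariance kernel C on domain D
% and \theta_k i.i.d. standard normal random variables; sparse grid
% collocation then places its interpolation nodes in the \theta-space.
\log K(\mathbf{x},\omega)
  = \overline{Y}(\mathbf{x})
  + \sum_{k=1}^{N} \sqrt{\lambda_k}\,\theta_k(\omega)\,\phi_k(\mathbf{x}),
\qquad
\int_D C(\mathbf{x},\mathbf{y})\,\phi_k(\mathbf{y})\,d\mathbf{y}
  = \lambda_k\,\phi_k(\mathbf{x}).
```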
|
589 |
The Bootstrap in Supervised Learning and its Applications in Genomics/Proteomics. Vu, Thang. May 2011 (has links)
The small-sample size issue is a prevalent problem in Genomics and Proteomics today. The bootstrap, a resampling method that aims at increasing the efficiency of data usage, is one effort to overcome the problem of limited sample size. This dissertation studies the application of the bootstrap to two problems of supervised learning with small-sample data: estimation of the misclassification error of Gaussian discriminant analysis, and the bagging ensemble classification method.
Estimating the misclassification error of discriminant analysis is a classical problem in pattern recognition with many important applications in biomedical research, and bootstrap error estimation has been shown empirically to be among the best estimation methods in terms of root mean squared (RMS) error. In the first part of this work, we conduct a detailed analytical study of bootstrap error estimation for the Linear Discriminant Analysis (LDA) classification rule under Gaussian populations. We derive exact formulas for the first and second moments of the zero bootstrap and convex bootstrap estimators, as well as their cross moments with the resubstitution estimator and the true error. Based on these results, we obtain exact formulas for the bias, the variance, and the RMS of the deviation from the true error of these bootstrap estimators, including the moments of the popular .632 bootstrap estimator. Moreover, we obtain the optimal weights for unbiased and minimum-RMS convex bootstrap estimators. In the univariate case, all the expressions involve Gaussian distributions, whereas in the multivariate case, the results are written in terms of bivariate doubly non-central F distributions.
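As background, here is a sketch of the estimators under study, using Efron's standard .632 weights and LDA via scikit-learn; the optimal convex weights derived in the dissertation itself are not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bootstrap_632(X, y, B=100, seed=0):
    """Resubstitution, zero-bootstrap, and .632 error estimates for LDA.
    The zero bootstrap averages the error on left-out points over B
    bootstrap resamples; .632 is the fixed-weight convex combination."""
    rng = np.random.default_rng(seed)
    n = len(y)
    clf = LinearDiscriminantAnalysis().fit(X, y)
    resub = np.mean(clf.predict(X) != y)
    errs = []
    for _ in range(B):
        idx = rng.integers(0, n, n)                 # bootstrap sample
        out = np.setdiff1d(np.arange(n), idx)       # points left out
        if len(out) == 0 or len(np.unique(y[idx])) < 2:
            continue                                # skip degenerate resamples
        model = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
        errs.append(np.mean(model.predict(X[out]) != y[out]))
    bs_zero = np.mean(errs)
    return resub, bs_zero, 0.368 * resub + 0.632 * bs_zero
```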
In the second part of this work, we conduct an extensive empirical investigation of bagging, an application of the bootstrap to ensemble classification. We investigate the performance of bagging in the classification of small-sample gene-expression data and protein-abundance mass spectrometry data, as well as the accuracy of small-sample error estimation with this ensemble classification rule. We observe that, under t-test and RELIEF filter-based feature selection, bagging generally does a good job of improving the performance of unstable, overfitting classifiers, such as CART decision trees and neural networks, but that the improvement is not sufficient to beat the performance of single stable, non-overfitting classifiers, such as diagonal and plain linear discriminant analysis, or 3-nearest neighbors (3NN); moreover, the ensemble method does not significantly improve the performance of these stable classifiers. We give an explicit definition of the out-of-bag estimator that is intended to remove estimator bias, by formulating carefully how the error count is normalized, and investigate the performance of error estimation for bagging of common classification rules, including LDA, 3NN, and CART, applied to both synthetic and real patient data, using common error estimators: resubstitution, leave-one-out, cross-validation, basic bootstrap, bootstrap 632, bootstrap 632 plus, bolstering, and semi-bolstering, in addition to the out-of-bag estimator. The numerical experiments indicate that the performance of the out-of-bag estimator is very similar to that of leave-one-out; in particular, the out-of-bag estimator is slightly pessimistically biased. The performance of the other estimators is consistent with their performance with the corresponding single classifiers, as reported in other studies. The results of this work are expected to provide helpful guidance to practitioners interested in applying the bootstrap in supervised learning applications.
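A sketch of the normalization point follows: the out-of-bag error below is computed only over samples that received at least one out-of-bag vote, with bagged CART trees via scikit-learn as an illustrative base classifier (the dissertation's exact formulation is not reproduced).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oob_error(X, y, B=100, seed=0):
    """Out-of-bag error for bagged trees, normalized only over samples
    that received at least one out-of-bag vote. Assumes integer class
    labels 0..K-1."""
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = np.zeros((n, len(np.unique(y))))
    for _ in range(B):
        idx = rng.integers(0, n, n)                  # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)        # out-of-bag samples
        tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        for i, c in zip(oob, tree.predict(X[oob])):
            votes[i, int(c)] += 1
    covered = votes.sum(axis=1) > 0                  # samples with OOB votes
    pred = votes[covered].argmax(axis=1)             # OOB majority vote
    return np.mean(pred != y[covered])
```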
|
590 |
An Ensemble Approach for Text Categorization with Positive and Unlabeled Examples. Chen, Hsueh-Ching. 29 July 2005 (has links)
Text categorization is the process of assigning new documents to predefined categories on the basis of a classification model induced from a set of pre-categorized training documents. In a typical dichotomous classification scenario, the set of training documents includes both positive and negative examples; that is, each of the two categories is associated with training documents. However, in many real-world text categorization applications, positive and unlabeled documents are readily available, whereas the acquisition of negative documents is extremely expensive or even impossible. In this study, we propose and develop an ensemble approach, referred to as E2, to address the limitations of existing algorithms for learning from positive and unlabeled training documents. Using spam email filtering as the evaluation application, our empirical results suggest that the proposed E2 technique exhibits more stable and reliable performance than PNB and PEBL.
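The abstract does not describe E2's internals, so the sketch below is a generic bagging-style baseline for positive-unlabeled learning in the same spirit, not the proposed method: each round treats a random subsample of the unlabeled pool as provisional negatives, and each unlabeled document is scored only by classifiers that did not train on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_bagging_scores(X_pos, X_unl, B=50, seed=0):
    """Generic PU-learning baseline (not the thesis's E2): average the
    positive-class scores each unlabeled point receives from rounds in
    which it was held out of the provisional-negative sample."""
    rng = np.random.default_rng(seed)
    n_u = len(X_unl)
    score_sum = np.zeros(n_u)
    score_cnt = np.zeros(n_u)
    for _ in range(B):
        idx = rng.choice(n_u, size=min(len(X_pos), n_u), replace=False)
        X = np.vstack([X_pos, X_unl[idx]])           # positives vs. pseudo-negatives
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(idx))]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        out = np.setdiff1d(np.arange(n_u), idx)      # held-out unlabeled points
        score_sum[out] += clf.predict_proba(X_unl[out])[:, 1]
        score_cnt[out] += 1
    return score_sum / np.maximum(score_cnt, 1)      # mean positive-class score
```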
|