41

Machine Learning-based Analysis of the Relationship Between the Human Gut Microbiome and Bone Health

January 2020
The Human Gut Microbiome (GM) modulates a variety of structural, metabolic, and protective functions that benefit the host. A few recent studies also support a role for the gut microbiome in the regulation of bone health. The relationship between the GM and bone health was analyzed using data collected from a group of twenty-three adolescent boys and girls who participated in a controlled feeding study, during which two different doses (0 g/d and 12 g/d) of Soluble Corn Fiber (SCF) were added to their diet. The analysis predicted measures of Bone Mineral Density (BMD) and Bone Mineral Content (BMC), both indicators of bone strength, from the sequenced proportions of 178 microbes in the 23 subjects, using a machine learning regression model. The model was evaluated with cross-validation, using Root Mean Squared Error, Pearson's correlation coefficient, and Spearman's rank correlation coefficient as performance metrics. A noticeable correlation was observed between the GM and bone health, and the overall prediction correlation was higher with the SCF intervention (r ~ 0.51). The genera of microbes that played an important role in this relationship were identified: Eubacterium (g), Bacteroides (g), Megamonas (g), Acetivibrio (g), Faecalibacterium (g), and Paraprevotella (g) were among the microbes that showed an increase in proportion with the SCF intervention. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2020
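The evaluation scheme described above, cross-validated regression scored by RMSE plus Pearson and Spearman correlations between held-out predictions and true values, can be sketched as follows. The data are synthetic and the plain least-squares model is only a stand-in for the thesis's regression model:

```python
import numpy as np

def kfold_regression_metrics(X, y, k=5, seed=0):
    """Cross-validated linear regression: return RMSE, Pearson and
    Spearman correlations between out-of-fold predictions and y."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    preds = np.empty_like(y, dtype=float)
    for f in folds:
        train = np.setdiff1d(idx, f)
        # least-squares fit with an intercept column on the training fold
        A = np.c_[X[train], np.ones(len(train))]
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds[f] = np.c_[X[f], np.ones(len(f))] @ coef
    rmse = np.sqrt(np.mean((preds - y) ** 2))
    pearson = np.corrcoef(preds, y)[0, 1]
    ranks = lambda v: np.argsort(np.argsort(v))  # Spearman = Pearson on ranks
    spearman = np.corrcoef(ranks(preds), ranks(y))[0, 1]
    return rmse, pearson, spearman

# synthetic stand-in: 23 subjects, 10 microbial proportions (compositional)
rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(10), size=23)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.standard_normal(23)
rmse, r_p, r_s = kfold_regression_metrics(X, y)
```

The fold assignment, model, and data are illustrative; only the metric definitions match the abstract.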
42

Comparative Data Analytic Approach for Detection of Diabetes

Sood, Radhika January 2018
No description available.
43

Using Transcriptomic Data to Predict Biomarkers for Subtyping of Lung Cancer

Daran, Rukesh January 2021
Lung cancer is one of the most dangerous of all cancers. Several studies have explored the use of machine learning methods to predict and diagnose this cancer. This study explored the potential of decision tree (DT) and random forest (RF) classification models, in the context of a small transcriptome dataset, for outcome prediction of different subtypes of lung cancer. In the study we compared three subtypes, adenocarcinoma (AC), small cell lung cancer (SCLC), and squamous cell carcinoma (SCC), with normal lung tissue by applying the two machine learning methods from the caret R package. The DT and RF models and their validation showed different results for each subtype of the lung cancer data. The DT found more features and validated them with better metrics. Analysis of biological relevance focused on the features identified for each of the subtypes AC, SCLC, and SCC. The DT presented a detailed insight into the biological data, which was essential for classifying features as candidate biomarkers. The features identified in this research may serve as potential candidate genes which could be explored further to confirm their role in the corresponding lung cancer types and contribute to targeted diagnostics of different subtypes.
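A minimal sketch of the DT-versus-RF comparison with cross-validation follows. The thesis used the caret package in R; this scikit-learn version is only an analogous Python illustration on synthetic data standing in for a small transcriptome matrix:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in for a small transcriptome dataset:
# 60 samples x 200 "genes", two classes (e.g. tumour vs normal)
X, y = make_classification(n_samples=60, n_features=200,
                           n_informative=10, random_state=0)

dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated accuracy for each model
dt_acc = cross_val_score(dt, X, y, cv=5).mean()
rf_acc = cross_val_score(rf, X, y, cv=5).mean()

# feature importances highlight candidate marker genes
rf.fit(X, y)
top_genes = rf.feature_importances_.argsort()[::-1][:10]
```

Which model wins depends on the data; on real transcriptome sets the thesis reports that the DT surfaced more validated features.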
44

Multi-objective ROC learning for classification

Clark, Andrew Robert James January 2011
Receiver operating characteristic (ROC) curves are widely used for evaluating classifier performance, having been applied to e.g. signal detection, medical diagnostics and safety critical systems. They allow examination of the trade-offs between true and false positive rates as misclassification costs are varied. Examination of the resulting graphs and calculation of the area under the ROC curve (AUC) allows assessment of how well a classifier is able to separate two classes and allows selection of an operating point with full knowledge of the available trade-offs.

In this thesis a multi-objective evolutionary algorithm (MOEA) is used to find classifiers whose ROC graph locations are Pareto optimal. The Relevance Vector Machine (RVM) is a state-of-the-art classifier that produces sparse Bayesian models, but is unfortunately prone to overfitting. Using the MOEA, hyper-parameters for RVM classifiers are set, optimising them not only in terms of true and false positive rates but also a novel measure of RVM complexity, thus encouraging sparseness, and producing approximations to the Pareto front. Several methods for regularising the RVM during the MOEA training process are examined and their performance evaluated on a number of benchmark datasets, demonstrating they possess the capability to avoid overfitting whilst producing performance equivalent to that of the maximum likelihood trained RVM.

A common task in bioinformatics is to identify genes associated with various genetic conditions by finding those genes useful for classifying a condition against a baseline. Typically, datasets contain large numbers of gene expressions measured in relatively few subjects. As a result of the high dimensionality and sparsity of examples, it can be very easy to find classifiers with near perfect training accuracies but which have poor generalisation capability. Additionally, depending on the condition and treatment involved, evaluation over a range of costs will often be desirable. An MOEA is used to identify genes for classification by simultaneously maximising the area under the ROC curve whilst minimising model complexity. This method is illustrated on a number of well-studied datasets and applied to a recent bioinformatics database resulting from the current InChianti population study.

Many classifiers produce “hard”, non-probabilistic classifications and are trained to find a single set of parameters, whose values are inevitably uncertain due to limited available training data. In a Bayesian framework it is possible to ameliorate the effects of this parameter uncertainty by averaging over classifiers weighted by their posterior probability. Unfortunately, the required posterior probability is not readily computed for hard classifiers. In this thesis an Approximate Bayesian Computation Markov Chain Monte Carlo algorithm is used to sample model parameters for a hard classifier using the AUC as a measure of performance. The ability to produce ROC curves close to the Bayes optimal ROC curve is demonstrated on a synthetic dataset. Due to the large numbers of sampled parametrisations, averaging over them when rapid classification is needed may be impractical and thus methods for producing sparse weightings are investigated.
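The ROC construction and AUC computation that this thesis builds on can be written compactly. A minimal numpy sketch (assuming untied scores and binary 0/1 labels; any general-purpose implementation must also handle ties):

```python
import numpy as np

def roc_curve_points(scores, labels):
    """True/false positive rates swept over score thresholds,
    from the strictest threshold to the most permissive."""
    order = np.argsort(-scores)          # descending by score
    labels = labels[order]
    tpr = np.cumsum(labels) / labels.sum()
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()
    return fpr, tpr

def auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule,
    with the (0, 0) origin prepended."""
    fpr, tpr = roc_curve_points(scores, labels)
    fpr = np.r_[0.0, fpr]
    tpr = np.r_[0.0, tpr]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

scores = np.array([0.9, 0.8, 0.7, 0.6])
labels = np.array([1, 0, 1, 0])
a = auc(scores, labels)  # 0.75: three of four positive-negative pairs correctly ordered
```

The AUC equals the probability that a random positive outranks a random negative, which is why the thesis can use it both as a selection objective and as an ABC-MCMC performance measure.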
45

Model calibration methods for mechanical systems with local nonlinearities

Chen, Yousheng January 2016
Most modern product development utilizes computational models. With increasing demands on reducing product development lead-time, it becomes more important to improve the accuracy and efficiency of simulations. In addition, to improve product performance, many products are designed to be lighter and more flexible, and are thus more prone to nonlinear behaviour. Linear finite element (FE) models, which still form the basis of numerical models used to represent mechanical structures, may not be able to predict structural behaviour with the necessary accuracy when nonlinear effects are significant. Nonlinearities are often localized to joints or boundary conditions. Including nonlinear behaviour in FE-models introduces more sources of uncertainty, and it is often necessary to calibrate the models with the use of experimental data. This research work presents a model calibration method that is suitable for mechanical systems with structural nonlinearities. The methodology covers pre-test planning, parameterization, simulation methods, vibrational testing and optimization. The selection of parameters for the calibration requires physical insight together with analyses of the structure; the latter can be achieved by use of simulations. Traditional simulation methods may be computationally expensive when dealing with nonlinear systems; therefore an efficient fixed-step state-space based simulation method was developed. To gain knowledge of the accuracy of different simulation methods, the bias errors of the proposed method and of other widespread simulation methods were studied and compared; the proposed method performs well in comparison. To obtain precise estimates of the parameters, the test data should be informative of the parameters chosen and the parameters should be identifiable. Test data informativeness and parameter identifiability are coupled, and both can be assessed by the Fisher information matrix (FIM).
To optimize the informativeness of test data, a FIM based pre-test planning method was developed and a multi-sinusoidal excitation was designed. The steady-state responses at the side harmonics were shown to contain valuable information for model calibration of FE-models representing mechanical systems with structural nonlinearities. In this work, model calibration was made by minimizing the difference between predicted and measured multi-harmonic frequency response functions using an efficient optimization routine. The steady-state responses were calculated using the extended multi-harmonic balance method. When the parameters were calibrated, a k-fold cross validation was used to obtain parameter uncertainty. The proposed model calibration method was validated using two test-rigs, one with a geometrical nonlinearity and one with a clearance type of nonlinearity. To attain high quality data efficiently, the amplitude of the forcing harmonics was controlled at each frequency step by an off-line force feedback algorithm. The applied force was then measured and used in the numerical simulations of the responses. It was shown in the validation results that the predictions from the calibrated models agree well with the experimental results. In summary, the presented methodology concerns both theoretical and experimental aspects as it includes methods for pre-test planning, simulations, testing, calibration and validation. As such, this research work offers a complete framework and contributes to more effective and efficient analyses on mechanical systems with structural nonlinearities.
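The FIM-based identifiability check at the heart of the pre-test planning can be illustrated on a toy single-degree-of-freedom frequency response. The model, parameter values, and noise level below are invented for illustration and are not the thesis's test rig:

```python
import numpy as np

def fisher_information(model, theta, omega, sigma=1e-3, h=1e-6):
    """Gauss-Newton approximation of the FIM for a deterministic model
    with additive Gaussian noise: FIM = J^T J / sigma^2, where J holds
    finite-difference sensitivities of the response to each parameter."""
    y0 = model(theta, omega)
    J = np.empty((len(omega), len(theta)))
    for i in range(len(theta)):
        tp = theta.copy()
        tp[i] += h
        J[:, i] = (model(tp, omega) - y0) / h
    return J.T @ J / sigma**2

def frf_mag(theta, omega):
    """Magnitude of a unit-mass 1-DOF FRF with stiffness k and damping c."""
    k, c = theta
    return 1.0 / np.sqrt((k - omega**2) ** 2 + (c * omega) ** 2)

omega = np.linspace(0.5, 2.0, 50)          # excitation frequencies of the "test"
fim = fisher_information(frf_mag, np.array([1.0, 0.1]), omega)
# a well-conditioned (positive definite) FIM means both parameters
# are identifiable from this test design
eigvals = np.linalg.eigvalsh(fim)
```

Pre-test planning then amounts to choosing the excitation (frequencies, amplitudes, harmonics) that maximizes a scalar function of this matrix, e.g. its determinant.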
46

Neighborhood-Oriented Feature Selection and Classification of Duke's Stages of Colorectal Cancer Using High-Density Genomic Data

Peng, Liang January 1900
Master of Science / Department of Statistics / Haiyan Wang / The selection of relevant genes for classification of disease phenotypes from gene expression data has been extensively studied. Previously, most relevant-gene selection was conducted on individual genes with limited sample sizes. Modern technology makes it possible to obtain microarray data with higher resolution of the chromosomes. Considering gene sets over an entire block of a chromosome, rather than individual genes, could help reveal important connections between relevant genes and the disease phenotypes. In this report, we consider feature selection and classification that take into account the spatial location of probe sets when classifying Duke's stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method for feature selection is presented. A chromosome was first partitioned into blocks after the probe sets were aligned along their chromosome locations. A test of interaction between Duke's stage and probe sets was then conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple-comparison procedure was carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the selected final probe sets was conducted for all samples. The Leave-One-Out Cross-Validation (LOOCV) estimate of accuracy is reported as an evaluation of the selected features. We applied the method to two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure along with SVM or KNN for both data sets, even though classification of prognosis stages (Duke's stages B and C) is much more difficult than classifying normal versus tumor types.
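The LOOCV evaluation with a k-NN classifier used above can be sketched in a few lines of numpy; the synthetic two-class data below are illustrative, not the colorectal cancer data:

```python
import numpy as np

def loocv_knn_accuracy(X, y, k=3):
    """Leave-one-out cross-validation of a k-NN classifier: each sample
    is classified by majority vote of its k nearest neighbours among
    the remaining samples, and the fraction correct is returned."""
    n = len(y)
    correct = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the held-out sample itself
        nn = np.argsort(d)[:k]             # indices of the k nearest neighbours
        votes = np.bincount(y[nn])
        correct += votes.argmax() == y[i]
    return correct / n

# two well-separated synthetic classes, 20 samples each, 5 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])
y = np.repeat([0, 1], 20)
acc = loocv_knn_accuracy(X, y, k=3)
```

With tens of thousands of features, the feature-selection step must be repeated inside each LOOCV fold to avoid selection bias, which is the reason the report treats selection and evaluation together.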
47

Hypsometric Relations of Clonal Eucalyptus in Southern Tocantins

Schmitt, Thaís 14 September 2017
This work was structured in two chapters, using 11 rectangular permanent plots of 348 m² each from a clonal plantation of Eucalyptus camaldulensis and Eucalyptus urophylla in the southern region of the state of Tocantins. The first chapter sought the best way of fitting hypsometric models, analyzing the accuracy of the best model and applying it in a different forest situation. The data were divided into a fitting set and an application set, with three diameter classes and three dominant-height classes. The adjusted coefficient of determination in percent (R²aj), the standard error of the estimate in percent (Syx%), and residual graphical analysis were determined first. A model identity test was then performed, followed by a completely randomized design (DIC) in a split-plot scheme, along with Dunnett's test. At the end of the analysis, to evaluate the stability of the models in a validation test, the following criteria were used: the prediction coefficient of determination (R²), the sum of squares of the relative residuals (SQRR), the root mean square error (RQEM), and the mean percentage error (EMP). It was concluded that the best approach was to fit by class, the regional model being the most appropriate to use.

The second chapter deals with the evaluation of hypsometric models using the cross-validation technique, and the comparison of the results with those obtained in chapter 1, aiming to identify the best model to be used in the region under different selection criteria. The precision criteria were applied first: adjusted coefficient of determination, standard error of the estimate, and residual graphical analysis. Then the stability criteria were applied by performing cross-validation between the two data sets: mean absolute error, root mean square error, and sum of squares of the mean error. The selected models were submitted to a new analysis using the data sets of chapter 1, applying the same precision and stability criteria as before, resulting in a comparison between the chapters. It was concluded that the best local model was model 14 (Chapman-Richards), the best regional model was the parabolic model 03, and, in comparison with the models selected in chapter 1, the most suitable for the plantation was the regional parabolic model 3 from chapter 2.
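A parabolic hypsometric fit and its precision criteria (R², RMSE) can be sketched on synthetic data. The coefficients and plot data below are invented for illustration and do not correspond to the thesis's parabolic model 03:

```python
import numpy as np

def fit_parabolic_height(d, h):
    """Least-squares fit of a parabolic height-diameter model
    h = b0 + b1*d + b2*d^2, returning the coefficients together
    with R^2 and RMSE of the fit."""
    A = np.c_[np.ones_like(d), d, d**2]
    b, *_ = np.linalg.lstsq(A, h, rcond=None)
    pred = A @ b
    ss_res = np.sum((h - pred) ** 2)
    ss_tot = np.sum((h - h.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((h - pred) ** 2))
    return b, r2, rmse

# synthetic plot data: diameters d (cm) and total heights h (m)
rng = np.random.default_rng(0)
d = rng.uniform(8, 25, 60)
h = 1.3 + 2.2 * d - 0.04 * d**2 + rng.normal(0, 0.5, 60)
b, r2, rmse = fit_parabolic_height(d, h)
```

Cross-validation, as in the second chapter, would repeat this fit on one data set and score the criteria on the other.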
48

Serial Testing for Detection of Multilocus Genetic Interactions

Al-Khaledi, Zaid T. 01 January 2019
A method to detect relationships between disease susceptibility and multilocus genetic interactions is the Multifactor Dimensionality Reduction (MDR) technique pioneered by Ritchie et al. (2001). Since its introduction, many extensions have been pursued to deal with non-binary outcomes and/or account for multiple interactions simultaneously. Studying the effects of multilocus genetic interactions on continuous traits (blood pressure, weight, etc.) is one case that MDR does not handle. Culverhouse et al. (2004) and Gui et al. (2013) proposed two different methods to analyze such a case. In their research, Gui et al. (2013) introduced Quantitative Multifactor Dimensionality Reduction (QMDR), which uses the overall average of the response variable to classify individuals into risk groups. This classification mechanism may not be efficient under some circumstances, especially when the overall mean is close to some multilocus means. To address such difficulties, we propose a new algorithm, Ordered Combinatorial Quantitative Multifactor Dimensionality Reduction (OQMDR), which uses a series of tests, based on the ascending order of multilocus means, to identify the best interactions of different orders with risk patterns that minimize the prediction error. Ten-fold cross-validation is used to choose among the resulting models. Standard permutation tests are used to assess the significance of the selected model. The assessment procedure is also modified by utilizing the generalized extreme-value distribution to enhance the efficiency of the evaluation process. Results from a simulation study illustrate the performance of the algorithm, and the proposed algorithm is also applied to a genetic data set associated with Alzheimer's disease.
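The permutation-testing step used to assess significance can be sketched generically. The difference-of-means statistic and the synthetic genotype/trait data below are illustrative, not the OQMDR statistic:

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation test for association between a binary genotype
    grouping x and a quantitative trait y: the observed between-group
    difference in means is compared with its null distribution under
    random relabelling of the trait values."""
    rng = np.random.default_rng(seed)
    obs = abs(y[x == 1].mean() - y[x == 0].mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(y)
        count += abs(perm[x == 1].mean() - perm[x == 0].mean()) >= obs
    # +1 correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = np.repeat([0, 1], 30)                  # 30 carriers, 30 non-carriers
y_null = rng.standard_normal(60)           # no genotype effect
y_eff = y_null + 1.5 * x                   # strong genotype effect
p_null = permutation_pvalue(x, y_null)
p_eff = permutation_pvalue(x, y_eff)
```

Fitting a generalized extreme-value distribution to the permutation maxima, as the abstract describes, lets far fewer permutations approximate the same tail probability.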
49

New Non-Parametric Methods for Income Distributions

Luo, Shan 26 April 2013
Low income proportion (LIP), the Lorenz curve (LC) and the generalized Lorenz curve (GLC) are important indexes for describing the inequality of an income distribution, and they have been widely used by governments around the world for measuring social stability. Accurate estimation of these indexes is essential to quantifying a country's economic condition. Established statistical inference methods for these indexes are based on an asymptotic normal distribution, which may perform poorly when the real income data are skewed or contain outliers. Recent applications of nonparametric methods, though, allow researchers to apply techniques without imposing a parametric distributional assumption on the data. For example, existing research proposes plug-in empirical likelihood (EL)-based inference for LIP, LC and GLC. However, this method becomes computationally intensive and mathematically complex because of the presence of nonlinear constraints in the underlying optimization problem. Moreover, the limiting distribution of the log empirical likelihood ratio is a scaled chi-square distribution, and estimation of the scale constant affects the overall performance of the plug-in EL method. To improve the efficiency of the existing inference methods, this dissertation first proposes kernel estimators for LIP, LC and GLC, with the cross-validation method proposed to choose the bandwidth; these kernel estimators are proved to be asymptotically normal. The smoothed jackknife empirical likelihood (SJEL) for LIP, LC and GLC is then defined, and the log jackknife empirical likelihood ratio statistics are proved to follow the standard chi-square distribution. Extensive simulation studies evaluate the kernel estimators in terms of mean square error and asymptotic relative efficiency. Next, SJEL-based confidence intervals and smoothed bootstrap-based confidence intervals are proposed.
The coverage probability and interval length for the proposed confidence intervals are calculated and compared with the normal approximation-based intervals. The proposed kernel estimators are found to be competitive estimators, and the proposed inferential methods are observed to have better finite-sample performance. All inferential methods are illustrated through real examples.
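A kernel-smoothed LIP estimator can be sketched as the smoothed empirical CDF evaluated at a fraction of the median income. Both the half-median definition of LIP and the rule-of-thumb bandwidth below are common conventions assumed for illustration; the dissertation selects the bandwidth by cross-validation instead:

```python
import numpy as np
from math import erf, sqrt

def smoothed_lip(incomes, frac=0.5, h=None):
    """Kernel-smoothed low income proportion: the Gaussian-kernel
    smoothed empirical CDF evaluated at frac * median(income)."""
    x = np.asarray(incomes, dtype=float)
    n = len(x)
    if h is None:
        h = 1.06 * x.std() * n ** (-1 / 5)  # normal-reference rule of thumb
    t = frac * np.median(x)
    # smoothed CDF: average of standard normal CDFs centred at each income
    z = (t - x) / h
    phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    return float(phi.mean())

# log-normal incomes, a common parametric benchmark for income data
rng = np.random.default_rng(0)
inc = rng.lognormal(mean=10, sigma=0.7, size=5000)
lip = smoothed_lip(inc)   # share of the population below half the median
```

Smoothing replaces the step-function empirical CDF with a differentiable one, which is what makes the jackknife empirical likelihood machinery tractable.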
50

Ensembles of Artificial Neural Networks: Analysis and Development of Design Methods

Torres Sospedra, Joaquín 30 September 2011
This thesis is focused on the analysis and development of ensembles of neural networks. An ensemble is a system in which a set of heterogeneous artificial neural networks is generated in order to outperform single-network classifiers. This thesis differs from other work on ensembles of neural networks [1, 2, 3, 4, 5, 6, 7] and is organized as follows. First, a comparison of ensemble methods is introduced in order to provide a ranked list of the best ensemble methods in the literature. This comparison is split into two studies, which form two chapters of the thesis. Another important question for ensembles of neural networks is how to combine the information provided by the networks in the ensemble. The literature offers several alternatives for obtaining an accurate combination of the information provided by the heterogeneous set of networks; for this reason, a comparison of combiners is also introduced in this thesis. Furthermore, an ensemble of neural networks is only one kind of multiple classifier system (MCS) based on neural networks. There are other ways to build an MCS from neural networks that are quite different from ensembles, the most important being stacked generalization and mixture of experts. These two systems are also analysed in this thesis and new alternatives are proposed. One result of the comparative research is a deep understanding of the field of ensembles, so that new ensemble methods and combiners can be designed from the results obtained. Concretely, two new ensemble methods, a new ensemble methodology called Cross-Validated Boosting, and two reordering algorithms are proposed in this thesis. The best overall results are obtained by the proposed ensemble methods.

Finally, all the experiments were carried out on a common experimental setup: each experiment was repeated ten times on nineteen different datasets from the UCI repository in order to validate the results, and the procedure applied to set specific parameters is similar across all the experiments. The main contributions are: 1) an experimental setup which can be applied to further comparisons; 2) a guide to selecting the most appropriate methods to build and combine ensembles and multiple classifier systems; 3) new methods to build ensembles and other multiple classifier systems.
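Two of the most common combiners discussed above, averaging the per-network class posteriors and majority voting on each network's decision, can be sketched minimally. The network outputs below are invented softmax-like values, not results from the thesis's experiments:

```python
import numpy as np

def average_combiner(probs):
    """Average the per-network class posteriors, then take the argmax.
    probs has shape (n_networks, n_samples, n_classes)."""
    return probs.mean(axis=0).argmax(axis=1)

def majority_vote(probs):
    """Each network votes for its own argmax class; the class with the
    most votes wins (ties broken toward the lower class index)."""
    votes = probs.argmax(axis=2)                     # (n_networks, n_samples)
    n_classes = probs.shape[2]
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# three "networks", four samples, three classes
probs = np.array([
    [[.7, .2, .1], [.1, .8, .1], [.3, .4, .3], [.2, .3, .5]],
    [[.6, .3, .1], [.2, .7, .1], [.5, .3, .2], [.1, .2, .7]],
    [[.8, .1, .1], [.3, .6, .1], [.2, .5, .3], [.3, .3, .4]],
])
avg_pred = average_combiner(probs)
vote_pred = majority_vote(probs)
```

Averaging uses the networks' confidence and voting discards it; the two can disagree on borderline samples, which is exactly what the combiner comparison in the thesis quantifies.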
