41. Aeroacústica de motores aeronáuticos: uma abordagem por meta-modelo / Aeroengine aeroacoustics: a meta-model approach
Rafael Gigena Cuenca, 20 June 2017
Desde a última década, as autoridades aeronáuticas dos países membros da ICAO vêm, gradativamente, aumentando as restrições nos níveis de ruído externo de aeronaves, principalmente nas proximidades dos aeroportos. Por isso, os novos motores aeronáuticos precisam ter projetos mais silenciosos, tornando as técnicas de predição de ruído de motores cada vez mais importantes. Diferente das técnicas semi-analíticas, que vêm evoluindo nas últimas décadas, as técnicas semiempíricas possuem suas bases lastreadas em técnicas e dados que remontam à década de 70, como as desenvolvidas no projeto ANOPP. Uma bancada de estudos aeroacústicos para um conjunto rotor/estator foi construída no Departamento de Engenharia Aeronáutica da Escola de Engenharia de São Carlos, permitindo desenvolver uma metodologia capaz de gerar uma técnica semiempírica utilizando métodos e dados novos. Tal bancada é capaz de variar a rotação e o espaçamento rotor/estator e de controlar a vazão mássica, resultando em 71 configurações avaliadas. Para isso, uma antena de parede com 14 microfones foi usada. O espectro do ruído de banda larga é modelado como um ruído rosa e o ruído tonal é modelado por um comportamento exponencial, resultando em 5 parâmetros: nível do ruído, decaimento linear e fator de forma da banda larga, nível do primeiro tonal e decaimento exponencial de seus harmônicos. Uma regressão de superfície Kriging é utilizada para aproximar os 5 parâmetros a partir das variáveis do experimento, e o estudo mostrou que Mach Tip e RSS são as principais variáveis que definem o ruído, assim como utilizado pelo projeto ANOPP. Assim, um modelo de previsão é definido para o conjunto rotor/estator estudado na bancada, o que permite prever o espectro em condições não ensaiadas. A análise do modelo resultou em uma ferramenta de interpretação dos resultados. Ao modelo são aplicadas 3 técnicas de validação cruzada (leave-one-out, Monte Carlo e repeated k-folds), que mostraram que o modelo desenvolvido possui um erro médio, no nível do ruído total do espectro, de 2.35 dB e desvio padrão de 0.91. / Since the last decade, the member countries of ICAO, through their aeronautical authorities, have been gradually tightening the restrictions on external aircraft noise levels, especially in the vicinity of airports. Because of that, new aero-engines need quieter designs, making noise prediction techniques for aero-engines ever more important. Semi-analytical techniques have evolved considerably since the 1970s, but semi-empirical techniques still rest on methods and data defined in the 1970s, such as those developed in the ANOPP project. An Aeroacoustics Fan Rig to investigate a rotor/stator assembly was developed at the Aeronautical Engineering Department of the São Carlos School of Engineering, allowing the development of a methodology capable of defining a semi-empirical technique based on new data and methods. The rig can vary the rotation speed and the rotor/stator spacing and control the mass flow rate, resulting in a set of 71 tested configurations. To measure the noise, a wall-mounted microphone antenna with 14 sensors was used. The broadband noise was modeled as pink noise and the tonal noise with an exponential behavior, resulting in 5 parameters: the broadband level, linear decay and form factor, and the level of the first tone and the exponential decay of its harmonics. A Kriging surface regression was used to approximate the 5 parameters from the experimental variables, and the investigation showed that tip Mach number and RSS are the most important variables defining the noise, as also adopted in the ANOPP project. A prediction model for the rotor/stator noise is thus defined from the 5 parameter approximations, allowing the spectrum to be predicted at operating points not measured. Analysis of the model also yielded a tool for interpreting the results. Three different cross-validation techniques were applied to the model: leave-one-out, Monte Carlo and repeated k-folds. This analysis shows that the developed model has an average error of 2.35 dB, with a standard deviation of 0.91, for the predicted overall spectrum level.
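To make the pipeline above concrete, here is a minimal sketch, not the author's code: a Kriging (Gaussian process) surrogate maps operating conditions to one spectral parameter and is checked with leave-one-out cross-validation. The data, kernel and ranges below are invented for illustration.

```python
# Illustrative sketch: Kriging surrogate from (Mach tip, RSS) to a spectral
# parameter, validated with leave-one-out CV. All data are hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical experiment matrix: tip Mach number and rotor/stator spacing
# (RSS) for 71 configurations, as in the rig described above.
X = np.column_stack([rng.uniform(0.3, 0.8, 71),   # Mach tip
                     rng.uniform(1.0, 3.0, 71)])  # RSS
# The thesis fits one surrogate per spectral parameter (5 in total); here a
# single fake "broadband level" response stands in for one of them.
y = 80.0 + 40.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1, 71)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.1, 0.5]),
                              normalize_y=True)
pred = cross_val_predict(gp, X, y, cv=LeaveOneOut())
print("LOO mean absolute error [dB]:", np.abs(pred - y).mean())
```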
42. Seleção e análise de associação genômica em dados simulados e da qualidade da carne de ovinos da raça Santa Inês / Genomic selection and association analysis in simulated data and meat quality of Santa Ines sheep breed
Simone Fernanda Nedel Pértile, 19 August 2015
Informações de milhares de marcadores genéticos têm sido incluídas nos programas de melhoramento genético, permitindo a seleção dos animais considerando estas informações e a identificação de regiões genômicas associadas às características de interesse econômico. Devido ao alto custo associado a esta tecnologia e às coletas de dados, os dados simulados apresentam grande importância para que novas metodologias sejam estudadas. O objetivo deste trabalho foi avaliar a eficiência do método ssGBLUP utilizando pesos para os marcadores genéticos, informações de genótipo e fenótipos, com ou sem as informações de pedigree, para seleção e associação genômica ampla, considerando diferentes coeficientes de herdabilidade, presença de efeito poligênico, diferentes números de QTL (quantitative trait loci) e pressões de seleção. Adicionalmente, dados de qualidade da carne de ovinos da raça Santa Inês foram comparados com os padrões descritos para esta raça. A população estudada foi obtida por simulação de dados e foi composta por 8.150 animais, sendo 5.850 animais genotipados. Os dados simulados foram analisados utilizando o método ssGBLUP com matrizes de relacionamento com ou sem informações de pedigree, utilizando pesos para os marcadores genéticos obtidos em cada iteração. As características de qualidade da carne estudadas foram: área de olho de lombo, espessura de gordura subcutânea, cor, pH ao abate e após 24 horas de resfriamento das carcaças, perdas por cocção e força de cisalhamento. Quanto maior o coeficiente de herdabilidade, melhores foram os resultados de seleção e associação genômica. Para a identificação de regiões associadas a características de interesse, não houve influência do tipo de matriz de relacionamento utilizada. Para as características com e sem efeito poligênico, quando considerado o mesmo coeficiente de herdabilidade, não houve diferenças para seleção genômica, mas a identificação de QTL foi melhor nas características sem efeito poligênico. Quanto maior a pressão de seleção, mais acuradas foram as predições dos valores genéticos genômicos. Os dados de qualidade da carne obtidos de ovinos da raça Santa Inês estão dentro dos padrões descritos para esta raça e foram identificadas diversas regiões genômicas associadas às características estudadas. / Data from thousands of genetic markers have been included in animal breeding programs, allowing the selection of animals based on this information and the identification of genomic regions associated with traits of economic interest. Simulated data are of great importance for the study of new methodologies, given the high cost associated with this technology and with data collection. The objectives of this study were to evaluate the efficiency of the ssGBLUP method using genotype and phenotype information, with or without pedigree information, and attributing weights to the genetic markers, for selection and genome-wide association, considering different heritability coefficients, the presence of a polygenic effect, and different numbers of quantitative trait loci and selection pressures. Additionally, meat quality data of the Santa Ines sheep breed were compared with the standards described for the breed. The simulated population comprised 8,150 individuals, of which 5,850 were genotyped. The simulated data were analysed by the ssGBLUP method using two relationship matrices, with or without pedigree information, with weights for the genetic markers obtained at each iteration. The meat quality traits evaluated were: rib eye area, fat thickness, color, pH at slaughter and 24 hours after carcass cooling, cooking losses and shear force. The results of selection and genomic association were better for traits with higher heritability coefficients. The greater the selection pressure, the more accurate the predictions of the genomic breeding values. There was no difference between the relationship matrices studied in identifying the regions associated with traits of interest. Traits with and without a polygenic effect, at the same heritability coefficient, showed no differences in genomic selection, but QTL identification was better for traits without a polygenic effect. The meat quality data obtained from the Santa Ines sheep breed are in accordance with the standards described for this breed, and several genomic regions associated with the studied traits were identified.
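As background to the weighted ssGBLUP analysis, the sketch below builds a VanRaden-style genomic relationship matrix with per-marker weights; it is a minimal illustration on simulated genotypes, and the weight-update rule in the final comment is one common choice, not necessarily the one used in the thesis.

```python
# Minimal sketch of a weighted genomic relationship matrix:
#   G = Z diag(w) Z' / (2 * sum_j p_j (1 - p_j))
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_snp = 50, 500
M = rng.integers(0, 3, size=(n_animals, n_snp)).astype(float)  # 0/1/2 genotypes

p = M.mean(axis=0) / 2.0        # observed allele frequencies
Z = M - 2.0 * p                 # centered genotype matrix
w = np.ones(n_snp)              # marker weights; all ones -> plain GBLUP

def weighted_G(Z, p, w):
    """VanRaden-style genomic relationship matrix with marker weights."""
    return (Z * w) @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

G = weighted_G(Z, p, w)
# In iterative weighted ssGBLUP the weights are typically refreshed from the
# estimated marker effects u_j each round, e.g. w_j from u_j**2 * 2*p_j*(1-p_j).
print(G.shape)  # (50, 50)
```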
43. An investigation of feature weighting algorithms and validation techniques using blind analysis for analogy-based estimation
Sigweni, Boyce B., January 2016
Context: Software effort estimation is a very important component of the software development life cycle. It underpins activities such as planning, maintenance and bidding. It has therefore triggered much research over the past four decades, including many machine learning approaches. One popular approach, which has the benefit of accessible reasoning, is analogy-based estimation. Machine learning, including analogy, is known to benefit significantly from feature selection/weighting. Unfortunately, feature weighting search is an NP-hard problem and therefore computationally very demanding, if not intractable. Objective: One objective of this research is thus to develop an efficient and effective feature weighting algorithm for estimation by analogy. However, a major challenge for the effort estimation research community is that experimental results tend to be contradictory and also lack reliability. This has been paralleled by a recent awareness of how bias can impact research results, which is a contributory reason why software effort estimation is still an open problem. Consequently, the second objective is to investigate research methods that might lead to more reliable results, focusing on blinding methods to reduce researcher bias. Method: In order to build on the most promising feature weighting algorithms, I conduct a systematic literature review. From this I develop a novel and efficient feature weighting algorithm. This is experimentally evaluated, comparing three feature weighting approaches with a naïve benchmark using two industrial data sets. Using these experiments, I explore blind analysis as a technique to reduce bias. Results: The systematic literature review identified 19 relevant primary studies. A meta-analysis of the selected studies using a one-sample sign test (p = 0.0003) shows a positive effect for feature weighting in general compared with ordinary analogy-based estimation (ABE); that is, feature weighting is a worthwhile technique for improving ABE. Nevertheless, the results remain imperfect, so there is still much scope for improvement. My experience shows that blinding can be a relatively straightforward procedure. I also highlight various statistical analysis decisions which ought not to be guided by the hunt for statistical significance, and show that results can be inverted merely through a seemingly inconsequential statistical nicety. After analysing results from 483 software projects from two separate industrial data sets, I conclude that the proposed technique improves accuracy over standard feature subset selection (FSS) and traditional case-based reasoning (CBR) when using pseudo time-series validation. Interestingly, there is no strong evidence of superior performance for the new technique when traditional validation techniques (jackknifing) are used, although it is more efficient. Conclusion: There are two main findings: (i) Feature weighting techniques are promising for software effort estimation, but they need to be tailored to the target case for their potential to be adequately exploited. Although the findings show that allowing weights to differ across parts of the instance space ('local' regions) may improve effort estimation results, the majority of studies in software effort estimation (SEE) do not take this into consideration; the proposed technique therefore represents an improvement over methods that do not. (ii) Whilst there are minor challenges and some limits to the degree of blinding possible, blind analysis is a very practical and easy-to-implement method that supports more objective analysis of experimental results. Therefore I argue that blind analysis should be the norm for analysing software engineering experiments.
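For readers unfamiliar with analogy-based estimation, this minimal sketch shows the prediction step that feature weighting tunes: a weighted nearest-neighbour lookup over past projects. The data, weights and k are hypothetical, and the weight search itself (the hard part discussed above) is omitted.

```python
# Feature-weighted analogy-based estimation (ABE): predict effort for a new
# project as the mean effort of its k nearest analogues under a weighted
# Euclidean distance. All values are made up for illustration.
import numpy as np

def abe_predict(X_hist, y_hist, x_new, weights, k=3):
    """Mean effort of the k historical projects closest to x_new."""
    d = np.sqrt((weights * (X_hist - x_new) ** 2).sum(axis=1))
    return y_hist[np.argsort(d)[:k]].mean()

rng = np.random.default_rng(2)
X_hist = rng.random((30, 4))                               # 4 normalized features
y_hist = 100 + 400 * X_hist[:, 0] + rng.normal(0, 20, 30)  # effort (person-hours)
w = np.array([1.0, 0.2, 0.2, 0.1])                         # candidate weights

print(abe_predict(X_hist, y_hist, rng.random(4), w))
```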
44. Simulation and Application of Binary Logic Regression Models
Heredia Rico, Jobany J, 01 April 2016
Logic regression (LR) is a methodology to identify logic combinations of binary predictors in the form of intersections (and), unions (or) and negations (not) that are linearly associated with an outcome variable. Logic regression uses the predictors as inputs and enables us to identify important logic combinations of independent variables using a computationally efficient tree-based stochastic search algorithm, unlike classical regression models, which only consider pre-determined conventional interactions (the "and" rules). In this thesis, we focused on LR with a binary outcome in a logistic regression framework. Simulation studies were conducted to examine the performance of LR under the assumptions of independent and correlated observations, respectively, for various characteristics of the data sets and LR search parameters. We found that the proportion of times that LR selected the correct logic rule was usually low when the signal and/or prevalence of the true logic rule was relatively low. The method performed satisfactorily under easy learning conditions such as high signal, simple logic rules and/or small numbers of predictors. Given the simulation characteristics and correlation structures tested, we found some, but not significant, differences in performance when LR was applied to dependent observations compared with the independent case. In addition to the simulation studies, an advanced application method was proposed that integrates LR and resampling methods in order to enhance LR performance. The proposed method was illustrated using two simulated data sets as well as a data set from a real-life situation. The proposed method showed some evidence of being effective in discerning the correct logic rule, even under unfavorable learning conditions.
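The basic move of logic regression, scoring a candidate Boolean rule inside a logistic model, can be sketched as follows; the true rule, signal strength and deviance scoring are assumptions for the example, and the actual method repeats this evaluation many times inside a stochastic tree search.

```python
# Simulate a binary outcome driven by a logic rule, then score candidate
# rules by the deviance of a logistic fit on the 0/1 rule indicator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, p = 500, 10
X = rng.integers(0, 2, size=(n, p)).astype(bool)   # binary predictors

true_rule = (X[:, 0] & ~X[:, 1]) | X[:, 2]         # (X1 AND NOT X2) OR X3
logit = -1.0 + 2.0 * true_rule.astype(float)       # signal strength = 2.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

def score_rule(rule, y):
    """Deviance of a logistic fit on the rule indicator (lower is better)."""
    fit = sm.GLM(y, sm.add_constant(rule.astype(float)),
                 family=sm.families.Binomial()).fit()
    return fit.deviance

print(score_rule(true_rule, y))          # low deviance: the correct rule
print(score_rule(X[:, 4] | X[:, 5], y))  # unrelated rule: higher deviance
```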
45. Využití bootstrapu a křížové validace v odhadu predikční chyby regresních modelů / Utilizing Bootstrap and Cross-validation for prediction error estimation in regression models
Lepša, Ondřej, January 2014
Finding a well-predicting model is one of the main goals of regression analysis. However, to evaluate a model's predictive ability, it is common practice to use criteria that either do not serve this purpose or are of insufficient reliability. As an alternative, there are relatively new methods that use repeated resampling to estimate an appropriate loss function, the prediction error. Cross-validation and the bootstrap belong to this category. This thesis describes how to utilize these methods in order to select a regression model that best predicts new values of the response variable.
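A minimal sketch of the two resampling estimators the thesis compares, applied to an ordinary linear regression on synthetic data; the model, squared-error loss and replication counts are placeholders, not choices taken from the thesis.

```python
# Estimate prediction error of a linear model two ways: 10-fold
# cross-validation and a simple bootstrap (fit on a resample, evaluate on
# the original sample).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, 100)

# 10-fold CV estimate of the expected squared-error loss.
cv_mse = -cross_val_score(LinearRegression(), X, y, cv=10,
                          scoring="neg_mean_squared_error").mean()

# Bootstrap estimate: average error over 200 resampled fits.
boot = []
for _ in range(200):
    Xb, yb = resample(X, y)
    model = LinearRegression().fit(Xb, yb)
    boot.append(((y - model.predict(X)) ** 2).mean())

print("CV estimate:", cv_mse, " bootstrap estimate:", np.mean(boot))
```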
46. Open and closed loop model identification and validation
Guidi, Figuroa Hernan, 03 July 2009
Closed-loop system identification and validation are important components in dynamic system modelling. In this dissertation, a comprehensive literature survey is compiled on system identification, with a specific focus on closed-loop system identification and on issues of identification experiment design and model validation. This is followed by simulated experiments on known linear and non-linear systems and experiments on a pilot-scale distillation column. The aim of these experiments is to study several sensitivities between identification experiment variables and the consequent accuracy of the identified models, as well as the discrimination capacity of validation sets, under open- and closed-loop conditions. The identified model structure was limited to an ARX structure and the parameter estimation method to the prediction error method. The identification and validation experiments provided the following findings regarding the effects of different feedback conditions:
- Models obtained from open-loop experiments produced the most accurate responses when approximating the linear system. When approximating the non-linear system, models obtained from closed-loop experiments produced the most accurate responses.
- Validation sets obtained from open-loop experiments were found to be most effective in discriminating between models approximating the linear system, while the same may be said of validation sets obtained from closed-loop experiments for the non-linear system.
These findings were mostly attributed to the condition that open-loop experiments produce more informative data than closed-loop experiments when no constraints are imposed on system outputs. When system output constraints are imposed, closed-loop experiments produce the more informative data of the two. In identifying the non-linear system and the distillation column, it was established that defining a clear output range, and consequently a region of dynamics to be identified, is very important when identifying linear approximations of non-linear systems. Thus, since closed-loop experiments produce more informative data under output constraints, the closed-loop experiments were more effective on the non-linear system. Assessment of other identification experiment variables revealed the following (see the sketch after this list for the ARX estimation step itself):
- Pseudo-random binary signals were the most persistently exciting signals, as they were most consistent in producing models with accurate responses.
- Dither signals with frequency characteristics based on the system's dominant dynamics produced models with more accurate responses.
- Setpoint changes were found to be very important in maximising the generation of informative data for closed-loop experiments.
Based on the literature surveyed and the results obtained from the identification and validation experiments, it is recommended that, when identifying linear models approximating a linear system and validating such models, open-loop experiments should be used to produce data for identification and cross-validation. When identifying linear approximations of a non-linear system, defining a clear output range and region of dynamics is essential and should be coupled with closed-loop experiments to generate data for identification and cross-validation. / Dissertation (MEng)--University of Pretoria, 2009. / Chemical Engineering / unrestricted
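Since the dissertation restricts itself to ARX models estimated by the prediction error method, it may help to note that for ARX this reduces to linear least squares; the sketch below illustrates the idea on an invented first-order system driven by a PRBS-like input.

```python
# ARX identification by least squares:
#   y(t) + a1*y(t-1) = b1*u(t-1) + e(t)
# rewritten in regression form as y(t) = [-y(t-1), u(t-1)] @ [a1, b1].
import numpy as np

rng = np.random.default_rng(5)
N = 500
u = np.sign(rng.normal(size=N))          # PRBS-like persistently exciting input
y = np.zeros(N)
for t in range(1, N):                    # invented "true" first-order system
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + 0.05 * rng.normal()

Phi = np.column_stack([-y[:-1], u[:-1]])             # regressor matrix
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)  # prediction error fit
print("a1, b1 estimates:", theta)        # expect roughly [-0.8, 0.5]
```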
47. Um estudo comparativo das técnicas de validação cruzada aplicadas a modelos mistos / A comparative study of cross-validation techniques applied to mixed models
Cunha, João Paulo Zanola, 28 May 2019
A avaliação da predição de um modelo por meio do cálculo do seu risco esperado é uma importante etapa no processo de escolha de um preditor eficiente para observações futuras. Porém, deve-se evitar usar nessa avaliação a mesma base em que foi criado o preditor, pois isso traz, no geral, estimativas abaixo do valor real do risco esperado daquele modelo. As técnicas de validação cruzada (K-fold, Leave-One-Out, Hold-Out e Bootstrap) são aconselhadas nesse caso, pois permitem a divisão de uma base em amostras de treino e validação, fazendo assim com que a criação do preditor e a avaliação do seu risco sejam feitas em bases diferentes. Este trabalho apresenta uma revisão dessas técnicas e suas particularidades na estimação do risco esperado. Essas técnicas foram avaliadas em dois modelos mistos com distribuições Normal e Logística e seus desempenhos comparados por meio de estudos de simulação. Por fim, as metodologias foram aplicadas em um conjunto de dados real. / The assessment of a model's prediction through the calculation of its expected risk is an important step in the process of choosing an efficient predictor for future observations. However, using the same data on which the predictor was built should be avoided in this assessment, because it generally yields estimates below the true expected risk of the model. In this case, cross-validation methods (K-fold, Leave-One-Out, Hold-Out and Bootstrap) are recommended, because partitioning the data into training and validation samples allows the predictor to be built and its risk evaluated on different data sets. This work presents a review of these methods and their particularities in estimating the expected risk. The methods were evaluated on two mixed models, with Normal and Logistic distributions, and their performances were compared through simulation studies. Lastly, the methods were applied to a real data set.
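One practical point when cross-validating mixed models is that folds are usually formed at the group (random-effect) level, so that validation subjects are entirely unseen. The sketch below illustrates this with simulated data and statsmodels' MixedLM; it is an illustration of the general idea, not the implementation used in the thesis.

```python
# Group-wise K-fold cross-validation of a random-intercept mixed model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(6)
n_subj, n_obs = 20, 10
subj = np.repeat(np.arange(n_subj), n_obs)
b = rng.normal(0, 1, n_subj)[subj]                  # random intercepts
x = rng.normal(size=n_subj * n_obs)
y = 2.0 + 1.5 * x + b + rng.normal(0, 1, n_subj * n_obs)
df = pd.DataFrame({"y": y, "x": x, "subj": subj})

risks = []
for tr, va in GroupKFold(n_splits=5).split(df, groups=df["subj"]):
    fit = smf.mixedlm("y ~ x", df.iloc[tr], groups=df.iloc[tr]["subj"]).fit()
    pred = fit.predict(df.iloc[va])                 # fixed-effects prediction
    risks.append(((df.iloc[va]["y"] - pred) ** 2).mean())

print("estimated expected squared risk:", np.mean(risks))
```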
48. Machine Learning-based Analysis of the Relationship Between the Human Gut Microbiome and Bone Health
January 2020
The Human Gut Microbiome (GM) modulates a variety of structural, metabolic, and protective functions to benefit the host. A few recent studies also support a role for the gut microbiome in the regulation of bone health. The relationship between the GM and bone health was analyzed based on data collected from a group of twenty-three adolescent boys and girls who participated in a controlled feeding study, during which two different doses (0 g/d and 12 g/d of fiber) of Soluble Corn Fiber (SCF) were added to their diet. The analysis was performed by building a machine learning regression model to predict measures of Bone Mineral Density (BMD) and Bone Mineral Content (BMC), which are indicators of bone strength, from the sequenced proportions of 178 microbes collected from the 23 subjects. The model was evaluated by calculating performance metrics such as Root Mean Squared Error, Pearson’s correlation coefficient, and Spearman’s rank correlation coefficient, using cross-validation. A noticeable correlation was observed between the GM and bone health, and the overall prediction correlation was higher with the SCF intervention (r ~ 0.51). The genera of microbes that played an important role in this relationship were identified. Eubacterium (g), Bacteroides (g), Megamonas (g), Acetivibrio (g), Faecalibacterium (g), and Paraprevotella (g) were some of the microbes that showed an increase in proportion with the SCF intervention. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2020
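The prediction-and-scoring loop described above can be sketched as follows; the data are synthetic and ridge regression stands in for the unspecified regression model, so this is an assumed reconstruction rather than the thesis code.

```python
# Predict a bone measure from microbe proportions and score with RMSE,
# Pearson and Spearman correlations under leave-one-out cross-validation.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(7)
n_subjects, n_taxa = 23, 178
X = rng.dirichlet(np.ones(n_taxa), size=n_subjects)   # relative abundances
bmd = 1.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.05, n_subjects)

pred = cross_val_predict(Ridge(alpha=1.0), X, bmd, cv=LeaveOneOut())
rmse = np.sqrt(((bmd - pred) ** 2).mean())
print("RMSE:", rmse,
      "Pearson r:", pearsonr(bmd, pred)[0],
      "Spearman rho:", spearmanr(bmd, pred)[0])
```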
49. Comparative Data Analytic Approach for Detection of Diabetes
Sood, Radhika, January 2018
No description available.
50. Using Transcriptomic Data to Predict Biomarkers for Subtyping of Lung Cancer
Daran, Rukesh, January 2021
Lung cancer is one of the most dangerous of all cancer types. Several studies have explored the use of machine learning methods to predict and diagnose this cancer. This study explored the potential of decision tree (DT) and random forest (RF) classification models, in the context of a small transcriptome dataset, for outcome prediction of different subtypes of lung cancer. In the study we compared three subtypes, adenocarcinoma (AC), small cell lung cancer (SCLC) and squamous cell carcinoma (SCC), with normal lung tissue by applying the two machine learning methods from the caret R package. The DT and RF models and their validation showed different results for each subtype in the lung cancer data. The DT found more features and validated them with better metrics. The analysis of biological relevance focused on the identified features for each of the subtypes AC, SCLC and SCC. The DT gave a detailed insight into the biological data, which was essential for classifying a feature as a biomarker. The features identified in this research may serve as potential candidate genes that could be explored further to confirm their role in the corresponding lung cancer types and contribute to targeted diagnostics of the different subtypes.
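The study itself uses R's caret package; as a rough Python equivalent (an assumption, not the author's code), the sketch below compares a decision tree and a random forest under stratified cross-validation on synthetic high-dimensional data and lists the forest's top features as candidate biomarkers.

```python
# DT vs. RF on synthetic "transcriptome-like" data with 4 tissue classes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=80, n_features=200, n_informative=15,
                           n_classes=4, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in [("DT", DecisionTreeClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(n_estimators=500,
                                                random_state=0))]:
    print(name, cross_val_score(clf, X, y, cv=cv).mean().round(3))

# Feature importances from a fitted forest point to candidate biomarkers.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print("top 10 candidate features:",
      np.argsort(rf.feature_importances_)[::-1][:10])
```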