221 |
多重插補法在線上使用者評分之應用 / Managing online user-generated product reviews using multiple imputation methods
李岑志, Li, Cen Jhih Unknown Date
As internet access has spread, people increasingly shop online and review products there, generating strong word-of-mouth effects. Online user-generated product reviews have become a rich source of product quality information for both producers and customers: customers judge a product from other buyers' experience, and manufacturers use the feedback to improve quality. As a result, many e-commerce websites collect post-purchase feedback, letting customers rate products with an overall score, sometimes together with a text comment. However, people usually comment only on the features they care about, and later buyers in particular may omit points that previous customers have already made. Consequently, when the text comments are quantified into numeric feature scores, the unmentioned features appear as missing values.
In addition, customers may comment on features that influence neither their satisfaction nor sales volume. It is therefore important to identify which features matter most, so that manufacturers can fix the most important defects. Our research focuses on modeling how the features mentioned in customer reviews influence the overall rating, and on whether filling in the missing values allows the critical features to be identified and the feature ratings to reflect customers' true opinions.
Many previous methods impute the whole dataset at once and do not consider that a customer's review might be influenced by earlier reviews. We propose a method based on multiple imputation, verify it through simulation, and use it to fill in customer reviews on Amazon for three generations of Canon digital cameras (SX210, SX230, SX260). The results indicate that the method accurately estimates the influence of each feature on the overall rating.
|
222 |
A Cox proportional hazard model for mid-point imputed interval censored data
Gwaze, Arnold Rumosa January 2011
There has been increasing interest in survival analysis with interval-censored data, where the event of interest (such as infection with a disease) is not observed exactly but is only known to happen between two examination times. However, because so much research has focused on right-censored data, many statistical tests and techniques are available for right censoring, whereas methods for interval censoring are far less abundant. In this study, right-censoring methods are used to fit a proportional hazards model to some interval-censored data. The interval-censored observations were transformed using mid-point imputation, a method which assumes that an event occurs at the midpoint of its recorded interval. The results gave conservative regression estimates, but a comparison with the conventional methods showed that the estimates were not significantly different. However, the censoring mechanism and interval lengths should be given serious consideration before deciding to use mid-point imputation on interval-censored data.
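The mid-point imputation step described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's own code: the function name `midpoint_impute`, the `(left, right)` tuple layout, and the use of `None` to mark a subject who was never seen to have the event are all assumptions made for illustration. The output is a list of `(time, event)` pairs of the kind a standard right-censoring routine would accept.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, Optional[float]]


def midpoint_impute(intervals: List[Interval]) -> List[Tuple[float, int]]:
    """Turn interval-censored observations into (time, event) pairs.

    Each observation is (left, right): the event is known to have happened
    between the two examination times. A right value of None (an assumed
    convention) means the event was never observed, so the subject is
    treated as right-censored at the last examination time.
    """
    result = []
    for left, right in intervals:
        if right is None:
            # No upper bound: keep the last examination time, event = 0.
            result.append((left, 0))
        else:
            # Assume the event occurred at the midpoint of the interval.
            result.append(((left + right) / 2.0, 1))
    return result


# Hypothetical examination times in months.
observations = [(0.0, 6.0), (6.0, 12.0), (12.0, None)]
imputed = midpoint_impute(observations)
```

The imputed pairs could then be passed to any proportional hazards routine that handles right-censored data. As the abstract cautions, wide intervals make the midpoint assumption less plausible.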
|
223 |
Three Studies of Transitions of Young People in Public Care: A Focus on Educational Outcomes
Tessier, Nicholas January 2015
The educational outcomes of children in care, as they prepare for and eventually complete the transition out of care, have been the subject of a growing body of research. Despite the progress made, no unified theory of risk and protective factors associated with educational outcomes has yet arisen from the longitudinal, cohort, and cross-sectional studies conducted with youth in care. This dissertation presents three papers that examine the effects of risk and protective factors on a range of educational outcome variables. The studies follow the timeline of a young person preparing for transition, moving into supported transitional living, and then eventually exiting care altogether.
Study 1 presents cross-sectional and longitudinal tests of the generalizability of many of the risk and protective factors identified by O’Higgins, Sebba, & Gardner (2014) in their systematic review of predictors of educational achievement among young people living in foster or kinship care. The cross-sectional sample consisted of 3,662 young people aged 12 to 17 years who were residing in out-of-home care in Ontario, Canada. An additional longitudinal sample was composed of a subsample of 962 young people from the cross-sectional sample who had also been assessed 36 months later with the AAR-C2-2010 during year 13 (2013-2014) of the OnLAC project. The broad cross-sectional study revealed supporting evidence for twelve of the twenty factors identified by O’Higgins et al., and the four factors found to predict change in academic success over the longitudinal timeframe suggest that we are on the right track. Study 2 uses a lag-as-moderator approach to test whether the time between assessments influences how well variables assessed while the young person was in care predict educational variables evaluated after the youth had completed the transition to supported independent living. Results from this thorough methodological study of gap length over six years of OnLAC data are encouraging: 87.5% of the predictors tested for statistical moderation effects by the length of time between assessments were shown to be stable predictors across all gaps (i.e., no moderation by gap length effect). Study 3 presents a pilot 12-month follow-up study conducted with young people at the point of a major transition within or from child welfare services, comparing their characteristics with those of samples from the general population.
Taken together, the three studies provide a foundation for formalizing a list of risk and protective predictors of educational outcomes (namely, academic success, educational attainment, educational aspirations, and NEET status), originally selected from a systematic review that identified a range of factors associated with the educational outcomes of youth in care (O’Higgins, Sebba, & Gardner, 2014). Additionally, this dissertation presents a series of recommendations regarding the management and multiple imputation of missing data and the use of lag-as-moderator statistical methods in child welfare research.
|
224 |
Estudo de associação genômica ampla aplicada ao conteúdo de macronutrientes em grãos de Coffea arabica L.
Felicio, Mariane Silva January 2020
Advisor: Douglas Silva Domingues / Abstract: Coffee is one of the most traded tropical agricultural commodities in the world. Coffea arabica, the main species used for commercial coffee production, is originally from Ethiopia. It is the only allotetraploid species in the Coffea genus (2n = 4x = 44) and reproduces predominantly by self-fertilization. Commercial cultivars of C. arabica have a narrow genetic base, which indicates the need to introgress new alleles from germplasm into coffee breeding programs. Wild accessions of C. arabica from Ethiopia, the species' center of origin, have higher genetic diversity than commercial cultivars and can be used to identify new alleles. The macronutrient composition of coffee grains has a direct impact on grain quality. However, the molecular basis of the mineral composition of coffee grains is still poorly understood. Thus, the aim of this work was to perform association mapping using a genome-wide association study (GWAS) to identify single nucleotide polymorphisms (SNPs) possibly associated with macronutrient content in C. arabica grains. We also compared three genotype imputation methods (haplotype missing allele imputation with Beagle, K-nearest neighbors, and Random Forest) on the genotypic data, which were mapped to two C. arabica reference genomes, from the cultivar Caturra red and the spontaneous dihaploid Et39. We used a panel of 110 C. arabica genotypes, comprising elite genotypes from the Instituto Agronômico do Paraná (IAPAR) coffee breeding program (3), commercial cultivars (11) and wild accessions (96). Analyses of the composition of five macronutrients (N, P, K, Ca and Mg) were carried out on coffee grains collected from 70 and 1... (Complete abstract: click electronic access below) / Doctorate
|
225 |
A Comparison of Techniques for Handling Missing Data in Longitudinal Studies
Bogdan, Alexander R 07 November 2016
Missing data are a common problem in virtually all epidemiological research, especially when conducting longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not at random. Generalized estimating equations were used to obtain estimates from complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations and predictive mean matching on Day 2, Day 13 and Day 14 of a standardized 28-day menstrual cycle. Estimates were compared against those obtained from analysis of the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations and predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the missing variable. As biomarkers become increasingly Normal, the amount of missing data tolerable while still obtaining accurate estimates may also increase when data are missing at random. 
Future studies are necessary to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models.
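The three missingness conditions named in this abstract can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the procedure used in the study: the function `impose_missing`, the median split, and the 1.5x / 0.5x weighting are hypothetical choices. Under MCAR every value is dropped with the same probability. Under MAR the probability depends on a fully observed covariate `x`. Under MNAR it depends on the value of `y` that is itself being removed.

```python
import random
import statistics
from typing import List, Optional


def impose_missing(y: List[float], x: List[float], mechanism: str,
                   rate: float, rng: random.Random) -> List[Optional[float]]:
    """Return a copy of y with some entries replaced by None."""
    if mechanism == "mcar":
        driver = None   # missingness ignores the data entirely
    elif mechanism == "mar":
        driver = x      # missingness depends on an observed covariate
    elif mechanism == "mnar":
        driver = y      # missingness depends on the unobserved value
    else:
        raise ValueError("mechanism must be 'mcar', 'mar' or 'mnar'")

    cut = statistics.median(driver) if driver is not None else None
    out: List[Optional[float]] = []
    for i, value in enumerate(y):
        if driver is None:
            p = rate
        elif driver[i] > cut:
            # Assumed weighting: values above the median drop more often.
            p = min(1.0, 1.5 * rate)
        else:
            p = 0.5 * rate
        out.append(None if rng.random() < p else value)
    return out


rng = random.Random(2016)
biomarker = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]  # skewed, non-Normal
covariate = [rng.gauss(0.0, 1.0) for _ in range(200)]
with_gaps = impose_missing(biomarker, covariate, "mar", 0.2, rng)
```

Repeating the call with increasing `rate` values and each mechanism would give the kind of grid of conditions the abstract describes, on which different imputation techniques could then be compared against the complete data.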
|
226 |
Statistical Inference for Multivariate Stochastic Differential Equations
Liu, Ge 15 November 2019
No description available.
|
227 |
Predicting Marital Dissolution Using Data from Both Spouses
Lu, Chao-Chin 16 December 2010
The present research studies marital dissolution using data from both spouses from the National Survey of Families and Households (NSFH) and uses multiple imputation to handle missing data. Role theory and four other approaches (social exchange theory, stake theory, the gender perspective and the heterogeneity perspective) are used to make a methodological argument for why data from both spouses are necessary to study marital stability. Five data sets are imputed, with 3,777 observations in each imputed data set. The main research findings are as follows. First, models of marital dissolution using data from both spouses fit significantly better than models using data from one spouse only; therefore, gathering perceptual data from both spouses is necessary to understand marital dissolution. Second, overall, the effects of most spousal discrepancies do not support the heterogeneity perspective. Third, the wife-only model fits significantly better than the husband-only model across different periods of marital duration, and the predictive power of wives' variables is more stable than that of husbands' variables. Therefore, if only individual-level data are available, researchers are encouraged to use wives' data rather than husbands' data. Fourth, in the models with data from both spouses, the predictive power of factors varies with marital duration and gender.
|
228 |
Return to Eden: An Examination of Personal Salvation in Martin Luther's Von der Freiheit eines Christenmenschen
White, Jordan P. 27 July 2012
No description available.
|
229 |
El injusto penal organizacional frente al injusto penal personal en el delito de criminalidad organizada dentro del ordenamiento jurídico peruano
Coronel Silva, Ruth Noemi January 2024
This research focused on determining how the combined application of organizational and personal wrongdoing (injusto) in organized crime would help attribute criminal liability to inactive members of criminal organizations. It therefore addressed normative inconsistencies regarding punishment in cases involving criminal organizations, where sanctions are applied at both the organizational and the personal level. The study also developed the concept of systemic subjective imputation, with the aim of applying organizational and personal wrongdoing jointly in organized-crime offenses, in order to prevent impunity for inactive members who contribute to the organization's criminal operation. Finally, a qualitative methodological approach was applied to achieve these objectives, allowing a deeper understanding of the complexities involved in the imputation of this type of crime.
|
230 |
Identifying Induced Bias in Machine Learning
Chowdhury Mohammad Rakin Haider (18414885) 22 April 2024
The last decade has witnessed an unprecedented rise in the application of machine learning in high-stakes automated decision-making systems such as hiring, policing, bail sentencing, medical screening, etc. The long-lasting impact of these intelligent systems on human life has drawn attention to their fairness implications. A majority of subsequent studies targeted the existing historically unfair decision labels in the training data as the primary source of bias and strived toward either removing them from the dataset (de-biasing) or avoiding learning discriminatory patterns from them during training. In this thesis, we show that label bias is not a necessary condition for unfair outcomes from a machine learning model. We develop theoretical and empirical evidence showing that biased model outcomes can be introduced by a range of different data properties and components of the machine learning development pipeline.
In this thesis, we first prove that machine learning models are expected to introduce bias even when the training data doesn’t include label bias. We use the proof-by-construction technique in our formal analysis. We demonstrate that machine learning models, trained to optimize for joint accuracy, introduce bias even when the underlying training data is free from label bias but might include other forms of disparity. We identify two data properties that lead to the introduction of bias in machine learning: the group-wise disparity in feature predictivity and the group-wise disparity in the rates of missing values. The experimental results suggest that a wide range of classifiers trained on synthetic or real-world datasets are prone to introducing bias under feature disparity and missing value disparity, independently of or in conjunction with label bias. We further analyze the trade-off between fairness and established techniques for improving the generalization of machine learning models, such as adversarial training, increasing model complexity, etc. We report that adversarial training sacrifices fairness to achieve robustness against noisy (typically adversarial) samples. We propose a fair re-weighted adversarial training method to improve the fairness of adversarially trained models while sacrificing minimal adversarial robustness. Finally, we observe that although increasing model complexity typically improves generalization accuracy, it doesn’t linearly improve the disparities in the prediction rates.
This thesis unveils a vital limitation of machine learning that has yet to receive significant attention in the FairML literature. Conventional FairML literature reduces the ML fairness task to something as simple as de-biasing or avoiding learning discriminatory patterns. However, the reality is far from that. Everything from deciding which features to collect to algorithmic choices such as optimizing robustness can act as a source of bias in model predictions. This calls for detailed investigations of the fairness implications of machine learning development practices. In addition, identifying sources of bias can facilitate pre-deployment fairness audits of machine-learning-driven automated decision-making systems.
|