Global ETD Search

81	Relational Outlier Detection: Techniques and Applications Lu, Yen-Cheng 10 June 2021 Outlier detection has attracted growing interest in recent years. Unlike typical outlier detection problems, relational outlier detection focuses on detecting abnormal patterns in datasets that contain relational implications within each data point. Furthermore, unlike traditional outlier detection, which focuses only on numerical data, modern outlier detection models must be able to handle data of various types and structures. Detecting relational outliers should consider (1) dependencies among different data types, (2) data types that are not continuous or do not have ordinal characteristics, such as binary, categorical, or multi-label data, and (3) special structures in the data. This thesis focuses on the development of relational outlier detection methods and real-world applications in datasets that contain non-numerical, mixed-type, and special structure data in three tasks, namely (1) outlier detection in mixed-type data, (2) categorical outlier detection in music genre data, and (3) outlier detection in categorized time series data. For the first task, existing solutions for mixed-type data mostly focus on computational efficiency, and their strategies are mostly heuristic-driven, lacking a statistical foundation. The proposed contributions of our work include: (1) constructing a novel unsupervised framework based on a robust generalized linear model (GLM), (2) developing a model that is capable of capturing large variances of outliers and dependencies among mixed-type observations, and designing an approach for approximating the analytically intractable Bayesian inference, and (3) conducting extensive experiments to validate effectiveness and efficiency. For the second task, we extended and applied the modeling strategy to a real-world problem. 
The existing solutions to the specific task are mostly supervised, and the traditional outlier detection methods only focus on detecting outliers by the data distributions, ignoring the input-output relation between the genres and the extracted features. The proposed contributions of our work for this task include: (1) Proposing an unsupervised outlier detection framework for music genre data, (2) Extending the GLM based model in the first task to handle categorical responses and developing an approach to approximate the analytically intractable Bayesian inference, and (3) Conducting experiments to demonstrate that the proposed method outperforms the benchmark methods. For the third task, we focused on improving the outlier detection performance in the second task by proposing a novel framework and expanded the research scope to general categorized time-series data. Existing studies have suggested a large number of methods for automatic time series classification. However, there is a lack of research focusing on detecting outliers from manually categorized time series. The proposed contributions of our work for this task include: (1) Proposing a novel semi-supervised robust outlier detection framework for categorized time-series datasets, (2) Further extending the new framework to an active learning system that takes user insights into account, and (3) Conducting a comprehensive set of experiments to demonstrate the performance of the proposed method in real-world applications. / Doctor of Philosophy / In recent years, outlier detection has been one of the most important topics in the data mining and machine learning research domain. Unlike typical outlier detection problems, relational outlier detection focuses on detecting abnormal patterns in datasets that contain relational implications within each data point. 
Detecting relational outliers should consider (1) dependencies among different data types, (2) data types that are not continuous or do not have ordinal characteristics, such as binary, categorical, or multi-label data, and (3) special structures in the data. This thesis focuses on the development of relational outlier detection methods and real-world applications in datasets that contain non-numerical, mixed-type, and special structure data in three tasks, namely (1) outlier detection in mixed-type data, (2) categorical outlier detection in music genre data, and (3) outlier detection in categorized time series data. The first task aims at constructing a novel unsupervised framework, developing a model that is capable of capturing the normal pattern and the effects, and designing an approach for model fitting. In the second task, we further extended and applied the modeling strategy to a real-world problem in the music technology domain. For the third task, we expanded the research scope from the previous task to general categorized time-series data, and focused on improving the outlier detection performance by proposing a novel semi-supervised framework. Relational Outlier Detection Generalized Linear Model Robust Estimation Music Genre Recognition Time Series Outlier Detection
82	A Consensus Model for Predicting the Distribution of the Threatened Plant Telephus Spurge (Euphorbia Telephioides) Bracken, Jason 02 December 2016 No description available. Conservation Biology Botany SDM species distribution model Telephus spurge Euphorbia telephioides GLM generalized linear model
83	A Monte Carlo Study of Power Analysis of Hierarchical Linear Model and Repeated Measures Approaches to Longitudinal Data Analysis Fang, Hua 03 October 2006 No description available. Statistics Hierarchical Linear Model Repeated Measures Mixed Model Longitudinal Study Power Analysis Monte Carlo
84	Sustainable Design and Operation of the Cement Industry Avetisyan, Hakob G. 19 December 2008 No description available. Civil Engineering Cement Production Economic-Mathematical Model Life Cycle Cost Emission Constraints Optimal Production Linear Model
85	Feasible Generalized Least Squares: theory and applications González Coya Sandoval, Emilio 04 June 2024 We study the Feasible Generalized Least-Squares (FGLS) estimation of the parameters of a linear regression model in which the errors are allowed to exhibit heteroskedasticity of unknown form and to be serially correlated. The main contribution is twofold: first, we aim to demystify the reasons often advanced for using OLS instead of FGLS by showing that the latter estimate is robust, more efficient, and more precise. Second, we devise consistent FGLS procedures, robust to misspecification, which achieve a lower mean squared error (MSE), often close to that of the correctly specified infeasible GLS. In the first chapter we restrict our attention to the case with independent heteroskedastic errors. We suggest a Lasso-based procedure to estimate the skedastic function of the residuals. This estimate is then used to construct an FGLS estimator. Using extensive Monte Carlo simulations, we show that this Lasso-based FGLS procedure has better finite sample properties than OLS and other linear regression-based FGLS estimates. Moreover, the FGLS-Lasso estimate is robust to misspecification of both the functional form and the variables characterizing the skedastic function. The second chapter generalizes our investigation to the case with serially correlated errors. There are three main contributions: first, we show that GLS is consistent requiring only pre-determined regressors, whereas OLS requires exogenous regressors to be consistent. The second contribution is to show that GLS is much more robust than OLS; even a misspecified GLS correction can achieve a lower MSE than OLS. The third contribution is to devise an FGLS procedure valid whether or not the regressors are exogenous, which achieves an MSE close to that of the correctly specified infeasible GLS. 
Extensive Monte Carlo experiments are conducted to assess the performance of our FGLS procedure against OLS in finite samples. FGLS achieves important reductions in MSE and variance relative to OLS. In the third chapter we consider an empirical application; we re-examine the Uncovered Interest Parity (UIP) hypothesis, which states that the expected rate of return to speculation in the forward foreign exchange market is zero. We extend the FGLS procedure to a setting in which lagged dependent variables are included as regressors. We thus provide a consistent and efficient framework to estimate the parameters of a general k-step-ahead linear forecasting equation. Finally, we apply our FGLS procedures to the analysis of the two main specifications to test the UIP. Economics Confidence intervals Feasible Generalized Least-Squares Linear model Mean-squared error Non-parametric methods
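Illustrative note on entry 85: the two-step FGLS idea for independent heteroskedastic errors can be sketched roughly as follows. This is not the author's Lasso-based procedure. As a simplifying assumption, the skedastic function is estimated here by a plain log-linear least-squares regression of the squared OLS residuals on the regressors, and every name (`ols`, `fgls_heteroskedastic`, the simulated data and its coefficients) is hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fgls_heteroskedastic(X, y):
    """Two-step FGLS with a log-linear variance model (an assumption)."""
    # Step 1: OLS fit and its residuals.
    beta_ols = ols(X, y)
    resid = y - X @ beta_ols
    # Step 2: model log(resid^2) as linear in X to estimate the variance.
    gamma = ols(X, np.log(resid ** 2 + 1e-12))
    var_hat = np.exp(X @ gamma)
    # Step 3: weighted least squares with weights 1 / sd_hat.
    w = 1.0 / np.sqrt(var_hat)
    beta_fgls = ols(X * w[:, None], y * w)
    return beta_ols, beta_fgls

# Simulated data: error standard deviation grows with x (assumed design).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.exp(0.5 * x) * rng.normal(size=n)

beta_ols, beta_fgls = fgls_heteroskedastic(X, y)
print("OLS :", beta_ols)
print("FGLS:", beta_fgls)
```

Any constant bias in the estimated variance cancels in the weighted regression, which is why the log-linear fit needs no bias correction for this purpose.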
86	Semiparametric Methods for the Generalized Linear Model Chen, Jinsong 01 July 2010 The generalized linear model (GLM) is a popular model in many research areas. In the GLM, each outcome of the dependent variable is assumed to be generated from a particular distribution function in the exponential family. The mean of the distribution depends on the independent variables. The link function provides the relationship between the linear predictor and the mean of the distribution function. In this dissertation, two semiparametric extensions of the GLM will be developed. In the first part of this dissertation, we have proposed a new model, called a semiparametric generalized linear model with a log-concave random component (SGLM-L). In this model, the estimate of the distribution of the random component has a nonparametric form while the estimate of the systematic part has a parametric form. In the second part of this dissertation, we have proposed a model, called a generalized semiparametric single-index mixed model (GSSIMM). A nonparametric component with a single index is incorporated into the mean function in the generalized linear mixed model (GLMM) assuming that the random component is following a parametric distribution. In the first part of this dissertation, since most of the literature on the GLM deals with the parametric random component, we relax the parametric distribution assumption for the random component of the GLM and impose a log-concave constraint on the distribution. An iterative numerical algorithm for computing the estimators in the SGLM-L is developed. We construct a log-likelihood ratio test for inference. 
In the second part of this dissertation, we use a single index model to generalize the GLMM to have a linear combination of covariates enter the model via a nonparametric mean function, because the linear model in the GLMM is not complex enough to capture the underlying relationship between the response and its associated covariates. The marginal likelihood is approximated using the Laplace method. A penalized quasi-likelihood approach is proposed to estimate the nonparametric function and parameters including single-index coefficients in the GSSIMM. We estimate variance components using marginal quasi-likelihood. Asymptotic properties of the estimators are developed using an idea similar to that of Yu (2008). A simulation example is carried out to compare the performance of the GSSIMM with that of the GLMM. We demonstrate the advantage of our approach using a study of the association between daily air pollutants and daily mortality adjusted for temperature and wind speed in various counties of North Carolina. / Ph. D. Penalized splines Generalized linear mixed model Generalized linear model Single-Index Model
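Illustrative note on entry 86: the ordinary parametric GLM that the dissertation takes as its starting point (exponential-family response, linear predictor, link function) can be sketched as below. This is only the standard baseline, not the SGLM-L or GSSIMM models. The choice of a Poisson response with log link, the fitting routine `fit_poisson_glm`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta               # linear predictor
        mu = np.exp(eta)             # inverse log link gives the mean
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # IRLS weights equal mu for Poisson/log
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated data with assumed true coefficients (0.5, 1.0).
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 1.0 * x))

beta_hat = fit_poisson_glm(X, y)
print("estimated coefficients:", beta_hat)
```

The link function is the only place where the mean and the linear predictor meet, which is the component the semiparametric extensions in the abstract relax or enrich.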
87	The Impact of the Internet on Civic and Political Participation in Local Governance: A Multilevel Model for Bridging Individual and Group Levels of Analysis Kim, Byoung Joon 18 February 2009 Politically interested individual citizens often use information and communication technology (ICT) to facilitate and augment their civic and political participation. At the local level, ICT plays an important role for communication and information sharing in order for local groups to create awareness and draw citizens into public deliberation about local issues and concerns. This research examines the interplay of individual and local group level factors in order to better understand the relationship between civic engagement and ICT, especially the internet, by using household survey data from the town of Blacksburg, Virginia and environs in 2005 and 2006. It seeks to reconcile those different levels of analysis relating to the use and impact of the internet on civic engagement in local governance. This study identifies the distinctive influences at both the individual citizen level and the group level by applying a multilevel statistical model (the Hierarchical Linear Model). First, this study found the effects of internal and external political efficacy and community collective efficacy as significant individual level influences on internet use for civic and political purposes. Second, group internet use, which includes new internet technologies, and group political discussion were revealed as key influences on citizens' perspectives on the helpfulness of the internet for civic and political purposes at the group level of analysis. 
Finally, in multilevel analysis, those recognized group level variables (group internet use and group political discussion and interests) led to positive agreement with the following statements: 1) the internet has helped me feel more connected with people like myself in the local area; 2) the internet has helped me feel more connected with a diversity of people in the local area; and 3) the internet has helped me become more involved in local issues that interest me when taking individual level variables into account. / Ph. D. E-democracy E-government Hierarchical Linear Model Local Community Group Internet Local Governance Civic Engagement
88	[en] USING LINEAR MIXED MODELS ON DATA FROM EXPERIMENTS WITH RESTRICTION IN RANDOMIZATION / [pt] UTILIZAÇÃO DE MODELOS LINEARES MISTOS EM DADOS PROVENIENTES DE EXPERIMENTOS COM RESTRIÇÃO NA ALEATORIZAÇÃO MARCELA COHEN MARTELOTTE 04 October 2010 [en] This dissertation presents an application of linear mixed models to data from an experiment with restricted randomization. The experiment aimed to determine which control factors of the cold-rolling process most affected the thickness of the material used to manufacture cans for carbonated beverages. 
From the experiment, data were obtained to model the mean and variance of the thickness of the material. The goal of the modeling was to identify which factors caused the mean thickness to reach the desired value (0.248 mm). Furthermore, it was necessary to identify which combination of levels of these factors produced the minimum variance in the thickness of the material. There were replications in this experiment, but they were not performed in random order. In addition, the factor levels were not reset between runs of the experiment. Due to these restrictions, mixed models were used to fit the mean and the variance of the thickness, since such models can accommodate autocorrelated and heteroskedastic data. The models showed a good fit to the data, indicating that, where randomization is restricted, the use of mixed models is appropriate. [en] PROCESS [en] EXPERIMENTS [en] RANDOMIZATION [en] LINEAR MODEL
89	Sea turtle bycatch by the U.S. Atlantic pelagic longline fishery: A simulation modeling analysis of estimation methods Barlow, Paige Fithian 01 September 2009 The U.S. pelagic longline fishery catches 98% of domestic swordfish landings but is also one of the three fisheries most affecting federally protected sea turtles (Crowder and Myers 2001, Witherington et al. 2009). Bycatch by fisheries is considered the main anthropogenic threat to sea turtles (NRC 1990). Accurate and precise bycatch estimates are imperative for sea turtle conservation and appropriate fishery management. However, estimation is complicated by only 8% observer coverage of fishing and data that are hierarchical in structure (i.e., multiple sets per trip), zero-heavy (i.e., bycatch is rare), and often overdispersed (i.e., larger variance than expected). Therefore, I evaluated two predominant bycatch estimation methods, the delta-lognormal method and generalized linear models, and investigated improvements in uncertainty incorporation. I constructed a simulation model to evaluate bycatch estimation at two spatial scales under ten spatial models of sea turtle, fishing set, and observer distributions. Results indicated that distributing observers relative to fishing effort and using the delta-lognormal-strata method was most appropriate. The delta-lognormal-strata 95% confidence interval (CI) was wider than statistically appropriate. The delta-lognormal-all sets pooled 95% CI was narrower, but simulated bycatch was above the CI too frequently. Thus, I developed a bycatch estimate risk distribution to incorporate uncertainty in bycatch estimates. It gives managers access to the entire distribution of bycatch estimates and their choice of any risk level. Results support the management agency's observer distribution and estimation method but suggest a new procedure to incorporate uncertainty. This study is also informative for many similar datasets. 
/ Master of Science pelagic longline fishery sea turtle simulation model bycatch delta-lognormal estimation generalized linear model
90	Modelo linear beta Weibull generalizado: propriedades, estimação, diagnóstico e aplicações / Generalized beta Weibull linear model: properties, estimation, diagnostics and applications Santana, Tiago Viana Flor de 05 October 2016 In this work two new statistical regression models are proposed, with a structure very similar to that of Generalized Linear Models (GLM) but assuming the exponentiated Weibull (EW) and beta Weibull (BW) distributions for the random component, which do not belong to the exponential family as required in GLM. The new models bring a new approach to the distributions admitted in regression models and extend the GLM beyond the exponential family. The models, named the Generalized Exponentiated Weibull Linear Model (GEWLM) and the Generalized Beta Weibull Linear Model (GBWLM), have as a particular case the Exponential model, which belongs to the GLM family, as well as other models that GLMs do not include, for example Weibull, EW, and Exponentiated Exponential (EE), among others. Besides the constant failure rate function (frf) of the Exponential distribution, the new models also fit monotone and non-monotone forms of the frf. When a logarithmic link function is adopted, one obtains the same location-scale model widely used in survival analysis, without the need to transform the response variable, which simplifies the modeling and gives a clearer understanding of the influence of the covariates on the response. A method for studying influential observations was built on the local influence methodology under three perturbation schemes: perturbation of the likelihood, of the response variable, and of the covariates. Residual analysis was proposed based on the quantile function. Finally, two real data sets are used to illustrate the applicability of the proposed models, and the results are discussed. Generalized Beta Weibull linear model Generalized linear model Local influence Quantile function Residual analysis
