81

Time-varying linear prediction as a base for an isolated-word recognition algorithm

McMillan, David Evans 17 November 2012 (has links)
There is a vast amount of research being done in the area of voice recognition. A large portion of this research concentrates on developing algorithms that will yield higher accuracy rates, such as algorithms based on dynamic time warping, vector quantization, and other mathematical methods [12][21][15]. In this research, the feasibility of using linear prediction (LP) with time-varying parameters as the basis for a voice recognition algorithm is evaluated. First, the development of an anti-aliasing filter is discussed, with some results from the filter hardware realization included. Then a brief discussion of LP is presented and a method for time-varying LP is derived from this discussion. A comparison between time-varying and segmentation LP is made, and a description of the developed algorithm that tests time-varying LP as a recognition technique is given. The evaluation is conducted with the developed algorithm configured for speaker-dependent and speaker-independent isolated-word recognition. The conclusion drawn from this research is that this particular technique is very feasible as the base for a voice recognition algorithm. With the incorporation of other techniques, a complete algorithm can conceivably be developed that will yield very high accuracy rates. Recommendations for algorithm improvements are given, along with other techniques that might be added to make a complete recognition algorithm. / Master of Science
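The core computation in any LP-based recognizer is estimating, for each short frame of speech, the coefficients of an all-pole filter that best predicts each sample from its predecessors. Below is a minimal Python sketch of the autocorrelation method with the Levinson-Durbin recursion; the sampling rate, frame length, hop size, and model order are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def lp_coefficients(frame, order=10):
    # Autocorrelation method: r[k] = sum_n frame[n] * frame[n + k]
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)            # prediction-error filter, a[0] = 1
    a[0] = 1.0
    err = r[0]                         # prediction-error energy
    for i in range(1, order + 1):      # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Time-varying analysis: slide a short window along the word so the
# coefficients can track the changing vocal tract.
fs = 8000
word = np.random.randn(fs)             # stand-in for one digitized isolated word
frame_len, hop = 240, 80               # 30 ms frames, 10 ms hop at 8 kHz
win = np.hamming(frame_len)
trajectory = [lp_coefficients(word[s:s + frame_len] * win)[0]
              for s in range(0, len(word) - frame_len + 1, hop)]
```

The resulting coefficient trajectory is the kind of time-varying LP representation a recognizer would compare across stored word templates.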
82

Ill-conditioned information matrices and the generalized linear model: an asymptotically biased estimation approach

Marx, Brian D. January 1988 (has links)
In the regression framework of the generalized linear model (Nelder and Wedderburn (1972)), iterative maximum likelihood parameter estimation is employed via the method of scoring. This iterative procedure involves a key matrix, the information matrix. Ill-conditioning of the information matrix can make many desirable properties of the parameter estimates unattainable. Some asymptotically biased alternatives to maximum likelihood estimation are put forth which alleviate the detrimental effects of near-singular information. Notions of ridge estimation (Hoerl and Kennard (1970a) and Schaefer (1979)), principal component estimation (Webster et al. (1974) and Schaefer (1986)), and Stein estimation (Stein (1960)) are extended into a regression setting utilizing any one of an entire class of response distributions. / Ph. D.
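The ridge-type idea can be made concrete by adding a penalty λI to the information matrix inside the scoring iteration. The Python sketch below does this for a log-link Poisson GLM; the Poisson/log choice and the fixed λ are illustrative assumptions of the sketch, not the estimators developed in the dissertation.

```python
import numpy as np

def ridge_scoring_poisson(X, y, lam=1.0, iters=25):
    # Penalized Fisher scoring for a log-link Poisson GLM:
    #   beta <- beta + (X'WX + lam*I)^(-1) (X'(y - mu) - lam*beta),
    # where W = diag(mu). The lam*I term stabilizes a near-singular
    # information matrix X'WX at the cost of some (asymptotic) bias.
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(iters):
        mu = np.exp(X @ beta)                         # inverse log link
        info = X.T @ (mu[:, None] * X) + lam * np.eye(p)
        score = X.T @ (y - mu) - lam * beta
        beta = beta + np.linalg.solve(info, score)
    return beta
```

With lam=0 this reduces to ordinary Fisher scoring, where collinear columns of X make the solve step numerically unstable; the penalty trades that instability for shrinkage bias.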
83

Sequential robust response surface strategy

DeFeo, Patrick A. January 1988 (has links)
General Response Surface Methodology involves the exploration of some response variable which is a function of other controllable variables. Many criteria exist for selecting an experimental design for the controllable variables. A good choice of design is one that may not be optimal in a single sense, but rather near optimal with respect to several criteria. This robust approach lends itself well to strategies that involve sequential or two-stage experimental designs. An experimenter who fits a first-order regression model for the response often fears the presence of curvature in the system. Experimental designs can be chosen such that the experimenter who fits a first-order model will have a high degree of protection against potential model bias from the presence of curvature. In addition, designs can also be selected such that the experimenter will have a high chance of detecting curvature in the system; a lack-of-fit test is usually performed for this purpose. Ideally, an experimenter desires good detection capabilities along with good protection capabilities. An experimental design criterion that incorporates both detection and protection capabilities is the A₂* criterion. This criterion is used to select the designs which maximize the average noncentrality parameter of the lack-of-fit test among designs with a fixed bias. The first-order rotated design class is a new class of designs that offers an improvement in terms of the A₂* criterion over standard first-order factorial designs. In conjunction with a sequential experimental strategy, a class of second-order rotated designs is easily constructed by augmenting the first-order rotated designs. These designs allow for estimation of second-order model terms when a significant lack of fit is observed. Two other closely related design criteria that incorporate both detection and protection capabilities are the J<sub>PCA</sub> and J<sub>PCMAX</sub> criteria. J<sub>PCA</sub> considers the average mean squared error of prediction for a first-order model over a region where the detection capabilities of the lack-of-fit test are not strong. J<sub>PCMAX</sub> considers the maximum mean squared error of prediction over the region where the detection capabilities are not strong. The J<sub>PCA</sub> and J<sub>PCMAX</sub> criteria are used within a sequential strategy to select first-order experimental designs that perform well in terms of the mean squared error of prediction when it is likely that a first-order model will be employed. These two criteria are also adopted for nonsequential experiments for the evaluation of first-order model prediction performance. For these nonsequential experiments, second-order designs are used and constructed based upon J<sub>PCA</sub> and J<sub>PCMAX</sub> for first-order model properties and D₂-efficiency and D-efficiency for second-order model properties. / Ph. D.
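The detection side of this trade-off rests on the lack-of-fit test for curvature. As a concrete point of reference, the sketch below runs the standard single-degree-of-freedom curvature check for a 2² factorial augmented with center points; the response values are hypothetical, and this is the textbook test rather than the A₂*-optimal designs developed in the dissertation.

```python
import numpy as np
from scipy import stats

# Hypothetical 2^2 factorial responses plus replicated center points
y_fact = np.array([54.3, 60.1, 57.8, 63.2])   # runs at the (+/-1, +/-1) corners
y_ctr = np.array([58.9, 59.4, 58.7, 59.1])    # replicates at the design center

nf, nc = len(y_fact), len(y_ctr)
diff = y_fact.mean() - y_ctr.mean()            # nonzero under pure quadratic curvature
s2 = y_ctr.var(ddof=1)                         # pure-error variance from center runs
t = diff / np.sqrt(s2 * (1 / nf + 1 / nc))
p = 2 * stats.t.sf(abs(t), df=nc - 1)
print(f"curvature t = {t:.2f}, p = {p:.3f}")
```

A design with good detection capability makes the noncentrality of this statistic large when curvature is present, which is exactly what the A₂* criterion maximizes on average.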
84

Unbiased Estimation for the Contextual Effect of Duration of Adolescent Height Growth on Adulthood Obesity and Health Outcomes via Hierarchical Linear and Nonlinear Models

Carrico, Robert 22 May 2012 (has links)
This dissertation has multiple aims in studying hierarchical linear models in biomedical data analysis. In Chapter 1, the novel idea of studying the durations of adolescent growth spurts as a predictor of adulthood obesity is defined, established, and illustrated. The concept of contextual effects modeling is introduced in this first section as we study the secular trend of adulthood obesity and how this trend is mitigated by the durations of individual adolescent growth spurts and the secular average length of adolescent growth spurts. It is found that individuals with longer periods of fast height growth in adolescence are more prone to having favorable BMI profiles in adulthood. In Chapter 2 we study the estimation of contextual effects in a hierarchical generalized linear model (HGLM). We simulate data and compare using the higher-level group sample mean as the estimate for the true mean versus using an Empirical Bayes (EB) approach (Shin and Raudenbush 2010). We study this comparison for logistic, probit, log-linear, ordinal, and nominal regression models. We find that in general the EB approach yields a parameter estimate much closer to the true value, except in cases with very small variability in the upper level, where the situation is more complicated and there is likely no need for contextual effects analysis. In Chapter 3 the HGLM studies are made clearer with large-scale simulations, shown for logistic and probit regression models with binary outcome data. With repetition we are able to establish coverage percentages of the confidence intervals for the true contextual effect, i.e., the percentage of simulations whose confidence intervals contain the true parameter value. Results confirm observations from the preliminary simulations in the previous chapter, and an accompanying example of adulthood hypertension shows how these results can be used in an application.
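The sample-mean-versus-EB comparison can be illustrated with a simplified Gaussian analogue of the simulation: the EB estimate shrinks each group's sample mean toward the grand mean by a factor determined by the variance components. All numbers below are hypothetical, and the dissertation's actual comparison is for generalized (logistic, probit, etc.) models rather than this Gaussian toy.

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 50, 8                       # hypothetical: 50 groups, 8 members each
tau2, sigma2 = 1.0, 4.0            # between- and within-group variances
mu_j = rng.normal(0, np.sqrt(tau2), J)           # true group means
x = mu_j[:, None] + rng.normal(0, np.sqrt(sigma2), (J, n))

xbar = x.mean(axis=1)                            # raw group sample means
shrink = tau2 / (tau2 + sigma2 / n)              # EB shrinkage factor
eb = shrink * xbar + (1 - shrink) * xbar.mean()  # shrink toward grand mean

print("MSE, sample mean:", np.mean((xbar - mu_j) ** 2))
print("MSE, EB estimate:", np.mean((eb - mu_j) ** 2))
```

The EB estimate typically shows the smaller mean squared error, mirroring the chapter's finding; when tau2 is very small the shrinkage factor approaches zero and the group-level signal largely disappears, which is the complicated boundary case noted above.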
85

Modelos não lineares e lineares generalizados para avaliação da germinação de sementes de milho e soja / Non-linear and linear generalized models for evaluation of the germination of corn and soybean seeds

Amorim, Deoclecio Jardim 24 January 2019 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / Among the most studied characteristics in the seed industry and in germplasm banks is physiological potential, since seeds of higher physiological quality allow quick and uniform emergence of seedlings and, consequently, stand establishment. The objective of this research was to evaluate the germination of corn (Zea mays L.) and soybean (Glycine max (L.) Merrill) seeds using nonlinear and generalized linear models. The corn cultivars used were AS 1633 PRO3, 2B587 RR, 2A401PW, AL Bandeirante, and BRS 4103, and the soybean cultivars were DS59716 IPRO, CD2737 RR, CD251 RR, CD2820 IPRO, and CD2857 RR, all from the 2016/17 crop. The germination of 20 seeds with four replicates per cultivar was evaluated by the primary root emission (protrusion) test. Germinated seeds were counted at regular intervals of 6, 12, and 24 hours up to 204 hours, with protrusion of the primary root ≥ 2 mm as the germination criterion. The data were expressed as the percentage accumulated over time and as the proportion of viable seeds in each tested time interval, given by a sequence of Bernoulli trials. The cumulative percentage data were modeled by the nonlinear Gompertz curve and the four-parameter Hill function, and the proportion data were evaluated by generalized linear models testing the probit, logit, and complementary log-log link functions. The corn cultivars with the highest germination speed were AL Bandeirante and BRS 4103; for soybean, the best results were observed for cultivars CD251 RR and CD2737 RR. The methodologies agreed on the classification of the physiological quality of the cultivars. The Gompertz curve provided the best fit and allowed practical applications for the study of germination, establishing a new parameter for comparing different seed lots. Generalized linear models constitute a robust methodology for evaluating seed germination across different lots and agricultural species, allowing estimation of any germination time and of uniformity.
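Fitting the Gompertz curve to cumulative germination percentages is a small nonlinear least squares problem. The sketch below uses scipy's curve_fit on hypothetical counts out of 20 seeds; the Hill-function and GLM (probit/logit/complementary log-log) fits from the study would follow the same pattern with different model functions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, b, k):
    # A: asymptotic germination (%), b: displacement, k: rate per hour
    return A * np.exp(-b * np.exp(-k * t))

t = np.array([24, 48, 72, 96, 120, 144, 168, 192])   # hours (hypothetical)
counts = np.array([0, 3, 9, 14, 16, 17, 18, 18])     # germinated of 20 seeds
pct = counts / 20 * 100

params, _ = curve_fit(gompertz, t, pct, p0=[90.0, 8.0, 0.05])
A, b, k = params
print(f"asymptote = {A:.1f}%, displacement = {b:.2f}, rate = {k:.4f}/h")
```

From the fitted parameters one can read off summary quantities for comparing lots, such as the time to reach half of the asymptotic germination percentage.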
86

Discrepancy-based algorithms for best-subset model selection

Zhang, Tao 01 May 2013 (has links)
The selection of a best-subset regression model from a candidate family is a common problem that arises in many analyses. In best-subset model selection, we consider all possible subsets of regressor variables; thus, numerous candidate models may need to be fit and compared. One of the main challenges of best-subset selection arises from the size of the candidate model family: specifically, the probability of selecting an inappropriate model generally increases as the size of the family increases. For this reason, it is usually difficult to select an optimal model when best-subset selection is attempted based on a moderate to large number of regressor variables. Model selection criteria are often constructed to estimate discrepancy measures used to assess the disparity between each fitted candidate model and the generating model. The Akaike information criterion (AIC) and the corrected AIC (AICc) are designed to estimate the expected Kullback-Leibler (K-L) discrepancy. For best-subset selection, both AIC and AICc are negatively biased, and the use of either criterion will lead to overfitted models. To correct for this bias, we introduce a criterion AICi, which has a penalty term evaluated from Monte Carlo simulation. A multistage model selection procedure AICaps, which utilizes AICi, is proposed for best-subset selection. In the framework of linear regression models, the Gauss discrepancy is another frequently applied measure of proximity between a fitted candidate model and the generating model. Mallows' conceptual predictive statistic (Cp) and the modified Cp (MCp) are designed to estimate the expected Gauss discrepancy. For best-subset selection, Cp and MCp exhibit negative estimation bias. To correct for this bias, we propose a criterion CPSi that again employs a penalty term evaluated from Monte Carlo simulation. We further devise a multistage procedure, CPSaps, which selectively utilizes CPSi. In this thesis, we consider best-subset selection in two different modeling frameworks: linear models and generalized linear models. Extensive simulation studies are compiled to compare the selection behavior of our methods and other traditional model selection criteria. We also apply our methods to a model selection problem in a study of bipolar disorder.
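For context on the criteria being corrected, here is what plain best-subset search under AICc looks like for a Gaussian linear model; the thesis's AICi and CPSi replace the closed-form penalty below with one evaluated by Monte Carlo simulation. The formula shown is the standard linear-regression AICc, stated here as an assumption of the sketch.

```python
import numpy as np
from itertools import combinations

def aicc_best_subset(X, y):
    # Exhaustive best-subset search under AICc for a Gaussian linear model.
    # AICc = n*log(RSS/n) + 2k + 2k(k+1)/(n-k-1), with k counting the
    # regression coefficients plus the error variance. Requires n > k + 1
    # and is feasible only for a moderate number of regressors (2^p fits).
    n, p = X.shape
    best_crit, best_subset = np.inf, ()
    for size in range(p + 1):
        for S in combinations(range(p), size):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            k = Xs.shape[1] + 1
            crit = n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
            if crit < best_crit:
                best_crit, best_subset = crit, S
    return best_crit, best_subset

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=60)
print(aicc_best_subset(X, y))   # typically recovers the subset (0, 2)
```

The negative bias discussed in the abstract arises because the same minimization over many candidate subsets that picks the winner also makes its criterion value an optimistic estimate of the underlying discrepancy.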
87

Modelos lineares parciais aditivos generalizados com suavização por meio de P-splines / Generalized additive partial linear models with P-splines smoothing

Holanda, Amanda Amorim 03 May 2018 (has links)
In this work we present generalized partial linear models with one continuous explanatory variable treated nonparametrically, and generalized additive partial linear models with at least two continuous explanatory variables treated in that way. P-splines are used to describe the relationship between the response and the continuous explanatory variables. The penalized likelihood functions, penalized score functions, and penalized Fisher information matrices are derived to obtain the penalized maximum likelihood estimators by combining the backfitting (Gauss-Seidel) algorithm with the Fisher scoring iterative method for the two types of model. In addition, we present procedures for estimating the smoothing parameter as well as the effective degrees of freedom. Finally, for the purpose of illustration, the proposed models are fitted to real data sets.
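The building block here is the P-spline: a rich B-spline basis whose coefficients are penalized through a discrete difference matrix. Below is a minimal Gaussian-response sketch (it needs scipy >= 1.8 for BSpline.design_matrix); the knot count, penalty order, and fixed λ are illustrative assumptions, and the thesis embeds this penalty inside the backfitting/Fisher-scoring loop rather than plain least squares.

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_inner=20, degree=3, lam=10.0, diff_order=2):
    # P-spline (Eilers & Marx style): many B-splines plus a difference
    # penalty on adjacent coefficients, fitted by penalized least squares.
    lo, hi = x.min(), x.max()
    knots = np.r_[[lo] * degree, np.linspace(lo, hi, n_inner), [hi] * degree]
    B = BSpline.design_matrix(x, knots, degree).toarray()
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)  # difference matrix
    alpha = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ alpha

x = np.sort(np.random.default_rng(0).uniform(0, 10, 200))
y = np.sin(x) + np.random.default_rng(1).normal(0, 0.3, 200)
smooth = pspline_fit(x, y)       # fitted curve at the observed x values
```

Larger λ gives a smoother fit; in the models above, λ is chosen by the estimation procedures for the smoothing parameter, and the effective degrees of freedom follow from the trace of the resulting smoother matrix.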
89

Supervised Learning of Piecewise Linear Models

Manwani, Naresh January 2012 (has links) (PDF)
Supervised learning of piecewise linear models is a well-studied problem in the machine learning community. The key idea in piecewise linear modeling is to properly partition the input space and learn a linear model for every partition. Decision trees and regression trees are classic examples of piecewise linear models for classification and regression problems. The existing approaches for learning decision/regression trees can be broadly classified into two classes: fixed-structure approaches and greedy approaches. In fixed-structure approaches, the tree structure is fixed beforehand by fixing the number of non-leaf nodes, the height of the tree, and the paths from the root node to every leaf node. Mixture of experts and hierarchical mixture of experts are examples of fixed-structure approaches for learning piecewise linear models. Parameters of the models are found using, e.g., maximum likelihood estimation, for which the expectation maximization (EM) algorithm can be used. Fixed-structure piecewise linear models can also be learnt using risk minimization under an appropriate loss function. Learning an optimal decision tree using a fixed-structure approach is a hard problem; constructing an optimal binary decision tree is known to be NP-complete. Greedy approaches, on the other hand, do not assume any parametric form or fixed structure for the decision tree classifier. Most greedy approaches learn tree-structured piecewise linear models in a top-down fashion, built by binary or multi-way recursive partitioning of the input space. The main issue in top-down decision tree induction is to choose an appropriate objective function, one that is easy to optimize, to rate the split rules. Top-down decision trees are easy to implement and understand, but there are no optimality guarantees due to their greedy nature. Regression trees are built in the same way as decision trees, with every leaf node associated with a linear regression function. All piecewise linear modeling techniques deal with two main tasks: partitioning the input space and learning a linear model for every partition. These are not independent problems, and simultaneous optimal estimation of the partitions and of the linear models for every partition is a combinatorial, and hence computationally hard, problem. However, piecewise linear models provide better insight into the classification or regression problem by giving an explicit representation of the structure in the data. The information captured by piecewise linear models can be summarized in terms of simple rules, so that they can be used to analyze the properties of the domain from which the data originate. These properties make piecewise linear models, like decision trees and regression trees, extremely useful in many data mining applications and place them among the top data mining algorithms. In this thesis, we address the problem of supervised learning of piecewise linear models for classification and regression. We propose novel algorithms for learning piecewise linear classifiers and regression functions. We also address the problem of noise-tolerant learning of classifiers in the presence of label noise. We propose a novel algorithm for learning polyhedral classifiers, which are the simplest form of piecewise linear classifiers.
Polyhedral classifiers are useful when points of the positive class fall inside a convex region and all negative-class points are distributed outside it; the positive-class region can then be well approximated by a simple polyhedral set. The key challenge in optimally learning a fixed-structure polyhedral classifier is to identify the subproblems, each of which is a linear classification problem. This is hard: identifying polyhedral separability is known to be NP-complete. The goal of any polyhedral learning algorithm is to handle the underlying combinatorial problem efficiently while achieving good classification accuracy. Existing methods for learning a fixed-structure polyhedral classifier are based on solving non-convex constrained optimization problems; these approaches do not handle the combinatorial aspect of the problem efficiently and are computationally expensive. We propose a method of model-based estimation of the posterior class probability to learn polyhedral classifiers. We solve an unconstrained optimization problem using a simple two-step algorithm (similar to the EM algorithm) to find the model parameters. To the best of our knowledge, this is the first attempt to formulate an unconstrained optimization problem for learning polyhedral classifiers. We then modify our algorithm to also find the number of required hyperplanes automatically. We show experimentally that our approach is better than existing polyhedral learning algorithms in terms of training time, performance, and complexity. Often, class conditional densities are multimodal; each class region may then be represented as a union of polyhedral regions, and a single polyhedral classifier is not sufficient. To handle such situations, a generic decision tree is required. Learning an optimal fixed-structure decision tree is computationally hard, while top-down decision trees have no optimality guarantees due to their greedy nature; nevertheless, top-down approaches are widely used because they are versatile and easy to implement. Most existing top-down decision tree algorithms (CART, OC1, C4.5, etc.) use impurity measures to assess the goodness of hyperplanes at each node of the tree, and these measures do not properly capture the geometric structure in the data. We propose a novel decision tree algorithm that, at each node, selects hyperplanes based on an objective function which takes the geometric structure of the class regions into consideration. The resulting optimization problem turns out to be a generalized eigenvalue problem and hence is solved efficiently. We show through empirical studies that our approach leads to smaller trees and better performance than other top-down decision tree approaches, and we provide some theoretical justification for the proposed method of learning decision trees. Piecewise linear regression is similar to the corresponding classification problem: in regression trees, for example, each leaf node is associated with a linear regression model, so the problem is once again that of (simultaneous) estimation of optimal partitions and a linear model for each partition. Regression trees, the hinging hyperplanes method, and mixture of experts are some of the approaches for learning continuous piecewise linear regression models, and many of these algorithms are computationally intensive.
We present a method of learning piecewise linear regression models which is computationally simple and is capable of learning discontinuous functions as well. The method is based on the idea of K-plane regression, which can identify a set of linear models given the training data. K-plane regression is a simple algorithm motivated by the philosophy of k-means clustering, but it has several problems: it does not give a model function with which to predict the target value for an arbitrary input, and it is very sensitive to noise. We propose a modified K-plane regression algorithm which can learn continuous as well as discontinuous functions. The proposed algorithm retains the spirit of the k-means algorithm and improves the objective function after every iteration; it learns a proper piecewise linear model that can be used for prediction and is more robust to additive noise than K-plane regression. When learning classifiers, one normally assumes that the class labels in the training data are noise-free. However, in many applications, such as spam filtering and text classification, the training data can be mislabeled due to subjective errors. In such cases, standard learning algorithms (SVM, AdaBoost, decision trees, etc.) overfit the noisy points, leading to poor test accuracy; analyzing the vulnerability of classifiers to label noise has therefore attracted growing interest in the machine learning community. Existing noise-tolerant learning approaches first try to identify the noisy points and then learn a classifier on the remaining points. In this thesis, we address the issue of developing learning algorithms which are inherently noise-tolerant: an algorithm is inherently noise-tolerant if the classifier it learns from noisy samples has the same performance on test data as one learnt from noise-free samples. Algorithms with such robustness (under suitable assumptions on the noise) are attractive for learning with noisy samples. Here we consider non-uniform label noise, a generic noise model in which the probability that an example's class label is incorrect is a function of the example's feature vector (we assume this probability is less than 0.5 for all feature vectors); this accounts for most cases of noisy data sets. There is no provably optimal algorithm for learning noise-tolerant classifiers in the presence of non-uniform label noise. We propose a novel characterization of the noise tolerance of an algorithm and analyze the noise tolerance properties of the risk minimization framework, as risk minimization is a common strategy for classifier learning. We show that risk minimization under the 0-1 loss has the best noise tolerance properties, and that none of the standard convex loss functions share them. Empirical risk minimization under the 0-1 loss is a hard problem, as the 0-1 loss function is not differentiable. We propose a gradient-free stochastic optimization technique to minimize the risk under the 0-1 loss for noise-tolerant learning of linear classifiers, show (under some conditions) that the algorithm converges asymptotically to the global minimum of the 0-1 risk, and illustrate its noise tolerance through simulation experiments.
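The k-means flavor of K-plane regression is easy to see in code. The sketch below alternates between assigning each point to its best-fitting plane and refitting each plane by least squares; it is the basic algorithm the thesis starts from (with illustrative K, iteration cap, and random initialization), not the modified, noise-robust variant the thesis proposes.

```python
import numpy as np

def k_plane_regression(X, y, K=3, iters=30, seed=0):
    # Alternating scheme in the spirit of k-means: (1) refit each plane on
    # its assigned points by least squares; (2) reassign every point to the
    # plane with the smallest squared residual; stop at a fixed point.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xa = np.column_stack([X, np.ones(n)])      # affine term for each plane
    labels = rng.integers(0, K, n)
    W = np.zeros((K, Xa.shape[1]))
    for _ in range(iters):
        for k in range(K):
            idx = labels == k
            if idx.sum() >= Xa.shape[1]:        # skip under-determined groups
                W[k], *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)
        resid = (Xa @ W.T - y[:, None]) ** 2    # (n, K) squared residuals
        new = resid.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return W, labels

# Hypothetical piecewise data with two linear regimes
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (300, 1))
y = np.where(X[:, 0] < 0, 1 + 2 * X[:, 0], 3 - X[:, 0]) + rng.normal(0, 0.1, 300)
W, labels = k_plane_regression(X, y, K=2)
```

Note how the basic version returns only planes and assignments: there is no rule for choosing a plane at a new input, and a few mislabeled points can drag a plane far off, which are exactly the two shortcomings the modified algorithm addresses.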
90

Bayesian generalized linear models for meta-analysis of diagnostic tests.

Xing, Yan. Cormier, Janice N., Swint, John Michael, January 2008 (has links)
Thesis (Ph. D.)--University of Texas Health Science Center at Houston, School of Public Health, 2008. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 0769. Advisers: Claudia Pedroza; Asha S. Kapadia. Includes bibliographical references.
