  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
511

Feature Selection under Multicollinearity & Causal Inference on Time Series

Bhattacharya, Indranil January 2017 (has links) (PDF)
In this work, we study and extend algorithms for sparse regression and causal inference problems. Both problems are fundamental in data science. The goal of a regression problem is to find the "best" relationship between an output variable and input variables, given samples of the input and output values. We consider sparse regression under a high-dimensional linear model with strongly correlated variables, a situation that many existing model selection algorithms cannot handle well. We study the performance of popular feature selection algorithms such as LASSO, Elastic Net, BoLasso, and Clustered Lasso, as well as Projected Gradient Descent algorithms, under this setting in terms of their running time, stability, and consistency in recovering the true support. We also propose a new feature selection algorithm, BoPGD, which first clusters the features based on their sample correlation and then performs sparse estimation using a bootstrapped variant of the projected gradient descent method with projection onto the non-convex L0 ball. We attempt to characterize the efficiency and consistency of our algorithm by performing a host of experiments on both synthetic and real-world datasets. Discovering causal relationships, beyond mere correlation, is widely recognized as a fundamental problem. Causal inference problems use observations to infer the underlying causal structure of the data-generating process. The input to these problems is either a multivariate time series or i.i.d. sequences, and the output is a feature causal graph in which the nodes correspond to the variables and the edges capture the direction of causality. For high-dimensional datasets, determining the causal relationships becomes a challenging task because of the curse of dimensionality. Graphical modeling of temporal data based on the concept of "Granger causality" has gained much attention in this context.
The blend of Granger methods with model selection techniques such as LASSO enables efficient discovery of a "sparse" subset of causal variables in high-dimensional settings. However, these temporal causal methods use an input parameter, L, the maximum time lag: the maximum gap in time between the occurrence of the output phenomenon and the causal input stimulus. In many situations of interest, the maximum time lag is not known, and indeed, finding the range of causal effects is an important problem. In this work, we propose and evaluate a data-driven and computationally efficient method for Granger causality inference in the Vector Auto Regressive (VAR) model without foreknowledge of the maximum time lag. We present two algorithms, Lasso Granger++ and Group Lasso Granger++, which not only construct the hypothesis feature causal graph but also simultaneously estimate a value of maxlag (L) for each variable by balancing the trade-off between "goodness of fit" and "model complexity".
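The projection step named in the abstract above (projected gradient descent with projection onto the L0 ball) can be illustrated with a minimal sketch. It is not the BoPGD algorithm itself: the feature clustering and the bootstrapping are omitted, and the function names `hard_threshold` and `projected_gradient_l0`, the fixed step size, and the toy data are assumptions made for illustration. Projecting onto the L0 ball of radius k amounts to keeping the k largest-magnitude coefficients and zeroing the rest.

```python
def hard_threshold(w, k):
    """Project w onto the L0 ball: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def projected_gradient_l0(X, y, k, step=1.0, n_iter=100):
    """Minimise (1/2n)||Xw - y||^2 subject to ||w||_0 <= k (hypothetical helper)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals r = Xw - y and gradient g = X^T r / n of the least-squares loss.
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step, then projection onto the non-convex L0 ball.
        w = hard_threshold([w[j] - step * grad[j] for j in range(p)], k)
    return w


# Toy design (assumed): a scaled identity, so X^T X / n is the identity matrix.
X = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
w_true = [3.0, 0.0, -2.0, 0.0]
y = [sum(a * b for a, b in zip(row, w_true)) for row in X]
w_hat = projected_gradient_l0(X, y, k=2)
```

On this noise-free orthogonal toy design the 2-sparse coefficient vector is recovered. With strongly correlated features, the setting the thesis targets, a single run of this plain iteration may select an unstable support, which is the kind of behavior that clustering and bootstrapping are meant to address.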
512

Produktivita a její měření se zaměřením na jednotlivé velikostní kategorie podniků / Productivity and its measuring aiming at individual size categories of companies.

ŘÍHA, Jaroslav January 2015 (has links)
The main aim of this thesis was to assess the relationship between productivity and company size in a selected sector of the national economy, the construction industry. The sub-objectives were as follows: to gather theoretical information on the topic, to analyse productivity by company size category in the selected sector, to assess the relationship between company size and productivity, and to ascertain the level of productivity in the Czech Republic in comparison with that of the selected sector.
513

Estimativa do escoamento superficial em diferentes níveis de dossel vegetativo e cobertura do solo / Runoff estimate at different levels of canopy vegetative and soil cover

Knies, Alberto Eduardo 25 March 2014 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / Soil tillage systems modify the soil water balance, and for correct irrigation management it is essential to determine runoff and effective rainfall, which helps to maximize the use of rainwater and minimize the use of supplemental irrigation. The objective of this study was to determine, model, and estimate runoff and effective rainfall during the development cycle of common black bean and maize grown in soil with and without straw on the surface, at different land slopes and simulated rainfall intensities, using field experiments, multivariate equations, the Curve Number (CN) method, and the SIMDualKc model. Two field experiments were conducted with black bean and maize crops, in which different simulated rainfall intensities (35, 70 and 105 mm h⁻¹) were applied at different times of the crop cycle (soil cover of 0, 28, 63 and 100% by the bean canopy; 0, 30, 72 and 100% by the maize canopy) and on distinct land slopes (1, 5 and 10%), in soil without and with (5 Mg ha⁻¹) black oat straw on the surface. The observed runoff values were compared with those estimated by the CN method, and new CN values were suggested to improve the estimate. From the data collected in the field, multiple linear regression analyses were performed to estimate runoff, and simulations with the SIMDualKc model were run to estimate runoff and effective rainfall. The start time of runoff, the constant runoff rate, total runoff, and the percentage of runoff relative to rainfall volume were little influenced by canopy growth of the bean and maize plants. Straw on the soil surface reduced runoff by up to 45 and 48% for the bean and maize crops, respectively. For the bean crop, the CN method underestimated runoff by up to 10% for soil without straw on the surface and overestimated it by up to 17% for soil with straw.
For maize, the CN method overestimated runoff by up to 32.4% in soil with straw and 12% in soil without straw. To improve the CN estimate, new CN values are proposed that take into account the crop, the presence or absence of straw on the soil surface, and rainfall intensity. The multiple linear regression analyses indicated that rainfall volume (R²=0.52) and soil cover by straw (R²=0.18) are the variables with the greatest influence on runoff. Four multiple regression equations were generated; equation 2, whose input parameters are rainfall volume and the amount of straw on the soil surface, gave the best runoff estimate for a data set different from the one used to derive it. The SIMDualKc model requires adjustments to estimate runoff and effective rainfall during the bean and maize crop cycles so that it accounts for the benefits of straw on the soil surface in reducing runoff. Thus, the suggested CN value (CN=75) was changed to 71 and 87 for the black bean crop and to 56 and 79 for the maize crop, for soil with and without straw on the surface, respectively.
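The Curve Number comparison in this abstract can be made concrete with the standard SCS-CN runoff equation, Q = (P - Ia)^2 / (P - Ia + S) for P > Ia and Q = 0 otherwise, with S = 25400/CN - 254 (in mm) and Ia = 0.2 S. The sketch below applies it to the CN values quoted in the abstract. The function names `scs_runoff_mm` and `runoff_table`, the conventional 0.2 initial-abstraction ratio, and the one-hour storm (so that rainfall depth in mm equals the quoted intensity in mm per hour) are assumptions for illustration, not details taken from the thesis.

```python
def scs_runoff_mm(rain_mm, cn, ia_ratio=0.2):
    """Runoff depth (mm) from the standard SCS Curve Number equation."""
    if not 0.0 < cn <= 100.0:
        raise ValueError("CN must lie in (0, 100]")
    s = 25400.0 / cn - 254.0      # potential maximum retention S, in mm
    ia = ia_ratio * s             # initial abstraction Ia, in mm
    if rain_mm <= ia:
        return 0.0                # all rainfall is retained; no runoff
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# CN values quoted in the abstract: reference value and the proposed replacements.
CN_VALUES = {
    "reference": 75.0,
    "bean_with_straw": 71.0,
    "bean_without_straw": 87.0,
    "maize_with_straw": 56.0,
    "maize_without_straw": 79.0,
}


def runoff_table(rain_depths_mm=(35.0, 70.0, 105.0)):
    """Runoff per CN label, assuming one-hour storms at the quoted intensities."""
    return {
        label: [scs_runoff_mm(p, cn) for p in rain_depths_mm]
        for label, cn in CN_VALUES.items()
    }


if __name__ == "__main__":
    for label, values in runoff_table().items():
        print(label, [round(v, 1) for v in values])
```

A lower CN means more retention and therefore less estimated runoff for the same rainfall, which is consistent with the lower CN values proposed for soil covered with straw.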
514

Aplicação estruturada de dados de redes sociais na modelagem de instrumentos de apoio às decisões de concessão de crédito / Social networks structured data application: modeling support tools for credit acquisitions decisions

Fattibene, Marcos 27 January 2015 (has links)
The credit analysis of individuals has traditionally relied on three pillars: documentary proof of income and residence; consultation of negative credit bureaus such as SERASA Experian and SCPC; and the use of forecasting models based on the hypothesis that similar profiles will reproduce in the future the credit behavior of the past, such as credit scores (HAND; HENLEY, 2007). This approach has proven adequate, although it is susceptible to moments of economic crisis or to rapid changes in the profile of the target market, as occurred in the U.S. subprime mortgage market in 2008. This study aims to point out ways to use, in the context of credit risk analysis, the informational content of social networks, where individuals record their opinions and preferences and, especially, make evident their network of relationships. Ways were identified to investigate the assumption that an individual's proximity to others with a good-payer profile, or vice versa, influences the repayment rate. To illustrate these suggestions, a real social network, enriched with credit data obtained by statistical simulation, was used. Three data-weighting models and three models based on multiple linear regression were developed. In general, the results were not statistically significant, given the need to use a non-Brazilian social network as well as synthetic credit bureau score data, since real information was not available in Brazil. Nevertheless, the study showed that it is feasible to investigate the hypothesis that the informational content of a social network may make credit analysis more efficient when incorporated into the decision-making, operational, and control systems of this segment.
515

Estimativa de demanda de energia elétrica em uma instituição de ensino superior. / Estimating electric power demand at a higher education institution

Garcia, Altemir Tomaz de Carvalho 28 August 2015 (has links)
In recent years, several studies have been published on the estimation of variables related to electricity use, in which a wide variety of methodologies are used to model and estimate the energy demand of countries, states, companies in general, and educational systems. This dissertation addresses the last category, with a focus on Higher Education Institutions (HEIs). It seeks to develop a monthly estimate of the peak maximum demand (Demanda Máxima de Ponta, DMP) of electric power suited to HEIs, based on the number of students and, if necessary, on other causal variables, that can support management in renegotiating contracts with utility concessionaires to reduce annual costs and can also contribute to better control of maximum electricity demand levels. To achieve this objective, a literature review was carried out on the variables that could be correlated with the dependent variable DMP. This review indicated several methodologies that could contribute to solving the proposed problem: Markov chains, support vector regression, genetic programming, and artificial neural networks. Multiple Linear Regression (MLR) was adopted because it is a less complex methodology and one directed at large companies.
An HEI was selected, and interviews were conducted with some engineers and a technician from its electrical engineering division to better understand energy use and the behavior of the DMP variable at this HEI. The reports of its energy monitoring system, which contained the DMP data from January 2008 to December 2014, were made available. Based on these data and on documentary research into the candidate independent variables, a model was developed by MLR from 72 months of data; its residuals were evaluated, and it showed a coefficient of determination R² of 0.883. The independent variables that remained in the model after backward elimination were four dummy variables associated with years, six dummy variables associated with months, and one variable that is the product of the number of undergraduate school days and the number of enrolled undergraduate students. The model was able to identify the seasonality present in the behavior of the DMP at this HEI. It would allow DMP to be contracted month by month, which would yield savings of 57% compared with the traditional contracting mode (fixed DMP for the entire period), considering the period from July to December, before the period left for validation. Finally, a forecast was made for the period from January to May 2015, and adopting the proposed model was able to provide savings of 45% relative to the scheme currently used by this HEI.
516

Fatores associados a sintomas depressivos em mães de recém-nascidos pré-termo de muito baixo peso: 36 meses de seguimento / Factors associated with depressive symptoms in very low birth weight preterm mothers: 36 months follow-up

Bonini, Marília Martins Prado 15 December 2016 (has links)
The survival of very low birth weight (VLBW) preterm infants has increased markedly in recent decades, but this has not been accompanied by a reduction in prematurity-related morbidity. Preterm birth is reported as an important factor disturbing maternal emotional balance and the mother's family and social relationships. Maternal emotional impairment represents a potential risk to the child's neuropsychological development. Mothers of VLBW preterm infants have a higher incidence of anxiety and depression symptoms, as well as a worse perception of well-being, than mothers of full-term infants. However, the association between comorbidities related to prematurity (bronchopulmonary dysplasia, retinopathy, and peri-intraventricular hemorrhage) and the intensity of maternal depressive symptoms is still poorly understood. The objective of the present study was to assess the intensity of depressive symptoms in mothers of VLBW preterm infants during the 36 months after delivery and its possible association with the mothers' sociodemographic and clinical characteristics and quality of life and with the infants' clinical characteristics. Seventy-five mothers of preterm (≤ 34 weeks of gestational age) VLBW (birth weight ≤ 1,500 g) infants admitted to a neonatal intensive care unit (NICU) participated in a longitudinal study and responded to the Beck Depression Inventory (BDI) and the WHOQOL-Bref at six time points over the 36 months after delivery. Factor analysis was used to identify possible BDI clusters, and multiple linear regression was used to evaluate the contribution of each independent variable to the variation in the overall BDI score. The factor analysis of the BDI identified two factors: low self-esteem / dissatisfaction and somatic symptoms.
Median BDI scores were higher (p = 0.03) at maternal discharge (9.0; 0-56) than at 6 months postpartum (6.0; 0-27) and remained stable at 12 (5.0; 0-36), 18 (7.0; 0-33), 24 (7.0; 0-33), and 36 (6.5; 0-34) months. Median scores for the somatic symptoms cluster were higher (p = 0.00) at maternal discharge (6.0; 0-23) and at 6 months (5.0; 0-17) than at 12 (4.0; 0-11), 18 (3.0; 0-13), 24 (3.5; 0-16), and 36 (3.0; 0-15) months postpartum. The regression models explained a large part of the variation in BDI scores in all study periods (0.19 ≤ adjusted R² ≤ 0.64; p < 0.01). The WHOQOL-Bref domains (physical, psychological, social, and environment) were the variables that explained the variation in the overall BDI score (-0.34 ≤ β ≤ -0.12; p < 0.01). Mothers of VLBW preterm infants showed greater intensity of depressive symptoms at the time of their hospital discharge. The presence of depressive symptoms is associated mainly with poorer quality of life in mothers of VLBW preterm infants. / Thesis (Doctorate)
517

Supervised Learning of Piecewise Linear Models

Manwani, Naresh January 2012 (has links) (PDF)
Supervised learning of piecewise linear models is a well-studied problem in the machine learning community. The key idea in piecewise linear modeling is to properly partition the input space and learn a linear model for every partition. Decision trees and regression trees are classic examples of piecewise linear models for classification and regression problems. The existing approaches for learning decision/regression trees can be broadly classified into two classes, namely, fixed structure approaches and greedy approaches. In the fixed structure approaches, the tree structure is fixed beforehand by fixing the number of non-leaf nodes, the height of the tree, and the paths from the root node to every leaf node of the tree. Mixture of experts and hierarchical mixture of experts are examples of fixed structure approaches for learning piecewise linear models. Parameters of the models are found using, e.g., maximum likelihood estimation, for which the expectation maximization (EM) algorithm can be used. Fixed structure piecewise linear models can also be learnt using risk minimization under an appropriate loss function. Learning an optimal decision tree using a fixed structure approach is a hard problem. Constructing an optimal binary decision tree is known to be NP-complete. On the other hand, greedy approaches do not assume any parametric form or any fixed structure for the decision tree classifier. Most of the greedy approaches learn tree-structured piecewise linear models in a top-down fashion. These are built by binary or multi-way recursive partitioning of the input space. The main issue in top-down decision tree induction is to choose an appropriate objective function to rate the split rules. The objective function should be easy to optimize. Top-down decision trees are easy to implement and understand, but there are no optimality guarantees due to their greedy nature. Regression trees are built in a similar way to decision trees.
In regression trees, every leaf node is associated with a linear regression function. All piecewise linear modeling techniques deal with two main tasks, namely, partitioning of the input space and learning a linear model for every partition. However, partitioning of the input space and learning linear models for different partitions are not independent problems. Simultaneous optimal estimation of partitions and learning of linear models for every partition is a combinatorial problem and hence computationally hard. However, piecewise linear models provide better insights into the classification or regression problem by giving an explicit representation of the structure in the data. The information captured by piecewise linear models can be summarized in terms of simple rules, so that they can be used to analyze the properties of the domain from which the data originates. These properties make piecewise linear models, like decision trees and regression trees, extremely useful in many data mining applications and place them among the top data mining algorithms. In this thesis, we address the problem of supervised learning of piecewise linear models for classification and regression. We propose novel algorithms for learning piecewise linear classifiers and regression functions. We also address the problem of noise-tolerant learning of classifiers in the presence of label noise. We propose a novel algorithm for learning polyhedral classifiers, which are the simplest form of piecewise linear classifiers. Polyhedral classifiers are useful when points of the positive class fall inside a convex region and all the negative class points are distributed outside the convex region. Then the region of the positive class can be well approximated by a simple polyhedral set. The key challenge in optimally learning a fixed structure polyhedral classifier is to identify subproblems, where each subproblem is a linear classification problem.
This is a hard problem, and identifying polyhedral separability is known to be NP-complete. The goal of any polyhedral learning algorithm is to efficiently handle the underlying combinatorial problem while achieving good classification accuracy. Existing methods for learning a fixed structure polyhedral classifier are based on solving non-convex constrained optimization problems. These approaches do not efficiently handle the combinatorial aspect of the problem and are computationally expensive. We propose a method of model-based estimation of posterior class probability to learn polyhedral classifiers. We solve an unconstrained optimization problem using a simple two-step algorithm (similar to the EM algorithm) to find the model parameters. To the best of our knowledge, this is the first attempt to form an unconstrained optimization problem for learning polyhedral classifiers. We then modify our algorithm to also find the number of required hyperplanes automatically. We experimentally show that our approach is better than the existing polyhedral learning algorithms in terms of training time, performance, and complexity. Most often, class conditional densities are multimodal. In such cases, each class region may be represented as a union of polyhedral regions, and hence a single polyhedral classifier is not sufficient. To handle such situations, a generic decision tree is required. Learning an optimal fixed structure decision tree is a computationally hard problem. On the other hand, top-down decision trees have no optimality guarantees due to their greedy nature. However, top-down decision tree approaches are widely used as they are versatile and easy to implement. Most of the existing top-down decision tree algorithms (CART, OC1, C4.5, etc.) use impurity measures to assess the goodness of hyperplanes at each node of the tree. These measures do not properly capture the geometric structures in the data.
We propose a novel decision tree algorithm that ,at each node, selects hyperplanes based on an objective function which takes into consideration geometric structure of the class regions. The resulting optimization problem turns out to be a generalized eigen value problem and hence is efficiently solved. We show through empirical studies that our approach leads to smaller size trees and better performance compared to other top-down decision tree approaches. We also provide some theoretical justification for the proposed method of learning decision trees. Piecewise linear regression is similar to the corresponding classification problem. For example, in regression trees, each leaf node is associated with a linear regression model. Thus the problem is once again that of (simultaneous) estimation of optimal partitions and learning a linear model for each partition. Regression trees, hinge hyperplane method, mixture of experts are some of the approaches to learn continuous piecewise linear regression models. Many of these algorithms are computationally intensive. We present a method of learning piecewise linear regression model which is computationally simple and is capable of learning discontinuous functions as well. The method is based on the idea of K plane regression that can identify a set of linear models given the training data. K plane regression is a simple algorithm motivated by the philosophy of k means clustering. However this simple algorithm has several problems. It does not give a model function so that we can predict the target value for any given input. Also, it is very sensitive to noise. We propose a modified K plane regression algorithm which can learn continuous as well as discontinuous functions. The proposed algorithm still retains the spirit of k means algorithm and after every iteration it improves the objective function. The proposed method learns a proper Piece wise linear model that can be used for prediction. 
The algorithm is also more robust to additive noise than K plane regression. While learning classifiers, one normally assumes that the class labels in the training data set are noise free. However, in many applications like Spam filtering, text classification etc., the training data can be mislabeled due to subjective errors. In such cases, the standard learning algorithms (SVM, Adaboost, decision trees etc.) start over fitting on the noisy points and lead to poor test accuracy. Thus analyzing the vulnerabilities of classifiers to label noise has recently attracted growing interest from the machine learning community. The existing noise tolerant learning approaches first try to identify the noisy points and then learn classifier on remaining points. In this thesis, we address the issue of developing learning algorithms which are inherently noise tolerant. An algorithm is inherently noise tolerant if, the classifier it learns with noisy samples would have the same performance on test data as that learnt from noise free samples. Algorithms having such robustness (under suitable assumption on the noise) are attractive for learning with noisy samples. Here, we consider non uniform label noise which is a generic noise model. In non uniform label noise, the probability of the class label for an example being incorrect, is a function of the feature vector of the example.(We assume that this probability is less than 0.5 for all feature vectors.) This can account for most cases of noisy data sets. There is no provably optimal algorithm for learning noise tolerant classifiers in presence of non uniform label noise. We propose a novel characterization of noise tolerance of an algorithm. We analyze noise tolerance properties of risk minimization frame work as risk minimization is a common strategy for classifier learning. We show that risk minimization under 01 loss has the best noise tolerance properties. 
None of the other convex loss functions have such noise tolerance properties. Empirical risk minimization under 01 loss is a hard problem as 01 loss function is not differentiable. We propose a gradient free stochastic optimization technique to minimize risk under 01 loss function for noise tolerant learning of linear classifiers. We show (under some conditions) that the algorithm converges asymptotically to the global minima of the risk under 01 loss function. We illustrate the noise tolerance of our algorithm through simulations experiments. We demonstrate the noise tolerance of the algorithm through simulations.
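The abstract above describes basic K-plane regression only as a k-means-style procedure, so the following Python sketch is an illustration under stated assumptions, not the thesis's modified algorithm. It assumes a squared-error objective, a random initial assignment, and hypothetical names (`k_plane_regression`, `n_iter`, `seed`). It alternates between refitting one least-squares plane per cluster and reassigning each point to the plane that fits it best.

```python
import numpy as np

def k_plane_regression(X, y, k, n_iter=50, seed=0):
    """Basic K-plane regression sketch (illustrative, not the thesis's modified variant).

    Alternates two steps, in the spirit of k-means:
      1. refit a least-squares plane to the points currently assigned to each cluster;
      2. reassign every point to the plane with the smallest squared residual.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])      # append a bias column
    labels = rng.integers(0, k, size=n)       # random initial assignment (assumption)
    W = np.zeros((k, Xb.shape[1]))            # one weight vector per plane
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            # Refit only when the cluster has enough points to determine a plane.
            if mask.sum() >= Xb.shape[1]:
                W[j] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        residuals = (Xb @ W.T - y[:, None]) ** 2   # shape (n, k)
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments stable: converged
        labels = new_labels
    return W, labels
```

The function returns the planes and the training assignments but no rule for choosing a plane for an unseen input, which is the limitation the abstract points out: the basic procedure does not by itself yield a model function for prediction.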
518

Método de estimativa de temperaturas mínimas e máximas médias mensais climatológicas do ar no Rio Grande do Sul / Method for estimating climatological mean monthly minimum and maximum air temperatures in the State of Rio Grande do Sul

Pimentel, Maria da Graça Pereira 18 February 2007 (has links)
Previous issue date: 2007-02-18 / No scholarship (Sem bolsa) / Multiple linear regression equations were fitted to estimate the mean monthly minimum (Tmin) and maximum (Tmax) air temperatures, using latitude, longitude and altitude as independent variables and 90 years (1913-2002) of mean monthly minimum and maximum air temperature data from forty meteorological stations in the State of Rio Grande do Sul (Brazil) as dependent variables. Altitude has a strong influence on both minimum and maximum air temperatures. Latitude contributes significantly to both, but acts more strongly on maximum temperatures. Longitude is a minor, practically negligible factor for minimum temperatures, but is more relevant for maximum temperatures. The model gives good estimates of minimum and maximum temperatures, except for some months and for some stations where the oceanic/continental effect is present. The lower correlation coefficients for maximum temperatures in the sub-series 30 A, 30 B and 30 C indicate that more efficient estimates require long data series. The model produced good estimates of the mean monthly minimum and maximum temperatures, with small absolute errors.
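As a rough illustration of the kind of model this abstract describes, the Python sketch below fits a single equation of the form T = b0 + b1·latitude + b2·longitude + b3·altitude by ordinary least squares. The function names and the choice of NumPy's `lstsq` are assumptions made for illustration; the abstract does not say how the equations were fitted or give their coefficients, and one such equation would be fitted per month and per variable (Tmin, Tmax).

```python
import numpy as np

def fit_temperature_model(lat, lon, alt, temp):
    """Least-squares fit of temp = b0 + b1*lat + b2*lon + b3*alt.

    lat, lon, alt, temp are 1-D float arrays with one entry per station.
    Returns the coefficient vector [b0, b1, b2, b3].
    """
    # Design matrix: intercept column followed by the three predictors.
    A = np.column_stack([np.ones_like(lat), lat, lon, alt])
    coef, *_ = np.linalg.lstsq(A, temp, rcond=None)
    return coef

def predict_temperature(coef, lat, lon, alt):
    """Evaluate the fitted linear model at the given coordinates."""
    return coef[0] + coef[1] * lat + coef[2] * lon + coef[3] * alt
```

With station arrays for one month, `coef = fit_temperature_model(lat, lon, alt, tmin)` gives the equation for that month, and `predict_temperature(coef, ...)` estimates the temperature at a location without a station.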
519

Paralelização de algoritmos APS e Firefly para seleção de variáveis em problemas de calibração multivariada / Parallelization of APS and Firefly algorithms for variable selection in multivariate calibration problems

Paula, Lauro Cássio Martins de 15 July 2014 (has links)
Previous issue date: 2014-07-15 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Variable selection is the problem of choosing the attributes of a given sample that contribute most to predicting the property of interest. Traditional algorithms such as the Successive Projections Algorithm (APS) have been widely used for variable selection in multivariate calibration problems. Among bio-inspired algorithms, the Firefly Algorithm (AF) is a recently proposed method with potential application to several real-world problems, including variable selection. The main drawback of both algorithms is their computational burden, which grows with the number of available variables. Recent advances in Graphics Processing Units (GPUs) give these algorithms a powerful processing platform, and using GPUs is often necessary to reduce computation time. In this context, this work proposes a parallel GPU implementation of an AF (AF-RLM) for variable selection using multiple linear regression (RLM) models. It also presents two APS implementations, one using RLM (APS-RLM) and the other using the sequential regressions strategy (APS-RS). These implementations aim to improve the computational efficiency of the algorithms. The advantages of the parallel implementations are demonstrated in an example involving a relatively large number of variables, in which speedup gains were obtained. In addition, AF-RLM is compared with APS-RLM and APS-RS. Based on the results, we show that AF-RLM may be a relevant contribution to the variable selection problem.
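To make the projection step of the Successive Projections Algorithm concrete, here is a minimal sequential Python sketch. It is not the GPU implementation proposed in this work, and it covers only the chain-building phase: the later step that scores candidate chains with a regression model is omitted. The function name `spa_chain` and its parameters are hypothetical. Starting from one column, each iteration projects all columns onto the subspace orthogonal to the last selected column and picks the column with the largest remaining norm, which favours variables with little collinearity.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Build one chain of n_select column indices by successive projections.

    X        : (samples, variables) matrix of instrumental responses
    start    : index of the initial column
    n_select : chain length (assumed <= number of samples, with non-zero columns)
    """
    Xp = np.array(X, dtype=float)          # working copy; X itself is not modified
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()    # last selected (already projected) column
        # Remove from every column its component along xk.
        Xp -= np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0             # never pick the same column twice
        selected.append(int(norms.argmax()))
    return selected
```

The per-column projections within an iteration are independent of one another, which is the kind of structure a parallel implementation can exploit.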
520

Modelagem matemática da variação espaço-temporal da temperatura média diária e do ciclo do algodoeiro herbáceo no Estado de Goiás / Mathematical modeling of the spatio-temporal variation of the daily average temperature and of the herbaceous cotton cycle in the State of Goiás-Brazil

ANTONINI, Jorge Cesar dos Anjos 14 August 2009 (has links)
Previous issue date: 2009-08-14 / The regional climatic conditions in the State of Goiás, Brazil, are favorable for the cultivation of herbaceous cotton (Gossypium hirsutum L. r. latifolium hutch). However, to achieve the best yields, it is essential that sowing and the fruiting period coincide with the period of greatest soil-water availability, and that the period from boll opening to harvest coincide with the dry period. Knowledge of the crop cycle as a function of planting location is therefore very important for choosing the best planting date. In this context, air temperature is one of the climatic variables that most influences cotton development. Nevertheless, the low density of meteorological stations capable of measuring temperature has restricted modeling studies for estimating the cotton cycle. This work was carried out with the objective of developing and validating mathematical models to estimate the average daily air temperature and, based on degree-day theory, the cycle of herbaceous cotton in the State of Goiás, considering their variations in space and time simultaneously. Both models were based on a linear combination of elevation, latitude, longitude and the daily time variation, represented by an incomplete Fourier trigonometric series. The model parameters were fitted by multiple linear regression to data from 21 meteorological stations available in the State of Goiás and the Federal District of Brazil, with observation records ranging from 8 to 24 years. For the degree-day modeling, the maximum and minimum temperature data were restricted to the interval from 15°C to 40°C, whose limits were taken as the lower and upper threshold temperatures, respectively. The air-temperature model was validated against measured data from three meteorological stations at different elevations: high (1100 m), medium (554 m) and low (431 m). The coefficients of determination obtained from fitting the models to daily air temperature and daily degree-days were 0.82 and 0.84, respectively. Model performance was moderate at low and high altitudes and very good at intermediate altitudes. The degree-day model was validated by comparing the observed duration of the period from crop emergence to 90% open bolls for cotton cultivars grown in commercial fields. The results showed an overall performance index of 0.85, which was classified as very good. The models developed in this study adequately estimated the average daily air temperature and the cycle duration of herbaceous cotton cultivars in the State of Goiás.
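The degree-day step can be sketched in Python as follows. This is an illustration under assumptions: the abstract states only that maximum and minimum temperatures were limited to the 15°C to 40°C interval, so the simple averaging formula, the function names, and the `required_sum` parameter (a cultivar's thermal requirement, not given in the abstract) are all hypothetical.

```python
import numpy as np

T_LOWER, T_UPPER = 15.0, 40.0   # lower and upper threshold temperatures from the abstract

def daily_degree_days(tmax, tmin, t_lower=T_LOWER, t_upper=T_UPPER):
    """Daily degree-days by the simple averaging method (assumed formula).

    Temperatures are first limited to [t_lower, t_upper]; the degree-days are
    the clipped daily mean minus the lower threshold.
    """
    tmax_c = np.clip(tmax, t_lower, t_upper)
    tmin_c = np.clip(tmin, t_lower, t_upper)
    return (tmax_c + tmin_c) / 2.0 - t_lower

def days_to_reach(tmax, tmin, required_sum):
    """Number of days until accumulated degree-days reach required_sum.

    Returns None if the thermal sum is not reached within the given series.
    """
    cumulative = np.cumsum(daily_degree_days(tmax, tmin))
    idx = np.searchsorted(cumulative, required_sum)   # first day with sum >= required_sum
    return int(idx) + 1 if idx < len(cumulative) else None
```

Feeding `days_to_reach` with temperatures estimated by a spatial model, rather than measured ones, is how a cycle length could be estimated for a location without a station.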
