31

Effectivisation of an Industrial Painting Process : A discrete event approach to modeling and analysing the painting process at Volvo GTO Umeå / Analys och modellering av en industriell målningsprocess

Alishev, Boris, Kågström, Oskar January 2022 (has links)
For any manufacturing process, one of the key challenges once a solid foundation has been built is how to make further improvements. Before implementation, management has to consider how a possible change will affect both the process as a whole and every individual part of it. The groundwork for this is a clear overview of every part and the ability to investigate the effects of changes. This thesis therefore aims to provide a clear overview of the complex painting process at Volvo GTO in Umeå, together with a template for investigating how different changes would affect the process. The means for doing this are statistics, modeling and discrete event simulation. The model provides an approximate recreation of reality, and the subsequent analysis takes similarities and differences into account to estimate the effects of changes. Real-world data and its variability are recreated through bootstrap resampling of multiple independent weeks of observations. Results obtained from simulation are compared with observed data in order to validate the model and investigate discrepancies. Given the results of the model validation, modifications are implemented, and the information obtained from validation is used to evaluate the results of those modifications. Finally, strengths and weaknesses of the thesis are presented, and a recommendation to alter the stance on process improvements is provided to Volvo GTO.
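As an illustration of the kind of bootstrap resampling of weekly observations described above, here is a minimal sketch (not taken from the thesis); the weekly cycle-time data and all names are hypothetical placeholders for the kind of inputs a discrete event model might consume.

```python
import random

# Hypothetical observed cycle times (seconds) for several independent weeks.
weekly_observations = {
    "week_1": [212.0, 198.5, 205.3, 220.1, 201.7],
    "week_2": [208.4, 215.9, 199.2, 203.6, 210.8],
    "week_3": [221.5, 207.3, 196.8, 214.2, 209.9],
}

def bootstrap_week(observations, n_events, rng):
    """Draw n_events cycle times with replacement from one week's data."""
    return [rng.choice(observations) for _ in range(n_events)]

def generate_simulation_input(weeks, n_events=1000, seed=42):
    """Resample each week independently so the simulated variability
    reflects the week-to-week differences seen in the real process."""
    rng = random.Random(seed)
    return {week: bootstrap_week(obs, n_events, rng) for week, obs in weeks.items()}

sim_input = generate_simulation_input(weekly_observations)
print({week: sum(v) / len(v) for week, v in sim_input.items()})  # resampled means per week
```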
32

A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique

Pahren, Laura 23 August 2022 (has links)
No description available.
33

A Computational Approach To Nonparametric Regression: Bootstrapping CMARS Method

Yazici, Ceyda 01 September 2011 (has links) (PDF)
Bootstrapping is a resampling technique that treats the original data set as a population and draws samples from it with replacement. The technique is widely used, especially for mathematically intractable problems. In this study, it is used to obtain the empirical distributions of the parameters and thus determine whether they are statistically significant in a special case of nonparametric regression, Conic Multivariate Adaptive Regression Splines (CMARS). CMARS, which uses conic quadratic optimization, is a modified version of the well-known nonparametric regression model Multivariate Adaptive Regression Splines (MARS). Although it performs better with respect to several criteria, the CMARS model is more complex than the MARS model. To overcome this problem, and to improve CMARS performance further, three different bootstrapping regression methods, namely Random-X, Fixed-X and Wild bootstrap, are applied to four data sets of different sizes and scales. The performances of the resulting models are then compared using various criteria including accuracy, precision, complexity, stability, robustness and efficiency. Random-X yields more precise, accurate and less complex models, particularly for medium-size, medium-scale data, even though it is the least efficient method.
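To make the difference between the Random-X (pairs) and Fixed-X (residual) bootstrap schemes concrete, the following rough sketch applies both to a plain least-squares fit on synthetic data; it is not the CMARS implementation used in the study, and the data, model and confidence level are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: y = 2 + 3x + noise.
n = 100
x = rng.uniform(0, 1, n)
y = 2 + 3 * x + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x])

def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_hat = ols(X, y)
fitted = X @ beta_hat
residuals = y - fitted

def random_x_bootstrap(X, y, n_boot=1000):
    """Pairs bootstrap: resample (x, y) rows jointly."""
    betas = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        betas.append(ols(X[idx], y[idx]))
    return np.array(betas)

def fixed_x_bootstrap(X, fitted, residuals, n_boot=1000):
    """Residual bootstrap: keep X fixed, resample the residuals."""
    betas = []
    for _ in range(n_boot):
        e_star = rng.choice(residuals, size=len(residuals), replace=True)
        betas.append(ols(X, fitted + e_star))
    return np.array(betas)

# 95% percentile intervals for the slope under each scheme.
for name, draws in [("random-x", random_x_bootstrap(X, y)),
                    ("fixed-x", fixed_x_bootstrap(X, fitted, residuals))]:
    lo, hi = np.percentile(draws[:, 1], [2.5, 97.5])
    print(f"{name}: slope 95% interval = ({lo:.3f}, {hi:.3f})")
```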
34

Resampling-based tuning of ordered model selection

Willrich, Niklas 02 December 2015 (has links)
In this thesis, the Smallest-Accepted method is presented as a new Lepski-type method for ordered model selection. In a first step, the method is introduced and studied in the case of estimation problems with known noise variance. The main building blocks of the method are a comparison-based acceptance criterion, relying on Monte-Carlo calibration of a set of critical values, and the choice of the smallest (in complexity) accepted model. The method can be used on a broad range of estimation problems such as function estimation, estimation of linear functionals and inverse problems. General oracle results are presented for the method in the case of probabilistic loss and for a polynomial loss function, and applications of the method to specific estimation problems are studied. In a next step, the method is extended to the case of an unknown, possibly heteroscedastic noise structure. The Monte-Carlo calibration step is replaced by a bootstrap-based calibration, and a new set of critical values is introduced which depends on the (random) observations. The theoretical properties of this bootstrap-based Smallest-Accepted method are then studied. It is shown, for normal errors under typical assumptions, that replacing the Monte-Carlo step by bootstrapping in the Smallest-Accepted method is valid if the underlying signal is Hölder continuous with index s > 1/4 and log(n) (p^2/n) is small, where n is the sample size and p the maximal model dimension.
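The following is a stylized sketch of the smallest-accepted selection rule for an ordered family of models (here polynomial fits of increasing degree), with critical values calibrated by Monte Carlo on pure-noise data with known variance; the model family, the noise level and the simplified calibration scheme are placeholders and not the constructions analysed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ordered family of models: polynomial fits of increasing degree.
degrees = [0, 1, 2, 3, 4, 5]
n, sigma = 80, 0.3                      # sample size and (known) noise level
x = np.linspace(0, 1, n)

def fit(deg, y):
    """Fitted values of a degree-`deg` polynomial least-squares fit."""
    return np.polyval(np.polyfit(x, y, deg), x)

def calibrate(n_mc=500, alpha=0.1):
    """Monte-Carlo critical values for pairwise model comparisons,
    computed from pure-noise data with the known variance."""
    diffs = {(m, k): [] for i, m in enumerate(degrees) for k in degrees[i + 1:]}
    for _ in range(n_mc):
        y0 = rng.normal(0, sigma, n)
        fits = {m: fit(m, y0) for m in degrees}
        for (m, k) in diffs:
            diffs[(m, k)].append(np.linalg.norm(fits[m] - fits[k]))
    return {pair: np.quantile(v, 1 - alpha) for pair, v in diffs.items()}

def smallest_accepted(y, crit):
    """Accept model m if it is close to every larger model; pick the smallest accepted."""
    fits = {m: fit(m, y) for m in degrees}
    for i, m in enumerate(degrees):
        if all(np.linalg.norm(fits[m] - fits[k]) <= crit[(m, k)]
               for k in degrees[i + 1:]):
            return m
    return degrees[-1]

crit = calibrate()
y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n)   # toy signal
print("selected degree:", smallest_accepted(y, crit))
```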
35

Optimising evolutionary strategies for problems with varying noise strength

Di Pietro, Anthony January 2007 (has links)
For many real-world applications of evolutionary computation, the fitness function is obscured by random noise. This interferes with the evaluation and selection processes and adversely affects the performance of the algorithm. Noise can be effectively eliminated by averaging a large number of fitness samples for each candidate, but the number of samples per candidate (the resampling rate) required to achieve this is usually prohibitively large, making the approach too time-consuming. Hence there is a practical need for algorithms that handle noise without eliminating it. Moreover, the amount of noise (its strength and distribution) may vary throughout the search space, further complicating matters. We study noisy problems for which the noise strength varies throughout the search space. Such problems have largely been ignored by previous work, which has instead focussed on the specific case where the noise strength is the same at all points in the search domain. However, this need not be the case, and indeed this assumption is false for many applications. For example, in games of chance such as Poker, some strategies may be more conservative than others and therefore less affected by the inherent noise of the game. This thesis makes three significant contributions in the field of noisy fitness functions. First, we present the concept of dynamic resampling. Dynamic resampling is a technique that varies the resampling rate based on the noise strength and fitness of each candidate individually; it is designed to exploit the variation in noise strength and fitness to yield a more efficient algorithm. We present several dynamic resampling algorithms and give results showing that dynamic resampling can perform significantly better than the standard resampling technique usually used by the optimisation community, and that dynamic resampling algorithms that vary their resampling rates based on both noise strength and fitness can perform better than algorithms that vary their resampling rate based on only one of the two. Second, we study a specific class of noisy fitness functions for which we counterintuitively find that it is better to use a higher resampling rate in regions of lower noise strength, and vice versa. We investigate how the evolutionary search operates on such problems, explain why this is the case, and present a hypothesis (with supporting evidence) for classifying such problems. Third, we present an adaptive engine that automatically tunes the noise compensation parameters of the search during the run, thereby eliminating the need for the user to choose these parameters ahead of time. This means that our techniques can be readily applied to real-world problems without requiring the user to have specialised domain knowledge of the problem they wish to solve. These three contributions present a significant addition to the body of knowledge on noisy fitness functions. Indeed, this thesis is the first work to examine specifically the implications of noise strength that varies throughout the search domain for a variety of noise landscapes, and thus starts to fill a large void in the literature on noisy fitness functions.
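As a rough illustration of the dynamic-resampling idea (not one of the thesis's algorithms), the sketch below keeps sampling a candidate's noisy fitness until the standard error of the running mean falls below a tolerance that tightens for apparently fitter candidates; the toy objective, noise model and thresholds are all invented for illustration.

```python
import random
import statistics

def noisy_fitness(candidate, rng):
    """Hypothetical noisy objective: true fitness plus candidate-dependent noise,
    so the noise strength varies across the search space."""
    true_fitness = 10.0 - sum(x * x for x in candidate)     # to be maximised
    noise_strength = 0.5 + abs(candidate[0])
    return true_fitness + rng.gauss(0.0, noise_strength)

def dynamic_resample(candidate, rng, min_samples=3, max_samples=100, base_tol=1.0):
    """Keep sampling until the standard error of the mean fitness drops below a
    tolerance that tightens for apparently fitter candidates, so the resampling
    rate adapts to both the observed noise and the observed fitness."""
    samples = [noisy_fitness(candidate, rng) for _ in range(min_samples)]
    while len(samples) < max_samples:
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        tol = base_tol / (1.0 + max(mean, 0.0))              # fitter => tighter tolerance
        if sem <= tol:
            break
        samples.append(noisy_fitness(candidate, rng))
    return statistics.mean(samples), len(samples)

rng = random.Random(7)
for cand in [(0.1, 0.2), (1.5, -0.8)]:
    est, n_used = dynamic_resample(cand, rng)
    print(f"candidate {cand}: estimated fitness {est:.2f} from {n_used} samples")
```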
36

Active learning in cost-sensitive environments

Liu, Alexander Yun-chung 21 June 2010 (has links)
Active learning techniques aim to reduce the amount of labeled data required for a supervised learner to achieve a certain level of performance. This can be very useful in domains where unlabeled data is easy to obtain but labeling data is costly. In this dissertation, I introduce methods for creating computationally efficient active learning techniques that handle different misclassification costs, different evaluation metrics, and different label acquisition costs. This is accomplished in part by developing techniques from utility-based data mining typically not studied in conjunction with active learning. I first address supervised learning problems where labeled data may be scarce, especially for one particular class. I revisit claims about resampling, a particularly popular approach to handling imbalanced data, and cost-sensitive learning. The presented research shows that while resampling and cost-sensitive learning can be equivalent in some cases, the two approaches are not identical. This work on resampling and cost-sensitive learning motivates the need for active learners that can handle different misclassification costs. After presenting a cost-sensitive active learning algorithm, I show that this algorithm can be combined with a proposed framework for analyzing evaluation metrics in order to create an active learning approach that can optimize any evaluation metric expressible as a function of terms in a confusion matrix. Finally, I address methods for active learning in terms of the different utility costs incurred when labeling different types of points, particularly when label acquisition costs are spatially driven.
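A minimal sketch of one way a cost-sensitive active learner can be set up (not the dissertation's algorithm): the expected misclassification cost is computed from predicted class probabilities and used to score which unlabeled points to query; the data, the cost matrix and the use of scikit-learn's LogisticRegression are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Placeholder binary data with a rare, expensive-to-miss positive class.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 1.2).astype(int)

cost = np.array([[0.0, 1.0],    # cost[true_class][predicted_class]
                 [10.0, 0.0]])  # missing a positive is ten times worse

# Seed the labeled pool with a few examples of each class.
pos = np.where(y == 1)[0][:5]
neg = np.where(y == 0)[0][:15]
labeled = list(np.concatenate([pos, neg]))
unlabeled = [i for i in range(len(y)) if i not in set(labeled)]

for _ in range(30):  # query 30 points, one at a time
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])       # columns follow clf.classes_ = [0, 1]
    expected_cost = proba @ cost                  # expected cost of predicting 0 or 1
    # Query the point whose best achievable expected cost is largest,
    # i.e. the point that remains most expensive to get wrong.
    pick = unlabeled[int(np.argmax(expected_cost.min(axis=1)))]
    labeled.append(pick)
    unlabeled.remove(pick)

print("labeled pool size:", len(labeled))
```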
37

Model adaptation techniques in machine translation

Shah, Kashif 29 June 2012 (has links) (PDF)
Nowadays several indicators suggest that the statistical approach to machine translation is the most promising. It allows fast development of systems for any language pair provided that sufficient training data is available. Statistical Machine Translation (SMT) systems use parallel texts - also called bitexts - as training material for creation of the translation model, and monolingual corpora for target language modeling. The performance of an SMT system heavily depends upon the quality and quantity of available data. In order to train the translation model, the parallel texts are collected from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. However, parallel data is quite inhomogeneous in many practical applications with respect to several factors like data source, alignment quality, appropriateness to the task, etc. This means that the corpora are not weighted according to their importance to the domain of the translation task. Therefore, it is the domain of the training resources that influences the translations that are selected among several choices. This is in contrast to the training of the language model, for which well-known techniques are used to weight the various sources of texts. We have proposed novel methods to automatically weight the heterogeneous data in order to adapt the translation model. In a first approach, this is achieved with a resampling technique. A weight is assigned to each bitext to select the proportion of data from that corpus, and the alignments coming from each bitext are resampled based on these weights. The weights of the corpora are directly optimized on the development data using a numerical method. Moreover, an alignment score of each aligned sentence pair is used as a confidence measure. In an extended work, we obtain such a weighting by resampling alignments using weights that decrease with the temporal distance of the bitexts to the test set. By these means, we can use all the available bitexts and still put an emphasis on the most recent ones. The main idea of our approach is to use a parametric form, or meta-weights, for the weighting of the different parts of the bitexts. This ensures that our approach has only few parameters to optimize. In another work, we have proposed a generic framework which takes into account corpus-level and sentence-level "goodness scores" during the calculation of the phrase table, which results in a better distribution of the probability mass over the individual phrase pairs.
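The following sketch illustrates the general idea of resampling bitext alignments according to corpus weights and per-sentence alignment scores; the corpora, weights and scores below are invented placeholders, not data or code from the thesis.

```python
import random

# Hypothetical bitexts: each corpus contributes aligned sentence pairs,
# here represented as (source, target, alignment_score) tuples.
bitexts = {
    "news_2010": [("src a", "tgt a", 0.9), ("src b", "tgt b", 0.7)],
    "web_crawl": [("src c", "tgt c", 0.4), ("src d", "tgt d", 0.6)],
    "in_domain": [("src e", "tgt e", 0.8)],
}

# Corpus weights, e.g. found by optimizing translation quality on a dev set.
weights = {"news_2010": 0.3, "web_crawl": 0.1, "in_domain": 0.6}

def resample_alignments(bitexts, weights, n_samples, seed=0):
    """Draw aligned sentence pairs with replacement, where each corpus is
    chosen in proportion to its weight and, within a corpus, pairs are
    drawn in proportion to their alignment (confidence) score."""
    rng = random.Random(seed)
    corpora = list(bitexts)
    corpus_w = [weights[c] for c in corpora]
    sample = []
    for _ in range(n_samples):
        corpus = rng.choices(corpora, weights=corpus_w, k=1)[0]
        pairs = bitexts[corpus]
        scores = [p[2] for p in pairs]
        sample.append(rng.choices(pairs, weights=scores, k=1)[0])
    return sample

resampled = resample_alignments(bitexts, weights, n_samples=10)
print(sum(1 for s in resampled if s in bitexts["in_domain"]), "pairs drawn from in_domain")
```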
38

Utilização de técnicas multivariadas na análise da divergência genética via modelo AMMI com reamostragem "bootstrap" / Use of multivariate techniques in the analysis of genetic diversity through the AMMI model with bootstrap resampling

Faria, Priscila Neves 01 October 2012 (has links)
In studies of genetic diversity using multivariate approaches, the Euclidean distance is the most commonly used measure, and it is the recommended choice when the data are scores of principal components, as in AMMI analysis (additive main effects and multiplicative interaction analysis). The AMMI method allows more precise estimates of genotypic responses and also permits the analysis of genetic diversity with agglomerative approaches. Furthermore, it combines, in a single model, additive components for the main effects (genotypes and environments) and multiplicative components for the genotype x environment interaction effects. Plant breeders understand the importance of the genotype x environment interaction for obtaining superior varieties, and dissimilarity estimates meet breeders' objectives since they quantify and describe the similarity or divergence between pairs of individuals. However, when the number of individuals is large, it becomes unfeasible to recognize homogeneous groups by visual examination of the distance estimates. It is therefore important to use cluster analysis, obtain dendrograms based on hierarchical methods and then analyze the resulting groups. In order to determine and classify the groups formed by the hierarchical clustering, specific commands of the R software were used, which draw a rectangle around each group in the dendrogram and number it. Thus, the objective of this work was to analyze genetic divergence through the AMMI model, using multivariate techniques and bootstrap resampling.
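As a rough illustration (in Python rather than the R workflow described above), the sketch below computes principal-component scores of a double-centred genotype-by-environment matrix, clusters the genotypes hierarchically on the Euclidean distances between scores, and uses bootstrap resampling of environments to gauge how stable the groups are; the data and the simplified AMMI-like decomposition are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)

# Hypothetical genotype-by-environment yield matrix (10 genotypes, 6 environments).
yields = rng.normal(5.0, 1.0, size=(10, 6))

def pc_scores(matrix, n_components=2):
    """Principal-component scores of the double-centred interaction matrix,
    loosely mimicking the multiplicative part of an AMMI analysis."""
    interaction = (matrix - matrix.mean(axis=0) - matrix.mean(axis=1, keepdims=True)
                   + matrix.mean())
    u, s, _ = np.linalg.svd(interaction, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def bootstrap_cluster_stability(matrix, n_boot=200, n_groups=3):
    """Resample environments (columns) with replacement and record how often
    each pair of genotypes ends up in the same cluster."""
    n_geno = matrix.shape[0]
    together = np.zeros((n_geno, n_geno))
    for _ in range(n_boot):
        cols = rng.integers(0, matrix.shape[1], matrix.shape[1])
        scores = pc_scores(matrix[:, cols])
        groups = fcluster(linkage(pdist(scores), method="average"),
                          t=n_groups, criterion="maxclust")
        together += (groups[:, None] == groups[None, :])
    return together / n_boot

stability = bootstrap_cluster_stability(yields)
print(np.round(stability[:4, :4], 2))   # co-clustering frequencies for 4 genotypes
```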
39

Pré-processamento de dados na identificação de processos industriais. / Pre-processing data in the identification of industrial processes.

Rodríguez Rodríguez, Oscar Wilfredo 01 December 2014 (has links)
This work studies the different stages of data pre-processing in system identification, namely filtering, normalization and sampling. The main goal is to condition the empirical data measured by the instruments of industrial processes so that, when these data are used for system identification, one can obtain mathematical models that represent the dynamics of the real process as closely as possible. The pre-processing techniques are also implemented in MATLAB 2012b, and tests are performed on the flow pilot plant installed at the Laboratory of Industrial Process Control of the Department of Telecommunications and Control Engineering of the Polytechnic School of USP, as well as on simulated industrial process plants whose mathematical models are known a priori. Finally, the performance of the pre-processing stages and their influence on the index of fit of the model to the real system, obtained by the cross-validation method, are analyzed and compared. The model parameters are obtained for infinite-step-ahead prediction.
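A minimal sketch of the three pre-processing stages discussed above (filtering, normalization and resampling) applied to a hypothetical measured signal; the signal, sampling rate, filter order and cut-off are placeholders, and the sketch is not the MATLAB implementation developed in the thesis.

```python
import numpy as np
from scipy.signal import butter, decimate, filtfilt

rng = np.random.default_rng(9)

# Hypothetical raw measurement: a slow process signal sampled at 100 Hz with noise.
fs = 100.0
t = np.arange(0, 60, 1 / fs)
u = np.sin(2 * np.pi * 0.05 * t)                                   # input (e.g. valve opening)
y = 0.8 * np.sin(2 * np.pi * 0.05 * t - 0.6) + 0.05 * rng.normal(size=t.size)

def preprocess(signal, fs, cutoff_hz=1.0, factor=10):
    """Low-pass filter, remove the operating point, scale to unit variance,
    and downsample - the pre-processing stages discussed above."""
    b, a = butter(4, cutoff_hz / (fs / 2))            # 4th-order Butterworth low-pass
    filtered = filtfilt(b, a, signal)                 # zero-phase filtering
    normalized = (filtered - filtered.mean()) / filtered.std()
    return decimate(normalized, factor)               # resample to fs / factor

u_id, y_id = preprocess(u, fs), preprocess(y, fs)
print("samples before/after:", t.size, y_id.size)
```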
40

Padrões de diversidade de aves e rede de interação mutualística ave-planta em mosaico floresta-campo / Patterns of bird diversity and the bird-plant mutualistic interaction network in a forest-grassland mosaic

Casas, Grasiela January 2015 (has links)
Classic studies on taxonomic diversity, though essential, do not consider the functional differences between species in a community. Studies using functional traits and functional diversity are filling this gap. Understanding the structure and dynamics of mutualistic interactions is also essential for biodiversity studies and allows the investigation of ecological and evolutionary mechanisms. However, most published networks are small in the number of species and interactions, and they are likely to be under-sampled. In addition, studies have demonstrated that many network metrics are sensitive to both sampling effort and network size. The aims of this thesis were: 1) to investigate bird taxonomic diversity (TD), functional diversity (FD), and patterns of trait convergence (TCAP: Trait Convergence Assembly Patterns) across forest-grassland transitions; 2) to analyse the structure of seed-dispersal networks between plants and birds using the metrics of nestedness, modularity, connectance and degree distribution; 3) to develop a statistical framework to assess sampling sufficiency for some of the most widely used metrics in network ecology, based on bootstrap resampling. Bird species composition indicated species turnover between forest, forest edge and grassland. Regarding TD, only forest and edges differed. FD was significantly different between grassland and forest, and between grassland and edges. TD and FD responded differently to environmental change from forest to grassland, since they may capture different processes of community assembly along such transitions. Trait-convergence assembly patterns indicated niche mechanisms underlying the assembly of bird communities, linked to changes in habitat structure across forest-edge-grassland transitions acting as ecological filters. Seed-dispersal mutualistic networks apparently show a common assembly process regardless of differences in sampling methodology or the continents where the 19 networks were sampled. Using bootstrap resampling, we found that sampling sufficiency can be reached at different sample sizes (number of interaction events) for the same dataset, depending on the metric of interest.
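As an illustration of the sampling-sufficiency idea in objective 3, the sketch below bootstraps a list of hypothetical bird-plant interaction events and tracks how the mean and spread of one network metric (connectance) change with sample size; the data, metric choice and sample sizes are assumptions, not the thesis's framework.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical list of observed interaction events, each a (bird, plant) pair.
birds, plants = 12, 8
events = [(rng.integers(birds), rng.integers(plants)) for _ in range(400)]

def connectance(event_sample, n_birds=birds, n_plants=plants):
    """Fraction of possible bird-plant links realised in the sample."""
    links = {(b, p) for b, p in event_sample}
    return len(links) / (n_birds * n_plants)

def bootstrap_metric(events, sample_size, n_boot=500):
    """Resample interaction events with replacement and return the
    bootstrap mean and standard deviation of the metric."""
    values = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(events), sample_size)
        values.append(connectance([events[i] for i in idx]))
    return float(np.mean(values)), float(np.std(values))

# Sampling sufficiency check: does the metric stabilise as the sample size grows?
for m in (50, 100, 200, 400):
    mean, sd = bootstrap_metric(events, m)
    print(f"sample size {m:3d}: connectance = {mean:.3f} +/- {sd:.3f}")
```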
