21 |
Adaptive data models in design / Adaptyvūs duomenų modeliai projektavime
Pliuskuvienė, Birutė, 27 June 2008
The dissertation examines the problem of adapting software that implements solutions to applied problems and whose instability is caused by changes in the content and structure of the primary data and in the algorithms of the applied problems being solved. The solution is based on a methodology for adapting models of data expressed as relational sets.
|
22 |
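Abordagem clássica e Bayesiana em modelos simétricos transformados aplicados à estimativa de crescimento em altura de Eucalyptus urophylla no Polo Gesseiro do Araripe-PE / Classical and Bayesian approaches in transformed symmetric models applied to estimating height growth of Eucalyptus urophylla in the Gypsum Pole of Araripe, PE
BARROS, Kleber Napoleão Nunes de Oliveira, 22 February 2010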
This work presents the nonlinear Chapman-Richards growth model with errors following the new class of transformed symmetric models, with Bayesian inference for the parameters. The objective was to apply this structure, via the Metropolis-Hastings algorithm, to select the equation that best estimates the heights of Eucalyptus urophylla clones from an experiment established at the Agronomic Institute of Pernambuco (IPA) in the city of Araripina. The Gypsum Pole of Araripe is an industrial zone in the upper backlands (alto sertão) of Pernambuco that consumes large amounts of firewood from native vegetation (caatinga) for gypsum calcination. In this scenario there is great need for an economically and environmentally viable solution that reduces the pressure on native vegetation. The genus Eucalyptus presents itself as an alternative because of its rapid development and versatility. Height has proven to be an important factor in predicting productivity and in selecting the best-adapted clones. One of the main growth curves is the Chapman-Richards model with normally distributed errors. However, alternatives have been proposed to reduce the influence of atypical observations generated by this model. The data were taken from a 72-month-old plantation. Inference and diagnostics were carried out for the transformed and untransformed models with several symmetric distributions. After the best equation was selected, convergence plots for the parameters were presented, along with other plots that confirm the fit to the data of the transformed symmetric Student's t model with 5 degrees of freedom, using Bayesian inference for the parameters.
|
23 |
Transformace dat pomocí evolučních algoritmů / Evolutionary Algorithms for Data Transformation
Švec, Ondřej, January 2017
In this work, we propose a novel method for a supervised dimensionality reduction, which learns weights of a neural network using an evolutionary algorithm, CMA-ES, optimising the success rate of the k-NN classifier. If no activation functions are used in the neural network, the algorithm essentially performs a linear transformation, which can also be used inside of the Mahalanobis distance. Therefore our method can be considered to be a metric learning algorithm. By adding activations to the neural network, the algorithm can learn non-linear transformations as well. We consider reductions to low-dimensional spaces, which are useful for data visualisation, and demonstrate that the resulting projections provide better performance than other dimensionality reduction techniques and also that the visualisations provide better distinctions between the classes in the data thanks to the locality of the k-NN classifier.
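The idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: a simple elitist evolution strategy with Gaussian mutation stands in for CMA-ES, the network has no activation function (so it is a plain linear map), and the toy data set, the function names, and all parameter values are assumptions made for illustration.

```python
import random


def project(W, x):
    # Linear map: each row of W produces one output coordinate.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loo_knn_accuracy(W, X, y, k=3):
    # Leave-one-out accuracy of a k-NN classifier in the projected space.
    Z = [project(W, x) for x in X]
    correct = 0
    for i, zi in enumerate(Z):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(zi, zj)), y[j])
            for j, zj in enumerate(Z)
            if j != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)
        correct += pred == y[i]
    return correct / len(X)


def evolve(X, y, out_dim=2, generations=30, offspring=8, sigma=0.3, seed=0):
    # Elitist evolution strategy (a stand-in for CMA-ES, assumed for brevity):
    # mutate the current best matrix and keep a child if it is not worse.
    rng = random.Random(seed)
    in_dim = len(X[0])
    best = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    start_fit = best_fit = loo_knn_accuracy(best, X, y)
    for _ in range(generations):
        for _ in range(offspring):
            child = [[w + rng.gauss(0, sigma) for w in row] for row in best]
            fit = loo_knn_accuracy(child, X, y)
            if fit >= best_fit:
                best, best_fit = child, fit
    return best, start_fit, best_fit


def toy_data(n_per_class=20, dim=5, seed=1):
    # Hypothetical data: only the first coordinate separates the two classes;
    # the remaining coordinates are high-variance noise.
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(2.0 * label, 0.5)]
            x += [rng.gauss(0, 3.0) for _ in range(dim - 1)]
            X.append(x)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    W, before, after = evolve(X, y)
    print(f"leave-one-out k-NN accuracy: {before:.2f} -> {after:.2f}")
```

Because the learned matrix W is linear, the squared distance between projected points equals a Mahalanobis-type distance with matrix W^T W, which is why the abstract describes the linear case as metric learning.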
|
24 |
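Att hitta en nål i en höstack: Metoder och tekniker för att sålla och gradera stora mängder ostrukturerad textdata / Finding a needle in a haystack: Methods and techniques for sifting and grading large amounts of unstructured text data
Pettersson, Emeli; Carlson, Albin, January 2019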
Big Data is currently a popular topic that can be used for many purposes. It can, for instance, be used to analyse data available online in the hope of identifying human rights violations. By applying techniques from areas such as Artificial Intelligence (AI), Information Retrieval (IR), and Visual Analytics, the company Globalworks Ltd. aims to identify individual voices in social media expressing grievances about oppression and such violations. Artificial Intelligence and Information Retrieval are broad fields, however, and have long been active areas of research. We have therefore conducted a systematic literature review to map existing research in these areas. Through a compilation of the literature, we provide an ontological overview of how an information system using these techniques could be structured, with which methods and technologies such a system can be developed, and how these can be combined.
|
25 |
Knowledge Base Augmentation from Spreadsheet Data: Combining layout inference with multimodal candidate classification
Heyder, Jakob Wendelin, January 2020
Spreadsheets constitute a valuable and notably large collection of documents within many enterprise organizations and on the Web. Although spreadsheets are intuitive to use and equipped with powerful functionality, extracting and transforming their data remains a cumbersome and mostly manual task. The great flexibility they offer the user results in data that is arbitrarily structured and hard for other applications to process. In this paper, we propose a novel architecture that combines supervised layout inference and multimodal candidate classification to allow knowledge base augmentation from arbitrary spreadsheets. In our design, we consider the need for repairing misclassifications and allow for verification and ranking of ambiguous candidates. We evaluate the performance of our system on two datasets, one with single-table spreadsheets and another with spreadsheets of arbitrary format. The evaluation shows that the proposed system achieves performance on single-table spreadsheets similar to that of state-of-the-art rule-based solutions. Additionally, the flexibility of the system allows us to process arbitrary spreadsheet formats, including horizontally and vertically aligned tables, multiple worksheets, and contextualizing metadata. This was not possible with existing purely text-based or table-based solutions. The experiments demonstrate that the system can achieve high effectiveness, with an F1 score of 95.71 on arbitrary spreadsheets that require the interpretation of surrounding metadata. The precision of the system can be further increased by applying candidate schema matching based on the semantic similarity of column headers.
|
26 |
電路設計中電流值之罕見事件的統計估計探討 / A study of statistical method on estimating rare event in IC Current
彭亞凌 (Peng, Ya Ling), Unknown Date
Estimating the tail distribution of current beyond 4 to 6 standard deviations from the mean is now a key issue in integrated circuit (IC) design quality. As accuracy requirements rise, generating rare events by Monte Carlo simulation becomes too time-consuming to be practical, and earlier approaches based on parametric-model extrapolation or regression analysis produce biased tail estimates because the relevant variables are hard to collect and operating voltages keep decreasing. This study therefore introduces a statistical method to improve the estimation of rare-event current values. The observations are first made approximately normal with the Box-Cox (power) transformation, which improves estimation of the tail of the distribution, and weighted regression is then used to extrapolate the tail values. Specifically, the explanatory variable is the empirical cumulative probability after a logarithm or z-score transformation, and the weighting scheme (down-weighting) emphasizes the information carried by extreme observations. The study also considers the case in which the complete set of variables can be collected, using circuit data as explanatory variables in the weighted regression. In addition to regression analysis, Extreme Value Theory (EVT) is adopted as an estimation method.
The methods are first evaluated by computer simulation, with data generated from normal, Student's t, or Gamma distributions and mean squared error as the criterion; the simulation results confirm the feasibility of weighted regression. The simulation results then guide the choice of sample-selection scheme for an empirical study using data from an IC company in Hsinchu. There are 10^8 observations, and the goal is to estimate tail values with probabilities 10^(-4), 10^(-5), 10^(-6), and 10^(-7) when only 10^5 observations are available. Weighted regression combined with the Box-Cox transformation estimates these left- and right-tail extreme current values accurately; the estimates are more accurate when the explanatory variable uses the logarithm transformation for the right tail and the z-score transformation for the left tail. Compared with the traditional methods and EVT, the proposed method performs best in estimating the tail probabilities. If circuit information can be collected and used as explanatory variables, weighted regression gives the most accurate estimates for left-tail rare events. Selecting only the extreme 1% of the sample and using the full data set each have advantages and drawbacks depending on the method, so both are worth considering. EVT with a 1% threshold gives stable, moderately accurate estimates across operating voltages and tends to be most accurate for short-range extrapolation, that is, when the tail probabilities to be estimated and the number of available observations are on a similar scale, e.g., probabilities 10^(-5) to 10^(-7) versus 10^5 observations.
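The right-tail procedure described in this abstract (Box-Cox transformation, then weighted regression on the log-transformed empirical probability, then extrapolation) might look roughly like the sketch below. Several details are assumptions, since the abstract does not give them: the Box-Cox parameter `lam` is fixed rather than estimated, the exceedance probability is taken as rank/(n+1), the weights are 1/rank so that the most extreme observations count most, and all function names are hypothetical.

```python
import math


def box_cox(x, lam):
    # Box-Cox power transformation; requires x > 0.
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inv_box_cox(y, lam):
    # Inverse of box_cox for the same lam.
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def weighted_line_fit(xs, ys, ws):
    # Weighted least squares for y = intercept + slope * x.
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_right_tail(sample, p_target, lam=0.5, tail_fraction=0.01):
    # Fit transformed tail values against log exceedance probability,
    # then extrapolate to the rarer probability p_target.
    data = sorted(sample)
    n = len(data)
    m = max(3, int(n * tail_fraction))
    xs, ys, ws = [], [], []
    for rank, value in enumerate(reversed(data[-m:]), start=1):
        xs.append(math.log(rank / (n + 1)))  # assumed plotting position
        ys.append(box_cox(value, lam))
        ws.append(1.0 / rank)                # assumed weighting scheme
    intercept, slope = weighted_line_fit(xs, ys, ws)
    return inv_box_cox(intercept + slope * math.log(p_target), lam)
```

For example, `estimate_right_tail(currents, 1e-6)` would extrapolate from the largest 1% of a positive-valued sample `currents` to the value exceeded with probability 10^(-6). The left-tail variant with a z-score transformation and the EVT estimator mentioned in the abstract are not shown.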
|
27 |
Validação de métodos para teste de germinação em sementes de espécies florestais da família Fabaceae / Validation of methods for germination testing of seeds of forest species of the Fabaceae family
Pereira, Vanderley José, 28 June 2012
Conselho Nacional de Desenvolvimento Científico e Tecnológico / CHAPTER II: Standardizing inter-laboratory results of germination tests for seeds of forest species requires methods that are robust and subject to low variability. The objective was therefore to compare and discuss different ways of calculating the coefficient of variation for normal seedlings in the method validation process for germination tests of seeds of 20 species of the Fabaceae family. The germination experiments for all species, including pre-germination treatments, counting times, substrate, temperature, and photoperiod, were validated according to the validation handbook of the International Seed Testing Association. Coefficients of variation for the experiment, by lot, and by laboratory were calculated for normal seedlings from the statistical analyses prescribed for method validation, such as exclusion of outliers, tests of the model assumptions, and analysis of variance. For normal seedlings of the 20 native Brazilian forest species, the coefficients of variation ranged from low (up to 9.84%) to medium (up to 17.66%), contrary to what was expected given the high genetic variability of these barely improved species. The increase in the coefficient of variation is not related to the dormancy-breaking treatment, but the coefficient grows as lot quality decreases. The high coefficients of variation estimated by laboratory, overestimated by the lot effect, are uniform, indicating that the methods are reproducible. The coefficient is not an indicator capable of predicting heterogeneity of variances and, consequently, the need for data transformation. Because the normal distribution models random events, randomness is present in the method validation process for the 20 species of the Fabaceae family, as supported by the normal distribution of the residuals. CHAPTER III: Many treatments for overcoming seed dormancy are described in the literature, but the consequences of these procedures for seedling development are rarely described. Given the relevance of the Fabaceae family in the context of dormancy-breaking treatments, seedlings and seeds of 10 forest species were evaluated quantitatively and qualitatively to determine damage and infections caused by invasive treatments. After a literature review, methods considered efficient for each species were chosen. Experiments were conducted in randomized block and completely randomized designs, with seeds placed on germitest paper and the rolls arranged in a germination chamber under continuous white fluorescent light at 25 °C. After statistical analysis, the method that produced the highest germinability, the highest percentage of normal seedlings, and the lowest percentages of abnormal seedlings and of dead and hard seeds was selected, together with another method related to the same procedure and the heat treatment; scarification, cutting, and heat treatment were thus applied to the seeds. Root protrusion, when used as the sole criterion of germination, overestimates the efficiency of dormancy-breaking treatments for Fabaceae seeds, although it is an efficient indicator of germination potential. Scarification and cutting are efficient dormancy-breaking treatments for seeds of Enterolobium contortisiliquum, Parkia pendula, Senna multijuga, and Senna macranthera, as is cutting for seeds of Mimosa caesalpiniaefolia, when preceded and followed by asepsis with sodium hypochlorite. However, they are inefficient for seeds of Dimorphandra mollis and Enterolobium maximum, because they increase the percentage of infected abnormal seedlings, and for seeds of Stryphnodendron adstringens, because they increase the percentage of dead seeds. In Erythrina speciosa and Erythrina velutina, the high percentages of damaged abnormal seedlings recorded from scarified and cut seeds cannot be attributed exclusively to the treatments: the abnormality in which the root system becomes trapped in the seed coat and coils is also very frequent in seeds without any pretreatment. The moist heat treatment is inefficient at overcoming dormancy of Fabaceae seeds, even with subsequent imbibition, because of non-uniformity and the high percentages of hard seeds remaining at the end of the test. / Master in Agronomy
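As a small illustration of the statistic discussed in Chapter II, the sketch below computes a plain sample coefficient of variation (100 × standard deviation / mean) for each lot. The lot names and seedling percentages are invented for illustration, and the abstract does not state the exact formulas used in the thesis, which may instead be based on the residual mean square from the analysis of variance.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: 100 * s / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; the coefficient is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical percentages of normal seedlings, four replicates per lot.
lots = {
    "lot_high_quality": [80, 82, 78, 84],
    "lot_low_quality": [40, 52, 35, 49],
}
cv_by_lot = {name: coefficient_of_variation(v) for name, v in lots.items()}

for name, cv in cv_by_lot.items():
    print(f"{name}: CV = {cv:.2f}%")
```

In this invented data the lower-quality lot has the larger coefficient, which mirrors the abstract's observation that the coefficient grows as lot quality decreases.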
|
28 |
Produção de tomateiro cultivado com acrescent solus® / Yield of tomato cultivated with acrescent solus®
Schwertner, Diogo Vanderlei, 15 February 2012
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The objectives of this study were: (i) to verify compliance with the assumptions of the mathematical model and to identify data transformations for productive and morphological variables of tomato in experiments in plastic tunnel and in the field, considering each harvest date and grouped harvests in the spring-summer and autumn-winter seasons; and (ii) to evaluate the effect of applying Acrescent Solus® as a complement to and/or substitute for mineral topdressing fertilization on productive and morphological variables, fruit quality, and leaf color of tomato grown in plastic tunnel and in the field in the spring-summer and autumn-winter seasons. Two experiments were conducted in plastic tunnel (spring-summer and autumn-winter) and one in the field (spring-summer), all in a randomized block design with three replications in a 2x4 factorial arrangement with four additional controls. Tukey's test for non-additivity, the Lilliefors test, Bartlett's test, and the runs test were used to verify, respectively, additivity of the model and normality, homogeneity, and randomness of the errors. Where an assumption was violated, the data were transformed and the model assumptions were checked again. The productive and morphological variables of tomato meet the assumption of additivity of the model. Violations occur for the assumptions of normality, homogeneity, and randomness of errors. Grouping all harvests for analysis brings the productive and morphological variables into compliance with the assumptions of normality and homogeneity of errors. The transformation that brings the largest proportion of productive and morphological variables into compliance with the assumptions of normality, homogeneity, and randomness of errors is the square root. In spring-summer, Acrescent Solus® does not affect the productive and morphological variables or the fruit quality of tomato, whereas in autumn-winter it reduces the length, width, and mass of fruits and likewise does not affect fruit quality. Increasing the dose of Acrescent Solus®, combined with mineral topdressing fertilization, intensifies the green color of the leaves from the second and fourth fruit harvests onward in spring-summer and autumn-winter, respectively, but without a favorable correlation with the expression of the productive and morphological variables of tomato.
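The effect of a square-root transformation on variance heterogeneity can be seen in a toy example. The sketch below is only an illustration under stated assumptions: the two groups are invented numbers whose spread grows with their mean, and the max/min variance ratio is a crude stand-in for a homogeneity check, not Bartlett's test as used in the study.

```python
import math
import statistics


def variance_ratio(groups):
    # Largest sample variance divided by the smallest; 1.0 means equal spread.
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)


def sqrt_transform(groups):
    # Apply the square-root transformation to every observation.
    return [[math.sqrt(v) for v in g] for g in groups]


# Hypothetical counts for two treatments; spread increases with the mean.
groups = [[4, 9, 16], [64, 81, 100]]

ratio_raw = variance_ratio(groups)
ratio_sqrt = variance_ratio(sqrt_transform(groups))
print(f"variance ratio before: {ratio_raw:.2f}, after sqrt: {ratio_sqrt:.2f}")
```

After the transformation the two groups have the same variance, which is the kind of improvement the data transformations in the study aim for before the assumptions are tested again.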
|
29 |
Migrace systémové databáze elektronického obchodu / E-commerce System Database Migration
Zkoumalová, Barbora, January 2016
The objective of this master's thesis is the design and creation of a tool for migrating an e-commerce system database from the ZenCart platform to the PrestaShop platform. Both system databases are described and analysed; based on the information gained, the migration tool is created according to the customer's requirements, and the final data migration from the original database to the new one is then carried out.
|
30 |
Contributions to Engineering Big Data Transformation, Visualisation and Analytics. Adapted Knowledge Discovery Techniques for Multiple Inconsistent Heterogeneous Data in the Domain of Engine Testing
Jenkins, Natasha N., January 2022
In the automotive sector, engine testing generates vast data volumes that
are mainly beneficial to requesting engineers. However, these tests are often
not revisited for further analysis due to inconsistent data quality and
a lack of structured assessment methods. Moreover, the absence of a tailored
knowledge discovery process hinders effective preprocessing, transformation,
analytics, and visualization of data, restricting the potential for
historical data insights. Another challenge arises from the heterogeneous
nature of test structures, resulting in varying measurements, data types,
and contextual requirements across different engine test datasets.
This thesis aims to overcome these obstacles by introducing a specialized
knowledge discovery approach for the distinctive Multiple Inconsistent
Heterogeneous Data (MIHData) format characteristic of engine testing.
The proposed methods include adapting data quality assessment and reporting,
classifying engine types through compositional features, employing modified dendrogram similarity measures for classification, performing customized feature extraction, transformation, and structuring, generating and manipulating synthetic images to enhance data visualization, and
applying adapted list-based indexing for multivariate engine test summary
data searches.
The thesis demonstrates how these techniques enable exploratory analysis,
visualization, and classification, presenting a practical framework to
extract meaningful insights from historical data within the engineering
domain. The ultimate objective is to facilitate the reuse of past data resources,
contributing to informed decision-making processes and enhancing
comprehension within the automotive industry. Through its focus on
data quality, heterogeneity, and knowledge discovery, this research establishes
a foundation for optimized utilization of historical Engine Test Data
(ETD) for improved insights. / Soroptimist International Bradford
|