41 |
A Recurrent Neural Network For Battery Capacity Estimations In Electrical Vehicles. Corell, Simon, January 2019
This study investigates whether a recurrent long short-term memory (LSTM) neural network can be used to estimate the battery capacity in electric cars. There is enormous interest in finding the underlying reasons why and how Lithium-ion batteries age, and this study is part of that broader question. The research questions answered are how well an LSTM model estimates the battery capacity, how the LSTM model performs compared to a linear model, and which parameters are important when estimating the capacity. Other studies have covered similar topics, but only a few have been performed on a real data set from real cars driving. Using a data science approach, it was found that the LSTM model is indeed a powerful model for estimating the capacity. It had better accuracy than a linear regression model, although the linear regression model still gave good results. The parameters that appeared to be important when estimating the capacity were logically related to the properties of a Lithium-ion battery. A study of how well a recurrent neural network can estimate the capacity of Lithium-ion batteries in electric vehicles, using a data science approach.
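The gating mechanism that lets an LSTM track a slowly drifting quantity such as battery capacity can be sketched in a few lines. This is a generic single-cell forward pass in NumPy, not the thesis's model; the feature dimension, hidden size, and weights are all illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates decide what to forget, add, and output.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias.
    Gate order in the stacked matrices: input, forget, cell candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])      # output gate
    c = f * c_prev + i * g       # new cell state (long-term memory)
    h = o * np.tanh(c)           # new hidden state (short-term output)
    return h, c

# Run a toy sequence of "battery feature" vectors through the cell.
rng = np.random.default_rng(0)
D, H, T = 3, 5, 10               # feature dim, hidden dim, sequence length
W = rng.normal(scale=0.1, size=(4*H, D))
U = rng.normal(scale=0.1, size=(4*H, H))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

The forget gate is what makes the architecture attractive here: the cell state can carry capacity-relevant information across long charge/discharge sequences.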
|
42 |
Normalization and analysis of high-dimensional genomics data. Landfors, Mattias, January 2012
In the mid-1990s the microarray technology was introduced. The technology allowed for genome-wide analysis of gene expression in a single experiment. Since its introduction, similar high-throughput methods have been developed in other fields of molecular biology. These high-throughput methods provide measurements for hundreds up to millions of variables in a single experiment, and rigorous data analysis is necessary in order to answer the underlying biological questions. Further complications arise because technological variation is introduced into the data, owing to the complexity of the experimental procedures. This technological variation needs to be removed in order to draw relevant biological conclusions from the data. The process of removing the technical variation is referred to as normalization or pre-processing. During the last decade a large number of normalization and data analysis methods have been proposed. In this thesis, data from two types of high-throughput methods are used to evaluate the effect pre-processing methods have on downstream analyses. In areas where problems in current methods are identified, novel normalization methods are proposed. The evaluations of known and novel methods are performed on simulated data, real data, and data from an in-house produced spike-in experiment.
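Quantile normalization is a standard microarray pre-processing method of the kind evaluated here (the abstract does not name specific methods, so this is one representative example): every array is forced to share the same empirical distribution, removing array-wide technical shifts while preserving ranks.

```python
import numpy as np

def quantile_normalize(X):
    """Force every column (array) of X to share the same distribution.

    Rank the values within each column, then replace each value by the
    mean across columns at that rank.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # per-column ranks
    mean_sorted = np.sort(X, axis=0).mean(axis=1)       # reference quantiles
    return mean_sorted[ranks]

# Two "arrays" with a shared biological signal but different scales/offsets,
# mimicking technical variation between experiments.
rng = np.random.default_rng(1)
signal = rng.normal(size=200)
X = np.column_stack([2.0 * signal + 1.0, 0.5 * signal - 3.0])
Xn = quantile_normalize(X)
```

After normalization the two columns have identical sorted values, so any remaining between-array differences reflect rank changes rather than scale or offset.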
|
43 |
O efeito do uso de diferentes formas de extração de termos na compreensibilidade e representatividade dos termos em coleções textuais na língua portuguesa / The effect of using different forms of terms extraction on its comprehensibility and representability in Portuguese textual domains. Merley da Silva Conrado, 10 September 2009
The task of term extraction in textual collections, a subtask of the pre-processing step in Text Mining, can be used for many purposes in knowledge extraction processes. These terms must be carefully extracted, since the results of the whole process depend largely on the quality of the terms obtained. In this work, term quality covers both representativeness in the specific domain and comprehensibility. Given this importance, this work evaluated the effect that different term simplification techniques have on the comprehensibility and representativeness of terms in Portuguese text collections. The terms were extracted following the methodology presented in this work, and the techniques applied during extraction were stemming, lemmatization, and nominalization. To support this methodology, a term extraction tool, ExtraT, was developed. To guarantee the quality of the extracted terms, they were evaluated both objectively and subjectively. The subjective evaluations, performed with the help of domain specialists, cover the representativeness of the terms in their respective documents, the comprehensibility of the terms obtained with each technique, and the specialists' overall preference among the techniques. The objective evaluations, supported by TaxEM (an XML taxonomy from Embrapa) and by Thesagro (National Agricultural Thesaurus), consider the number of terms extracted by each technique as well as the representativeness of the extracted terms with respect to their documents, using the CTW (Context Term Weight) measure as support. Eight collections of real texts from the agribusiness domain were used in the experimental evaluation. As a result, some positive and negative characteristics of the term simplification techniques were identified, showing that the choice of technique for this domain depends on the main pre-established goal, which may range from the need for terms that are comprehensible to the user to the need to work with a smaller number of terms.
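The contrast between two of the simplification techniques the thesis compares, stemming (crude suffix stripping) versus lemmatization (dictionary lookup), can be shown with a toy sketch. The suffix list and lemma table below are invented for illustration and are far cruder than real Portuguese stemmers such as RSLP.

```python
# Toy contrast: stemming chops suffixes and may produce unreadable stems;
# lemmatization maps inflected forms to a readable dictionary form.
SUFFIXES = ["amento", "mente", "ação", "ções", "ando", "ar", "s"]
LEMMAS = {"extraídos": "extrair", "extração": "extrair", "termos": "termo"}

def stem(word):
    """Strip the longest matching suffix, keeping a minimal stem length."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def lemmatize(word):
    """Dictionary lookup; unknown words pass through unchanged."""
    return LEMMAS.get(word, word)

words = ["extração", "extraídos", "termos"]
stems = [stem(w) for w in words]
lemmas = [lemmatize(w) for w in words]
```

Here lemmatization collapses "extração" and "extraídos" to the same comprehensible lemma, while stemming yields the opaque stem "extr" for one and leaves the other nearly intact, illustrating the comprehensibility/representativeness trade-off the thesis measures.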
|
44 |
Visualização de operações de junção em sistemas de bases de dados para mineração de dados. / Visualization of join operations in DBMS for data mining. Maria Camila Nardini Barioni, 13 June 2002
In recent decades, the capacity of companies to generate and collect information has increased rapidly. This explosion in the volume of data created the need for new techniques and tools that could not only process this enormous quantity of data but also support its analysis for the discovery of useful information, in an intelligent and automatic way. This gave rise to a prominent research field for extracting information from databases, called Knowledge Discovery in Databases (KDD), in which data mining (DM) techniques play a leading role. The results of applying data mining techniques on datasets are highly dependent on proper data preparation. Therefore, in traditional DM processes the mining step is normally preceded by a pre-processing step, in which the data to be mined are integrated into a single relation. An important problem faced during this step is that, most of the time, the analyst does not yet have a clear idea of which portions of the data should be mined. Taking advantage of the strong ability of human beings to interpret data represented graphically, this work develops a technique to visualize data stored in multiple relations of a relational database, helping the analyst prepare the data to be mined. The technique allows the DM step to be applied over multiple relations simultaneously, bringing the join operations into this step. In general, the use of joins in DM tools is not practical, due to the high computational cost associated with join operations. However, the performance evaluations of the proposed technique show that it reduces this cost significantly, making it possible to visually explore multiple relations in an interactive way.
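The pre-processing step described above, joining multiple relations into the single table that a mining step consumes, can be sketched with plain SQL. The schema and values are invented for illustration; the thesis's contribution is making this join interactive and visual rather than a batch step.

```python
import sqlite3

# Two small relations joined and aggregated into one mining-ready table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE purchase (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'north'), (2, 'south');
    INSERT INTO purchase VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")
rows = con.execute("""
    SELECT c.id, c.region, SUM(p.amount) AS total
    FROM customer c JOIN purchase p ON p.customer_id = c.id
    GROUP BY c.id, c.region
    ORDER BY c.id
""").fetchall()
```

The cost concern in the abstract is exactly this join: for large tables it dominates, which is why the proposed technique's reduction of join cost matters for interactivity.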
|
45 |
Predicting the impact of prior physical activity on shooting performance / Prediktion av tidigare fysisk aktivitets inverkan på skytteprestanda. Berkman, Anton; Andersson, Gustav, January 2019
The objectives of this thesis were to develop a machine learning tool-chain and to investigate the relationship between heart rate, trigger squeeze, and shooting accuracy when firing a handgun in a simulated environment. Several aspects affect the accuracy of a shooter. To accelerate the learning process and to complement the instructors, different sensors can be used by the shooter. By extracting sensor data and presenting it to the shooter in real time, the rate of improvement can potentially be accelerated. An experiment replicating precision shooting was conducted at SAAB AB using their GC-IDT simulator. Fourteen participants, with experience ranging from zero to over 30 years, took part. The participants were randomly divided into two groups, where one group started the experiment with a heart rate of at least 150 beats per minute. The iTouchGlove2.3 was used to measure trigger squeeze, and a Polar H10 heart rate belt was used to measure heart rate. Random forest regression was then used to predict accuracy from the data collected in the experiment. A machine learning tool-chain was successfully developed to process raw sensor data, which was then used by a random forest regression algorithm to form a prediction. This thesis provides insights and guidance for further experimental explorations of handgun exercises and shooting performance.
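A minimal sketch of the modeling step, using scikit-learn's random forest regressor on synthetic stand-in data; the real features, sample size, and effect sizes are not given in the abstract, so the relationship below (accuracy degrading with heart rate and uneven trigger squeeze) is an assumption for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the experiment's sensor data.
rng = np.random.default_rng(42)
n = 300
heart_rate = rng.uniform(60, 180, n)          # beats per minute
trigger_jerk = rng.uniform(0, 1, n)           # unevenness of trigger squeeze
accuracy = 100 - 0.2 * heart_rate - 25 * trigger_jerk + rng.normal(0, 2, n)

X = np.column_stack([heart_rate, trigger_jerk])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, accuracy)
pred = model.predict(X)
```

One reason a random forest suits this setting is that it captures nonlinear interactions (e.g., elevated heart rate mattering more for novices) without manual feature engineering, and `feature_importances_` gives a rough ranking of which sensor signals drive the prediction.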
|
46 |
Improvement of Optical Character Recognition on Scanned Historical Documents Using Image Processing. Aula, Lara, January 2021
As an effort to improve accessibility to historical documents, digitization of historical archives has been an ongoing process at many institutions since the origination of Optical Character Recognition (OCR). Old, scanned documents can contain deterioration acquired over time or caused by old printing methods. Common visual attributes of these documents are variations in style and font, broken characters, varying ink intensity, noise, and damage caused by folding or ripping, among others. Many of these attributes are unfavorable for modern OCR tools and can lead to failed character recognition. This study approaches the stated problem by using image processing methods to improve the result of character recognition. Furthermore, common image quality characteristics of scanned historical documents with unidentifiable text are analyzed. The OCR tool used to conduct this research was the open-source Tesseract software. Image processing methods such as Gaussian lowpass filtering, Otsu's optimum thresholding method, and morphological operations were used to prepare the historical documents for Tesseract. The OCR output was evaluated using precision and recall, showing that recall improved by 63 percentage points and precision by 18 percentage points. This shows that using image pre-processing methods to increase the readability of historical documents for OCR tools is effective. It was further seen that characteristics especially disadvantageous for Tesseract are font deviations, the occurrence of non-belonging objects, character fading, broken characters, and Poisson noise.
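Otsu's method, one of the pre-processing steps named above, picks the binarization threshold that maximizes between-class variance of the gray-level histogram. A self-contained NumPy sketch on a synthetic "scanned page" (not the study's data):

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level maximizing between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 (dark) probability
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # zero out undefined extremes
    return int(np.argmax(sigma_b))

# Synthetic page: dark ink (~40) on light paper (~200), plus sensor noise.
rng = np.random.default_rng(0)
page = np.where(rng.random((64, 64)) < 0.2, 40, 200)
page = np.clip(page + rng.normal(0, 10, page.shape), 0, 255).astype(np.uint8)
t = otsu_threshold(page)
binary = page > t                           # True = background (paper)
```

In a full pipeline a Gaussian lowpass filter would run before this step to suppress noise, and morphological opening/closing afterwards to repair broken strokes, matching the order described in the abstract.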
|
47 |
Streamlining 3D City Modeling for Urban Flow Simulations by Automatic Integration of Multisource Topography. Lindroth, Klara, January 2023
In the workflow of computational fluid dynamics (CFD), geometry preparation is commonly the most time-consuming step. For fast CFD simulation, automatic surface reconstruction to obtain 3D city models of a chosen area is essential. To address this need, a literature study was conducted to map available data suitable for 3D city models; the properties investigated included geographical coverage, resolution, accuracy, and licensing. Surface reconstruction using different topographical data was carried out with the 3D finite element mesh generator Gmsh and various GIS analysis tools. The literature study found no global data enabling a fully automatic solution with sufficient results; however, the open geographic database OpenStreetMap has potential for future work. Today, the method developed in this project is restricted to country-by-country applications and uses a terrain model, LiDAR data, and building footprints as input data. The generated 3D city model has a level of detail of 1.2, consisting of valid geometries without self-intersections, overlaps, or gaps. The method is a semi-automatic workflow that takes less than one hour from the extraction of data to a simulation-ready 3D city model. The model shows satisfactory agreement with the reference material but needs improvements in the detail of the height setting for more accurate airflow simulations. The method contributes to the field of automatic 3D city model reconstruction. Future work includes improvements regarding level of detail and automation of data attainment.
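In a block-style city model like the LoD 1.x one described, each building is reduced to its footprint extruded to a flat roof at a height derived from LiDAR. A minimal sketch of that extrusion step; the coordinates, ground elevation, and height are invented, and real pipelines would read these from the footprint and point-cloud data.

```python
def extrude_footprint(footprint, ground, height):
    """Return vertices and quad wall faces of a flat-roofed block model.

    footprint: list of (x, y) outline points, counter-clockwise.
    ground: terrain elevation; height: building height from LiDAR stats.
    """
    n = len(footprint)
    bottom = [(x, y, ground) for x, y in footprint]
    top = [(x, y, ground + height) for x, y in footprint]
    verts = bottom + top
    # One quad per outline edge, connecting bottom ring to top ring.
    walls = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    return verts, walls

footprint = [(0, 0), (10, 0), (10, 6), (0, 6)]   # building outline (metres)
verts, walls = extrude_footprint(footprint, ground=12.0, height=8.5)
```

The validity requirements mentioned in the abstract (no self-intersection, overlaps, or gaps) are exactly the properties that must hold on the footprint polygons before extrusion, otherwise the CFD mesher downstream fails.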
|
48 |
Developing an Architecture Framework for Cloud-Based, Multi-User, Finite Element Pre-Processing. Briggs, Jared Calvin, 14 November 2013
This research proposes an architecture for a cloud-based, multi-user FEA pre-processing system, where multiple engineers can access and operate on the same model in a parallel environment. A prototype is discussed and tested, the results of which show that a multi-user pre-processor, where all computing is done on a central server hosted on a high-performance system, provides significant benefits to the analysis team. These benefits include shortened pre-processing time and potentially higher-quality models.
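The central-server design can be caricatured in a few lines: every client edit mutates one shared model under a lock, so all engineers see a single consistent state rather than diverging local copies. This is a hypothetical sketch of the concept, not the prototype's actual API.

```python
import threading

class ModelServer:
    """Toy central store for a shared FE model; names are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}                  # node_id -> (x, y, z)

    def add_node(self, node_id, xyz):
        with self._lock:                 # serialize concurrent edits
            self.nodes[node_id] = xyz

# Simulate many "engineers" editing the same model concurrently.
server = ModelServer()
threads = [
    threading.Thread(target=server.add_node, args=(i, (i * 1.0, 0.0, 0.0)))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real system would replace the lock with finer-grained conflict handling (per-region locks or operational transforms), but the invariant is the same: the authoritative model lives on the server.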
|
49 |
Machine translation of proper names from English and French into Vietnamese: an error analysis and some proposed solutions / Traduction automatique des noms propres de l'anglais et du français vers le vietnamien : analyse des erreurs et quelques solutions. Phan Thi Thanh, Thao, 11 March 2014
Machine translation (MT) has increasingly become an indispensable tool for decoding the meaning of a text from a source language into a target language in our current information and knowledge era. In particular, MT of proper names (PN) plays a crucial role in providing the specific and precise identification of persons, places, organizations, and artefacts across languages. Despite a large number of studies and significant achievements in named entity recognition in the NLP community around the world, there has been almost no research on proper name machine translation (PNMT) for the Vietnamese language. Due to the different conventions for writing, transliterating or transcribing, and translating proper names from a variety of languages, including English, French, Russian, and Chinese, into Vietnamese, PNMT from those languages into Vietnamese remains a challenging and problematic issue. This study focuses on the problems of English-Vietnamese and French-Vietnamese PNMT arising from current MT engines. It first proposes a corpus-based PN classification, then a detailed PNMT error analysis, and concludes with pre-processing solutions to improve MT quality. Through the analysis and classification of PNMT errors from the two English-Vietnamese and French-Vietnamese parallel corpora of texts with PNs, we propose solutions concerning two major issues: (1) corpus annotation for preparing the pre-processing databases, and (2) design of the pre-processing program to be used on annotated corpora to reduce PNMT errors and enhance the quality of MT systems, including Google, Vietgle, Bing, and EVTran. The efficacy of different annotation methods for the English and French corpora of PNs, and the PNMT error rates before and after applying the pre-processing program to the two annotated corpora, are compared and discussed in this study. They show that the pre-processing solution significantly reduces PNMT errors and contributes to improving machine translation into the Vietnamese language.
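One common pre-processing tactic for PNMT, masking annotated proper names so the MT engine cannot mangle them and restoring them after translation, can be sketched as follows. The `<pn>` annotation format and the placeholder scheme are assumptions for illustration, not the thesis's actual design, and the "engine" here is a no-op stand-in.

```python
import re

def shield(text):
    """Replace annotated proper names with opaque placeholders."""
    names = []
    def repl(m):
        names.append(m.group(1))
        return f"__PN{len(names) - 1}__"
    return re.sub(r"<pn>(.*?)</pn>", repl, text), names

def restore(text, names):
    """Put the original proper names back after translation."""
    for i, name in enumerate(names):
        text = text.replace(f"__PN{i}__", name)
    return text

src = "<pn>Hanoi</pn> is the capital of <pn>Vietnam</pn>."
masked, names = shield(src)
translated = masked          # stand-in for a call to an MT engine
out = restore(translated, names)
```

The point of the placeholder is that the engine translates the surrounding sentence while the shielded names pass through untouched, which is precisely the class of PNMT error (mistranslated or phonetically mangled names) the thesis's pre-processing program targets.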
|
50 |
Polyphenolanalyse in gartenbaulichen Produkten auf der Basis laser-induzierter Fluoreszenzspektroskopie / Polyphenol analysis in horticultural products based on laser-induced fluorescence spectroscopy. Wulf, Janina Saskia, 11 April 2007
During recent years several research groups have focused on the development of non-destructive product monitoring methods to improve process management for horticultural products along the entire supply chain. Optical methods have been applied for fruit monitoring in production and postharvest processes using mobile measuring systems or NIR sorting lines. The aim of the present study was to quantitatively determine health-promoting native fruit polyphenols by means of laser-induced fluorescence spectroscopy. The variance in the fluorescence signal was recorded on apples and carrots stored under different conditions. With the help of principal component analysis, the fluorescence spectra were evaluated to visualize senescence effects during storage. Different data pre-processing methods were tested for a descriptive factor analysis, regarding the wavelength-dependent intensities as variables. In a complex fruit matrix, however, the quantitative determination of fruit compounds is influenced by the fluorescence quantum yield as well as by reabsorption and quenching effects. The influence of such side-effects was studied on phenol standards, fruit extracts, and sliced fruit tissue, and the spectral data were corrected using new data pre-processing methods. Calibration models for the polyphenol analyses were built on the fruit fluorescence spectra (apples, strawberries), using the chromatographic analysis of hydroxycinnamic acids as a reference. The uncertainty of the models was evaluated by their root mean square errors of calibration and cross-validation. The feasibility of non-destructive analysis in practice is affected by the high variability of horticultural products; therefore, the models were validated on an independent test set. The mathematical pre-processing method of direct orthogonal signal correction removed the non-relevant information in the spectral data and resulted in the lowest errors. In comparison, the empirical approach often applied in fluorescence spectroscopy, correcting with simultaneously recorded reflectance spectra, did not improve the calibration models.
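The PCA step used above to visualize senescence effects can be sketched with an SVD on mean-centered spectra. The data here are synthetic: two "storage groups" of fluorescence-like spectra that differ only in overall intensity, so the first principal component should separate them, as group differences did in the thesis's storage experiments.

```python
import numpy as np

# Synthetic "fluorescence spectra": a Gaussian emission band, with one
# storage group fluorescing more strongly than the other.
rng = np.random.default_rng(7)
wavelengths = 120
base = np.exp(-0.5 * ((np.arange(wavelengths) - 60) / 15.0) ** 2)
group_a = base + rng.normal(0, 0.02, (20, wavelengths))
group_b = 1.4 * base + rng.normal(0, 0.02, (20, wavelengths))
X = np.vstack([group_a, group_b])            # 40 samples x 120 wavelengths

Xc = X - X.mean(axis=0)                      # mean-center each wavelength
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                               # sample scores on the PCs
explained = s**2 / np.sum(s**2)              # variance ratio per component
```

Because the between-group intensity difference dominates the noise, nearly all variance loads on PC1 and the two groups land on opposite sides of zero in the PC1 scores, which is the kind of storage-condition separation the thesis visualizes.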
|