Global ETD Search

31	Resource-Efficient Data Pre-Processing for Deep Learning Zawawi, Omar 04 1900 It is projected that by 2026, most workloads in cloud data centers will be Deep Learning (DL) workloads. However, these workloads pose significant challenges due to their high computational demands, requiring infrastructure and platform advancements to meet DL’s performance, efficiency, and scalability requirements. One emerging problem in large-scale DL is the data stall issue, which occurs when DL models require extensive input data pre-processing, causing CPUs to struggle to keep up with the data consumption demands of GPUs during the training stage. This results in the DL pipeline stalling and GPUs running idle. Our work aims to fundamentally address the data stall issue in modern pre-processing DL pipelines. Traditional solutions involve allocating more CPUs to the pre-processing stage to meet GPU demands, but this approach significantly increases energy consumption and provisioning costs. For example, Meta recently disclosed that their DLRM pipeline requires 9 to 55 CPU servers per trainer node, depending on the workload. Our research explores offloading common pre-processing primitives to programmable network hardware, specifically Tofino2-equipped switches known for their high bandwidth and energy efficiency, and the Bluefield-2 SmartNIC. Our initial power measurements demonstrate that Tofino2 and Bluefield-2 achieve 11.6x and 3.0x higher throughput per Watt, respectively, compared to a generic x86 or AMD CPU while performing pre-processing operations. However, due to Tofino2’s limitations in terms of the operations it can perform compared to a CPU, several design optimizations are required to fully exploit the potential of programmable network devices. deep learning preprocessing pre-processing efficiency in network computing in-network-computing data stall
32	Assessing biofilm development in drinking water distribution systems by Machine Learning methods Ramos Martínez, Eva 02 May 2016 One of the main challenges for drinking water utilities is to ensure a high-quality supply, particularly in chemical and microbiological terms. However, biofilms invariably develop in all drinking water distribution systems (DWDSs), despite the presence of residual disinfectant. As a result, water utilities cannot ensure total bacteriological control. Biofilms currently represent a real paradigm in water quality management for all DWDSs. Biofilms are complex communities of microorganisms bound by an extracellular polymer that gives them structure, protects them from toxic agents, and helps them retain food. Besides the health risk that biofilms pose as a shelter for pathogens, a number of additional problems are associated with biofilm development in DWDSs; aesthetic deterioration of water, biocorrosion, and disinfectant decay, among others, are universally recognized. A large amount of research has been conducted in this field since the early 1980s. However, because of the complexity of the environment and of the community studied, most studies have been carried out under certain simplifications. We draw on this existing work and the knowledge acquired on biofilm growth in DWDSs to change the approach these studies usually take. Our proposal is based on intensive pre-processing followed by analysis with Machine Learning techniques. A multi-disciplinary procedure is implemented as a practical approach to developing a decision-making tool that helps DWDS managers keep biofilm at the lowest possible level and mitigate its negative effects on the service. A methodology to detect the areas most susceptible to biofilm development in DWDSs is proposed. Knowing the location of these hot spots in the network would allow mitigation actions to be focused more specifically, saving resources and money. Prevention programs could also be developed, acting before consumers notice the consequences of biofilm. In this way, the economic cost would be reduced and service quality would improve, eventually increasing consumer satisfaction. / Ramos Martínez, E. (2016). Assessing biofilm development in drinking water distribution systems by Machine Learning methods [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/63257 Applied Mathematics Hydraulic Engineering
33	Tracking domain knowledge based on segmented textual sources Kalledat, Tobias 11 May 2009 This research aims to analyse the influence of pre-processing on the results of knowledge generation and to give concrete recommendations for suitable pre-processing of text corpora in Text Data Mining (TDM) projects. It focuses on the extraction and tracking of concepts within certain knowledge domains, using an approach that segments corpora horizontally (along the timeline) and vertically (by persistence of terms). The result is a set of time-segmented sub-corpora that reflect the persistence of the terms they contain. Within each time segment, two clusters of terms can be formed: one contains the terms that are not persistent across the whole corpus, the other the terms that occur in every time segment. Based on simple frequency measures, it can be shown that the statistical quality of a single corpus is sufficient on its own to measure pre-processing quality. Comparison corpora are not necessary. The time series of the frequency measures show significant negative correlations between the cluster of terms that occur permanently and the cluster of terms that do not persist across all time segments. This holds only for the optimally pre-processed corpus and was not found in the other test sets, whose pre-processing quality was low. When the most frequent terms are grouped into concepts using domain-specific taxonomies, a significant negative correlation appears between the number of distinct terms per time segment and the number of terms assigned to a taxonomy. Again, this holds only for the corpus with high pre-processing quality. A semantic analysis of data prepared with a threshold-based TDM method yielded significantly different extracted knowledge depending on the quality of data pre-processing. With the methods and measures introduced in this research, both the quality of the source corpora and the quality of the applied taxonomies can be measured. Based on these findings, indicators for measuring and assessing corpora and taxonomies are developed, and recommendations are given for a level of pre-processing appropriate to the goal of the subsequent analysis. Text Data Mining Corpus Measures Corpus Linguistics Computational Linguistics Data Pre-processing Pre-processing Quality Knowledge Extraction 330 Economics 17 Economics QP 345 ddc:330
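The persistence-based segmentation described in entry 33 can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the use of raw term counts as the "simple frequency measure", and the toy corpus are all assumptions made here for illustration, and the toy data is deliberately constructed so that the two series move in opposite directions.

```python
from collections import Counter
from math import sqrt

def split_by_persistence(segments):
    """Split terms into those present in every time segment and the rest."""
    term_sets = [set(terms) for terms in segments.values()]
    persistent = set.intersection(*term_sets)
    transient = set.union(*term_sets) - persistent
    return persistent, transient

def cluster_frequencies(segments, cluster):
    """Raw count of cluster terms per time segment, ordered by segment label."""
    series = []
    for label in sorted(segments):
        counts = Counter(segments[label])
        series.append(sum(counts[term] for term in cluster))
    return series

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy corpus: one token list per yearly segment.
segments = {
    "2001": ["market", "market", "market", "price", "bank"],
    "2002": ["market", "market", "price", "internet", "internet"],
    "2003": ["market", "price", "crisis", "crisis", "crisis"],
}

persistent, transient = split_by_persistence(segments)
persistent_series = cluster_frequencies(segments, persistent)
transient_series = cluster_frequencies(segments, transient)
correlation = pearson(persistent_series, transient_series)
```

On a real corpus the sign and significance of `correlation` would be the quantity of interest; the abstract reports a significant negative value only for the well pre-processed corpus.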
34	PTC Creo Simulate 4 Roadmap Coronado, Jose 22 July 2016 This presentation covers the enhancements to Creo Simulate 4.0 and the roadmap for the future (5.0 +). Simulation Creo Simulate 4.0 Creo Roadmap Pre-processing Post-processing ddc:629 Creo Simulate
35	CREO SIMULATE : ROADMAP Coronado, Jose 06 June 2017 This presentation covers the enhancements to Creo Simulate and the roadmap for the future. Simulation Creo Simulate FEM simulation topology optimization Creo Roadmap Pre-processing Post-processing ddc:620
36	Sociální sítě a dobývání znalostí / Social networks and data mining Zvirinský, Peter January 2014 Recent data mining methods are capable of analyzing large amounts of data and extracting meaningful and potentially useful information from it. In this work, we discuss all the essential steps of the data mining process, including data preparation, storage, cleaning, and analysis, as well as visualization of the obtained results. In particular, this work focuses on the publicly available data from the Insolvency Register of the Czech Republic, which comprises all insolvency proceedings commenced in the Czech Republic after 1 January 2008. With regard to this type of data, several data mining methods have been discussed, implemented, tested and evaluated. Among others, the studied techniques include Market Basket Analysis, Bayesian networks and social network analysis. The obtained results reveal several social patterns common in current Czech society.
37	O efeito do uso de diferentes formas de extração de termos na compreensibilidade e representatividade dos termos em coleções textuais na língua portuguesa / The effect of using different forms of terms extraction on its comprehensibility and representability in Portuguese textual domains Conrado, Merley da Silva 10 September 2009 Term extraction from text collections, an activity of the pre-processing stage of Text Mining, can be used for many purposes in knowledge extraction processes. These terms must be extracted carefully, since the results of the whole process depend largely on the "quality" of the terms obtained. In this work, the "quality" of the terms covers both their representativity in the domain in question and their comprehensibility. Given this importance, this work evaluates the effect of different term simplification techniques on the comprehensibility and representativity of terms in text collections in Portuguese. The terms were extracted following the methodology presented in this work, and the techniques used were stemming, lemmatization and substantivation. To support this methodology, a term extraction tool, ExtraT (Ferramenta para Extração de Termos), was developed. To guarantee the "quality" of the extracted terms, they were evaluated both objectively and subjectively. The subjective evaluations, carried out with the help of domain specialists, cover the representativity of the terms in their respective documents, the comprehensibility of the terms obtained with each technique, and the specialists' overall preference for each technique. The objective evaluations, which are assisted by the TaxEM tool (Taxonomia em XML da Embrapa) and by Thesagro (National Agricultural Thesaurus), consider the number of terms extracted by each technique and the representativity of those terms in their respective documents. This objective evaluation of representativity is supported by the CTW (Context Term Weight) measure. Eight real text collections from the agribusiness domain were used in the experimental evaluation. As a result, some positive and negative characteristics of each technique were identified, showing that the choice of technique for this domain depends on the main pre-established goal, which can range from obtaining terms that are comprehensible to the user to working with a smaller number of terms. Lemmatization Pre-processing Stemming Substantivation Term extraction Text mining
38	Visualização de operações de junção em sistemas de bases de dados para mineração de dados. / Visualization of join operations in DBMS for data mining. Barioni, Maria Camila Nardini 13 June 2002 In recent decades, the capacity of companies to generate and collect information has increased rapidly. This explosion in the volume of data created the need for new techniques and tools that can process it and also support its analysis, so that useful information is discovered intelligently and automatically. This gave rise to a prominent research field known as Knowledge Discovery in Databases (KDD), in which data mining (DM) techniques generally play an important role. The results of applying data mining techniques depend heavily on how well the data is prepared. Therefore, in the KDD process, the DM step is normally preceded by a pre-processing step in which the data to be mined is integrated into a single relation. An important problem in this step is that, most of the time, the analyst does not yet have a clear idea of which portions of the data should be mined. Building on the strong ability of human beings to interpret data presented graphically, this work proposes a technique for visualizing data stored in multiple relations of a relational database, to help the analyst prepare the data to be mined. The technique allows the DM step to be applied over multiple relations at once, making join operations part of that step. In general, the use of joins in DM tools is not practical because of the high computational cost of join operations. However, performance evaluations of the proposed technique show that it reduces this cost significantly, making it possible to explore multiple relations visually and interactively. knowledge discovery in databases pre-processing visual data mining
39	Denoising Tandem Mass Spectrometry Data Offei, Felix 01 May 2017 Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors, which makes protein identification challenging in the field of proteomics. These errors include miscalibration of the instrument, instrument distortion and noise. In this thesis, we present a pre-processing method that focuses on the removal of noisy data, with the aim of improving protein identification. We employ binning to reduce the number of noise peaks in the data without sacrificing the alignment of the observed spectrum with the theoretical spectrum. In some cases, the alignment of the two spectra improved. Protein Identification Tandem Mass Spectrometry Pre-processing Binning Applied Statistics Clinical Trials Genomics Laboratory and Basic Science Research Statistical Methodology
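As a rough illustration of the binning idea in entry 39, the sketch below groups peaks into fixed-width m/z bins and keeps only the most intense peak in each bin. The abstract does not state the bin width or the rule used to select peaks, so `bin_width`, `keep_per_bin`, the function name and the sample spectrum are all assumptions, not the thesis's method.

```python
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0, keep_per_bin=1):
    """Keep the most intense peaks within each fixed-width m/z bin.

    peaks: iterable of (mz, intensity) pairs.
    Returns the retained peaks sorted by m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        # Peaks whose m/z falls in the same interval share a bin index.
        bins[int(mz // bin_width)].append((mz, intensity))

    kept = []
    for index in sorted(bins):
        strongest = sorted(bins[index], key=lambda peak: peak[1], reverse=True)
        kept.extend(sorted(strongest[:keep_per_bin]))
    return kept

# Hypothetical observed spectrum: low-intensity peaks stand in for noise.
observed = [
    (100.1, 5.0), (100.4, 80.0), (100.9, 3.0),
    (250.2, 40.0), (250.7, 2.0),
    (399.5, 60.0),
]
denoised = bin_spectrum(observed, bin_width=1.0)
```

A wider bin removes more peaks but risks discarding true fragment peaks that happen to share a bin, which is the trade-off against spectrum alignment that the abstract mentions.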
40	Using STAR-CCM+ to Evaluate Multi-User Collaboration in CFD Webster, Kasey Johnson 01 October 2015 The client-server architecture of STAR-CCM+ allows multiple users to collaborate on a simulation set-up. The effectiveness of collaboration with this architecture is tested and evaluated on five models. Each test is a start-to-finish set-up of an entire simulation, excluding the computational time for generating the mesh and running the solver. The models differ from one another so that together they exercise every operation used in a general CFD simulation. These tests focus on reducing the time spent preparing the geometry to be meshed, including setting up a conformal mesh between multiple regions in conjugate heat transfer models. Results from these five tests show a maximum speedup of 36%. computational fluid dynamics collaboration meshing geometry preparation integrated CAD and analysis pre-processing computer aided engineering Mechanical Engineering
