Global ETD Search

1	Implementing a Data Acquisition System for the Training of Cloud Coverage Neural Networks Montgomery, Weston C 01 June 2021 Cal Poly is home to a solar farm designed to generate a nominal 4.5 MW of electricity. The Gold Tree Solar Farm (GTSF) is currently the largest photovoltaic array in the California State University (CSU) system and is claimed to produce approximately 11 GWh per year. Such projections come from power generation models developed to predict the output of large solar fields. However, near-term forecasting of power generation from variable sources such as wind and solar still leaves considerable room for improvement. The two primary factors that affect solar power generation are shading and the angle of the sun. The angle of the sun relative to GTSF's panels can be calculated analytically using geometry. Shading due to cloud coverage, on the other hand, can be very difficult to map. For this reason, artificial neural networks (NNs) have considerable potential for accurate near-term cloud coverage forecasting. Much of the necessary training data (e.g., wind speed, temperature, and humidity) can be acquired from online sources, but the most important dataset must be captured at GTSF: sky images showing the exact location of the clouds over the solar field. Therefore, a new image-capturing data acquisition (DAQ) system has been implemented to gather the training data needed to forecast cloud coverage 15-30 minutes into the future. San Luis Obispo Solar Energy Machine Learning Data Pipeline Energy Systems Mechanical Engineering
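The entry notes that the sun's angle relative to the panels can be calculated analytically. As a minimal sketch of that geometry (not the thesis's own code), the function below computes the solar elevation angle from latitude, day of year, and local solar time, assuming Cooper's approximation for the declination and ignoring the equation of time, longitude correction, panel tilt, and refraction. The latitude used for San Luis Obispo (about 35.3° N) is an assumed value.

```python
import math


def declination_deg(day_of_year: int) -> float:
    """Solar declination in degrees, using Cooper's approximation."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_elevation_deg(latitude_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Elevation of the sun above the horizon, in degrees.

    `solar_hour` is local *solar* time (12.0 = solar noon). This sketch applies
    no equation-of-time or longitude correction and ignores panel tilt and
    atmospheric refraction.
    """
    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # Earth turns 15 degrees per hour
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(hour_angle))
    sin_elev = max(-1.0, min(1.0, sin_elev))  # guard asin against rounding error
    return math.degrees(math.asin(sin_elev))


if __name__ == "__main__":
    LAT_SLO = 35.3  # assumed approximate latitude of San Luis Obispo, not from the thesis
    for hour in (8, 10, 12, 14, 16):
        print(hour, round(solar_elevation_deg(LAT_SLO, 172, hour), 1))
```

At solar noon the formula reduces to 90° minus the difference between latitude and declination, which is a quick sanity check on the implementation.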
2	Domain Specific Language (DSL) visualisation for Big Data Pipelines Mitrovic, Vlado January 2024 With the growth of big data technologies, it has become challenging to design and manage complex data workflows, especially for non-technical users. However, to understand and process these data in the best way, we need to rely on domain experts who are often unfamiliar with the tools available on the market. This thesis identifies these needs and describes the implementation of an easy-to-use tool for defining and visualising data processing workflows, which abstracts away the technical requirements imposed by other solutions. The research methodology includes the definition of customer requirements, architecture design, prototype development, and user testing. The iterative approach used in this project ensures continuous improvement based on user feedback. The final solution is then assessed using KPIs such as usability, integration, performance, and support. Data pipeline DSL DataCloud Visualization Computer and Information Sciences
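The DSL's actual syntax is not given in this abstract, so the following is a hypothetical illustration of the general idea only: a tiny line-based notation (`ingest -> clean -> aggregate`) is parsed into edges and rendered as Graphviz DOT text that a viewer could draw. The notation and the names `parse_pipeline` and `to_dot` are invented for this sketch and are not the DataCloud DSL.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def parse_pipeline(source: str) -> List[Edge]:
    """Parse lines such as 'ingest -> clean -> load' into (upstream, downstream) edges.

    Blank lines and lines starting with '#' are ignored; a repeated edge is
    kept once, in first-seen order.
    """
    edges: List[Edge] = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        steps = [step.strip() for step in line.split("->")]
        if len(steps) < 2 or any(not step for step in steps):
            raise ValueError(f"line {lineno}: expected 'a -> b [-> c ...]', got {raw!r}")
        for upstream, downstream in zip(steps, steps[1:]):
            if (upstream, downstream) not in edges:
                edges.append((upstream, downstream))
    return edges


def to_dot(edges: List[Edge], name: str = "pipeline") -> str:
    """Render the edges as Graphviz DOT text, laid out left to right."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    spec = """
    # ingest once, then feed two consumers
    ingest -> clean -> aggregate
    clean -> archive
    """
    print(to_dot(parse_pipeline(spec)))
```

Keeping the notation to one arrow operator is a deliberate simplification; it lets a non-technical user describe a branching workflow in a few lines while the visual layout is left to Graphviz.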
3	DataOps: Towards Understanding and Defining Data Analytics Approach Mainali, Kiran January 2020 Data collection and analysis approaches have changed drastically in the past few years, driven by improved data availability and continuously changing analysis requirements. Data have always existed, but data management is vital nowadays because data are generated rapidly and are available in many formats. Big data has opened the possibility of handling potentially unlimited amounts of data in numerous formats in a short time. Data analytics is becoming complex due to data characteristics, sophisticated tools and technologies, changing business needs, varied interests among stakeholders, and the lack of a standardized process. DataOps is an emerging approach advocated by data practitioners to address these challenges in data analytics projects. Data analytics projects differ from software engineering in many respects. DevOps has proven to be an efficient and practical approach to project delivery in the software industry. However, DataOps is still in its infancy and is only beginning to be recognized as an independent and essential task in data analytics. In this thesis, we examine DataOps as a methodology for implementing data pipelines by conducting a systematic search of research papers. As a result, we define DataOps and outline its ambiguities and challenges. We also explore how DataOps covers the different stages of the data lifecycle. We created comparison matrices of tools and technologies, categorizing them into functional groups to demonstrate their use in data lifecycle management. We followed DataOps implementation guidelines to implement a data pipeline using Apache Airflow as the workflow orchestrator inside Docker, and compared it with a simple manual execution of a data analytics project. According to the evaluation, the DataOps data pipeline provided automated task execution, orchestration of the execution environment, testing and monitoring, communication and collaboration, and a shorter end-to-end product delivery cycle, along with reduced pipeline execution time. DataOps Data lifecycle Data analytics DataOps pipeline Data pipeline DataOps tools and technologies Computer and Information Sciences
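Airflow's role as workflow orchestrator, running each task only after its upstream tasks have finished, can be sketched without Airflow itself. The example below assumes Python 3.9+ and uses the standard-library `graphlib.TopologicalSorter` to order a hypothetical extract/clean/validate/load job; the task names and the `run_pipeline` helper are illustrative and are not Airflow's API, which adds scheduling, retries, and monitoring on top of dependency ordering.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

Task = Callable[[dict], None]


def run_pipeline(tasks: Dict[str, Task], deps: Dict[str, Set[str]]) -> Tuple[List[str], dict]:
    """Run every task after all of its upstream tasks; return (run order, context).

    `deps` maps a task name to the names it depends on. Tasks share a
    `context` dict to pass results downstream. A dependency cycle raises
    graphlib.CycleError before anything runs.
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:
        sorter.add(name)  # also schedule tasks that have no dependencies listed
    order = list(sorter.static_order())
    missing = set(order) - set(tasks)
    if missing:
        raise KeyError(f"no implementation for tasks: {sorted(missing)}")
    context: dict = {}
    for name in order:
        tasks[name](context)
    return order, context


# Hypothetical tasks for a toy analytics job.
def extract(ctx: dict) -> None:
    ctx["raw"] = [3, 1, None, 2]


def clean(ctx: dict) -> None:
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]


def validate(ctx: dict) -> None:
    # An automated data test, in the spirit of the DataOps testing stage.
    if not all(isinstance(x, int) for x in ctx["clean"]):
        raise ValueError("validation failed: non-integer values after cleaning")


def load(ctx: dict) -> None:
    ctx["loaded"] = sorted(ctx["clean"])


if __name__ == "__main__":
    order, ctx = run_pipeline(
        {"extract": extract, "clean": clean, "validate": validate, "load": load},
        {"clean": {"extract"}, "validate": {"clean"}, "load": {"validate"}},
    )
    print(order, ctx["loaded"])
```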
4	Implementing an Interactive Simulation Data Pipeline for Space Weather Visualization Berg, Matthias, Grangien, Jonathan January 2018 This thesis details work carried out by two students working as contractors at the Community Coordinated Modeling Center at Goddard Space Flight Center of the National Aeronautics and Space Administration. The thesis was made possible by, and aims to contribute to, the OpenSpace project. The first track of the work involves processing and assembling new data for a visualization of coronal mass ejections in OpenSpace. The new data allow coronal mass ejections to be observed at their origin near the surface of the Sun, whereas previous data visualized them only from 30 solar radii outward. Previously implemented visualization techniques are combined to visualize different volume data and field lines, which, together with a synoptic magnetogram of the Sun, produce a multi-layered visualization. The second track is an experimental implementation of a generalized process, requiring less user involvement, for getting new data into OpenSpace, with priority given to volume data because the authors already had experience with it. The results show a space weather model visualization and how such a model can be adapted to fit within the parameters of the OpenSpace project. Additionally, the results show how a GUI connected to a series of background events can form a data pipeline that makes complicated space weather models more easily accessible. OpenSpace NASA solar data volume rendering data loading data pipeline GUI fieldlines data visualization space exploration thesis Media and Communication Technology
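Field lines are commonly produced by integrating a vector field from seed points. The sketch below shows only that integration step, in 2D, under stated assumptions: an analytic toy field, a fixed step size, and second-order Runge-Kutta (midpoint) integration. It does not reproduce OpenSpace's field-line pipeline or the CCMC model data formats, and `trace_fieldline` is a hypothetical name.

```python
import math
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, float]


def trace_fieldline(field: Callable[[float, float], Vec], seed: Vec,
                    step: float = 0.01, n_steps: int = 1000) -> List[Vec]:
    """Trace a field line from `seed` using fixed-step RK2 (midpoint) integration.

    The field direction is normalized at each evaluation, so each step advances
    about `step` units of arc length regardless of field strength. Tracing
    stops early if it reaches a point where the field vanishes.
    """
    def direction(x: float, y: float) -> Optional[Vec]:
        bx, by = field(x, y)
        magnitude = math.hypot(bx, by)
        return None if magnitude == 0.0 else (bx / magnitude, by / magnitude)

    points: List[Vec] = [seed]
    x, y = seed
    for _ in range(n_steps):
        d1 = direction(x, y)
        if d1 is None:
            break
        # Half step to the midpoint, then take the full step using the midpoint direction.
        d2 = direction(x + 0.5 * step * d1[0], y + 0.5 * step * d1[1])
        if d2 is None:
            break
        x, y = x + step * d2[0], y + step * d2[1]
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Toy field whose field lines are circles around the origin.
    circular = lambda x, y: (-y, x)
    line = trace_fieldline(circular, (1.0, 0.0), step=0.01, n_steps=628)
    print(len(line), max(abs(math.hypot(px, py) - 1.0) for px, py in line))
```

Using a field with known circular field lines makes the integrator easy to check: the traced points should stay at radius 1, and the drift printed above measures the accumulated integration error.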
5	The Application of LoRaWAN as an Internet of Things Tool to Promote Data Collection in Agriculture Adam B Schreck (15315892) 27 April 2023 Information about the conditions of specific fields and assets is critical for farm managers making operational decisions. Location, rainfall, wind speed, soil moisture, and temperature are examples of metrics that influence the ability to perform certain tasks. Monitoring these conditions in real time and storing historical data can be done using Internet of Things (IoT) devices such as sensors. The capabilities of this technology have been communicated before, yet few farmers have adopted these connected devices. A lack of reliable internet connectivity, the high annual cost of current on-market systems, and a lack of technical awareness have all contributed to this disconnect. One technology that can better meet farmers' needs is LoRaWAN, because of its long range, low power consumption, and low cost. To help farmers implement this technology on their farms, the goal was to build a LoRaWAN network with several sensors measuring metrics such as weather data, distribute these systems locally, and provide context on the operation of IoT networks. By leveraging readily available commercial hardware and open-source software, two examples of standalone networks were created, with sensor data stored locally and no dependence on internet connectivity. The first use case was a kit consisting of a gateway and a small PC mounted on a tripod with six individual sensors, costing close to $2,200 in total. An additional design was prepared for a microcomputer-based version using a Raspberry Pi, which improved on the original design. These improvements included lower hardware cost and complexity, software with more open-source community support, and documented steps to make the system more approachable. Owing to outside factors, the PC architecture was chosen for mass distribution. Over one year, several identical units were produced and given to farms, extension educators, and vocational agricultural programs. Across these deployments, all units survived the growing season without damage from the elements, general considerations about the chosen sensor types and their potential drawbacks were identified, the observed practical average range for packet acceptance was 3 miles, and sensor batteries remained usable after one year. The Pi-based architecture was implemented in an individual use case, with instructions to support participants of any experience level. Ultimately, this work has introduced individuals to the possibilities of creating and managing their own network, and to what can be learned from a reasonably simple, self-managed data pipeline. Data communications LoRaWAN IoT (Internet of Things) Weather Data Collection Digital Agriculture Open Source Data Pipeline RF
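The kits described above store sensor data locally instead of relying on an internet connection. A minimal sketch of such a local store follows, assuming readings have already been decoded from LoRaWAN uplinks into (device ID, timestamp, metric, value). Python's built-in `sqlite3` stands in for whatever database the deployed units actually used, and the table layout and function names are hypothetical.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    device_id TEXT    NOT NULL,
    ts        INTEGER NOT NULL,  -- Unix time in seconds
    metric    TEXT    NOT NULL,  -- e.g. 'temperature_c', 'soil_moisture_pct'
    value     REAL    NOT NULL,
    PRIMARY KEY (device_id, ts, metric)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database; ':memory:' is handy for testing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_reading(conn: sqlite3.Connection, device_id: str, ts: int,
                  metric: str, value: float) -> bool:
    """Insert one decoded reading; return False if a reading with the same
    device, timestamp, and metric is already stored.

    Assumption: the same uplink may be delivered more than once (for example a
    retransmission), so duplicates are ignored via the primary key.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO readings (device_id, ts, metric, value) VALUES (?, ?, ?, ?)",
        (device_id, ts, metric, value),
    )
    conn.commit()
    return cur.rowcount == 1


def mean_over_window(conn: sqlite3.Connection, device_id: str, metric: str,
                     start_ts: int, end_ts: int) -> Optional[float]:
    """Average of a metric over [start_ts, end_ts); None if there are no readings."""
    row = conn.execute(
        "SELECT AVG(value) FROM readings "
        "WHERE device_id = ? AND metric = ? AND ts >= ? AND ts < ?",
        (device_id, metric, start_ts, end_ts),
    ).fetchone()
    return row[0]
```

A single-file database suits an offline field kit: it needs no server process, and the file can be copied off the PC for later analysis.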
6	Custom Open-Source Software Tools for Targeted and Untargeted Analysis of High Resolution Mass Spectrometry Data and Characterization of Lignin Letourneau, Dane René 11 March 2025 In recent years, there has been a dramatic rise of interest in free and open-source software tools aimed at the analytical science community. This has led to a wealth of innovations, collaborations, and solutions in the data analysis space. Mass spectrometry has been a particular area of focus in this regard; modern high-resolution mass spectrometry (HRMS) instruments are powerful analytical tools capable of generating an enormous quantity of detailed information in a single experiment. Often these datasets contain patterns or "fingerprints" of molecules that may not be visible to the analyst using conventional visualization tools or proprietary software. In "untargeted" experiments, the goal might be to discover novel metabolites or molecular variations. In both targeted and untargeted work, the open-source software community has responded to these challenges, and there is now a wide variety of solutions for data processing, pattern finding, molecular discovery, and more. This thesis describes the development of several software tools and algorithms aimed at finding meaningful information in HRMS datasets, in both targeted and untargeted analyses, after transformation into the mass defect space. This includes recognizing patterns corresponding to the repeating units of polymeric analytes and assigning molecular formulae to these units. These methods are published and released as open-source software, and are then applied to characterize differences between a variety of lignin samples from various sources and treatment processes, and to suggest optimal sample preparation conditions for each lignin for API-HRMS analysis. It is hoped that the work presented here helps pave the way for future analyses seeking to maximize the understanding of complex natural mixtures using HRMS, and the author encourages further modification and development of the algorithms and techniques presented here to facilitate future discoveries in this ever-evolving area of research. mass spectrometry lignin open source software data pipeline analytical chemistry high resolution mass spectrometry algorithm untargeted analysis targeted analysis unsupervised analysis supervised analysis 543 Analytical chemistry ddc:543
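Mass defect transformations are often introduced through the Kendrick mass defect (KMD): m/z values are rescaled so that a chosen repeating unit has an integer mass, and members of a homologous series then share nearly the same defect. The sketch below assumes CH2 (exact mass 14.01565 Da) as the repeating unit and the "rounded Kendrick mass minus Kendrick mass" convention. The thesis's tools may use other repeating units (lignin subunits rather than CH2), other conventions, or a generalized transformation, and the greedy grouping rule and tolerance are illustrative assumptions.

```python
from typing import Iterable, List

CH2_EXACT = 14.01565  # monoisotopic mass of CH2 (12.000000 + 2 x 1.007825)
CH2_NOMINAL = 14


def kendrick_mass_defect(mz: float, unit_exact: float = CH2_EXACT,
                         unit_nominal: int = CH2_NOMINAL) -> float:
    """Kendrick mass defect: nominal (rounded) Kendrick mass minus Kendrick mass.

    Rescaling makes the repeating unit weigh exactly `unit_nominal`, so adding
    one unit changes the Kendrick mass by an integer and leaves the defect
    unchanged (unless the rounding crosses a +/-0.5 boundary).
    """
    kendrick_mass = mz * unit_nominal / unit_exact
    return round(kendrick_mass) - kendrick_mass


def group_by_kmd(mzs: Iterable[float], tol: float = 0.002) -> List[List[float]]:
    """Greedily group m/z values whose defects lie within `tol` of a group's first member."""
    groups: List[List[float]] = []
    anchor_kmd = None
    for mz in sorted(mzs, key=kendrick_mass_defect):
        kmd = kendrick_mass_defect(mz)
        if anchor_kmd is not None and abs(kmd - anchor_kmd) <= tol:
            groups[-1].append(mz)
        else:
            groups.append([mz])
            anchor_kmd = kmd
    return [sorted(g) for g in groups]


if __name__ == "__main__":
    # Two hypothetical homologous series, each extended by CH2 units.
    series_a = [200.1 + n * CH2_EXACT for n in range(4)]
    series_b = [250.3 + n * CH2_EXACT for n in range(4)]
    for group in group_by_kmd(series_a + series_b):
        print(round(kendrick_mass_defect(group[0]), 4), [round(m, 4) for m in group])
```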
7	A deep learning based anomaly detection pipeline for battery fleets Khongbantabam, Nabakumar Singh January 2021 This thesis proposes a deep-learning-based pipeline for detecting possible anomalies during the operation of a battery fleet, and presents its development and evaluation. The pipeline employs sensors connected to each battery in the fleet to remotely collect real-time measurements of operating characteristics such as voltage, current, and temperature. The deep-learning-based time-series anomaly detection model was developed using a Variational Autoencoder (VAE) architecture that uses either Long Short-Term Memory (LSTM) or its close relative, the Gated Recurrent Unit (GRU), as the encoder and decoder networks (LSTMVAE and GRUVAE). Both variants were evaluated against three well-known conventional anomaly detection algorithms: Isolation Nearest Neighbour (iNNE), Isolation Forest (iForest), and kth Nearest Neighbour (k-NN). All five models were trained on two variants of the training dataset (a full-year dataset and a partial, recent dataset), producing a total of 10 model variants. The models were trained in an unsupervised manner, and the results were evaluated on a test dataset containing a few known anomaly days from the past operation of the customer's battery fleet. The results showed that k-NN and GRUVAE performed similarly, outperforming the remaining models by a notable margin. LSTMVAE and iForest performed moderately, while iNNE and the iForest variant trained on the full dataset performed worst. In general, limiting the training dataset to a recent period produced better results almost consistently across all models. Forklift batteries Battery sensors Data pipeline Predictive maintenance Anomaly detection Deep learning Battery failure prediction Time-series Variational autoencoder Long short-term memory LSTM Gated recurrent unit GRU Isolation nearest neighbor iNNE Isolation forest iForest kth nearest neighbor kNN Computer and Information Sciences
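Of the baselines, k-NN is the simplest to state: a sample's anomaly score is its distance to the k-th nearest sample in the unlabelled training data, and unusually large scores are flagged. A minimal pure-Python sketch follows, assuming each sample is an already-scaled feature vector such as (voltage, current, temperature). The value of k, the leave-one-out threshold rule, and the example readings are illustrative assumptions, not the configuration used in the thesis.

```python
import math
from typing import List, Sequence

Sample = Sequence[float]


def knn_scores(train: Sequence[Sample], test: Sequence[Sample], k: int = 3) -> List[float]:
    """Anomaly score of each test sample = Euclidean distance to its k-th nearest training sample."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    scores = []
    for x in test:
        distances = sorted(math.dist(x, t) for t in train)
        scores.append(distances[k - 1])
    return scores


def loo_threshold(train: List[Sample], k: int = 3, quantile: float = 0.99) -> float:
    """Hypothetical threshold rule: a high quantile of leave-one-out scores on the training data."""
    if len(train) < k + 1:
        raise ValueError("need more than k training samples")
    loo = sorted(knn_scores(train[:i] + train[i + 1:], [x], k)[0] for i, x in enumerate(train))
    return loo[min(len(loo) - 1, int(quantile * len(loo)))]


if __name__ == "__main__":
    # Hypothetical, already-scaled (voltage, current, temperature) readings.
    normal = [(0.50 + 0.01 * i, 0.20, 0.30 + 0.01 * (i % 3)) for i in range(20)]
    threshold = loo_threshold(normal, k=3)
    new = [(0.55, 0.20, 0.31), (0.90, 0.80, 0.95)]
    for sample, score in zip(new, knn_scores(normal, new, k=3)):
        print(sample, round(score, 3), "ANOMALY" if score > threshold else "ok")
```

This brute-force version compares every test sample with every training sample, which is fine for a sketch; a fleet-scale pipeline would more likely use a spatial index or an existing library implementation.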
