About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Requirements and methods for collecting data for machine learning in the Swedish construction industry: An exploration of needs and adaptation of datasets

Larsson, Isabell January 2023 (has links)
From an ongoing research project on artificial intelligence (AI), a need emerged to find a method for data collection in a Swedish construction context; the purpose of this thesis was to meet that need. The research lies within machine learning (ML) and computer vision (CV) in the construction industry, where computer vision, broadly speaking, means that a computer extracts information from visual data, i.e., images and video. The data collection needs to be extensive enough to create a dataset for machine learning, with the goal of enabling Boston Dynamics' SPOT to be used in the construction industry.

Four data collection methods were evaluated and compared against each other in order to find the method that offers the best conditions for building a dataset. The best approach given the conditions of the study was an experimental method of inductive character; the focus of the study has therefore primarily been on method development based on empirical data rather than on theory.

Broad research questions were posed to find the best data collection method for machine learning. These questions were answered by structuring the study into three main parts: a theoretical, an empirical, and a technical part.

The theoretical part was a smaller contextual literature study that provided a deeper understanding of AI. The focus was on the aspects considered most relevant to the study, such as supervised computer vision and current research on AI application areas in construction contexts. In the empirical part, case studies were conducted in which data was collected using the different methods and evaluated from various perspectives to determine which method was most viable in practice. The technical part focused mainly on annotation and training of data; the result was a score between 0 and 1, where 1 was best. The technical part also included an evaluation of operating systems for ML.

The four data collection methods evaluated were:

1. Manual photography of plasterboards on construction sites. One construction site was visited and just under 200 images were collected. The model trained on data from method one scored 0.46.
2. Utilizing the workforce present on construction sites. The idea was that site personnel would photograph plasterboards during the workday and submit the images to a shared repository. This method was rejected by the construction company in question, partly due to organizational problems and partly for ownership reasons.
3. Using an image gallery where historical data had been collected. Several such galleries were examined; the project was granted access to one of them, and an employee of the construction company went through several others. In total, an estimated 4,000-5,000 images were searched, from which a dataset of 38 images was collected. The model trained on data from method three scored 0.
4. Generating synthetic images. A simple model was built in Revit, from which a total of 740 images were collected. The model trained on method four's images scored 0.9. When the validation images were replaced with real instead of synthetic images, the score dropped to 0.32. Closer inspection showed that the model recognized the plasterboards but was confused by background noise: filler on the wall in the real image was mistaken for plasterboard. A hybrid method was therefore tested, in which a small number of real images were added to the training data. The hybrid method scored 0.66.

In summary, the results of this study showed that none of the existing methods, in their current forms, are suitable for machine learning purposes on the construction site. However, it emerged that hybrid methods may be worth exploring further as a potential solution.
An interesting research direction would be to investigate hybrid methods combining elements from methods one and four, as described above. An alternative hybrid method could also be explored, in which the surroundings from the image gallery are incorporated into a virtual environment and data is collected using processes similar to those in method four. Such hybrid methods may offer advantages that overcome the limitations identified in the individual methods, thereby enabling more efficient and reliable data collection for machine learning applications in the studied context. Future research should focus on exploring and evaluating these hybrid methods to better understand their potential and benefits in the field of machine learning and data science.
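The hybrid approach above (mixing a small number of real site photographs into a predominantly synthetic training set) can be sketched as follows. This is a minimal illustration, not code from the thesis; the function name, file names, and the choice of 40 real images are assumptions for the example.

```python
import random

def build_hybrid_dataset(synthetic, real, n_real, seed=0):
    """Combine a synthetic image set with a small sample of real images.

    The thesis found that a model trained purely on synthetic Revit
    renders scored 0.9 on synthetic validation images but only 0.32 on
    real ones; adding a few real images to the training data (a hybrid
    set) recovered a score of 0.66.
    """
    rng = random.Random(seed)
    n_real = min(n_real, len(real))
    hybrid = list(synthetic) + rng.sample(list(real), n_real)
    rng.shuffle(hybrid)  # avoid ordering effects during training
    return hybrid

# Illustrative usage with placeholder file names matching the thesis
# counts: 740 synthetic renders, ~200 site photographs.
synthetic_imgs = [f"revit_{i:04d}.png" for i in range(740)]
real_imgs = [f"site_{i:03d}.jpg" for i in range(200)]
train_set = build_hybrid_dataset(synthetic_imgs, real_imgs, n_real=40)
```

The real-to-synthetic ratio is a tunable hyperparameter; the thesis does not report the exact number of real images used in its hybrid experiment.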
92

Task Distillation: Transforming Reinforcement Learning into Supervised Learning

Wilhelm, Connor 12 October 2023 (has links) (PDF)
Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model training costs, to provide interpretability, and to anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, allowing a more complex learning task, such as a reinforcement learning environment, to be reduced to a simpler learning task, such as supervised classification. To advance task distillation beyond supervised-to-supervised distillation, we explore distilling reinforcement learning environments into supervised learning datasets. We propose a new distillation algorithm that allows PPO to be used to distill a reinforcement learning environment. We demonstrate k-shot learning on distilled cart-pole to show the effectiveness of our distillation algorithm, as well as to explore distillation generalization. We distill multi-dimensional cart-pole environments to their minimum-sized distillations and show that this matches the theoretical minimum number of data instances required to teach each task. We demonstrate how a distilled task can be used as an interpretability artifact, as it compactly represents everything needed to learn the task. We demonstrate the feasibility of distillation in more complex Atari environments by fully distilling Centipede and showing that distillation is cheaper than training directly on Centipede when training more than 9 models. Finally, we provide a method to "partially" distill more complex environments, demonstrate it on Ms. Pac-Man, Pong, and Space Invaders, and show how it scales distillation difficulty compared to the full distillation of Centipede.
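The core idea of the abstract above (compressing a task into a minimal synthetic dataset from which a simple learner can recover the full task) can be illustrated with a toy example. This is not the authors' PPO-based algorithm; the task, the choice of class means as distilled instances, and the 1-nearest-neighbour learner are all simplifying assumptions.

```python
import numpy as np

# Toy task: classify 2D points by the sign of their first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

# "Distilled" dataset: the theoretical minimum of one synthetic instance
# per class. Here the class means stand in for learned synthetic data
# (real dataset distillation optimizes these instances by gradient descent).
prototypes = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
proto_labels = np.array([0, 1])

def predict(points):
    # A simple learner (1-nearest-neighbour) trained only on the two
    # distilled instances recovers the full task.
    d = np.linalg.norm(points[:, None, :] - prototypes[None, :, :], axis=-1)
    return proto_labels[d.argmin(axis=1)]

accuracy = (predict(X) == y).mean()
```

Two instances suffice here because the task is linearly separable with two classes; the thesis makes the analogous minimality argument for multi-dimensional cart-pole.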
93

Comparative Analysis of Machine Learning Algorithms on Activity Recognition from Wearable Sensors’ MHEALTH dataset Supported with a Comprehensive Process and Development of an Analysis Tool

Sheraz, Nasir January 2019 (has links)
Human activity recognition based on wearable sensors' data is quite an attractive subject due to its wide application in the fields of healthcare, wellbeing, and smart environments. This research is likewise focused on comparing the predictive performance of machine learning algorithms for activity recognition from wearable sensors' (MHEALTH) data while employing a comprehensive process. The framework is adapted from well-established data science practices, which addressed the data analysis requirements quite successfully. Moreover, an Analysis Tool was developed to support this work and to make it repeatable for further work. A detailed comparative analysis is presented for five multi-class classifier algorithms on the MHEALTH dataset, namely Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Support Vector Machines (SVM), K-Nearest Neighbours (KNN), and Random Forests (RF). Besides using the original MHEALTH data as input, reduced-dimensionality subsets and reduced-features subsets were also analysed. The comparison covers overall accuracy; class-wise sensitivity and specificity of each algorithm; class-wise detection rate and detection prevalence relative to the prevalence of each class; and positive and negative predictive values. The resulting statistics have also been compared through visualizations for ease of understanding and inference. All five ML algorithms were applied for classification using the three sets of input data. Out of the five, three performed exceptionally well (SVM, KNN, RF), with RF performing best at an overall accuracy of 99.9%. Although CART did not perform well as a classification algorithm, using it for ranking inputs proved a better way of performing feature selection. The most significant sensors according to the CART ranking were the accelerometers and gyroscopes, a finding also confirmed through the application of the predictive ML algorithms.
In dimensionality reduction, the subset based on CART-selected features yielded better classification than the subset obtained via the PCA technique.
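The comparison pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's Analysis Tool: a synthetic multi-class dataset stands in for MHEALTH (which must be obtained separately), and cross-validated accuracy stands in for the fuller set of class-wise metrics the thesis reports.

```python
# Sketch: compare the five classifiers from the thesis on a stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the MHEALTH sensor data (23 channels in the
# real dataset); class count and sample size are arbitrary here.
X, y = make_classification(n_samples=600, n_features=23, n_informative=10,
                           n_classes=4, random_state=42)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=42),  # sklearn's CART
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=42),
}

# 5-fold cross-validated mean accuracy for each algorithm.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
```

On real MHEALTH data, accuracy alone is not enough; the thesis also compares sensitivity, specificity, detection rate, and predictive values per class, which `sklearn.metrics.classification_report` and a confusion matrix would supply.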
94

A Scalable, Load-Balancing Data Structure for Highly Dynamic Environments

Foster, Anthony 05 June 2008 (has links)
No description available.
95

Using Text based Visualization in Data Analysis

Wu, Yingyu 28 April 2014 (has links)
No description available.
96

Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural Networks

Bhandare, Ashray Sadashiv January 2017 (has links)
No description available.
97

A DATABASE SYSTEM TO STORE AND RETRIEVE A CONCEPT LATTICE STRUCTURE

ASHOK, RAMYA January 2005 (has links)
No description available.
98

A Comparison of Rule Extraction Techniques with Emphasis on Heuristics for Imbalanced Datasets

Singh, Manjeet 22 September 2010 (has links)
No description available.
99

Embodied Data Exploration in Immersive Environments: Application in Geophysical Data Analysis

Sardana, Disha 05 June 2023 (has links)
Immersive analytics is an emerging field of data exploration and analysis in immersive environments. It is an active research area that explores human-centric approaches to data exploration and analysis based on the spatial arrangement and visualization of data elements in immersive 3D environments. The availability of immersive extended reality systems has increased tremendously in recent years, but they are still not as widely used as conventional 2D displays. In this dissertation, we describe an immersive analysis system for spatiotemporal data, perform several user studies to measure user performance in the developed system, and lay out design guidelines for an immersive analytics environment. In our first study, we compared the performance of users on specific visual analytics tasks in an immersive environment and on a conventional 2D display. The approach was realized based on the coordinated multiple-views paradigm. We also designed an embodied interaction for the exploration of spatial time series data. The findings from the first user study showed that the developed system is more efficient in a real immersive environment than on a conventional 2D display. One of the important challenges we identified while designing an immersive analytics environment was finding the optimal placement and identification of various visual elements. In our second study, we explored the iterative design of the placement of visual elements and interaction with them based on frames of reference. Our iterative designs explored the impact of visualization scale for three frames of reference and used the collected user feedback to compare the advantages and limitations of these three frames of reference.
In our third study, we describe an experiment that quantitatively and qualitatively investigated the use of sonification, i.e., conveying information through nonspeech audio, in an immersive environment that utilized empirical datasets obtained from a multi-dimensional geophysical system. We discovered that using event-based sonification in addition to the visual channel was extremely effective in identifying patterns and relationships in large, complex datasets. Our findings also imply that the inclusion of audio in an immersive analytics system may increase users' confidence when performing analytics tasks like pattern recognition. We outline sound design principles for an immersive analytics environment using real-world geospace science datasets and assess the benefits and drawbacks of using sonification in an immersive analytics setting. / Doctor of Philosophy / When it comes to exploring data, visualization is the norm. We make line charts, scatter plots, bar graphs, or heat maps to look for patterns in data using traditional desktop-based approaches. Biologically, however, humans are optimized to observe the world in three dimensions. This research is motivated by the idea that representing data in immersive 3D environments can provide a new perspective that may lead to the discovery of previously undetected data patterns. Experiencing the data in three dimensions, engaging multiple senses like sound and sight, and leveraging human embodiment, interaction capabilities, and sense of presence may lead to a unique understanding of the data that is not feasible using traditional visual analytics. In this research, we first compared the data analysis process in a mixed reality system, where real and virtual worlds co-exist, with performing the same analytical tasks in a desktop-based environment.
In our second study, we examined where different charts and data visualizations should be placed based on the scale of the environment, such as table-top versus room-sized, and studied the strengths and limitations of different scales based on the visual and interaction design of the developed system. In our third study, we used a real-world space science dataset to assess the limitations and advantages of the immersive approach. We also used audio, explored what kinds of audio work for which analytical tasks, and laid out design guidelines based on audio. Through this research, we studied how to do data analytics in emerging mixed reality environments and presented results and design guidelines for future developers, designers, and researchers in this field.
100

Deep Learning for Code Generation using Snippet Level Parallel Data

Jain, Aneesh 05 January 2023 (has links)
In the last few years, interest in applying deep learning methods to software engineering tasks has surged. A variety of approaches, such as transformer-based methods, statistical machine translation models, and models inspired by natural language settings, have been proposed and shown to be effective at tasks like code summarization, code synthesis, and code translation. Multiple benchmark datasets have also been released, but all suffer from one limitation or another. Some datasets support only a select few programming languages, while others support only certain tasks. These limitations restrict researchers' ability to perform thorough analyses of their proposed methods. In this work, we aim to alleviate some of the limitations faced by researchers who work on deep learning applications for software engineering tasks. We introduce a large, parallel, multi-lingual programming language dataset that supports tasks like code summarization, code translation, code synthesis, and code search in 7 different languages. We provide benchmark results for the current state-of-the-art models on all these tasks, and we also explore some limitations of current evaluation metrics for code-related tasks. We provide a detailed analysis of the compilability of code generated by deep learning models, because compilability is a better measure of the usability of code than scores like BLEU and CodeBLEU. Motivated by our findings about compilability, we also propose a reinforcement learning based method that incorporates code compilability and syntax-level feedback as rewards, and we demonstrate its effectiveness in generating code with fewer syntax errors compared to baselines. In addition, we develop a web portal that hosts the models we have trained for code translation. The portal allows translation between 42 possible language pairs and also allows users to check the compilability of the generated code.
The intent of this website is to give researchers and other audiences a chance to interact with and probe our work in a user-friendly way, without requiring them to write their own code to load and run inference with the models. / Master of Science / Deep neural networks have become ubiquitous and find applications in almost every technology and service we use today. In recent years, researchers have also started applying neural network based methods to problems in the software engineering domain. Software engineering by its nature requires a lot of documentation, and creating this natural language documentation automatically, using programs as input to the neural networks, has been one of their first applications in this domain. Other applications include translating code between programming languages and searching for code using natural language, as one does on websites like Stack Overflow. All of these tasks now have the potential to be powered by deep neural networks. It is common knowledge that neural networks are data hungry, and in this work we present a large dataset containing code in multiple programming languages: Java, C++, Python, C#, JavaScript, PHP, and C. Our dataset is intended to foster more research into automating software engineering tasks using neural networks. We provide an analysis of the performance of multiple state-of-the-art models on our dataset in terms of compilability, which measures the number of syntax errors in the code, as well as other metrics. In addition, we propose our own deep neural network based model for code translation, which uses feedback from programming language compilers to reduce the number of syntax errors in the generated code. We also develop and present a website where some of our code translation models are hosted.
The website allows users to interact with our work in an easy manner, without any knowledge of deep learning, and get a sense of how these technologies are being applied to software engineering tasks.
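Compilability as an evaluation signal, as described in the abstract above, can be sketched for Python-target output using the built-in `compile` function; other target languages would require invoking the corresponding compiler. The binary reward scheme here is an illustrative assumption, not the authors' exact formulation.

```python
def compilability_reward(code: str) -> float:
    """Return 1.0 if the generated code compiles (parses), else 0.0.

    A signal like this, optionally combined with finer-grained
    syntax-level feedback, can serve as a reward when fine-tuning a
    code generation model with reinforcement learning. The binary
    reward shaping is illustrative only.
    """
    try:
        compile(code, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0

# Two hypothetical model outputs for the same prompt.
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"  # missing colon: syntax error
```

Note that compiling checks syntax without executing the code, so this reward can be computed safely on untrusted model output; semantic correctness still requires separate tests.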
