91 |
Rendering Photorealistic Images of Sticky Notes in Synthesized Indoor Environments : Creating a synthetic dataset in procedurally generated environments
Eknefelt, Karl, January 2022
This thesis presents a synthetic data generation pipeline that replaces the manual process of adding bounding-box-labeled images of sticky notes to the evaluation dataset of a traditional computer vision (CV) algorithm. With this approach, 462 labeled 12-megapixel images across 55 procedurally varied scenes were generated in 8 hours on a MacBook laptop with an M1 Max chip. The pipeline allows large-scale labeled data generation with minimal human intervention, which also opens the door to developing a machine learning-based CV engine in the future. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
|
92 |
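Krav och metoder för insamling av data för maskinlärning inom svensk byggindustri : En utforskning av behov och anpassning av dataset [Requirements and methods for collecting data for machine learning in the Swedish construction industry : An exploration of needs and adaptation of datasets]
Larsson, Isabell, January 2023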
An ongoing research project on artificial intelligence (AI) has given rise to a need for a data collection method suited to the Swedish construction context; the aim of this thesis was to meet that need. The research lies within machine learning (ML) and computer vision (CV) in the building and civil engineering sector. Computer vision broadly means that a computer extracts information from visual data, that is, images and video. The data collection needs to be extensive enough to create a dataset for machine learning, with the goal of enabling Boston Dynamics SPOT to be used in the building and civil engineering sector. Four data collection methods were evaluated and compared in order to find the one that offers the best conditions for building a dataset. Given the conditions of the study, the best approach was an experimental method of an inductive character, so the study focused mainly on method development based on empirical evidence rather than theory. Broad research questions were posed to find the best data collection method for machine learning; they were answered by structuring the study into three main parts: a theoretical, an empirical, and a technical part. The theoretical part was a small contextual literature study that gave a deeper understanding of AI. It focused on the parts considered most relevant to the study, such as supervised computer vision and current research on applications of AI in the construction context. In the empirical part, case studies were carried out in which data was collected with the different methods and evaluated from several perspectives to determine which method was most viable in practice. The technical part focused mainly on annotation and on training models on the data. The result was a score between 0 and 1, where 1 was best. The technical part also included an evaluation of operating systems for ML. The four data collection methods evaluated were:
1. Manual photography of plasterboard on construction sites. One construction site was visited and just under 200 images were collected. The model trained on data from method one achieved a score of 0.46.
2. Making use of the workforce already present on construction sites. The idea was that workers would photograph plasterboard during the working day and send the images to a shared collection point. This method was rejected by the construction company in question, partly because of organizational problems and partly because of ownership concerns.
3. Using an image gallery in which historical data had been collected. Several such galleries were examined; the project was given access to one of them, and an employee of the construction company went through several others. In total, an estimated 4000 - 5000 images were searched, from which a dataset of 38 images was collected. The model trained on data from method three achieved a score of 0.
4. Generating synthetic images. A simple model was built in Revit, from which a total of 740 images were collected. When the model trained on the images from method four was evaluated, the score was 0.9. When the validation images were switched from synthetic to real images, the score was instead 0.32. Closer examination showed that the model recognized the plasterboard but was confused by background noise: filler on the wall in the real image was mistaken for plasterboard. A hybrid method was therefore tested in which a small number of real images were added to the training data. The hybrid method achieved a score of 0.66.
In summary, the results of this study showed that none of the existing methods, in their current form, is suitable for machine learning purposes on the construction site. It did emerge, however, that hybrid methods may be worth exploring further as a potential solution.
An interesting research direction would be to investigate hybrid methods that combine elements of methods one and four, as described above. An alternative hybrid method could also be explored, in which surroundings from the image gallery are incorporated into a virtual environment and data is collected with processes similar to those in method four. Such hybrid methods may offer advantages that overcome the limitations identified in the individual methods and thereby enable more efficient and more reliable data collection for machine learning applications in the studied context. Future research should focus on exploring and evaluating these hybrid methods to better understand their potential and benefits in machine learning and computer science.
|
93 |
Task Distillation: Transforming Reinforcement Learning into Supervised Learning
Wilhelm, Connor, 12 October 2023
Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model training costs, provide interpretability, and anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, so that a more complex learning task, such as a reinforcement learning environment, can be reduced to a simpler learning task, such as supervised classification. To advance task distillation beyond supervised-to-supervised distillation, we explore distilling reinforcement learning environments into supervised learning datasets. We propose a new distillation algorithm that allows PPO to be used to distill a reinforcement learning environment. We use k-shot learning on distilled cart-pole to demonstrate the effectiveness of our distillation algorithm and to explore distillation generalization. We distill multi-dimensional cart-pole environments to their minimum-sized distillations and show that this matches the theoretical minimum number of data instances required to teach each task. We demonstrate how a distilled task can be used as an interpretability artifact, as it compactly represents everything needed to learn the task. We demonstrate the feasibility of distillation in more complex Atari environments by fully distilling Centipede and showing that distillation is cheaper than training directly on Centipede when training more than 9 models. We provide a method to "partially" distill more complex environments, demonstrate it on Ms. Pac-Man, Pong, and Space Invaders, and show how it scales distillation difficulty fully on Centipede.
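The claim that distillation pays off beyond a certain number of trained models can be read as a break-even calculation. The sketch below is only an illustration under an assumed linear cost model (a one-time distillation cost plus a fixed per-model cost); the function name and all numbers are hypothetical and are not taken from the thesis.

```python
import math

def break_even_models(distill_cost, direct_cost_per_model, distilled_cost_per_model):
    """Smallest number of models for which distilling first is strictly cheaper.

    Assumed (hypothetical) cost model:
      direct route:    n * direct_cost_per_model
      distilled route: distill_cost + n * distilled_cost_per_model
    Returns None if training on the distilled task saves nothing per model.
    """
    saving = direct_cost_per_model - distilled_cost_per_model
    if saving <= 0:
        return None
    # Distillation wins when n * saving > distill_cost.
    return math.floor(distill_cost / saving) + 1

# Illustrative cost units only; these are not measurements from the thesis.
n = break_even_models(distill_cost=90.0,
                      direct_cost_per_model=10.0,
                      distilled_cost_per_model=0.5)
print(n)
```

With these made-up costs the distilled route becomes cheaper from the tenth model onward; different cost assumptions would move the threshold.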
|
94 |
Comparative Analysis of Machine Learning Algorithms on Activity Recognition from Wearable Sensors’ MHEALTH dataset Supported with a Comprehensive Process and Development of an Analysis Tool
Sheraz, Nasir, January 2019
Human activity recognition based on wearable sensors’ data is quite an attractive
subject due to its wide application in the fields of healthcare, wellbeing and smart
environments. This research focuses on comparing the predictive performance
of machine learning algorithms for activity recognition from wearable
sensors’ (MHEALTH) data while employing a comprehensive process. The
framework is adapted from well-laid data science practices which addressed the
data analyses requirements quite successfully. Moreover, an Analysis Tool is
also developed to support this work and to make it repeatable for further work.
A detailed comparative analysis is presented for five multi-class classifier
algorithms on the MHEALTH dataset, namely Linear Discriminant Analysis (LDA),
Classification and Regression Trees (CART), Support Vector Machines (SVM),
K-Nearest Neighbours (KNN) and Random Forests (RF). Besides using the original
MHEALTH data as input, reduced dimensionality subsets and reduced features
subsets were also analysed. The comparison is made on overall accuracies,
class-wise sensitivity and specificity of each algorithm, class-wise detection rate
and detection prevalence in comparison to prevalence of each class, positive and
negative predictive values etc. The resultant statistics have also been compared
through visualizations for ease of understanding and inference.
All five ML algorithms were applied for classification using the three sets of input
data. Out of all five, three performed exceptionally well (SVM, KNN, RF) where
RF was best with an overall accuracy of 99.9%. Although CART did not perform
well as a classification algorithm, using it for ranking inputs was a better
way of feature selection. The significant sensors using CART ranking were found
to be accelerometers and gyroscopes; also confirmed through application of
predictive ML algorithms. In dimensionality reduction, the subset data based on
CART-selected features yielded better classification than the subset obtained
from PCA technique.
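The class-wise statistics named in this abstract can all be derived from a multi-class confusion matrix by treating each class one-vs-rest. The following is a minimal sketch using the commonly used definitions of these terms; the matrix orientation, the function names, and the example counts are assumptions for illustration and are not the thesis's Analysis Tool or its results.

```python
import math

def classwise_metrics(confusion):
    """Per-class one-vs-rest metrics from a square confusion matrix.

    Assumed convention: confusion[i][j] counts samples whose true class
    is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else math.nan

    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true other
        tn = total - tp - fn - fp
        results.append({
            "sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "prevalence": ratio(tp + fn, total),
            "detection_rate": ratio(tp, total),
            "detection_prevalence": ratio(tp + fp, total),
            "ppv": ratio(tp, tp + fp),
            "npv": ratio(tn, tn + fn),
        })
    return results

def overall_accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else math.nan

# Made-up counts for three activity classes; not results from the thesis.
example = [
    [50, 2, 0],
    [3, 45, 2],
    [0, 1, 47],
]
per_class = classwise_metrics(example)
accuracy = overall_accuracy(example)
```

Comparing `detection_prevalence` with `prevalence` for a class shows whether a classifier predicts that class more or less often than it actually occurs, which is the comparison the abstract refers to.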
|
95 |
A Scalable, Load-Balancing Data Structure for Highly Dynamic Environments
Foster, Anthony, 05 June 2008
No description available.
|
96 |
Using Text based Visualization in Data Analysis
Wu, Yingyu, 28 April 2014
No description available.
|
97 |
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural Networks
Bhandare, Ashray Sadashiv, January 2017
No description available.
|
98 |
A Database System to Store and Retrieve a Concept Lattice Structure
Ashok, Ramya, January 2005
No description available.
|
99 |
A Comparison of Rule Extraction Techniques with Emphasis on Heuristics for Imbalanced Datasets
Singh, Manjeet, 22 September 2010
No description available.
|
100 |
Embodied Data Exploration in Immersive Environments: Application in Geophysical Data Analysis
Sardana, Disha, 05 June 2023
Immersive analytics is an emerging field of data exploration and analysis in immersive environments. It is an active research area that explores human-centric approaches to data exploration and analysis based on the spatial arrangement and visualization of data elements in immersive 3D environments. The availability of immersive extended reality systems has increased tremendously in recent years, but they are still not as widely used as conventional 2D displays. In this dissertation, we described an immersive analysis system for spatiotemporal data, performed several user studies to measure user performance in the developed system, and laid out design guidelines for an immersive analytics environment. In our first study, we compared the performance of users on specific visual analytics tasks in an immersive environment and on a conventional 2D display. The approach was realized based on the coordinated multiple-views paradigm. We also designed an embodied interaction for the exploration of spatial time series data. The findings from the first user study showed that the developed system is more efficient in a real immersive environment than on a conventional 2D display. One important challenge we encountered while designing an immersive analytics environment was finding the optimal placement and identification of various visual elements. In our second study, we explored the iterative design of the placement of visual elements and interaction with them based on frames of reference. Our iterative designs explored the impact of the visualization scale for three frames of reference and used the collected user feedback to compare the advantages and limitations of these three frames of reference.
In our third study, we described an experiment that quantitatively and qualitatively investigated the use of sonification, i.e., conveying information through nonspeech audio, in an immersive environment that utilized empirical datasets obtained from a multi-dimensional geophysical system. We discovered that using event-based sonification in addition to the visual channel was extremely effective in identifying patterns and relationships in large, complex datasets. Our findings also imply that the inclusion of audio in an immersive analytics system may increase users’ level of confidence when performing analytics tasks like pattern recognition. We outlined the sound design principles for an immersive analytics environment using real-world geospace science datasets and assessed the benefits and drawbacks of using sonification in an immersive analytics setting. / Doctor of Philosophy / When it comes to exploring data, visualization is the norm. We make line charts, scatter plots, bar graphs, or heat maps to look for patterns in data using traditional desktop-based approaches. However, biologically humans are optimized to observe the world in three dimensions. This research is motivated by the idea that representing data in immersive 3D environments can provide a new perspective that may lead to the discovery of previously undetected data patterns. Experiencing the data in three dimensions, engaging multiple senses like sound and sight, and leveraging human embodiment, interaction capabilities, and sense of presence may lead to a unique understanding of the data that is not feasible using traditional visual analytics. In this research, we first compared the data analysis process in a mixed reality system, where real and virtual worlds co-exist, versus doing the same analytical tasks in a desktop-based environment. 
In our second study, we studied where different charts and data visualizations should be placed based on the scale of the environment, such as table-top versus room-sized. We studied the strengths and limitations of different scales based on the visual and interaction design of the developed system. In our third study, we used a real-world space science dataset to test the liabilities and advantages of using the immersive approach. We also used audio and explored what kinds of audio work for which analytical tasks and laid out design guidelines based on audio. Through this research, we studied how to do data analytics in emerging mixed reality environments and presented results and design guidelines for future developers, designers, and researchers in this field.
|