111 |
Automatic Gait Recognition : using deep metric learning / Automatisk gångstilsigenkänning Persson, Martin January 2020 (has links)
Recent improvements in pose estimation have opened up new areas of application. One of them is gait recognition, the task of identifying people by their unique style of walking, which is increasingly recognized as an important method of biometric identification. This thesis explored the possibility of using a pose estimation system, OpenPose, together with deep Recurrent Neural Networks (RNNs) to determine whether sequences of 2D poses carry sufficient information for gait recognition. To make this possible, a new multi-camera dataset of people walking on a treadmill was gathered, dubbed the FOI dataset. The results show that the approach has promise: it achieved an overall classification accuracy of 95.5% on classes seen during training and 83.8% on classes not seen during training. However, it was unable to recognize sequences from camera angles not seen during training; for that to be possible, more data pre-processing will likely be required.
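The thesis does not reproduce its pre-processing code; as an illustration only, a common way to prepare 2D pose sequences for an RNN is to make each frame translation- and scale-invariant before feeding the sequence to the network. The joint indices and function names below are assumptions for the sketch, not OpenPose's actual output format.

```python
# Hypothetical sketch of pose-sequence pre-processing for gait recognition.
# Each frame is a list of (x, y) keypoints (e.g. from OpenPose); a common
# step is to center every frame on a root joint and divide by a body scale,
# so the RNN sees poses independent of where the person stands in the image.
# Joint indices below are illustrative assumptions.

import math

HIP, NECK = 0, 1  # assumed indices of the root and neck keypoints

def normalize_frame(keypoints):
    """Center a single frame on the hip and scale by the hip-neck distance."""
    hx, hy = keypoints[HIP]
    nx, ny = keypoints[NECK]
    scale = math.hypot(nx - hx, ny - hy) or 1.0  # avoid division by zero
    return [((x - hx) / scale, (y - hy) / scale) for (x, y) in keypoints]

def normalize_sequence(frames):
    """Normalize every frame of a walking sequence independently."""
    return [normalize_frame(f) for f in frames]

# Example: a toy 2-frame sequence with 3 keypoints per frame.
seq = [[(10.0, 20.0), (10.0, 30.0), (12.0, 40.0)],
       [(11.0, 21.0), (11.0, 31.0), (13.0, 41.0)]]
norm = normalize_sequence(seq)
print(norm[0][0])  # hip maps to the origin: (0.0, 0.0)
```

After this step the sequence of normalized keypoint vectors can be fed to an RNN classifier frame by frame.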
|
112 |
Evaluation of Face Recognition Accuracy in Surveillance Video Tuvskog, Johanna January 2020 (has links)
Automatic Face Recognition (AFR) can be useful in the forensic field when identifying people in surveillance footage. AFR systems commonly use deep neural networks, which perform well as long as image quality stays above a certain level. This is a problem when applying AFR to surveillance data, since the quality of those images can be very poor. In this thesis the CNN FaceNet has been used to evaluate how different quality parameters influence face recognition accuracy. The goal is to draw conclusions about how to improve recognition by exploiting or avoiding certain parameters depending on the conditions. The parameters experimented with are face angle, image quality, occlusion, colour and lighting. This has been achieved by using datasets with different properties or by altering the images. The parameters are meant to simulate situations that can occur in surveillance footage and that are difficult for the network to handle. Three models have been evaluated, with different embedding sizes and different training data. The results show that the two models trained on the VGGFace2 dataset perform much better than the one trained on CASIA-WebFace. All models' performance drops on low-quality images compared to high-quality ones, because the training data consist mostly of high-quality images. In some cases, recognition can be improved by altering the images: for example, by using one frontal and one profile image when trying to identify a person, or by occluding parts of the face outline when it is mistaken for other people with similar face shapes. One main improvement would be to extend the training datasets with more low-quality images; to some extent, this could be achieved with data augmentation such as artificial occlusion and down-sampling.
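Networks like FaceNet map each face to an embedding vector, and verification reduces to a distance test between embeddings. A minimal sketch of that decision step follows; the threshold value is an illustrative assumption (in practice it is tuned on a validation set), and the toy vectors stand in for real 128- or 512-dimensional embeddings.

```python
# Sketch of the verification step used with embedding networks like FaceNet:
# two faces are judged to be the same person when the Euclidean distance
# between their embeddings falls under a threshold. Threshold and embeddings
# below are toy assumptions for illustration.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(emb_a, emb_b, threshold=1.1):
    """Return True when two embeddings are closer than the threshold."""
    return euclidean(emb_a, emb_b) < threshold

# Toy 4-dimensional embeddings (real FaceNet embeddings are 128- or 512-D).
anchor    = [0.1, 0.9, 0.2, 0.1]
same      = [0.15, 0.85, 0.25, 0.1]
different = [0.9, 0.1, 0.8, 0.7]
print(same_person(anchor, same))       # True
print(same_person(anchor, different))  # False
```

Low image quality shifts embeddings and pushes genuine pairs past the threshold, which is one way to picture the accuracy drop the thesis measures.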
|
113 |
NAVIGATION AND PLANNED MOVEMENT OF AN UNMANNED BICYCLE Baaz, Hampus January 2020 (has links)
A conventional bicycle is a stable system given adequate forward velocity. However, the velocity region of stability is limited and depends on the geometric parameters of the bicycle. An autonomous bicycle is not just about maintaining balance but also about controlling where the bicycle is heading. Path following has been accomplished with bicycles and motorcycles in simulation for some time, and car-like vehicles have followed paths in the real world, but few bicycles or motorcycles have done so. The goal of this work is to follow a planned path using a physical bicycle without exceeding the dynamic limitations of the bicycle. Using an iterative design process, controllers for direction and position are developed and improved. Kinematic models are also compared in their ability to simulate the bicycle's movement, and in how well controllers tuned in simulation translate to outdoor driving. The results show that the bicycle can follow a turning path on a residential road without human interaction, and that some simulation behaviours do not translate to the real world.
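The kind of kinematic model compared in the thesis can be sketched in a few lines. This is a generic kinematic bicycle model, not the thesis's specific controller or model, and the wheelbase, speed, and time-step values are assumptions for the example.

```python
# Illustrative kinematic bicycle model, the standard simplification used to
# simulate path following: state is position (x, y) and heading theta; the
# steering angle delta turns a front wheel at wheelbase distance L while the
# bicycle moves at forward speed v. Parameter values are assumptions.

import math

def step(x, y, theta, v, delta, L=1.0, dt=0.1):
    """Advance the kinematic bicycle model by one time step."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += (v / L) * math.tan(delta) * dt
    return x, y, theta

# Drive straight for 10 steps at 2 m/s with zero steering.
x, y, theta = 0.0, 0.0, 0.0
for _ in range(10):
    x, y, theta = step(x, y, theta, v=2.0, delta=0.0)
print(round(x, 6), round(y, 6))  # 2.0 0.0  (moved 2 m along the x-axis)
```

A path-following controller then closes the loop by choosing delta each step from the bicycle's offset to the planned path; the model's neglect of balance dynamics is exactly the kind of simulation-to-reality gap the thesis reports.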
|
114 |
Tracking motion in mineshafts : Using monocular visual odometry Suikki, Karl January 2022 (has links)
LKAB has a mineshaft trolley used for scanning mineshafts. It is lowered into a mineshaft by wire, scanning the shaft on both descent and ascent using two LiDAR (Light Detection And Ranging) sensors and an IMU (Inertial Measurement Unit) used for tracking the position. With good tracking, the LiDAR scans could be used to create a three-dimensional model of the mineshaft for future monitoring, planning and visualization. Tracking with an IMU alone is unstable, since most IMUs are susceptible to disturbances and drift over time; we therefore strive to track the movement using monocular visual odometry instead. Visual odometry tracks movement from video or images: it is the process of retrieving the pose of a camera by analyzing a sequence of images from one or more cameras. The mineshaft trolley is also equipped with a camera filming the descent and ascent, and we aim to use this video for tracking. We present a simple visual odometry algorithm and test it on multiple datasets: KITTI datasets of traffic scenes accompanied by their ground truth trajectories, mineshaft data intended for the trolley operator, and self-captured data with an approximate ground truth trajectory. The algorithm is feature-based, meaning that it tracks recognizable keypoints across consecutive images. We compare the performance of our algorithm on the different datasets using two feature detection and description systems, ORB and SIFT. The algorithm tracks the movement of the KITTI datasets well with both: compared to ground truth, the largest total trajectory errors are 3.1 m for ORB and 0.7 m for SIFT over 51.8 m of movement.
The tracking of the self-captured dataset shows, by visual inspection, that the algorithm can perform well on data that has not been as carefully captured as the KITTI datasets. We find, however, that we cannot track the movement with the current data from the mineshaft. This is because the algorithm finds too few matching features in consecutive images, which breaks the pose estimation of the visual odometry. Comparing how ORB and SIFT find features in the mineshaft images, we find that SIFT performs better by finding more features. The mineshaft data was never intended for visual odometry, and it is therefore not suitable for this purpose. We argue that tracking could work in the mineshaft if the visual conditions are improved, by providing more even lighting and better camera placement, or if visual odometry is combined with other sensors, such as an IMU, that assist it when it fails.
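In feature-based visual odometry, each pair of consecutive frames yields an estimated relative motion, and the trajectory is obtained by chaining those motions; when too few features match, one link in the chain fails and the whole trajectory breaks, as described above. The planar sketch below illustrates only the chaining step, with made-up relative motions; a real pipeline estimates relative pose in 3D from matched keypoints (e.g. via the essential matrix).

```python
# Minimal planar sketch of trajectory accumulation in visual odometry:
# each frame pair contributes a relative motion (rotation dtheta, forward
# distance dist); composing these step by step yields the camera path.
# The relative motions here are invented for illustration.

import math

def accumulate(relative_motions):
    """Chain (dtheta, dist) relative motions into a list of (x, y) positions."""
    x, y, theta = 0.0, 0.0, 0.0
    trajectory = [(x, y)]
    for dtheta, dist in relative_motions:
        theta += dtheta
        x += dist * math.cos(theta)
        y += dist * math.sin(theta)
        trajectory.append((x, y))
    return trajectory

# Two straight steps, then a 90-degree left turn and one more step.
motions = [(0.0, 1.0), (0.0, 1.0), (math.pi / 2, 1.0)]
traj = accumulate(motions)
print([(round(px, 6), round(py, 6)) for px, py in traj])
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
```

Because errors in each relative motion also accumulate, small per-frame errors grow into the total trajectory error that the thesis measures against ground truth.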
|
115 |
Multi-Modal Deep Learning with Sentinel-1 and Sentinel-2 Data for Urban Mapping and Change Detection Hafner, Sebastian January 2022 (has links)
Driven by the rapid growth in population, urbanization is progressing at an unprecedented rate in many places around the world. Earth observation has become an invaluable tool to monitor urbanization on a global scale by either mapping the extent of cities or detecting newly constructed urban areas within and around cities. In particular, the Sentinel-1 (S1) Synthetic Aperture Radar (SAR) and Sentinel-2 (S2) MultiSpectral Instrument (MSI) missions offer new opportunities for urban mapping and urban Change Detection (CD) due to the capability of systematically acquiring wide-swath high-resolution images with frequent revisits globally. Current trends in both urban mapping and urban CD have shifted from employing traditional machine learning methods to Deep Learning (DL) models, specifically Convolutional Neural Networks (CNNs). Recent urban mapping efforts achieved promising results by training CNNs on available built-up data using S2 images. Likewise, DL models have been applied to urban CD problems using S2 data with promising results. However, the quality of current methods strongly depends on the availability of local reference data for supervised training, especially since CNNs applied to unseen areas often produce unsatisfactory results due to their insufficient across-region generalization ability. Since multitemporal reference data are even more difficult to obtain, unsupervised learning was suggested for urban CD. While unsupervised models may perform more consistently across different regions, they often perform considerably worse than their supervised counterparts. To alleviate these shortcomings, it is desirable to leverage Semi-Supervised Learning (SSL) that exploits unlabeled data to improve upon supervised learning, especially because satellite data is plentiful. 
Furthermore, the integration of SAR data into current optical frameworks (i.e., data fusion) has the potential to produce models with better generalization ability, because the representation of urban areas in SAR images is largely invariant across cities, while spectral signatures vary greatly. In this thesis, a novel Domain Adaptation (DA) approach using SSL is first presented. The DA approach jointly exploits Multi-Modal (MM) S1 SAR and S2 MSI data to improve across-region generalization for built-up area mapping. Specifically, two identical sub-networks are incorporated into the proposed model to perform built-up area segmentation from SAR and optical images separately. Assuming that consistent built-up area segmentation should be obtained across data modalities, an unsupervised loss for unlabeled data is designed that penalizes inconsistent segmentation from the two sub-networks. In effect, the complementary data modalities serve as real-world perturbations for Consistency Regularization (CR). For the final prediction, the model takes both data modalities into account. Experiments conducted on a test set comprising sixty representative sites across the world showed that the proposed DA approach achieves strong improvements (F1 score 0.694) upon supervised learning from S1 SAR data (F1 score 0.574), S2 MSI data (F1 score 0.580) and their input-level fusion (F1 score 0.651). A comparison with two state-of-the-art global human settlement maps, GHS-S2 and WSF2019, showed that the model produces built-up area maps of comparable or even better quality. For urban CD, a new network architecture for the fusion of SAR and optical data is proposed. Specifically, a dual-stream concept is introduced that processes the data modalities separately before combining the extracted features at a later decision stage. The individual streams are based on the U-Net architecture.
The proposed strategy outperformed other U-Net-based approaches in combination with uni-modal data and MM data with feature-level fusion. Furthermore, our approach achieved state-of-the-art performance on a popular urban CD dataset (F1 score 0.600). In addition, a new network architecture is proposed to adapt Multi-Modal Consistency Regularization (MMCR) for urban CD. Using bi-temporal S1 SAR and S2 MSI image pairs as input, the MM Siamese Difference (Siam-Diff) Dual-Task (DT) network not only predicts changes using a difference decoder, but also segments buildings for each image with a semantic decoder. The proposed network is trained in a semi-supervised fashion using the underlying idea of MMCR, namely that building segmentation across sensor modalities should be consistent, to learn more robust features. The proposed method was tested on an urban CD task using the 60 sites of the SpaceNet7 dataset. A domain gap was introduced by only using labels for sites located in the Western World, where geospatial data are typically less sparse than in the Global South. MMCR achieved an average F1 score of 0.444 when applied to sites located outside of the source domain, a considerable improvement over several supervised models (F1 scores between 0.107 and 0.424). The combined findings of this thesis contribute to the mapping and monitoring of cities on a global scale, which is crucial to support sustainable planning and urban SDG indicator monitoring. / Population growth is a major driver of the extensive urbanization observed around the world today. Earth observation has become a valuable tool for monitoring urbanization on a global scale, either by mapping the extent of cities or by detecting newly built urban areas within and around them. Thanks to the Sentinel-1 (S1) Synthetic Aperture Radar (SAR) and Sentinel-2 (S2) MultiSpectral Instrument (MSI) satellite missions and their ability to systematically deliver wide-swath, high-resolution images with frequent revisits, new opportunities have emerged for mapping urban areas and detecting changes within them. Contemporary trends in both urban mapping and urban change detection have moved from traditional machine learning methods to deep learning (DL), in particular Convolutional Neural Networks (CNNs). Recent urban mapping methods have produced promising results by training CNNs on already available built-up data and S2 images. Likewise, DL models combined with S2 data have been applied to urban change detection problems. The quality of current methods, however, depends heavily on the availability of local reference data for supervised training; CNNs applied to new areas often give insufficient results because of their inability to generalize across regions. Since multitemporal reference data can be difficult to obtain, unsupervised learning has been proposed for detecting urban change. Although unsupervised models may perform more consistently across regions, they often perform considerably worse than their supervised counterparts. To avoid these shortcomings, it is desirable to use semi-supervised learning (SSL), which exploits unlabeled data to improve supervised learning, since the supply of satellite data is so large. Moreover, integrating SAR data into current optical frameworks (so-called data fusion) has the potential to produce models with better generalization ability, as the representation of urban areas in SAR images is largely invariant between cities, while spectral signatures vary greatly. This thesis first presents a new Domain Adaptation (DA) method using SSL. The DA method combines Multi-Modal (MM) S1 SAR and S2 MSI to improve generalization across regions for mapping built-up areas. Two identical sub-networks are incorporated into the proposed model to obtain separate urban mappings from SAR and optical data. To obtain consistent segmentation of built-up areas across data modalities, an unsupervised loss component is designed to penalize inconsistent segmentation from the two sub-networks. The use of complementary data modalities as real-world perturbations for Consistency Regularization (CR) is thus proposed. For the final prediction, the model takes both data modalities into account. Experiments on a test set of 60 representative sites around the world show that the proposed DA method achieves strong improvements (F1 score 0.694) over supervised learning from S1 SAR data (F1 score 0.574), S2 MSI data (F1 score 0.580) and their input-level fusion (F1 score 0.651). Compared with the two leading global human settlement maps, GHS-S2 and WSF2019, our model proved capable of producing built-up maps of comparable or better quality. For detecting urban change, this thesis proposes a new network architecture that fuses SAR and optical data. More specifically, a dual-stream concept is presented in which the data modalities are processed separately before the extracted features are combined at a later decision stage. The individual streams are based on the U-Net architecture. The strategy outperformed other U-Net-based approaches in combination with uni-modal data and MM data with feature-level fusion. The approach also achieved high performance on a frequently used urban change detection dataset (F1 score 0.600). In addition, a new network architecture is proposed that adapts Multi-Modal Consistency Regularization (MMCR) to detect urban change. Using bi-temporal S1 SAR and S2 MSI image pairs as input, the MM Siamese Difference (Siam-Diff) Dual-Task (DT) network not only predicts changes with a difference decoder but can also segment buildings in each image with a semantic decoder. The network is trained in a semi-supervised fashion using MMCR, namely that building segmentation across sensor modalities should be consistent, in order to learn more robust features. The proposed method was tested on a change detection task using the 60 sites of the SpaceNet7 dataset. A domain gap was introduced by using labels only for sites in the Western World, where geospatial data are typically less sparse than in the Global South. MMCR achieved an average F1 score of 0.444 when applied to sites outside the source domain, a considerable improvement over several supervised models (F1 scores between 0.107 and 0.424). Together, the results of this thesis contribute to the mapping and monitoring of cities on a global scale, which is essential for sustainable urban planning and for monitoring the UN Sustainable Development Goals. / QC220530
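The consistency-regularization idea running through this entry, that two sub-networks observing the same scene through different modalities should agree on unlabeled data, can be reduced to a toy numerical sketch. This is not the thesis's implementation; the "predictions" below are flat lists of per-pixel built-up probabilities standing in for CNN outputs.

```python
# Toy sketch of Multi-Modal Consistency Regularization: one sub-network
# segments from SAR, another from optical imagery, and on unlabeled scenes
# a loss penalizes their disagreement. In a real model this loss is
# back-propagated through both CNNs; here we only compute it.

def consistency_loss(pred_sar, pred_opt):
    """Mean squared disagreement between the two modality predictions."""
    n = len(pred_sar)
    return sum((a - b) ** 2 for a, b in zip(pred_sar, pred_opt)) / n

# Agreeing predictions give a small loss, disagreeing ones a large loss.
agree    = consistency_loss([0.9, 0.1, 0.8], [0.9, 0.1, 0.8])
disagree = consistency_loss([0.9, 0.1, 0.8], [0.1, 0.9, 0.2])
print(agree, round(disagree, 4))  # 0.0 0.5467
```

Minimizing this term on plentiful unlabeled scenes is what lets the labeled data go further, which is the semi-supervised benefit the thesis reports.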
|
116 |
Öppen källkodslösning för datorseende : Skapande av testmiljö och utvärdering av OpenCV / Open source solution for computer vision : Creating a test environment and evaluating OpenCV Lokkin, Caj, Bragd, Sebastian January 2021 (has links)
Computer vision is a field of computer science that has been developed over many years, and its functionality is more accessible now than ever. Among other things, it can be used for contact-free measurement, such as finding, verifying and identifying defects in objects. The question is whether it is possible to design an open-source solution for computer vision that offers performance equivalent to available commercial ones. In other words, can a company that uses a closed-source commercial program instead use a free open-source library and obtain equivalent results? In this report we describe the design of a prototype that uses the open-source computer vision library OpenCV. To evaluate our prototype, we let it identify blocks in a tower in an image across a series of test cases. We compare the results from the prototype with those obtained with a commercial solution, created with the program 'Vision Builder for Automated Inspection'. The results of the tests show that OpenCV appears to have performance and functionality equivalent to the commercial solution, but with limitations. Since OpenCV focuses on programmatic development of computer vision solutions, the quality of the resulting solutions depends on the user's skills in programming and program design. Based on the tests performed, we believe that OpenCV can replace a licensed commercial program, but the license costs may be replaced by other development costs. / Computer vision is a subject in computer science that has evolved over many years, and the functionality is more accessible than ever. Among other things, it can be used for non-contact measurement to locate, verify, and detect defects in objects. The question is whether it is possible to create an open-source solution for computer vision equivalent to a closed-source solution. In other words, can a company using a closed-source commercial program instead use a free open-source code library and produce equivalent results?

In this report we describe the design of a prototype that uses the open-source library for computer vision, OpenCV. In order to evaluate our prototype, we let it identify blocks in a tower in an image in a series of test cases. We compare the results from the prototype with the results obtained with a commercial solution, created with the program 'Vision Builder for Automated Inspection'. Results from the cases tested show that OpenCV seems to have performance and functionality equivalent to the commercial solution, but with some limitations. As OpenCV's focus is on programmatic development of computer vision solutions, the result depends on the user's skills in programming and program design. Based on the tests that we have performed, we believe that OpenCV can replace a licensed commercial program, but the license cost may come to be replaced by other development costs.
|
117 |
Implementering av objektföljning med hjälp av maskinseende Rönnholm, Robin January 2020 (has links)
Object tracking can be used in everything from major sporting events, smart cars and camera surveillance to combating malaria-infected mosquitoes. The technique works by performing computer analysis on captured images, which in most cases have been taken by a digital camera. During the analysis, the system judges whether the object to be tracked is present in the image and where in the image it is located. This gives the computer a way to see and identify objects, so-called machine vision. This study investigates object identification when colour identification and edge detection, together with the Hough circle transform, are implemented on a compact single-board computer, a Raspberry Pi. The single-board computer also sends the object's centre position to an Arduino microcontroller which, although outside the scope of this paper, aims a laser pointer at the centre position on a target board. In addition to implementing the object tracking, a platform is constructed for Syntronic AB to use for demonstration purposes. The platform consists of three target boards with photoresistors and a digital camera. The digital camera is mounted, together with a laser pointer, on a small servo-controlled platform that can pan and tilt.
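The centre position that the Raspberry Pi sends to the Arduino comes from the colour-identification step: threshold the image to a binary mask of target-coloured pixels, then take the mask's centroid. The sketch below is a pure-Python stand-in for the corresponding OpenCV operations (`inRange` and image moments); the toy single-channel "image" and threshold values are assumptions for illustration.

```python
# Illustrative sketch of the colour-identification step: build a binary mask
# of pixels inside a colour range, then report the mask centroid as the
# object's centre position (the value sent on to the microcontroller).

def color_mask(image, lo, hi):
    """Binary mask of pixels whose scalar 'colour' value lies in [lo, hi]."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in image]

def centroid(mask):
    """(x, y) centre of the mask's set pixels, or None if the mask is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

# Toy 4x4 "image" with a bright 2x2 blob in the lower-right corner.
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
print(centroid(color_mask(img, 5, 10)))  # (2.5, 2.5)
```

In the real system the mask would be computed per colour channel (e.g. in HSV space) and combined with edge detection and the Hough circle transform to reject false positives.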
|
118 |
Evaluating DCNN architectures for multinomial area classification using satellite data / Utvärdering av DCNN-arkitekturer för multinomial arealklassificering med hjälp av satellitdata Wojtulewicz, Karol, Agbrink, Viktor January 2020 (has links)
The most common approach to analysing satellite imagery is building or object segmentation, which expects an algorithm to find and segment objects with specific boundaries that are present in the satellite imagery. The company Vricon takes satellite imagery analysis further, with the goal of reproducing the entire world as a 3D mesh. This 3D reconstruction is performed by a set of complex algorithms, each excelling at different object reconstructions, which need sufficient labeling in the original 2D satellite imagery to ensure valid transformations. Vricon believes that labeling of areas can further improve the algorithm selection process. The company therefore wants to investigate whether multinomial large-area classification can be performed successfully using the satellite image data available at the company. To enable this type of classification, the company's gold-standard dataset, containing labeled objects such as individual buildings, single trees and roads, among others, has been transformed into a large-area gold-standard dataset in an unsupervised manner. This dataset was then used to evaluate large-area classification using several state-of-the-art Deep Convolutional Neural Network (DCNN) semantic segmentation architectures, on RGB data alone as well as RGB combined with Digital Surface Model (DSM) height data. The results yield close to 63% mIoU and close to 80% pixel accuracy on validation data without using the DSM height data. This thesis additionally contributes a novel approach for creating large-area gold standards from existing object-labeled datasets.
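For reference, the two metrics the thesis reports, mIoU and pixel accuracy, are commonly computed from per-pixel predictions as sketched below. The tiny flat label lists stand in for full label maps; real evaluations aggregate confusion matrices over whole datasets.

```python
# How mIoU and pixel accuracy are typically computed for semantic
# segmentation. 'pred' and 'truth' are flattened per-pixel class labels.

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def mean_iou(pred, truth, num_classes):
    """Mean over classes of intersection-over-union."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # skip classes absent from both prediction and truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

truth = [0, 0, 1, 1, 2, 2]
pred  = [0, 0, 1, 2, 2, 2]
print(round(pixel_accuracy(pred, truth), 4))   # 0.8333
print(round(mean_iou(pred, truth, 3), 4))      # 0.7222
```

Note that mIoU is stricter than pixel accuracy because each class counts equally regardless of how many pixels it covers, which is why the thesis reports both.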
|
119 |
Detect obstacles for forest machinery from laser scanned data Söderström, Simon January 2020 (has links)
No description available.
|
120 |
Improved Data Association for Multi-Pedestrian Tracking Using Image Information Flodin, Frida January 2020 (has links)
Multi-pedestrian tracking (MPT) is the task of localizing and following the trajectories of pedestrians in a sequence. An MPT algorithm is an important part of preventing pedestrian-vehicle collisions in Automated Driving (AD) and Advanced Driving Assistance Systems (ADAS), and it has benefited greatly from the advances in computer vision and machine learning in the last decades. Given a pedestrian detector, tracking consists of associating detections between frames and maintaining pedestrian identities throughout the sequence. This can be a challenging task due to occlusions, missed detections and complex scenes; the number of pedestrians is unknown and varies with time. Finding new methods for improving MPT is an active research field, with many approaches in the literature. This work focuses on improving the detection-to-track association, the data association, with the help of color features extracted for each pedestrian. Utilizing recent improvements in object detection, this work shows that classical color features are still relevant in pedestrian tracking for real-time applications with limited computational resources. The appearance is not only used in the data association but is also integrated into a newly proposed method for avoiding tracking errors due to missed detections. The results show that even with simple models, color appearance can improve the tracking results. Evaluation on the commonly used Multi-Object Tracking benchmark shows an improvement in Multi-Object Tracking Accuracy and in identity switches, while other measures remain essentially unchanged.
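The detection-to-track association step can be sketched as a cost matrix that mixes a spatial term with a colour-appearance term, followed by an assignment under a gate. This is a hedged illustration, not the thesis's method: the weights, gate value, and greedy matcher are simplifications (real trackers often solve the assignment optimally with the Hungarian algorithm), and the 2-bin histograms stand in for real colour histograms.

```python
# Sketch of data association with an appearance term: each track-detection
# pair gets a cost combining spatial distance and colour-histogram
# dissimilarity; pairs are then matched greedily, cheapest first, below a
# gating threshold. All numeric values are toy assumptions.

def hist_distance(h1, h2):
    """Colour-histogram dissimilarity: 1 minus histogram intersection."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def cost(track, det, w_pos=1.0, w_app=2.0):
    (tx, ty, th), (dx, dy, dh) = track, det
    pos = ((tx - dx) ** 2 + (ty - dy) ** 2) ** 0.5
    return w_pos * pos + w_app * hist_distance(th, dh)

def associate(tracks, dets, gate=5.0):
    """Greedy matching: repeatedly take the cheapest pair under the gate."""
    pairs = sorted(((cost(t, d), i, j) for i, t in enumerate(tracks)
                    for j, d in enumerate(dets)), key=lambda p: p[0])
    used_t, used_d, matches = set(), set(), []
    for c, i, j in pairs:
        if c < gate and i not in used_t and j not in used_d:
            used_t.add(i); used_d.add(j)
            matches.append((i, j))
    return sorted(matches)

# Two tracks (x, y, colour histogram) and two new detections.
tracks = [(0.0, 0.0, [0.9, 0.1]), (4.0, 0.0, [0.1, 0.9])]
dets   = [(4.2, 0.1, [0.2, 0.8]), (0.3, 0.0, [0.8, 0.2])]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

The appearance term is what disambiguates crossings: when two pedestrians pass close to each other, positions alone are ambiguous, but their colour histograms usually are not.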
|