1 |
Shoulder Keypoint-Detection from Object Detection. Kapoor, Prince, 22 August 2018 (has links)
This thesis presents a detailed study of the different Convolutional Neural Network (CNN) architectures that have helped computer vision researchers achieve state-of-the-art performance on image analysis challenges such as classification, detection, and segmentation. Since the advent of deep learning, CNNs have been used in almost all computer vision applications, so there is a clear need to understand the fine details of these feature extractors and to examine the strengths and weaknesses of each one carefully. For our experiments, we chose to explore an object detection task using a model architecture that strikes a balance between computational cost and accuracy; the architecture we used is an LSTM-Decoder. The model was evaluated with different CNN feature extractors, and their strengths and weaknesses were identified across a variety of scenarios. The results obtained on several datasets show that the choice of CNN plays a major role in reaching higher accuracy, and we also achieved accuracy comparable to the state of the art on the Pedestrian Detection Dataset.
As an extension to object detection, we also implemented two model architectures that locate shoulder keypoints. The first idea can be described as follows: using the bounding box produced by the object detector, a small cropped image is generated and fed into a small cascade network trained to detect shoulder keypoints. The second strategy is to take the same object detection model and fine-tune its weights to predict shoulder keypoints directly. So far we have generated results for shoulder keypoint detection only; however, the idea could be extended to full-body pose estimation by adapting the cascade network for that purpose, which is an important topic for the future work of this thesis.
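A minimal sketch of the first strategy described above: crop the person region given by the detector and regress two shoulder keypoints with a small cascade-style CNN. The layer sizes, crop resolution, and function names are illustrative assumptions, not the architecture used in the thesis.

```python
# Illustrative sketch (PyTorch): crop a detected person box, then regress
# two shoulder keypoints with a small CNN. Sizes and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShoulderKeypointNet(nn.Module):
    """Small cascade-style regressor: crop -> conv features -> (x, y) x 2."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 4)  # (x_left, y_left, x_right, y_right) in [0, 1]

    def forward(self, crop):
        x = self.features(crop).flatten(1)
        return torch.sigmoid(self.head(x))

def crop_from_box(image, box, size=96):
    """Crop a detector box (x1, y1, x2, y2) from an HxWx3 uint8 tensor and resize."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = image[y1:y2, x1:x2, :].permute(2, 0, 1).unsqueeze(0).float() / 255.0
    return F.interpolate(patch, size=(size, size), mode="bilinear", align_corners=False)

# Usage: keypoints are predicted in normalized crop coordinates and can then be
# mapped back to the full image using the box corners.
# net = ShoulderKeypointNet()
# kp = net(crop_from_box(image_tensor, detected_box))  # shape (1, 4)
```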
|
2 |
Single Camera Autonomous Navigation for Micro Aerial Vehicles. Bowen, Jacob, 15 December 2012 (has links)
Micro Aerial Vehicles (MAVs) provide a highly capable, agile platform, ideally suited for intelligence/surveillance/reconnaissance missions, urban search and rescue, and scientific exploration. Critical to the success of these tasks is a system which moves autonomously through an unknown, obstacle-strewn, GPS-denied environment. Classical simultaneous localization and mapping (SLAM) approaches rely on large, heavy sensors to generate 3-D information about a MAV’s surroundings, severely limiting its abilities. This motivates a study of Parallel Tracking and Mapping (PTAM), an algorithm requiring only a single camera to provide 3-D data to an autonomous navigation system. Metric properties of 3-D MAV pose estimates are compared with physical measurements to explore tracking accuracy. Additionally, a discrete wavelet transform-based keypoint detector is implemented for a feasibility study on improving map density in low-visual-detail environments. Finally, a system is presented that integrates PTAM, autonomous MAV control, and a human interface for manual control and data logging.
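A minimal sketch of a discrete wavelet transform-based keypoint detector of the general kind mentioned above: decompose the image with a 2-D DWT and take local maxima of the detail-coefficient energy as candidate keypoints. The library choice (PyWavelets, SciPy), wavelet, level, and threshold are assumptions for illustration, not the thesis's design.

```python
# Illustrative sketch: DWT-based keypoint candidates from detail-coefficient energy.
# Wavelet choice, threshold, and neighborhood size are assumptions, not the thesis settings.
import numpy as np
import pywt
from scipy.ndimage import maximum_filter

def dwt_keypoints(gray, wavelet="haar", level=2, threshold=0.1):
    """Return (row, col) keypoint candidates in the coordinates of the original image."""
    coeffs = pywt.wavedec2(gray.astype(np.float32), wavelet, level=level)
    ch, cv, cd = coeffs[1]                      # detail bands at the coarsest level
    energy = ch**2 + cv**2 + cd**2              # local high-frequency energy
    peaks = (energy == maximum_filter(energy, size=3)) & (energy > threshold * energy.max())
    rows, cols = np.nonzero(peaks)
    scale = 2 ** level                          # map back to full-resolution coordinates
    return np.stack([rows * scale, cols * scale], axis=1)

# Usage: points = dwt_keypoints(gray_image); each row is an approximate (y, x) location
# that could seed map points in regions where corner-like texture is sparse.
```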
|
3 |
COMPUTER VISION SYSTEMS FOR PRACTICAL APPLICATIONS IN PRECISION LIVESTOCK FARMING. Prajwal Rao (19194526), 23 July 2024 (has links)
<p dir="ltr">The use of advanced imaging technology and algorithms for managing and monitoring livestock improves various aspects of livestock, such as health monitoring, behavioral analysis, early disease detection, feed management, and overall farming efficiency. Leveraging computer vision techniques such as keypoint detection, and depth estimation for these problems help to automate repeatable tasks, which in turn improves farming efficiency. In this thesis, we delve into two main aspects that are early disease detection, and feed management:</p><ul><li><b>Phenotyping Ducks using Keypoint Detection: </b>A platform to measure duck phenotypes such as wingspan, back length, and hip width packaged in an online user interface for ease of use.</li><li><b>Real-Time Cattle Intake Monitoring Using Computer Vision:</b> A complete end-to-end real-time monitoring system to measure cattle feed intake using stereo cameras.</li></ul><p dir="ltr">Furthermore, considering the above implementations and their drawbacks, we propose a cost-effective simulation environment for feed estimation to conduct extensive experiments prior to real-world implementation. This approach allows us to test and refine the computer vision systems under controlled conditions, identify potential issues, and optimize performance without the high costs and risks associated with direct deployment on farms. By simulating various scenarios and conditions, we can gather valuable data, improve algorithm accuracy, and ensure the system's robustness. Ultimately, this preparatory step will facilitate a smoother transition to real-world applications, enhancing the reliability and effectiveness of computer vision in precision livestock farming.</p>
|
4 |
Continuous Balance Evaluation by Image Analysis of Live Video : Fall Prevention Through Pose Estimation / Kontinuerlig Balansutvärdering Genom Bildanalys av Video i Realtid : Fallprevention Genom Kroppshållningsestimation. Runeskog, Henrik, January 2021 (has links)
The deep learning technique Human Pose Estimation (or Human Keypoint Detection) is a promising approach to tracking a person and identifying their posture. As posture and balance are two closely related concepts, human pose estimation could be applied to fall prevention. By deriving the location of a person's Center of Mass, and from it their Center of Pressure, one can evaluate a person's balance using only cameras, without force plates or sensors. In this study, a human pose estimation model together with a predefined human weight distribution model was used to extract the location of a person's Center of Pressure in real time. The proposed method used two different ways of acquiring depth information from the frames: stereoscopy with two RGB cameras, and a single RGB-depth camera. The estimated location of the Center of Pressure was compared with the same parameter extracted with the Wii Balance Board force plate. Since the proposed method was intended to operate in real time and without GPU acceleration, the human pose estimation models were chosen to maximize software input/output speed. Three models were used: a smaller and faster model, Lightweight Pose Network; a larger and more accurate model, High-Resolution Network; and a model that places itself somewhere between the two, Pose Residual Network. The proposed method showed promising results as a real-time way of acquiring balance parameters, although the largest source of error was the acquisition of depth information from the cameras. The results also showed that the smaller and faster human pose estimation model proved sufficient relative to the larger, more accurate models for real-time use without GPU acceleration.
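A minimal sketch of the kind of computation described above: estimating a whole-body Center of Mass from pose keypoints using a predefined segment weight distribution. The segment list, mass fractions, and keypoint names are simplified illustrative assumptions, not the anthropometric model used in the thesis.

```python
# Illustrative sketch: Center of Mass from pose keypoints and segment mass fractions.
# Segments, fractions, and keypoint names are simplified assumptions for illustration.
import numpy as np

# Each segment: (proximal keypoint, distal keypoint, fraction of body mass) - assumed values.
SEGMENTS = [
    ("left_shoulder", "left_hip", 0.25),    # torso halves
    ("right_shoulder", "right_hip", 0.25),
    ("left_hip", "left_ankle", 0.16),       # legs
    ("right_hip", "right_ankle", 0.16),
    ("left_shoulder", "left_wrist", 0.05),  # arms
    ("right_shoulder", "right_wrist", 0.05),
    ("nose", "nose", 0.08),                 # head approximated by a single keypoint
]

def center_of_mass(keypoints):
    """keypoints: dict name -> np.array([x, y, z]) from a pose estimator plus depth."""
    com = np.zeros(3)
    total = 0.0
    for proximal, distal, fraction in SEGMENTS:
        segment_com = 0.5 * (keypoints[proximal] + keypoints[distal])  # segment midpoint
        com += fraction * segment_com
        total += fraction
    return com / total  # normalize in case fractions do not sum to exactly 1

# The ground projection of the estimated CoM can then be compared with the Center of
# Pressure measured by a force plate such as the Wii Balance Board.
```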
|
5 |
Rozpoznávání obrazů pro ovládání robotické ruky / Image recognition for robotic hand. Labudová, Kristýna, January 2017 (has links)
This thesis deals with the processing of images of embedded terminals and their classification. The reduction of moiré noise by filtering in the frequency domain and the normalization of images for further processing are analyzed. Keypoint detectors and descriptors are then used for image classification. The FAST and Harris corner detectors and the SURF, BRIEF and BRISK descriptors are examined and evaluated in terms of their potential contribution to this work.
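A minimal sketch of the detector/descriptor pipeline the abstract mentions, here pairing OpenCV's FAST detector with BRISK descriptors and brute-force Hamming matching. The thresholds and the specific pairing are assumptions for illustration, not the configuration evaluated in the thesis.

```python
# Illustrative sketch: FAST keypoints + BRISK descriptors + brute-force matching (OpenCV).
# Thresholds and the detector/descriptor pairing are assumptions, not the thesis settings.
import cv2

def describe(gray):
    """Detect FAST keypoints and compute binary BRISK descriptors for them."""
    fast = cv2.FastFeatureDetector_create(threshold=25)
    brisk = cv2.BRISK_create()
    keypoints = fast.detect(gray, None)
    keypoints, descriptors = brisk.compute(gray, keypoints)
    return keypoints, descriptors

def match(desc_query, desc_reference, max_hamming=60):
    """Match binary descriptors with Hamming distance; keep only close matches."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_query, desc_reference)
    return [m for m in matches if m.distance < max_hamming]

# Usage: a query terminal image could be classified by matching it against reference
# templates and picking the class with the most good matches.
# kp1, d1 = describe(cv2.imread("query.png", cv2.IMREAD_GRAYSCALE))
# kp2, d2 = describe(cv2.imread("template.png", cv2.IMREAD_GRAYSCALE))
# good = match(d1, d2)
```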
|
6 |
Unsupervised Domain Adaptation for Regressive Annotation : Using Domain-Adversarial Training on Eye Image Data for Pupil Detection / Oövervakad domänadaptering för regressionsannotering : Användning av domänmotstående träning på ögonbilder för pupilldetektion. Zetterström, Erik, January 2023 (has links)
Machine learning has seen rapid progress over the last couple of decades, with more and more powerful neural network models continuously being presented. These neural networks require large amounts of data for training. Labelled data is in especially high demand, but because data labelling is time consuming and costly, labelled data is scarce while unlabelled data is usually abundant. In some cases, data from one distribution, or domain, is labelled, whereas the data we actually want to optimise our model for is unlabelled and comes from another domain. This falls under the umbrella of domain adaptation, and the purpose of this thesis is to train a network using domain-adversarial training on eye image datasets consisting of a labelled source domain and an unlabelled target domain, with the goal of performing well on target data, i.e., overcoming the domain gap. This was done on two datasets: a proprietary dataset from Tobii with real images and the public U2Eyes dataset with synthetic data. When comparing domain-adversarial training to a baseline model trained conventionally on source data and an oracle model trained conventionally on target data, the proposed DAT-ResNet model outperformed the baseline on both datasets. On the Tobii dataset, DAT-ResNet improved the Huber loss by 22.9% and the Intersection over Union (IoU) by 7.6%, and on the U2Eyes dataset, DAT-ResNet improved the Huber loss by 67.4% and the IoU by 37.6%. Furthermore, the IoU measure was extended to also include the portion of predicted ellipses with no intersection with the corresponding ground-truth ellipses, referred to as zero-IoUs. By this metric, the proposed model improves the percentage of zero-IoUs by 34.9% on the Tobii dataset and by 90.7% on the U2Eyes dataset.
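A minimal sketch of the core mechanism behind domain-adversarial training: a gradient reversal layer that lets a domain classifier be trained normally while the shared feature extractor receives reversed gradients, pushing it toward domain-invariant features. The PyTorch module below is a generic illustration with placeholder feature and head sizes, not the DAT-ResNet implementation from the thesis.

```python
# Illustrative sketch (PyTorch): gradient reversal layer for domain-adversarial training.
# Feature extractor and head sizes are placeholders, not the thesis's DAT-ResNet.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, reversed (and scaled) gradient on the backward pass.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class DomainAdversarialNet(nn.Module):
    def __init__(self, feature_dim=256):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.LazyLinear(feature_dim), nn.ReLU())
        self.regressor = nn.Linear(feature_dim, 5)    # e.g. pupil ellipse parameters
        self.domain_head = nn.Linear(feature_dim, 2)  # source vs. target classifier

    def forward(self, x, lambd=1.0):
        f = self.features(x)
        return self.regressor(f), self.domain_head(grad_reverse(f, lambd))

# Training idea: regression loss on labelled source data only; domain-classification loss
# on both domains. The reversed gradient encourages domain-invariant features.
```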
|