191

Text Localization for Unmanned Ground Vehicles

Kirchhoff, Allan Richard 16 October 2014 (has links)
Unmanned ground vehicles (UGVs) are increasingly being used for civilian and military applications. Passive sensors, such as visible-light cameras, are used for navigation and object detection. An additional object of interest in many environments is text. Text information can supplement the autonomy of unmanned ground vehicles. Text most often appears in the environment in the form of road signs and storefront signs. Road hazard information, unmapped route detours and traffic information are available to human drivers through road signs. Premade road maps lack these traffic details, but with text localization the vehicle could fill the information gaps. Leading text localization algorithms achieve ~60% accuracy; however, practical applications are cited to require at least 80% accuracy [49]. The goal of this thesis is to test existing text localization algorithms against challenging scenes, identify the best candidate and optimize it for scenes a UGV would encounter. Promising text localization methods were tested against a custom dataset created to best represent scenes a UGV would encounter. The dataset includes road signs and storefront signs against complex backgrounds. The methods tested were adaptive thresholding, the stroke filter and the stroke width transform. A temporal tracking proof of concept was also tested; it tracked text through a series of frames in order to reduce false positives. The best results were obtained using the stroke width transform with temporal tracking, which achieved an accuracy of 79%. That level of performance approaches the requirements for practical applications. Without temporal tracking, the stroke width transform yielded an accuracy of 46%. The runtime was 8.9 seconds per image, which is 44.5 times slower than necessary for real-time object tracking. Converting the MATLAB code to C++ and running the text localization on a GPU could provide the necessary speedup. / Master of Science
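
The abstract does not include code; as a minimal illustration of the temporal-tracking idea, the sketch below accepts a text detection only after it has been re-matched, by intersection-over-union, across several consecutive frames. The class name, thresholds and matching rule are assumptions for illustration, not the thesis's implementation.

```python
# Minimal sketch of temporal tracking for text localization (illustrative,
# not the thesis implementation): a detection is accepted only after it has
# been matched, by intersection-over-union, in several consecutive frames.

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter) if inter else 0.0

class TemporalTextFilter:
    def __init__(self, min_hits=3, iou_thresh=0.5):
        self.tracks = []          # each track: {"box": ..., "hits": int}
        self.min_hits = min_hits  # frames a box must persist to be accepted
        self.iou_thresh = iou_thresh

    def update(self, detections):
        """Feed one frame's detections; return boxes stable enough to trust."""
        new_tracks = []
        for det in detections:
            best = max(self.tracks, key=lambda t: iou(t["box"], det), default=None)
            if best is not None and iou(best["box"], det) >= self.iou_thresh:
                new_tracks.append({"box": det, "hits": best["hits"] + 1})
            else:
                new_tracks.append({"box": det, "hits": 1})
        self.tracks = new_tracks
        return [t["box"] for t in self.tracks if t["hits"] >= self.min_hits]
```

A spurious single-frame detection never reaches `min_hits` and is suppressed, which is the mechanism by which temporal tracking trades a little latency for fewer false positives.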
192

Development of an algorithm for detecting defective potato roots : Master's thesis

Akinin, D. V. January 2024 (has links)
This work investigates the problem of sorting potato root crops into healthy and defective ones, a task whose automation would substantially reduce monotonous manual labour. The difficulty lies in the obstacles to be overcome: natural soil contamination of the potatoes and the varying weight and dimensions of the root crops. To create the algorithm, a dataset of self-made photographs was assembled and five models were trained, based on the neural networks DenseNet, ResNet50, EfficientNetB0, ResNext50_32x4d and ResNet101. The ResNet50-based model was selected for further research, as it showed the best recognition results on the training sample. The model was additionally tested on an independent sample, and its operation was visualised using the Streamlit service. A sketch of the future photo separator was created, and recommendations for implementing such an algorithm, together with a calculation of its economic efficiency, are proposed.
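
As an illustration of the kind of transfer-learning setup the abstract describes, the following PyTorch sketch fine-tunes a pretrained ResNet50 for the two-class healthy/defective decision. The dataset path, hyperparameters and training loop are placeholders, not details taken from the thesis.

```python
# Sketch of a ResNet50-based healthy/defective classifier (PyTorch),
# assuming an ImageFolder-style dataset; paths and hyperparameters are
# illustrative, not taken from the thesis.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("potato_dataset/train", transform=tfm)  # hypothetical path
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # healthy vs. defective

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```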
193

Analysis of non-steady state physiological and pathological processes

Hill, Nathan R. January 2008 (has links)
The analysis of non-steady-state physiological and pathological processes concerns the abstraction, extraction, formalisation and analysis of information from physiological systems that is obscured, hidden or unable to be assessed using traditional methods. First, Time Series Analysis (TSA) techniques were developed and built into a software program, Easy TSA, with the aim of examining the oscillations of hormonal concentrations with respect to their temporal aspects: periodicity, phase and pulsatility. The Easy TSA program was validated using constructed data sets and used in a clinical study to examine the relationship between insulin and obesity in people without diabetes. In this study, fifty-six non-diabetic subjects (28 male, 28 female) were examined using data from a number of protocols. Fourier transform and autocorrelation techniques determined that BMI had a critical effect on the frequency, amplitude and regularity of insulin oscillations. Second, information systems formed the background to the development of an algorithm to examine glycaemic variability, and a new methodology termed the Glycaemic Risk in Diabetes Equation (GRADE) was developed. The aim was to report an integrated glycaemic risk score from glucose profiles that would complement summary measures of glycaemia such as HbA1c. GRADE was applied retrospectively to blood glucose data sets to determine whether it was clinically relevant. Subjects with type 1 and type 2 diabetes had higher GRADE scores than the non-diabetic population, and the contribution of hypo- and hyperglycaemic episodes to risk was demonstrated. A prospective study was then designed with the aim of applying GRADE in a clinical context and measuring its statistical reproducibility. Fifty-three subjects (26 male, 27 female) measured their blood glucose four times daily for twenty-one days. Lower HbA1c values correlated with an increased risk of hypoglycaemia, and higher HbA1c values with an increased risk of hyperglycaemia. Some subjects had an HbA1c of 7.0 but median GRADE values ranging from 2.2 to 10.5. The GRADE score summarised diverse glycaemic profiles into a single assessment of risk: well-controlled glucose profiles yielded GRADE scores of 5 or below, and higher GRADE scores represented increased clinical risk from hypo- or hyperglycaemia. Third, an information system was developed to analyse data-rich multi-variable retinal images using the concept of assessment of change rather than specific lesion recognition. A fully Automated Retinal Image Differencing (ARID) computer system was developed to highlight change between retinal images over time. ARID was validated in an initial study, and a retrospective study then sought to determine whether the ARID software was an aid to the retinal screener. One hundred and sixty images (80 image pairs) were obtained from the Gloucestershire Diabetic Eye Screening Programme. Image pairs were graded manually and categorised according to how each type of lesion had progressed, regressed, or not changed between image A and image B. After a 30-day washout period the image pairs were graded using ARID and the results compared. The comparison of manual grading to grading using ARID (Table 4.3) demonstrated increased sensitivity and specificity: the mean sensitivity of ARID (87.9%) was significantly higher than that of manual grading (84.1%) (p<0.05), and the specificity of the automated analysis (87.5%) was significantly higher than that achieved by manual grading (56.3%) (p<0.05). The conclusion was that automatic display of an ARID differenced image, where sequential photographs are available, would allow rapid assessment and appropriate triage. Fourth, non-linear dynamic systems analysis methods were used to build a system to assess the extent of chaotic characteristics within the insulin-glucose feedback domain. Biological systems exist that are deterministic yet neither predictable nor repeatable; instead they exhibit chaos, where a small change in the initial conditions produces a wholly different outcome. The glucose regulatory system is a dynamic system that maintains glucose homeostasis through the feedback mechanism of glucose, insulin and contributory hormones, and is ideally suited to chaos analysis. To investigate this system, a new algorithm was created to assess the Normalised Area of Attraction (NAA), calculated by defining an oval using the 95% confidence intervals of glucose and insulin (the limit cycle) on a phasic plot. Thirty non-diabetic subjects and four subjects with type 2 diabetes were analysed. The NAA indicated a smaller range of glucose and insulin excursions in the non-diabetic subjects (p<0.05). The conclusion was that evaluating glucose metabolism in terms of homeostatic integrity, rather than in terms of cut-off values, may enable a more realistic approach to the effective treatment and prevention of diabetes and its complications.
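
For readers unfamiliar with GRADE, a small sketch of the calculation follows, using the published GRADE formula, 425 × [log10(log10(g)) + 0.16]² for glucose g in mmol/l, with the median of the per-reading scores as the profile summary. The helper names and the example profile are illustrative.

```python
# Sketch of the GRADE calculation for a glucose profile, using the published
# formula GRADE = 425 * (log10(log10(g)) + 0.16)^2 with g in mmol/l;
# the reported summary statistic is the median GRADE of the profile.
import math

def grade(glucose_mmol_l):
    """Risk score for a single glucose reading (mmol/l)."""
    return 425.0 * (math.log10(math.log10(glucose_mmol_l)) + 0.16) ** 2

def grade_profile(readings):
    """Median GRADE over a glucose profile."""
    scores = sorted(grade(g) for g in readings)
    n = len(scores)
    mid = n // 2
    return scores[mid] if n % 2 else 0.5 * (scores[mid - 1] + scores[mid])

# Example: a well-controlled profile yields a low median GRADE.
print(round(grade_profile([4.8, 5.6, 6.2, 7.1, 5.9]), 2))
```

The U-shape of the formula is what lets one number capture both tails: very low and very high readings both push the score up, while euglycaemic readings score near zero.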
194

Human layout estimation using structured output learning

Mittal, Arpit January 2012 (has links)
In this thesis, we investigate the problem of human layout estimation in unconstrained still images: predicting the spatial configuration of body parts. We start our investigation with pictorial structure models and propose an efficient method of model fitting using skin regions. To detect the skin, we learn a colour model locally from the image by detecting the facial region. The resulting skin detections are also used for hand localisation. Our next contribution is a comprehensive dataset of 2D hand images. We collected this dataset from publicly available image sources and annotated the images with hand bounding boxes. The bounding boxes are not axis-aligned, but are oriented with respect to the wrist. Our dataset is quite exhaustive, as it includes images of different hand shapes and layout configurations. Using our dataset, we train a hand detector that is robust to background clutter and lighting variations. Our hand detector is implemented as a two-stage system: the first stage proposes hand hypotheses using complementary image features, which are then evaluated by a second-stage classifier. This improves both precision and recall and results in a state-of-the-art hand detection method. In addition, we develop a new method of non-maximum suppression based on super-pixels. We also contribute an efficient training algorithm for structured output ranking, in which we reduce the time complexity of an expensive training component from quadratic to linear. This algorithm has broad applicability, and we use it to solve human layout estimation and taxonomic multiclass classification problems. For human layout, we use different body part detectors to propose part candidates, which are then combined and scored using our ranking algorithm. By applying this bottom-up approach, we achieve accurate human layout estimation despite variations in viewpoint and layout configuration. In the multiclass classification problem, we define the misclassification error using a class taxonomy; the problem then reduces to a structured output ranking problem, which we optimise with our ranking method. This allows the inclusion of semantic knowledge about the classes and results in a more meaningful classification system. Lastly, we substantiate our ranking algorithm with theoretical proofs and derive generalisation bounds for it; these bounds prove that the training error approaches the lowest possible error asymptotically.
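
The thesis's exact training component is not reproduced in the abstract; the sketch below illustrates the general trick such complexity reductions rely on: a sum of pairwise ranking hinge losses, naively O(P·N) over positive/negative score pairs, can be evaluated in O((P+N) log N) by sorting the negative scores and using prefix sums. Function names are illustrative.

```python
# Illustrative sketch (not the thesis algorithm): the sum of pairwise hinge
# losses  sum_{p,n} max(0, 1 - (s_p - s_n))  is naively O(P*N), but after
# sorting the negative scores it can be evaluated with prefix sums and one
# binary search per positive score, i.e. in O((P + N) log N).
import bisect

def pairwise_hinge_fast(pos_scores, neg_scores):
    neg = sorted(neg_scores)
    prefix = [0.0]
    for s in neg:
        prefix.append(prefix[-1] + s)   # prefix[k] = sum of k smallest negatives
    total = 0.0
    for sp in pos_scores:
        # only negatives with s_n > sp - 1 contribute, each adding (1 - sp + s_n)
        k = bisect.bisect_right(neg, sp - 1.0)
        cnt = len(neg) - k
        total += cnt * (1.0 - sp) + (prefix[-1] - prefix[k])
    return total

def pairwise_hinge_slow(pos, neg):
    """Brute-force O(P*N) reference for checking the fast version."""
    return sum(max(0.0, 1.0 - (p - n)) for p in pos for n in neg)
```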
195

The acquisition of coarse gaze estimates in visual surveillance

Benfold, Ben January 2011 (has links)
This thesis describes the development of methods for automatically obtaining coarse gaze direction estimates for pedestrians in surveillance video. Gaze direction estimates are beneficial in the context of surveillance as an indicator of an individual's intentions and their interest in their surroundings and other people. The overall task is broken down into two problems. The first is that of tracking large numbers of pedestrians in low resolution video, which is required to identify the head regions within video frames. The second problem is to process the extracted head regions and estimate the direction in which the person is facing as a coarse estimate of their gaze direction. The first approach for head tracking combines image measurements from HOG head detections and KLT corner tracking using a Kalman filter, and can track the heads of many pedestrians simultaneously to output head regions with pixel-level accuracy. The second approach uses Markov-Chain Monte-Carlo Data Association (MCMCDA) within a temporal sliding window to provide similarly accurate head regions, but with improved speed and robustness. The improved system accurately tracks the heads of twenty pedestrians in 1920x1080 video in real-time and can track through total occlusions for short time periods. The approaches for gaze direction estimation all make use of randomised decision tree classifiers. The first develops classifiers for low resolution head images that are invariant to hair and skin colours using branch decisions based on abstract labels rather than direct image measurements. The second approach addresses higher resolution images using HOG descriptors and novel Colour Triplet Comparison (CTC) based branches. The final approach infers custom appearance models for individual scenes using weakly supervised learning over large datasets of approximately 500,000 images. A Conditional Random Field (CRF) models interactions between appearance information and walking directions to estimate gaze directions for head image sequences.
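
As a sketch of the detection-fusion step, here is a minimal constant-velocity Kalman filter of the kind commonly used to smooth head positions from per-frame detections; the noise settings and the one-frame time step are placeholders rather than the thesis's tuned values.

```python
# Minimal constant-velocity Kalman filter, of the kind used to fuse per-frame
# head detections (e.g. HOG) with motion measurements (e.g. KLT corners);
# matrices and noise levels here are illustrative placeholders.
import numpy as np

class ConstantVelocityKF:
    def __init__(self, x, y, q=1.0, r=4.0):
        self.s = np.array([x, y, 0.0, 0.0])   # state: [x, y, vx, vy]
        self.P = np.eye(4) * 100.0            # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0     # dt = 1 frame
        self.H = np.eye(2, 4)                 # we observe position only
        self.Q = np.eye(4) * q                # process noise
        self.R = np.eye(2) * r                # measurement noise

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, zx, zy):
        z = np.array([zx, zy])
        y = z - self.H @ self.s                    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Running `predict()` every frame and `update()` only when a detection arrives is what lets such a tracker coast through short total occlusions.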
196

Dense Stereo Reconstruction in a Field Programmable Gate Array

Sabihuddin, Siraj 30 July 2008 (has links)
Estimation of depth within an imaged scene can be formulated as a stereo correspondence problem. Software solutions tend to be too slow for high frame rate (i.e. >30 fps) performance; hardware solutions can yield marked improvements. This thesis explores one such hardware implementation that generates dense binocular disparity estimates at frame rates of over 200 fps using a dynamic programming formulation (DPML) developed by Cox et al. A highly parameterizable field programmable gate array implementation of this architecture demonstrates equivalent accuracy while executing at significantly higher frame rates than current approaches. Existing hardware implementations for dense disparity estimation often use sum of squared differences, sum of absolute differences or similar algorithms that typically perform poorly in comparison to DPML. The presented system runs at 248 fps at a resolution of 320 x 240 pixels with a disparity range of 128 pixels, a performance of 2.477 billion DPS.
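
The DPML architecture itself is in hardware, but the underlying scanline dynamic programme can be sketched in software: each pair of scanline pixels is either matched at a dissimilarity cost or skipped at an occlusion penalty, and the minimum-cost alignment yields the disparities. This greatly simplified sketch uses illustrative costs and omits the hardware parallelism that gives the thesis its frame rate.

```python
# Greatly simplified sketch of scanline dynamic-programming stereo in the
# spirit of Cox et al.'s formulation: left/right scanline pixels are either
# matched (squared-difference cost) or skipped (occlusion penalty).
import numpy as np

def dp_scanline_disparity(left_row, right_row, occlusion=20.0):
    n, m = len(left_row), len(right_row)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = occlusion * np.arange(m + 1)
    C[:, 0] = occlusion * np.arange(n + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (float(left_row[i-1]) - float(right_row[j-1])) ** 2
            C[i, j] = min(C[i-1, j-1] + match,    # match pixels i-1 and j-1
                          C[i-1, j] + occlusion,  # left pixel occluded
                          C[i, j-1] + occlusion)  # right pixel occluded
    # Backtrack to recover disparities for matched left pixels.
    disp = np.zeros(n)
    i, j = n, m
    while i > 0 and j > 0:
        match = (float(left_row[i-1]) - float(right_row[j-1])) ** 2
        if np.isclose(C[i, j], C[i-1, j-1] + match):
            disp[i-1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif np.isclose(C[i, j], C[i-1, j] + occlusion):
            i -= 1
        else:
            j -= 1
    return disp
```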
198

Robot control for separating a piece of fabric from a stack and transferring it to the next processing stage, based on artificial intelligence methods

Zoumponos, Georgios 14 February 2012 (has links)
The apparel industry is still largely based on manual labour, mainly because fabrics are bodies of very low bending rigidity that deform easily, while also presenting a great variety of structures and properties; these facts deter the development of reliable and flexible robotic handling systems. In this thesis, a method for separating and capturing a piece of fabric from a stack is presented, based on pressurised air flow over the stack: the difference in static pressure caused by the flow lifts the top piece, while the turbulent nature of the flow separates it from the pieces beneath it. Two systems are developed for autonomously determining the trajectory of a robot's end-effector for the simple task of laying a piece of fabric on a work table. These systems are based on computational intelligence methods, in particular fuzzy logic, and require neither additional apparatus nor knowledge of many mechanical properties of the fabrics.
The task of folding a piece of fabric on a work table is then investigated, and three stages are introduced into which the folding task can be decomposed in order to reduce its overall complexity. Each stage is analysed, and the shape characteristics selected to describe the state of the fabric at each stage are presented. A method for extracting these characteristics using two vision sensors is introduced, based on searching for the characteristics in specific regions of the image space; this is made possible by the calibration of the sensors. A folding strategy based on fuzzy logic with vision feedback is developed: the multiple-input, multiple-output fuzzy controller is trained by trial and error and provides the gains of a P-controller. The system exhibits flexibility and reliability for fabrics that satisfy the restrictions that have been set. Finally, a strategy for controlling active folding is presented, in which two independent subsystems determine the target state of the fabric and drive the fabric towards that state, thereby increasing the flexibility of the system. The methods developed in this thesis can serve as a starting point for introducing reliable and flexible automation to perform apparel-industry handling tasks with robots.
199

Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control

Adeboye, Taiyelolu January 2018 (has links)
This report presents a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real-world localization and tracking in real time. It details a design focused on detecting and following objects in flight, as applied to a football in motion. The overall goal of the design was to develop a system capable of recognizing an object and its present and near-future location, while also actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image-processing functions in C++ on an NVIDIA Jetson TX1 with a Stereolabs ZED stereoscopic camera, connected to an embedded system controller for the robot arm. The image processing was done against a textured background, and the 3D location coordinates were used to correct a Kalman filter model that estimated and predicted the ball's location. A capture and processing speed of 59.4 frames per second was obtained with good depth accuracy, and the ball was tracked well in the tests carried out.
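
The thesis used a Kalman filter for prediction; as a simpler illustration of predicting a ball's near-future position from timestamped 3D fixes, the sketch below extrapolates ballistically under gravity. Metric units and a z-up axis convention are assumed; none of this is the report's actual code.

```python
# Sketch of near-future ball prediction from timestamped 3D positions
# (e.g. from a stereo camera): estimate velocity from the last two fixes
# and extrapolate ballistically. Assumes metres, seconds and z up.
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2, acting on the z axis

def predict_position(p_prev, t_prev, p_now, t_now, t_future):
    """Ballistic extrapolation of the ball position at time t_future."""
    p_prev, p_now = np.asarray(p_prev, float), np.asarray(p_now, float)
    v = (p_now - p_prev) / (t_now - t_prev)   # finite-difference velocity
    dt = t_future - t_now
    return p_now + v * dt + 0.5 * G * dt ** 2

# e.g. where will the ball be 0.1 s from now?
p = predict_position([0.0, 3.0, 1.2], 0.00, [0.02, 2.7, 1.25], 0.05, 0.15)
```

A Kalman filter improves on this two-point estimate by weighting all past fixes against their measurement noise, which matters when individual stereo depth readings are jittery.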
200

REGTEST - an Automatic & Adaptive GUI Regression Testing Tool.

Forsgren, Robert, Petersson Vasquez, Erik January 2018 (has links)
Software testing is very common and is done to increase the quality of, and confidence in, a piece of software. In this report, an idea is proposed for a GUI regression-testing tool that uses image recognition to perform the steps of test cases. The problem with such a solution is that if a GUI has been changed, many test cases might break. For this reason REGTEST was created: a GUI regression-testing tool able to handle one type of change to a GUI component, such as a change in color, shape, location or text. This kind of solution is interesting because setting up tests with such a tool can be very fast and easy, whereas one previously significant drawback of using image recognition for GUI testing was that it did not handle changes well. It can be compared to tools that use IDs to perform a test, where the actual visualization of a GUI component does not matter, only that the ID stays the same; however, such tools either require underlying knowledge of the GUI component naming conventions or the use of tools that automatically construct XPath queries for the components. To verify that REGTEST can work as well as existing tools, a comparison was made against two professional tools, Ranorex and Kantu. In those tests REGTEST proved very successful, performing close to, or better than, the other tools.
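
As an illustration of the kind of image-recognition test step such tools automate (not REGTEST's actual code), the sketch below locates a reference screenshot of a component on the current screen via OpenCV template matching and clicks its centre. The threshold, file paths and function name are invented; pyautogui and OpenCV are assumed available.

```python
# Sketch of an image-recognition GUI test step: find a reference screenshot
# of a component on screen with template matching, then click its centre.
# Paths and threshold are illustrative.
import cv2
import numpy as np
import pyautogui

def click_component(template_path, threshold=0.8):
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2BGR)
    template = cv2.imread(template_path)
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        raise AssertionError(f"component not found (best score {max_val:.2f})")
    h, w = template.shape[:2]
    pyautogui.click(max_loc[0] + w // 2, max_loc[1] + h // 2)

# A test case is then a sequence of such steps, e.g.:
# click_component("buttons/save.png"); click_component("dialogs/ok.png")
```

A fixed-threshold match like this is exactly what breaks when a component's color or shape changes, which is the brittleness an adaptive tool such as REGTEST is designed to tolerate.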
