1 |
The detection of contours and their visual motion. Spacek, L. A. January 1985
No description available.
|
2 |
The extraction and recognition of text from multimedia document images. Smith, R. W. January 1987
No description available.
|
3 |
Multisubband structures and their application to image processing. Tufan, Emir January 1996
No description available.
|
4 |
The Smart Phone as a Mouse. Qin, Yinghao January 2006
With the development of hardware, the mobile phone has become a feature-rich handheld device. Built-in cameras and Bluetooth are supported by most current mobile phones. A real-time image processing experiment was conducted with a Sony Ericsson P910i smartphone and a desktop computer. This thesis describes the design and implementation of a system that uses a mobile phone as a PC mouse. The movement of the phone is detected by analyzing the images captured by the onboard camera, and this movement is used to control the mouse cursor on the PC.
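The abstract does not spell out the motion-detection method, but the core idea can be sketched: estimate the phone's displacement from successive camera frames and map it to relative cursor motion. The sketch below is an illustration under stated assumptions, not the thesis's actual implementation; it uses OpenCV phase correlation for the frame-to-frame shift and pyautogui for cursor control, and omits frame capture and the Bluetooth link between phone and PC.

```python
# Illustrative sketch only: estimate inter-frame translation with phase
# correlation and turn it into relative mouse movement. The thesis's actual
# motion-detection method and Bluetooth transport are not reproduced here.
import cv2
import numpy as np
import pyautogui  # assumed available for cursor control on the PC side

def frame_shift(prev_gray: np.ndarray, curr_gray: np.ndarray) -> tuple[float, float]:
    """Estimate the (dx, dy) translation between two consecutive grayscale frames."""
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(prev_gray), np.float32(curr_gray))
    return dx, dy

def update_cursor(prev_gray: np.ndarray, curr_gray: np.ndarray, gain: float = 4.0) -> None:
    """Move the PC cursor proportionally to the estimated phone motion."""
    dx, dy = frame_shift(prev_gray, curr_gray)
    # Moving the phone to the right shifts the scene left in the image, hence the sign flip.
    pyautogui.moveRel(-gain * dx, -gain * dy)
```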
|
5 |
50,000 Tiny Videos: A Large Dataset for Non-parametric Content-based Retrieval and Recognition. Karpenko, Alexandre 22 September 2009
This work extends the tiny image data-mining techniques developed by Torralba et al. to videos. A large dataset of over 50,000 videos was collected from YouTube. This is the largest user-labeled research database of videos available to date. We demonstrate that a large dataset of tiny videos achieves high classification precision in a variety of content-based retrieval and recognition tasks using very simple similarity metrics. Content-based copy detection (CBCD) is evaluated on a standardized dataset, and the results are applied to related video retrieval within tiny videos. We use our similarity metrics to improve text-only video retrieval results. Finally, we apply our large labeled video dataset to various classification tasks. We show that tiny videos are better suited for classifying activities than tiny images. Furthermore, we demonstrate that classification can be improved by combining the tiny images and tiny videos datasets.
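To make the "very simple similarity metrics" concrete, the following sketch shows one plausible reading of the approach, in the spirit of Torralba's tiny images: each video is reduced to a short fixed-size grayscale clip and compared by sum of squared differences. The frame count, resolution, and distance function here are illustrative assumptions, not the exact representation used in the thesis.

```python
# Hypothetical "tiny video" representation and a simple SSD distance between clips.
import cv2
import numpy as np

def to_tiny(frames: list[np.ndarray], n_frames: int = 16, size: int = 32) -> np.ndarray:
    """Subsample a video to n_frames tiny grayscale frames, flattened into one vector."""
    idx = np.linspace(0, len(frames) - 1, n_frames).astype(int)
    tiny = [cv2.resize(cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY), (size, size)) for i in idx]
    return np.stack(tiny).astype(np.float32).ravel() / 255.0

def ssd_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Sum-of-squared-differences distance between two tiny-video vectors (smaller is more similar)."""
    return float(np.sum((a - b) ** 2))
```

Nearest-neighbour retrieval and classification then amount to ranking a query's tiny-video vector against the vectors of the labeled dataset by this distance.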
|
6 |
Exploration Of Image Recognition On Specific Patterns and Research Of Sub-pixel Algorithm. Yang, Jeng-Ho 10 July 2002
Image processing technologies are broadly applied in modern machine vision and industrial inspection, but there is usually a trade-off between inspection accuracy and speed. We address this problem in two steps: 1. We develop the major image processing methods, such as boundary extraction, noise removal, and pattern matching. 2. We focus on sub-pixel algorithms and boundary research to improve accuracy and processing time in software, under limited hardware.
The pixel is the most basic element of an image, but one pixel can be divided mathematically into several smaller parts, and the accuracy can thereby be improved beyond the pixel level. We use such algorithms to achieve this goal in a continuous way, and study the flow of image recognition to find the best processing flow for specific image properties.
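As an illustration of how a pixel can be "divided" mathematically, one common textbook technique (shown here only as an example; it is not necessarily the algorithm developed in the thesis) fits a parabola through the gradient magnitude at an edge pixel and its two neighbours and takes the vertex as the sub-pixel edge position.

```python
# Sub-pixel edge localization by parabolic interpolation of the gradient peak.
# This is a generic illustration, not the thesis's specific algorithm.
import numpy as np

def subpixel_peak(g_left: float, g_center: float, g_right: float) -> float:
    """Offset (within [-0.5, 0.5] pixels) of the true peak from the center sample."""
    denom = g_left - 2.0 * g_center + g_right
    if denom == 0.0:
        return 0.0
    return 0.5 * (g_left - g_right) / denom

def subpixel_edge(profile: np.ndarray) -> float:
    """Sub-pixel position of the strongest edge along a 1-D intensity profile."""
    grad = np.abs(np.gradient(profile.astype(np.float64)))
    i = int(np.argmax(grad[1:-1])) + 1  # strongest interior gradient sample
    return i + subpixel_peak(grad[i - 1], grad[i], grad[i + 1])
```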
|
7 |
Multiview active shape models with SIFT descriptors. Milborrow, Stephen January 2016
This thesis presents techniques for locating landmarks in images of human faces. A modified Active Shape Model (ASM [21]) is introduced that uses a form of SIFT descriptors [68]. Multivariate Adaptive Regression Splines (MARS [40]) are used to efficiently match descriptors around landmarks. This modified ASM is fast and performs well on frontal faces. The model is then extended to also handle non-frontal faces. This is done by first estimating the face's pose, rotating the face upright, then applying one of three ASM submodels specialized for frontal, left, or right three-quarter views. The multiview model is shown to be effective on a variety of datasets.
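A minimal sketch of the view-selection step described above; the yaw thresholds, the model container, and the function name are assumptions for illustration and do not reflect the thesis's actual code.

```python
# Hypothetical dispatch between the three view-specific ASM submodels.
# "models" maps view names to already-trained submodels; the +/-30 degree
# thresholds are assumed, not taken from the thesis.
def select_submodel(yaw_degrees: float, models: dict):
    """Pick the frontal, left, or right three-quarter ASM submodel for a given yaw estimate."""
    if yaw_degrees < -30.0:
        return models["left"]
    if yaw_degrees > 30.0:
        return models["right"]
    return models["frontal"]
```

In the full pipeline the face would first be rotated upright using the estimated in-plane rotation before the selected submodel's descriptor-based search is run.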
|
8 |
Segmentation and clustering in neural networks for image recognition. Jan, Ying-Wei January 1994
No description available.
|
9 |
A comparison of image and object level annotation performance of image recognition cloud services and custom Convolutional Neural Network models. Nilsson, Kristian, Jönsson, Hans-Eric January 2019
Recent advancements in machine learning have contributed to the explosive growth of the image recognition field. Simultaneously, multiple Information Technology (IT) service providers such as Google and Amazon have embraced cloud solutions and software as a service. These factors have helped many computer vision tasks mature from scientific curiosities into practical applications. As image recognition is now accessible to the general developer community, a need arises for a comparison of its capabilities, and of what can be gained from choosing a cloud service over a custom implementation. This thesis empirically studies the performance of five general image recognition services (Google Cloud Vision, Microsoft Computer Vision, IBM Watson, Clarifai and Amazon Rekognition) and image recognition models of the Convolutional Neural Network (CNN) architecture that we ourselves have configured and trained. Image and object level annotations of images extracted from different datasets were tested, both in their original state and after being subjected to one of the following six types of distortions: brightness, color, compression, contrast, blurriness and rotation. The output labels and confidence scores were compared to the ground truth at multiple levels of concepts, such as food, soup and clam chowder.

The results show that, of the services tested, there is currently no clear top performer across all categories; their outputs show both variations and similarities, but on average Google Cloud Vision performs best by a small margin. The services are all adept at identifying high-level concepts such as food and most mid-level ones such as soup. However, for further specifics, such as clam chowder, they start to vary, some performing better than others in different categories. Amazon Rekognition was found to be the most capable at identifying multiple unique objects within the same image on the chosen dataset. Additionally, using synonyms of the ground truth labels increased performance, as it narrowed the semantic gap between our expectations and the actual output of the services. The services all showed vulnerability to image distortions, especially compression, blurriness and rotation. The custom models all performed noticeably worse, roughly half as well as the cloud services, possibly due to the difference in training data standards. The best model, configured with three convolutional layers, 128 nodes and a layer density of two, reached an average performance of almost 0.2, or 20%.

In conclusion, if one is limited by a lack of experience with machine learning, computational resources or time, it is recommended to make use of one of the cloud services to reach a more acceptable performance level. Which to choose depends on the intended application, as the services perform differently in certain categories. The services are all vulnerable to multiple image distortions, potentially allowing adversarial attacks. Finally, there is definitely room for improvement with regard to the performance of these services and the computer vision field as a whole.
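For reference, a hedged sketch of a small CNN along the lines of the best custom configuration reported ("three convolutional layers, 128 nodes and a layer density of two"); interpreting "layer density" as two dense layers, as well as the chosen input size and class count, are assumptions, not the authors' exact model.

```python
# Assumed reconstruction of a small Keras CNN: three conv layers with 128 filters,
# followed by two dense layers. Hyperparameters not stated in the abstract are guesses.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(128, 128, 3), num_classes=10) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```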
|