  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

A cellular automaton-based system for the identification of topological features of carotid artery plaques

Delaney, Matthew January 2014 (has links)
The formation of a plaque in one or both of the internal carotid arteries poses a serious threat to the lives of those in whom it occurs. This thesis describes a technique designed to detect the level of occlusion and provide topological information about such plaques. To avoid the cost of specialised hardware, only the sound produced by blood flow around the occlusion is used; this raises problems that prevent the application of existing medical imaging techniques, but these can be overcome by a nonlinear technique that takes full advantage of the discrete nature of digital computers. Results indicate that both the level of occlusion and the presence or absence of various topological features can be determined in this way. Beginning with a review of existing work in medical imaging and in more general but related techniques, the EPI process of Frieden (2004) is identified as the strongest approach to a situation where it is desirable to work with both signal and noise yet avoid the computational cost and other pitfalls of established techniques. The remainder of the thesis discusses attempts to automate the EPI process, which, in the form given by Frieden (2004), requires a degree of human mathematical creative problem-solving. Initially, a numerical-methods-inspired approach based on genetic algorithms was attempted, but it was found to be both computationally costly and insufficiently true to the nature of the EPI equations. A second approach, based on the idea of creating a formal system allowing entropy, direction, and logic to be manipulated together, proved to lack certain key properties and to require work beyond the scope of this project in order to be extended into a form usable for the EPI process.
The approach upon which the imaging system described here is ultimately built is an abstracted form of constraint-logic programming, resulting in a cellular-automaton-based model which is shown to produce distinct images for different sizes and topologies of plaque in a reliable and human-interpretable way.
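To make the underlying mechanism concrete, here is a minimal sketch of a one-dimensional binary cellular automaton with a synchronous Wolfram-style update rule. This is a generic illustration of the cellular-automaton formalism, not the model described in the thesis (whose rule set and topology are not given in the abstract):

```python
def ca_step(cells, rule):
    """One synchronous update of a 1-D binary cellular automaton.

    `rule` is a Wolfram rule number (0-255); each cell's next state is
    looked up from the bit of `rule` indexed by its 3-cell neighbourhood.
    Neighbourhoods wrap around at the ends.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (centre << 1) | right
        nxt.append((rule >> idx) & 1)
    return nxt

# A single seed cell under rule 30 spreads into an irregular pattern.
row = [0] * 7
row[3] = 1
for _ in range(3):
    row = ca_step(row, 30)
```

In an imaging application of this kind, the cell lattice would typically be initialised from the input signal and the stable or evolving pattern read off as the output image.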
222

A revised framework for human scene recognition

Linsley, Drew January 2016 (has links)
Thesis advisor: Sean P. MacEvoy / For humans, healthy and productive living depends on navigating through the world and behaving appropriately along the way. But to do this, humans must first recognize their visual surroundings. The technical difficulty of this task is hard to comprehend: the number of possible scenes that can fall on the retina approaches infinity, and yet humans often effortlessly and rapidly recognize their surroundings. Understanding how humans accomplish this task has long been a goal of psychology and neuroscience, and more recently has proven useful in inspiring and constraining the development of new algorithms for artificial intelligence (AI). In this thesis I begin by reviewing the current state of scene recognition research, drawing upon evidence from each of these areas, and discussing an unchallenged assumption in the literature: that scene recognition emerges from independently processing information about scenes' local visual features (i.e., the kinds of objects they contain) and global visual features (i.e., spatial parameters). Over the course of several projects, I challenge this assumption with a new framework for scene recognition that indicates a crucial role for information sharing between these resources. Development and validation of this framework will expand our understanding of scene recognition in humans and provide new avenues for research by extending these concepts to other domains spanning psychology, neuroscience, and AI. / Thesis (PhD) — Boston College, 2016. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Psychology.
223

Recurrent neural network for optimization with application to computer vision.

January 1993 (has links)
by Cheung Kwok-wai. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1993. / Includes bibliographical references (leaves [146-154]).
Contents: 1. Introduction (programmed computing vs. neurocomputing; development of feedforward and feedback neural network models; state of the art in applying recurrent neural networks to computer vision; objective and plan of the thesis). 2. Background (history of Hopfield-like neural networks; the Hopfield network model: neuron transfer function and updating sequence; the Hopfield energy function and network convergence properties; generalized Hopfield networks, their associated energy functions, convergence properties, and hardware implementation considerations). 3. Recurrent neural networks for optimization (mapping to a neural network formulation; network stability versus self-reinforcement: quadratic problems and the Hopfield network, the higher-order case and a reshaping strategy, with a numerical example; the local-minimum limitation and existing solutions: simulated annealing, mean field annealing, adaptively changing networks, and the correcting-current method). 4. A novel neural network for global optimization, the tunneling network (the tunneling algorithm and tunneling phase; network specifications, the tunneling function for the Hopfield network, and the corresponding updating rule; stability and global convergence, analysed with a Markov chain model of the Hopfield and tunneling networks; variation of pole strength and its effect on the energy profile, the size of attractive basins, and ease of implementation; simulation experiments comparing optimal paths and convergence rates, and decomposition of the tunneling network; suggested hardware implementations). 5. Recurrent neural networks for Gaussian filtering (the silicon retina and an active resistor network for Gaussian filtering of images; motivation for using a recurrent neural network and its differences from the active resistor network model; one- and two-dimensional formulations; spatial impulse responses and filtering results). 6. Recurrent neural networks for boundary detection (problem and network formulation; feasibility study and performance comparison; smoothing and boundary detection; convergence improvement by network decomposition; hardware implementation considerations). 7. Conclusions and future research. Appendices: boundary connection assignment and connection weight formulas (with a proof of the symmetry property) for the 2-D Gaussian-filtering network, and details of the reshaping strategy.
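The Hopfield dynamics at the core of this thesis can be illustrated compactly. The sketch below shows a plain discrete Hopfield network with asynchronous updates, whose energy never increases; it does not include the thesis's tunneling extension, which adds a mechanism for escaping local minima:

```python
def hopfield_step(state, weights, thresholds):
    """One asynchronous sweep of a discrete Hopfield network.

    Neurons take values +1/-1 and are updated one at a time, which
    guarantees the energy function below never increases.
    """
    s = list(state)
    for i in range(len(s)):
        h = sum(weights[i][j] * s[j] for j in range(len(s))) - thresholds[i]
        s[i] = 1 if h >= 0 else -1
    return s

def energy(s, weights, thresholds):
    """E = -1/2 * sum_ij w_ij s_i s_j + sum_i t_i s_i."""
    n = len(s)
    quad = sum(weights[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(thresholds[i] * s[i] for i in range(n))
```

With Hebbian weights (outer product of a stored pattern, zero diagonal), a corrupted input relaxes back to the stored pattern; optimization problems are instead mapped so that their cost function becomes the network energy.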
224

Acquisition and modeling of 3D irregular objects.

January 1994 (has links)
by Sai-bun Wong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 127-131).
Contents: 1. Introduction (overview; survey; objectives; thesis organization). 2. Range sensing (alternative approaches: size constancy, defocusing, deconvolution, binocular vision, active triangulation, and time-of-flight; transmitters and detectors in active sensing: acoustics, optics, and microwave). 3. Scanning mirror (scanning mechanisms; advantages of the scanning mirror; feedback and controller design; point-to-point and line scanning; specifications and measurements). 4. The rangefinder with reflectance sensing (ambient noise; occlusion and shadow; accuracy and precision; optics; range/reflectance crosstalk). 5. Computer generation of range maps (homogeneous transformation; global-to-viewer coordinate conversion; z-buffering; range map generation; experimental results). 6. Characterization of range maps (mean and Gaussian curvature; curvature generation by convolution and by local surface patching; feature extraction). 7. Merging multiple characteristic views (rigid and sub-rigid body models; probabilistic relaxation matching; merging sub-rigid body models and multiple characteristic views; the effect of mislocated features, including the transform matrix for perfect matching and errors introduced into the feature set). 8. Conclusion. Appendices: projection of objects, performance analysis of the rangefinder system, and matching of two characteristic views.
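Among the range-sensing approaches surveyed in this thesis, active triangulation admits a compact worked example: a projector and a camera sit a known baseline apart, and the two angles at which they see the same illuminated spot fix its depth. The sketch below covers only this generic geometry, not the thesis's particular scanning-mirror rig:

```python
import math

def triangulation_depth(baseline, proj_angle, cam_angle):
    """Perpendicular depth of a spot seen by projector and camera.

    The projector (at x = 0) fires a ray at `proj_angle` above the
    baseline; the camera (at x = baseline) sees the spot at `cam_angle`.
    Intersecting the two rays z = x*tan(a) and z = (L - x)*tan(b) gives
    z = L * tan(a) * tan(b) / (tan(a) + tan(b)). Angles in radians.
    """
    ta, tb = math.tan(proj_angle), math.tan(cam_angle)
    return baseline * ta * tb / (ta + tb)
```

The formula makes the classic trade-off visible: a longer baseline improves depth resolution but widens the occlusion/shadow region discussed in chapter 4.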
225

Autonomous visual learning for robotic systems

Beale, Dan January 2012 (has links)
This thesis investigates the problem of visual learning using a robotic platform. Given a set of objects, the robot's task is to autonomously manipulate, observe, and learn. This allows the robot to recognise objects in a novel scene and pose, or to separate them into distinct visual categories. The main focus of the work is on autonomously acquiring object models using robotic manipulation. Autonomous learning is important for robotic systems. In the context of vision, it allows a robot to adapt to new and uncertain environments, updating its internal model of the world. It also reduces the amount of human supervision needed for building visual models. This leads to machines which can operate in environments with rich and complicated visual information, such as the home or industrial workspace, and in environments which are potentially hazardous for humans. The hypothesis claims that inducing robot motion on objects aids the learning process. It is shown that extra information from the robot sensors provides enough information to localise an object and distinguish it from the background, and that decisive planning allows the object to be separated and observed from a variety of different poses, giving a good foundation on which to build a robust classification model. Contributions include a new segmentation algorithm, a new classification model for object learning, and a method for allowing a robot to supervise its own learning in cluttered and dynamic environments.
226

Example-based water animation

Pickup, David Lemor January 2013 (has links)
We present the argument that video footage of real scenes can be used as input examples from which novel three-dimensional scenes can be created. We argue that the parameters used by traditional animation techniques, based on the underlying physical properties of the water, do not intuitively relate to the resulting visual appearance. We present a novel approach which allows a range of video examples to be used as a set of visual parameters to design the visible behaviour of a water animation directly. Our work begins with a method for reconstructing the perceived water surface geometry from video footage of natural scenes, captured with only a single static camera. This has not been accomplished before, because previous approaches use sophisticated capturing systems which are limited to a laboratory environment. We also present an approach for reconstructing water surface velocities which are consistent with the reconstructed geometry. We then present a method of using these water surface reconstructions as building blocks which can be seamlessly combined to create novel water surface animations. We are also able to extract foam textures from the videos, which can be applied to the water surfaces to enhance their visual appearance. The surfaces we produce can be shaped and curved to fit within a user's three-dimensional scene, and the movement of external objects can be driven by the velocity fields. We present a range of results which show that our method can plausibly emulate a wide range of real-world scenes, different from those from which the water characteristics were captured. As the animations we create are fully three-dimensional, they can be rendered from any viewpoint, in any rendering style.
227

Computer Vision System-On-Chip Designs for Intelligent Vehicles

Zhou, Yuteng 24 April 2018 (has links)
Intelligent vehicle technologies, which can enhance road safety, improve transport efficiency, and aid driver operations through sensors and intelligence, are growing rapidly. The advanced driver assistance system (ADAS) is a common platform for intelligent vehicle technologies. Many sensors, such as LiDAR, radar, and cameras, have been deployed on intelligent vehicles. Among these sensors, optical cameras are the most widely used due to their low cost and easy installation. However, most computer vision algorithms are complicated and computationally slow, making them difficult to deploy on power-constrained systems. This dissertation investigates several mainstream ADAS applications and proposes corresponding efficient digital circuit implementations for them. It presents three ways of dividing an algorithm between software and hardware for three ADAS applications: lane detection, traffic sign classification, and traffic light detection. Using an FPGA to offload critical parts of the algorithm, the entire computer vision system is able to run in real time while maintaining low power consumption and a high detection rate. Catching up with the advent of deep learning in the field of computer vision, we also present two deep-learning-based hardware implementations on application-specific integrated circuits (ASICs) to achieve even lower power consumption and higher accuracy. The real-time lane detection system is implemented on the Xilinx Zynq platform, which has a dual-core ARM processor and FPGA fabric; the platform integrates the software programmability of an ARM processor with the hardware programmability of an FPGA. For the lane detection task, the FPGA handles the majority of the work: region-of-interest extraction, edge detection, image binarization, and the Hough transform. The ARM processor then takes the Hough transform results and highlights lanes using the Hough peaks algorithm.
The entire system processes a 1080p video stream at a constant 69.4 frames per second, realizing real-time capability. An efficient system-on-chip (SOC) design which classifies up to 48 traffic signs in real time is also presented. The traditional histogram of oriented gradients (HoG) and support vector machine (SVM) are proven to be very effective for traffic sign classification, with an average accuracy rate of 93.77%. For traffic sign classification, the biggest challenge comes from the low execution efficiency of the HoG on embedded processors. By dividing the HoG algorithm into three fully pipelined stages, and by leveraging extra on-chip memory to store intermediate results, we achieved a throughput of 115.7 frames per second at 1080p resolution. The proposed generic HoG hardware implementation can also be used as an individual IP core by other computer vision systems. A real-time traffic signal detection system is implemented to present an efficient hardware implementation of traditional grass-fire blob detection. The traditional grass-fire method iterates over the input image multiple times to compute connected blobs. In digital circuits, five extra on-chip block memories are utilized to save intermediate results; with these additional memories, all connected-blob information can be obtained in a single pass over the image. The proposed hardware-friendly blob detection runs at 72.4 frames per second with 1080p video input. Applying HoG + SVM as feature extractor and classifier, we obtain a 92.11% recall rate and 99.29% precision rate on red lights, and a 94.44% recall rate and 98.27% precision rate on green lights. Nowadays, the convolutional neural network (CNN) is revolutionizing computer vision thanks to learnable layer-by-layer feature extraction. However, CNNs are usually slow to train and slow to execute at inference time.
In this dissertation, we studied the implementation of the principal component analysis based network (PCANet), which strikes a balance between algorithmic robustness and computational complexity. Compared to a regular CNN, the PCANet needs only one training iteration and typically has at most a few tens of convolutions in a single layer. Compared to hand-crafted feature extraction methods, the PCANet algorithm reflects the variance in the training dataset well and can better adapt to difficult conditions. The PCANet algorithm achieves accuracy rates of 96.8% and 93.1% on road marking detection and traffic light detection, respectively. Implemented in Synopsys 32 nm process technology, the proposed chip can classify 724,743 32-by-32 image candidates per second, with only 0.5 W power consumption. In this dissertation, the binary neural network (BNN) is also adopted as a potential detector for intelligent vehicles. The BNN constrains all activations and weights to be +1 or -1. Compared to a CNN with the same network configuration, the BNN achieves 50 times better resource usage with only a 1%-2% accuracy loss. Taking car detection and pedestrian detection as examples, the BNN achieves an average accuracy rate of over 95%. Furthermore, a BNN accelerator implemented in Synopsys 32 nm process technology is presented in our work. The elastic architecture of the BNN accelerator enables it to process any number of convolutional layers with high throughput. The BNN accelerator consumes only 0.6 W and does not rely on external memory for storage.
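The grass-fire blob detection that this dissertation accelerates in hardware is easy to state in software. The sketch below is the classic software flood-fill formulation (the multi-pass/one-pass memory optimizations of the hardware version are not shown):

```python
def grassfire_blobs(img):
    """Label 4-connected foreground blobs in a binary image.

    `img` is a list of rows of 0/1 values. Each unvisited foreground
    pixel seeds a "fire" that spreads to its connected neighbours,
    assigning them all the same blob label. Returns (count, label map).
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                count += 1
                labels[sy][sx] = count
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return count, labels
```

In the traffic light detector, each labelled blob becomes a candidate region that the HoG + SVM stage then classifies as a red light, green light, or background.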
228

GPU Based Real-Time Trinocular Stereovision

Yao, Yuanbin 24 August 2012 (has links)
"Stereovision has been applied in many fields including UGV (Unmanned Ground Vehicle) navigation and surgical robotics. Traditionally most stereovision applications are binocular which uses information from a horizontal 2-camera array to perform stereo matching and compute the depth image. Trinocular stereovision with a 3-camera array has been proved to provide higher accuracy in stereo matching which could benefit application like distance finding, object recognition and detection. However, as a result of an extra camera, additional information to be processed would increase computational burden and hence not practical in many time critical applications like robotic navigation and surgical robot. Due to the nature of GPUÂ’s highly parallelized SIMD (Single Instruction Multiple Data) architecture, GPGPU (General Purpose GPU) computing can effectively be used to parallelize the large data processing and greatly accelerate the computation of algorithms used in trinocular stereovision. So the combination of trinocular stereovision and GPGPU would be an innovative and effective method for the development of stereovision application. This work focuses on designing and implementing a real-time trinocular stereovision algorithm with GPU (Graphics Processing Unit). The goal involves the use of Open Source Computer Vision Library (OpenCV) in C++ and NVidia CUDA GPGPU Solution. Algorithms were developed with many different basic image processing methods and a winner-take-all method is applied to perform fusion of disparities in different directions. The results are compared in accuracy and speed to verify the improvement."
229

Continuous memories for representing sets of vectors and image collections / Mémoires continues représentant des ensembles de vecteurs et des collections d’images

Iscen, Ahmet 25 September 2017 (has links)
In this thesis, we study the indexing and query expansion problems in image retrieval. The former sacrifices accuracy for efficiency, whereas the latter takes the opposite perspective and improves accuracy at additional cost. Our proposed solutions to both problems consist of utilizing continuous representations of a set of vectors. We turn our attention to indexing first, and follow the group testing scheme. We assign each dataset vector to a group, and represent each group with a single vector representation. We propose memory vectors, whose construction is optimized under the membership test hypothesis. The optimal solution for this problem is based on the Moore-Penrose pseudo-inverse, and shows superior performance compared to basic sum pooling. We also provide a data-driven approach optimizing the assignment and representation jointly. The second half of the thesis focuses on the query expansion problem, representing a set of vectors with weighted graphs. This allows us to retrieve objects that lie on the same manifold but are further away in Euclidean space. We improve the efficiency of our technique even further, creating high-dimensional diffusion embeddings offline so that they can be compared with a simple dot product at query time. For both problems, we provide thorough experiments and analysis on well-known image retrieval benchmarks and show the improvements achieved by the proposed methods.
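The group testing idea behind memory vectors can be illustrated with the simplest construction the thesis compares against, sum pooling: each group of dataset vectors is collapsed into one vector, and a dot product with the query acts as a cheap membership test. The thesis's stronger Moore-Penrose pseudo-inverse construction, which decorrelates the group members, is not shown here:

```python
def memory_vector(group):
    """Sum-pool a group of vectors into a single 'memory' vector.

    `group` is a non-empty list of equal-length lists of floats. This is
    the baseline construction; the optimized variant in the thesis is
    built from the Moore-Penrose pseudo-inverse of the group matrix.
    """
    dim = len(group[0])
    return [sum(v[i] for v in group) for i in range(dim)]

def membership_score(memory, query):
    """Dot product with the memory vector: high score suggests the
    query resembles some member of the group."""
    return sum(m * q for m, q in zip(memory, query))
```

At query time, only the groups whose memory vectors score highly need to be searched exhaustively, which is where the indexing speed-up comes from.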
230

Describable Visual Attributes for Face Images

Kumar, Neeraj January 2011 (has links)
We introduce the use of describable visual attributes for face images. Describable visual attributes are labels that can be given to an image to describe its appearance. This thesis focuses mostly on images of faces and the attributes used to describe them, although the concepts also apply to other domains. Examples of face attributes include gender, age, jaw shape, nose size, etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category. We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the current effectiveness and explore the future potential of using attributes for image search, automatic face replacement in images, and face verification, via both human and computational experiments. To aid other researchers in studying these problems, we introduce two new large face datasets, named FaceTracer and PubFig, with labeled attributes and identities, respectively. Finally, we also show the effectiveness of visual attributes in a completely different domain: plant species identification. To this end, we have developed and publicly released the Leafsnap system, which has been downloaded by almost half a million users. The mobile phone application is a flexible electronic field guide with high-quality images of the tree species in the Northeast US. It also gives users instant access to our automatic recognition system, greatly simplifying the identification process.
