1.
Image Compression and Channel Error Correction using Neurally-Inspired Network Models
Watkins, Yijing Zhang (01 May 2018)
Every day, an enormous amount of information is stored, processed, and transmitted digitally around the world. Neurally-inspired compression models have been actively developed and researched as a solution to image processing tasks and channel error-correction control. This dissertation presents a deep neural network (DNN) for grayscale high-resolution image compression and a fault-tolerant transmission system with channel error-correction capabilities. A feed-forward DNN trained with the Levenberg-Marquardt learning algorithm is proposed for image compression. I demonstrate experimentally that the DNN not only provides higher-quality reconstructed images but also requires less computational capacity than DCT zonal coding, DCT threshold coding, Set Partitioning in Hierarchical Trees (SPIHT), and the Gaussian pyramid. An artificial neural network (ANN) with an improved channel error-correction rate is also proposed. The experimental results indicate that the ANN provides superior error-correction ability when transmitting binary images over a noisy channel, compared to Hamming and Repeat-Accumulate coding, while its storage requirement is 64 times smaller than that of Hamming coding and 62 times smaller than that of Repeat-Accumulate coding.

Thumbnail images contain higher frequencies and much less redundancy, which makes them harder to compress than high-resolution images. Bottleneck autoencoders have been actively researched as a solution to image compression tasks. However, I observed that thumbnail images compressed at a 2:1 ratio through bottleneck autoencoders often exhibit subjectively low visual quality. In this dissertation, I compared bottleneck autoencoders with two sparse coding approaches: either 50% of the pixels are removed at random, or every other pixel is removed in a regular checkerboard pattern; each achieves a 2:1 compression ratio. In the subsequent decompression step, a sparse inference algorithm is used to in-paint the missing pixel values (see the sketch below). Compared to bottleneck autoencoders, I observed that sparse coding with a random dropout mask yields decompressed images that are superior based on subjective human perception yet inferior according to pixel-wise metrics of reconstruction quality such as PSNR and SSIM. With a regular checkerboard mask, decompressed images were superior as assessed by both subjective and pixel-wise measures. I hypothesized that alternative feature-based measures of reconstruction quality would better support my subjective observations. To test this hypothesis, I fed thumbnail images processed with either a bottleneck autoencoder or sparse coding (with checkerboard or random masks) into a deep convolutional neural network (DCNN) classifier. Consistent with my subjective observations, sparse coding with checkerboard and random masks supports on average 2.7% and 1.6% higher classification accuracy and 18.06% and 3.74% lower feature perceptual loss, respectively, compared to bottleneck autoencoders, implying that sparse coding preserves more feature-based information.
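Below is a minimal sketch of the two 2:1 masking schemes and the transmit/receive split they imply, assuming a grayscale image stored as a NumPy array. The function names are illustrative, not taken from the dissertation, and the in-painting step is only stubbed out; in the actual pipeline the missing pixels are estimated by sparse inference over a learned dictionary.

```python
import numpy as np

def random_mask(shape, keep=0.5, seed=0):
    """Boolean mask retaining a random 50% of pixels (2:1 compression)."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < keep

def checkerboard_mask(shape):
    """Boolean mask retaining every other pixel in a regular pattern."""
    rows, cols = np.indices(shape)
    return (rows + cols) % 2 == 0

def compress(image, mask):
    """Transmit only the retained pixel values; the receiver is assumed
    to know the mask (or the seed/pattern that generated it)."""
    return image[mask]

def decompress_stub(values, mask):
    """Put the known pixels back in place. The zeros left at masked-out
    positions are where a sparse inference algorithm would in-paint."""
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out
```

Because the receiver needs only the retained values plus the mask seed or pattern, the payload is half the original pixel count, matching the 2:1 ratio discussed above.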
The optic nerve transmits visual information to the brain as trains of discrete events, a low-power, low-bandwidth communication channel also exploited by silicon retina cameras. Extracting high-fidelity visual input from retinal event trains is thus a key challenge for both computational neuroscience and neuromorphic engineering. Here, we investigate whether sparse coding can enable the reconstruction of high-fidelity images and video from retinal event trains. Our approach is analogous to compressive sensing, in which only a random subset of pixels is transmitted and the missing information is estimated via inference. We employed a variant of the Locally Competitive Algorithm (LCA) to infer sparse representations from retinal event trains, using a dictionary of convolutional features optimized via stochastic gradient descent and trained in an unsupervised manner with a local Hebbian learning rule with momentum. Static images drawn from the CIFAR10 dataset were passed to the input layer of an anatomically realistic retinal model and encoded as arrays of output spike trains arising from separate layers of integrate-and-fire neurons representing ON and OFF retinal ganglion cells. The spikes from each model ganglion cell were summed over a 32 ms time window, yielding a noisy rate-coded image. Analogous to how the primary visual cortex is postulated to infer features from noisy spike trains in the optic nerve, we inferred a higher-fidelity sparse reconstruction from the noisy rate-coded image using a convolutional dictionary trained on the original CIFAR10 database. Using a similar approach, we analyzed the asynchronous event trains produced by a silicon retina camera during self-motion through a laboratory environment. By training a dictionary of convolutional spatiotemporal features to simultaneously reconstruct differences of video frames (recorded at 22 Hz and 5.56 Hz) as well as discrete events generated by the silicon retina (binned at 484 Hz and 278 Hz), we were able to estimate high-frame-rate video from a low-power, low-bandwidth silicon retina camera.
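For concreteness, here is a minimal dense (non-convolutional) sketch of LCA inference under assumed parameter values: membrane potentials evolve until the thresholded activities form a sparse code a whose reconstruction D @ a approximates the input x. The dissertation's variant is convolutional with a Hebbian-trained dictionary; this illustrates only the inference dynamics.

```python
import numpy as np

def lca_inference(x, D, lam=0.1, tau=0.1, n_steps=200):
    """Infer sparse coefficients a such that D @ a approximates x.

    x   : (n_pixels,) input, e.g. a flattened noisy rate-coded image
    D   : (n_pixels, n_features) dictionary with unit-norm columns
    lam : sparsity threshold; tau : integration step size (assumed values)
    """
    G = D.T @ D - np.eye(D.shape[1])  # lateral inhibition: Gram matrix minus self-connections
    b = D.T @ x                       # feed-forward drive
    u = np.zeros(D.shape[1])          # membrane potentials
    for _ in range(n_steps):
        # soft threshold: only strongly driven units become active
        a = np.where(np.abs(u) > lam, u - lam * np.sign(u), 0.0)
        # leaky integrator dynamics with competition between overlapping features
        u += tau * (b - u - G @ a)
    return a

# Example: reconstruction = D @ lca_inference(x, D)
```

In the rate-coded setting above, x would correspond to per-pixel spike counts accumulated over the 32 ms window.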
2.
Asynchronous Event-Feature Detection and Tracking for SLAM Initialization
Ta, Tai (January 2024)
Traditional cameras are most commonly used in visual SLAM to provide visual information about the scene and positional information about the camera motion. However, in the presence of varying illumination and rapid camera movement, the visual quality captured by traditional cameras diminishes. This limits the applicability of visual SLAM in challenging environments such as search-and-rescue situations. Emerging event cameras have been shown to overcome these limitations thanks to their superior temporal resolution and wider dynamic range, opening up new areas of application and research for event-based SLAM. In this thesis, several asynchronous feature detectors and trackers are used to initialize SLAM from event-camera data. To compare the pose-estimation accuracy of the different feature detectors and trackers, initialization performance was evaluated on datasets captured in various environments. Furthermore, two different corner-event alignment methods were evaluated on the same datasets. Results show that, apart from slight variation in the number of accepted initializations, the two alignment methods exhibit no overall difference in any metric. Among the event-based trackers, HASTE achieves the highest overall initialization performance, with mostly high pose accuracy and a high number of accepted initializations, although its performance degrades in featureless scenes. CET, on the other hand, mostly performs worse than HASTE.
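As generic background on how an asynchronous event stream can be turned into something a feature detector can operate on, the sketch below shows an event record and an exponentially decayed time surface, a common dense representation in event-based vision. This is an illustration under assumed parameters, not the specific mechanics of HASTE or CET.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp in seconds (real sensors: microsecond resolution)
    polarity: int  # +1 for brightness increase, -1 for decrease

def time_surface(events, shape, t_now, tau=0.05):
    """Build an exponentially decayed time surface: each pixel encodes the
    recency of its latest event, giving a dense image-like snapshot of an
    otherwise asynchronous stream."""
    last_t = np.full(shape, -np.inf)       # -inf marks pixels with no events
    for e in events:
        if e.t <= t_now:
            last_t[e.y, e.x] = max(last_t[e.y, e.x], e.t)
    return np.exp((last_t - t_now) / tau)  # values in (0, 1]; exactly 0 where no events
```

Event-based corner detectors commonly operate on representations like this or update their state per event; the resulting corner tracks are what feed the SLAM initialization evaluated above.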