Automatic classification of fish and bubbles at pixel-level precision in multi-frequency acoustic echograms using U-Net convolutional neural networks

Multi-frequency backscatter acoustic profilers (echosounders) are used to measure biological and physical phenomena in the ocean in ways that are not possible with optical methods. Echosounders are commonly used on ocean observatories and by commercial fisheries but require significant manual effort to classify species of interest within the collected echograms. The work presented in this thesis tackles the challenging task of automating the identification of fish and other phenomena in echosounder data, with specific application to aggregations of juvenile salmon, schools of herring, and bubbles of air that have been mixed into the water.

U-Net convolutional neural networks (CNNs) are used to accomplish this task by identifying classes at the pixel level. The data considered here were collected in Okisollo Channel on the coast of British Columbia, Canada, using an Acoustic Zooplankton and Fish Profiler at four frequencies (67.5, 125, 200, and 455 kHz). The entrainment of air bubbles and the behaviour of fish are both governed by the surrounding physical environment. To improve the classification, simulated channels for water depth and solar elevation angle (a proxy for sunlight) are used to supply the CNNs with environmental information that provides spatial and temporal context. The manual annotation of echograms at the pixel level is a challenging process, and a custom application was developed to aid it. A relatively small set of annotations was created and used to train the CNNs. During training, the echogram data are divided into randomly spaced square tiles so that the models learn robust features; during classification, the data are divided into overlapping tiles for added redundancy. This is done without removing noise from the data, which keeps the approach broadly applicable. The approach proves highly successful: the best-performing U-Net model produces F1 scores of 93.0%, 87.3%, and 86.5% for the herring, salmon, and bubble classes, respectively. These models also achieve promising results when applied to echogram data with coarser resolution.
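The two tiling schemes described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the tile size, the stride, and the rule of adding one extra tile so the far edge is covered are all assumptions made for illustration. The sketch only computes the top-left corners of square tiles on the depth/time grid of an echogram.

```python
import random


def overlapping_origins(length, tile, stride):
    """Start indices of tiles of size `tile` along one axis of size `length`.

    Assumption: when the stride does not land on the far edge, one extra
    tile is added flush with that edge so every pixel is covered.
    """
    if tile > length:
        raise ValueError("tile is larger than the axis")
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def overlapping_tiles(n_depth, n_time, tile, stride):
    """(row, col) corners of overlapping square tiles, as used at inference."""
    return [
        (r, c)
        for r in overlapping_origins(n_depth, tile, stride)
        for c in overlapping_origins(n_time, tile, stride)
    ]


def random_tiles(n_depth, n_time, tile, count, seed=0):
    """(row, col) corners of randomly placed square tiles, as used in training."""
    rng = random.Random(seed)
    # randint is inclusive at both ends, so the tile always fits in the grid.
    return [
        (rng.randint(0, n_depth - tile), rng.randint(0, n_time - tile))
        for _ in range(count)
    ]
```

With a stride smaller than the tile size, each pixel falls inside several tiles, so per-pixel predictions can be combined across tiles (for example by averaging class scores); how the thesis combines them is not stated here.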

One goal in fisheries acoustics is to detect distinct schools of fish. Following the initial pixel-level classification, the results from the best-performing U-Net model are fed through a heuristic module, inspired by traditional fisheries methods, that links connected components of identified fish (school candidates) into distinct school objects. The results are compared to the outputs of a recent study that used a Mask R-CNN architecture to classify fish schools by instance segmentation. The U-Net/heuristic hybrid technique is shown to improve on the Mask R-CNN approach by a small amount for the classification of herring schools, and by a large amount for aggregations of juvenile salmon (mean average precision rises from 24.7% to 56.1%). / Graduate
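The linking step can be sketched as follows, under stated assumptions. The abstract does not give the heuristic's rules, so the 4-connectivity, the minimum component size (`min_pixels`), and the bounding-box gap threshold (`max_gap`) below are hypothetical stand-ins rather than the thesis's criteria. The sketch labels connected components in a binary fish mask and merges nearby components into school objects with a union-find structure.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected components of a 2D 0/1 grid as pixel lists."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    components = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bbox(pixels):
    """Bounding box as (min_row, max_row, min_col, max_col)."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return min(rows), max(rows), min(cols), max(cols)


def box_gap(a, b):
    """Largest index separation between two boxes along either axis (0 if they overlap)."""
    d_row = max(a[0] - b[1], b[0] - a[1], 0)
    d_col = max(a[2] - b[3], b[2] - a[3], 0)
    return max(d_row, d_col)


def link_schools(mask, max_gap=2, min_pixels=3):
    """Merge school candidates whose bounding boxes lie within `max_gap` pixels."""
    # Hypothetical rule: discard candidates smaller than `min_pixels`.
    comps = [p for p in label_components(mask) if len(p) >= min_pixels]
    boxes = [bbox(p) for p in comps]
    parent = list(range(len(comps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(comps)):
        for j in range(i + 1, len(comps)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    schools = {}
    for i, pixels in enumerate(comps):
        schools.setdefault(find(i), []).extend(pixels)
    return list(schools.values())
```

In practice the thresholds would be set in physical units (metres of depth, seconds or metres along track) rather than pixels, as is usual in traditional school-detection methods; pixel units are used here only to keep the example short.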

Identifier: oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/13850
Date: 05 April 2022
Creators: Slonimer, Alex
Contributors: Dosso, Stanley Edward
Source Sets: University of Victoria
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Available to the World Wide Web