Sound source separation is an active research area, driven in particular by modern machine learning techniques. This thesis focuses on single-channel separation of two speakers into individual streams, specifically in the case where the two speakers are accompanied by background noise. Three separation methods are evaluated: Conv-TasNet, DPTNet, and FaSNetTAC. Each method was used to train models for the separation task, and these models were evaluated and validated through three experiments. First, previous results for the chosen separation methods were reproduced. Second, models suited to NFC's datasets and applications were created, fulfilling the aim of the thesis. Finally, all models were evaluated on an independent dataset similar to NFC's datasets. Results were evaluated using the metrics SI-SNRi and SDRi. The thesis recommends models and methods suitable for NFC applications, concluding in particular that Conv-TasNet and DPTNet are reasonable choices.
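For readers unfamiliar with the SI-SNRi metric named in the abstract, the following is a minimal sketch of the commonly used definition: the scale-invariant signal-to-noise ratio (SI-SNR) of the separated estimate minus the SI-SNR of the unprocessed mixture, both measured against the clean target. The function names, the `eps` stabiliser, and the pure-Python list representation of signals are assumptions made for illustration; the thesis's own evaluation code is not shown in this record and may differ.

```python
import math


def _zero_mean(signal):
    """Return a copy of the signal with its mean removed."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB of `estimate` with respect to `target`.

    Both inputs are equal-length sequences of samples. `eps` is an
    assumed small constant that guards against division by zero.
    """
    est = _zero_mean(estimate)
    ref = _zero_mean(target)

    # Project the estimate onto the target to get the "signal" part.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    projection = [scale * r for r in ref]

    # Whatever is left after removing the projection counts as noise.
    noise = [e - p for e, p in zip(est, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain in SI-SNR of the estimate over the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

A positive value means the separated output is closer to the clean speaker than the original mixture was. Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate does not change the score.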
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-192849 |
Date | January 2023 |
Creators | Gidlöf, Amanda |
Publisher | Linköpings universitet, Kommunikationssystem |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |