I am Devdoot Chatterjee, an undergraduate student at Delhi Technological University, India, majoring in Mechanical Engineering. I am working on the project “Multimodal sound monitoring” as part of Google Summer of Code 2022.
The first part of this project involves building a bioacoustic source separator to separate orca vocalizations from background noise. I have been working on the source separation model for the last couple of weeks and have explored two open-source libraries for audio source separation: Spleeter and Zero-shot audio source separator.
Both models performed well, but each had its shortcomings. The Zero-shot model extracted orca frequencies better than the pre-trained Spleeter model and preserved the features of the vocalizations, at the expense of leaving some background noise in the output. The pre-trained Spleeter model was better at reducing the background noise, but it also reduced the quality of the orca vocalizations. So, the only option I had was to keep fine-tuning the models until I was happy with the outcome.
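The trade-off between noise reduction and vocalization quality could also be expressed as a number. The sketch below is only an illustration and not the evaluation used in this project: it assumes a clean reference vocalization is available for the test clip and computes the scale-invariant signal-to-distortion ratio (SI-SDR), a common source separation metric. The function name `si_sdr` and the plain-list inputs are assumptions made to keep the example dependency-free; with real audio one would normally work on NumPy arrays.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    `reference` is the clean vocalization and `estimate` is a separator's
    output; both are equal-length sequences of float samples.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    if not reference:
        raise ValueError("signals must not be empty")

    # Remove the mean so a constant offset does not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        raise ValueError("reference signal is silent")

    # Project the estimate onto the reference: the projection is the part of
    # the estimate that matches the clean call, the rest is distortion.
    scale = sum(r * e for r, e in zip(ref, est)) / ref_energy
    target = [scale * r for r in ref]
    distortion = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10 * math.log10(target_energy / distortion_energy)
```

Scoring the output of each model against the same reference clip would give one figure per model to compare. Because the metric ignores overall gain, a quieter output is not penalized for being quiet.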
The main challenge I faced was preparing the dataset to train the model. These neural networks need examples of isolated Southern Resident killer whale vocalizations with minimal background noise, and not many such examples were available. So, I had to manually search through a large number of hydrophone recordings to find high-quality orca vocalizations for training.
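Once a clean call has been found by ear in a long hydrophone recording, it still has to be cut out into its own training clip. The sketch below shows one way that step could look, using only Python's standard `wave` module. The function name `cut_clip`, the file paths, and the assumption that the recordings are uncompressed WAV files are all illustrative and not taken from the project.

```python
import wave


def cut_clip(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) to a new WAV file.

    Returns the number of frames written. Times past the end of the
    recording are clamped to its length.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("need 0 <= start_s < end_s")

    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()

        # Convert the hand-annotated times to frame indices.
        first = min(round(start_s * rate), total)
        last = min(round(end_s * rate), total)

        src.setpos(first)
        frames = src.readframes(last - first)

        # Write the clip with the same channel count, sample width and rate.
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)

    return last - first
```

For example, `cut_clip("recording.wav", "call_001.wav", 12.5, 14.0)` would save the 1.5 seconds starting at 12.5 s as a separate clip; both file names here are placeholders.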
Meanwhile, Ambra, another GSoC volunteer, has designed a fantastic GUI for pre-processing audio data. The GUI can also be used to extract orca vocalizations from a hydrophone recording using the fine-tuned Spleeter model.