Improving the open source Orca Active Learning (OrcaAL) bioacoustic labeling tool

Hey everyone, I’m Benjamin! I’m a third-year Computer Engineering student from Singapore, although I will be moving to California sometime in mid-August for my university exchange. A little about myself: I am a machine learning enthusiast who started out in data science and am gradually exploring other areas such as software development and DevOps, the latter of which is one of the core foci of my GSoC project.

This year, I’m really excited and grateful to get the opportunity to work with Orcasound on GSoC 2022. At the start of this year, I started to volunteer with Orcasound even before the GSoC evaluation period opened. The community was really open and I was always made to feel welcome, be it asking questions in the Slack forums or tuning in to the weekly Wednesday standups (although that got a little difficult after a while due to the time-zone differences). 

It was fascinating learning about orcas as well while contributing. I had no idea that killer whales could be grouped based on their ecotypes, and developed specialized hunting techniques depending on their prey! It was also a somewhat surreal experience listening to the whales for the first time, something which you can do here, or occasionally tuning into the Slack community channels where Scott or Val would post updates should there be any interesting calls heard on the hydrophones.

Whilst I did contribute to a few repositories including Orca-Action-Workflow and Orca-hls-utils, I decided to focus on writing my proposal about OrcaAL, as I wanted to practise some of the skills and concepts I had learnt on my own about software engineering & DevOps. 

For GSoC 2022, I am working on improving the OrcaAL tool, a project started by GSoC students Kunal and Diego in 2020 and continued by Jose for GSoC 2021. The main goal of my project is to improve the underlying infrastructure of OrcaAL. The main aspects of OrcaAL are illustrated in the diagram below:

I have spent the past few weeks adding Docker Compose support to the OrcaAL repository, as well as setting up GitHub Actions to run tests and linting as part of the CI/CD pipeline. Why Docker Compose, you might ask?

Firstly, OrcaAL can be intimidating to a new dev as it is a standalone full-stack solution, with many parts of the stack that might be unfamiliar to the uninitiated. The backend is served using Python tools such as Flask and Gunicorn, while the deep learning aspect involves the use of Keras. A Postgres database is also currently being used to store metadata. On top of all that, there is a frontend webapp written in vanilla HTML, CSS and JavaScript, and bundled using Webpack.

Given the multitude of technologies in use, it is difficult for new devs who are experienced in only one language or area to debug set-up and environment-based issues. To resolve this, I implemented Docker Compose support, which spins up all the services with a single command.

Secondly, this also simplifies testing with GitHub Actions, where all services need to be up and running before the tests are executed. The Docker Compose file allows for a painless connection between the various services since they are all part of the same network.
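To give a rough idea of what "up and running before the tests are executed" can look like in practice, here is a minimal Python sketch of a readiness check that a test job could run first. It is not taken from the OrcaAL repository: the function names `wait_for_port` and `wait_for_services`, and any service host names passed to them, are hypothetical and only serve as an illustration.

```python
import socket
import time
from typing import Dict, Tuple


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or host not resolvable yet: retry shortly.
            time.sleep(interval)
    return False


def wait_for_services(services: Dict[str, Tuple[str, int]],
                      timeout: float = 30.0) -> Dict[str, bool]:
    """Check each (host, port) pair in turn and report which ones became reachable."""
    return {
        name: wait_for_port(host, port, timeout=timeout)
        for name, (host, port) in services.items()
    }
```

For example, assuming the database and backend containers were reachable under the (made-up) host names `db` and `backend`, a test job could call `wait_for_services({"database": ("db", 5432), "backend": ("backend", 5000)})` and only start the test suite if every value in the result is `True`. An open port only shows that a process is listening, not that the service is fully initialised, so this is a coarse check.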

After several hours of grueling work, I finally got the tests and linting to run properly on GitHub Actions.

Next up, I’ll be working on refactoring the existing code further. This will hopefully allow new devs to be onboarded more easily, without requiring AWS credentials to set up their environment. That should make things easier both for Orcasound’s administrators, who would otherwise have to issue credentials to many new developers and navigate the possible security issues, and for potential new contributors, who may be deterred by the difficulty of setting up the environment.

Definitely looking forward to continuing my work over the coming weeks!
