This post continues my previous one about developing GitHub Actions workflows for Orcasound. Since the first post was published, I’ve worked on moving my OOI (Ocean Observatories Initiative) code to a different data acquisition method, adding input parameters to the OOI and Orcasound workflows, and extracting data acquisition steps into separate Actions.
First of all, the workflow that works with OOI data now uses the OOIPy Python package (hinted at in the first blog post). Previously we manually traversed the directory for a given date on the OOI website and downloaded each file; now we use input parameters, which gives us much more freedom in choosing exactly what to process. We used to download all files for a date (and even that date was hardcoded as “yesterday”), which was problematic because OOI sometimes lists super short files (a few milliseconds long, whereas recordings are usually 5 minutes long). With OOIPy we pass a start date, an end date, and a segment length, and it handles chopping and stitching the raw data into uniform audio segments, which are then saved as .wav files and processed. This approach also let me synchronize the OOI and Orcasound workflows: both now get data, convert it to .wav files (OOI from .mseed and Orcasound from .ts), and then process each file using the same functions. Another input for the OOI workflow is the hydrophone node to get data from.
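To make the segmentation idea concrete, here is a minimal sketch of how a start date, an end date, and a segment length turn into uniform time windows. This is not OOIPy's code or API; `segment_windows` is a hypothetical helper written only for illustration, and dropping a trailing partial window is my own assumption.

```python
from datetime import datetime, timedelta


def segment_windows(start, end, segment_length):
    """Split [start, end) into back-to-back windows of equal length.

    Hypothetical helper for illustration only; not part of OOIPy.
    A trailing window shorter than segment_length is dropped (an assumption).
    """
    if segment_length <= timedelta(0):
        raise ValueError("segment_length must be positive")
    windows = []
    cursor = start
    # Keep adding windows while a full segment still fits before `end`.
    while cursor + segment_length <= end:
        windows.append((cursor, cursor + segment_length))
        cursor += segment_length
    return windows


windows = segment_windows(
    datetime(2021, 7, 1, 0, 0),
    datetime(2021, 7, 1, 0, 12),
    timedelta(minutes=5),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window would then correspond to one .wav file of the same duration, regardless of how the raw data happens to be split into files on the server.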
Everything described above applies to the “Python code” level of abstraction, and all these input parameters were also exposed at the “workflow” level (we are working on GitHub Actions after all!). I actually learned quite a few things about GitHub Actions syntax while working on this, for example that it’s possible to set different environment variables depending on whether the workflow was triggered with a particular input or not.
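On the Python side, values passed down from the workflow can be read as environment variables, with a fallback when the workflow was triggered without them. The sketch below shows one way this could look; the variable names `START_DATE` and `END_DATE`, the helper `resolve_date_range`, and the "yesterday" fallback are assumptions for illustration, not the names used in the actual workflow.

```python
from datetime import date, timedelta


def resolve_date_range(env, today):
    """Return (start, end) dates from env, defaulting to yesterday..today.

    START_DATE and END_DATE are hypothetical variable names (YYYY-MM-DD).
    Empty strings are treated as "not provided", on the assumption that an
    input that was not supplied reaches the script as an empty value.
    """
    start_raw = env.get("START_DATE") or ""
    end_raw = env.get("END_DATE") or ""
    start = date.fromisoformat(start_raw) if start_raw else today - timedelta(days=1)
    end = date.fromisoformat(end_raw) if end_raw else today
    if start > end:
        raise ValueError("START_DATE must not be after END_DATE")
    return start, end


# Run without inputs (for example on a schedule): the defaults apply.
print(resolve_date_range({}, date(2021, 8, 10)))
# Run with both inputs provided.
print(resolve_date_range(
    {"START_DATE": "2021-07-01", "END_DATE": "2021-07-03"},
    date(2021, 8, 10),
))
```

In a real script the call would be `resolve_date_range(os.environ, date.today())`; taking the mapping and the current date as arguments just keeps the function easy to test.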
As part of the OOI workflow rewrite, I added unit tests for utility functions and integration tests that download data from OOI and create spectrograms from it. I had no previous experience writing tests in Python, so I’m glad I could learn something completely new!
Right now I’m incorporating my mentor Valentina’s feedback on the OOI rewrite, extending the same functionality to the Orcasound workflow, and extracting data acquisition steps into separate GitHub Actions. This last task is probably the most valuable deliverable of this summer project because it will allow other open-source users to use data streams that are otherwise difficult to access in their own GitHub Actions workflows (and possibly even outside of GitHub Actions, if they just reuse the Docker container used in the Action).
Last but not least, I would like to highlight how the Orcasound organization structured its work with GSoC students. Most discussions happen on the Orcasound Slack, where the #gsoc-2021 channel was created. We also have a one-hour GSoC video meeting each Tuesday on Google Meet, where students report their progress, share goals for the next week, and ask for help with blocking issues, and mentors share ideas and notes. In addition, every two weeks we have “personal” calls where each student discusses their work with their mentors. Quite often our discussion wanders off to more abstract topics like Orcasound’s goals as an organization, high-level overviews of the current and planned architecture, the first sightings (and hearings!) of orcas in the Salish Sea this summer, the decline in the salmon population, and even favorite local dishes. That is thanks in part to the “ice breaker” opening section of the meeting (thanks a lot for this idea, Scott, it really helps to ease my nerves before the meeting!). Additionally, at the end of July all GSoC students were invited to the HALLO call (Humans and ALgorithms Listening for Orcas, a Canadian-led effort to classify SRKW, Bigg’s, and humpback calls in the Salish Sea and Northeast Pacific), where we met more passionate scientists and software developers. I’m equally excited about the networking opportunities and the fact that my code will potentially be used even more widely.
P.S. I almost forgot to mention that I have already contributed a bit to the above-mentioned OOIPy package!