Detection of Orca calls using Active Learning: GSoC progress report #3

Here are some terms used in this blog post that you might not be familiar with:

Uncertainty threshold: This defines the boundary (i.e., the upper and lower limits) of prediction probability within which we classify a sample as uncertain.

For example, in our case 0 means the model has detected a call and 1 means the model has detected no call, so samples whose predictions fall close to 0.5 are the ones the model has a hard time labeling. We need to select a boundary such that all samples the model predicts within that range are passed to the expert for labeling.
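As a concrete sketch, selecting uncertain samples by thresholding prediction probabilities might look like the following; the function name and default limits here are illustrative, not part of the project's actual code:

```python
import numpy as np

def select_uncertain(probs, lower=0.4, upper=0.6):
    """Return indices of samples whose predicted probability falls
    inside the uncertainty band.

    probs: model outputs in [0, 1], where values near 0 mean "call"
    and values near 1 mean "no call" (this project's convention).
    """
    probs = np.asarray(probs)
    return np.where((probs >= lower) & (probs <= upper))[0]

# Only the samples near 0.5 are flagged for expert labeling.
print(select_uncertain([0.02, 0.45, 0.58, 0.97]))  # -> [1 2]
```

Widening `lower`/`upper` (as this post does, to 0.1 and 0.9) simply sends more samples to the expert.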

Accuracy: For binary classification, accuracy can be calculated in terms of positives and negatives as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
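In code, the formula is a one-liner (the numbers below are made-up counts purely to illustrate the arithmetic):

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# 90 correct predictions out of 100 total samples.
print(accuracy(tp=50, tn=40, fp=5, fn=5))  # -> 0.9
```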

In this final phase, I have worked on Python scripts, a Dockerfile, documentation, and tests. I have also investigated how changing the uncertainty threshold or the number of samples used in training affects the accuracy of the model (as measured on a test dataset).

This blog post is the continuation of the previous blog posts, where I showed how the accuracy of the model improved with the help of active learning. In this phase, we will dive deeper and see how changing the threshold for uncertainty affects the accuracy of the model's predictions on the test dataset. The test dataset, in this case, consists of 201 preprocessed Mel spectrogram samples generated by applying PCEN and wavelet denoising. In the last blog post, the upper limit was 0.6 and the lower limit was 0.4. In this post, we will see how changing the upper and lower limits to 0.9 and 0.1 affects the model.

Here is the flowchart for the active learning loop, where the threshold for uncertainty is between 0.1 and 0.9 (shown in the purple diamond).

Flowchart

Thanks to my mentor Jesse, who suggested choosing the uncertainty range of 0.1 to 0.9; this method yielded an increase in accuracy compared to the 0.4 to 0.6 range I used in the previous blog post. The first three steps (preprocessing spectrograms, building the CNN model, and training it) are the same and are explained in the previous blog posts.

The new step in this phase is labeling only the uncertain samples, i.e., those with a prediction probability between 0.1 and 0.9. These uncertain samples are labeled by experts like Scott and Val and then used for training along with the old training samples.

The steps taken in the above flowchart are as follows:

1) Preprocess spectrograms: Generate Mel spectrograms with the help of the librosa library and then apply PCEN and wavelet denoising. These spectrograms are generated from the audio files containing calls and no calls.

2) Train our CNN model on training data.

Note: A small subset of the training data has been withheld for active learning; the model is not trained on it.

3) Test the accuracy of the model on the test data. The test dataset consists of 201 Mel spectrogram samples generated after applying PCEN and wavelet denoising, of which 101 are call samples and 100 are no-call samples.

4) Use this model to predict probabilities on the withheld subset of the training samples and check whether each prediction is relatively uncertain (with a value between 0.1 and 0.9, in this case).

5) If yes, ask experts like Scott and Val to label them, then move them into the training directory according to the labels annotated by the expert.

6) If no, then ask for the next batch of samples to be labeled.

7) After a certain number of samples within these batches are labeled, retrain the model with this new data combined with the old data.

8) Measure the accuracy of the model on test data and compare it with previous accuracy results.
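The steps above can be sketched as a single round of the loop. Every function name here (`train_model`, `predict_probs`, `expert_label`) is a hypothetical placeholder for the project's actual code, not something defined in the repository:

```python
def active_learning_round(train_model, predict_probs, expert_label,
                          train_data, train_labels, pool,
                          lower=0.1, upper=0.9):
    """One round of the loop: train, score the withheld pool, send
    uncertain samples (lower <= p <= upper) to the expert, then
    retrain on the old data plus the newly labeled samples."""
    model = train_model(train_data, train_labels)
    probs = [predict_probs(model, x) for x in pool]
    # Keep only the samples the model is unsure about.
    uncertain = [x for x, p in zip(pool, probs) if lower <= p <= upper]
    # The expert (e.g., Scott or Val) supplies the true labels.
    new_labels = [expert_label(x) for x in uncertain]
    # Retrain on old training data combined with the new labeled data.
    return train_model(train_data + uncertain, train_labels + new_labels)
```

Confident predictions (outside the band) are simply skipped, matching step 6 above.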

The distribution of the training, active learning, retraining, and test datasets is the same as in the previous blog post. Here is the distribution chart:

Of the 176 samples processed through the active learning loop, there were 163 predictions that I define as uncertain (with values between 0.1 and 0.9, in this case). There were also 12 confident call predictions and 1 sample in which the model was confident there was no call.

These 163 uncertain samples were validated, combined with the previous training dataset, and used to retrain the model. The new accuracy of the model was found to be 84%.

[[78 23]

[ 9 91]]

acc: 0.8408

sensitivity: 0.7723

specificity: 0.9100
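These figures can be recomputed from the confusion matrix above, assuming its rows are the actual classes (call, no call) and its columns are the predictions; this interpretation is consistent with the reported sensitivity and specificity:

```python
import numpy as np

# Confusion matrix as printed above: rows = actual, columns = predicted.
cm = np.array([[78, 23],
               [ 9, 91]])

tp, fn = cm[0]  # calls classified correctly / missed
fp, tn = cm[1]  # no-calls misclassified / classified correctly

acc = (tp + tn) / cm.sum()
sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)

print(f"acc: {acc:.4f}")                  # -> acc: 0.8408
print(f"sensitivity: {sensitivity:.4f}")  # -> sensitivity: 0.7723
print(f"specificity: {specificity:.4f}")  # -> specificity: 0.9100
```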

Thus, the accuracy of the model without active learning was 82.5%, as we saw in the previous blog post, and active learning with an uncertainty range of 0.1 to 0.9 increased it by 1.5 percentage points to 84%.

Another task that I worked on was developing Python scripts for preprocessing, training, and active learning. Previously, much of this code was embedded in Python notebooks. The links to those scripts can be found here.

Preprocessing script: This script converts the raw audio dataset into spectrograms with the help of a .tsv file specifying the start time, the duration of the call, and the label for the call. The different types of spectrograms that the script supports are:

1) Power spectral density spectrograms

2) Grayscale power spectral density spectrograms

3) Mel spectrograms

4) Mel spectrograms with PCEN

5) Mel spectrograms with wavelet denoising

Training scripts: These are the scripts for model building and training, prediction, and generating statistics such as a ROC curve.

1) Model_building and training script: This script is used to build and train the CNN model.

2) Statistics: This script is used to generate a ROC curve.

3) Report: This script is used to generate the report of how the model performed on the test dataset.

4) Model_predict: This script is used to predict whether the spectrogram contains a call.
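As a rough illustration of what the prediction step could look like, here is a sketch assuming a trained Keras-style model whose `predict` method returns a probability following this project's convention (0 = call, 1 = no call); the `classify` function and its thresholds are hypothetical, not the script's actual interface:

```python
import numpy as np

def classify(model, spec, lower=0.1, upper=0.9):
    """Label one preprocessed spectrogram as 'call', 'no call',
    or 'uncertain' based on the model's predicted probability."""
    # Add batch and channel axes expected by a CNN, then predict.
    batch = spec[np.newaxis, ..., np.newaxis]
    p = float(np.asarray(model.predict(batch)).ravel()[0])
    if p < lower:
        return "call"
    if p > upper:
        return "no call"
    return "uncertain"
```

Predictions that come back as "uncertain" are exactly the ones the active learning loop forwards to the experts.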