Customer is reporting Speaker Label issues
Introduction
Customers sometimes report that Speaker Labels did not return the correct number of speakers. This playbook covers how to troubleshoot those reports and the common situations that make it difficult for the Speaker Diarization model to identify a unique speaker.
Procedures
Troubleshooting a Speaker Labels issue involves several steps. This section walks through them in order.
Gather Information
The purpose of this step is to make sure you clearly understand the issue the customer is seeing so that you can begin documenting it in Pylon and troubleshooting it. Clarify the issue with the customer and gather the information needed to troubleshoot the problem. This should include:
- Audio files related to the affected requests.
- Transcript IDs or JSON responses for the affected requests.
- The correct number of speakers for each file.
- Specific examples where the speaker diarization was not correct. For example, “At the 18:32 mark Speaker A says ‘And now we will talk about our sales numbers from the third quarter of last year.’ but it is attributed to Speaker B”. Or “These two utterances (see JSON snippet) were attributed to two different speakers but they were spoken by the same person”. Note: The customer doesn’t need to outline every instance where the speaker diarization is off, but we want a couple of examples if possible so that we know what to look for when troubleshooting the issue.
- How often they are seeing these types of issues. For example, is it happening across many different files or only in these specific files?
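Once the customer has supplied a timestamped example, it helps to find the matching utterance in their JSON response. The following is a minimal sketch, not an official tool. It assumes the response contains an `utterances` list whose entries carry `speaker`, `start`, `end`, and `text`, with times in milliseconds; confirm this against the actual response before relying on it. The helper names and the sample data are hypothetical.

```python
from typing import Optional


def timestamp_to_ms(timestamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" into milliseconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds * 1000


def utterance_at(transcript: dict, timestamp: str) -> Optional[dict]:
    """Return the first utterance whose time span covers the timestamp."""
    target = timestamp_to_ms(timestamp)
    for utterance in transcript.get("utterances") or []:
        if utterance["start"] <= target <= utterance["end"]:
            return utterance
    return None


# Hand-written sample data for illustration only (assumed response shape).
sample = {
    "utterances": [
        {"speaker": "A", "start": 1_100_000, "end": 1_110_000,
         "text": "Thanks, everyone."},
        {"speaker": "B", "start": 1_110_500, "end": 1_118_000,
         "text": "And now we will talk about our sales numbers."},
    ]
}

match = utterance_at(sample, "18:32")
if match is not None:
    print(match["speaker"], match["text"])
```

Comparing the returned speaker label with the speaker the customer named confirms whether you are looking at the same utterance they reported.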
Test the Customer Files
In this step, we will test the customer files to see if we can recreate the issues they are seeing. In addition to confirming the issue, this will also help determine possible causes of the issue.
- Run the files the customer shared with speaker_labels set to true. Do not include any other parameters the customer might be using at this point in the process.
  - Check the results to see whether you were able to replicate the issues the customer is reporting. If you were, skip to the “Evaluating the Results” section of this guide; otherwise, continue to the next step in this section.
- Check the information the customer shared to determine which parameters were included in their requests. Rerun the files the customer shared using speaker_labels and one of the parameters the customer used. Repeat that process until you have iterated through each parameter the customer was using. For example, if the customer was using speaker_labels, iab_categories, and auto_chapters, you would run the files once with speaker_labels and iab_categories and then again with speaker_labels and auto_chapters.
  - Check the results for each combination of parameters to see whether you were able to replicate the issues the customer is reporting. If you were, skip to the “Evaluating the Results” section of this guide; otherwise, continue to the next step in this section.
- Rerun the customer files using all of the parameters they included in their request.
  - Check the results to see whether you were able to replicate the issues the customer is reporting. If you were, skip to the “Evaluating the Results” section of this guide; otherwise, continue to the next step in this section.
- If you have been unable to replicate the issues the customer is reporting up to this point, the cause could be an ephemeral issue or something that was fixed in an update released between when the customer ran their requests and when you did your testing. To be sure, reach out to a senior colleague on the Support team and share what you are seeing with them. If you both agree that everything seems to be in order, follow up with the customer to let them know that you have been unable to recreate the issue, and ask whether they can rerun the files and check how things now look on their end.
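The testing order above can be sketched as a small helper that builds the parameter set for each run. This is an illustrative sketch only: `build_test_runs` is a hypothetical name, the parameter values are assumed to be simple booleans, and submitting each set to the API is deliberately left out.

```python
def build_test_runs(customer_params: dict) -> list:
    """Return request parameter sets in the order the steps above describe."""
    baseline = {"speaker_labels": True}
    # Everything the customer used other than speaker_labels itself.
    extras = {
        name: value
        for name, value in customer_params.items()
        if name != "speaker_labels"
    }

    runs = [dict(baseline)]                  # 1. speaker_labels only
    for name, value in extras.items():       # 2. speaker_labels + one parameter
        runs.append({**baseline, name: value})
    if len(extras) > 1:                      # 3. all customer parameters together
        runs.append({**baseline, **extras})
    return runs


# Parameters taken from the example in the steps above.
customer_params = {
    "speaker_labels": True,
    "iab_categories": True,
    "auto_chapters": True,
}

for run_number, params in enumerate(build_test_runs(customer_params), start=1):
    print(run_number, params)
```

The combined run is skipped when the customer used only one extra parameter, because the second stage already covers that case. Stop at the first run that reproduces the issue, as the steps describe.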
Evaluating the Results
Once you have replicated the results the customer is reporting, you can start working through them to identify potential causes. The exact process varies depending on the type of issue the customer is seeing.
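A reasonable first check is to compare the number of speakers detected with the correct number the customer provided. The sketch below is one possible way to do that, under the same assumption that the response holds an `utterances` list whose entries have a `speaker` field. The function names and sample data are hypothetical.

```python
from collections import Counter


def speaker_summary(transcript: dict) -> Counter:
    """Count how many utterances were attributed to each speaker label."""
    return Counter(u["speaker"] for u in transcript.get("utterances") or [])


def check_speaker_count(transcript: dict, expected: int) -> str:
    """Describe how the detected speaker count compares with the expected one."""
    detected = len(speaker_summary(transcript))
    if detected == expected:
        return f"match: {detected} speakers detected"
    direction = "fewer" if detected < expected else "more"
    return f"mismatch: {detected} detected, {direction} than the expected {expected}"


# Hand-written sample data for illustration only (assumed response shape).
replicated = {
    "utterances": [
        {"speaker": "A", "text": "Welcome to the call."},
        {"speaker": "B", "text": "Thanks for having me."},
        {"speaker": "A", "text": "Let's get started."},
    ]
}

print(speaker_summary(replicated))
print(check_speaker_count(replicated, expected=3))
```

A per-speaker count can also hint at where to look next: a label with only one or two utterances may indicate a speaker who was split off or merged incorrectly.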