How to Check Audio File Metadata
Introduction
This playbook outlines the process for checking and troubleshooting metadata issues on uploaded media files. Metadata plays a critical role in determining whether a file is valid for processing. Incorrect or malformed metadata can result in various errors, such as “no audio” messages, upload failures, timestamp inconsistencies, or mismatches in reported audio duration.
This guide is intended for Support Engineers to use during investigations where file-level issues are suspected.
Procedures
1. Initial Triage
- Verify whether the file plays correctly locally in a media player (e.g., VLC, Audacity).
2. Check for Supported Format
- Ensure the file format is supported (e.g., .mp3, .mp4, .wav, .m4a).
- Unsupported formats (e.g., .vvc) can result in errors like “no audio” or upload failures.
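As a quick pre-check, the extension test above can be sketched in Python. The set below is an assumption that mirrors only the examples named in this playbook, not an authoritative list of supported formats, and `has_supported_extension` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: only the extensions given as examples in this playbook.
# The real supported-format list may be longer; check the product docs.
EXAMPLE_SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".m4a"}


def has_supported_extension(filename: str) -> bool:
    """Return True if the file extension is in the example set (case-insensitive)."""
    return Path(filename).suffix.lower() in EXAMPLE_SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["call.MP3", "clip.vvc", "recording.wav"]:
        print(name, has_supported_extension(name))
```

An extension can be wrong or missing, so treat this only as a first filter. The container and codec reported by ffprobe are the more reliable signal.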
3. Inspect Metadata
- Use the following command to check the metadata:
  ffprobe -i input-file -v quiet -print_format json -show_format -show_streams -hide_banner
- Examine the output for:
  - Duration
  - Codec info (audio/video streams)
  - Bitrate
  - Sample rate
  - Channels
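To avoid reading the raw JSON by eye, the fields listed above can be pulled into a short summary. This is a minimal sketch, assuming ffprobe is installed and on the PATH. `run_ffprobe` and `summarize_probe` are hypothetical helper names; the JSON keys used (`format_name`, `codec_type`, `codec_name`, `duration`, `bit_rate`, `sample_rate`, `channels`) are the ones ffprobe emits with `-show_format -show_streams`, and any of them may be absent for a given file.

```python
import json
import subprocess


def run_ffprobe(path: str) -> dict:
    """Run the ffprobe command from this playbook and return the parsed JSON."""
    cmd = [
        "ffprobe", "-i", path, "-v", "quiet", "-print_format", "json",
        "-show_format", "-show_streams", "-hide_banner",
    ]
    # check=True raises CalledProcessError if ffprobe cannot read the file.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def summarize_probe(probe: dict) -> dict:
    """Extract the fields this playbook asks you to examine."""
    fmt = probe.get("format", {})
    streams = [
        {
            "index": s.get("index"),
            "type": s.get("codec_type"),
            "codec": s.get("codec_name"),
            "duration": s.get("duration"),
            "bit_rate": s.get("bit_rate"),
            "sample_rate": s.get("sample_rate"),  # audio streams only
            "channels": s.get("channels"),        # audio streams only
        }
        for s in probe.get("streams", [])
    ]
    return {
        "container": fmt.get("format_name"),
        "duration": fmt.get("duration"),
        "bit_rate": fmt.get("bit_rate"),
        "streams": streams,
    }
```

Typical use would be `summarize_probe(run_ffprobe("input-file"))`. Missing values come back as `None`, which is itself worth noting during an investigation.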
4. Common Error Scenarios & Fixes
❌ File Has No Audio (but clearly has audio)
- Check if the file contains an audio stream.
- Look for mismatches in containers (e.g., video stream present but no audio stream).
- Check if the audio stream codec is supported.
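The three checks above can be sketched against the parsed ffprobe JSON. This is illustrative only: `diagnose_no_audio` is a hypothetical helper, and `supported_codecs` must be supplied by the caller because this playbook does not list the supported codecs.

```python
def audio_streams(probe: dict) -> list:
    """Return the streams that ffprobe labels as audio."""
    return [s for s in probe.get("streams", []) if s.get("codec_type") == "audio"]


def diagnose_no_audio(probe: dict, supported_codecs: set) -> str:
    """Classify a 'no audio' report using the parsed ffprobe output.

    supported_codecs is an assumption passed in by the caller, for example
    {"mp3", "aac", "pcm_s16le"}; it is not an official list.
    """
    streams = audio_streams(probe)
    if not streams:
        return "no audio stream in container"
    codecs = [s.get("codec_name") for s in streams]
    if not any(c in supported_codecs for c in codecs):
        return "audio stream present, codec not in supported set: " + ", ".join(map(str, codecs))
    return "audio stream present with a codec in the supported set"
```

If the result is the last case and the customer still sees “no audio”, the stream may be silent or damaged rather than missing, and a decode attempt is the next step.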
❌ Upload 422 Error
- Indicates a malformed or unreadable file.
- Double-check that:
  - The file isn’t empty or corrupted.
  - Required headers are present.
  - Container/codec are supported.
⚠️ Try transcoding the file. ffmpeg will sometimes return an error that sheds light on the checks above; it may mention missing headers, a codec that differs from the one listed in the metadata, or some other form of corruption.
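One way to surface those ffmpeg errors without writing an output file is to decode the input to the null muxer. The sketch below assumes ffmpeg is installed and on the PATH; the helper names are hypothetical, and this is not necessarily the transcode the upload pipeline performs.

```python
import os
import subprocess


def is_empty_file(path: str) -> bool:
    """Return True if the file exists and has zero bytes."""
    return os.path.getsize(path) == 0


def build_decode_check_command(path: str) -> list:
    """Build an ffmpeg command that decodes the input and discards the result.

    -v error limits output to error messages; -f null - sends the decoded
    output to the null muxer so nothing is written to disk.
    """
    return ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"]


def decode_errors(path: str) -> str:
    """Run the decode check and return whatever ffmpeg printed to stderr."""
    result = subprocess.run(
        build_decode_check_command(path),
        stdin=subprocess.DEVNULL,  # keep ffmpeg from waiting on terminal input
        capture_output=True,
        text=True,
    )
    return result.stderr.strip()
```

An empty string from `decode_errors` means ffmpeg decoded the file without reporting errors; any text it returns is worth reading against the checklist above.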
⚠️ Audio Duration Inconsistency
- If the API reports a different duration than expected, compare it with the duration reported in the metadata output.
⚠️ If the file is a video, compare the durations of the video and audio streams. The API should return the duration of the audio stream (we transcode the file to a WAV audio file before transcription), but the customer may have expected the duration of the video stream.
💡 Good to know: Inconsistent durations can sometimes also lead to timestamp issues.
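The video-versus-audio comparison can be sketched against the parsed ffprobe JSON. The helper names and the 0.5 second tolerance are assumptions chosen for illustration. Some containers do not report a per-stream duration, in which case the comparison is skipped.

```python
def stream_durations(probe: dict) -> dict:
    """Return {codec_type: seconds} for the first stream of each type with a duration."""
    durations = {}
    for s in probe.get("streams", []):
        kind = s.get("codec_type")
        raw = s.get("duration")
        if kind in durations or raw is None:
            continue
        try:
            durations[kind] = float(raw)  # ffprobe reports durations as strings
        except ValueError:
            continue  # non-numeric value, treat as not reported
    return durations


def audio_video_gap(probe: dict, tolerance_s: float = 0.5):
    """Return video minus audio duration in seconds.

    Returns None if either duration is unavailable, and 0.0 if the gap is
    within tolerance_s (a hypothetical threshold, tune as needed).
    """
    d = stream_durations(probe)
    if "audio" not in d or "video" not in d:
        return None
    gap = d["video"] - d["audio"]
    return gap if abs(gap) > tolerance_s else 0.0
```

A large positive gap means the video runs longer than its audio track, which matches the case where the customer expected the video duration but the API reported the audio duration.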
5. Leverage an LLM for Review
- If there are no obvious issues when reviewing the metadata, you can share the metadata with an LLM and prompt “Is anything unusual about this metadata?” to catch possible anomalies. From there, evaluate if it’s something worth sharing with the customer.
⚠️ Try to keep customer responses high-level if possible. It’s fine to say something along the lines of “there is an edge case specific to this file” and ask for additional examples, especially if you aren’t sure what the issue is and the LLM’s answer is too speculative. If you’re going to use the LLM’s answer, try verifying the facts through other sources. Again, it is not necessary to go into too much detail if you’re unsure.