Bug 302253
| Summary: | AAC AudioEncoder produces incorrect decoder config description | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | David <dpayr> |
| Component: | Media | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW | ||
| Severity: | Normal | CC: | jer.noble, webkit-bug-importer, youennf |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Safari 26 | ||
| Hardware: | Mac (Apple Silicon) | ||
| OS: | macOS 26 | ||
David
Repro (open in Safari): https://hilarious-souffle-80c4cb.netlify.app/
---
It appears to me that using an AudioEncoder to encode AAC emits incorrect "description" bytes in the decoder config.
When I encode 48000Hz mono audio, this is the description it gives me:
3, 128, 128, 128, 34, 0, 0, 0, 4, 128, 128, 128, 20, 64, 20, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 128, 128, 128, 2, 17, 136, 6, 128, 128, 128, 1, 2
Per spec, these bytes have to be an AudioSpecificConfig as defined in ISO 14496-3 Section 1.6.2.1. However, when interpreting the bytes according to this spec, they make no sense. They parse as:
Object type: 0 ("null", makes no sense)
Frequency index: 7 (-> 22050 Hz sample rate, also wrong)
Channel configuration: 0 (means "Defined in AOT Specific Config", although it could simply be "1" to indicate mono)
Chrome's description bytes are:
17, 136
This makes much more sense.
Object type: 2 (AAC LC)
Frequency index: 3 (-> 48000 Hz)
Channel configuration: 1 (-> mono)
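The two readings above can be reproduced with a small sketch that decodes the first 13 bits of an AudioSpecificConfig: 5 bits of audio object type, 4 bits of sampling frequency index, 4 bits of channel configuration. The function name `parse_asc_header` is hypothetical, and the escape values (object type 31, frequency index 15) are deliberately not handled, so this only covers the simple case discussed here.

```python
# Sampling rates addressed by the 4-bit samplingFrequencyIndex (indices 0..12).
# Higher indices are reserved or signal an explicit rate, which this sketch ignores.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_asc_header(data: bytes) -> dict:
    """Decode the leading 13 bits of an AudioSpecificConfig (simple case only)."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    bits = (data[0] << 8) | data[1]          # first 16 bits, big-endian
    object_type = bits >> 11                 # top 5 bits
    freq_index = (bits >> 7) & 0x0F          # next 4 bits
    channel_config = (bits >> 3) & 0x0F      # next 4 bits
    sample_rate = SAMPLE_RATES[freq_index] if freq_index < len(SAMPLE_RATES) else None
    return {
        "object_type": object_type,
        "freq_index": freq_index,
        "sample_rate": sample_rate,
        "channel_config": channel_config,
    }


webkit = parse_asc_header(bytes([3, 128]))   # first two bytes of Safari's description
chrome = parse_asc_header(bytes([17, 136]))  # Chrome's full description
```

With these inputs, `webkit` comes out as object type 0, frequency index 7 (22050 Hz), channel configuration 0, and `chrome` as object type 2, frequency index 3 (48000 Hz), channel configuration 1, matching the values listed above.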
So, it appears to me that WebKit is emitting gibberish here, albeit very deterministic gibberish. I'm sure it adheres to *some* format, but it definitely does not seem to adhere to ISO 14496-3 Section 1.6.2.1.
It is to be noted that piping the encoded audio back into AudioDecoder successfully decodes the audio (even with the faulty description), but muxing the audio alongside the description into an MP4 file creates a silent audio track - one that FFmpeg classifies as:
Stream #0:00x1: Audio: aac (mp4a / 0x6134706D), 22050 Hz, 0 channels, fltp, 118 kb/s (default)
David
Update: The bytes appear to be the contents of an "esds" box, which wraps the AudioSpecificConfig. The relevant bytes are [17, 136] (starting 8 bytes from the end); these are the actual description.
However, this violates the WebCodecs spec:
> If description is present, it is assumed to a AudioSpecificConfig as defined in [iso14496-3] section 1.6.2.1, Table 1.15, and the bitstream is assumed to be in aac.
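As a possible stopgap on the application side, the AudioSpecificConfig can be pulled out of the bytes Safari emits. The sketch below assumes those bytes are an MPEG-4 ES_Descriptor (tag 0x03) containing a DecoderConfigDescriptor (tag 0x04) whose DecoderSpecificInfo (tag 0x05) holds the AudioSpecificConfig, which is consistent with the dump above but is not confirmed by WebKit. The function names `read_descriptor` and `extract_asc` are hypothetical.

```python
def read_descriptor(buf: bytes, pos: int):
    """Return (tag, payload_start, payload_end) for the descriptor at pos."""
    tag = buf[pos]
    pos += 1
    length = 0
    for _ in range(4):                   # size uses up to 4 bytes, 7 bits each
        b = buf[pos]
        pos += 1
        length = (length << 7) | (b & 0x7F)
        if not (b & 0x80):               # high bit clear marks the last size byte
            break
    return tag, pos, pos + length


def extract_asc(es_descriptor: bytes) -> bytes:
    """Return the DecoderSpecificInfo payload (assumed to be the AudioSpecificConfig)."""
    tag, start, _ = read_descriptor(es_descriptor, 0)
    if tag != 0x03:
        raise ValueError("not an ES_Descriptor")
    flags = es_descriptor[start + 2]
    pos = start + 3                      # skip ES_ID (2 bytes) and flags (1 byte)
    if flags & 0x80:                     # streamDependenceFlag: dependsOn_ES_ID
        pos += 2
    if flags & 0x40:                     # URL_Flag: length byte plus URL string
        pos += 1 + es_descriptor[pos]
    if flags & 0x20:                     # OCRstreamFlag: OCR_ES_Id
        pos += 2
    tag, start, end = read_descriptor(es_descriptor, pos)
    if tag != 0x04:
        raise ValueError("DecoderConfigDescriptor not found")
    pos = start + 13                     # fixed fields: type, stream type, buffer size, bitrates
    while pos < end:
        tag, s, e = read_descriptor(es_descriptor, pos)
        if tag == 0x05:
            return es_descriptor[s:e]
        pos = e
    raise ValueError("DecoderSpecificInfo not found")


safari_description = bytes([
    3, 128, 128, 128, 34, 0, 0, 0, 4, 128, 128, 128, 20, 64, 20, 0, 24, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 5, 128, 128, 128, 2, 17, 136, 6, 128, 128, 128, 1, 2,
])
asc = extract_asc(safari_description)
```

For the description quoted in this report, `asc` is the two bytes [17, 136], the same value Chrome reports. A muxer could check whether the first byte is 0x03 before applying this, so that a spec-conforming description is passed through untouched.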
Radar WebKit Bug Importer
<rdar://problem/164878480>