I am having trouble with manually decoding a QuickTime file. An
application I have outputs sound files in QuickTime format. The data
stored in them is 44.1 kHz 2-channel 16-bit audio, and I need to convert
the file format to WAV for further processing. This should be fairly
straightforward.
Yes, I know I can do this with the 'export' function in QT Pro. Yes, I
know I can automate this with AppleScript, and run it from the command
line using osascript. But for various reasons, I do not want to use QT
library calls or a GUI app (load distribution, machines without QT or QT
Pro, ability to batch and pause execution, etc.).
That said, here's the issue, with an example. One of my QT data files,
when opened in QT, shows 44100 Hz, 16-bit, 172.2 kb/sec, which is, as
near as I can tell, perfectly reasonable. It is 9.16 in length, and
sounds fine.
If I use QT Pro to export it to WAV, and re-open the file, the details,
including duration, are the same (except for the endian swap).
I have written an atom parser, and run it on the QT file. The results
are rather interesting:
'mvhd' contains a time scale of 600, a duration of 5723, and a preferred
rate of 0x10000. The QT file format spec from Apple says that the rate is
a fixed-point number, but doesn't specify the binary point, so I presume
this corresponds to the "1.0" "normal" rate to which it refers. The 'tkhd'
atom contains the same 5723 duration, as does the 'elst' atom. I
should note that there has never been any editing done on this file
since initial creation; there is a single entry in the edit list atom.
This corresponds to 9.538 seconds, which differs from the 9.16 that QT
Pro displays. But it gets more interesting:
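
The arithmetic above can be checked with a short sketch. It assumes the preferred rate is a 16.16 fixed-point value (16 integer bits, 16 fractional bits), which would make 0x10000 equal to 1.0. The helper names are made up for illustration and are not part of my parser.

```python
def fixed_16_16(raw: int) -> float:
    """Interpret a 32-bit unsigned integer as 16.16 fixed point (assumed layout)."""
    return raw / 65536.0


def duration_seconds(duration: int, time_scale: int) -> float:
    """Convert a duration expressed in time-scale units to seconds."""
    return duration / time_scale


preferred_rate = fixed_16_16(0x10000)        # 'mvhd' preferred rate
movie_seconds = duration_seconds(5723, 600)  # 'mvhd' duration / time scale

print("preferred rate:", preferred_rate)
print("movie duration (s):", round(movie_seconds, 3))
```

Under that assumption the rate decodes to the "normal" rate, and the movie duration comes out at the 9.538 seconds quoted above.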
'mdhd' contains a time scale of 44100, which is perfectly reasonable.
However, it also contains a duration of 452842, which corresponds to
roughly 10.27 seconds.
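
To put the two durations side by side, here is a rough sketch. It assumes the single edit list entry (media time 0, rate 1.0) maps movie time straight onto media time. Whether a given player actually applies the edit this way is an assumption, and the variable names are illustrative only.

```python
MOVIE_TIME_SCALE = 600      # 'mvhd' time scale
MEDIA_TIME_SCALE = 44100    # 'mdhd' time scale

edit_duration = 5723        # 'elst' track duration, in movie time units
media_duration = 452842     # 'mdhd' duration, in media time units

movie_seconds = edit_duration / MOVIE_TIME_SCALE
media_seconds = media_duration / MEDIA_TIME_SCALE

# Media samples spanned by the edit, assuming media time 0 and rate 1.0.
samples_in_edit = edit_duration * MEDIA_TIME_SCALE // MOVIE_TIME_SCALE
samples_beyond_edit = media_duration - samples_in_edit

print(f"movie: {movie_seconds:.3f} s, media: {media_seconds:.3f} s")
print(f"samples inside edit: {samples_in_edit}, beyond it: {samples_beyond_edit}")
```

The point of the comparison is only that the movie-level duration covers fewer samples than the media header reports; it does not by itself say which figure is right.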
The 'stsd' atom defines the data format as 'twos' and the sample rate as
[ac 44 00 00] (44.1 kHz), and confirms the 2-channel, 16-bit format.
'stts' repeats the sample count of 452842, and 'stsz' reports the same
figure.
The 'stsc' atom defines the following:
chunk 1: 45276 samples/channel
chunk 2: 45285 samples/channel
chunk 6: 45286 samples/channel
chunk 7: 45285 samples/channel
The 'stco' lists 10 chunk entries, with the chunk offsets as:
4096
368640
733184
1097728
1462272
1826816
2191360
2555904
2920448
3284992
So, for starters, this explains why the QT file is HUGE compared to the
WAV: the program that creates it MASSIVELY pads the 'mdat' with
zeroes. Don't ask me why. I didn't write it. :-)
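
The amount of padding can be estimated from the numbers already listed. This is a sketch that assumes each counted sample is one 4-byte frame (2 channels x 16 bits); the constant names are mine.

```python
chunk_offsets = [
    4096, 368640, 733184, 1097728, 1462272,
    1826816, 2191360, 2555904, 2920448, 3284992,
]
BYTES_PER_FRAME = 2 * 2   # assumption: 2 channels x 16-bit samples
TOTAL_FRAMES = 452842     # sample count from 'stts'
MDAT_SIZE = 3649496       # 'mdat' atom size from the dump

# Distance from the start of one chunk to the start of the next.
spacings = {b - a for a, b in zip(chunk_offsets, chunk_offsets[1:])}
payload_bytes = TOTAL_FRAMES * BYTES_PER_FRAME

print("distinct spacings between chunks:", sorted(spacings))
print("audio payload bytes:", payload_bytes)
print(f"payload share of mdat: {payload_bytes / MDAT_SIZE:.1%}")
```

Every chunk starts the same number of bytes after the previous one, and under the 4-byte-frame assumption the audio accounts for only about half of the 'mdat'.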
So, this indicates 1 chunk of 45276, 4 chunks (2-5) of 45285, one of
45286, and 4 of 45285, for a grand total of 452842, which matches the
other atom data.
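
The same tally, as a sketch: each 'stsc' entry is treated as a run that starts at its first chunk and lasts until the next entry's first chunk (or the last chunk in 'stco'). The function name is hypothetical.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand (first_chunk, samples_per_chunk) runs into one count per chunk.

    first_chunk is 1-based; each run lasts until the next run's first chunk.
    """
    counts = []
    for index, (first_chunk, samples) in enumerate(stsc_entries):
        if index + 1 < len(stsc_entries):
            next_first = stsc_entries[index + 1][0]
        else:
            next_first = chunk_count + 1  # last run extends to the final chunk
        counts.extend([samples] * (next_first - first_chunk))
    return counts


stsc = [(1, 45276), (2, 45285), (6, 45286), (7, 45285)]
counts = samples_per_chunk(stsc, chunk_count=10)

print(counts)
print("total samples:", sum(counts))
```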
So, following all of the information given in the atoms, I write a WAV
file header, extract the chunks, endian-swap them, and write them to the
WAV file.
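
For anyone who wants to see the steps spelled out, here is a minimal sketch of that procedure, not my actual (optimized) code. It assumes 4-byte frames of big-endian 16-bit samples and a plain 44-byte PCM WAV header; the function names and arguments are illustrative.

```python
import struct


def swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample (big- to little-endian).

    Expects an even number of bytes.
    """
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


def wav_header(data_bytes: int, channels: int = 2, rate: int = 44100,
               bits: int = 16) -> bytes:
    """Build a 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits // 8
    byte_rate = rate * block_align
    return (
        b"RIFF" + struct.pack("<I", 36 + data_bytes) + b"WAVE"
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, rate,
                                byte_rate, block_align, bits)
        + b"data" + struct.pack("<I", data_bytes)
    )


def convert(mov_path, wav_path, chunk_offsets, chunk_samples, bytes_per_frame=4):
    """Copy each chunk out of the movie file into a WAV file, byte-swapping it."""
    total = sum(chunk_samples) * bytes_per_frame
    with open(mov_path, "rb") as src, open(wav_path, "wb") as dst:
        dst.write(wav_header(total))
        for offset, samples in zip(chunk_offsets, chunk_samples):
            src.seek(offset)  # skip the padding between chunks
            dst.write(swap16(src.read(samples * bytes_per_frame)))
```

A call would pass the ten 'stco' offsets and the ten per-chunk sample counts derived from 'stsc', along with whatever input and output paths apply.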
The problem is that this is over 10 seconds of data (10.27 by the raw
numbers, 10.08 as reported when the resultant file is opened in QT Pro.)
When played, the resultant WAV file is indeed longer, and contains about
an extra second of sound at the end, which sounds like a portion from
earlier in the actual sound file (that is, it's not junk / noise or
silence).
Is there any portion of the atom structure that I'm missing? Is there a
part that might tell me to 'skip' exporting some of this data?
The decoded atom structure follows. Thanks in advance for any
assistance.
MKS
---begin---
ATOM: foft: 0
atom: size: 32
atom: type: 'ftyp'
atom: data: 71 74 20 20 20 05 03 00 (7/24) 'qt ?? '
atom: data: 71 74 20 20 00 00 00 00 (15/24) 'qt ?? qt '
atom: data: 00 00 00 00 00 00 00 00 (23/24) 'qt ?? qt '
ATOM: foft: 32
atom: size: 8
atom: type: 'wide'
ATOM: foft: 40
atom: size: 3649496
atom: type: 'mdat'
ATOM: foft: 3649536
atom: size: 691
atom: type: 'moov'
ATOM: foft: 3649544
atom: size: 108
atom: type: 'mvhd'
atom: mvhd: vflg: 00 00 00 00 ' '
atom: mvhd: ctim: 3218630523
atom: mvhd: mtim: 3218630533
atom: mvhd: tsca: 600
atom: mvhd: dura: 5723
atom: mvhd: prat: 65536
atom: mvhd: pvol: 256
atom: mvhd: resv: 00 00 00 00 ' '
atom: mvhd: resv: 00 00 00 00 ' '
atom: mvhd: resv: 00 00 ' '
atom: mvhd: matx: 00 01 00 00 ' ? '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 00 01 00 00 ' ? '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 00 00 00 00 ' '
atom: mvhd: matx: 40 00 00 00 '@ '
atom: mvhd: pvtm: 0
atom: mvhd: pvdu: 0
atom: mvhd: potm: 0
atom: mvhd: sltm: 0
atom: mvhd: sldu: 0
atom: mvhd: ctim: 0
atom: mvhd: ntid: 2
ATOM: foft: 3649652
atom: size: 575
atom: type: 'trak'
ATOM: foft: 3649660
atom: size: 92
atom: type: 'tkhd'
atom: tkhd: vflg: 00 00 00 0f ' §'
atom: tkhd: ctim: 3218630533
atom: tkhd: mtim: 3218630533
atom: tkhd: tkid: 1
atom: tkhd: resv: 00 00 00 00 ' '
atom: tkhd: dura: 5723
atom: tkhd: resv: 00 00 00 00 ' '
atom: tkhd: resv: 00 00 00 00 ' '
atom: tkhd: layr: 0
atom: tkhd: agrp: 0
atom: tkhd: volm: 256
atom: tkhd: resv: 00 00 ' '
atom: tkhd: matx: 00 01 00 00 ' ? '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 00 01 00 00 ' ? '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 00 00 00 00 ' '
atom: tkhd: matx: 40 00 00 00 '@ '
atom: tkhd: widt: 0
atom: tkhd: hght: 0
ATOM: foft: 3649752
atom: size: 36
atom: type: 'edts'
ATOM: foft: 3649760
atom: size: 28
atom: type: 'elst'
atom: elst: #ent: 1
atom: elst: tdur: 5723
atom: elst: mtim: 0
atom: elst: mrat: 65536
ATOM: foft: 3649788
atom: size: 439
atom: type: 'mdia'
ATOM: foft: 3649796
atom: size: 32
atom: type: 'mdhd'
atom: mdhd: vflg: 00 00 00 00 ' '
atom: mdhd: ctim: 3218630533
atom: mdhd: mtim: 3218630533
atom: mdhd: tsca: 44100
atom: mdhd: dura: 452842
atom: mdhd: lang: 0
atom: mdhd: qual: 0
ATOM: foft: 3649828
atom: size: 58
atom: type: 'hdlr'
atom: hdlr: vflg: 00 00 00 00 ' '
atom: hdlr: ctyp: 6d 68 6c 72 'm h l r'
atom: hdlr: csub: 73 6f 75 6e 's o u n'
atom: hdlr: cman: 1634758764
atom: hdlr: cflg: 10 00 00 00 '? '
atom: hdlr: mask: 65842
atom: hdlr: name: 19 ('?')
atom: hdlr: name: 41 ('A')
atom: hdlr: name: 70 ('p')
atom: hdlr: name: 70 ('p')
atom: hdlr: name: 6c ('l')
atom: hdlr: name: 65 ('e')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 53 ('S')
atom: hdlr: name: 6f ('o')
atom: hdlr: name: 75 ('u')
atom: hdlr: name: 6e ('n')
atom: hdlr: name: 64 ('d')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 4d ('M')
atom: hdlr: name: 65 ('e')
atom: hdlr: name: 64 ('d')
atom: hdlr: name: 69 ('i')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 48 ('H')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 6e ('n')
atom: hdlr: name: 64 ('d')
atom: hdlr: name: 6c ('l')
atom: hdlr: name: 65 ('e')
atom: hdlr: name: 72 ('r')
ATOM: foft: 3649886
atom: size: 341
atom: type: 'minf'
ATOM: foft: 3649894
atom: size: 16
atom: type: 'smhd'
atom: smhd: vflg: 00 00 00 00 ' '
atom: smhd: blnc: 0
atom: smhd: resv: 00 00 ' '
ATOM: foft: 3649910
atom: size: 57
atom: type: 'hdlr'
atom: hdlr: vflg: 00 00 00 00 ' '
atom: hdlr: ctyp: 64 68 6c 72 'd h l r'
atom: hdlr: csub: 61 6c 69 73 'a l i s'
atom: hdlr: cman: 1634758764
atom: hdlr: cflg: 10 00 00 01 '? ?'
atom: hdlr: mask: 65867
atom: hdlr: name: 18 ('?')
atom: hdlr: name: 41 ('A')
atom: hdlr: name: 70 ('p')
atom: hdlr: name: 70 ('p')
atom: hdlr: name: 6c ('l')
atom: hdlr: name: 65 ('e')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 41 ('A')
atom: hdlr: name: 6c ('l')
atom: hdlr: name: 69 ('i')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 73 ('s')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 44 ('D')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 74 ('t')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 20 (' ')
atom: hdlr: name: 48 ('H')
atom: hdlr: name: 61 ('a')
atom: hdlr: name: 6e ('n')
atom: hdlr: name: 64 ('d')
atom: hdlr: name: 6c ('l')
atom: hdlr: name: 65 ('e')
atom: hdlr: name: 72 ('r')
ATOM: foft: 3649967
atom: size: 36
atom: type: 'dinf'
ATOM: foft: 3649975
atom: size: 28
atom: type: 'dref'
atom: dref: vflg: 00 00 00 00 ' '
atom: dref: nent: 1
atom: dref: data: 00 (' ')
atom: dref: data: 00 (' ')
atom: dref: data: 00 (' ')
atom: dref: data: 0c ('?')
atom: dref: data: 61 ('a')
atom: dref: data: 6c ('l')
atom: dref: data: 69 ('i')
atom: dref: data: 73 ('s')
atom: dref: data: 00 (' ')
atom: dref: data: 00 (' ')
atom: dref: data: 00 (' ')
atom: dref: data: 01 ('?')
ATOM: foft: 3650003
atom: size: 224
atom: type: 'stbl'
ATOM: foft: 3650011
atom: size: 52
atom: type: 'stsd'
atom: stsd: #ent: 1
atom: stsd: sdsz: 36
atom: stsd: dfmt: 74 77 6f 73 ('twos')
atom: stsd: dref: 00 01
atom: stsd: #chn: 2
atom: stsd: bitd: 16
atom: stsd: cmsz: 00 00 00 00
atom: stsd: srat: ac 44 00 00
ATOM: foft: 3650063
atom: size: 24
atom: type: 'stts'
atom: stsd: #ent: 1
atom: stsd: scnt: 452842
atom: stsd: sdur: 1
ATOM: foft: 3650087
atom: size: 64
atom: type: 'stsc'
atom: stsd: #ent: 4
atom: stsd: 1chk: 1
atom: stsd: spch: 45276
atom: stsd: sdid: 1
atom: stsd: 1chk: 2
atom: stsd: spch: 45285
atom: stsd: sdid: 1
atom: stsd: 1chk: 6
atom: stsd: spch: 45286
atom: stsd: sdid: 1
atom: stsd: 1chk: 7
atom: stsd: spch: 45285
atom: stsd: sdid: 1
ATOM: foft: 3650151
atom: size: 20
atom: type: 'stsz'
atom: stsd: #ent: 1
atom: stsd: ssiz: 452842
ATOM: foft: 3650171
atom: size: 56
atom: type: 'stco'
atom: stsd: #ent: 10
atom: stsd: chko: 4096
atom: stsd: chko: 368640
atom: stsd: chko: 733184
atom: stsd: chko: 1097728
atom: stsd: chko: 1462272
atom: stsd: chko: 1826816
atom: stsd: chko: 2191360
atom: stsd: chko: 2555904
atom: stsd: chko: 2920448
atom: stsd: chko: 3284992
---end---

Melissa Schrumpf - 31 Dec 2005 04:00 GMT
I have an update. I was incorrect in my first post: the sample data I
converted from QT MOV to WAV did not have extraneous data on the end. I
converted a half dozen full-length files, and found the conversions to
be acceptable. It's the QT player that actually truncates the file. If
there's 20 minutes of audio, it may display and play perhaps only 19:45
of it. This is QT 7.0.3 under OS X 10.4.3. When I use QT Pro to export
these files, they export truncated as well. But the data is all there.
If I extract it myself using the information stored in the atoms, it's
all present.
So, three questions:
1. Why would QT Player truncate the data? There are no edits in these
files.
2. In performing time trials to test the conversion speed, I found that
QT Player actually has TWO methods of export. If you do the export
manually (File->Export, select "Sound to Wave" and "Default Settings"),
it pops up a separate dialog box, and the export takes approximately 48
seconds for a 200 MB sample file on my poor old G4.
On the other hand, if I attempt to automate it using osascript:
tell application "QuickTime Player"
    with timeout of 1.0E+308 seconds
        close windows
        open file "Path:to:inputfile.mov"
        export movie 1 to "Path:to:outputfile.wav" as wave
        --using settings preset "44.1 kHz 16 bit Stereo"
        --it doesn't matter whether I use default settings or specify
        --the format that it's already in, this still takes forever
        close movie 1
    end timeout
end tell
A different export progress bar opens, connected to the QT Player window
itself, rather than floating. This export takes considerably longer...
on the order of 6 minutes.
Why are there two different export methods invoked? Is the latter
method resampling? How could I access the former, faster export method
(e.g. from AppleScript/OSA or QT calls in C)?
3. This is only incidental, but why does QT Player Pro take so darned
long to simply convert big-endian to little-endian data? In my own
code, which, granted, I optimized rather heavily, I can export the same
data file on the same machine in just under 20 seconds, whereas Apple's
tool takes 48 or 357 seconds (depending on the method).
Is this extra overhead from the QT library methods, or is QT Player Pro
just particularly bad?
Anyhow, thanks for any information you can offer on what part of the QT
file format may be tripping up QT Player into thinking the data is
shorter than it really is.
Much appreciated.

MKS