Using the Windows Media Codecs in DirectShow
The Windows Media Audio and Video encoder and decoder objects were originally designed and optimized to work with the ASF file container format and the Windows Media Format SDK. The codec objects work well in DirectShow for certain scenarios, namely one-pass CBR and quality-based VBR encoding of video streams. But if you are considering using the codec objects directly in DirectShow with file containers other than ASF, there are certain behaviors and issues that you should be aware of in advance.
If you are going to use standalone codecs with DirectShow, you will probably want to use them as DMOs only. In other words, you will be using the IMediaObject interface instead of IMFTransform.
WM Audio in AVI Files
You can use DirectShow to encode WMA streams into any file container format for which you have a multiplexer filter. However, the Windows Media Audio and Video codec interfaces do not support WMA in AVI files because it is impossible, using the default DirectShow AVI playback filters, to maintain audio-video sync in an AVI file with a WMA stream. For more information, see Storing Compressed Media in AVI Files.
The audio encoder DMO outputs samples of varying duration, even when in "constant bit rate" mode. It therefore works best with file container formats that use time stamps. AVI files do not provide a time stamp for each audio sample or group of samples. In DirectShow, the AVI Splitter filter manufactures time stamps for each group of samples (each audio frame) based on the nAvgBytesPerSec value in the WAVEFORMATEX structure in the AVI stream header.
The assumption underlying this calculation is that all audio samples in the stream are of equal duration; however, the samples output by the DMO are not of equal duration, and so the time stamps applied by the AVI Splitter are not accurate. Therefore, it is not possible, without modifying either the AVI Splitter or the audio decoder DMO, to use any DirectShow-based application to play back AVI files with audio and video streams in sync. The Windows Media Audio 9 Voice codec will work in some cases, but even this will lose sync after any seek operation, so it really cannot be considered a viable solution.
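The effect of the equal-duration assumption can be illustrated with a small Python sketch. This is not DirectShow code: it assumes the splitter derives each frame's start time as the number of bytes preceding it divided by nAvgBytesPerSec, as described above, and the frame sizes and durations are made-up numbers chosen only to show the mismatch.

```python
from fractions import Fraction

def splitter_timestamps(frame_sizes, avg_bytes_per_sec):
    """Start times (seconds) derived purely from byte offsets,
    i.e. time = bytes_before_frame / nAvgBytesPerSec."""
    stamps, offset = [], 0
    for size in frame_sizes:
        stamps.append(Fraction(offset, avg_bytes_per_sec))
        offset += size
    return stamps

def true_timestamps(durations):
    """Actual start time of each frame, given its real duration in seconds."""
    stamps, t = [], Fraction(0)
    for d in durations:
        stamps.append(t)
        t += d
    return stamps

# Hypothetical stream: four 1000-byte frames, nAvgBytesPerSec = 8000,
# so the byte-offset rule treats every frame as 1/8 s long.
sizes = [1000, 1000, 1000, 1000]
# Hypothetical real durations: the frames are NOT of equal length.
durations = [Fraction(1, 10), Fraction(3, 20), Fraction(3, 20), Fraction(1, 10)]

assigned = splitter_timestamps(sizes, 8000)
actual = true_timestamps(durations)

# Positive drift means the audio really starts later than the stamp claims.
drift = [a - s for a, s in zip(actual, assigned)]
print([float(d) for d in drift])
```

Any nonzero entry in `drift` is an audio frame that would be presented at the wrong time relative to the video.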
If you have an MP3 encoder, you can create AVI files with WMV and MP3 for the audio stream. Such files will play back and seek correctly in Windows Media Player and other DirectShow-based applications, because the AVI Splitter contains special handling code for MP3 streams. Another option is to use uncompressed PCM audio, although obviously the resulting file size will be much larger than with a compressed audio stream. Because the DirectShow sample application creates AVI files, it does not demonstrate how to use the audio encoder DMO.
One-pass Encoding
The video encoder DMO works easily in DirectShow for two encoding modes: CBR and quality-based VBR. As long as you follow the correct order of operations when building the filter graph, as demonstrated in the sample application, it is relatively simple to place WMV content into an AVI file using the AVI Multiplexer and the File Writer.
Two-pass Encoding
The two-pass encoding modes require a more complex approach to graph building and streaming in order to prevent the DMO from flushing its contents from the first pass before beginning the second pass. In two-pass encoding, it is necessary to run the graph once so the DMO can perform its preprocessing analysis of the file data, and then rewind the graph and run it again so that the DMO can do the actual encoding.
When the graph goes into a run state for the second pass, the DMO Wrapper sets the DISCONTINUITY flag on the first sample, because the time stamp is not sequential with the last time stamp on the first pass. When the DMO, which was not designed to work in DirectShow in this way, receives the DISCONTINUITY flag, it performs a flush and loses the data stored from the first pass. To work around this issue, the best solution is probably to write a custom DMO Wrapper filter that does not set the DISCONTINUITY flag when the graph is seeked after the first pass. The Video for Windows (VfW) sample in this SDK demonstrates how to perform two-pass encoding.
Interlaced Content
The WMV encoder DMO is able to encode interlaced content while preserving the interlacing, which is useful for content that is captured from a TV and might also be played back on a TV. However, it is not possible to preserve interlacing using the default DMO Wrapper, because that filter does not support INSSBuffer on its input samples.
The DMO uses that interface to obtain the interlaced settings for each sample that it receives. If the interface is not found, as is the case with the DMO Wrapper, the DMO simply treats the input samples as noninterlaced. To perform interlaced encoding in DirectShow, there are several alternatives. The easiest approach is probably to use the Windows Media Format 9 Series SDK either directly or using the WM ASF Writer DirectShow filter, to create an interlaced ASF file. You can then transcode that file into some other format. If you transcode into AVI, you will have an interlaced file, but the standard DirectShow AVI playback filters will not recognize it as such because they do not support VIDEOINFOHEADER2. Another approach is to write your own DMO Wrapper filter that supports the INSSBuffer interface.
Using Windows Media in DirectShow
This section describes how to use DirectShow to play and write Advanced Systems Format (ASF) files. ASF files commonly contain audio and video content encoded using the Windows Media Audio and Video codecs. However, ASF can contain any type of data.
The following DirectShow filters support reading and writing ASF files:
- WM ASF Reader Filter. Reads ASF files.
- WM ASF Writer Filter. Writes ASF files.
- DMO Wrapper Filter. Wraps the Windows Media encoder and decoder DMOs.
Versions
The WM ASF Reader and WM ASF Writer filters are packaged in the DLL named qasf.dll, and the filters are collectively named "QASF components." These filters are wrappers for the Windows Media Format SDK. The DLL (qasf.dll) was first published in the DirectX SDK but was later updated in the Windows Media Format SDK. Here is the version history of the QASF filters:
- DirectShow 8.1 supports Windows Media Format SDK version 7.0.
- DirectShow 9.0 supports Windows Media Format SDK version 7.1.
- Windows XP Service Pack 2 supports Windows Media Format 9 SDK.
- Windows Vista supports Windows Media Format 11 SDK.
- Windows Media Format 9 SDK and later contain corresponding versions of QASF.
To get the latest version of QASF, always download the latest Windows Media Format SDK.
Legacy Windows Media Source Filter
In Windows XP Service Pack 1 and earlier, the default source filter for ASF files (.asf, .wmv, and .wma file extensions) is the obsolete Windows Media Source Filter. This behavior was maintained to ensure backward compatibility with applications that used Windows Media Player 6.4. New applications should use the newer versions of QASF, which make the WM ASF Reader filter the default filter for playing ASF files.
For more information on the Windows Media suite of software development kits, see the Audio and Video section of the MSDN Library.
DirectShowSource
DirectShowSource reads media files using Microsoft DirectShow, the same multimedia playback system that WMP (Windows Media Player) uses. It can read most formats that WMP can, including MP4, MP3, most MOV (QuickTime) files, as well as AVI files that AVISource doesn’t support (like DV type 1, or files using DirectShow-only codecs). There is also support for GraphEdit (grf) files.
There are some caveats:
- Some decoders (notably MS MPEG-4) will produce upside-down video. You’ll have to use FlipVertical.
- DirectShow video decoders are not required to support frame-accurate seeking. In most cases seeking will work, but on some it might not.
- DirectShow video decoders are not even required to tell you the frame rate of the incoming video. Most do, but the ASF decoder doesn’t. You have to specify the frame rate using the fps parameter, like this: DirectShowSource("video.asf", fps=15).
- This version automatically detects the Microsoft DV codec and sets it to decode at full (instead of half) resolution. I guess this isn’t a caveat. 🙂
- Also this version attempts to disable any decoder based deinterlacing.
Try reading AVI files with AviSource first. For non-AVI files, try FFmpegSource or LSMASHSource. If that doesn’t work then try this filter instead.
Syntax and Parameters
DirectShowSource(string filename [, float fps, bool seek, bool audio, bool video,
bool convertfps, bool seekzero, int timeout, string pixel_type, int framecount, string logfile, int logmask ] )
string filename =
The path of the source file; the path can be omitted if the source file is in the same directory as the AviSynth script (*.avs).
float fps = (auto)
Frames per second of the resulting clip. This is sometimes needed to specify the framerate. If the framerate or the number of frames is incorrect (this can happen with ASF or MOV clips, for example), use this option to force the correct framerate. For live sources, this acts as the "max fps" that will be displayed.
bool seek = true
There is full seeking support available on most file formats. If problems occur, try setting seekzero=true first. If seeking still causes problems, disable seeking completely with seek=false. With seeking disabled, an attempt to seek backwards makes the audio stream return silence and the video stream return the most recently rendered frame. Note that the AviSynth cache may provide limited access to the previous few frames, but beyond that the most recently rendered frame will be returned.
bool audio = true
Enable audio on the resulting clip. The channel ordering is the same as in the wave-format-extensible format, because the input is always decompressed to WAV. For more information, see also GetChannel. AviSynth loads 8, 16, 24 and 32 bit int PCM samples, and float PCM format, and any number of channels.
bool video = true
Enable video on the resulting clip.
bool convertfps = false
If true, it turns VFR (variable framerate) video into CFR (constant framerate) video by adding frames. This allows you to open VFR video in AviSynth. It is most useful when fps is set to the least common multiple of the component frame rates, e.g. 120 or 119.880.
bool seekzero = false
If true, backwards seeking is restricted to the beginning only, and seeking forwards is done the hard way (by reading all samples). Limited backwards seeking is allowed with non-indexed ASF.
int timeout =
For positive values, DirectShowSource waits for up to timeout milliseconds for the DirectShow graph to start. timeout is clamped to the range [5000, 300000] milliseconds. If the graph fails to start, a compile-time exception is thrown. Once the graph starts, each GetFrame/GetAudio call will wait for up to the timeout value and then return a grey frame or silence for the audio. No runtime exceptions are ever thrown because of time-outs. For negative values, DirectShowSource waits for up to 2000 milliseconds for the DirectShow graph to start. If the graph fails to start, the failure is ignored at that point and the initial graph start wait is deferred until the first GetFrame/GetAudio call. If any GetFrame/GetAudio call experiences a timeout, a runtime exception is then thrown.
string pixel_type = (auto)
Request a color format from the decompressor. Valid values are:
YV24, YV16, YV12, I420, NV12, YUY2, AYUV, Y41P, Y411, ARGB, RGB32, RGB24, YUV, YUVex, RGB, AUTO, FULL
By default, upstream DirectShow filters are free to bid all of their supported media types in the order of their choice. A few DirectShow filters get this wrong. The pixel_type argument limits the acceptable video stream subformats for the IPin negotiation. Note that the graph builder may add a format converter to satisfy your request, so make sure the codec in use can actually decode to your chosen format. The MS format converter is just adequate. The "YUV" and "RGB" pseudo-types restrict the negotiation to all officially supported YUV or RGB formats respectively. The "YUVex" pseudo-type also includes the YV24, YV16, I420 and NV12 non-standard pixel types. The "AUTO" pseudo-type permits the negotiation to use all relevant official formats, YUV plus RGB. The "FULL" pseudo-type includes the non-standard pixel types in addition to those supported by "AUTO". The full order of preference is YV24, YV16, YV12, I420, NV12, YUY2, AYUV, Y41P, Y411, ARGB, RGB32, RGB24. Many DirectShow filters get this wrong, which is why it is not enabled by default. The option exists so you have enough control to encourage the maximum range of filters to serve your media. (See discussion.)
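As a rough illustration of how the pseudo-types narrow the negotiation, the following Python sketch models the description above. It is not DirectShowSource's actual code: the function name `candidates` is hypothetical, and the split into "official" and non-standard formats is read literally from the paragraph (YV24, YV16, I420 and NV12 are the non-standard ones; everything else in the preference list is treated as official).

```python
# Full order of preference, as listed in the text.
FULL_ORDER = ["YV24", "YV16", "YV12", "I420", "NV12", "YUY2",
              "AYUV", "Y41P", "Y411", "ARGB", "RGB32", "RGB24"]

# Formats the text names as non-standard, and the RGB family.
NON_STANDARD = {"YV24", "YV16", "I420", "NV12"}
RGB_FORMATS = {"ARGB", "RGB32", "RGB24"}

def candidates(pixel_type):
    """Return the formats a given pixel_type would allow, in preference order.

    Hypothetical helper: a model of the documented behavior only.
    """
    everything = set(FULL_ORDER)
    pseudo = {
        "FULL": everything,
        "AUTO": everything - NON_STANDARD,          # official YUV plus RGB
        "RGB": RGB_FORMATS,
        "YUVex": everything - RGB_FORMATS,          # official + non-standard YUV
        "YUV": everything - RGB_FORMATS - NON_STANDARD,
    }
    if pixel_type in pseudo:
        allowed = pseudo[pixel_type]
    elif pixel_type in everything:
        allowed = {pixel_type}                      # a single concrete format
    else:
        raise ValueError(f"unknown pixel_type: {pixel_type!r}")
    # Keep the documented preference order.
    return [fmt for fmt in FULL_ORDER if fmt in allowed]

for name in ("YUV", "YUVex", "RGB", "AUTO", "FULL"):
    print(name, candidates(name))
```

Whether a decoder actually honors this order depends on the filters in the graph, which is the reason the option exists in the first place.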
The non-standard pixel types use the following GUIDs, respectively:
In other words, if pixel_type="AUTO", it will try to output YV24; if that isn’t possible it tries YV16, and if that isn’t possible it tries YV12, etc. For planar color formats, adding a '+' prefix, e.g. DirectShowSource(..., pixel_type="+YV12"), tells AviSynth the video rows are DWORD aligned in memory instead of packed. This can fix skew or tearing of the decoded video when the width of the picture is not divisible by 4.
int framecount = (auto)
Sometimes needed to specify the frame count of the video. If the framerate or the number of frames is incorrect (this can happen with ASF or MOV clips, for example), use this option to force the correct number of frames. If fps is also specified, the length of the audio stream is adjusted. For live sources, specify a very large number.
string logfile =
Use this option to specify the name of a log file for debugging.
int logmask = 35
When a logfile is specified, use this option to select which information is logged.
Value | Data |
---|---|
1 | Format Negotiation |
2 | Receive samples |
4 | GetFrame/GetAudio calls |
8 | Directshow callbacks |
16 | Requests to Directshow |
32 | Errors |
64 | COM object use count |
128 | New objects |
256 | Extra info |
512 | Wait events |
Add together the values of the data you need logged. Specify -1 to log everything. The default, 35, logs 1+2+32: Format Negotiation, Receive samples, and Errors.
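Because each value in the table is a distinct power of two, a logmask is an ordinary bit mask. The Python sketch below shows the arithmetic; the helper names `make_logmask` and `describe_logmask` are hypothetical and exist only for illustration, while the values and labels are copied from the table.

```python
# Values and labels copied from the table above.
LOG_FLAGS = {
    1: "Format Negotiation",
    2: "Receive samples",
    4: "GetFrame/GetAudio calls",
    8: "Directshow callbacks",
    16: "Requests to Directshow",
    32: "Errors",
    64: "COM object use count",
    128: "New objects",
    256: "Extra info",
    512: "Wait events",
}

def make_logmask(*labels):
    """Combine the table values for the given labels into one logmask."""
    value_of = {label: value for value, label in LOG_FLAGS.items()}
    mask = 0
    for label in labels:
        mask |= value_of[label]  # same as adding, since the bits are distinct
    return mask

def describe_logmask(mask):
    """List the labels selected by a logmask (-1 selects every bit)."""
    return [label for value, label in LOG_FLAGS.items() if mask & value]

default_mask = make_logmask("Format Negotiation", "Receive samples", "Errors")
print(default_mask, describe_logmask(default_mask))
```

For example, a mask built from "Errors" and "Wait events" alone is 32 + 512 = 544, which would be passed as logmask=544 together with a logfile name.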
Examples
Opens an AVI with the first available RGB format (without audio):