FFmpeg Getting Started (1): Capturing Video Frames

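Reposted from: FFmpeg Getting Started (1): Capturing Video Frames | www.samirchen.com

Background

To run the code in this tutorial on Mac OS you first need to install FFmpeg. Installing it with brew is recommended: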

# Install FFmpeg with brew:
brew install ffmpeg

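Alternatively, you can build and install FFmpeg from source by following "Compiling FFmpeg on Mac OS".

Original tutorial: http://dranger.com/ffmpeg/tutorial01.html (the code in this article includes some corrections).

Overview

A media file has a few basic components. First, the file itself is called a "container", and the container type determines how the information in the file is stored; AVI and QuickTime are examples of container formats. Next come the "streams": typically you have one audio stream and one video stream. The data elements in a stream are called "frames". Each stream is encoded and decoded by a corresponding "codec" (the name comes from COder and DECoder). The codec defines how the actual data is encoded and decoded; DivX and MP3 are examples of codecs. "Packets" are the pieces of data read from a stream; once decoded, the data they carry becomes the raw frames that our application can finally work with. For our purposes, each packet contains complete frames, or multiple frames in the case of audio.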

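With these basics in place, processing video and audio streams is quite simple:

1: Open video_stream from the file video.avi.
2: Read a packet from video_stream into frame.
3: If frame is not yet complete, go to step 2.
4: Process frame.
5: Go to step 2.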
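
The control flow of these steps can be sketched without FFmpeg at all. The following is only a toy model: ToyPacket and count_frames are hypothetical names made up for this illustration, and "processing" a frame just means counting it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a demuxed packet: it only records which stream
 * the packet belongs to and whether decoding it completes a frame. */
typedef struct {
    int stream_index;
    int completes_frame;
} ToyPacket;

/* Walks the packets in order and returns how many complete frames of the
 * wanted stream were "processed" (here: simply counted). */
int count_frames(const ToyPacket *packets, size_t n, int wanted_stream) {
    assert(packets != NULL || n == 0);
    int frames = 0;
    for (size_t i = 0; i < n; i++) {        /* step 2: read the next packet */
        if (packets[i].stream_index != wanted_stream)
            continue;                        /* packet of another stream */
        if (!packets[i].completes_frame)
            continue;                        /* step 3: frame incomplete, read more */
        frames++;                            /* step 4: process the frame */
    }                                        /* step 5: back to step 2 */
    return frames;
}
```

In the real program the array is replaced by repeated reads from the file, and the completes_frame flag by the result reported by the decoder.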

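Step 4, processing the frame, can be very complex in some programs, but the example in this article keeps the FFmpeg part simple: we open a media file, read its video stream, and save the video frames we get from it as PPM files.

Let's build it step by step.

Opening the Media File

First, let's see how to open a media file. To use FFmpeg you first have to initialize the library.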

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
//...

int main(int argc, char *argv[]) {

    // Register all formats and codecs.
    av_register_all();

    // ...
}

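The code above registers all the file formats and codecs available in the FFmpeg libraries, so that when a media file is opened the matching format handler and codec can be found. You only need to call av_register_all() once, which is why we do it in main. You can also register only specific formats and codecs, but there is usually no need. Note that av_register_all() was deprecated in FFmpeg 4.0, where formats and codecs are registered automatically, so the call can be dropped on newer versions.

Next we get ready to open the media file. What information in a media file is worth paying attention to?

Whether it contains audio, video, or both.
The container format of the stream, used for demuxing.
The video codec, used to initialize the video decoder.
The audio codec, used to initialize the audio decoder.
The video resolution, frame rate, and bit rate, used for rendering.
The audio sample rate, bit depth, and channel count, used to initialize the audio player.
The total duration, used for display and seeking.
Other metadata such as author and date, used for display.

This key media information is called metadata, and it is usually recorded at the beginning or the end of the stream. For example, the WAV format records the sample rate, channel count, bit depth, and other key audio parameters in its header; MP4 stores them in the moov box; and FLV records them in onMetaData.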
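
To make the WAV case concrete, here is a minimal sketch that pulls those three values out of a header by hand. It assumes the canonical 44-byte PCM layout, where the "fmt " chunk immediately follows the RIFF/WAVE tag; real files can carry extra chunks, which this sketch rejects rather than handles. WavInfo and parse_wav_header are names invented for this example, not FFmpeg APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct for this illustration. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} WavInfo;

/* WAV stores its numbers in little-endian byte order. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a canonical 44-byte WAV header. */
int parse_wav_header(const uint8_t *buf, size_t len, WavInfo *out) {
    assert(buf != NULL && out != NULL);
    if (len < 44)
        return -1;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    if (memcmp(buf + 12, "fmt ", 4) != 0)
        return -1;
    out->channels = read_le16(buf + 22);        /* offset 22: channel count */
    out->sample_rate = read_le32(buf + 24);     /* offset 24: sample rate in Hz */
    out->bits_per_sample = read_le16(buf + 34); /* offset 34: bits per sample */
    return 0;
}
```

With FFmpeg you never need to do this yourself; the demuxer reads these fields and exposes them through the structures described below.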

avformat_open_input opens the input (connecting to the server when the input is a network URL) and reads the stream header. We use it to open the media file:

AVFormatContext *pFormatCtx = NULL;

// Open video file.
if (avformat_open_input(&pFormatCtx, argv[1], NULL, NULL) != 0) {
    return -1; // Couldn't open file.
}

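We take the path of the file to open from the program arguments and pass it as the second argument of avformat_open_input. The function reads the file header and stores the format information in the AVFormatContext that we pass as the first argument. The third argument specifies the file format and the fourth holds format-specific options. If you pass NULL for both, libavformat detects the file format automatically and uses default options.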
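
As a rough intuition for what automatic detection involves, the sketch below guesses a container from a few well-known signatures at the start of the file. This is only an illustration: guess_container is a hypothetical helper, and libavformat's real probing is far more elaborate than a handful of byte comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Guesses the container from the first bytes of a file.
 * Returns a short static name, or "unknown" if nothing matches. */
const char *guess_container(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len >= 12 && memcmp(buf, "RIFF", 4) == 0) {
        /* RIFF files carry their form type at offset 8. */
        if (memcmp(buf + 8, "AVI ", 4) == 0)
            return "avi";
        if (memcmp(buf + 8, "WAVE", 4) == 0)
            return "wav";
    }
    if (len >= 3 && memcmp(buf, "FLV", 3) == 0)
        return "flv";
    /* MP4 and related ISO base media files usually start with an ftyp box,
     * whose type field sits at offset 4. */
    if (len >= 8 && memcmp(buf + 4, "ftyp", 4) == 0)
        return "mp4";
    return "unknown";
}
```

Passing an explicit format as the third argument of avformat_open_input skips this kind of guessing, which can be useful when the input has no reliable signature.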

Probing and analyzing the media information is then the job of avformat_find_stream_info:

// Retrieve stream information.
if (avformat_find_stream_info(pFormatCtx, NULL) < 0) {
    return -1; // Couldn't find stream information.
}

avformat_find_stream_info fills pFormatCtx->streams with the corresponding information. There is also a handy debugging function, av_dump_format, that prints what pFormatCtx contains.

// Dump information about file onto standard error.
av_dump_format(pFormatCtx, 0, argv[1], 0);

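AVFormatContext contains the following members related to media information:

struct AVInputFormat *iformat; // Container format information
unsigned int nb_streams; // Number of streams in this URL
AVStream **streams; // Array of pointers to AVStream, each describing one stream
int64_t start_time; // Timestamp of the first frame, in AV_TIME_BASE units
int64_t duration; // Total duration of the stream, in AV_TIME_BASE units
int64_t bit_rate; // Total bit rate, in bps
AVDictionary *metadata; // File-level metadata as key/value strings

If you compare these values with the output of av_dump_format, you may notice some differences. In that case, look at the implementation of av_dump_format in the FFmpeg source: it post-processes the values before printing them. For example, start_time is handled like this: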

if (ic->start_time != AV_NOPTS_VALUE) {
    int secs, us;
    av_log(NULL, AV_LOG_INFO, ", start: ");
    secs = ic->start_time / AV_TIME_BASE;
    us = llabs(ic->start_time % AV_TIME_BASE);
    av_log(NULL, AV_LOG_INFO, "%d.%06d", secs, (int) av_rescale(us, 1000000, AV_TIME_BASE));
}
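
The same arithmetic can be tried in isolation. In this minimal sketch, TIME_BASE is a local stand-in assumed to equal AV_TIME_BASE (1000000, that is, microseconds), and format_start_time is a hypothetical helper, not an FFmpeg function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TIME_BASE 1000000 /* assumed to match AV_TIME_BASE (microseconds) */

/* Formats a timestamp given in TIME_BASE units as "seconds.microseconds".
 * Returns the value returned by snprintf (the formatted length). */
int format_start_time(int64_t start_time, char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    int64_t secs = start_time / TIME_BASE;      /* whole seconds */
    int64_t us = llabs(start_time % TIME_BASE); /* remaining microseconds */
    return snprintf(out, out_size, "%" PRId64 ".%06" PRId64, secs, us);
}
```

For example, a raw start_time of 1500000 is formatted as 1.500000, which explains why the raw member value and the dumped value look different.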

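So after avformat_find_stream_info has run, we can get the container format, total duration, and total bit rate of the media. In addition, pFormatCtx->streams is an array of AVStream pointers that holds the information for each stream in the media; the array has pFormatCtx->nb_streams elements.

The key members of the AVStream struct include:

AVCodecContext *codec; // Codec information for this stream
int64_t start_time; // Timestamp of the first frame, in stream time_base units
int64_t duration; // Duration of this stream, in stream time_base units
int64_t nb_frames; // Total number of frames in this stream
AVDictionary *metadata; // Stream-level metadata as key/value strings
AVRational avg_frame_rate; // Average frame rate

This is where the average frame rate comes from.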
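
The frame rate is stored as a fraction rather than a floating-point number. The sketch below shows the conversion on a local Rational struct that merely mirrors the num/den layout of AVRational; in real code FFmpeg's own av_q2d() does the same job.

```c
#include <assert.h>

/* Hypothetical mirror of AVRational: a numerator and a denominator. */
typedef struct {
    int num;
    int den;
} Rational;

/* Converts a rational to a double. The caller must make sure the
 * denominator is not zero. */
double rational_to_double(Rational r) {
    assert(r.den != 0);
    return (double)r.num / (double)r.den;
}
```

A rate of 30000/1001, common for NTSC material, comes out as roughly 29.97 frames per second; keeping it as a fraction avoids rounding errors when timestamps are computed.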

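AVCodecContext records the detailed codec information of one stream. Its key members include:

const struct AVCodec *codec; // Codec details
enum AVCodecID codec_id; // Codec type
int bit_rate; // Average bit rate
// Video only:
int width, height; // Picture dimensions; not always present in the stream, overwritten after decoding
enum AVPixelFormat pix_fmt; // Raw picture format; not always present in the stream, overwritten after decoding
// Audio only:
int sample_rate; // Audio sample rate
int channels; // Number of audio channels
enum AVSampleFormat sample_fmt; // Audio sample format (bit depth)
int frame_size; // Number of samples per audio frame

As you can see, the codec type, picture width and height, and audio parameters are all here.

With these data structures in mind, we walk through the streams until we find a video stream: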

// Find the first video stream.
videoStream = -1;
for (i = 0; i < pFormatCtx->nb_streams; i++) {
    if(pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = i;
        break;
    }
}
if (videoStream == -1) {
    return -1; // Didn't find a video stream.
}

// Get a pointer to the codec context for the video stream.
pCodecCtxOrig = pFormatCtx->streams[videoStream]->codec;

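The codec-related information of a stream lives in its codec context, which holds everything about the codec the stream uses. We now have a pointer to it, but we still need to find the actual codec and open it: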

// Find the decoder for the video stream.
pCodec = avcodec_find_decoder(pCodecCtxOrig->codec_id);
if (pCodec == NULL) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1; // Codec not found.
}
// Copy context.
pCodecCtx = avcodec_alloc_context3(pCodec);
if (avcodec_copy_context(pCodecCtx, pCodecCtxOrig) != 0) {
    fprintf(stderr, "Couldn't copy codec context\n");
    return -1; // Error copying codec context.
}

// Open codec.
if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
    return -1; // Could not open codec.
}

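Note that we must not use the AVCodecContext from the video stream directly, so we use avcodec_copy_context() to copy it into a new AVCodecContext. Newer FFmpeg versions deprecate AVStream.codec and avcodec_copy_context() in favor of AVStream.codecpar and avcodec_parameters_to_context().

Storing the Data

Next, we need a place to store the frames from the video: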

AVFrame *pFrame = NULL;

// Allocate video frame.
pFrame = av_frame_alloc();

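We plan to write the video frames out as PPM files, which store 24-bit RGB, so we have to convert each frame from its native format to RGB. FFmpeg can do that for us. Most projects need to convert the decoded frames to some specific format anyway. So let's allocate an AVFrame for the converted frame: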

// Allocate an AVFrame structure.
pFrameRGB = av_frame_alloc();
if (pFrameRGB == NULL) {
    return -1;
}

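Although we have allocated the frame structure, we still need a place to hold the raw pixel data when we convert. We use av_image_get_buffer_size to find out how much memory is required and then allocate it ourselves.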

int numBytes;
uint8_t *buffer = NULL;

// Determine required buffer size and allocate buffer.
numBytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24, pCodecCtx->width, pCodecCtx->height, 1);
buffer = (uint8_t *) av_malloc(numBytes * sizeof(uint8_t));

av_malloc is FFmpeg's malloc: a thin wrapper around malloc that takes care of things like address alignment. It does not protect you from memory leaks, double frees, or any other malloc problem.
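
The size requested for a packed RGB24 image can also be worked out by hand, which is a useful sanity check. The sketch below is our own arithmetic, not FFmpeg's implementation: rgb24_buffer_size is a hypothetical helper, and it assumes that each row holds width * 3 bytes rounded up to a multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the next multiple of align (align >= 1). */
static size_t align_up(size_t value, size_t align) {
    return (value + align - 1) / align * align;
}

/* Bytes needed for a packed RGB24 image whose rows are padded to align bytes. */
size_t rgb24_buffer_size(size_t width, size_t height, size_t align) {
    assert(align >= 1);
    size_t linesize = align_up(width * 3, align); /* 3 bytes per pixel: R, G, B */
    return linesize * height;
}
```

With an alignment of 1, as used in the call above, a 640x480 frame needs 640 * 3 * 480 = 921600 bytes.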

Now we use av_image_fill_arrays to associate the frame with the memory we just allocated.

// Assign appropriate parts of buffer to image planes in pFrameRGB.
// Note that pFrameRGB is an AVFrame, but AVFrame is a superset of AVPicture.
av_image_fill_arrays(pFrameRGB->data, pFrameRGB->linesize, buffer, AV_PIX_FMT_RGB24, pCodecCtx->width, pCodecCtx->height, 1);
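
For a packed format such as RGB24 there is a single plane, so after this call the frame effectively holds one data pointer and one line size. The sketch below models that on a hypothetical RgbImage struct (not an FFmpeg type) and shows how a pixel is addressed through the line size, assuming rows are not padded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-plane image, loosely modelled on data[0] and
 * linesize[0] of an AVFrame holding packed RGB24. */
typedef struct {
    uint8_t *data;
    int linesize; /* bytes from the start of one row to the start of the next */
    int width;
    int height;
} RgbImage;

/* Points the image at a caller-owned buffer; rows are assumed unpadded. */
void rgb_image_attach(RgbImage *img, uint8_t *buffer, int width, int height) {
    assert(img != NULL && buffer != NULL && width > 0 && height > 0);
    img->data = buffer;
    img->linesize = width * 3;
    img->width = width;
    img->height = height;
}

/* Returns a pointer to the R byte of pixel (x, y). */
uint8_t *rgb_pixel(const RgbImage *img, int x, int y) {
    assert(x >= 0 && x < img->width && y >= 0 && y < img->height);
    return img->data + (size_t)y * (size_t)img->linesize + (size_t)x * 3;
}
```

Always stepping between rows by the line size rather than by width * 3 keeps the code correct when rows carry padding.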

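Now we are ready to read from the video stream.

Reading the Data

What we do next is read the whole video stream packet by packet, decode each packet into our frame, and once a frame is complete, convert it and save it.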

AVPacket packet;
int frameFinished;
struct SwsContext *sws_ctx = NULL;

// Initialize SWS context for software scaling.
sws_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height, AV_PIX_FMT_RGB24, SWS_BILINEAR, NULL, NULL, NULL);

// Read frames and save first five frames to disk.
i = 0;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished) {
            // Convert the image from its native format to RGB.
            sws_scale(sws_ctx, (uint8_t const * const *) pFrame->data, pFrame->linesize, 0, pCodecCtx->height, pFrameRGB->data, pFrameRGB->linesize);

            // Save the frame to disk.
            if (++i <= 5) {
                SaveFrame(pFrameRGB, pCodecCtx->width, pCodecCtx->height, i);
            }
        }
    }

    // Free the packet that was allocated by av_read_frame.
    av_packet_unref(&packet);
}

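The rest of the program is fairly easy to follow. av_read_frame() reads one packet from the media file and stores it in the AVPacket struct. Note that we only create the packet struct; FFmpeg fills in its data, with packet.data pointing to it, and that memory has to be released with av_packet_unref(). avcodec_decode_video2() turns the packet into a video frame. However, decoding a single packet may not yield a complete frame, and several packets may be needed; avcodec_decode_video2() sets frameFinished to true once it has decoded a complete frame. (Newer FFmpeg versions deprecate avcodec_decode_video2() in favor of avcodec_send_packet() and avcodec_receive_frame().) When a frame is complete, we use sws_scale() to convert it from its native format, pCodecCtx->pix_fmt, to RGB. Remember that you can cast an AVFrame pointer to an AVPicture pointer. Finally, we pass the frame to our SaveFrame function to write it to a file.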
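
To give a feel for what the format conversion does per pixel, here is a minimal sketch that converts one YUV sample to RGB. It assumes the decoder produced limited-range BT.601 YUV, which is common but not guaranteed, and it uses a widely used integer approximation. yuv_to_rgb is a hypothetical helper; sws_scale() supports many more formats and does not necessarily compute it this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divides a fixed-point value by 256 and clamps the result to 0..255. */
static uint8_t clamp_shift8(int v) {
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Converts one limited-range BT.601 YUV sample to 8-bit RGB. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3]) {
    assert(rgb != NULL);
    int c = y - 16;  /* luma with the black level removed */
    int d = u - 128; /* blue-difference chroma, centered on zero */
    int e = v - 128; /* red-difference chroma, centered on zero */
    rgb[0] = clamp_shift8(298 * c + 409 * e + 128);
    rgb[1] = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    rgb[2] = clamp_shift8(298 * c + 516 * d + 128);
}
```

In formats like YUV420P the chroma samples are shared between neighboring pixels, which is one more detail the scaler takes care of for us.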

SaveFrame writes the RGB data to a PPM file.

void SaveFrame(AVFrame *pFrame, int width, int height, int iFrame) {
    FILE *pFile;
    char szFilename[32];
    int y;
  
    // Open file.
    snprintf(szFilename, sizeof(szFilename), "frame%d.ppm", iFrame);
    pFile = fopen(szFilename, "wb");
    if (pFile == NULL) {
        return;
    }
  
    // Write header.
    fprintf(pFile, "P6\n%d %d\n255\n", width, height);
  
    // Write pixel data.
    for (y = 0; y < height; y++) {
        fwrite(pFrame->data[0]+y*pFrame->linesize[0], 1, width*3, pFile);
    }
  
    // Close file.
    fclose(pFile);
}
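
If you want to check the PPM output path without involving FFmpeg, the same logic can be written against a plain byte buffer. This is a sketch under our own naming: write_ppm and its stride parameter are hypothetical, with stride playing the role of linesize[0] above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes packed RGB24 rows as a binary PPM (P6) file.
 * stride is the number of bytes between row starts and may exceed width * 3.
 * Returns 0 on success, -1 on failure. */
int write_ppm(const char *path, const uint8_t *rgb, int width, int height, int stride) {
    assert(path != NULL && rgb != NULL);
    assert(width > 0 && height > 0 && stride >= width * 3);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    /* Header: magic number, dimensions, maximum channel value. */
    int ok = fprintf(f, "P6\n%d %d\n255\n", width, height) > 0;
    /* Pixel data: only the first width * 3 bytes of each row are image data. */
    for (int y = 0; ok && y < height; y++) {
        size_t row_bytes = (size_t)width * 3;
        ok = fwrite(rgb + (size_t)y * (size_t)stride, 1, row_bytes, f) == row_bytes;
    }
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Writing row by row instead of in one fwrite call is what makes padded rows safe: the padding bytes at the end of each row are skipped.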

Back in main, once we have finished reading the video stream, we need to clean up:

// Free the RGB image.
av_free(buffer);
av_frame_free(&pFrameRGB);

// Free the YUV frame.
av_frame_free(&pFrame);

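// Close the codecs. pCodecCtx was allocated with avcodec_alloc_context3(),
// so avcodec_free_context() both closes and frees it.
avcodec_free_context(&pCodecCtx);
avcodec_close(pCodecCtxOrig);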

// Close the video file.
avformat_close_input(&pFormatCtx);

return 0;

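As you can see, we use av_free() to release the memory we allocated with av_malloc().

That is everything for this part of the tutorial. The complete code is available here: https://github.com/samirchen/TestFFmpeg

Compiling and Running

You can compile it with the following command: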

$ gcc -o tutorial01 tutorial01.c -lavutil -lavformat -lavcodec -lswscale -lz -lm

Then find a media file and run it like this:

$ ./tutorial01 myvideofile.mp4