linux環境編程(2): 使用pipe完成進程間通信

1. 寫在前面 linux系統內核為上層應用程式提供了多種進程間通信(IPC)的手段，適用於不同的場景，有些解決進程間數據傳遞的問題，另一些則解決進程間的同步問題。對於同樣一種IPC機制，又有不同的API供應用程式使用，目前有POSIX IPC以及System V IPC可以為應用程式提供服務。後續 ...

1. 寫在前面

linux系統內核為上層應用程式提供了多種進程間通信(IPC)的手段，適用於不同的場景，有些解決進程間數據傳遞的問題，另一些則解決進程間的同步問題。對於同樣一種IPC機制，又有不同的API供應用程式使用，目前有POSIX IPC以及System V IPC可以為應用程式提供服務。後續的系列文章將逐一介紹消息隊列，共用記憶體，信號量，socket，fifo等進程間通信方法，本篇文章主要總結了管道相關係統調用的使用方式。文中代碼可以在這個代碼倉庫中獲取，代碼中使用了我自己實現的一個單元測試框架，對測試框架感興趣的同學可以參考上一篇文章。

2. pipe介紹

在linux環境進行日常開發時，管道是一種經常用到的進程間通信方法。在shell環境下，'|'就是連接兩個進程的管道，它可以把一個進程的標準輸出通過管道寫到另一個進程的標準輸入，利用管道以及重定向，各種命令行工具經過組合之後可以實現一個及其複雜的功能，這也是繼承自UNIX的一種編程哲學。

除了在shell腳本中使用管道，另一種方式是通過系統調用去操作管道。使用pipe或者pipe2創建管道，得到兩個文件描述符，分別是管道的讀端和寫端，有了文件描述符，進程就可以像讀寫普通文件一樣對管道進行read和write操作，操作完成之後調用close關閉管道的兩個文件描述符即可。可以看到，當完成創建之後，管道的使用和普通文件相比沒有什麼區別。

管道有兩個特點: 1) 通常只能在有親緣關係的進程之間進行通信； 2) 閱後即焚；有親緣關係是指，通信的兩個進程可以是父子進程或者兄弟進程，這裡的父子和兄弟是一個廣義的概念, 子進程可以是父進程調用了多次fork創建出來的，而不僅局限在只經過一次fork，總之，只要通信雙方的進程拿到了管道的文件描述符就可以使用管道了。說”閱後即焚“是因為管道中的數據在被進程讀取之後就會被管道清除掉。有一個形象的比喻說，管道就像某個進程家族各個成員之間傳遞情報的中轉站，情報內容閱後即焚。

3. pipe的基本使用

在使用管道時，需要註意管道中數據的流動方向，通常都是把管道作為一個單向的數據通道使用的。雖然通信雙方可以都持有管道的讀端和寫端，然後使用同一個管道實現雙向通信，但這種方式實際上很少使用。下麵通過幾段代碼說明幾種使用管道的方法：

3.1 自言自語

管道雖然時進程間通信的一種手段，但一個進程自言自語也是可以的，內核並沒有限制管道的兩端必須由不同的進程操作。下麵的代碼展示了一個孤獨的進程怎樣通過管道自言自語，代碼中使用了自己實現的測試框架cutest。執行之後它將從管道的另一頭收到前一個時刻發給自己的消息。

CUTEST_CASE(basic_pipe, talking_to_myself) {
    int pipefd[2];
    pipe(pipefd);

    const char *msg = "I'm talking to myself";
    write(pipefd[1], msg, strlen(msg));

    char buf[32];
    read(pipefd[0], buf, 32);
    printf("talking_to_myself: %s\n", buf);

    close(pipefd[0]);
    close(pipefd[1]);
}

3.2 父進程向子進程傳遞數據

自言自語始終是太過無聊，是時候讓父子進程之間聊點什麼了。因為fork之後的子進程會繼承父進程的文件描述符，fork之前父進程向管道寫入的數據，子進程可以在管道的另一端讀到。

CUTEST_CASE(basic_pipe, parent2child) {
    int pipefd[2];
    pipe(pipefd);

    const char *msg = "parent write, child read";
    write(pipefd[1], msg, strlen(msg));

    if (fork() == 0) {
        close(pipefd[1]);
        char buf[64];
        memset(buf, 0, 64);
        read(pipefd[0], buf, 64);
        printf("parent2child: %s\n", buf);
        exit(0);
    }

    close(pipefd[0]);
    close(pipefd[1]);
}

3.2 自進程向父進程傳遞數據

管道的方向是由通信雙方操作的文件描述符決定的，子進程同樣可以傳遞消息給父進程。

CUTEST_CASE(basic_pipe, child2parent) {
    int pipefd[2];
    pipe(pipefd);

    if (fork() == 0) {
        close(pipefd[0]);
        const char *msg = "parent read, child write";
        write(pipefd[1], msg, strlen(msg));
        close(pipefd[1]);
        exit(0);
    }

    close(pipefd[1]);

    char buf[64];
    memset(buf, 0, 64);
    read(pipefd[0], buf, 64);
    printf("child2parent: %s\n", buf);
    close(pipefd[0]);
}

3.3 父進程向多個子進程傳遞數據

當有多個子進程時，只要它們持有了管道的文件描述符，就可以利用管道通信，把父進程寫進管道的數據讀取出來。當然，在具體的應用中需要考慮子進程的讀取順序等因素，下麵的例子只是簡單的創建了多個子進程，每個進程讀取一個int類型的數據，開始階段由父進程向管道寫入數據，需要說明一點，三個子進程並沒有將管道內的數據都讀完，當所有引用了這個管道的文件描述符都關閉了之後，內核也會在適當的時機銷毀自己維護的管道。


void fork_child_read(int id, int pipefd[2], const char *msg_pregix) {
    if (fork() == 0) {
        close(pipefd[1]);
        int n;
        read(pipefd[0], &n, sizeof(int));
        printf("%s: child %d get data %d\n", msg_pregix, id, n);
        close(pipefd[0]);
        exit(0);
    }
}

CUTEST_CASE(basic_pipe, parent2children) {
    int pipefd[2];
    pipe(pipefd);

    for (int i = 1; i <= 10; i++)
        write(pipefd[1], &i, sizeof(int));

    const char *msg_prefix = "parent2children:";
    fork_child_read(1, pipefd, msg_prefix);
    fork_child_read(2, pipefd, msg_prefix);
    fork_child_read(3, pipefd, msg_prefix);

    close(pipefd[0]);
    close(pipefd[1]);
}

3.4 父進程接收多個子進程的數據

考慮這樣一種場景，一個任務需要由多個子進程進行處理，最終的計算結果需要由父進程彙總，下麵的代碼模擬了這樣的場景，代碼中創建了兩個子進程向管道寫入數據，父進程則一直嘗試讀取管道內的數據。

void fork_child_write(int pipefd[2], int data) {
    if (fork() == 0) {
        close(pipefd[0]);
        write(pipefd[1], &data, sizeof(int));
        close(pipefd[1]);
        exit(0);
    }
}

CUTEST_CASE(basic_pipe, children2parent) {
    int pipefd[2];
    pipe(pipefd);

    int data[] = {512, 1024};

    fork_child_write(pipefd, data[0]);
    fork_child_write(pipefd, data[1]);

    close(pipefd[1]);
    int n;
    while (read(pipefd[0], &n, sizeof(int)) == sizeof(int)) {
        printf("children2parent: get data %d\n", n);
    }
    close(pipefd[0]);
}

3.5 兄弟進程之間傳遞數據

如果有兩個兄弟進程，進程A需要得到進程B的計算結果之後才能完成自己的任務，這時也可以用管道通信。代碼中分別創建了兩個進程對管道進行寫和讀操作，實際應用中經常還需要一種通知機制，讓等待的進程知道它依賴的任務已經就緒了，這需要用到信號量，後續文章會介紹。下麵代碼的第二個進程在read操作時是阻塞的，會一直等到管道中數據可讀，因為創建管道時沒有指定O_NONBLOCK標誌。

CUTEST_CASE(basic_pipe, two_children) {
    int pipefd[2];
    pipe(pipefd);

    const char *msg = "pipe between two children";
    if (fork() == 0) {
        close(pipefd[0]);
        write(pipefd[1], msg, strlen(msg));
        close(pipefd[1]);
        exit(0);
    }

    if (fork() == 0) {
        close(pipefd[1]);
        char buf[64];
        memset(buf, 0, 64);
        read(pipefd[0], buf, 64);
        printf("two_children: %s\n", buf);
        close(pipefd[0]);
        exit(0);
    }

    close(pipefd[0]);
    close(pipefd[1]);
}

3.6 阻塞和非阻塞的問題

前面的例子中提到了管道的阻塞和非阻塞，這裡詳細說明一下這個問題。對於一個阻塞的管道，如果進程在read時，系統中存在沒有關閉的寫端文件描述符，但此時管道是空的，read操作就會阻塞在這裡。可以這樣理解，因為寫端的存在，read就固執地認為在未來的某個時刻一定會有人會向管道中寫入數據，所以它就阻塞在這裡。對於非阻塞的管道，在前面的條件下，read會立即返回。上述的特性就要求我們在使用阻塞類型的管道時要及時關閉不使用的文件描述符，因為進程read操作時在等待的寫端文件描述符很可能是由當前進程打開的，當系統中管道的其他寫端都關閉了的時候，當前進程的read就會出現自己等自己的問題，類似死鎖。

CUTEST_CASE(basic_pipe, blocking_read) {
    int pipefd[2];
    pipe(pipefd);

    if (fork() == 0) {
        /* NOTE: remove the comment below if you don't want child process
         * blocking while reading data from pipe. Otherwise you will see that
         * there is still a "basic-pipe" process after you finish this test, and
         * you have to kill it manually.*/
        // close(pipefd[1]);

        int num;
        read(pipefd[0], &num, sizeof(int));

        /* NOTE: since the write end of pipe is a valid file descriptor in
         * current process, the print below should never execute.*/
        printf("should NEVER goes here\n");
        exit(0);
    }

    close(pipefd[0]);
    close(pipefd[1]);
    printf("blocking_read: parent process exit\n");
}

上述代碼使用的是阻塞類型的管道，fork出的進程沒有關閉管道的寫端，然後執行了read操作，當父進程退出之後，系統中仍存在這個管道的寫端描述符，並且就在已經處於睡眠狀態下的子進程中，這種情況下將不會再有人向管道中寫入數據，子進程會一直睡眠。運行代碼之後使用ps命令可以看到這個睡死過去的子進程。

3.7 測試執行結果

以下是上述測試的執行結果，可以看到在程式退出之後仍然由一個"basic-pipe"進程，這是因為3.6節中的代碼在子進程中沒有及時關閉不使用的管道文件描述符。此時不得不手動把睡死的進程kill掉了。

[junan@arch1 test-all]$ make install
[junan@arch1 test-all]$ ./script/run_test.sh basic-pipe
blocking_read: parent process exit
two_children: pipe between two children
children2parent: get data 512
children2parent: get data 1024
parent2children:: child 1 get data 1
parent2children:: child 2 get data 2
parent2children:: child 3 get data 3
child2parent: parent read, child write
talking_to_myself: I'm talking to myself
cutest summary:
        [basic_pipe] suit result: 7/7
        [basic_pipe::blocking_read] case result: Pass
        [basic_pipe::two_children] case result: Pass
        [basic_pipe::children2parent] case result: Pass
        [basic_pipe::parent2children] case result: Pass
        [basic_pipe::child2parent] case result: Pass
        [basic_pipe::parent2child] case result: Pass
        [basic_pipe::talking_to_myself] case result: Pass
parent2child: parent write, child read
[junan@arch1 test-all]$ ps -e|grep basic-pipe
  18866 pts/2    00:00:00 basic-pipe
[junan@arch1 test-all]$ kill -9 18866
[junan@arch1 test-all]$ ps -e|grep basic-pipe
[junan@arch1 test-all]$

4. pipe的進階使用

以上的幾段示例代碼說明瞭管道的一些基本使用方法和註意事項，下麵看一個使用管道和多進程生成質數的問題。我們的需求是這樣的，給定一個整數nmax，生成[2, nmax]區間上的所有質數，並且要求生成質數的核心邏輯使用管道和多進程。第一次碰到這個問題是在xv6操作系統的lab中，也是為了說明pipe和fork的使用。

看到這裡，不妨先稍微思考一下？一個簡單的想法可能是這樣的，首先有一個函數，其功能是判斷輸入的n是否是質數，接下來遍歷[2, nmax]上的整數，並且用之前的函數把質數都過濾出來，但問題是如何用管道和多進程實現這個函數的過濾功能呢？OK, 思考結束，來看看管道加多進程版本的質數生成器演算法思路：

prime-number-pipe

這個“質數篩子”中的每個進程主要有三個任務，1）從pipe1讀取第一個數據並列印出來，並且它一定是質數；2）用得到的質數過濾pipe1中的其他數據，並把過濾出來的數據寫入pipe2；3）fork自己的子進程，並把pipe2傳遞給它；具體的代碼實現如下，當過濾之後沒有數據時，就不會繼續創建子進程了。

void generate_primes(int pipe1[2]) {
    close(pipe1[1]);

    int prime = 0;
    int err = read(pipe1[0], &prime, sizeof(int));
    if (err <= 0) {
        close(pipe1[0]);
        return;
    }
    printf("%d\n", prime);

    int pipe2[2];
    pipe(pipe2);

    pid_t pid = fork();
    if (pid == 0) {
        generate_primes(pipe2);
    } else {
        int num = 0;
        while ((err = read(pipe1[0], &num, sizeof(int))) > 0) {
            if (num % prime) {
                write(pipe2[1], &num, sizeof(int));
            }
        }
    }

    close(pipe1[0]);
    close(pipe2[0]);
    close(pipe2[1]);
    exit(0);
}

CUTEST_SUIT(prime_numbers_pipe)

CUTEST_CASE(prime_numbers_pipe, prime_number_max30) {
    int nmax = 30;

    int pipe1[2];
    pipe(pipe1);

    for (int i = 2; i <= nmax; ++i)
        write(pipe1[1], &i, sizeof(int));

    if (fork() == 0) {
        generate_primes(pipe1);
    }

    close(pipe1[0]);
    close(pipe1[1]);
}

代碼中生成的是2到30區間上的質數，執行結果如下：

[junan@arch1 test-all]$ ./script/run_test.sh prime-number-pipe
cutest summary:
        [prime_numbers_pipe] suit result: 1/1
        [prime_numbers_pipe::prime_number_max30] case result: Pass
2
3
5
7
11
13
17
19
23
29
[junan@arch1 test-all]$

5. 寫在最後

管道是一種比較基礎和常用的進程間通信方法，在使用過程中需要註意及時關閉不再使用的文件描述符的問題，否則可能使得進程一直睡眠。文中的代碼示例可以在我的代碼倉庫中找到，有興趣的可以自己clone下來實際跑跑看。後續會繼續更新其他的IPC相關的文章，併在最後使用各種IPC方法實現一個小項目，有想法的歡迎在評論區冒泡。

6. 相關鏈接

本文來自博客園，作者：kfggww，轉載請註明原文鏈接：https://www.cnblogs.com/kfggww/p/17066291.html