Linux Kernel Study Notes (6) -- Process Priorities Explained (prio, static_prio, normal_prio, rt_priority)

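Linux uses two separate priority ranges: the nice value and the real-time priority. The previous post covered both only briefly and left a number of questions open, so this post goes through process priorities in detail. The kernel version used is Linux 2.6.34.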

Priority information for a process is stored in its process descriptor, task_struct:

struct task_struct {
        ...
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
        ...
};

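As you can see, there are four priority fields: prio, static_prio, normal_prio and rt_priority. The code that computes them is in kernel/sched.c. Before going through the four fields, here are the macro definitions they rely on: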

/* linux-kernel 2.6.34 /include/linux/sched.h */

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
 * values are inverted: lower p->prio value means higher priority.
 *
 * The MAX_USER_RT_PRIO value allows the actual maximum
 * RT priority to be separate from the value exported to
 * user-space.  This allows kernel threads to set their
 * priority to a value higher than any user task. Note:
 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
 */

#define MAX_USER_RT_PRIO     100
#define MAX_RT_PRIO          MAX_USER_RT_PRIO

#define MAX_PRIO            (MAX_RT_PRIO + 40)
#define DEFAULT_PRIO        (MAX_RT_PRIO + 20)    // default priority: the static priority that corresponds to nice 0
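To make the numbers concrete, here is a small user-space sketch that copies the macro values quoted above and splits the priority range in two. The helper names prio_in_rt_range and prio_in_normal_range are hypothetical and only serve as an illustration; they are not kernel functions.

```c
#include <stdbool.h>

/* Values copied from the macros quoted above (Linux 2.6.34). */
#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO
#define MAX_PRIO         (MAX_RT_PRIO + 40)   /* 140 */
#define DEFAULT_PRIO     (MAX_RT_PRIO + 20)   /* 120 */

/* Hypothetical helper: true if prio lies in the real-time range 0..99. */
bool prio_in_rt_range(int prio)
{
    return prio >= 0 && prio < MAX_RT_PRIO;
}

/* Hypothetical helper: true if prio lies in the non-real-time range 100..139. */
bool prio_in_normal_range(int prio)
{
    return prio >= MAX_RT_PRIO && prio < MAX_PRIO;
}
```

With these values, DEFAULT_PRIO is 120 and falls in the non-real-time range, while 140 (MAX_PRIO itself) is outside both ranges.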

1. prio: dynamic priority

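prio is the priority value the scheduler ultimately uses, i.e. the value it actually looks at when it picks a process. The smaller the prio value, the higher the priority. prio ranges from 0 to MAX_PRIO - 1, i.e. 0 to 139 (both inclusive). Depending on the scheduling policy, this splits into two intervals: 0 to 99 belongs to real-time processes and 100 to 139 to non-real-time processes. This is easier to show in code than in words, so here is the kernel source that determines prio: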

/* linux-kernel 2.6.34  /kernel/sched.c  */

#include "sched_idletask.c"
#include "sched_fair.c"
#include "sched_rt.c"
#ifdef CONFIG_SCHED_DEBUG
#include "sched_debug.c"
#endif

/*
 * __normal_prio - return the priority that is based on the static prio
 */
static inline int __normal_prio(struct task_struct *p)    // __normal_prio(): returns the static priority
{
    return p->static_prio;
}

/*
 * Calculate the expected normal priority: i.e. priority
 * without taking RT-inheritance into account. Might be
 * boosted by interactivity modifiers. Changes upon fork,
 * setprio syscalls, and whenever the interactivity
 * estimator recalculates.
 */
static inline int normal_prio(struct task_struct *p)    // normal_prio()
{
    int prio;

    if (task_has_rt_policy(p))                    // task_has_rt_policy(): returns 1 for a real-time process, 0 otherwise
        prio = MAX_RT_PRIO-1 - p->rt_priority;    // real-time process: prio is derived from the real-time priority
    else
        prio = __normal_prio(p);                  // non-real-time process: prio is the static priority, prio = p->static_prio
    return prio;
}

/*
 * Calculate the current priority, i.e. the priority
 * taken into account by the scheduler. This value might
 * be boosted by RT tasks, or might be boosted by
 * interactivity modifiers. Will be RT if the task got
 * RT-boosted. If not then it returns p->normal_prio.
 */
static int effective_prio(struct task_struct *p)    // effective_prio(): computes the effective priority, i.e. the prio value the scheduler finally uses
{
    p->normal_prio = normal_prio(p);                // recompute normal_prio
    /*
     * If we are RT tasks or we were boosted to RT priority,
     * keep the priority unchanged. Otherwise, update priority
     * to the normal priority:
     */
    if (!rt_prio(p->prio))
        return p->normal_prio;    // prio is outside the RT range (not RT, not boosted): return normal_prio, which equals static_prio here
    return p->prio;               // otherwise keep prio unchanged; for an unboosted RT task this is MAX_RT_PRIO-1 - p->rt_priority
} 

/*********************** function set_user_nice ***********************/
void set_user_nice(struct task_struct *p, long nice)
{
    ....
    p->prio = effective_prio(p);    // set_user_nice() calls effective_prio() to set the task's prio
    ....
}

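From the code above, and leaving aside temporary boosting into the RT range: for a real-time process, prio is computed from the real-time priority (rt_priority); for a non-real-time process, prio comes from the static priority (static_prio). That is:

prio = MAX_RT_PRIO - 1 - rt_priority    // real-time process

prio = static_prio                      // non-real-time process

Plugging in the ranges of rt_priority (0 to 99) and static_prio (100 to 139) shows that prio ranges from 0 to 139.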
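The two formulas can be tried out in user space with a minimal sketch. The struct demo_task and the demo_* functions below are hypothetical stand-ins for task_struct and the kernel functions quoted above, and the sketch assumes that rt_prio(prio) is true exactly when prio < MAX_RT_PRIO, which the listing does not show.

```c
#include <stdbool.h>

#define MAX_RT_PRIO 100

/* Hypothetical stand-in for the few task_struct fields used here. */
struct demo_task {
    bool has_rt_policy;        /* plays the role of task_has_rt_policy(p) */
    int prio;
    int static_prio;
    int normal_prio;
    unsigned int rt_priority;
};

/* Mirrors normal_prio(): RT tasks derive it from rt_priority, others from static_prio. */
int demo_normal_prio(const struct demo_task *p)
{
    if (p->has_rt_policy)
        return MAX_RT_PRIO - 1 - (int)p->rt_priority;
    return p->static_prio;
}

/* Mirrors effective_prio(): refresh normal_prio, then keep prio if it is already in the RT range. */
int demo_effective_prio(struct demo_task *p)
{
    p->normal_prio = demo_normal_prio(p);
    if (p->prio >= MAX_RT_PRIO)    /* assumed equivalent of !rt_prio(p->prio) */
        return p->normal_prio;
    return p->prio;
}
```

For a non-real-time task with static_prio 120 the result is 120; for a real-time task with rt_priority 50 whose prio is already 49, the result stays 49. A non-real-time task whose prio has been boosted to, say, 10 keeps 10 while its normal_prio is refreshed to static_prio.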

2. static_prio: static priority

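The static priority does not change over time, and the kernel never modifies it on its own initiative. It can only be changed through a system call such as nice, which updates static_prio as follows: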

/*
 * Convert user-nice values [ -20 ... 0 ... 19 ]
 * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
 * and back.
 */
#define NICE_TO_PRIO(nice)    (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)    ((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)        PRIO_TO_NICE((p)->static_prio)

/*
 * 'User priority' is the nice value converted to something we
 * can work with better when scaling various scheduler parameters,
 * it's a [ 0 ... 39 ] range.
 */
#define USER_PRIO(p)        ((p)-MAX_RT_PRIO)
#define TASK_USER_PRIO(p)    USER_PRIO((p)->static_prio)
#define MAX_USER_PRIO        (USER_PRIO(MAX_PRIO))

/********************* function set_user_nice *********************/
p->static_prio = NICE_TO_PRIO(nice);    // set_user_nice() uses NICE_TO_PRIO() to compute the new static_prio

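As the code shows, set_user_nice converts the nice value with NICE_TO_PRIO(nice) and stores the result in static_prio. The value is computed as:

static_prio = MAX_RT_PRIO + nice + 20

MAX_RT_PRIO is 100 and nice ranges from -20 to +19, so static_prio ranges from 100 to 139. The smaller the static_prio value, the higher the static priority of the process.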
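The conversion can be checked with a short user-space sketch. The two macros are copied from the listing above; the wrapper functions nice_to_static_prio and static_prio_to_nice are hypothetical, and clamping out-of-range nice values is a choice made for this illustration, not a claim about what the kernel does.

```c
#define MAX_RT_PRIO 100

/* Copied from the kernel macros quoted above. */
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)

/* Hypothetical helper: clamp nice to [-20, 19], then convert to a static priority. */
int nice_to_static_prio(int nice)
{
    if (nice < -20)
        nice = -20;    /* illustrative clamping only */
    if (nice > 19)
        nice = 19;
    return NICE_TO_PRIO(nice);
}

/* Hypothetical helper: convert a static priority back to its nice value. */
int static_prio_to_nice(int static_prio)
{
    return PRIO_TO_NICE(static_prio);
}
```

Here nice -20 maps to 100, nice 0 to 120 and nice 19 to 139, and converting 120 back gives nice 0.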

3. normal_prio: normalized priority

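The value of normal_prio depends on the static priority and the scheduling policy; it is set, for example, in the __setscheduler function. For a non-real-time process, normal_prio equals the static priority static_prio; for a real-time process, normal_prio = MAX_RT_PRIO-1 - p->rt_priority. The code is: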

static inline int normal_prio(struct task_struct *p)    // normal_prio()
{
    int prio;

    if (task_has_rt_policy(p))                    // task_has_rt_policy(): returns 1 for a real-time process, 0 otherwise
        prio = MAX_RT_PRIO-1 - p->rt_priority;    // real-time process: prio is derived from the real-time priority
    else
        prio = __normal_prio(p);                  // non-real-time process: prio is the static priority, prio = p->static_prio
    return prio;
}

4. rt_priority: real-time priority

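rt_priority ranges from 0 to 99 and is only meaningful for real-time processes. From the formula:

prio = MAX_RT_PRIO-1 - p->rt_priority;

it follows that the larger rt_priority is, the smaller prio becomes, so a larger real-time priority (rt_priority) means a higher process priority.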

rt_priority also depends on the scheduling policy; it is set in the __setscheduler function.
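How a policy change affects rt_priority and normal_prio can be modeled roughly as follows. This is a simplified user-space sketch, not the kernel's __setscheduler: the names sketch_task, sketch_policy and sketch_setscheduler are hypothetical, and setting prio equal to normal_prio assumes there is no priority-inheritance boost in effect.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/* Hypothetical policy tags standing in for the kernel's scheduling policies. */
enum sketch_policy { SKETCH_NORMAL, SKETCH_FIFO, SKETCH_RR };

/* Hypothetical stand-in for the priority fields of task_struct. */
struct sketch_task {
    enum sketch_policy policy;
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
};

/* Simplified model: record policy and rt_priority, then recompute the derived priorities. */
void sketch_setscheduler(struct sketch_task *t, enum sketch_policy policy,
                         unsigned int rt_priority)
{
    assert(rt_priority < MAX_RT_PRIO);    /* valid range 0..99 */

    t->policy = policy;
    t->rt_priority = rt_priority;

    if (policy == SKETCH_FIFO || policy == SKETCH_RR)
        t->normal_prio = MAX_RT_PRIO - 1 - (int)rt_priority;
    else
        t->normal_prio = t->static_prio;

    t->prio = t->normal_prio;    /* assumption: no priority-inheritance boost */
}
```

Switching a nice-0 task to the FIFO policy with rt_priority 99 gives prio 0, the highest priority; switching it back to the normal policy restores prio to its static_prio of 120.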