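You have probably agonized over whether to use kmalloc() or vmalloc(). You no longer need to: the kernel now has an API called kvmalloc(), which can be seen as kmalloc() and vmalloc() rolled into one, the Dragon-Slaying Saber and the Heaven-Reliant Sword fused into a single weapon.
A large amount of kernel code now uses kvmalloc(), for example: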
source/ipc/msg.c
static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
	struct msg_queue *msq;
	int retval;
	key_t key = params->key;
	int msgflg = params->flg;

	msq = kvmalloc(sizeof(*msq), GFP_KERNEL);
	if (unlikely(!msq))
		return -ENOMEM;
	...
}
In earlier kernels (for example v4.0-rc7/source/ipc/msg.c), the same code looked like this:
static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
	struct msg_queue *msq;
	int id, retval;
	key_t key = params->key;
	int msgflg = params->flg;

	msq = ipc_rcu_alloc(sizeof(*msq));
	if (!msq)
		return -ENOMEM;
	...
}
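It appears to allocate its memory with this function:
ipc_rcu_alloc(sizeof(*msq))
So what does ipc_rcu_alloc() do? In that kernel version it gets its memory from ipc_alloc(), which looks like this: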
void *ipc_alloc(int size)
{
	void *out;

	if (size > PAGE_SIZE)
		out = vmalloc(size);
	else
		out = kmalloc(size, GFP_KERNEL);
	return out;
}
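The logic is simple: use vmalloc() when the size is larger than one page, and kmalloc() when it is one page or less.
The kvmalloc() implementation handles the same kind of decision far more intelligently: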
void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
	gfp_t kmalloc_flags = flags;
	void *ret;

	/*
	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
	 * so the given set of flags has to be compatible.
	 */
	if ((flags & GFP_KERNEL) != GFP_KERNEL)
		return kmalloc_node(size, flags, node);

	/*
	 * We want to attempt a large physically contiguous block first because
	 * it is less likely to fragment multiple larger blocks and therefore
	 * contribute to a long term fragmentation less than vmalloc fallback.
	 * However make sure that larger requests are not too disruptive - no
	 * OOM killer and no allocation failure warnings as we have a fallback.
	 */
	if (size > PAGE_SIZE) {
		kmalloc_flags |= __GFP_NOWARN;

		if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
			kmalloc_flags |= __GFP_NORETRY;
	}

	ret = kmalloc_node(size, kmalloc_flags, node);

	/*
	 * It doesn't really make sense to fallback to vmalloc for sub page
	 * requests
	 */
	if (ret || size <= PAGE_SIZE)
		return ret;

	return __vmalloc_node_flags_caller(size, node, flags,
			__builtin_return_address(0));
}
EXPORT_SYMBOL(kvmalloc_node);

static inline void *kvmalloc(size_t size, gfp_t flags)
{
	return kvmalloc_node(size, flags, NUMA_NO_NODE);
}
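When the request is larger than one page, kvmalloc() first tries kmalloc() with __GFP_NOWARN and, unless the caller passed __GFP_RETRY_MAYFAIL, with __GFP_NORETRY as well. If that attempt fails, it falls back to vmalloc(). The NORETRY flag keeps kmalloc() from retrying over and over, or even triggering the OOM killer, just to satisfy the request.
When the request is one page or less, kvmalloc() keeps the old kmalloc() behavior: __GFP_NORETRY is not set, and even if kmalloc() fails after repeated attempts there is no fallback to vmalloc(), because vmalloc() is a poor fit for sub-page allocations. Likewise, if the caller's flags are not compatible with GFP_KERNEL, kvmalloc_node() simply calls kmalloc_node() and never falls back.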
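To make the branches above easier to follow, here is a small user-space model of the decision flow. It is only a sketch and not kernel code. The names model_kvmalloc, kv_outcome and the MODEL_* constants are hypothetical. The flag values are made up (the real GFP_KERNEL is a combination of several bits). The kmalloc_ok parameter simply stands in for whether the kmalloc attempt would succeed, and the model assumes the vmalloc fallback always succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel values; NOT the real definitions. */
#define MODEL_PAGE_SIZE         4096u
#define MODEL_GFP_KERNEL        0x1u	/* one bit standing in for GFP_KERNEL */
#define MODEL_GFP_NOWARN        0x2u
#define MODEL_GFP_NORETRY       0x4u
#define MODEL_GFP_RETRY_MAYFAIL 0x8u

enum kv_result { KV_KMALLOC, KV_VMALLOC, KV_FAILED };

struct kv_outcome {
	enum kv_result result;		/* which allocator served the request */
	unsigned int kmalloc_flags;	/* flags the kmalloc attempt was made with */
};

/*
 * Model of the kvmalloc_node() decision flow shown above.
 * kmalloc_ok: assumed outcome of the kmalloc attempt.
 * Assumption: the vmalloc fallback, when reached, always succeeds.
 */
static struct kv_outcome model_kvmalloc(size_t size, unsigned int flags,
					bool kmalloc_ok)
{
	struct kv_outcome out = { KV_FAILED, flags };

	/* Flags not GFP_KERNEL-compatible: kmalloc only, no fallback. */
	if ((flags & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL) {
		if (kmalloc_ok)
			out.result = KV_KMALLOC;
		return out;
	}

	/* Large request: make the kmalloc attempt quiet and cheap. */
	if (size > MODEL_PAGE_SIZE) {
		out.kmalloc_flags |= MODEL_GFP_NOWARN;
		if (!(out.kmalloc_flags & MODEL_GFP_RETRY_MAYFAIL))
			out.kmalloc_flags |= MODEL_GFP_NORETRY;
	}

	if (kmalloc_ok) {
		out.result = KV_KMALLOC;
		return out;
	}

	/* Sub-page requests never fall back to vmalloc. */
	if (size <= MODEL_PAGE_SIZE)
		return out;

	out.result = KV_VMALLOC;
	return out;
}
```

Calling model_kvmalloc(8192, MODEL_GFP_KERNEL, false) in this model ends in the vmalloc fallback with the NOWARN and NORETRY bits set on the kmalloc attempt, while model_kvmalloc(100, MODEL_GFP_KERNEL, false) simply fails with the flags unchanged.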