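This article explains the concepts behind priority inversion and the common ways to resolve it, and investigates how several kinds of locks on the iOS platform behave. With a deeper understanding, unnecessary priority inversions can be avoided, which in turn helps prevent hangs.
Author: Cui Xiaobing (崔曉兵)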
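Starting from a production issue
Recently we saw a number of hangs in production inside [HMDConfigManager remoteConfigWithAppID:].
Initial analysis
The main-thread stack showed that the lock involved was a read-write lock.
We then looked at the background threads holding the lock. They were in all sorts of states and almost all of them were executing normally: some were opening a file, some were in read, and some were running NSUserDefaults methods. What the problematic threads had in common was the QOS:BACKGROUND tag. Overall, the threads holding the lock were still making progress, just not fast enough for the main thread. Why would these threads take so long while holding the lock that the main thread reached the 8 s hang threshold? Either the work really is that slow, or a priority inversion occurred.
Solutions
In this case, the low-priority thread holding the read-write lock was not scheduled for a long time (or was preempted again once scheduled, or was scheduled too late), while the high-priority thread could not acquire the lock and stayed blocked. No lock cycle is involved, so this is not a deadlock in the strict sense, but the visible effect is the same. iOS 8 introduced QualityOfService, a concept similar to thread priority. Depending on the QualityOfService value, the system allocates different amounts of CPU time, network resources, and disk resources, so this value can be used to set the priority of a queue.
Option 1: Remove the priority setting on NSOperationQueue
In the Threading Programming Guide, Apple gives this advice: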
Important: It is generally a good idea to leave the priorities of your threads at their default values. Increasing the priorities of some threads also increases the likelihood of starvation among lower-priority threads. If your application contains high-priority and low-priority threads that must interact with each other, the starvation of lower-priority threads may block other threads and create performance bottlenecks.
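Apple's advice is not to change thread priorities casually, especially when high- and low-priority threads compete for the same critical resource. Deleting the code that sets the priority is therefore enough to resolve the problem.
Option 2: Temporarily change the thread priority
The pthread_rwlock_rdlock(3pthread) manual page contains this note: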
Realtime applications may encounter priority inversion when using read-write locks. The problem occurs when a high priority thread “locks” a read-write lock that is about to be “unlocked” by a low priority thread, but the low priority thread is preempted by a medium priority thread. This scenario leads to priority inversion; a high priority thread is blocked by lower priority threads for an unlimited period of time. During system design, realtime programmers must take into account the possibility of this kind of priority inversion. They can deal with it in a number of ways, such as by having critical sections that are guarded by read-write locks execute at a high priority, so that a thread cannot be preempted while executing in its critical section.
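Although this is written for real-time systems, it is still instructive. Following the hint, we changed the problematic code: once the thread has acquired _rwlock through pthread_rwlock_wrlock, its priority is temporarily raised, and after _rwlock is released the original priority is restored.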
- (id)remoteConfigWithAppID:(NSString *)appID
{
    .......
    pthread_rwlock_rdlock(&_rwlock);
    HMDHeimdallrConfig *result = ....... // get existing config
    pthread_rwlock_unlock(&_rwlock);
    if (result == nil) {
        result = [[HMDHeimdallrConfig alloc] init]; // make a new config
        pthread_rwlock_wrlock(&_rwlock);
        qos_class_t oldQos = qos_class_self();
        BOOL needRecover = NO;
        // Temporarily raise the thread priority while the write lock is held
        if (_enablePriorityInversionProtection && oldQos < QOS_CLASS_USER_INTERACTIVE) {
            int ret = pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
            needRecover = (ret == 0);
        }
        ......
        pthread_rwlock_unlock(&_rwlock);
        // Restore the original thread priority
        if (_enablePriorityInversionProtection && needRecover) {
            pthread_set_qos_class_self_np(oldQos, 0);
        }
    }
    return result;
}
Note that only the pthread API can be used here; the API provided by NSThread does not work.
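Demo verification
To check whether manually adjusting the thread priority actually helps, we ran a local experiment with a demo. It defines 2000 operations (to keep the CPU busy) with priority NSQualityOfServiceUserInitiated, and lowers the priority of every operation whose index is divisible by 100 to NSQualityOfServiceBackground. Each operation runs the same time-consuming task, and the elapsed time is recorded for the 20 selected operations.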
for (int j = 0; j < 2000; ++j) {
    // Note: despite its name, each "operation" is its own serial NSOperationQueue
    NSOperationQueue *operation = [[NSOperationQueue alloc] init];
    operation.maxConcurrentOperationCount = 1;
    operation.qualityOfService = NSQualityOfServiceUserInitiated;
    // Module 1 (begin)
    // if (j % 100 == 0) {
    //     operation.qualityOfService = NSQualityOfServiceBackground;
    // }
    // Module 1 (end)
    [operation addOperationWithBlock:^{
        // Module 2 (begin)
        // qos_class_t oldQos = qos_class_self();
        // pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
        // Module 2 (end)
        NSTimeInterval start = CFAbsoluteTimeGetCurrent();
        double sum = 0;
        for (int i = 0; i < 100000; ++i) {
            sum += sin(i) + cos(i) + sin(i*2) + cos(i*2);
        }
        start = CFAbsoluteTimeGetCurrent() - start;
        if (j % 100 == 0) {
            printf("%.8f\n", start * 1000); // elapsed time in milliseconds
        }
        // Module 2 (begin)
        // pthread_set_qos_class_self_np(oldQos, 0);
        // Module 2 (end)
    }];
}
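The measurements are shown in the table below (average time per selected task, in milliseconds):
A | B | C |
---|---|---|
(Modules 1 and 2 commented out) | (only Module 1 enabled) | (Modules 1 and 2 both enabled) |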
11.8190561 | 94.70210189 | 15.04005137 |
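We can see that:
- In the normal case, the average time per task is 11.8190561 ms;
- When an operation is set to low priority, its time rises sharply to 94.70210189 ms;
- When an operation is set to low priority and its original priority is manually restored inside the block, its time drops back to 15.04005137 ms (still higher than the normal case; readers may want to think about why).
The demo shows that manually adjusting the priority greatly reduces the overall time of the low-priority tasks, so while such a task holds a lock it blocks the main thread for less time.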
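Results in production
The fix was verified in two stages:
- Stage 1, marked by the first red box: starting on March 6, version 19.7 showed a considerable drop. The main reason is that the queue being waited on in the stacks changed from QOS:BACKGROUND to com.apple.root.default-qos, meaning the queue priority rose from QOS_CLASS_BACKGROUND to QOS_CLASS_DEFAULT. This is equivalent to applying Option 1 and using the default priority.
- Stage 2, marked by the second red box: verification started on April 24 in version 20.3. So far the effect is not obvious. One likely reason is that the demo raises the priority from QOS_CLASS_BACKGROUND to QOS_CLASS_USER_INITIATED, whereas in production the queue priority is raised from the default QOS_CLASS_DEFAULT to QOS_CLASS_USER_INITIATED, so the gain in production is comparatively limited. For reference, the Mach-level priority is 4 for QOS_CLASS_BACKGROUND, 31 for QOS_CLASS_DEFAULT, and 37 for QOS_CLASS_USER_INITIATED.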
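A deeper look at priority inversion
Does every lock need the holder's priority to be raised manually, as above? Does the system adjust thread priorities automatically? If such a mechanism exists, does it cover all locks? Answering these questions requires a solid understanding of priority inversion.
What is priority inversion?
Priority inversion occurs when a synchronization resource is held by a lower-priority process/thread, a higher-priority process/thread contends for that resource and fails to acquire it, and the higher-priority process/thread therefore ends up being scheduled later. Depending on the kind of blocking, priority inversion is divided into bounded priority inversion and unbounded priority inversion. The figures from "Introduction to RTOS - Solution to Part 11 (Priority Inversion)" are used here for illustration.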
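Bounded priority inversion
As the figure shows, the high-priority task (Task H) is blocked by the low-priority task (Task L) that holds the lock. Because the blocking time depends on how long the low-priority task stays in the critical section (how long it holds the lock), this is called bounded priority inversion. As long as Task L holds the lock, Task H stays blocked; the low-priority task runs ahead of the high-priority one, so the priorities are inverted.
A task here can also be read as a thread.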
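Unbounded priority inversion
While Task L holds the lock, if a medium-priority task (Task M) preempts Task L, the bounded inversion becomes unbounded: once Task M has taken the CPU from Task L, Task H can be blocked for an arbitrarily long time (and there may be more than one Task M).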
Common solutions to priority inversion
There are currently two ways to deal with unbounded priority inversion: the priority ceiling protocol and priority inheritance.
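Priority ceiling protocol
In the priority ceiling scheme, the system associates each critical resource with a ceiling priority. When a task enters the critical section, the system passes the ceiling priority to that task, so that it has the highest priority among the tasks that use the resource. When the task leaves the critical section, the system immediately restores its normal priority. This guarantees that unbounded priority inversion cannot occur. The ceiling value is the highest priority among all tasks that need the resource.
As the figure shows, the lock's ceiling priority is 3. While Task L holds the lock, its priority is raised to 3, the same as Task H. This prevents Task M (priority 2) from running until Task L and Task H no longer need the lock.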
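The ceiling rule is simple enough to sketch in a few lines of C. This is only an illustration of the description above, not code from any RTOS: the names lock_ceiling, running_priority, and LOCK_USER_PRIORITIES are hypothetical, and larger numbers are assumed to mean higher priority, as in the figure (Task L = 1, Task M = 2, Task H = 3).

```c
#include <stddef.h>

/* Hypothetical example data: base priorities of the tasks that may take
 * the lock (Task L = 1 and Task H = 3 in the figure). */
const int LOCK_USER_PRIORITIES[] = {1, 3};

/* The ceiling of a lock is the highest base priority among all tasks
 * that may ever acquire it. */
int lock_ceiling(const int *user_priorities, size_t count) {
    int ceiling = 0;
    for (size_t i = 0; i < count; ++i) {
        if (user_priorities[i] > ceiling) {
            ceiling = user_priorities[i];
        }
    }
    return ceiling;
}

/* Priority a task runs at: the ceiling while it holds the lock,
 * its own base priority otherwise. */
int running_priority(int base_priority, int ceiling, int holds_lock) {
    if (holds_lock && ceiling > base_priority) {
        return ceiling;
    }
    return base_priority;
}
```

With the lock shared by Task L and Task H, lock_ceiling yields 3, so Task L runs at 3 while it holds the lock and Task M (priority 2) cannot preempt it.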
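Priority inheritance
In the priority inheritance scheme, the rough idea is this: when a high-priority task tries to acquire a lock that happens to be held by a low-priority task, the priority of the high-priority thread is temporarily transferred to the low-priority thread holding the lock, so that it can run sooner and release the resource. Once the resource is released, its original priority is restored.
Both the priority ceiling protocol and priority inheritance restore the low-priority task's priority when the lock is released. Note also that these two methods only prevent unbounded priority inversion. They cannot prevent bounded priority inversion (Task H must wait until Task L has finished its critical section, and this inversion is unavoidable).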
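The difference between the unbounded case and priority inheritance can be reproduced with a toy, tick-based scheduler. The sketch below is a heavily simplified model, not how any real kernel schedules: the task set (arrival times and work lengths), the function name simulate_h_finish, and the rule that a locking task holds the lock for its whole body are all assumptions made for illustration.

```c
#include <stdbool.h>

enum { TASK_L, TASK_M, TASK_H, TASK_COUNT };

typedef struct {
    int priority;    /* base priority, larger means more urgent */
    int arrival;     /* tick at which the task becomes ready */
    int remaining;   /* ticks of CPU work left */
    bool needs_lock; /* the whole body is one critical section */
} Task;

/* Runs the hypothetical task set and returns the tick at which Task H
 * completes, or -1 if it never does within the simulated window. */
int simulate_h_finish(bool inherit) {
    Task tasks[TASK_COUNT] = {
        [TASK_L] = {1, 0, 4, true},
        [TASK_M] = {2, 2, 5, false},
        [TASK_H] = {3, 1, 2, true},
    };
    int owner = -1; /* index of the lock holder, -1 when the lock is free */

    for (int tick = 0; tick < 100; ++tick) {
        int best = -1;
        int best_priority = -1;
        for (int i = 0; i < TASK_COUNT; ++i) {
            Task *t = &tasks[i];
            if (t->arrival > tick || t->remaining == 0) continue;
            /* A task that needs the lock is blocked while another task owns it. */
            if (t->needs_lock && owner != -1 && owner != i) continue;
            int priority = t->priority;
            if (inherit && owner == i) {
                /* The owner inherits the highest priority among its waiters. */
                for (int w = 0; w < TASK_COUNT; ++w) {
                    const Task *waiter = &tasks[w];
                    if (w != i && waiter->needs_lock && waiter->arrival <= tick &&
                        waiter->remaining > 0 && waiter->priority > priority) {
                        priority = waiter->priority;
                    }
                }
            }
            if (priority > best_priority) {
                best = i;
                best_priority = priority;
            }
        }
        if (best < 0) continue;                     /* idle tick */
        if (tasks[best].needs_lock) owner = best;   /* acquire on first run */
        if (--tasks[best].remaining == 0) {
            if (owner == best) owner = -1;          /* release on completion */
            if (best == TASK_H) return tick + 1;
        }
    }
    return -1;
}
```

In this model Task H finishes at tick 11 without inheritance, because Task M runs for five ticks while Task L still holds the lock, and at tick 6 with inheritance. The remaining delay is the bounded part: Task H still has to wait for the rest of Task L's critical section.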
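Bounded priority inversion can be avoided or shifted elsewhere in the following ways:
- Reduce the execution time of critical sections, which shortens the bounded inversion;
- Avoid critical resources that can block high-priority tasks;
- Use a dedicated queue to manage the resource instead of a lock.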
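Priority inheritance must be transitive. For example, suppose T1 is blocked on a resource held by T2, and T2 is in turn blocked on a resource held by T3. If T1's priority is higher than that of T2 and T3, T3 must inherit T1's priority through T2. Otherwise, another thread T4, whose priority is higher than T2 and T3 but lower than T1, would preempt T3 and cause a priority inversion with respect to T1. The priority a thread inherits must therefore be the highest priority among all threads it blocks, directly or indirectly.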
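The transitive rule can be written down as a small graph walk. The sketch below is illustrative only: the Thread struct, the effective_priority function, and the EXAMPLE_THREADS table (T1 = 10, T2 = 2, T3 = 1, T4 = 5) are hypothetical, and each thread is assumed to wait on at most one owner at a time.

```c
#define NO_THREAD (-1)

typedef struct {
    int base_priority; /* larger means more urgent */
    int blocked_on;    /* index of the thread owning the resource this
                          thread waits for, or NO_THREAD */
} Thread;

/* Hypothetical example from the text: T1 waits on T2, T2 waits on T3,
 * and T4 is an unrelated medium-priority thread. */
const Thread EXAMPLE_THREADS[4] = {
    {10, 1},         /* T1, blocked on T2 */
    {2, 2},          /* T2, blocked on T3 */
    {1, NO_THREAD},  /* T3 */
    {5, NO_THREAD},  /* T4 */
};

/* Effective priority of `target`: the maximum of its own base priority and
 * the base priority of every thread blocked on it, directly or indirectly. */
int effective_priority(const Thread *threads, int count, int target) {
    int priority = threads[target].base_priority;
    for (int i = 0; i < count; ++i) {
        int cur = threads[i].blocked_on;
        /* Walk thread i's chain of owners; the hop limit guards against cycles. */
        for (int hops = 0; cur != NO_THREAD && hops < count; ++hops) {
            if (cur == target) {
                if (threads[i].base_priority > priority) {
                    priority = threads[i].base_priority;
                }
                break;
            }
            cur = threads[cur].blocked_on;
        }
    }
    return priority;
}
```

Here T3 ends up with effective priority 10, inherited from T1 through T2, which is above T4's 5. If only direct waiters were considered, T3 would receive T2's priority of 2 and T4 could preempt it.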
How to avoid priority inversion
QoS propagation
iOS mainly uses the following two mechanisms to propagate QoS between threads (or queues):
- Mechanism 1: dispatch_async
  - dispatch_async() automatically propagates the QoS from the calling thread, though it will translate User Interactive to User Initiated to avoid assigning that priority to non-main threads.
  - Captured at time of block submission, translate user interactive to user initiated. Used if destination queue does not have a QoS and does not lower the QoS (ex dispatch_async back to the main thread)
- Mechanism 2: XPC-based inter-process communication (IPC)
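The system's QoS propagation rules are fairly complex. They mainly take the following information into account:
- The QoS of the current thread;
- For a dispatch_block created with dispatch_block_create(), the arguments passed when the block was created;
- The QoS of the target queue or thread of the dispatch_async or IPC.
The scheduler uses this information to decide at what priority a block runs.
- If no other thread is waiting synchronously on the block, the block runs at the priority described above.
- If threads are waiting on each other synchronously, the scheduler adjusts the running priority of the threads as the situation requires.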
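How is the inversion-avoidance mechanism triggered?
If the current thread is blocked waiting for an operation (say block1) running on another thread (thread 1), and the system knows the thread on which block1 runs (the owner), the system resolves the priority inversion by raising the priority of the relevant thread. Conversely, if the system does not know the thread on which block1 runs, it cannot tell whose priority to raise and therefore cannot resolve the inversion.
The system APIs that record owner information are:
- pthread mutex, os_unfair_lock, and the higher-level APIs built on these two
  - dispatch_once is implemented on top of os_unfair_lock
  - NSLock, NSRecursiveLock, @synchronized, and others are implemented on top of pthread mutex
- dispatch_sync, dispatch_wait
- xpc_connection_send_with_message_sync
Using these APIs lets the system activate its inversion-avoidance mechanism when a priority inversion occurs.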
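Verifying the basic APIs
Next we verify the basic system APIs mentioned above.
Test environment: iOS 15.2 simulator
pthread mutex
The data structure behind pthread mutex, pthread_mutex_s, has an m_tid field dedicated to recording the ID of the thread that holds the lock.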
// types_internal.h
struct pthread_mutex_s {
    long sig;
    _pthread_lock lock;
    union {
        uint32_t value;
        struct pthread_mutex_options_s options;
    } mtxopts;
    int16_t prioceiling;
    int16_t priority;
#if defined(__LP64__)
    uint32_t _pad;
#endif
    union {
        struct {
            uint32_t m_tid[2]; // thread id of thread that has mutex locked
            uint32_t m_seq[2]; // mutex sequence id
            uint32_t m_mis[2]; // for misaligned locks m_tid/m_seq will span into here
        } psynch;
        struct _pthread_mutex_ulock_s ulock;
    };
#if defined(__LP64__)
    uint32_t _reserved[4];
#else
    uint32_t _reserved[1];
#endif
};
Let's verify in code whether the thread priority is actually boosted.
#include <mach/mach.h>
#include <stdio.h>
// printThreadPriority prints the priority information of the current thread
void printThreadPriority(void) {
    thread_t cur_thread = mach_thread_self();
    mach_msg_type_number_t thread_info_count = THREAD_INFO_MAX;
    thread_info_data_t thinfo;
    kern_return_t kr = thread_info(cur_thread, THREAD_EXTENDED_INFO, (thread_info_t)thinfo, &thread_info_count);
    // mach_thread_self() returns a port right that must be released, but only after its last use
    mach_port_deallocate(mach_task_self(), cur_thread);
    if (kr != KERN_SUCCESS) {
        return;
    }
    thread_extended_info_t extend_info = (thread_extended_info_t)thinfo;
    printf("pth_priority: %d, pth_curpri: %d, pth_maxpriority: %d\n", extend_info->pth_priority, extend_info->pth_curpri, extend_info->pth_maxpriority);
}
First a background thread takes the lock and sleeps, then the main thread requests the same lock:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    pthread_mutex_lock(&_lock); // make sure the background queue acquires the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        pthread_mutex_lock(&_lock);
        printf("in main lock\n");
        pthread_mutex_unlock(&_lock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    pthread_mutex_unlock(&_lock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock
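As shown, the low-priority background thread acquires the lock first, at priority 4. Once the main thread requests the lock, the background thread's priority is boosted to 47.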
os_unfair_lock
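os_unfair_lock replaces OSSpinLock and solves its priority inversion problem. A thread waiting on an os_unfair_lock goes to sleep, switching from user mode to kernel mode, rather than busy-waiting. os_unfair_lock stores the owner's thread ID inside the lock, and a waiter donates its own priority to the owner, which avoids priority inversion. Let's verify: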
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    os_unfair_lock_lock(&_unfair_lock); // make sure the background queue acquires the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        os_unfair_lock_lock(&_unfair_lock);
        printf("in main lock\n");
        os_unfair_lock_unlock(&_unfair_lock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    os_unfair_lock_unlock(&_unfair_lock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock
The result matches that of pthread mutex.
pthread_rwlock_t
The pthread_rwlock_init documentation contains the following caveat:
Caveats: Beware of priority inversion when using read-write locks. A high-priority thread may be blocked waiting on a read-write lock locked by a low-priority thread. The microkernel has no knowledge of read-write locks, and therefore can’t boost the low-priority thread to prevent the priority inversion.
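In short, the kernel is unaware of read-write locks and cannot boost the low-priority thread, so priority inversion cannot be avoided. Looking at the definition, however, pthread_rwlock_s contains an rw_tid field dedicated to recording the thread that holds the write lock. This raises a question: why can pthread_rwlock_s not avoid priority inversion even though it has owner information?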
struct pthread_rwlock_s {
    long sig;
    _pthread_lock lock;
    uint32_t
        unused:29,
        misalign:1,
        pshared:2;
    uint32_t rw_flags;
#if defined(__LP64__)
    uint32_t _pad;
#endif
    uint32_t rw_tid[2]; // thread id of thread that has exclusive (write) lock
    uint32_t rw_seq[4]; // rw sequence id (at 128-bit aligned boundary)
    uint32_t rw_mis[4]; // for misaligned locks rw_seq will span into here
#if defined(__LP64__)
    uint32_t _reserved[34];
#else
    uint32_t _reserved[18];
#endif
};
A comment at https://news.ycombinator.com/item?id=21751269 mentions:
xnu supports priority inheritance through “turnstiles”, a kernel-internal mechanism which is used by default by a number of locking primitives (list at [1]), including normal pthread mutexes (though not read-write locks [2]), as well as the os_unfair_lock API (via the ulock syscalls). With pthread mutexes, you can actually explicitly request priority inheritance by calling pthread_mutexattr_setprotocol [3] with PTHREAD_PRIO_INHERIT; the Apple implementation supports it, but currently ignores the protocol setting and just gives all mutexes priority inheritance.
In short: XNU implements priority inheritance with a kernel mechanism called turnstiles, and this mechanism is used by pthread mutex and os_unfair_lock.
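On a POSIX system, explicitly requesting priority inheritance for a mutex looks roughly like the sketch below. pthread_mutexattr_setprotocol and PTHREAD_PRIO_INHERIT are standard POSIX; the three helper names are hypothetical. According to the comment quoted above, Apple's implementation accepts the call but gives every mutex priority inheritance regardless, so on iOS this mostly documents intent. Whether an implementation supports the protocol at all is platform dependent, which is why every return code is checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Prepares `attr` so that mutexes created from it request priority
 * inheritance. Returns 0 on success or an errno-style error code. */
int make_priority_inheriting_attr(pthread_mutexattr_t *attr) {
    int rc = pthread_mutexattr_init(attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_setprotocol(attr, PTHREAD_PRIO_INHERIT);
    if (rc != 0) {
        pthread_mutexattr_destroy(attr);
    }
    return rc;
}

/* Reads back the protocol stored in `attr`, or returns -1 on failure. */
int attr_protocol(const pthread_mutexattr_t *attr) {
    int protocol = -1;
    if (pthread_mutexattr_getprotocol(attr, &protocol) != 0) {
        return -1;
    }
    return protocol;
}

/* Initializes `mutex` with the priority-inheritance protocol requested.
 * Returns 0 on success or an errno-style error code. */
int init_priority_inheriting_mutex(pthread_mutex_t *mutex) {
    pthread_mutexattr_t attr;
    int rc = make_priority_inheriting_attr(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A caller would use init_priority_inheriting_mutex in place of pthread_mutex_init(mutex, NULL) and then lock and unlock as usual; nothing changes at the call sites of pthread_mutex_lock.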
Following this lead, we found the call to _kwq_use_turnstile in ksyn_wait. The comment there is rather cautious about read-write locks and adds "at least sometimes":
pthread mutexes and rwlocks both (at least sometimes) know their owner and can use turnstiles. Otherwise, we pass NULL as the tstore to the shims so they wait on the global waitq.
// libpthread/kern/kern_synch.c
int
ksyn_wait(ksyn_wait_queue_t kwq, kwq_queue_type_t kqi, uint32_t lockseq,
        int fit, uint64_t abstime, uint16_t kwe_flags,
        thread_continue_t continuation, block_hint_t block_hint)
{
    thread_t th = current_thread();
    uthread_t uth = pthread_kern->get_bsdthread_info(th);
    struct turnstile **tstore = NULL;
    int res;
    assert(continuation != THREAD_CONTINUE_NULL);
    ksyn_waitq_element_t kwe = pthread_kern->uthread_get_uukwe(uth);
    bzero(kwe, sizeof(*kwe));
    kwe->kwe_count = 1;
    kwe->kwe_lockseq = lockseq & PTHRW_COUNT_MASK;
    kwe->kwe_state = KWE_THREAD_INWAIT;
    kwe->kwe_uth = uth;
    kwe->kwe_thread = th;
    kwe->kwe_flags = kwe_flags;
    res = ksyn_queue_insert(kwq, kqi, kwe, lockseq, fit);
    if (res != 0) {
        //panic("psynch_rw_wrlock: failed to enqueue\n"); // XXX ksyn_wqunlock(kwq);
        return res;
    }
    PTHREAD_TRACE(psynch_mutex_kwqwait, kwq->kw_addr, kwq->kw_inqueue,
            kwq->kw_prepost.count, kwq->kw_intr.count);
    if (_kwq_use_turnstile(kwq)) {
        // pthread mutexes and rwlocks both (at least sometimes) know their
        // owner and can use turnstiles. Otherwise, we pass NULL as the
        // tstore to the shims so they wait on the global waitq.
        tstore = &kwq->kw_turnstile;
    }
    ......
}
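The definition of _kwq_use_turnstile is unambiguous: a turnstile is used for priority inversion protection only when the type is KSYN_WQTYPE_MTX. A read-write lock has type KSYN_WQTYPE_RWLOCK, so it never gets a turnstile and therefore cannot avoid priority inversion.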
#define KSYN_WQTYPE_MTX 0x01
#define KSYN_WQTYPE_CVAR 0x02
#define KSYN_WQTYPE_RWLOCK 0x04
#define KSYN_WQTYPE_SEMA 0x08
static inline bool
_kwq_use_turnstile(ksyn_wait_queue_t kwq)
{
    // If we had writer-owner information from the
    // rwlock then we could use the turnstile to push on it. For now, only
    // plain mutexes use it.
    return (_kwq_type(kwq) == KSYN_WQTYPE_MTX);
}
_pthread_find_owner likewise shows that the owner of a read-write lock is reported as 0:
void
_pthread_find_owner(thread_t thread,
        struct stackshot_thread_waitinfo * waitinfo)
{
    ksyn_wait_queue_t kwq = _pthread_get_thread_kwq(thread);
    switch (waitinfo->wait_type) {
    case kThreadWaitPThreadMutex:
        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_MTX);
        waitinfo->owner = thread_tid(kwq->kw_owner);
        waitinfo->context = kwq->kw_addr;
        break;
    /* Owner of rwlock not stored in kernel space due to races. Punt
     * and hope that the userspace address is helpful enough. */
    case kThreadWaitPThreadRWLockRead:
    case kThreadWaitPThreadRWLockWrite:
        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_RWLOCK);
        waitinfo->owner = 0;
        waitinfo->context = kwq->kw_addr;
        break;
    /* Condvars don't have owners, so just give the userspace address. */
    case kThreadWaitPThreadCondVar:
        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_CVAR);
        waitinfo->owner = 0;
        waitinfo->context = kwq->kw_addr;
        break;
    case kThreadWaitNone:
    default:
        waitinfo->owner = 0;
        waitinfo->context = 0;
        break;
    }
}
Now swap the lock for a read-write lock to check whether the theory above holds:
pthread_rwlock_init(&_rwlock, NULL);
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    pthread_rwlock_rdlock(&_rwlock); // make sure the background queue acquires the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        pthread_rwlock_wrlock(&_rwlock);
        printf("in main lock\n");
        pthread_rwlock_unlock(&_rwlock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    pthread_rwlock_unlock(&_rwlock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue unlock
queue after unlock
in main lock
after main unlock
As shown, the read-write lock does not trigger a priority boost.
dispatch_sync
Everyone is familiar with this API, so let's verify it directly:
// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printThreadPriority();
dispatch_async(_queue, ^{
    printf("dispatch_async before dispatch_sync : \n");
    printThreadPriority();
});
dispatch_sync(_queue, ^{
    printf("dispatch_sync: \n");
    printThreadPriority();
});
dispatch_async(_queue, ^{
    printf("dispatch_async after dispatch_sync: \n");
    printThreadPriority();
});
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async before dispatch_sync :
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_sync:
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async after dispatch_sync:
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
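_queue is a low-priority queue (QOS_CLASS_BACKGROUND). The task submitted with dispatch_sync, as well as the task submitted earlier with dispatch_async, are both boosted to the higher priority 47 (the same as the main thread), while the task from the last dispatch_async runs at priority 4.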
dispatch_wait
// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printf("main thread\n");
printThreadPriority();
dispatch_block_t block = dispatch_block_create(DISPATCH_BLOCK_INHERIT_QOS_CLASS, ^{
    printf("sub thread\n");
    sleep(2);
    printThreadPriority();
});
dispatch_async(_queue, block);
dispatch_wait(block, DISPATCH_TIME_FOREVER);
_queue is a low-priority queue (QOS_CLASS_BACKGROUND). When the main thread waits with dispatch_wait, the output is as follows, and the low-priority task is boosted to priority 47:
main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
If dispatch_wait(block, DISPATCH_TIME_FOREVER) is commented out, the output is:
main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
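Note that dispatch_wait is a macro (a C11 generic selection), or otherwise an entry-point function. It accepts three types of argument: dispatch_block_t, dispatch_group_t, and dispatch_semaphore_t. What is meant here, however, is dispatch_block_wait specifically: only dispatch_block_wait adjusts priority to avoid priority inversion.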
intptr_t
dispatch_wait(void *object, dispatch_time_t timeout);
#if __has_extension(c_generic_selections)
#define dispatch_wait(object, timeout) \
        _Generic((object), \
            dispatch_block_t:dispatch_block_wait, \
            dispatch_group_t:dispatch_group_wait, \
            dispatch_semaphore_t:dispatch_semaphore_wait \
        )((object),(timeout))
#endif
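The type-based selection can be tried outside of libdispatch with a few stand-in types. Everything prefixed demo_ or DEMO_ below is hypothetical and only mimics the shape of the real macro; the _Generic mechanism itself is standard C11. Each stand-in function simply reports which variant was selected.

```c
/* Three distinct stand-in object types (hypothetical). */
typedef struct { int id; } demo_block_t;
typedef struct { int id; } demo_group_t;
typedef struct { int id; } demo_semaphore_t;

enum demo_wait_kind {
    DEMO_WAIT_BLOCK = 1,
    DEMO_WAIT_GROUP,
    DEMO_WAIT_SEMAPHORE
};

/* Each variant only reports its own identity in this sketch. */
enum demo_wait_kind demo_block_wait(demo_block_t *object) {
    (void)object;
    return DEMO_WAIT_BLOCK;
}

enum demo_wait_kind demo_group_wait(demo_group_t *object) {
    (void)object;
    return DEMO_WAIT_GROUP;
}

enum demo_wait_kind demo_semaphore_wait(demo_semaphore_t *object) {
    (void)object;
    return DEMO_WAIT_SEMAPHORE;
}

/* The compiler picks the function from the static type of `object`,
 * so the choice is made at compile time, not at run time. */
#define demo_wait(object) \
    _Generic((object), \
        demo_block_t *: demo_block_wait, \
        demo_group_t *: demo_group_wait, \
        demo_semaphore_t *: demo_semaphore_wait \
    )(object)
```

Because the selection depends only on the argument type, a call written as dispatch_wait on a group or a semaphore resolves to dispatch_group_wait or dispatch_semaphore_wait, which is why seeing dispatch_wait in code does not by itself imply the priority adjustment of dispatch_block_wait.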
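The mysterious semaphore
dispatch_semaphore
Our earlier understanding of dispatch_semaphore was shallow, and we often equated a binary semaphore with a mutex. Investigation shows, however, that dispatch_semaphore has no notion of QoS and does not record which thread currently holds the semaphore (the owner). When a high-priority thread is waiting, the kernel therefore cannot know which thread's scheduling priority (QoS) to raise. If the holder has a lower priority than other threads, the high-priority waiter may wait indefinitely. The article "Mutex vs Semaphore: What’s the Difference?" compares Mutex and Semaphore in detail.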
Semaphores are for signaling (sames a condition variables, events) while mutexes are for mutual exclusion. Technically, you can also use semaphores for mutual exclusion (a mutex can be thought as a binary semaphore) but you really shouldn’t.
Right, but libdispatch doesn’t have a mutex. It has semaphores and queues. So if you’re trying to use libdispatch and you don’t want the closure-based aspect of queues, you might be tempted to use a semaphore instead. Don’t do that, use os_unfair_lock or pthread_mutex (or a higher-level construct like NSLock) instead.
These are warnings: dispatch_semaphore is quite dangerous and must be used with great care.
Here is an explanation based on a demo provided by Apple:
__block NSString *taskName = nil;
dispatch_semaphore_t sema = dispatch_semaphore_create(0);
[self.connection.remoteObjectProxy requestCurrentTaskName:^(NSString *task) {
    taskName = task;
    dispatch_semaphore_signal(sema);
}];
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);
return taskName;
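- Suppose this code runs on the main thread, so the current thread's priority is QOS_CLASS_USER_INTERACTIVE;
- Because the work is dispatched asynchronously from the main thread, the QoS of the asynchronous task's queue is raised to QOS_CLASS_USER_INITIATED;
- The main thread is blocked on the semaphore sema, while the asynchronous task responsible for signaling it runs at QOS_CLASS_USER_INITIATED, which is lower than the main thread's QOS_CLASS_USER_INTERACTIVE, so a priority inversion may occur.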
It is worth mentioning that Clang has a static analysis check specifically for this pattern:
static auto findGCDAntiPatternWithSemaphore() -> decltype(compoundStmt()) {
  const char *SemaphoreBinding = "semaphore_name";
  auto SemaphoreCreateM = callExpr(allOf(
      callsName("dispatch_semaphore_create"),
      hasArgument(0, ignoringParenCasts(integerLiteral(equals(0))))));
  auto SemaphoreBindingM = anyOf(
      forEachDescendant(
          varDecl(hasDescendant(SemaphoreCreateM)).bind(SemaphoreBinding)),
      forEachDescendant(binaryOperator(bindAssignmentToDecl(SemaphoreBinding),
                     hasRHS(SemaphoreCreateM))));
  auto HasBlockArgumentM = hasAnyArgument(hasType(
            hasCanonicalType(blockPointerType())
            ));
  auto ArgCallsSignalM = hasAnyArgument(stmt(hasDescendant(callExpr(
          allOf(
              callsName("dispatch_semaphore_signal"),
              equalsBoundArgDecl(0, SemaphoreBinding)
              )))));
  auto HasBlockAndCallsSignalM = allOf(HasBlockArgumentM, ArgCallsSignalM);
  auto HasBlockCallingSignalM =
    forEachDescendant(
      stmt(anyOf(
        callExpr(HasBlockAndCallsSignalM),
        objcMessageExpr(HasBlockAndCallsSignalM)
           )));
  auto SemaphoreWaitM = forEachDescendant(
    callExpr(
      allOf(
        callsName("dispatch_semaphore_wait"),
        equalsBoundArgDecl(0, SemaphoreBinding)
      )
    ).bind(WarnAtNode));
  return compoundStmt(
      SemaphoreBindingM, HasBlockCallingSignalM, SemaphoreWaitM);
}
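To use this check, just turn on the corresponding setting in Xcode:
In addition, dispatch_group is similar to a semaphore: when enter() is called, there is no way to know in advance who will call leave(), so the system cannot know the owner either, and likewise no priority boost takes place.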
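A semaphore hang from first-hand experience
dispatch_semaphore left a deep impression on the author, who once wrote code like the following: it uses a semaphore to wait synchronously on the main thread for the camera authorization result.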
__block BOOL auth = NO;
dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    auth = allow;
    dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
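After release, this code stayed at the top of the hang ranking for a long time, which was baffling back then. Once we learned that semaphores cannot avoid priority inversion, it finally made sense. Problems of this kind are generally solved in one of two ways:
- Use a synchronous API
BOOL auth = [KTAuthorizeService authorizationWithType:KTPermissionsTypeCamera];
// do something next
- Use an asynchronous callback and do not wait on the current thread
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    BOOL auth = allow;
    // do something next via callback
}];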
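A few concepts
turnstile
As mentioned earlier, XNU uses turnstiles for priority inheritance. Here is a brief description of the mechanism. The XNU kernel contains a large number of synchronization objects (for example lck_mtx_t). To solve priority inversion, each synchronization object needs a separate data structure that maintains a lot of information, such as the queue of threads blocked on that object. If every synchronization object had to allocate such a structure, a great deal of memory would be wasted. XNU solves this with the turnstile mechanism, a very space-efficient approach. It rests on the fact that a thread cannot be blocked on more than one synchronization object at the same time. This lets every synchronization object keep only a pointer to a turnstile and allocate a turnstile only when one is needed. The turnstile contains all the information needed to operate on a synchronization object, such as the queue of blocked threads and a pointer to the thread that owns the object. Turnstiles are allocated dynamically from a pool whose size grows with the number of threads allocated in the system, so the total number of turnstiles is always less than or equal to the number of threads, which keeps their number bounded. A turnstile is allocated by the first thread that blocks on the synchronization object. When no more threads are blocked on the object, the turnstile is released and returned to the pool. The turnstile data structure is as follows: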
struct turnstile {
    struct waitq ts_waitq; /* waitq embedded in turnstile */
    turnstile_inheritor_t ts_inheritor; /* thread/turnstile inheriting the priority (IL, WL) */
    union {
        struct turnstile_list ts_free_turnstiles; /* turnstile free list (IL) */
        SLIST_ENTRY(turnstile) ts_free_elm; /* turnstile free list element (IL) */
    };
    struct priority_queue_sched_max ts_inheritor_queue; /* Queue of turnstile with us as an inheritor (WL) */
    union {
        struct priority_queue_entry_sched ts_inheritor_links; /* Inheritor queue links */
        struct mpsc_queue_chain ts_deallocate_link; /* thread deallocate link */
    };
    SLIST_ENTRY(turnstile) ts_htable_link; /* linkage for turnstile in global hash table */
    uintptr_t ts_proprietor; /* hash key lookup turnstile (IL) */
    os_refcnt_t ts_refcount; /* reference count for turnstiles */
    _Atomic uint32_t ts_type_gencount; /* gen count used for priority chaining (IL), type of turnstile (IL) */
    uint32_t ts_port_ref; /* number of explicit refs from ports on send turnstile */
    turnstile_update_flags_t ts_inheritor_flags; /* flags for turnstile inheritor (IL, WL) */
    uint8_t ts_priority; /* priority of turnstile (WL) */
#if DEVELOPMENT || DEBUG
    uint8_t ts_state; /* current state of turnstile (IL) */
    queue_chain_t ts_global_elm; /* global turnstile chain */
    thread_t ts_thread; /* thread the turnstile is attached to */
    thread_t ts_prev_thread; /* thread the turnstile was attached before donation */
#endif
};
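The allocate-on-first-waiter idea described above can be modeled in a few lines. This is a deliberately simplified sketch and not XNU's implementation: the types TurnstilePool, Turnstile, and SyncObject, the functions pool_init, object_block, and object_unblock, and the fixed POOL_SIZE are all hypothetical, and locking of the pool itself is omitted.

```c
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical: roughly one turnstile per thread */

typedef struct Turnstile {
    int waiters;                 /* threads currently blocked on the object */
    struct Turnstile *next_free; /* link used while sitting in the pool */
} Turnstile;

typedef struct {
    Turnstile slots[POOL_SIZE];
    Turnstile *free_list;
    int in_use;
} TurnstilePool;

/* A synchronization object stores only a pointer, which stays NULL
 * as long as nobody is blocked on it. */
typedef struct {
    Turnstile *turnstile;
} SyncObject;

void pool_init(TurnstilePool *pool) {
    pool->free_list = NULL;
    pool->in_use = 0;
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool->slots[i].waiters = 0;
        pool->slots[i].next_free = pool->free_list;
        pool->free_list = &pool->slots[i];
    }
}

/* A thread is about to block on `obj`. The first waiter takes a turnstile
 * from the pool. Returns 0 on success, -1 if the pool is exhausted. */
int object_block(TurnstilePool *pool, SyncObject *obj) {
    if (obj->turnstile == NULL) {
        if (pool->free_list == NULL) {
            return -1;
        }
        obj->turnstile = pool->free_list;
        pool->free_list = obj->turnstile->next_free;
        obj->turnstile->waiters = 0;
        pool->in_use++;
    }
    obj->turnstile->waiters++;
    return 0;
}

/* A blocked thread wakes up. The last waiter returns the turnstile. */
void object_unblock(TurnstilePool *pool, SyncObject *obj) {
    if (obj->turnstile == NULL) {
        return;
    }
    if (--obj->turnstile->waiters == 0) {
        obj->turnstile->next_free = pool->free_list;
        pool->free_list = obj->turnstile;
        obj->turnstile = NULL;
        pool->in_use--;
    }
}
```

Because a thread can be blocked on only one object at a time, the number of turnstiles in use can never exceed the number of blocked threads, which is why a pool sized by thread count is sufficient in this model.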
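Priority values
Several priority values appeared during verification. With the help of "Mac OS® X and iOS Internals", they can be explained as follows: all priority values in the experiments are Mach-level values, and all of them are user-thread values.
- User threads have priorities 0~63;
  - NSQualityOfServiceBackground has Mach-level priority 4;
  - NSQualityOfServiceUtility has Mach-level priority 20;
  - NSQualityOfServiceDefault has Mach-level priority 31;
  - NSQualityOfServiceUserInitiated has Mach-level priority 37;
  - NSQualityOfServiceUserInteractive has Mach-level priority 47;
- Kernel threads have priorities 80~95;
- Real-time system threads have priorities 96~127;
- 64~79 is reserved for system use.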
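For quick reference when reading the logs above, the list can be captured as a lookup table. The numbers are exactly those listed above; the type QosPriority and the functions mach_priority_for_qos and priority_band are hypothetical helpers, not system APIs.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *qos_name;
    int mach_priority;
} QosPriority;

/* Mach-level priorities as listed in the text. */
const QosPriority QOS_PRIORITIES[] = {
    {"NSQualityOfServiceBackground", 4},
    {"NSQualityOfServiceUtility", 20},
    {"NSQualityOfServiceDefault", 31},
    {"NSQualityOfServiceUserInitiated", 37},
    {"NSQualityOfServiceUserInteractive", 47},
};

/* Returns the Mach-level priority for a QoS name, or -1 if unknown. */
int mach_priority_for_qos(const char *qos_name) {
    size_t count = sizeof(QOS_PRIORITIES) / sizeof(QOS_PRIORITIES[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(QOS_PRIORITIES[i].qos_name, qos_name) == 0) {
            return QOS_PRIORITIES[i].mach_priority;
        }
    }
    return -1;
}

/* Names the band a Mach-level priority falls into, per the ranges above. */
const char *priority_band(int mach_priority) {
    if (mach_priority < 0 || mach_priority > 127) return "invalid";
    if (mach_priority <= 63) return "user";
    if (mach_priority <= 79) return "reserved";
    if (mach_priority <= 95) return "kernel";
    return "realtime";
}
```

For example, the value 47 printed for the main thread in the experiments corresponds to NSQualityOfServiceUserInteractive and lies in the user band.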
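Summary
This article explained the concepts behind priority inversion and the common ways to resolve it, and investigated how several kinds of locks on the iOS platform behave. With a deeper understanding, unnecessary priority inversions can be avoided, which in turn helps prevent hangs. The ByteDance APM team also monitors thread priorities in order to detect and prevent priority inversion.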
References
- WWDC18 What’s New in LLVM - actorsfit
- https://developer.apple.com/videos/play/wwdc2015/718
- https://developer.apple.com/forums/thread/124155
- https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/CreatingThreads/CreatingThreads.html
- https://developer.apple.com/library/archive/documentation/Performance/Conceptual/EnergyGuide-iOS/PrioritizeWorkWithQoS.html
- https://github.com/llvm-mirror/clang/blob/google/stable/lib/StaticAnalyzer/Checkers/GCDAntipatternChecker.cpp
- Don’t use dispatch semaphores where mutexes (or dispatch queues) would suffice
- Concurrency Problems Written by Scott Grosch
- https://www.jianshu.com/p/af64e05de503
- https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_rwlock_wrlock.html
- iOS中各種“鎖”的理解及應用
- 不再安全的 OSSpinLock
- https://blog.actorsfit.com/a?ID=00001-499b1c8e-8a7f-4960-a1c1-c8e2f42c08c6
- https://objccn.io/issue-2-1/#Priority-Inversion
- Introduction to RTOS - Solution to Part 11 (Priority Inversion)
- https://threadreaderapp.com/thread/1229999590482444288.html#
- 深入理解iOS中的鎖
- Threads can infect each other with their low priority