Introduction to soft lockup and hard lockup - ZenDei技術網路在線

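The Linux kernel has a debug option called LOCKUP_DETECTOR.

Enabling it turns on soft lockup and hard lockup detection in the kernel.

What are these two detectors actually for?

First, soft/hard lockup detection is implemented in kernel/watchdog.c, and it revolves around three things: a kernel thread, the timer interrupt, and the NMI (non-maskable interrupt).

These three run at different priorities, in the order kernel thread < timer interrupt < NMI.

It is exactly this difference in priority that makes it possible to debug two kinds of problems on a running system:

Preemption disabled for a long time, so that tasks cannot be scheduled (soft lockup)
Interrupts disabled for a long time, which causes more serious problems (hard lockup)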
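
The priority argument above can be sketched as a small user-space model. This is only an illustration, not kernel code: the enum names and the functions `can_run` and `classify` are made up for this example, and the model assumes the CPU stays in the given state for longer than the detection threshold.

```c
#include <stdbool.h>

/* Execution contexts used by the lockup detector, lowest priority first. */
enum context { CTX_KTHREAD, CTX_TIMER_IRQ, CTX_NMI };

enum lockup { LOCKUP_NONE, LOCKUP_SOFT, LOCKUP_HARD };

/* Simplified model: can the given context run on a CPU in this state? */
static bool can_run(enum context ctx, bool preempt_disabled, bool irqs_disabled)
{
	switch (ctx) {
	case CTX_KTHREAD:
		/* Needs the scheduler, so both preemption and interrupts must be on. */
		return !preempt_disabled && !irqs_disabled;
	case CTX_TIMER_IRQ:
		/* Maskable interrupt: blocked only while interrupts are off. */
		return !irqs_disabled;
	case CTX_NMI:
		/* Non-maskable: always delivered. */
		return true;
	}
	return false;
}

/* Which kind of lockup would be reported if the state persisted? */
static enum lockup classify(bool preempt_disabled, bool irqs_disabled)
{
	if (!can_run(CTX_TIMER_IRQ, preempt_disabled, irqs_disabled))
		return LOCKUP_HARD;	/* only the NMI is left to notice */
	if (!can_run(CTX_KTHREAD, preempt_disabled, irqs_disabled))
		return LOCKUP_SOFT;	/* the timer interrupt notices */
	return LOCKUP_NONE;
}
```

Each detector is a higher-priority context checking on the one just below it: the timer interrupt checks on the kernel thread, and the NMI checks on the timer interrupt.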

Next we look at the actual code to see how Linux (3.10) implements detection of these two kinds of lockup:

static struct smp_hotplug_thread watchdog_threads = {
	.store			= &softlockup_watchdog,
	.thread_should_run	= watchdog_should_run,
	.thread_fn		= watchdog,
	.thread_comm		= "watchdog/%u",
	.setup			= watchdog_enable,
	.park			= watchdog_disable,
	.unpark			= watchdog_enable,
};
 
void __init lockup_detector_init(void)
{
	set_sample_period();
	if (smpboot_register_percpu_thread(&watchdog_threads)) {
		pr_err("Failed to create watchdog threads, disabled\n");
		watchdog_disabled = -ENODEV;
	}
}

First, the system registers a kernel thread for each CPU core, named watchdog/0, watchdog/1, and so on.

This thread periodically calls the watchdog function:

static void __touch_watchdog(void)
{
	__this_cpu_write(watchdog_touch_ts, get_timestamp());
}
 
static void watchdog(unsigned int cpu)
{
	__this_cpu_write(soft_lockup_hrtimer_cnt,
			 __this_cpu_read(hrtimer_interrupts));
	__touch_watchdog();
}

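Set aside for now how often the thread function watchdog is called; for the moment, simply treat this thread as the one responsible for updating watchdog_touch_ts.

Next, the timer interrupt: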

static void watchdog_enable(unsigned int cpu)
{
	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
 
	/* kick off the timer for the hardlockup detector */
	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	hrtimer->function = watchdog_timer_fn;
 
	/* done here because hrtimer_start can only pin to smp_processor_id() */
	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
		      HRTIMER_MODE_REL_PINNED);
}

The timer interrupt handler is watchdog_timer_fn:

static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
	int duration;
 
	/* kick the hardlockup detector */
	watchdog_interrupt_count();
 
	duration = is_softlockup(touch_ts);
	if (unlikely(duration)) {
		if (softlockup_panic)
			panic("softlockup: hung tasks");
		__this_cpu_write(soft_watchdog_warn, true);
	} else
		__this_cpu_write(soft_watchdog_warn, false);
 
	return HRTIMER_RESTART;
}

This function mainly does two things.

First, it updates the hrtimer_interrupts variable:

static void watchdog_interrupt_count(void)
{
	__this_cpu_inc(hrtimer_interrupts);
}

This brings us back to the kernel thread created earlier: how often it runs is tied directly to the value of hrtimer_interrupts.

static int watchdog_should_run(unsigned int cpu)
{
	return __this_cpu_read(hrtimer_interrupts) !=
		__this_cpu_read(soft_lockup_hrtimer_cnt);
}
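
The handshake between the timer interrupt and the thread can be modeled in user space as follows. This is a sketch only: `struct cpu_model` and the `model_*` functions are hypothetical stand-ins for the per-CPU variables and for watchdog_interrupt_count(), watchdog_should_run() and watchdog(), and timestamps are plain seconds passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU variables in kernel/watchdog.c. */
struct cpu_model {
	unsigned long hrtimer_interrupts;	/* bumped by the timer interrupt */
	unsigned long soft_lockup_hrtimer_cnt;	/* last value the thread has seen */
	unsigned long watchdog_touch_ts;	/* last time the thread ran, in seconds */
};

/* Models watchdog_interrupt_count(): one timer interrupt has fired. */
static void model_timer_tick(struct cpu_model *c)
{
	c->hrtimer_interrupts++;
}

/* Models watchdog_should_run(): is there a tick the thread has not handled? */
static bool model_should_run(const struct cpu_model *c)
{
	return c->hrtimer_interrupts != c->soft_lockup_hrtimer_cnt;
}

/* Models watchdog(): catch up with the counter and refresh the timestamp. */
static void model_thread_run(struct cpu_model *c, unsigned long now)
{
	c->soft_lockup_hrtimer_cnt = c->hrtimer_interrupts;
	c->watchdog_touch_ts = now;
}
```

Every tick makes the two counters differ, which makes the thread runnable; once the thread runs, the counters match again until the next tick.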

In other words, the kernel thread runs at the same frequency as the timer interrupt. By default that is once every 10 * 2 / 5 = 4 seconds.

int __read_mostly watchdog_thresh = 10;
 
static int get_softlockup_thresh(void)
{
	return watchdog_thresh * 2;
}
 
static void set_sample_period(void)
{
	/*
	 * convert watchdog_thresh from seconds to ns
	 * the divide by 5 is to give hrtimer several chances (two
	 * or three with the current relation between the soft
	 * and hard thresholds) to increment before the
	 * hardlockup detector generates a warning
	 */
	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
}
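
The arithmetic can be checked in user space with the same formulas. In this sketch the helper names `softlockup_thresh_s` and `sample_period_ns` are made up, and the period is returned rather than stored in a global as the kernel does.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Same formula as get_softlockup_thresh(): twice watchdog_thresh, in seconds. */
static int softlockup_thresh_s(int watchdog_thresh)
{
	return watchdog_thresh * 2;
}

/* Same formula as set_sample_period(), returned instead of stored. */
static uint64_t sample_period_ns(int watchdog_thresh)
{
	return softlockup_thresh_s(watchdog_thresh) * (NSEC_PER_SEC / 5);
}
```

With the default watchdog_thresh of 10, the soft lockup threshold is 20 seconds and the timer period is 20 * 200,000,000 ns, which is 4 seconds.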

The second thing watchdog_timer_fn does is check whether a soft lockup has occurred:

static int is_softlockup(unsigned long touch_ts)
{
	unsigned long now = get_timestamp();
 
	/* Warn about unreasonable delays: */
	if (time_after(now, touch_ts + get_softlockup_thresh()))
		return now - touch_ts;
 
	return 0;
}

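This is easy to follow: it checks whether the kernel thread created earlier has updated watchdog_touch_ts within the last 20 seconds.

If it has not, the thread is not being scheduled, most likely because preemption has been disabled on that CPU core and the scheduler cannot switch tasks.

In this situation the system usually does not die, but it becomes very slow.

With the soft lockup mechanism in place, such problems can be caught early.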
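
To make the timing concrete, here is a user-space sketch of the check. It is an assumption-laden simplification: the function names are invented, timestamps are plain seconds, and the counter wraparound that the kernel's time_after() handles is ignored.

```c
/*
 * Seconds-based version of the check in is_softlockup(): returns how long
 * the thread has been stalled, or 0 if it updated its timestamp recently.
 */
static unsigned long softlockup_duration(unsigned long now,
					 unsigned long touch_ts,
					 unsigned long thresh)
{
	if (now > touch_ts + thresh)
		return now - touch_ts;
	return 0;
}

/*
 * The thread last ran at touch_ts and never again, while the timer keeps
 * firing every `period` seconds (period must be non-zero). Returns the
 * time of the first tick that reports a soft lockup.
 */
static unsigned long first_report_time(unsigned long touch_ts,
				       unsigned long period,
				       unsigned long thresh)
{
	unsigned long now = touch_ts;

	do {
		now += period;
	} while (softlockup_duration(now, touch_ts, thresh) == 0);

	return now;
}
```

With a 4-second period and a 20-second threshold, the tick at 20 seconds is not yet past the threshold, so the first report comes at the 24-second tick.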

Having covered soft lockup, we move on to hard lockup:

static int watchdog_nmi_enable(unsigned int cpu)
{
	struct perf_event_attr *wd_attr;
	struct perf_event *event;

	wd_attr = &wd_hw_attr;
	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);

	/* Try to register using hardware perf events */
	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);

	/* excerpt: error handling and per-cpu bookkeeping omitted */
	return 0;
}

perf_event_create_kernel_counter registers a hardware event.

On x86 this hardware is called performance monitoring. One of its features is to raise an NMI after a given number of CPU clock cycles have elapsed.

u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
}

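Here a value is computed from the current CPU frequency: the number of CPU clock cycles in watchdog_thresh seconds, which is 10 seconds by default, half the soft lockup threshold.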
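
As a worked example of the formula in hw_nmi_get_sample_period, the sketch below takes cpu_khz as a parameter instead of reading the kernel global. The function names and the 2.4 GHz figure in the note are illustrative assumptions only.

```c
#include <stdint.h>

/* Same formula as hw_nmi_get_sample_period(), with cpu_khz passed in. */
static uint64_t nmi_sample_period_cycles(uint64_t cpu_khz, int watchdog_thresh)
{
	return cpu_khz * 1000 * (uint64_t)watchdog_thresh;
}

/* How many seconds of non-idle execution that many cycles represent. */
static uint64_t cycles_to_seconds(uint64_t cycles, uint64_t cpu_khz)
{
	return cycles / (cpu_khz * 1000);
}
```

For a hypothetical 2.4 GHz CPU (cpu_khz = 2400000) and the default watchdog_thresh of 10, the counter is programmed to overflow after 24,000,000,000 cycles, that is, 10 seconds of execution.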

So once the CPU has run flat out for 10 seconds, an NMI is raised, and the handler for that interrupt is watchdog_overflow_callback.

static void watchdog_overflow_callback(struct perf_event *event,
		 struct perf_sample_data *data,
		 struct pt_regs *regs)
{
	if (is_hardlockup()) {
		int this_cpu = smp_processor_id();
 
		if (hardlockup_panic)
			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
		else
			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
 
		return;
	}
 
	return;
}

This function mainly calls is_hardlockup:

static int is_hardlockup(void)
{
	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
 
	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
		return 1;
 
	__this_cpu_write(hrtimer_interrupts_saved, hrint);
	return 0;
}

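That function in turn checks whether hrtimer_interrupts has been updated by the timer interrupt handler since the previous NMI.

If it has not been updated, something is wrong with interrupts: most likely buggy code has kept them disabled for a long time.

That is how the underlying problem gets exposed.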
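
The hard lockup check can be exercised in user space with a small model. This is a sketch under assumptions: `struct hardlockup_model` and the `model_*` functions are hypothetical stand-ins for the per-CPU counters, the timer interrupt and is_hardlockup().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU counters used by is_hardlockup(). */
struct hardlockup_model {
	unsigned long hrtimer_interrupts;	/* bumped by the timer interrupt */
	unsigned long hrtimer_interrupts_saved;	/* value seen by the previous NMI */
};

/* One timer interrupt has been handled on this CPU. */
static void model_timer_interrupt(struct hardlockup_model *m)
{
	m->hrtimer_interrupts++;
}

/* Mirrors is_hardlockup(): true if no timer interrupt ran since the last NMI. */
static bool model_nmi_check(struct hardlockup_model *m)
{
	unsigned long hrint = m->hrtimer_interrupts;

	if (m->hrtimer_interrupts_saved == hrint)
		return true;

	m->hrtimer_interrupts_saved = hrint;
	return false;
}
```

In normal operation the 4-second timer fires at least twice between two NMIs that are 10 seconds apart, so the saved value always differs. If interrupts stay disabled for a whole NMI period, the counter stands still and the next NMI reports a hard lockup.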
