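Contents
- AIR32F103 (1): Getting started with the 合宙 AIR32F103CBT6 development board
- AIR32F103 (2): Linux environment and LibOpenCM3 project template
- AIR32F103 (3): Linux environment project template based on the Standard Peripheral Library
- AIR32F103 (4): 27x multiplier at 216 MHz, CoreMark benchmark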
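Running at 216 MHz with a 27x PLL multiplier
In its October 11 commit, the 合宙 development team open-sourced the PLL multiplier configuration code for the AIR32F103, which makes it possible to build and run at 216 MHz on Linux with the GCC Arm toolchain as well.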
Code example
The example code is under Examples/NonFreeRTOS/RCC: https://gitee.com/iosetting/air32f103-template/tree/master/Examples/NonFreeRTOS/RCC
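Notes on compiling
When compiling, make sure the compiler does not optimize the function AIR_RCC_PLLConfig().
The source of this function is shown below. It performs consecutive writes to specific addresses (for example 0x40016C00) through pointers that are not volatile-qualified. If the optimization level is anything other than -O0, the compiler will very likely merge or reorder these writes.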
uint32_t AIR_RCC_PLLConfig(uint32_t RCC_PLLSource, uint32_t RCC_PLLMul, uint8_t Latency)
{
    uint32_t sramsize = 0;
    uint32_t pllmul = 0;
    FunctionalState pwr_gating_state = 0;
    /* Check the parameters */
    assert_param(IS_RCC_PLL_SOURCE(RCC_PLLSource));
    assert_param(IS_RCC_PLL_MUL(RCC_PLLMul));
    *(uint32_t *)(0x400210F0) = BIT(0);      // enable sys_cfg clock gating
    *(uint32_t *)(0x40016C00) = 0xa7d93a86;  // release the level 1, 2 and 3 locks
    *(uint32_t *)(0x40016C00) = 0xab12dfcd;
    *(uint32_t *)(0x40016C00) = 0xcded3526;
    sramsize = *(uint32_t *)(0x40016C18);    // save the current SRAM configuration
    *(uint32_t *)(0x40016C18) = 0x200183FF;  // configure the SRAM size, opening up the SRAM used by BOOT
    *(uint32_t *)(0x4002228C) = 0xa5a5a5a5;  // unlock QSPI
    SysFreq_Set(RCC_PLLMul, Latency, 0, 1);
    RCC->CFGR = (RCC->CFGR & ~0x00030000) | RCC_PLLSource;
    // restore the state from before the configuration
    *(uint32_t *)(0x40016C18) = sramsize;
    *(uint32_t *)(0x400210F0) = 0;           // disable sys_cfg clock gating
    *(uint32_t *)(0x40016C00) = ~0xa7d93a86; // re-apply the level 1, 2 and 3 locks
    *(uint32_t *)(0x40016C00) = ~0xab12dfcd;
    *(uint32_t *)(0x40016C00) = ~0xcded3526;
    *(uint32_t *)(0x4002228C) = ~0xa5a5a5a5; // lock QSPI
    return 1;
}
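To illustrate why the optimizer is free to collapse these writes: three stores to the same non-volatile address look redundant to the compiler, so only the last one has to survive. The sketch below shows the usual way to express "every store matters" in C, by writing through a volatile-qualified pointer. The helper names reg_write32 and write_unlock_keys are hypothetical and not part of the AIR32 library, and this has not been verified on the chip; the approach actually used here is the -O0 setting described in this section.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the AIR32 library): store a 32-bit value
 * through a volatile-qualified pointer, so the compiler must emit every
 * store and keep the stores in program order relative to each other. */
static inline void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Writes the three unlock keys one after another. On the real chip key_reg
 * would be (volatile uint32_t *)0x40016C00; it is a parameter here so the
 * sketch can also be exercised on a host machine with an ordinary variable. */
static inline void write_unlock_keys(volatile uint32_t *key_reg)
{
    reg_write32(key_reg, 0xa7d93a86u);
    reg_write32(key_reg, 0xab12dfcdu);
    reg_write32(key_reg, 0xcded3526u);
}
```

On a host, passing the address of a plain uint32_t variable leaves the last key in that variable, which is enough to check the sequence compiles and runs.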
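One way to solve this is to adjust the compiler options:
- In Keil5, right-click air32f10x_rcc_ex.c and set the AC6 compile options for that file alone. With AC5 an in-source annotation can be used instead; AC6 no longer supports per-function optimization settings inside a file.
- With GCC Arm, you can set a separate -O0 option for air32f10x_rcc_ex.c in the Makefile, add barriers in the code to prevent the optimization (for example by inserting __NOP(); between two lines), or use a function attribute of the form int foo(int i) __attribute__((optimize("-O3"))); as described in the GNU GCC documentation.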
The library function is therefore changed to:
__attribute__((optimize("-O0"))) uint32_t AIR_RCC_PLLConfig(uint32_t RCC_PLLSource, uint32_t RCC_PLLMul, uint8_t Latency)
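Because the optimize attribute is GCC-specific, a source file that is also built with other compilers may want to hide it behind a macro. The following is a minimal sketch under that assumption: NO_OPTIMIZE and write_sequence are hypothetical names, not something the AIR32 library defines, and on non-GCC compilers the macro expands to nothing, so a per-file setting is still needed there.

```c
#include <stdint.h>

/* NO_OPTIMIZE is a hypothetical macro name. Clang also defines __GNUC__ but
 * does not implement the GCC "optimize" attribute, so it is excluded. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
#define NO_OPTIMIZE /* no per-function control here; use per-file options */
#endif

/* Demo function: with GCC it is compiled at -O0 regardless of the
 * command-line optimization level, so all three stores are emitted in order.
 * It returns the value left in *reg. */
NO_OPTIMIZE uint32_t write_sequence(uint32_t *reg)
{
    *reg = 0xa7d93a86u;
    *reg = 0xab12dfcdu;
    *reg = 0xcded3526u;
    return *reg;
}
```

The same macro could then be placed in front of the AIR_RCC_PLLConfig() definition instead of spelling the attribute out.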
CoreMark results
The CoreMark_256MHz project in the examples can run the AIR32F103 at up to 256 MHz and runs the CoreMark benchmark. Below are the results at 256 MHz, 216 MHz and 72 MHz with different compiler versions.
32x multiplier, 256 MHz
Compiler: GCC 11.2.1, 256 MHz
SYSCLK: 256000000, HCLK: 256000000, PCLK1: 128000000, PCLK2: 256000000, ADCCLK: 128000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 17054
Total time (secs): 17.054000
Iterations/Sec : 586.372698
Iterations : 10000
Compiler version : GCC11.2.1 20220111
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 586.372698 / GCC11.2.1 20220111 -O3 / STACK
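The reported score follows directly from the other fields in the log: Iterations/Sec is Iterations divided by Total time. The sketch below reproduces that arithmetic and also normalizes the score to CoreMark/MHz. The function names are hypothetical, and the tick rate of 1000 per second is an assumption inferred from the log (17054 ticks reported as 17.054 s), not taken from the project source.

```c
#include <stdint.h>

/* Assumption inferred from the log: 17054 ticks are reported as 17.054 s,
 * so one tick is taken to be 1 ms. */
#define TICKS_PER_SEC 1000.0

/* Iterations per second, as printed in the "Iterations/Sec" line. */
static inline double coremark_score(uint32_t iterations, uint32_t total_ticks)
{
    double seconds = (double)total_ticks / TICKS_PER_SEC;
    return (double)iterations / seconds;
}

/* Score normalized by the core clock, useful for comparing runs made at
 * different frequencies. */
static inline double coremark_per_mhz(double score, uint32_t sysclk_hz)
{
    return score / ((double)sysclk_hz / 1e6);
}
```

For the run above, coremark_score(10000, 17054) gives roughly 586.37, and dividing by 256 MHz gives about 2.29 CoreMark/MHz.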
Compiler: GCC 11.3.1, 256 MHz
SYSCLK: 256000000, HCLK: 256000000, PCLK1: 128000000, PCLK2: 256000000, ADCCLK: 128000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 17054
Total time (secs): 17.054000
Iterations/Sec : 586.372698
Iterations : 10000
Compiler version : GCC11.3.1 20220712
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 586.372698 / GCC11.3.1 20220712 -O3 / STACK
Compiler: GCC 12.2.0, 256 MHz
SYSCLK: 256000000, HCLK: 256000000, PCLK1: 128000000, PCLK2: 256000000, ADCCLK: 128000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 15822
Total time (secs): 15.822000
Iterations/Sec : 632.031349
Iterations : 10000
Compiler version : GCC12.2.0
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 632.031349 / GCC12.2.0 -O3 / STACK
27x multiplier, 216 MHz
Compiler: GCC 11.2.1, 216 MHz
SYSCLK: 216000000, HCLK: 216000000, PCLK1: 108000000, PCLK2: 216000000, ADCCLK: 108000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 20213
Total time (secs): 20.213000
Iterations/Sec : 494.731114
Iterations : 10000
Compiler version : GCC11.2.1 20220111
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 494.731114 / GCC11.2.1 20220111 -O3 / STACK
Compiler: GCC 11.3.1, 216 MHz
SYSCLK: 216000000, HCLK: 216000000, PCLK1: 108000000, PCLK2: 216000000, ADCCLK: 108000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 20213
Total time (secs): 20.213000
Iterations/Sec : 494.731114
Iterations : 10000
Compiler version : GCC11.3.1 20220712
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 494.731114 / GCC11.3.1 20220712 -O3 / STACK
Compiler: GCC 12.2.0, 216 MHz
SYSCLK: 216000000, HCLK: 216000000, PCLK1: 108000000, PCLK2: 216000000, ADCCLK: 108000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 18753
Total time (secs): 18.753000
Iterations/Sec : 533.248014
Iterations : 10000
Compiler version : GCC12.2.0
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 533.248014 / GCC12.2.0 -O3 / STACK
Compiler: AC6 (GCCClang 15.0.0), 216 MHz
SYSCLK: 216.0Mhz, HCLK: 216.0Mhz, PCLK1: 108.0Mhz, PCLK2: 216.0Mhz, ADCCLK: 108.0Mhz
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 16328
Total time (secs): 16.328000
Iterations/Sec : 612.444880
Iterations : 10000
Compiler version : GCCClang 15.0.0
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 612.444880 / GCCClang 15.0.0 -O3 / STACK
9x multiplier, 72 MHz
Compiler: GCC 11.2.1, 72 MHz
SYSCLK: 72000000, HCLK: 72000000, PCLK1: 36000000, PCLK2: 72000000, ADCCLK: 36000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 60677
Total time (secs): 60.677000
Iterations/Sec : 164.807093
Iterations : 10000
Compiler version : GCC11.2.1 20220111
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 164.807093 / GCC11.2.1 20220111 -O3 / STACK
Compiler: GCC 11.3.1, 72 MHz
SYSCLK: 72000000, HCLK: 72000000, PCLK1: 36000000, PCLK2: 72000000, ADCCLK: 36000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 60677
Total time (secs): 60.677000
Iterations/Sec : 164.807093
Iterations : 10000
Compiler version : GCC11.3.1 20220712
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 164.807093 / GCC11.3.1 20220712 -O3 / STACK
Compiler: GCC 12.2.0, 72 MHz
SYSCLK: 72000000, HCLK: 72000000, PCLK1: 36000000, PCLK2: 72000000, ADCCLK: 36000000
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 56293
Total time (secs): 56.293000
Iterations/Sec : 177.641980
Iterations : 10000
Compiler version : GCC12.2.0
Compiler flags : -O3
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x988c
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 177.641980 / GCC12.2.0 -O3 / STACK
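Summary
GCC 11.2 and GCC 11.3 produce identical scores at every frequency. Binaries generated by GCC 12.2 run nearly 8% faster than those (for example 533.25 vs 494.73 iterations/s at 216 MHz). The best performer is still AC6, which at 216 MHz scores nearly 15% higher than GCC 12.2 (612.44 vs 533.25 iterations/s).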
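The percentages in the summary can be checked against the logged scores with a one-line calculation. A minimal sketch follows; percent_gain is a hypothetical helper name, and the inputs in the note after it are the Iterations/Sec values from the 216 MHz runs.

```c
/* Relative improvement of `candidate` over `baseline`, in percent. */
static inline double percent_gain(double baseline, double candidate)
{
    return (candidate / baseline - 1.0) * 100.0;
}
```

With the 216 MHz figures, percent_gain(494.731114, 533.248014) comes out at about 7.8 (GCC 12.2 over GCC 11.x) and percent_gain(533.248014, 612.444880) at about 14.9 (AC6 over GCC 12.2).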