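Last year Microsoft released Bash On Windows, a technology that lets Linux programs run on Windows. Plenty of articles already explain how Bash On Windows works,
so this article instead shows how to build a simple native Linux program runner yourself. The runner is implemented in user mode; its approach is not quite the same as Bash On Windows and is closer to Wine on Linux.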
The complete code for the sample program is on GitHub: https://github.com/303248153/HelloElfLoader
A first look at the ELF format
First, let's look at what a native Linux program is. The following description is taken from Wikipedia:
In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4),[2] and later in the Tool Interface Standard,[1] it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.
By design, ELF is flexible, extensible, and cross-platform, not bound to any given central processing unit (CPU) or instruction set architecture. This has allowed it to be adopted by many different operating systems on many different hardware platforms.
Linux executables use the ELF format, while Windows uses the PE format, the format of the familiar .exe files.
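As a small illustration (a sketch, not code from HelloElfLoader), the two formats can be told apart by their first bytes: an ELF file starts with 0x7f 'E' 'L' 'F', while a PE file starts with the MS-DOS header signature "MZ". The names `BinaryFormat` and `detectFormat` are hypothetical.

```cpp
#include <cassert>  // used by the usage checks
#include <vector>

// Hypothetical classification of an executable by its leading magic bytes.
enum class BinaryFormat { Elf, Pe, Unknown };

BinaryFormat detectFormat(const std::vector<unsigned char>& bytes) {
    // ELF magic: 0x7f 'E' 'L' 'F'
    if (bytes.size() >= 4 && bytes[0] == 0x7f && bytes[1] == 'E' &&
        bytes[2] == 'L' && bytes[3] == 'F') {
        return BinaryFormat::Elf;
    }
    // PE images begin with the MS-DOS stub header, whose signature is "MZ"
    if (bytes.size() >= 2 && bytes[0] == 'M' && bytes[1] == 'Z') {
        return BinaryFormat::Pe;
    }
    return BinaryFormat::Unknown;
}
```

For example, the first bytes of the a.out examined below (7f 45 4c 46) are classified as ELF.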
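An ELF file can roughly be divided into the following parts:
- ELF header: at the very start of the file; stores the file type, version and similar information
- Program headers: used at run time by the program interpreter
- Section headers: used by the linker at build time; they are not read at run time
- Section contents: each section serves a different purpose
  - .text: the code section, holding the main program code
  - .rodata: read-only data, such as string literals (const char*)
  - .data: readable and writable data, such as global variables
  - and many other sections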
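Let's look at what a Linux executable actually looks like.
The build environment below is Ubuntu 16.04 x64 + gcc 5.4.0; a different environment may produce different results.
First create hello.c with the following code: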
#include <stdio.h>
int max(int x, int y) {
return x > y ? x : y;
}
int main() {
printf("max is %d\n", max(123, 321));
printf("test many arguments %d %d %d %s %s %s %s %s %s\n", 1, 2, 3, "a", "b", "c", "d", "e", "f");
return 100;
}
Then compile it with gcc:
gcc hello.c
After compiling, you will see a new file, a.out, next to hello.c. This is the Linux executable, and you can now run it on Linux:
./a.out
You should see the following output:
max is 321
test many arguments 1 2 3 a b c d e f
Let's see what a.out contains by parsing the ELF file with the readelf command:
readelf -a ./a.out
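It prints the following information:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400430
Start of program headers: 64 (bytes into file)
Start of section headers: 6648 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 28
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align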
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
000000000000001c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002b8 000002b8
0000000000000060 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400318 00000318
000000000000003f 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000400358 00000358
0000000000000008 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000400360 00000360
0000000000000020 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000400380 00000380
0000000000000018 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000400398 00000398
0000000000000030 0000000000000018 AI 5 24 8
[11] .init PROGBITS 00000000004003c8 000003c8
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000004003f0 000003f0
0000000000000030 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 0000000000400420 00000420
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 0000000000400430 00000430
00000000000001f2 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 0000000000400624 00000624
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 0000000000400630 00000630
0000000000000050 0000000000000000 A 0 0 8
[17] .eh_frame_hdr PROGBITS 0000000000400680 00000680
000000000000003c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 00000000004006c0 000006c0
0000000000000114 0000000000000000 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000600e10 00000e10
0000000000000008 0000000000000000 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000600e18 00000e18
0000000000000008 0000000000000000 WA 0 0 8
[21] .jcr PROGBITS 0000000000600e20 00000e20
0000000000000008 0000000000000000 WA 0 0 8
[22] .dynamic DYNAMIC 0000000000600e28 00000e28
00000000000001d0 0000000000000010 WA 6 0 8
[23] .got PROGBITS 0000000000600ff8 00000ff8
0000000000000008 0000000000000008 WA 0 0 8
[24] .got.plt PROGBITS 0000000000601000 00001000
0000000000000028 0000000000000008 WA 0 0 8
[25] .data PROGBITS 0000000000601028 00001028
0000000000000010 0000000000000000 WA 0 0 8
[26] .bss NOBITS 0000000000601038 00001038
0000000000000008 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00001038
0000000000000034 0000000000000001 MS 0 0 1
[28] .shstrtab STRTAB 0000000000000000 000018ea
000000000000010c 0000000000000000 0 0 1
[29] .symtab SYMTAB 0000000000000000 00001070
0000000000000660 0000000000000018 30 47 8
[30] .strtab STRTAB 0000000000000000 000016d0
000000000000021a 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000007d4 0x00000000000007d4 R E 200000
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000228 0x0000000000000230 RW 200000
DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x0000000000000680 0x0000000000400680 0x0000000000400680
0x000000000000003c 0x000000000000003c R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
Dynamic section at offset 0xe28 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x4003c8
0x000000000000000d (FINI) 0x400624
0x0000000000000019 (INIT_ARRAY) 0x600e10
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x600e18
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x400318
0x0000000000000006 (SYMTAB) 0x4002b8
0x000000000000000a (STRSZ) 63 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x601000
0x0000000000000002 (PLTRELSZ) 48 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x400398
0x0000000000000007 (RELA) 0x400380
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x400360
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x400358
0x0000000000000000 (NULL) 0x0
Relocation section '.rela.dyn' at offset 0x380 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600ff8 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
Relocation section '.rela.plt' at offset 0x398 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000601018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0
000000601020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.dynsym' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table '.symtab' contains 68 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400238 0 SECTION LOCAL DEFAULT 1
2: 0000000000400254 0 SECTION LOCAL DEFAULT 2
3: 0000000000400274 0 SECTION LOCAL DEFAULT 3
4: 0000000000400298 0 SECTION LOCAL DEFAULT 4
5: 00000000004002b8 0 SECTION LOCAL DEFAULT 5
6: 0000000000400318 0 SECTION LOCAL DEFAULT 6
7: 0000000000400358 0 SECTION LOCAL DEFAULT 7
8: 0000000000400360 0 SECTION LOCAL DEFAULT 8
9: 0000000000400380 0 SECTION LOCAL DEFAULT 9
10: 0000000000400398 0 SECTION LOCAL DEFAULT 10
11: 00000000004003c8 0 SECTION LOCAL DEFAULT 11
12: 00000000004003f0 0 SECTION LOCAL DEFAULT 12
13: 0000000000400420 0 SECTION LOCAL DEFAULT 13
14: 0000000000400430 0 SECTION LOCAL DEFAULT 14
15: 0000000000400624 0 SECTION LOCAL DEFAULT 15
16: 0000000000400630 0 SECTION LOCAL DEFAULT 16
17: 0000000000400680 0 SECTION LOCAL DEFAULT 17
18: 00000000004006c0 0 SECTION LOCAL DEFAULT 18
19: 0000000000600e10 0 SECTION LOCAL DEFAULT 19
20: 0000000000600e18 0 SECTION LOCAL DEFAULT 20
21: 0000000000600e20 0 SECTION LOCAL DEFAULT 21
22: 0000000000600e28 0 SECTION LOCAL DEFAULT 22
23: 0000000000600ff8 0 SECTION LOCAL DEFAULT 23
24: 0000000000601000 0 SECTION LOCAL DEFAULT 24
25: 0000000000601028 0 SECTION LOCAL DEFAULT 25
26: 0000000000601038 0 SECTION LOCAL DEFAULT 26
27: 0000000000000000 0 SECTION LOCAL DEFAULT 27
28: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
29: 0000000000600e20 0 OBJECT LOCAL DEFAULT 21 __JCR_LIST__
30: 0000000000400460 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones
31: 00000000004004a0 0 FUNC LOCAL DEFAULT 14 register_tm_clones
32: 00000000004004e0 0 FUNC LOCAL DEFAULT 14 __do_global_dtors_aux
33: 0000000000601038 1 OBJECT LOCAL DEFAULT 26 completed.7585
34: 0000000000600e18 0 OBJECT LOCAL DEFAULT 20 __do_global_dtors_aux_fin
35: 0000000000400500 0 FUNC LOCAL DEFAULT 14 frame_dummy
36: 0000000000600e10 0 OBJECT LOCAL DEFAULT 19 __frame_dummy_init_array_
37: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c
38: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
39: 00000000004007d0 0 OBJECT LOCAL DEFAULT 18 __FRAME_END__
40: 0000000000600e20 0 OBJECT LOCAL DEFAULT 21 __JCR_END__
41: 0000000000000000 0 FILE LOCAL DEFAULT ABS
42: 0000000000600e18 0 NOTYPE LOCAL DEFAULT 19 __init_array_end
43: 0000000000600e28 0 OBJECT LOCAL DEFAULT 22 _DYNAMIC
44: 0000000000600e10 0 NOTYPE LOCAL DEFAULT 19 __init_array_start
45: 0000000000400680 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR
46: 0000000000601000 0 OBJECT LOCAL DEFAULT 24 _GLOBAL_OFFSET_TABLE_
47: 0000000000400620 2 FUNC GLOBAL DEFAULT 14 __libc_csu_fini
48: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
49: 0000000000601028 0 NOTYPE WEAK DEFAULT 25 data_start
50: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 25 _edata
51: 0000000000400624 0 FUNC GLOBAL DEFAULT 15 _fini
52: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.2.5
53: 0000000000400526 22 FUNC GLOBAL DEFAULT 14 max
54: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
55: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 25 __data_start
56: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
57: 0000000000601030 0 OBJECT GLOBAL HIDDEN 25 __dso_handle
58: 0000000000400630 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used
59: 00000000004005b0 101 FUNC GLOBAL DEFAULT 14 __libc_csu_init
60: 0000000000601040 0 NOTYPE GLOBAL DEFAULT 26 _end
61: 0000000000400430 42 FUNC GLOBAL DEFAULT 14 _start
62: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
63: 000000000040053c 109 FUNC GLOBAL DEFAULT 14 main
64: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
65: 0000000000601038 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__
66: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
67: 00000000004003c8 0 FUNC GLOBAL DEFAULT 11 _init
Version symbols section '.gnu.version' contains 4 entries:
Addr: 0000000000400358 Offset: 0x000358 Link: 5 (.dynsym)
000: 0 (*local*) 2 (GLIBC_2.2.5) 2 (GLIBC_2.2.5) 0 (*local*)
Version needs section '.gnu.version_r' contains 1 entries:
Addr: 0x0000000000400360 Offset: 0x000360 Link: 6 (.dynstr)
000000: Version: 1 File: libc.so.6 Cnt: 1
0x0010: Name: GLIBC_2.2.5 Flags: none Version: 2
Displaying notes found at file offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.32
Displaying notes found at file offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: debd3d7912be860a432b5c685a6cff7fd9418528
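From this output we can see that the file type is ELF64, i.e. a 64-bit executable, and that it has 9 program headers and 31 section headers. You can find the purpose of each section online; this article only deals with the following sections:
- .rela.dyn: list of variables that need relocation
- .rela.plt: list of functions that need relocation
- .plt: code used to call dynamically linked functions
- .text: the main program code
- .init: the program's initialization code, e.g. for initializing global variables
- .fini: the program's termination code, e.g. for destroying global variables
- .rodata: read-only data, such as string literals (const char*)
- .data: readable and writable data, such as global variables
- .dynsym: the dynamic linking symbol table
- .dynstr: the name strings of the dynamic symbols
- .dynamic: information needed for dynamic linking, used at run time (without touching the section headers)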
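To connect the readelf output with the raw bytes, here is a minimal sketch that decodes a few ELF64 header fields from a 64-byte buffer. It assumes the standard ELF64 header layout (e_entry at offset 24, e_phoff at 32, e_shoff at 40, and e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx at 54 through 62, the same order as Elf64_External_Ehdr later in this article); `readLE`, `ElfHeaderSummary` and `summarizeHeader` are hypothetical names. Decoding byte by byte keeps it independent of the host's byte order.

```cpp
#include <cassert>  // used by the usage checks
#include <cstddef>
#include <cstdint>

// Read an unsigned little-endian integer of `size` bytes starting at p.
std::uint64_t readLE(const unsigned char* p, std::size_t size) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return value;
}

// Hypothetical summary of the ELF64 header fields discussed above.
struct ElfHeaderSummary {
    std::uint64_t entry;      // e_entry
    std::uint64_t phoff;      // e_phoff
    std::uint64_t shoff;      // e_shoff
    std::uint16_t phentsize;  // e_phentsize
    std::uint16_t phnum;      // e_phnum
    std::uint16_t shentsize;  // e_shentsize
    std::uint16_t shnum;      // e_shnum
    std::uint16_t shstrndx;   // e_shstrndx
};

// `hdr` must point at the 64-byte ELF64 header of a little-endian file.
ElfHeaderSummary summarizeHeader(const unsigned char* hdr) {
    ElfHeaderSummary s{};
    s.entry = readLE(hdr + 24, 8);
    s.phoff = readLE(hdr + 32, 8);
    s.shoff = readLE(hdr + 40, 8);
    s.phentsize = static_cast<std::uint16_t>(readLE(hdr + 54, 2));
    s.phnum = static_cast<std::uint16_t>(readLE(hdr + 56, 2));
    s.shentsize = static_cast<std::uint16_t>(readLE(hdr + 58, 2));
    s.shnum = static_cast<std::uint16_t>(readLE(hdr + 60, 2));
    s.shstrndx = static_cast<std::uint16_t>(readLE(hdr + 62, 2));
    return s;
}
```

Given the first 64 bytes of this a.out, it should report entry 0x400430, 9 program headers of 56 bytes at offset 64, and 31 section headers of 64 bytes at offset 6648, matching readelf.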
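What is dynamic linking
The program above calls the printf function, yet printf is not implemented inside ./a.out. So where is printf, and how does it get called?
printf is implemented in the glibc library, i.e. in /lib/x86_64-linux-gnu/libc.so.6. When ./a.out runs, the function is looked up in glibc and then called. Let's look at the code that does this.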
Run the following command to disassemble ./a.out:
objdump -c -S ./a.out
Among the output we find the following code:
00000000004003f0 <printf@plt-0x10>:
4003f0: ff 35 12 0c 20 00 pushq 0x200c12(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
4003f6: ff 25 14 0c 20 00 jmpq *0x200c14(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
4003fc: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400400 <printf@plt>:
400400: ff 25 12 0c 20 00 jmpq *0x200c12(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
400406: 68 00 00 00 00 pushq $0x0
40040b: e9 e0 ff ff ff jmpq 4003f0 <_init+0x28>
000000000040053c <main>:
40053c: 55 push %rbp
40053d: 48 89 e5 mov %rsp,%rbp
400540: be 41 01 00 00 mov $0x141,%esi
400545: bf 7b 00 00 00 mov $0x7b,%edi
40054a: e8 d7 ff ff ff callq 400526 <max>
40054f: 89 c6 mov %eax,%esi
400551: bf 38 06 40 00 mov $0x400638,%edi
400556: b8 00 00 00 00 mov $0x0,%eax
40055b: e8 a0 fe ff ff callq 400400 <printf@plt>
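In this code we can see that the call to printf first goes to printf@plt at 0x400400.
printf@plt is responsible for finding the real printf at run time and jumping to it.
The address of the real printf is stored at 0x400406 + 0x200c12 = 0x601018: the jmpq operand is RIP-relative, and RIP already points to the next instruction, 0x400406.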
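The address arithmetic can be checked with a small sketch. It assumes the RIP-relative displacement (disp32 or rel32) occupies the last four bytes of the instruction, which holds for the `ff 25`, `ff 35` and `e8` encodings in the listing above but not for x86-64 instructions in general; `ripRelativeTarget` is a hypothetical name.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Address referenced by an instruction whose last 4 bytes are a signed,
// little-endian RIP-relative displacement: next instruction address + disp.
std::uint64_t ripRelativeTarget(std::uint64_t instrAddr, const unsigned char* instr,
                                unsigned instrLen) {
    const unsigned char* d = instr + instrLen - 4;
    std::uint32_t raw = static_cast<std::uint32_t>(d[0]) |
                        (static_cast<std::uint32_t>(d[1]) << 8) |
                        (static_cast<std::uint32_t>(d[2]) << 16) |
                        (static_cast<std::uint32_t>(d[3]) << 24);
    std::int32_t disp = static_cast<std::int32_t>(raw);  // may be negative (backward jump)
    // RIP points to the next instruction when the displacement is applied
    return instrAddr + instrLen + static_cast<std::int64_t>(disp);
}
```

With the bytes from the listing, `ff 25 12 0c 20 00` at 0x400400 yields the GOT slot 0x601018, and `e8 a0 fe ff ff` at 0x40055b yields the call target 0x400400 (printf@plt).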
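Note that 0x601018 does not point to the real printf at first; it points to 0x400406. Why? For performance, Linux executables do not resolve every dynamically linked function up front; they resolve them lazily.
The first time through, jmpq *0x200c12(%rip) jumps to the next instruction, 0x400406, which pushes 0 (the index of printf's entry in .rela.plt) and jumps to 0x4003f0, which in turn jumps to the address stored at 0x601010. The address stored at 0x601010 is the lazy-resolution routine. Once the first lazy resolution succeeds, 0x601018 points to the real printf, and subsequent calls jump straight to the real printf.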
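The lazy-binding idea can be modeled in plain C++. In this hypothetical sketch, `gotSlot` plays the role of the GOT entry at 0x601018, `lazyStub` stands in for the PLT0 and resolver path, and `realSquare` stands in for the real printf; it only illustrates the mechanism and is not how glibc's resolver is implemented.

```cpp
#include <cassert>  // used by the usage checks

using IntFn = int (*)(int);

int realSquare(int x) { return x * x; }  // stands in for the real printf

int resolverCalls = 0;      // how many times the slow resolution path ran
int lazyStub(int x);        // declared first so the slot can point at it

IntFn gotSlot = &lazyStub;  // like 0x601018 before the first call

// Like the resolver: "look up" the symbol, patch the slot, then make the call.
int lazyStub(int x) {
    ++resolverCalls;
    gotSlot = &realSquare;  // later calls bypass the resolver entirely
    return gotSlot(x);
}

// Like printf@plt: always an indirect jump through the slot.
int callThroughPlt(int x) { return gotSlot(x); }
```

The first `callThroughPlt` runs the resolver once; every later call goes straight to `realSquare`.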
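Program entry point
Once the dynamic linker has done its work, a Linux program starts executing at the _start function; the entry point address 0x400430 in the readelf output above is the address of _start: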
0000000000400430 <_start>:
400430: 31 ed xor %ebp,%ebp
400432: 49 89 d1 mov %rdx,%r9
400435: 5e pop %rsi
400436: 48 89 e2 mov %rsp,%rdx
400439: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40043d: 50 push %rax
40043e: 54 push %rsp
40043f: 49 c7 c0 20 06 40 00 mov $0x400620,%r8
400446: 48 c7 c1 b0 05 40 00 mov $0x4005b0,%rcx
40044d: 48 c7 c7 3c 05 40 00 mov $0x40053c,%rdi
400454: e8 b7 ff ff ff callq 400410 <__libc_start_main@plt>
400459: f4 hlt
40045a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
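_start then calls __libc_start_main, an initialization function defined in libc that initializes global variables, calls main, and so on.
The disassembly shows how the arguments are passed: main (0x40053c) in rdi, argc (popped from the stack) in rsi, argv in rdx, __libc_csu_init (0x4005b0) in rcx and __libc_csu_fini (0x400620) in r8; these addresses match the symbol table above.
__libc_start_main is also responsible for setting the exit status and exiting the process, which is why the hlt instruction after the call to __libc_start_main is never executed.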
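Since the runner does not load the real glibc, something has to play the role of __libc_start_main. Below is a simplified, hypothetical stand-in (`fakeLibcStartMain`) that takes only main, argc, argv and the init/fini pointers. Per the Linux Standard Base, the real function also receives rtld_fini and stack_end, and it finishes by calling exit rather than returning.

```cpp
#include <cassert>  // used by the usage checks

using MainFn = int (*)(int, char**, char**);  // main, passed in rdi
using InitFn = void (*)();                    // init (rcx) and fini (r8)

// Hypothetical, simplified stand-in for __libc_start_main.
int fakeLibcStartMain(MainFn mainFn, int argc, char** argv, InitFn init, InitFn fini) {
    if (init) init();                          // e.g. __libc_csu_init
    int status = mainFn(argc, argv, nullptr);  // envp is omitted in this sketch
    if (fini) fini();                          // real libc runs fini from exit()
    return status;                             // real libc calls exit(status) instead
}

// Pieces of a tiny "guest program" for the usage checks.
int initCount = 0;
void demoInit() { ++initCount; }
int demoMain(int, char**, char**) { return 100; }  // hello.c also returns 100
```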
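Implementing the Linux program runner
With this knowledge we can first sketch what the runner needs to do.
Because x64 Windows and Linux programs use the same CPU instruction set, we can execute the machine code directly without an instruction emulator.
And since I want to implement it in user mode, we cannot emulate syscalls the way Bash On Windows does; instead, this runner emulates the functions of the libc library.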
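So the runner has to:
- parse the ELF file
- load the program code at the specified memory addresses
- load the data at the specified memory addresses
- provide implementations of the dynamically linked functions
- execute the loaded program code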
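For the fourth step, one simple shape is a table mapping dynamic symbol names to the runner's own implementations. This is a hypothetical sketch (`emulatedPrintf` and `resolveSymbol` are illustrative names): every function is stored as a generic function pointer that must be cast back to its real signature before being called. A real runner on Windows would also have to account for the guest code using the System V calling convention rather than the Microsoft x64 one.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Storage-only function pointer type; cast back to the real signature before calling.
using GenericFn = void (*)();

// Hypothetical emulated printf: forwards to the host C runtime's vprintf.
int emulatedPrintf(const char* format, ...) {
    va_list args;
    va_start(args, format);
    int written = std::vprintf(format, args);
    va_end(args);
    return written;
}

// Name -> implementation lookup for the dynamic symbols the guest imports.
GenericFn resolveSymbol(const std::string& name) {
    static const std::unordered_map<std::string, GenericFn> table = {
        {"printf", reinterpret_cast<GenericFn>(&emulatedPrintf)},
    };
    auto it = table.find(name);
    if (it == table.end()) {
        throw std::runtime_error("unsupported dynamic symbol: " + name);
    }
    return it->second;
}
```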
Each of these steps is implemented in the sample program below; the complete source code is at the link at the top of the article.
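First we copy the code describing the ELF file format from binutils. It contains the ELF header, the program header and related data structures. The fields are declared as unsigned char[] to prevent alignment padding, so the structs can be filled directly from the file contents.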
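The loader below reads these fields with expressions such as `*reinterpret_cast<std::uint16_t*>(elfHeader.e_phnum)`, which works on x86 in practice. A hypothetical alternative helper, `fieldAs`, copies the bytes with `std::memcpy`, which has no alignment or aliasing requirements, and uses `static_assert` to check that the integer width matches the field width. Like the original code, it assumes a little-endian host.

```cpp
#include <cassert>  // used by the usage checks
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy an integer out of a byte-array field such as e_phnum[2].
// Assumes a little-endian host reading a little-endian ELF file.
template <typename T, std::size_t N>
T fieldAs(const unsigned char (&field)[N]) {
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    static_assert(sizeof(T) == N, "integer width must match the field width");
    T value;
    std::memcpy(&value, field, sizeof(T));  // safe for any alignment
    return value;
}
```

With this helper, reading the 8-byte e_phoff into a std::uint32_t fails to compile instead of silently dropping the upper bytes.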
ELFDefine.h:
#pragma once
namespace HelloElfLoader {
// The following is copied from
// https://github.com/aeste/binutils/blob/develop/elfcpp/elfcpp.h
// https://github.com/aeste/binutils/blob/develop/include/elf/external.h
// Offsets of the fields within e_ident
const int EI_MAG0 = 0;
const int EI_MAG1 = 1;
const int EI_MAG2 = 2;
const int EI_MAG3 = 3;
const int EI_CLASS = 4;
const int EI_DATA = 5;
const int EI_VERSION = 6;
const int EI_OSABI = 7;
const int EI_ABIVERSION = 8;
const int EI_PAD = 9;
const int EI_NIDENT = 16;
// ELF file class (32-bit or 64-bit)
enum {
ELFCLASSNONE = 0,
ELFCLASS32 = 1,
ELFCLASS64 = 2
};
// ByteOrder
enum {
ELFDATANONE = 0,
ELFDATA2LSB = 1,
ELFDATA2MSB = 2
};
// Program header types
enum PT
{
PT_NULL = 0,
PT_LOAD = 1,
PT_DYNAMIC = 2,
PT_INTERP = 3,
PT_NOTE = 4,
PT_SHLIB = 5,
PT_PHDR = 6,
PT_TLS = 7,
PT_LOOS = 0x60000000,
PT_HIOS = 0x6fffffff,
PT_LOPROC = 0x70000000,
PT_HIPROC = 0x7fffffff,
// The remaining values are not in the standard.
// Frame unwind information.
PT_GNU_EH_FRAME = 0x6474e550,
PT_SUNW_EH_FRAME = 0x6474e550,
// Stack flags.
PT_GNU_STACK = 0x6474e551,
// Read only after relocation.
PT_GNU_RELRO = 0x6474e552,
// Platform architecture compatibility information
PT_ARM_ARCHEXT = 0x70000000,
// Exception unwind tables
PT_ARM_EXIDX = 0x70000001
};
// Dynamic section tags
enum DT
{
DT_NULL = 0,
DT_NEEDED = 1,
DT_PLTRELSZ = 2,
DT_PLTGOT = 3,
DT_HASH = 4,
DT_STRTAB = 5,
DT_SYMTAB = 6,
DT_RELA = 7,
DT_RELASZ = 8,
DT_RELAENT = 9,
DT_STRSZ = 10,
DT_SYMENT = 11,
DT_INIT = 12,
DT_FINI = 13,
DT_SONAME = 14,
DT_RPATH = 15,
DT_SYMBOLIC = 16,
DT_REL = 17,
DT_RELSZ = 18,
DT_RELENT = 19,
DT_PLTREL = 20,
DT_DEBUG = 21,
DT_TEXTREL = 22,
DT_JMPREL = 23,
DT_BIND_NOW = 24,
DT_INIT_ARRAY = 25,
DT_FINI_ARRAY = 26,
DT_INIT_ARRAYSZ = 27,
DT_FINI_ARRAYSZ = 28,
DT_RUNPATH = 29,
DT_FLAGS = 30,
// This is used to mark a range of dynamic tags. It is not really
// a tag value.
DT_ENCODING = 32,
DT_PREINIT_ARRAY = 32,
DT_PREINIT_ARRAYSZ = 33,
DT_LOOS = 0x6000000d,
DT_HIOS = 0x6ffff000,
DT_LOPROC = 0x70000000,
DT_HIPROC = 0x7fffffff,
// The remaining values are extensions used by GNU or Solaris.
DT_VALRNGLO = 0x6ffffd00,
DT_GNU_PRELINKED = 0x6ffffdf5,
DT_GNU_CONFLICTSZ = 0x6ffffdf6,
DT_GNU_LIBLISTSZ = 0x6ffffdf7,
DT_CHECKSUM = 0x6ffffdf8,
DT_PLTPADSZ = 0x6ffffdf9,
DT_MOVEENT = 0x6ffffdfa,
DT_MOVESZ = 0x6ffffdfb,
DT_FEATURE = 0x6ffffdfc,
DT_POSFLAG_1 = 0x6ffffdfd,
DT_SYMINSZ = 0x6ffffdfe,
DT_SYMINENT = 0x6ffffdff,
DT_VALRNGHI = 0x6ffffdff,
DT_ADDRRNGLO = 0x6ffffe00,
DT_GNU_HASH = 0x6ffffef5,
DT_TLSDESC_PLT = 0x6ffffef6,
DT_TLSDESC_GOT = 0x6ffffef7,
DT_GNU_CONFLICT = 0x6ffffef8,
DT_GNU_LIBLIST = 0x6ffffef9,
DT_CONFIG = 0x6ffffefa,
DT_DEPAUDIT = 0x6ffffefb,
DT_AUDIT = 0x6ffffefc,
DT_PLTPAD = 0x6ffffefd,
DT_MOVETAB = 0x6ffffefe,
DT_SYMINFO = 0x6ffffeff,
DT_ADDRRNGHI = 0x6ffffeff,
DT_RELACOUNT = 0x6ffffff9,
DT_RELCOUNT = 0x6ffffffa,
DT_FLAGS_1 = 0x6ffffffb,
DT_VERDEF = 0x6ffffffc,
DT_VERDEFNUM = 0x6ffffffd,
DT_VERNEED = 0x6ffffffe,
DT_VERNEEDNUM = 0x6fffffff,
DT_VERSYM = 0x6ffffff0,
// Specify the value of _GLOBAL_OFFSET_TABLE_.
DT_PPC_GOT = 0x70000000,
// Specify the start of the .glink section.
DT_PPC64_GLINK = 0x70000000,
// Specify the start and size of the .opd section.
DT_PPC64_OPD = 0x70000001,
DT_PPC64_OPDSZ = 0x70000002,
// The index of an STT_SPARC_REGISTER symbol within the DT_SYMTAB
// symbol table. One dynamic entry exists for every STT_SPARC_REGISTER
// symbol in the symbol table.
DT_SPARC_REGISTER = 0x70000001,
DT_AUXILIARY = 0x7ffffffd,
DT_USED = 0x7ffffffe,
DT_FILTER = 0x7fffffff
};
// ELF header definition
typedef struct {
unsigned char e_ident[16]; /* ELF "magic number" */
unsigned char e_type[2]; /* Identifies object file type */
unsigned char e_machine[2]; /* Specifies required architecture */
unsigned char e_version[4]; /* Identifies object file version */
unsigned char e_entry[8]; /* Entry point virtual address */
unsigned char e_phoff[8]; /* Program header table file offset */
unsigned char e_shoff[8]; /* Section header table file offset */
unsigned char e_flags[4]; /* Processor-specific flags */
unsigned char e_ehsize[2]; /* ELF header size in bytes */
unsigned char e_phentsize[2]; /* Program header table entry size */
unsigned char e_phnum[2]; /* Program header table entry count */
unsigned char e_shentsize[2]; /* Section header table entry size */
unsigned char e_shnum[2]; /* Section header table entry count */
unsigned char e_shstrndx[2]; /* Section header string table index */
} Elf64_External_Ehdr;
// Program header definition
typedef struct {
unsigned char p_type[4]; /* Identifies program segment type */
unsigned char p_flags[4]; /* Segment flags */
unsigned char p_offset[8]; /* Segment file offset */
unsigned char p_vaddr[8]; /* Segment virtual address */
unsigned char p_paddr[8]; /* Segment physical address */
unsigned char p_filesz[8]; /* Segment size in file */
unsigned char p_memsz[8]; /* Segment size in memory */
unsigned char p_align[8]; /* Segment alignment, file & memory */
} Elf64_External_Phdr;
// One entry in the contents of a DYNAMIC program header
typedef struct {
unsigned char d_tag[8]; /* entry tag value */
union {
unsigned char d_val[8];
unsigned char d_ptr[8];
} d_un;
} Elf64_External_Dyn;
// Relocation record for dynamic linking; some systems use Elf64_External_Rel instead
typedef struct {
unsigned char r_offset[8]; /* Location at which to apply the action */
unsigned char r_info[8]; /* index and type of relocation */
unsigned char r_addend[8]; /* Constant addend used to compute value */
} Elf64_External_Rela;
// Dynamic symbol information
typedef struct {
unsigned char st_name[4]; /* Symbol name, index in string tbl */
unsigned char st_info[1]; /* Type and binding attributes */
unsigned char st_other[1]; /* No defined meaning, 0 */
unsigned char st_shndx[2]; /* Associated section index */
unsigned char st_value[8]; /* Value of the symbol */
unsigned char st_size[8]; /* Associated symbol size */
} Elf64_External_Sym;
}
Next we define a class that reads and executes an ELF file. The constructor opens the file into fileStream_, and the execute function runs the program.
HelloElfLoader.h:
#pragma once
#include <string>
#include <fstream>
namespace HelloElfLoader {
class Loader {
std::ifstream fileStream_;
public:
Loader(const std::string& path);
Loader(std::ifstream&& fileStream);
void execute();
};
}
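The constructors are below; they are just standard C++ code for opening a file.
HelloElfLoader.cpp:
#include <windows.h>
#include <cstdint>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>
#include "ELFDefine.h"
#include "HelloElfLoader.h"
using namespace HelloElfLoader; // assumption: member definitions are written outside the namespace
Loader::Loader(const std::string& path) :
Loader(std::ifstream(path, std::ios::in | std::ios::binary)) {}
Loader::Loader(std::ifstream&& fileStream) :
fileStream_(std::move(fileStream)) {
if (!fileStream_) {
throw std::runtime_error("open file failed");
}
}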
Next we implement the steps described above, starting with parsing the ELF file:
void Loader::execute() {
std::cout << "====== start loading elf ======" << std::endl;
// Check that the current process is 64-bit
if (sizeof(intptr_t) != sizeof(std::int64_t)) {
throw std::runtime_error("please use x64 compile and run this program");
}
// Read the ELF header
Elf64_External_Ehdr elfHeader = {};
fileStream_.seekg(0);
fileStream_.read(reinterpret_cast<char*>(&elfHeader), sizeof(elfHeader));
// Check the ELF header: only 64-bit, little-endian programs are supported
if (std::string(reinterpret_cast<char*>(elfHeader.e_ident), 4) != "\x7f\x45\x4c\x46") {
throw std::runtime_error("magic not match");
}
else if (elfHeader.e_ident[EI_CLASS] != ELFCLASS64) {
throw std::runtime_error("only support ELF64");
}
else if (elfHeader.e_ident[EI_DATA] != ELFDATA2LSB) {
throw std::runtime_error("only support little endian");
}
// Get the program header table information (e_phoff is 8 bytes wide)
std::uint64_t programTableOffset = *reinterpret_cast<std::uint64_t*>(elfHeader.e_phoff);
std::uint16_t programTableEntrySize = *reinterpret_cast<std::uint16_t*>(elfHeader.e_phentsize);
std::uint16_t programTableEntryNum = *reinterpret_cast<std::uint16_t*>(elfHeader.e_phnum);
std::cout << "program table at: " << programTableOffset << ", "
<< programTableEntryNum << " x " << programTableEntrySize << std::endl;
// Get the section header table information
// The section table is only used by the linker; the loader does not actually need it
std::uint64_t sectionTableOffset = *reinterpret_cast<std::uint64_t*>(elfHeader.e_shoff);
std::uint16_t sectionTableEntrySize = *reinterpret_cast<std::uint16_t*>(elfHeader.e_shentsize);
std::uint16_t sectionTableEntryNum = *reinterpret_cast<std::uint16_t*>(elfHeader.e_shnum);
std::cout << "section table at: " << sectionTableOffset << ", "
<< sectionTableEntryNum << " x " << sectionTableEntrySize << std::endl;
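The ELF file starts with the ELF header, whose layout is the same as the Elf64_External_Ehdr struct, so we can read it directly into an Elf64_External_Ehdr.
The ELF header contains the offsets of the program headers and the section headers, so we fetch those parameters up front.
The section headers are not needed at run time; what we need at run time is to iterate over the program headers.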
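A minimal sketch of the program-header walk, using an in-memory stream in place of the ELF file. It seeks to e_phoff before reading instead of relying on the stream position left after the ELF header, and steps through the table using the entry size reported by the file; `readProgramHeaderTypes` is a hypothetical name, and only the p_type field is decoded.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read `count` program-header entries of `entrySize` bytes starting at file
// offset `tableOffset` (e_phoff) and return each entry's p_type.
std::vector<std::uint32_t> readProgramHeaderTypes(std::istream& in, std::uint64_t tableOffset,
                                                  std::uint16_t entrySize, std::uint16_t count) {
    if (entrySize < 4) throw std::runtime_error("program header entry too small");
    std::vector<std::uint32_t> types;
    std::vector<unsigned char> entry(entrySize);
    // e_phoff often equals the ELF header size (64), but that is not guaranteed
    in.seekg(static_cast<std::streamoff>(tableOffset));
    for (std::uint16_t i = 0; i < count; ++i) {
        if (!in.read(reinterpret_cast<char*>(entry.data()), entrySize)) {
            throw std::runtime_error("truncated program header table");
        }
        // p_type is the first 4 bytes of each entry, little-endian
        std::uint32_t type = entry[0] | (entry[1] << 8) | (entry[2] << 16) |
                             (static_cast<std::uint32_t>(entry[3]) << 24);
        types.push_back(type);
    }
    return types;
}
```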
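// Prepare the dynamic linking information
std::uint64_t jmpRelAddr = 0; // start address of the relocation records
std::uint64_t pltRelType = 0; // type of the relocation records, RELA or REL
std::uint64_t pltRelSize = 0; // total size of the relocation records
std::uint64_t symTabAddr = 0; // start address of the dynamic symbol table
std::uint64_t strTabAddr = 0; // start address of the dynamic symbol name table
std::uint64_t strTabSize = 0; // total size of the dynamic symbol name table
// Iterate over the program headers
if (programTableEntrySize != sizeof(Elf64_External_Phdr)) {
throw std::runtime_error("unexpected program header entry size");
}
std::vector<Elf64_External_Phdr> programHeaders;
programHeaders.resize(programTableEntryNum);
// Seek to e_phoff explicitly instead of relying on the position after the ELF header
fileStream_.seekg(programTableOffset);
fileStream_.read(reinterpret_cast<char*>(programHeaders.data()), programTableEntryNum * programTableEntrySize);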
std::vector<std::shared_ptr<void>> loadedSegments;
for (const auto& programHeader : programHeaders) {
std::uint32_t type = *reinterpret_cast<const std::uint32_t*>(programHeader.p_type);
if (type == PT_LOAD) {
// Load the file contents (program code and data) into virtual memory; this example does not handle address conflicts
std::uint64_t fileOffset = *reinterpret_cast<const std::uint64_t*>(programHeader.p_offset);
std::uint64_t fileSize = *reinterpret_cast<const std::uint64_t*>(programHeader.p_filesz);
std::uint64_t virtAddr = *reinterpret_cast<const std::uint64_t*>(programHeader.p_vaddr);
std::uint64_t memSize = *reinterpret_cast<const std::uint64_t*>(programHeader.p_memsz);
if (memSize < fileSize) {
throw std::runtime_error("invalid memsz in program header, it shouldn't less than filesz");
}
// Allocate memory at the specified virtual address
std::cout << std::hex << "allocate address at: 0x" << virtAddr <<
" size: 0x" << memSize << std::dec << std::endl;
void* addr = ::VirtualAlloc((void*)virtAddr, memSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (addr == nullptr) {
throw std::runtime_error("allocate memory at specific address failed");
}
loadedSegments.emplace_back(addr, [](void* ptr) { ::VirtualFree(ptr, 0, MEM_RELEASE); });
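// Copy the file contents into virtual memory. VirtualAlloc rounds a reserved address down
// to the allocation granularity and returns that base, so copy to virtAddr, not addr
fileStream_.seekg(fileOffset);
if (!fileStream_.read(reinterpret_cast<char*>(virtAddr), fileSize)) {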
throw std::runtime_error("read contents into memory from LOAD program header failed");
}
}
else if (type == PT_DYNAMIC) {
// Walk the entries of the dynamic section
std::uint64_t fileOffset = *reinterpret_cast<const std::uint64_t*>(programHeader.p_offset);
fileStream_.seekg(fileOffset);
Elf64_External_Dyn dynSection = {};
std::uint64_t dynSectionTag = 0;
std::uint64_t dynSectionVal = 0;
do {
if (!fileStream_.read(reinterpret_cast<char*>(&dynSection), sizeof(dynSection))) {
throw std::runtime_error("read dynamic section failed");
}
dynSectionTag = *reinterpret_cast<const std::uint64_t*>(dynSection.d_tag);
dynSectionVal = *reinterpret_cast<const std::uint64_t*>(dynSection.d_un.d_val);
if (dynSectionTag == DT_JMPREL) {
jmpRelAddr = dynSectionVal;
}
else if (dynSectionTag == DT_PLTREL) {
pltRelType = dynSectionVal;
}
else if (dynSectionTag == DT_PLTRELSZ) {
pltRelSize = dynSectionVal;
}
else if (dynSectionTag == DT_SYMTAB) {
symTabAddr = dynSectionVal;
}
else if (dynSectionTag == DT_STRTAB) {
strTabAddr = dynSectionVal;
}
else if (dynSectionTag == DT_STRSZ) {
strTabSize = dynSectionVal;
}
} while (dynSectionTag != 0);
}
}
Remember the information we got from readelf earlier?
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000007d4 0x00000000000007d4 R E 200000
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000228 0x0000000000000230 RW 200000
DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x0000000000000680 0x0000000000400680 0x0000000000400680
0x000000000000003c 0x000000000000003c R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1
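Program headers of type LOAD tell the loader to load part of the file into memory.
Offset is the offset in the file, VirtAddr is the virtual memory address, FileSiz is how many bytes to load from the file, MemSiz is how much memory to allocate, and Flags are the memory access permissions.
This example ignores the permissions and always uses PAGE_EXECUTE_READWRITE.
This program has two LOAD headers: the first contains the code and read-only data (the contents of sections such as .text, .init and .rodata), and the second contains the writable data (the contents of sections such as .init_array, .fini_array, .got.plt and .data). When MemSiz is larger than FileSiz, the remaining bytes must be zero; VirtualAlloc already guarantees this because it returns zero-initialized memory.
Once the contents of the LOAD headers are loaded at their specified addresses, steps 2 and 3 of our plan are done: the code and data are now in memory.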
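The second LOAD header shows why page boundaries matter. The hypothetical helper `segmentRange` below, assuming 4 KiB pages, computes the page-aligned range a segment occupies and how many trailing bytes (MemSiz - FileSiz) must be zero-filled. For the second LOAD header, VirtAddr 0x600e10 is not page-aligned, so its pages start at 0x600000, and the 8 extra bytes match the size of the .bss section.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Page-aligned range of a LOAD segment plus the zero-filled tail.
struct SegmentRange {
    std::uint64_t pageStart;  // inclusive
    std::uint64_t pageEnd;    // exclusive
    std::uint64_t zeroFill;   // MemSiz - FileSiz bytes (e.g. .bss)
};

// Caller must ensure memSize >= fileSize, as the loader above already checks.
SegmentRange segmentRange(std::uint64_t vaddr, std::uint64_t fileSize, std::uint64_t memSize,
                          std::uint64_t pageSize = 0x1000) {
    SegmentRange r{};
    r.pageStart = vaddr & ~(pageSize - 1);                           // round down
    r.pageEnd = (vaddr + memSize + pageSize - 1) & ~(pageSize - 1);  // round up
    r.zeroFill = memSize - fileSize;
    return r;
}
```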
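Next we need to handle the dynamically linked functions. The information required comes from the DYNAMIC header, which contains the following:
Dynamic section at offset 0xe28 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]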
0x000000000000000c (INIT) 0x4003c8
0x000000000000000d (FINI) 0x400624
0x0000000000000019 (INIT_ARRAY) 0x600e10
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x600e18
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x400318
0x0000000000000006 (SYMTAB) 0x4002b8
0x000000000000000a (STRSZ) 63 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x601000
0x0000000000000002 (PLTRELSZ) 48 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x400398
0x0000000000000007 (RELA) 0x400380
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x400360
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x400358
0x0000000000000000 (NULL) 0x0
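The loop in execute() that scans the dynamic section can be reduced to a small, testable function. In this sketch the entries are already decoded into (tag, value) pairs; the tag constants are the standard ELF values (the same as the DT enum in ELFDefine.h), and `DynEntry`, `DynamicInfo` and `collectDynamicInfo` are hypothetical names.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>
#include <vector>

// Standard ELF dynamic tags used by the loader.
constexpr std::uint64_t kDtNull = 0, kDtPltRelSz = 2, kDtStrTab = 5, kDtSymTab = 6,
                        kDtRela = 7, kDtStrSz = 10, kDtPltRel = 20, kDtJmpRel = 23;

struct DynEntry { std::uint64_t tag; std::uint64_t val; };

struct DynamicInfo {
    std::uint64_t jmpRel = 0, pltRelType = 0, pltRelSize = 0;
    std::uint64_t symTab = 0, strTab = 0, strSize = 0;
};

// Walk the entries until DT_NULL, keeping only the tags the loader needs.
DynamicInfo collectDynamicInfo(const std::vector<DynEntry>& entries) {
    DynamicInfo info;
    for (const DynEntry& e : entries) {
        if (e.tag == kDtNull) break;  // DT_NULL terminates the table
        switch (e.tag) {
            case kDtJmpRel: info.jmpRel = e.val; break;
            case kDtPltRel: info.pltRelType = e.val; break;
            case kDtPltRelSz: info.pltRelSize = e.val; break;
            case kDtSymTab: info.symTab = e.val; break;
            case kDtStrTab: info.strTab = e.val; break;
            case kDtStrSz: info.strSize = e.val; break;
            default: break;  // NEEDED, INIT, GNU_HASH, ... are ignored here
        }
    }
    return info;
}
```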
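Let's go through the tags used in the code above one by one:
- DT_JMPREL: start address of the relocation records, i.e. where the .rela.plt section is stored in memory
- DT_PLTREL: type of the relocation records, RELA or REL; here it is RELA
- DT_PLTRELSZ: total size of the relocation records; here 24 * 2 = 48, since each RELA record is 24 bytes and there are two of them:
Relocation section '.rela.plt' at offset 0x398 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000601018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0
000000601020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
- DT_SYMTAB: start address of the dynamic symbol table, i.e. where the .dynsym section is stored in memory
- DT_STRTAB: start address of the dynamic symbol name table, i.e. where the .dynstr section is stored in memory
- DT_STRSZ: total size of the dynamic symbol name table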
Symbol table '.dynsym' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
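The Info column of the relocation table encodes both which .dynsym entry a relocation refers to and the relocation type: in ELF64 the symbol index is the upper 32 bits and the type is the lower 32 bits (what the ELF64_R_SYM and ELF64_R_TYPE macros in <elf.h> extract). A small sketch with hypothetical helper names:

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Split an ELF64 r_info value into symbol index and relocation type.
constexpr std::uint32_t relaSymbol(std::uint64_t info) {
    return static_cast<std::uint32_t>(info >> 32);
}
constexpr std::uint32_t relaType(std::uint64_t info) {
    return static_cast<std::uint32_t>(info & 0xffffffff);
}

// x86-64 relocation types that appear in this program.
constexpr std::uint32_t kR_X86_64_GLOB_DAT = 6;
constexpr std::uint32_t kR_X86_64_JUMP_SLOT = 7;

// Number of records in a relocation table (e.g. DT_PLTRELSZ / sizeof(Rela)).
constexpr std::uint64_t relocationCount(std::uint64_t totalSize, std::uint64_t entrySize) {
    return totalSize / entrySize;
}
```

So the record with Info 000100000007 patches the slot at 0x601018 with .dynsym entry 1 (printf), and 000200000007 patches 0x601020 with entry 2 (__libc_start_main).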
After iterating over the program headers we know that two dynamically linked functions need relocation, __libc_start_main and printf; __libc_start_main is the one responsible for calling main.
Next we need to set the addresses of these functions:
// Read the dynamic symbol table
std::string dynamicSymbolNames(reinterpret_cast<char*>(strTabAddr), strTabSize);
Elf64_External_Sym* dynamicSymbols = reinterpret_cast<Elf64_External_Sym*>(symTabAddr);
// Set the addresses of the dynamically linked functions
std::cout << std::hex << "read dynamic entries at: 0x" << jmpRelAddr <<
" size: 0x" << pltRelSize << std::dec << std::endl;
if (jmpRelAddr == 0 || pltRelType != DT_RELA || pltRelSize % sizeof(Elf64_External_Rela) != 0) {
throw std::runtime_error("invalid dynamic entry info, rel type should be rela");
}
std::vector<std::shared_ptr<void>> libraryFuncs;
for (std::uint64_t offset = 0; offset < pltRelSize; offset += sizeof(Elf64_External_Rela)) {
Elf64_External_Rela* rela = (Elf64_External_Rela*)(jmpRelAddr + offset);
std::uint64_t relaOffset = *reinterpret_cast<const std::uint64_t*>(rela->r_offset);
std::uint64_t relaInfo = *reinterpret_cast<const std::uint64_t*>(rela->r_info);
std::uint64_t relaSym = relaInfo >> 32