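Take a protobuf binary message that contains bytes fields, decode it to JSON with pbjs, then feed that JSON back to pbjs for encoding: the resulting binary differs wildly from the original. After some digging, this turned out to be a bug in pbjs itself. See whether you have hit the same pitfall.
Background
An earlier post, "Sending and receiving protobuf protocol data with scripts", showed that the pbjs command can convert protobuf binary data to JSON: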
> pbjs msg.proto --decode ProbeIpv6Response < response.bin
{
"selfAddr": {
"addrV6": "2409:8900:7900:8f0d:ecd9:4aee:aa3:7ad",
"portV6": 46066
},
"brosAddr": [
{
"addrV6": "2409:8a34:4405:6624:5250:9d04:cf77:d",
"portV6": 18720
},
{
"addrV6": "2409:8a34:401a:4151:59e6:69b4:37ad:dea2",
"portV6": 18679
},
{
"addrV6": "2409:8a20:2a02:20c0:7d11:9a6b:6b51:a9bb",
"portV6": 18824
},
{
"addrV6": "2409:8a20:e0d:7773:50d4:93b0:680a:b555",
"portV6": 18968
},
{
"addrV6": "2409:8a44:5b20:edf2:7c09:a5e1:cdbf:69c6",
"portV6": 18008
}
]
}
Encoding the JSON back to binary also works:
> pbjs msg.proto --encode ProbeIpv6Response < response.json > response2.bin
> xxd response2.bin
00000000: 122b 0a25 3234 3039 3a38 3930 303a 3739 .+.%2409:8900:79
00000010: 3030 3a38 6630 643a 6563 6439 3a34 6165 00:8f0d:ecd9:4ae
00000020: 653a 6161 333a 3761 6410 f2e7 021a 2a0a e:aa3:7ad.....*.
00000030: 2432 3430 393a 3861 3334 3a34 3430 353a $2409:8a34:4405:
00000040: 3636 3234 3a35 3235 303a 3964 3034 3a63 6624:5250:9d04:c
00000050: 6637 373a 6410 a092 011a 2d0a 2732 3430 f77:d.....-.'240
00000060: 393a 3861 3334 3a34 3031 613a 3431 3531 9:8a34:401a:4151
00000070: 3a35 3965 363a 3639 6234 3a33 3761 643a :59e6:69b4:37ad:
00000080: 6465 6132 10f7 9101 1a2d 0a27 3234 3039 dea2.....-.'2409
00000090: 3a38 6132 303a 3261 3032 3a32 3063 303a :8a20:2a02:20c0:
000000a0: 3764 3131 3a39 6136 623a 3662 3531 3a61 7d11:9a6b:6b51:a
000000b0: 3962 6210 8893 011a 2c0a 2632 3430 393a 9bb.....,.&2409:
000000c0: 3861 3230 3a65 3064 3a37 3737 333a 3530 8a20:e0d:7773:50
000000d0: 6434 3a39 3362 303a 3638 3061 3a62 3535 d4:93b0:680a:b55
000000e0: 3510 9894 011a 2d0a 2732 3430 393a 3861 5.....-.'2409:8a
000000f0: 3434 3a35 6232 303a 6564 6632 3a37 6330 44:5b20:edf2:7c0
00000100: 393a 6135 6531 3a63 6462 663a 3639 6336 9:a5e1:cdbf:69c6
00000110: 10d8 8c01
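The re-encoded response2.bin is identical to the original response.bin.
Later, however, with a different message type, the regenerated bin file differed substantially from the original, so pbjs could not be used to turn JSON into binary data.
Symptoms
To illustrate the problem, start with the message definitions: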
message common
{
required uint32 mem1 = 1;
required uint32 mem2 = 2;
required bytes mem3 = 3;
required uint32 mem4 = 4;
required uint64 mem5 = 5;
optional uint32 mem6 = 6;
optional bytes mem7 = 7;
optional uint32 mem8 = 8;
optional uint64 mem9 = 9;
}
message query_md5
{
required common mema = 1;
required uint32 memb = 2;
required bytes memc = 3;
required uint32 memd = 4;
required uint64 meme = 5;
repeated bytes memf = 6;
}
For protocol confidentiality, all field names are replaced with memxx. Below is the raw data for this proto message:
> xxd tmp/resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4 ...?.`(\..
Decoding it with pbjs gives the following JSON:
> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json
> jq -c '.' resp.json
{"mema":{"mem1":2,"mem2":1048643,"mem3":{"type":"Buffer","data":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]},"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true},"mem6":0,"mem7":{"type":"Buffer","data":[50,46,50,46,49,48,49,46,50,55]},"mem8":3,"mem9":{"low":-1872613904,"high":0,"unsigned":true}},"memb":0,"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]},"memd":0,"meme":{"low":22177440,"high":0,"unsigned":true},"memf":[{"type":"Buffer","data":[209,91,243,38,71,8,191,199,1,224,75,61,198,36,56,163]},{"type":"Buffer","data":[49,149,68,243,47,50,27,150,120,101,107,130,253,184,149,96]},{"type":"Buffer","data":[154,117,23,53,252,202,230,111,116,134,233,250,220,106,159,171]},{"type":"Buffer","data":[40,76,235,191,54,224,29,87,92,166,147,222,57,27,122,125]},{"type":"Buffer","data":[62,11,67,156,98,165,164,1,195,255,207,0,50,153,188,126]},{"type":"Buffer","data":[246,185,151,70,156,230,149,85,82,211,245,11,108,163,142,177]},{"type":"Buffer","data":[152,82,231,241,37,48,203,107,122,160,85,105,251,205,10,92]},{"type":"Buffer","data":[211,51,51,177,213,22,216,104,57,56,243,7,191,254,212,192]},{"type":"Buffer","data":[166,70,12,223,40,116,72,106,11,192,237,241,111,81,181,158]},{"type":"Buffer","data":[30,238,230,121,91,241,8,50,213,167,252,79,96,207,72,171]},{"type":"Buffer","data":[196,70,150,99,246,164,135,205,252,63,213,96,40,92,14,164]}]}
There is a lot of content, so jq -c prints it on a single line. Encoding this JSON again produces a bin file with the following content:
> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a08 0802 10c3 8040 1a00 1000 1a00 .......@......
The length alone shows that it clearly differs from the original: 14 bytes instead of 282.
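To see what those 14 bytes actually contain, here is a minimal sketch of a wire-format walker in Node.js. It is not part of pbjs; the function names readVarint and walk are made up for illustration, and it only handles the two wire types this message uses (varint and length-delimited).

```javascript
// Minimal protobuf wire-format walker (sketch): handles only varint (0)
// and length-delimited (2) wire types, which is all this message uses.
function readVarint(buf, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    const byte = buf[pos++];
    value += (byte & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    shift += 7;
    if ((byte & 0x80) === 0) return { value, pos };
  }
}

function walk(buf) {
  const fields = [];
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    pos = tag.pos;
    const field = Math.floor(tag.value / 8); // upper bits: field number
    const wireType = tag.value % 8;          // lower 3 bits: wire type
    if (wireType === 0) {
      const v = readVarint(buf, pos);
      pos = v.pos;
      fields.push({ field, value: v.value });
    } else if (wireType === 2) {
      const len = readVarint(buf, pos);
      pos = len.pos;
      fields.push({ field, bytes: buf.subarray(pos, pos + len.value) });
      pos += len.value;
    } else {
      throw new Error('unsupported wire type ' + wireType);
    }
  }
  return fields;
}

// The 14 bytes from the xxd dump above.
const broken = Buffer.from('0a08080210c380401a0010001a00', 'hex');
const outer = walk(broken);         // mema (8 bytes), memb = 0, memc (0 bytes)
const inner = walk(outer[0].bytes); // mem1 = 2, mem2 = 1048643, mem3 (0 bytes)
console.log(outer, inner);
```

Walking the dump this way shows that each bytes field (mem3, memc) is emitted with length 0, and that the fields following it within the same message are absent.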
Initial analysis
Since pbjs restored the binary data correctly before, the tool itself is mostly sound. Recall the format of the first message:
> cat msg.proto
message ProbeIpv6Request {
string xxxxx = 1;
string xxxx = 2;
string xxxxxxxx = 3;
string xxxxxxx = 4;
}
message V6AddrType {
string addrV6 = 1;
uint32 portV6 = 2;
}
message ProbeIpv6Response {
string xxxxx = 1;
V6AddrType selfAddr = 2;
repeated V6AddrType brosAddr = 3;
}
The main difference from the failing message: the former uses string, the latter uses bytes.
bytes vs string
Could the bytes type be the culprit? Try replacing bytes with string in the second message:
message common
{
required uint32 mem1 = 1;
required uint32 mem2 = 2;
required string mem3 = 3;
required uint32 mem4 = 4;
required uint64 mem5 = 5;
optional uint32 mem6 = 6;
optional string mem7 = 7;
optional uint32 mem8 = 8;
optional uint64 mem9 = 9;
}
message query_md5
{
required common mema = 1;
required uint32 memb = 2;
required string memc = 3;
required uint32 memd = 4;
required uint64 meme = 5;
repeated string memf = 6;
}
Hoping that pbjs treats the two types compatibly, parse the binary data directly as string:
> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json
> cat resp.json
{
"mema": {
"mem1": 2,
"mem2": 1048643,
"mem3": "�8���z��\u0019g+���k\\",
"mem4": 11,
"mem5": {
"low": 1695456564,
"high": 11,
"unsigned": true
},
"mem6": 0,
"mem7": "2.2.101.27",
"mem8": 3,
"mem9": {
"low": -1872613904,
"high": 0,
"unsigned": true
}
},
"memb": 0,
"memc": "g�\u0007!^G��%'-m��\u0002-",
"memd": 0,
"meme": {
"low": 22177440,
"high": 0,
"unsigned": true
},
"memf": [
"�[�&G\b��\u0001�K=�$8�",
"1�D�/2\u001b�xek����`",
"�u\u00175���ot����j��",
"(L��6�\u001dW\\���9\u001bz}",
">\u000bC�b��\u0001���\u00002��~",
"���F���UR��\u000bl���",
"�R��%0�kz�Ui��\n\\",
"�33��\u0016�h98�\u0007����",
"�F\f�(tHj\u000b���oQ��",
"\u001e��y[�\b2է�O`�H�",
"�F�c�����?�`(\\\u000e�"
]
}
Ha, it actually decoded, although the bytes fields came out garbled. Encoding it back unchanged should work, right?
> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a49 0802 10c3 8040 1a22 efbf bd38 efbf .I.....@."...8..
0000010: bdef bfbd efbf bd7a efbf bdef bfbd 1967 .......z.......g
0000020: 2bef bfbd efbf bdef bfbd 6b5c 200b 28b4 +.........k\ .(.
0000030: baba a8b6 0130 003a 0a32 2e32 2e31 3031 .....0.:.2.2.101
0000040: 2e32 3740 0348 f0db 8883 0910 001a 1a67 .27@.H.........g
0000050: efbf bd07 215e 47ef bfbd efbf bd25 272d ....!^G......%'-
0000060: 6def bfbd efbf bd02 2d20 0028 a0cd c90a m.......- .(....
0000070: 321e efbf bd5b efbf bd26 4708 efbf bdef 2....[...&G.....
0000080: bfbd 01ef bfbd 4b3d efbf bd24 38ef bfbd ......K=...$8...
0000090: 321e 31ef bfbd 44ef bfbd 2f32 1bef bfbd 2.1...D.../2....
00000a0: 7865 6bef bfbd efbf bdef bfbd efbf bd60 xek............`
00000b0: 3224 efbf bd75 1735 efbf bdef bfbd efbf 2$...u.5........
00000c0: bd6f 74ef bfbd efbf bdef bfbd efbf bd6a .ot............j
00000d0: efbf bdef bfbd 321c 284c efbf bdef bfbd ......2.(L......
00000e0: 36ef bfbd 1d57 5cef bfbd efbf bdef bfbd 6....W\.........
00000f0: 391b 7a7d 3220 3e0b 43ef bfbd 62ef bfbd 9.z}2 >.C...b...
0000100: efbf bd01 efbf bdef bfbd efbf bd00 32ef ..............2.
0000110: bfbd efbf bd7e 3226 efbf bdef bfbd efbf .....~2&........
0000120: bd46 efbf bdef bfbd efbf bd55 52ef bfbd .F.........UR...
0000130: efbf bd0b 6cef bfbd efbf bdef bfbd 321e ....l.........2.
0000140: efbf bd52 efbf bdef bfbd 2530 efbf bd6b ...R......%0...k
0000150: 7aef bfbd 5569 efbf bdef bfbd 0a5c 3222 z...Ui.......\2"
0000160: efbf bd33 33ef bfbd efbf bd16 efbf bd68 ...33..........h
0000170: 3938 efbf bd07 efbf bdef bfbd efbf bdef 98..............
0000180: bfbd 321e efbf bd46 0cef bfbd 2874 486a ..2....F....(tHj
0000190: 0bef bfbd efbf bdef bfbd 6f51 efbf bdef ..........oQ....
00001a0: bfbd 321c 1eef bfbd efbf bd79 5bef bfbd ..2........y[...
00001b0: 0832 d5a7 efbf bd4f 60ef bfbd 48ef bfbd .2.....O`...H...
00001c0: 3222 efbf bd46 efbf bd63 efbf bdef bfbd 2"...F...c......
00001d0: efbf bdef bfbd efbf bd3f efbf bd60 285c .........?...`(\
00001e0: 0eef bfbd ....
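It does encode, but the result still differs a lot from the original data: this time there is much more content, which poured cold water on my enthusiasm. I sent this binary data to the server anyway, and sure enough it returned an error: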
{"error_code":196608,"error_msg":"fgid not find","request_id":3933672364}
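It looks like parsing the bytes fields failed.
In my scenario, pbjs is mainly used to generate the protobuf request from JSON and send it to the server, then to decode the protobuf response into JSON, which is finally fed to jq to extract whatever information is needed.
If this step fails, everything after it is blocked, even though locally the string type can convert the data back and forth.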
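The repeated ef bf bd sequences in the dump are the UTF-8 encoding of U+FFFD, the replacement character. A short Node.js sketch using the first four bytes of memc shows where they come from: decoding arbitrary bytes as UTF-8 is lossy, so the original byte cannot be recovered afterwards.

```javascript
// First four bytes of memc from the original message.
const raw = Buffer.from([0x67, 0xc6, 0x07, 0x21]);

// Decoding as UTF-8 turns the invalid byte 0xc6 into U+FFFD.
const asString = raw.toString('utf8');

// Encoding back emits the three-byte UTF-8 form of U+FFFD (ef bf bd).
const reencoded = Buffer.from(asString, 'utf8');

console.log(raw.toString('hex'));       // 67c60721
console.log(reencoded.toString('hex')); // 67efbfbd0721
```

The second line matches the start of memc in the re-encoded dump (67 efbf bd07 21), which explains why the message grew.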
json unicode
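My first suspicion was that some characters in the string fields were not converted back to their original bytes. Take the memc field from the example above: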
"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]}
After conversion it becomes:
"memc": "g�\u0007!^G��%'-m��\u0002-",
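Some of the garbled characters look suspicious. How can a character's binary form be expressed in JSON? A search turned up the JSON unicode escape \u, which must be followed by exactly four hex digits, so I converted the value accordingly: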
"memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",
The other string fields get the same treatment:
{
"mema": {
"mem1": 2,
"mem2": 1048643,
"mem3": "\u00ba\u0038\u00ba\u0093\u00af\u007a\u00da\u00e8\u0019\u0067\u002b\u0089\u00dd\u00d2\u006b\u005c",
"mem4": 11,
"mem5": {
"low": 1695456564,
"high": 11,
"unsigned": true
},
"mem6": 0,
"mem7": "2.2.101.27",
"mem8": 3,
"mem9": {
"low": -1872613904,
"high": 0,
"unsigned": true
}
},
"memb": 0,
"memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",
"memd": 0,
"meme": {
"low": 22177440,
"high": 0,
"unsigned": true
},
"memf": [
"\u00d1\u005b\u00f3\u0026\u0047\u0008\u00bf\u00c7\u0001\u00e0\u004b\u003d\u00c6\u0024\u0038\u00a3",
"\u0031\u0095\u0044\u00f3\u002f\u0032\u001b\u0096\u0078\u0065\u006b\u0082\u00fd\u00b8\u0095\u0060",
"\u009a\u0075\u0017\u0035\u00fc\u00ca\u00e6\u006f\u0074\u0086\u00e9\u00fa\u00dc\u006a\u009f\u00ab",
"\u0028\u004c\u00eb\u00bf\u0036\u00e0\u001d\u0057\u005c\u00a6\u0093\u00de\u0039\u001b\u007a\u007d",
"\u003e\u000b\u0043\u009c\u0062\u00a5\u00a4\u0001\u00c3\u00ff\u00cf\u0000\u0032\u0099\u00bc\u007e",
"\u00f6\u00b9\u0097\u0046\u009c\u00e6\u0095\u0055\u0052\u00d3\u00f5\u000b\u006c\u00a3\u008e\u00b1",
"\u0098\u0052\u00e7\u00f1\u0025\u0030\u00cb\u006b\u007a\u00a0\u0055\u0069\u00fb\u00cd\u000a\u005c",
"\u00d3\u0033\u0033\u00b1\u00d5\u0016\u00d8\u0068\u0039\u0038\u00f3\u0007\u00bf\u00fe\u00d4\u00c0",
"\u00a6\u0046\u000c\u00df\u0028\u0074\u0048\u006a\u000b\u00c0\u00ed\u00f1\u006f\u0051\u00b5\u009e",
"\u001e\u00ee\u00e6\u0079\u005b\u00f1\u0008\u0032\u00d5\u00a7\u00fc\u004f\u0060\u00cf\u0048\u00ab",
"\u00c4\u0046\u0096\u0063\u00f6\u00a4\u0087\u00cd\u00fc\u003f\u00d5\u0060\u0028\u005c\u000e\u00a4"
]
}
Try encoding the new JSON file with pbjs:
> pbjs query_md5.proto --encode query_md5 < resp.uni.json > resp.uni.bin
> xxd resp.uni.bin
0000000: 0a40 0802 10c3 8040 1a19 c2ba 38c2 bac2 .@.....@....8...
0000010: 93c2 af7a c39a c3a8 1967 2bc2 89c3 9dc3 ...z.....g+.....
0000020: 926b 5c20 0b28 b4ba baa8 b601 3000 3a0a .k\ .(......0.:.
0000030: 322e 322e 3130 312e 3237 4003 48f0 db88 2.2.101.27@.H...
0000040: 8309 1000 1a15 67c3 8607 215e 47c2 aec2 ......g...!^G...
0000050: 8925 272d 6dc2 a0c3 b602 2d20 0028 a0cd .%'-m.....- .(..
0000060: c90a 3217 c391 5bc3 b326 4708 c2bf c387 ..2...[..&G.....
0000070: 01c3 a04b 3dc3 8624 38c2 a332 1731 c295 ...K=..$8..2.1..
0000080: 44c3 b32f 321b c296 7865 6bc2 82c3 bdc2 D../2...xek.....
0000090: b8c2 9560 321a c29a 7517 35c3 bcc3 8ac3 ...`2...u.5.....
00000a0: a66f 74c2 86c3 a9c3 bac3 9c6a c29f c2ab .ot........j....
00000b0: 3216 284c c3ab c2bf 36c3 a01d 575c c2a6 2.(L....6...W\..
00000c0: c293 c39e 391b 7a7d 3218 3e0b 43c2 9c62 ....9.z}2.>.C..b
00000d0: c2a5 c2a4 01c3 83c3 bfc3 8f00 32c2 99c2 ............2...
00000e0: bc7e 321b c3b6 c2b9 c297 46c2 9cc3 a6c2 .~2.......F.....
00000f0: 9555 52c3 93c3 b50b 6cc2 a3c2 8ec2 b132 .UR.....l......2
0000100: 17c2 9852 c3a7 c3b1 2530 c38b 6b7a c2a0 ...R....%0..kz..
0000110: 5569 c3bb c38d 0a5c 3219 c393 3333 c2b1 Ui.....\2...33..
0000120: c395 16c3 9868 3938 c3b3 07c2 bfc3 bec3 .....h98........
0000130: 94c3 8032 17c2 a646 0cc3 9f28 7448 6a0b ...2...F...(tHj.
0000140: c380 c3ad c3b1 6f51 c2b5 c29e 3218 1ec3 ......oQ....2...
0000150: aec3 a679 5bc3 b108 32c3 95c2 a7c3 bc4f ...y[...2......O
0000160: 60c3 8f48 c2ab 3219 c384 46c2 9663 c3b6 `..H..2...F..c..
0000170: c2a4 c287 c38d c3bc 3fc3 9560 285c 0ec2 ........?..`(\..
0000180: a4 .
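The new version changed somewhat compared with the previous attempt: it is shorter, but the server still reports the same error.
This proves the approach is not viable; replacing the bytes type with string is a dead end.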
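The reason the \u escapes do not help can be checked in a few lines of Node.js. The escape \u00ba names the code point U+00BA, not the byte 0xba, and a string is serialized as UTF-8, where every code point from U+0080 upward takes at least two bytes. The latin1 line is included only for contrast; it is not something pbjs does.

```javascript
// First two characters of the rewritten mem3 value.
const escaped = '\u00ba\u0038';

// UTF-8 expands U+00BA to two bytes, c2 ba.
const utf8Bytes = Buffer.from(escaped, 'utf8');

// Latin-1 would map each code point below 0x100 to one byte.
const latin1Bytes = Buffer.from(escaped, 'latin1');

console.log(utf8Bytes.toString('hex'));   // c2ba38
console.log(latin1Bytes.toString('hex')); // ba38
```

The c2 ba 38 sequence is exactly what appears at the start of mem3 in resp.uni.bin above.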
Solutions
Since the bytes type is required and pbjs has a problem with it, are there other conversion tools?
protobufjs
The usual pbjs help output looks like this:
> pbjs
Usage: pbjs [options] <schema_path>
Options:
-V, --version output the version number
--es5 <js_path> Generate ES5 JavaScript code
--es6 <js_path> Generate ES6 JavaScript code
--ts <ts_path> Generate TypeScript code
--decode <msg_type> Decode standard input to JSON
--encode <msg_type> Encode standard input to JSON
-h, --help output usage information
By chance, my pbjs once printed the following instead:
> pbjs
protobuf.js v1.1.2 CLI for JavaScript
Translates between file formats and generates static code.
-t, --target Specifies the target format. Also accepts a path to require a custom target.
json JSON representation
json-module JSON representation as a module
proto2 Protocol Buffers, Version 2
proto3 Protocol Buffers, Version 3
static Static code without reflection (non-functional on its own)
static-module Static code without reflection as a module
-p, --path Adds a directory to the include path.
--filter Set up a filter to configure only those messages you need and their dependencies to compile, this will effectively reduce the final file size
Set A json file path, Example of file content: {"messageNames":["mypackage.messageName1", "messageName2"] }
-o, --out Saves to a file instead of writing to stdout.
--sparse Exports only those types referenced from a main file (experimental).
Module targets only:
-w, --wrap Specifies the wrapper to use. Also accepts a path to require a custom wrapper.
default Default wrapper supporting both CommonJS and AMD
commonjs CommonJS wrapper
amd AMD wrapper
es6 ES6 wrapper (implies --es6)
closure A closure adding to protobuf.roots where protobuf is a global
--dependency Specifies which version of protobuf to require. Accepts any valid module id
-r, --root Specifies an alternative protobuf.roots name.
-l, --lint Linter configuration. Defaults to protobuf.js-compatible rules:
eslint-disable block-scoped-var, id-length, no-control-regex, no-magic-numbers, no-prototype-builtins, no-redeclare, no-shadow, no-var, sort-vars
--es6 Enables ES6 syntax (const/let instead of var)
Proto sources only:
--keep-case Keeps field casing instead of converting to camel case.
--alt-comment Turns on an alternate comment parsing mode that preserves more comments.
Static targets only:
--no-create Does not generate create functions used for reflection compatibility.
--no-encode Does not generate encode functions.
--no-decode Does not generate decode functions.
--no-verify Does not generate verify functions.
--no-convert Does not generate convert functions like from/toObject
--no-delimited Does not generate delimited encode/decode functions.
--no-typeurl Does not generate getTypeUrl function.
--no-beautify Does not beautify generated code.
--no-comments Does not output any JSDoc comments.
--no-service Does not output service classes.
--force-long Enforces the use of 'Long' for s-/u-/int64 and s-/fixed64 fields.
--force-number Enforces the use of 'number' for s-/u-/int64 and s-/fixed64 fields.
--force-message Enforces the use of message instances instead of plain objects.
--null-defaults Default value for optional fields is null instead of zero value.
usage: pbjs [options] file1.proto file2.json ... (or pipe) other | pbjs [options] -
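It turns out there are two different pbjs commands: one comes from npm install pbjs, the other from npm install protobufjs[-cli]. The latter generates javascript code for handling protobuf data.
If one is already installed, installing the other fails: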
$ sudo npm install protobufjs -g
npm ERR! code EEXIST
npm ERR! path /usr/local/bin/pbjs
npm ERR! EEXIST: file already exists
npm ERR! File exists: /usr/local/bin/pbjs
npm ERR! Remove the existing file and try again, or run npm
npm ERR! with --force to overwrite files recklessly.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2023-09-24T03_19_13_647Z-debug-0.log
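The earlier one must be uninstalled first. Search results for the keyword pbjs describe sometimes the first tool and sometimes the second, depending on which package is installed; do not confuse the two.
One way to keep both is to install the other one locally: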
> npm install protobufjs-cli
added 84 packages in 2m
> ls node_modules/
acorn brace-expansion entities esutils inherits lodash minimatch protobufjs strip-json-comments underscore
acorn-jsx catharsis escape-string-regexp fast-levenshtein js2xmlparser long minimist @protobufjs supports-color word-wrap
ansi-styles chalk escodegen fs.realpath jsdoc lru-cache mkdirp protobufjs-cli tmp wrappy
argparse color-convert eslint-visitor-keys glob @jsdoc markdown-it once requizzle type-check xmlcreate
@babel color-name espree graceful-fs klaw markdown-it-anchor optionator rimraf @types yallist
balanced-match concat-map esprima has-flag levn marked path-is-absolute semver uc.micro
bluebird deep-is estraverse inflight linkify-it mdurl prelude-ls source-map uglify-js
> find . -type f -name "pbjs"
./node_modules/protobufjs-cli/bin/pbjs
> ./node_modules/protobufjs-cli/bin/pbjs
protobuf.js v1.1.2 CLI for JavaScript
Translates between file formats and generates static code.
......
usage: pbjs [options] file1.proto file2.json ... (or pipe) other | pbjs [options] -
The drawback is that it can only be invoked like this:
> ./node_modules/protobufjs-cli/bin/pbjs
For protobufjs, the main point of interest is that it converts proto messages into a JSON description that js code can use directly:
> ./node_modules/protobufjs-cli/bin/pbjs -t json query_md5.proto > query_md5.json
> cat query_md5.json
{
"nested": {
"common": {
"fields": {
"mem1": {
"rule": "required",
"type": "uint32",
"id": 1
},
"mem2": {
"rule": "required",
"type": "uint32",
"id": 2
},
"mem3": {
"rule": "required",
"type": "bytes",
"id": 3
},
"mem4": {
"rule": "required",
"type": "uint32",
"id": 4
},
"mem5": {
"rule": "required",
"type": "uint64",
"id": 5
},
"mem6": {
"type": "uint32",
"id": 6
},
"mem7": {
"type": "bytes",
"id": 7
},
"mem8": {
"type": "uint32",
"id": 8
},
"mem9": {
"type": "uint64",
"id": 9
}
}
},
"query_md5": {
"fields": {
"mema": {
"rule": "required",
"type": "common",
"id": 1
},
"memb": {
"rule": "required",
"type": "uint32",
"id": 2
},
"memc": {
"rule": "required",
"type": "bytes",
"id": 3
},
"memd": {
"rule": "required",
"type": "uint32",
"id": 4
},
"meme": {
"rule": "required",
"type": "uint64",
"id": 5
},
"memf": {
"rule": "repeated",
"type": "bytes",
"id": 6
}
}
}
}
}
This will be used shortly.
javascript
Both protobufjs and pbjs can generate javascript code from a proto file. Recall the pbjs help text:
> pbjs
Usage: pbjs [options] <schema_path>
Options:
-V, --version output the version number
--es5 <js_path> Generate ES5 JavaScript code
--es6 <js_path> Generate ES6 JavaScript code
--ts <ts_path> Generate TypeScript code
--decode <msg_type> Decode standard input to JSON
--encode <msg_type> Encode standard input to JSON
-h, --help output usage information
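This is done mainly through the --es5/--es6 options. protobufjs has similar options; for convenience, this section simply says pbjs.
Running js code to convert binary data to JSON is also a workable solution. Based on posts found online, I arrived at the following js code: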
let pbroot = require("protobufjs").Root;
let json = require("./query_md5.json");
// Build the reflection root from the JSON descriptor
let root = pbroot.fromJSON(json);
// console.log (root);
var fs = require('fs');
// Read the raw protobuf payload and decode it as a query_md5 message
fs.readFile('./tmp/resp.bin', function (err, data) {
    if (err) {
        console.log(err);
    } else {
        console.log(data);
        console.log(data.length + ' bytes');
        let Message = root.lookupType("query_md5");
        try {
            let message = Message.decode(data);
            console.log(message);
        } catch (e) {
            console.log(e);
        }
    }
});
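Note that query_md5.json on line 2 is the file generated by protobufjs in the previous section. A brief explanation of the code above:
- Load the proto type (query_md5) defined in query_md5.json
- Read the binary data (tmp/resp.bin) and decode it
- Print the decoded result
Running the js code gives the following output: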
> node index.js
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes>
282 bytes
query_md5 {
memf: [
<Buffer d1 5b f3 26 47 08 bf c7 01 e0 4b 3d c6 24 38 a3>,
<Buffer 31 95 44 f3 2f 32 1b 96 78 65 6b 82 fd b8 95 60>,
<Buffer 9a 75 17 35 fc ca e6 6f 74 86 e9 fa dc 6a 9f ab>,
<Buffer 28 4c eb bf 36 e0 1d 57 5c a6 93 de 39 1b 7a 7d>,
<Buffer 3e 0b 43 9c 62 a5 a4 01 c3 ff cf 00 32 99 bc 7e>,
<Buffer f6 b9 97 46 9c e6 95 55 52 d3 f5 0b 6c a3 8e b1>,
<Buffer 98 52 e7 f1 25 30 cb 6b 7a a0 55 69 fb cd 0a 5c>,
<Buffer d3 33 33 b1 d5 16 d8 68 39 38 f3 07 bf fe d4 c0>,
<Buffer a6 46 0c df 28 74 48 6a 0b c0 ed f1 6f 51 b5 9e>,
<Buffer 1e ee e6 79 5b f1 08 32 d5 a7 fc 4f 60 cf 48 ab>,
<Buffer c4 46 96 63 f6 a4 87 cd fc 3f d5 60 28 5c 0e a4>
],
mema: common {
mem1: 2,
mem2: 1048643,
mem3: <Buffer ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c>,
mem4: 11,
mem5: Long { low: 1695456564, high: 11, unsigned: true },
mem6: 0,
mem7: <Buffer 32 2e 32 2e 31 30 31 2e 32 37>,
mem8: 3,
mem9: Long { low: -1872613904, high: 0, unsigned: true }
},
memb: 0,
memc: <Buffer 67 c6 07 21 5e 47 ae 89 25 27 2d 6d a0 f6 02 2d>,
memd: 0,
meme: Long { low: 22177440, high: 0, unsigned: true }
}
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes>
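The binary data is decoded correctly. Modify the code slightly:
...
// Re-encode the decoded message and write the bytes to resp.bin
let buffer = Message.encode(Message.create(message)).finish();
console.log(buffer);
fs.writeFile('./resp.bin', buffer, function (err) {
    if (err) {
        console.log(err);
    } else {
        console.log('success');
    }
});
...
This encodes the decoded data (message) back to binary (buffer) and writes it to a file (resp.bin):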
...
<Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 52 more bytes>
success
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4 ...?.`(\..
Compared with the original data, it is identical! This approach works, though it is a bit cumbersome.
protoc
When it comes to encoding and decoding binary data from a proto file, shouldn't protobuf's own protoc tool be the best at it?
$ ./protoc --help
Usage: ./protoc [OPTION] PROTO_FILES
Parse PROTO_FILES and generate output based on the options given:
-IPATH, --proto_path=PATH Specify the directory in which to search for
imports. May be specified multiple times;
directories will be searched in order. If not
given, the current working directory is used.
--version Show version info and exit.
-h, --help Show this text and exit.
--encode=MESSAGE_TYPE Read a text-format message of the given type
from standard input and write it in binary
to standard output. The message type must
be defined in PROTO_FILES or their imports.
--decode=MESSAGE_TYPE Read a binary message of the given type from
standard input and write it in text format
to standard output. The message type must
be defined in PROTO_FILES or their imports.
--decode_raw Read an arbitrary protocol message from
standard input and write the raw tag/value
pairs in text format to standard output. No
PROTO_FILES should be given when using this
flag.
-oFILE, Writes a FileDescriptorSet (a protocol buffer,
--descriptor_set_out=FILE defined in descriptor.proto) containing all of
the input files to FILE.
--include_imports When using --descriptor_set_out, also include
all dependencies of the input files in the
set, so that the set is self-contained.
--include_source_info When using --descriptor_set_out, do not strip
SourceCodeInfo from the FileDescriptorProto.
This results in vastly larger descriptors that
include information about the original
location of each decl in the source file as
well as surrounding comments.
--dependency_out=FILE Write a dependency output file in the format
expected by make. This writes the transitive
set of input file paths to FILE
--error_format=FORMAT Set the format in which to print errors.
FORMAT may be 'gcc' (the default) or 'msvs'
(Microsoft Visual Studio format).
--print_free_field_numbers Print the free field numbers of the messages
defined in the given proto files. Groups share
the same field number space with the parent
message. Extension ranges are counted as
occupied fields numbers.
--plugin=EXECUTABLE Specifies a plugin executable to use.
Normally, protoc searches the PATH for
plugins, but you may specify additional
executables not in the path using this flag.
Additionally, EXECUTABLE may be of the form
NAME=PATH, in which case the given plugin name
is mapped to the given executable even if
the executable's own name differs.
--cpp_out=OUT_DIR Generate C++ header and source.
--csharp_out=OUT_DIR Generate C# source file.
--java_out=OUT_DIR Generate Java source file.
--javanano_out=OUT_DIR Generate Java Nano source file.
--js_out=OUT_DIR Generate JavaScript source.
--objc_out=OUT_DIR Generate Objective C header and source.
--php_out=OUT_DIR Generate PHP source file.
--python_out=OUT_DIR Generate Python source file.
--ruby_out=OUT_DIR Generate Ruby source file.
Let's try it:
> ./protoc --decode=query_md5 query_md5.proto < tmp/resp.bin > resp.pb
[libprotobuf WARNING ../../src/google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: query_md5.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
> cat resp.pb
mema {
mem1: 2
mem2: 1048643
mem3: "\2728\272\223\257z\332\350\031g+\211\335\322k\\"
mem4: 11
mem5: 48940096820
mem6: 0
mem7: "2.2.101.27"
mem8: 3
mem9: 2422353392
}
memb: 0
memc: "g\306\007!^G\256\211%\'-m\240\366\002-"
memd: 0
meme: 22177440
memf: "\321[\363&G\010\277\307\001\340K=\306$8\243"
memf: "1\225D\363/2\033\226xek\202\375\270\225`"
memf: "\232u\0275\374\312\346ot\206\351\372\334j\237\253"
memf: "(L\353\2776\340\035W\\\246\223\3369\033z}"
memf: ">\013C\234b\245\244\001\303\377\317\0002\231\274~"
memf: "\366\271\227F\234\346\225UR\323\365\013l\243\216\261"
memf: "\230R\347\361%0\313kz\240Ui\373\315\n\\"
memf: "\32333\261\325\026\330h98\363\007\277\376\324\300"
memf: "\246F\014\337(tHj\013\300\355\361oQ\265\236"
memf: "\036\356\346y[\361\0102\325\247\374O`\317H\253"
memf: "\304F\226c\366\244\207\315\374?\325`(\\\016\244"
The generated file is not JSON; it is the general text format defined by protobuf. Encode it back unchanged:
> ./protoc --encode=query_md5 query_md5.proto < resp.pb > resp.bin
[libprotobuf WARNING ../../src/google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: query_md5.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4 ...?.`(\..
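Compared with the original data, it matches as well! The drawback of this approach is that the pb text file cannot be handed to jq, which would add considerable work to later integration.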
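One detail worth noting when comparing the protoc text output with the pbjs JSON by hand: uint64 fields look different. protoc prints a single decimal number, while pbjs prints low and high 32-bit halves. The sketch below uses a hypothetical helper, longToDecimal, to confirm that the two representations of mem5 and mem9 agree. It assumes unsigned values only.

```javascript
// Hypothetical helper: combine the low/high 32-bit halves of an
// unsigned 64-bit value into one decimal string.
function longToDecimal({ low, high }) {
  const lo = BigInt(low >>> 0);  // reinterpret as unsigned 32-bit
  const hi = BigInt(high >>> 0);
  return ((hi << 32n) | lo).toString();
}

const mem5 = longToDecimal({ low: 1695456564, high: 11, unsigned: true });
const mem9 = longToDecimal({ low: -1872613904, high: 0, unsigned: true });

console.log(mem5); // 48940096820
console.log(mem9); // 2422353392
```

Both values match the mem5 and mem9 lines in resp.pb.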
Root cause
The standard pbjs command is actually a symbolic link:
> which pbjs
/usr/local/bin/pbjs
> ls -lh /usr/local/bin/pbjs
lrwxrwxrwx 1 root root 31 Sep 24 11:33 /usr/local/bin/pbjs -> ../lib/node_modules/pbjs/cli.js
> ls /usr/local/lib/node_modules/pbjs/
cli.js index.d.ts node_modules/ test.js test.proto.js
cli.ts index.js package.json test.proto test.proto.ts
generate.js index.ts protocol-buffers-schema.d.ts test.proto.es5.js test.ts
generate.ts LICENSE.md README.md test.proto.es6.js tsconfig.json
It points to cli.js. Out of curiosity, I looked at how it encodes the bytes type, which is mainly in generate.js:
function encodeValue(name, buffer, value, nested = 'nested') {
    let type;
    let write;
    switch (name) {
        case 'bool':
            type = TYPE_VAR_INT;
            write = [`writeByte(${buffer}, ${value} ? 1 : 0)`];
            break;
        case 'bytes':
            type = TYPE_SIZE_N;
            write = [`writeVarint32(${buffer}, ${value}.length), writeBytes(${buffer}, ${value})`];
            break;
        case 'int32':
            type = TYPE_VAR_INT;
            write = [`writeVarint64(${buffer}, intToLong(${value}))`];
            break;
        case 'int64':
            type = TYPE_VAR_INT;
            write = [`writeVarint64(${buffer}, ${value})`];
            break;
        case 'string':
            type = TYPE_SIZE_N;
            write = [`writeString(${buffer}, ${value})`];
            break;
        ....
    }
    return { type, write };
}
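The code is abridged to highlight the key parts. Compared with other types, bytes first encodes the length of an array and then the array contents.
The array contents are written by a routine called writeBytes: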
lines.push(`function writeBytes(bb${ts('ByteBuffer')}, buffer${ts('Uint8Array')})${ts('void')} {`);
lines.push(` ${varOrLet} offset = grow(bb, buffer.length);`);
lines.push(` bb.bytes.set(buffer, offset);`);
lines.push(`}`);
Its implementation first grows the underlying buffer so that the array fits, then copies the whole array in at once.
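The length-then-payload layout can be reproduced independently of pbjs. The following is a minimal sketch, not the pbjs implementation: varint and encodeBytesField are hypothetical helpers that serialize a single bytes field as tag, length and payload, using mem3 (field number 3) as input.

```javascript
// Encode a non-negative integer as a protobuf varint (7 bits per byte,
// high bit set on every byte except the last).
function varint(n) {
  const out = [];
  while (n > 0x7f) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}

// Hypothetical helper: serialize one bytes field as tag, length, payload.
function encodeBytesField(fieldNumber, payload) {
  const tag = fieldNumber * 8 + 2; // wire type 2 = length-delimited
  return Buffer.from([...varint(tag), ...varint(payload.length), ...payload]);
}

const mem3 = [186, 56, 186, 147, 175, 122, 218, 232, 25, 103, 43, 137, 221, 210, 107, 92];
const field = encodeBytesField(3, mem3);

console.log(field.toString('hex')); // 1a10ba38ba93af7adae819672b89ddd26b5c
```

The result matches the 1a10 ba38 ... 6b5c run in the original tmp/resp.bin dump. The key point is that the encoder depends on payload.length, so the value it receives must be array-like.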
Remember the form pbjs produced when decoding the binary data? As a reminder:
"mem3":{"type":"Buffer","data":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]},
The data is wrapped in an object, whereas the encoder expects a plain array. Could the mismatch arise at this step?
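The wrapper has exactly the shape that Node.js produces when a Buffer goes through JSON.stringify, so that is presumably how it ends up in the pbjs output (an assumption; the CLI source was not checked for this). A short sketch shows the round trip and why the parsed value no longer looks like an array:

```javascript
const bytes = Buffer.from([186, 56, 186, 147]);

// Buffer defines toJSON(), so JSON.stringify emits a wrapper object.
const text = JSON.stringify({ mem3: bytes });

// Parsing gives back a plain object, not a Buffer or an array.
const parsed = JSON.parse(text);

console.log(text); // {"mem3":{"type":"Buffer","data":[186,56,186,147]}}
console.log(Array.isArray(parsed.mem3), parsed.mem3.length); // false undefined
```

Since the parsed object has no length property, an encoder that writes value.length followed by the value itself has nothing usable to work with.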
Slightly modify the JSON that pbjs decoded from the binary data, removing the object wrapped around each bytes value:
> jq -c '.' resp.json
{"mema":{"mem1":2,"mem2":1048643,"mem3":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92],"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true},"mem6":0,"mem7":[50,46,50,46,49,48,49,46,50,55],"mem8":3,"mem9":{"low":-1872613904,"high":0,"unsigned":true}},"memb":0,"memc":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45],"memd":0,"meme":{"low":22177440,"high":0,"unsigned":true},"memf":[[209,91,243,38,71,8,191,199,1,224,75,61,198,36,56,163],[49,149,68,243,47,50,27,150,120,101,107,130,253,184,149,96],[154,117,23,53,252,202,230,111,116,134,233,250,220,106,159,171],[40,76,235,191,54,224,29,87,92,166,147,222,57,27,122,125],[62,11,67,156,98,165,164,1,195,255,207,0,50,153,188,126],[246,185,151,70,156,230,149,85,82,211,245,11,108,163,142,177],[152,82,231,241,37,48,203,107,122,160,85,105,251,205,10,92],[211,51,51,177,213,22,216,104,57,56,243,7,191,254,212,192],[166,70,12,223,40,116,72,106,11,192,237,241,111,81,181,158],[30,238,230,121,91,241,8,50,213,167,252,79,96,207,72,171],[196,70,150,99,246,164,135,205,252,63,213,96,40,92,14,164]]}
Then encode this JSON:
> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin
> xxd resp.bin
0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z
0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k\ .(...
0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2
0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g..
0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .(
0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G.....
0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2..
0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5..
0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L..
0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W\...9.z}2.>.
00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2.
00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l...
00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui..
00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .\2..33....h98..
00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj..
00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2
0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c..
0000110: 87cd fc3f d560 285c 0ea4 ...?.`(\..
Promising! Compared with the original data, it is identical!
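Removing the wrappers by hand does not scale, so here is a minimal Node.js sketch of automating that step. unwrapBuffers is a hypothetical helper, and it assumes every wrapper has exactly the {"type":"Buffer","data":[...]} shape shown above. A message that legitimately has fields named type and data with these values would be unwrapped by mistake.

```javascript
// Hypothetical helper: replace every {"type":"Buffer","data":[...]} wrapper
// with its plain data array, recursing through objects and arrays.
function unwrapBuffers(node) {
  if (Array.isArray(node)) return node.map(unwrapBuffers);
  if (node !== null && typeof node === 'object') {
    if (node.type === 'Buffer' && Array.isArray(node.data)) return node.data;
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[key] = unwrapBuffers(value);
    }
    return out;
  }
  return node; // numbers, strings, booleans, null
}

// Cut-down sample with the same structure as the decoded query_md5 JSON.
const sample = {
  mema: { mem1: 2, mem7: { type: 'Buffer', data: [50, 46, 50] } },
  memf: [{ type: 'Buffer', data: [209, 91] }, { type: 'Buffer', data: [49, 149] }],
};
const fixed = unwrapBuffers(sample);

console.log(JSON.stringify(fixed));
// {"mema":{"mem1":2,"mem7":[50,46,50]},"memf":[[209,91],[49,149]]}
```

Applied to the full resp.json, a helper like this should yield the same plain-array JSON that was fed to pbjs above.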
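Conclusion
This post described the problems that pbjs, a js tool for protobuf, has when encoding and decoding the bytes type. After several attempts, three solutions emerged:
- Use js code built with pbjs or protobufjs to encode JSON into binary data
- Use protoc to encode pb text into binary data
- Modify the decoded JSON to remove the object layer around each bytes array, then encode the modified JSON with pbjs
Solution I is somewhat more complex. The pb text of solution II is not portable and, in particular, cannot be passed to jq downstream for preprocessing. Solution III balances convenience and compatibility and is the best choice.
In particular, removing the object wrapper from the JSON is trivial for jq: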
local req=$(jq -c ".mema.mem3=${mem3}|.mema.mem4=${mem4}|.mema.mem5.low=${mem5_lo}|.mema.mem5.high=${mem5_hi}|.mema.mem7=${mem7}|.mema.mem8=${mem8}|.mema.mem9.low=${mem9_lo}|.mema.mem9.high=${mem9_hi}|.memc=${memc}" query_md5.json)
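jq first reads the original JSON (the template file given as the last argument), then assigns each field through a chain of pipes (the JSON is only a template and does not contain the data the request needs). During assignment, a bytes field is set directly to a value of the following form: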
[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]
which replaces the default object with a byte array. jq variables can also replace values, but changing the field type ran into trouble, as in the following:
local req=$(jq --arg mm3 "[${mem3}]" --arg mm4 "${mem4}" --arg mm5_hi "${mem5_hi}" --arg mm5_lo "${mem5_lo}" --arg mm7 "[${mem7}]" --arg mm8 "${mem8}" --arg mm9 "${mem9}" --arg mmc "[${memc}]" -c '{ mema: { mem1 : .mema.mem1, mem2: .mema.mem2, mem3: $mm3, mem4: $mm4, mem5: { low: $mm5_lo, high: $mm5_hi, unsigned: true }, mem6: .mema.mem6, mem7: $mm7, mem8: $mm8, mem9: { low: $mm9, high: 0, unsigned: true } }, memb: .memb, memc: $mmc, memd: .memd, meme: .meme, memf: .memf }' query_md5.json)
The updated JSON then looks like this:
req={"mema":{"mem1":2,"mem2":1048642,"mem3":"[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]","mem4":"11","mem5":{"low":"1695625406","high":"11","unsigned":true},"mem6":0,"mem7":"2.2.101.27","mem8":"3","mem9":{"low":"2422353392","high":0,"unsigned":true}},"memb":0,"memc":"[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]","memd":"0",...}
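Every byte array ends up wrapped in double quotes and becomes a string, because --arg always binds its value as a string (--argjson would parse the value as JSON instead). Since this approach is also more verbose, it is not recommended.
Afterword
The root-cause investigation was somewhat rough. I remember noticing a suspicious spot at the time, but when reviewing later I could not recall what triggered the suspicion, so please bear with it.
Looking back, this appears to be a bug in pbjs: when decoding, a Uint8Array is written out directly through a wrapper class, which adds an object layer, while encoding accepts only a plain byte array. The data therefore fails to match and is left out of the binary result.
Using only the js/ts code generated by pbjs should not be affected, and producing pb text files directly with protoc also works. The problem appears only when pbjs converts between binary data and JSON. Hopefully the pbjs author will fix this soon.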
This article is from cnblogs (博客園), author: goodcitizen. When reposting, please credit the original link: https://www.cnblogs.com/goodcitizen/p/solution_about_pbjs_encode_bytes_data_failed_problem.html