Python crawler - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte ...
The error is as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Inspecting the response headers of the fetched page shows:
Content-Encoding: gzip
The body is gzip-compressed, so it has to be decompressed before it is decoded as UTF-8:
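import gzip
import urllib.request
from io import BytesIO

# url, request_headers and timeout are defined elsewhere in the crawler
request = urllib.request.Request(url=url, headers=request_headers)
response = urllib.request.urlopen(request, timeout=timeout)
data = response.read()  # raw bytes, still gzip-compressed
buff = BytesIO(data)
f = gzip.GzipFile(fileobj=buff)  # file-like wrapper that decompresses on read
res = f.read().decode('utf-8')
print(res)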
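The byte 0x8b in the error message is the second byte of the gzip magic number (0x1f 0x8b), which is why the decode fails at position 1. As a minimal sketch, the check below looks for that signature and only decompresses when it is present. It needs no network access and reproduces the same UnicodeDecodeError on a locally compressed string. The helper name maybe_gunzip and the sample text are illustrative, not from the original post.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress data if it starts with the gzip magic number, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Offline demonstration with a locally compressed payload (hypothetical sample text)
compressed = gzip.compress("hello, gzip".encode("utf-8"))

try:
    compressed.decode("utf-8")  # decoding the compressed bytes directly fails
except UnicodeDecodeError as exc:
    print(exc)

print(maybe_gunzip(compressed).decode("utf-8"))
```

In the crawler, the bytes returned by read() could be passed through such a helper before calling decode('utf-8').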
Add the following to the request headers: "Accept-Encoding":"gzip",
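If the header is instead the one below, any given response may come back gzip-compressed or uncompressed, and some web applications simply output raw DEFLATE to accommodate IE:
Accept-Encoding: gzip, deflate
So add only this to the request headers:
"Accept-Encoding":"gzip",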