Python Web Scraping (Part 5)

 # Scraping Toutiao street-snap photos


    import json
    import os
    import re
    from urllib import request

    import requests

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
    }
    # AJAX endpoint behind the street-snap search page
    # (https://www.toutiao.com/search/?keyword=%E8%A1%97%E6%8B%8D);
    # each request returns up to 20 results starting at `offset`
    pag_all_url = 'https://www.toutiao.com/search_content/?offset={}&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=1&from=search_tab'
    # Article pages embed their gallery as: gallery: JSON.parse("..."),
    pattern = r'gallery: JSON\.parse\((.*)\),'

    os.makedirs('1', exist_ok=True)  # urlretrieve fails if the target folder does not exist
    offset = 0  # start at 0 so the first page of results is not skipped
    while True:
        full_pag_url = pag_all_url.format(offset)
        offset += 20
        pag_html = requests.get(full_pag_url, headers=headers, timeout=10).text
        # Turn the parsed JSON back into a string so it can be regex-matched
        pag_html_str = str(json.loads(pag_html))
        # Numeric id of each result; the article URL is https://www.toutiao.com/a<id>/,
        # e.g. https://www.toutiao.com/a6590127156037157379/
        img_pag_id = re.findall(r"'item_source_url': '/group/(\d+)/',", pag_html_str)
        if not img_pag_id:  # no more results: stop instead of looping forever
            break
        for group_id in img_pag_id:  # download every image in each gallery
            full_url = 'https://www.toutiao.com/a{}'.format(group_id)
            html = requests.get(full_url, headers=headers, timeout=10).text
            ans1 = re.search(pattern, html)
            if ans1 is None:  # not a gallery article
                continue
            try:
                # The capture is a JSON string literal, so decode twice:
                # once to get the inner string, then again to get the dict
                ans1_dic = json.loads(json.loads(ans1.group(1)))
                for q in ans1_dic['sub_images']:
                    img_url = q['url']
                    print(img_url)
                    filename = '1/' + img_url.split('/')[-1] + '.jpg'
                    request.urlretrieve(img_url, filename)
            except (ValueError, TypeError, KeyError, OSError):
                # Malformed gallery data or a failed download: skip this article
                continue
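The least obvious step in the script is the double `json.loads`. The sketch below isolates it in a small function so it can be tried without network access. It reuses the script's regex and assumes, as that regex does, that the article page embeds the gallery as `gallery: JSON.parse("...")`; the function name `extract_gallery_urls` and the sample HTML are hypothetical.

```python
import json
import re

# Same pattern as the script: captures the argument of JSON.parse(...)
GALLERY_RE = re.compile(r'gallery: JSON\.parse\((.*)\),')


def extract_gallery_urls(html):
    """Return the sub_images URLs embedded in an article page, or [] if absent."""
    match = GALLERY_RE.search(html)
    if match is None:
        return []
    # First decode: the captured text is a JSON *string literal* -> Python str
    inner = json.loads(match.group(1))
    # Second decode: that str is itself a JSON document -> dict
    gallery = json.loads(inner)
    return [img['url'] for img in gallery.get('sub_images', [])]


# Hypothetical page fragment, built the way the page is assumed to embed its data
payload = json.dumps({'sub_images': [{'url': 'http://example.com/origin/abc123'}]})
html = 'var BASE_DATA = { gallery: JSON.parse(' + json.dumps(payload) + '), siblingList: [] };'
print(extract_gallery_urls(html))  # ['http://example.com/origin/abc123']
```

Decoding once only yields a string, which is why indexing it with `['sub_images']` would fail; the second `json.loads` is what produces the dictionary.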

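Converting the whole response to a string and regex-matching it is fragile: the pattern depends on how Python's `str()` happens to render the dict (quote style, key order, the trailing comma). A sketch of an alternative, assuming only that each result carries an `item_source_url` of the form `/group/<id>/` somewhere in the JSON: walk the parsed structure directly and page through offsets until a page yields no ids. The helpers `find_values`, `group_ids`, `iter_pages` and the fake response data (including its `data` key) are hypothetical; the real response layout may differ.

```python
import re


def find_values(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


def group_ids(page):
    """Extract the numeric ids from item_source_url values like '/group/<id>/'."""
    ids = []
    for url in find_values(page, 'item_source_url'):
        m = re.fullmatch(r'/group/(\d+)/', url) if isinstance(url, str) else None
        if m:
            ids.append(m.group(1))
    return ids


def iter_pages(fetch, step=20):
    """Call fetch(offset) for offset = 0, step, 2*step, ... until a page has no ids."""
    offset = 0
    while True:
        ids = group_ids(fetch(offset))
        if not ids:
            return
        yield ids
        offset += step


# Offline demo with fake responses keyed by offset (hypothetical data)
fake = {
    0: {'data': [{'item_source_url': '/group/111/'}, {'item_source_url': '/group/222/'}]},
    20: {'data': [{'item_source_url': '/group/333/'}]},
}
pages = list(iter_pages(lambda off: fake.get(off, {'data': []})))
print(pages)  # [['111', '222'], ['333']]
```

In the real script, `fetch` could be `lambda off: requests.get(pag_all_url.format(off), headers=headers, timeout=10).json()`; passing it in as a parameter keeps the pagination logic testable offline.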