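1. The main thing to learn from this program is how it is structured:
a. Open and read the web page
b. Find the relevant page (the current page number)
c. Find the elements that hold the image links
d. Save the images to a folder
.....
Each step is broken out and implemented as its own function, which keeps the code easy to read.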
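As a minimal sketch of that decomposition, steps b and c can be written as pure functions that take an HTML string instead of a URL, so each step can be checked offline against a hand-written snippet. The function names and the sample markup are hypothetical; the markup only mimics the `current-comment-page` and `img src=` patterns that the scraper searches for. Unlike the scraper, `parse_jpg_links` reads up to the closing quote and then checks for `.jpg`, instead of searching for `.jpg` within a fixed window.

```python
# Hypothetical pure-function versions of steps b and c: they parse an HTML
# string rather than fetching a URL, so they can be tested without a network.


def parse_page_number(html):
    """Step b: read N from markup assumed to look like
    <span class="current-comment-page">[N]</span>."""
    marker = 'current-comment-page">['
    start = html.find(marker)
    if start == -1:
        raise ValueError('page-number marker not found')
    start += len(marker)
    end = html.find(']', start)
    return int(html[start:end])


def parse_jpg_links(html):
    """Step c: collect the src of every <img src="..."> ending in .jpg, in order."""
    prefix = 'img src="'
    links = []
    pos = html.find(prefix)
    while pos != -1:
        start = pos + len(prefix)
        end = html.find('"', start)  # closing quote of the src attribute
        if end == -1:
            break
        link = html[start:end]
        if link.endswith('.jpg'):
            links.append(link)
        pos = html.find(prefix, end)
    return links


def page_urls(base, newest, pages):
    """Build the URLs of the newest `pages` pages, counting down one page at a time."""
    return [f'{base}page-{newest - i}#comments' for i in range(pages)]


sample = ('<span class="current-comment-page">[42]</span>'
          '<img src="http://example.com/a.jpg"><img src="http://example.com/b.gif">')
print(parse_page_number(sample))  # 42
print(parse_jpg_links(sample))    # ['http://example.com/a.jpg']
print(page_urls('http://jandan.net/ooxx/', 42, 3))
```

Because each function only transforms a string, a change in the site's markup shows up as a failing check on one small function rather than as an error somewhere inside the download loop.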
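## Note: the original code raised errors when run; the version below fixes them. The page markup it expects (see the comments) is an assumption and may no longer match the live site.
import urllib.request
import os


def url_open(url):
    # Open and read a URL, returning the raw response body as bytes
    req = urllib.request.Request(url)
    # Send a browser-like User-Agent; some sites reject urllib's default one
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36')
    with urllib.request.urlopen(req) as response:
        return response.read()


def get_page(url):
    # Find the current page number
    # Assumes markup like <span class="current-comment-page">[123]</span>
    html = url_open(url).decode('utf-8')  # str.find needs text, not bytes
    marker = 'current-comment-page">['
    a = html.find(marker) + len(marker)
    b = html.find(']', a)
    return html[a:b]


def find_imgs(url):
    # Collect the addresses of .jpg images referenced by img src= attributes
    html = url_open(url).decode('utf-8')
    img_addrs = []
    a = html.find('img src=')
    while a != -1:
        # Look for '.jpg' within 255 characters of the attribute
        b = html.find('.jpg', a, a + 255)
        if b != -1:
            addr = html[a + 9:b + 4]  # skip 'img src="' (9 chars), keep '.jpg'
            if addr.startswith('//'):  # protocol-relative link (assumption)
                addr = 'http:' + addr
            img_addrs.append(addr)
        else:
            b = a + 9
        a = html.find('img src=', b)
    return img_addrs


def save_imgs(folder, img_addrs):
    # Save each image into the folder, named after the last segment of its URL
    for each in img_addrs:
        filename = os.path.join(folder, each.split('/')[-1])
        with open(filename, 'wb') as f:
            f.write(url_open(each))


def download_mm(folder='OOXX', pages=10):
    os.makedirs(folder, exist_ok=True)  # do not fail if the folder already exists
    url = 'http://jandan.net/ooxx/'
    page_num = int(get_page(url))
    for i in range(pages):
        # Walk back one page at a time (the original "page_num -= i" skipped pages)
        page_url = url + 'page-' + str(page_num - i) + '#comments'
        img_addrs = find_imgs(page_url)
        save_imgs(folder, img_addrs)


if __name__ == '__main__':
    download_mm()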
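The string searches in `find_imgs` break easily if attribute order, quoting, or letter case changes. One alternative, swapped in here and not part of the original program, is the standard library's `html.parser.HTMLParser`, which reports each start tag with its attributes already split into `(name, value)` pairs. A minimal sketch; the class name `JpgCollector`, the helper `find_jpgs`, and the handling of `//` links are assumptions for illustration:

```python
from html.parser import HTMLParser


class JpgCollector(HTMLParser):
    """Collect the src of every <img> tag whose address ends in .jpg."""

    def __init__(self):
        super().__init__()
        self.img_addrs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; a value may be None
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        if src and src.lower().endswith('.jpg'):
            if src.startswith('//'):  # protocol-relative link (assumption)
                src = 'http:' + src
            self.img_addrs.append(src)


def find_jpgs(html):
    """Return the .jpg addresses found in an HTML string, in document order."""
    parser = JpgCollector()
    parser.feed(html)
    parser.close()
    return parser.img_addrs


html = ('<p><img alt="x" src="//example.com/a.jpg">'
        '<img src="http://example.com/b.png"><IMG SRC="http://example.com/c.JPG"></p>')
print(find_jpgs(html))
# ['http://example.com/a.jpg', 'http://example.com/c.JPG']
```

With this helper, the body of `find_imgs` could shrink to `return find_jpgs(url_open(url).decode('utf-8'))`, and the fixed 255-character window is no longer needed.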