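The WSGI protocol
Preface
I had not planned to study WSGI this early, but you cannot learn how Python handles network requests without running into it, so I am covering it first. With a basic picture of WSGI in mind, Django's startup process and the way it handles request data will be much easier to follow later. This article draws on an external reference.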
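What is WSGI
WSGI stands for Web Server Gateway Interface. It is a specification that describes how a web server talks to a web application and how the web application handles requests. The details are in PEP 3333. WSGI has two sides, and both must be implemented: the web server and the web application. In Django, your project is the web application. The web server is started when you run python manage.py runserver on the command line, or when you launch the Django project from PyCharm, which passes runserver as an argument to manage.py. After some checks, manage.py calls execute_from_command_line(sys.argv), where sys.argv is the argument list that contains the runserver command. Stepping into that function shows that it calls utility.execute(). Here is the source of that method: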
def execute(self):
    """
    Given the command-line arguments, this figures out which subcommand is
    being run, creates a parser appropriate to that command, and runs it.
    """
    try:
        subcommand = self.argv[1]
    except IndexError:
        subcommand = 'help'  # Display help if no arguments were given.
    # Preprocess options to extract --settings and --pythonpath.
    # These options could affect the commands that are available, so they
    # must be processed early.
    parser = CommandParser(None, usage="%(prog)s subcommand [options] [args]", add_help=False)
    parser.add_argument('--settings')
    parser.add_argument('--pythonpath')
    parser.add_argument('args', nargs='*')  # catch-all
    try:
        options, args = parser.parse_known_args(self.argv[2:])
        handle_default_options(options)
    except CommandError:
        pass  # Ignore any option errors at this point.
    try:
        settings.INSTALLED_APPS
    except ImproperlyConfigured as exc:
        self.settings_exception = exc
    if settings.configured:
        # Start the auto-reloading dev server even if the code is broken.
        # The hardcoded condition is a code smell but we can't rely on a
        # flag on the command class because we haven't located it yet.
        if subcommand == 'runserver' and '--noreload' not in self.argv:
            try:
                autoreload.check_errors(django.setup)()
            except Exception:
                # The exception will be raised later in the child process
                # started by the autoreloader. Pretend it didn't happen by
                # loading an empty list of applications.
                apps.all_models = defaultdict(OrderedDict)
                apps.app_configs = OrderedDict()
                apps.apps_ready = apps.models_ready = apps.ready = True
                # Remove options not compatible with the built-in runserver
                # (e.g. options for the contrib.staticfiles' runserver).
                # Changes here require manually testing as described in
                # #27522.
                _parser = self.fetch_command('runserver').create_parser('django', 'runserver')
                _options, _args = _parser.parse_known_args(self.argv[2:])
                for _arg in _args:
                    self.argv.remove(_arg)
        # In all other cases, django.setup() is required to succeed.
        else:
            django.setup()
    self.autocomplete()
    if subcommand == 'help':
        if '--commands' in args:
            sys.stdout.write(self.main_help_text(commands_only=True) + '\n')
        elif len(options.args) < 1:
            sys.stdout.write(self.main_help_text() + '\n')
        else:
            self.fetch_command(options.args[0]).print_help(self.prog_name, options.args[0])
    # Special-cases: We want 'django-admin --version' and
    # 'django-admin --help' to work, for backwards compatibility.
    elif subcommand == 'version' or self.argv[1:] == ['--version']:
        sys.stdout.write(django.get_version() + '\n')
    elif self.argv[1:] in (['--help'], ['-h']):
        sys.stdout.write(self.main_help_text() + '\n')
    else:
        self.fetch_command(subcommand).run_from_argv(self.argv)
The source is long, so here is the key part pulled out:
if settings.configured:
    # Start the auto-reloading dev server even if the code is broken.
    # The hardcoded condition is a code smell but we can't rely on a
    # flag on the command class because we haven't located it yet.
    if subcommand == 'runserver' and '--noreload' not in self.argv:
        try:
            autoreload.check_errors(django.setup)()
        except Exception:
            # The exception will be raised later in the child process
            # started by the autoreloader. Pretend it didn't happen by
            # loading an empty list of applications.
            apps.all_models = defaultdict(OrderedDict)
            apps.app_configs = OrderedDict()
            apps.apps_ready = apps.models_ready = apps.ready = True
            # Remove options not compatible with the built-in runserver
            # (e.g. options for the contrib.staticfiles' runserver).
            # Changes here require manually testing as described in
            # #27522.
            _parser = self.fetch_command('runserver').create_parser('django', 'runserver')
            _options, _args = _parser.parse_known_args(self.argv[2:])
            for _arg in _args:
                self.argv.remove(_arg)
    # In all other cases, django.setup() is required to succeed.
    else:
        django.setup()
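This is also the most heavily commented part, and you can see the runserver command handled explicitly here. After this series of checks, execution reaches the last line of the function:
self.fetch_command(subcommand).run_from_argv(self.argv)
I will explain this line in detail when I cover Django's request-handling flow. For now it is enough to know that once this function has run, Django's web server is up and running.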
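To make the dispatch idea concrete, here is a toy sketch of the pattern execute() follows: take the subcommand from argv[1], fall back to help, and hand the remaining arguments to a handler. It is not Django's code. The COMMANDS table and the runserver and show_help functions are hypothetical stand-ins, whereas Django locates its command classes dynamically through fetch_command().

```python
# Hypothetical command handlers; Django finds real command classes dynamically.
def runserver(args):
    return "starting development server with args %r" % (args,)

def show_help(args):
    return "available subcommands: " + ", ".join(sorted(COMMANDS))

# Hypothetical lookup table standing in for fetch_command().
COMMANDS = {"runserver": runserver, "help": show_help}

def execute(argv):
    # argv[0] is the script name, argv[1] the subcommand (default: help).
    try:
        subcommand = argv[1]
    except IndexError:
        subcommand = "help"
    handler = COMMANDS.get(subcommand, show_help)
    return handler(argv[2:])

print(execute(["manage.py", "runserver", "8000"]))
print(execute(["manage.py"]))
```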
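Modules and libraries that implement WSGI include wsgiref (part of the Python standard library, and the one used in the examples below), werkzeug.serving, and twisted.web.
Web frameworks that run on top of WSGI include Bottle, Flask, and Django. All a WSGI server does is pass each request it receives from the client to the WSGI application, then send the application's return value back to the client as the response. WSGI applications can be stacked. The layers in the middle of the stack are called middleware, and the two ends, which must always be implemented, are the application and the server. Middleware therefore plays both roles: to the server it looks like an application, and to the application it looks like a server. In Django, the data received through WSGI is represented by a request object (HttpRequest), and the data sent back to the client is represented by an HttpResponse object.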
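The server's side of the contract can be tried without opening a socket. The sketch below does by hand what a WSGI server does for each request: build an environ dict, supply a start_response callback, call the application, and join the returned chunks. The names hello_app and call_app are made up for illustration. setup_testing_defaults comes from wsgiref.util and fills in the environ keys that the sketch does not set itself.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    body = b"hello from " + environ["PATH_INFO"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Hypothetical helper: drive one request the way a WSGI server would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # A real server would write these to the socket; here we record them.
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # PEP 3333 asks servers to call close() on the result if it exists.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body

status, headers, body = call_app(hello_app, "/index")
print(status, body)
```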
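Building a WSGI service
As mentioned in the previous section, Python ships with a built-in WSGI library called wsgiref.
First, the project layout:
# templates is the folder for template (HTML) files
# start.py is the project entry point
# urls.py holds the routing configuration
# views.py holds the code that handles each route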
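The start file
# start.py
from wsgiref.simple_server import make_server
from urls import urls
def app(env, response):
    # env is the request environment dict; response is the callback
    # used to send the status line and the response headers
    print(env)
    route = env['PATH_INFO']
    print(response)
    # Set the status code and response headers
    response('200 OK', [('Content-type', 'text/html')])
    # Default to the error handler
    data = urls['/error']()
    # Dispatch to the matching route handler, if there is one
    if route in urls:
        data = urls[route]()
    # Return the response body as bytes
    return [data]
if __name__ == '__main__':
    # Create the server object
    server = make_server('', 8808, app)
    print('Serving at http://localhost:8808')
    # Keep the server running
    server.serve_forever()
# A WSGI server is a web server. Its logic for handling one HTTP request is:
# iterable = app(env, response)
# for data in iterable:
#     send data to client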
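Under the hood this module is built on the socketserver module, which I covered in an earlier post. Calling make_server creates the WSGI server, and serve_forever() then keeps it accepting client requests. It works as a loop: the method takes a parameter poll_interval=0.5, and on each pass it waits up to 0.5 seconds for an incoming connection before checking whether a shutdown has been requested. The waiting is done with the selectors module, a technique known as I/O multiplexing.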
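The waiting step can be illustrated in isolation. This is a minimal sketch of the idea, not the code inside socketserver: a listening socket is registered with a selector, and select() is called with a timeout. Port 0 (let the operating system pick a free port) and the short timeouts are choices made only for this demonstration.

```python
import selectors
import socket

# Listening socket on an OS-chosen free port (port 0), for illustration only.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)

selector = selectors.DefaultSelector()
selector.register(listener, selectors.EVENT_READ)

# No client yet: select() waits up to the timeout and returns an empty list.
idle = selector.select(timeout=0.05)

# Once a client connects, the listener is reported as readable.
client = socket.create_connection(listener.getsockname())
ready = selector.select(timeout=1.0)
conn, addr = listener.accept()

print(len(idle), len(ready))

# Clean up the sockets and the selector.
for s in (conn, client):
    s.close()
selector.unregister(listener)
selector.close()
listener.close()
```

In serve_forever() the timeout passed to the selector is poll_interval, which is why a shutdown request is noticed within about half a second.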
The urls file
# urls.py
from views import index, error
urls = {
    '/index': index,  # reference to the view function
    '/error': error
}
This file handles routing: it maps each route to the function that implements its logic.
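One thing the dictionary lookup makes easy is choosing the status line together with the view. The sketch below is a possible variation, not part of the project above: resolve is a hypothetical helper, and the two view functions are simplified stand-ins so the block runs on its own. It returns 404 Not Found for unknown routes instead of always answering 200 OK.

```python
def index():
    return b"index page"

def error():
    return b"404"

# Same shape as the urls dict in urls.py, with stand-in view functions.
urls = {"/index": index, "/error": error}

def resolve(route):
    """Hypothetical helper: return (status, body) for a request path."""
    view = urls.get(route)
    if view is None:
        # Unknown route: fall back to the error view with a 404 status.
        return "404 Not Found", urls["/error"]()
    return "200 OK", view()

print(resolve("/index"))
print(resolve("/missing"))
```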
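The views file
# View functions that handle requests (each one returns a page)
# jinja2 renders the templates and passes backend data to the front end
from jinja2 import Template
# Handle requests for the home page
def index():
    with open('templates/index.html', 'r', encoding='utf-8') as f:
        dt = f.read()
    tem = Template(dt)
    # Render the template with backend data for the front-end page
    resp = tem.render(name='Home')
    return resp.encode('utf-8')
# Handle requests for the icon
def ico():
    with open('favicon.ico', 'rb') as f:
        dt = f.read()
    return dt
# Handle error requests
def error():
    return b'404'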
templates
This folder holds the resources to be returned to the front end, such as index.html.
Testing
- index test
- error test
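The WSGI application interface
The app in the WSGI service above is the application in WSGI terms. The interface must be implemented as a callable object: a function, a method, a class, or an instance with a __call__ method. The callable receives two arguments:
- a dictionary containing the client's request information along with other data. It can be thought of as the request context and is usually called environ (named env here);
- a callback function for sending the HTTP status code and response headers. (I am still not clear on exactly how the callback is invoked.)
The callable's return value is the response body, which must be an iterable that yields byte strings. (Wrapping the body in a list reduces the number of iterations and is more efficient.)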
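The different callable forms, and the way the callback is used, can be seen in a small sketch. HelloApp, generator_app, and fake_start_response are names invented for this example. The server hands the application a function, the application calls it once with the status and headers, and here a stand-in callback simply records those calls.

```python
class HelloApp:
    """An application implemented as an instance with __call__."""
    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, environ, start_response):
        body = self.greeting.encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]

def generator_app(environ, start_response):
    """An application written as a generator; the body is yielded in chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"part one, "
    yield b"part two"

calls = []

def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server's callback: record what the app sent.
    calls.append((status, headers))

print(b"".join(HelloApp("hi")({}, fake_start_response)))
print(b"".join(generator_app({}, fake_start_response)))
print([status for status, _ in calls])
```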
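Here is the app code from above again:
def app(env, response):
    # env is the request environment dict; response is the callback
    # used to send the status line and the response headers
    print(env)
    route = env['PATH_INFO']
    print(response)
    # Set the status code and response headers
    response('200 OK', [('Content-type', 'text/html')])
    # Default to the error handler
    data = urls['/error']()
    # Dispatch to the matching route handler, if there is one
    if route in urls:
        data = urls[route]()
    # Return the response body as bytes
    return [data]
When I send a request to the server, it prints env, as follows: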
{'PATH': '/Users/jingxing/Virtualenv/py3-env1/bin:/Users/jingxing/.nvm/versions/node/v4.9.1/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/python_study/mongodb/bin://Volumes/python_study/mongodb/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VMware Fusion.app/Contents/Public:/python_study/Applications/mongodb-osx-x86_64-3.6.3/bin::/usr/local/mysql/bin', 'PS1': '(py3-env1) ', 'VERSIONER_PYTHON_VERSION': '2.7', 'LS_OPTIONS': '--color=auto', 'LOGNAME': 'jingxing', 'XPC_SERVICE_NAME': 'com.jetbrains.pycharm.23248', 'PWD': '/Users/jingxing/django_project/day01', 'PYCHARM_HOSTED': '1', 'NODE_PATH': '/Users/jingxing/.nvm/versions/node/v4.9.1/lib/node_modules', 'PYCHARM_MATPLOTLIB_PORT': '62845', 'PYTHONPATH': '/Users/jingxing/django_project/day01:/Users/jingxing/django_project/day04:/Users/jingxing/django_project/day02:/Users/jingxing/PycharmProjects/youku/youkusecond:/Users/jingxing/django_project/day03:/Applications/PyCharm.app/Contents/helpers/pycharm_matplotlib_backend', 'NVM_CD_FLAGS': '', 'NVM_DIR': '/Users/jingxing/.nvm', 'SHELL': '/bin/bash', 'LSCOLORS': 'CxfxcxdxbxegedabagGxGx', 'PYTHONIOENCODING': 'UTF-8', 'VERSIONER_PYTHON_PREFER_32_BIT': 'no', 'USER': 'jingxing', 'CLICOLOR': 'Yes', 'TMPDIR': '/var/folders/yl/3drd7wf93f90sfkgpc2zg9cr0000gn/T/', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.ujA3r16JUC/Listeners', 'VIRTUAL_ENV': '/Users/jingxing/Virtualenv/py3-env1', 'XPC_FLAGS': '0x0', 'PYTHONUNBUFFERED': '1', '__CF_USER_TEXT_ENCODING': '0x1F5:0x0:0x0', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.gOrXw3Il2u/Render', 'LC_CTYPE': 'en_US.UTF-8', 'NVM_BIN': '/Users/jingxing/.nvm/versions/node/v4.9.1/bin', 'HOME': '/Users/jingxing', 'SERVER_NAME': 'jingxingdeMacBook-Pro.local', 'GATEWAY_INTERFACE': 'CGI/1.1', 'SERVER_PORT': '8808', 'REMOTE_HOST': '', 'CONTENT_LENGTH': '', 'SCRIPT_NAME': '', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SOFTWARE': 'WSGIServer/0.2', 
'REQUEST_METHOD': 'GET', 'PATH_INFO': '/', 'QUERY_STRING': '', 'REMOTE_ADDR': '127.0.0.1', 'CONTENT_TYPE': 'text/plain', 'HTTP_HOST': '127.0.0.1:8808', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_UPGRADE_INSECURE_REQUESTS': '1', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36', 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate, br', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7', 'HTTP_COOKIE': 'csrftoken=YjPgsyb6TW4fen2fxjy6DHzZYFlBU4SsAuE9AVqWRjLIhymeAlukqjVBpL7KTPPH', 'wsgi.input': <_io.BufferedReader name=7>, 'wsgi.errors': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, 'wsgi.version': (1, 0), 'wsgi.run_once': False, 'wsgi.url_scheme': 'http', 'wsgi.multithread': True, 'wsgi.multiprocess': False, 'wsgi.file_wrapper': <class 'wsgiref.util.FileWrapper'>}
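The values worth noting are:
- PATH_INFO: the request path, used for routing;
- SERVER_PORT: the port the server is listening on;
- HTTP_HOST: the Host header sent by the client (host and port);
- SERVER_PROTOCOL: the protocol and version used for the request, such as HTTP/1.1.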
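As a quick way to experiment with these keys, the sketch below builds a small environ by hand, with values modeled on the printed example and a sample query string added, then reads the four keys listed above. The summary dict is just an illustration. request_uri and setup_testing_defaults are helpers from wsgiref.util.

```python
from wsgiref.util import request_uri, setup_testing_defaults

# Simulated environ with values modeled on the printed example.
environ = {
    "PATH_INFO": "/",
    "QUERY_STRING": "id=1",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "8808",
    "HTTP_HOST": "127.0.0.1:8808",
    "SERVER_PROTOCOL": "HTTP/1.1",
    "REQUEST_METHOD": "GET",
}
setup_testing_defaults(environ)  # adds the wsgi.* keys and other defaults

summary = {
    "path": environ["PATH_INFO"],
    "port": int(environ["SERVER_PORT"]),  # environ values are strings
    "host_header": environ.get("HTTP_HOST", ""),
    "protocol": environ["SERVER_PROTOCOL"],
}
print(summary)
# Rebuild the full URL of the request from the environ.
print(request_uri(environ))
```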
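Iterable responses
When app returns data to the client, it is written as return [data]. Changing this to return data causes problems, because a string is itself iterable and the server would iterate over it piece by piece. In Python 2 each iteration yields a single character, so only 1 byte is sent to the client at a time until the whole body has been sent, which makes the WSGI program respond slowly. In Python 3, iterating over a bytes object yields integers rather than bytes, and wsgiref rejects them with an error. So return [data] is the recommended form. I do not yet understand how the data is actually sent back, so that remains an open question.
If the iterable response contains several strings, Content-Length should be the sum of their lengths in bytes.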
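Both points can be checked with a few lines. The chunks below are made-up sample data, and one of them contains a non-ASCII character to show that the length is counted in bytes rather than characters.

```python
chunks = ["<h1>Café</h1>".encode("utf-8"), b"<p>hello</p>"]

# Content-Length counts bytes, not characters, summed over every chunk.
content_length = sum(len(chunk) for chunk in chunks)
headers = [("Content-Type", "text/html; charset=utf-8"),
           ("Content-Length", str(content_length))]

# Iterating a list yields whole chunks; iterating a bytes object in
# Python 3 yields one integer per byte, which is not a valid body chunk.
pieces_from_list = [type(piece) for piece in chunks]
pieces_from_bytes = [type(piece) for piece in b"ab"]

print(content_length, pieces_from_list, pieces_from_bytes)
```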
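Parsing a GET request
Run start.py and visit http://localhost:8808/?id=1&name=musibii in a browser. The printed env now contains:
'QUERY_STRING': 'id=1&name=musibii'
'REQUEST_METHOD': 'GET'
urllib.parse.parse_qs() makes it easy to process QUERY_STRING, and html.escape() should be used to escape special characters and prevent script injection. (Older code used cgi.parse_qs() and cgi.escape(), which were removed in Python 3.8.) For example:
from html import escape
from urllib.parse import parse_qs
QUERY_STRING = 'id=1&name=musibii'
d = parse_qs(QUERY_STRING)
print(d.get('id', [''])[0])  # [''] is the default, returned when the key is not in QUERY_STRING
print(d.get('name', []))
print(escape('<script>alert(123);</script>'))
Output:
1
['musibii']
&lt;script&gt;alert(123);&lt;/script&gt;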
A dynamic page that handles GET requests
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server
# The form's method defaults to get, and an empty action submits to the current page
html = '''
<html>
<body>
    <form method="get" action="">
        <p>
            Age: <input type="text" name="age" value="%(age)s">
        </p>
        <p>
            Hobbies:
            <input
                name="hobbies" type="checkbox" value="software"
                %(checked-software)s
            > Software
            <input
                name="hobbies" type="checkbox" value="tunning"
                %(checked-tunning)s
            > Auto Tunning
        </p>
        <p>
            <input type="submit" value="Submit">
        </p>
    </form>
    <p>
        Age: %(age)s<br>
        Hobbies: %(hobbies)s
    </p>
</body>
</html>
'''
def app(env, start_response):
    # Parse QUERY_STRING
    d = parse_qs(env['QUERY_STRING'])
    age = d.get('age', [''])[0]  # the value for age
    hobbies = d.get('hobbies', [])  # all hobbies, as a list
    # Escape to prevent script injection
    age = escape(age)
    hobbies = [escape(hobby) for hobby in hobbies]
    response_body = html % {
        'checked-software': ('', 'checked')['software' in hobbies],
        'checked-tunning': ('', 'checked')['tunning' in hobbies],
        'age': age or 'Empty',
        'hobbies': ','.join(hobbies or ['No Hobbies?'])
    }
    # The body must be bytes, and Content-Length counts bytes
    response_body = response_body.encode('utf-8')
    status = '200 OK'
    response_headers = [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(response_body)))
    ]
    start_response(status, response_headers)
    return [response_body]
httpd = make_server('', 8088, app)
httpd.serve_forever()
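A dynamic page that handles POST requests
For a POST request, the form data is sent in the HTTP request body rather than appearing in the URL as a query string. The request body is available in the env dictionary under the key wsgi.input, whose value is a file-like object:
'wsgi.input': <_io.BufferedReader name=7>
I got lost in the source code trying to find out what this name means. After some searching, my guess is that it is an identifier; it appears to be the file descriptor number of the underlying socket.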
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server
# The form's method is post
html = """
<html>
<body>
    <form method="post" action="">
        <p>
            Age: <input type="text" name="age" value="%(age)s">
        </p>
        <p>
            Hobbies:
            <input
                name="hobbies" type="checkbox" value="software"
                %(checked-software)s
            > Software
            <input
                name="hobbies" type="checkbox" value="tunning"
                %(checked-tunning)s
            > Auto Tunning
        </p>
        <p>
            <input type="submit" value="Submit">
        </p>
    </form>
    <p>
        Age: %(age)s<br>
        Hobbies: %(hobbies)s
    </p>
</body>
</html>
"""
def application(environ, start_response):
    # CONTENT_LENGTH may be empty or missing
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except ValueError:
        request_body_size = 0
    # wsgi.input yields bytes, so decode before parsing
    request_body = environ['wsgi.input'].read(request_body_size).decode('utf-8')
    d = parse_qs(request_body)
    # Extract the form data
    age = d.get('age', [''])[0]
    hobbies = d.get('hobbies', [])
    # Escape to prevent script injection
    age = escape(age)
    hobbies = [escape(hobby) for hobby in hobbies]
    response_body = html % {
        'checked-software': ('', 'checked')['software' in hobbies],
        'checked-tunning': ('', 'checked')['tunning' in hobbies],
        'age': age or 'Empty',
        'hobbies': ', '.join(hobbies or ['No Hobbies?'])
    }
    # The body must be bytes, and Content-Length counts bytes
    response_body = response_body.encode('utf-8')
    status = '200 OK'
    response_headers = [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(response_body)))
    ]
    start_response(status, response_headers)
    return [response_body]
httpd = make_server('localhost', 8051, application)
httpd.serve_forever()
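The body-reading step can be tried without a browser by simulating wsgi.input with an in-memory stream. This is only a sketch: read_form is a hypothetical helper, the form data is made up, and it assumes a urlencoded body in UTF-8. The key detail is that wsgi.input yields bytes, so the body is decoded before it is passed to parse_qs.

```python
import io
from urllib.parse import parse_qs

def read_form(environ):
    """Hypothetical helper: parse a urlencoded POST body from wsgi.input."""
    try:
        # CONTENT_LENGTH may be missing or an empty string.
        size = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        size = 0
    raw = environ["wsgi.input"].read(size)  # bytes
    return parse_qs(raw.decode("utf-8"))

body = b"age=30&hobbies=software&hobbies=tunning"
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(body)),
    "wsgi.input": io.BytesIO(body),  # stands in for the socket stream
}
form = read_form(environ)
print(form)
```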
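Middleware
Middleware sits between the WSGI server and the WSGI application. To the server it looks like an application, and to the application it looks like a server. In Django, the data received through WSGI is represented by a request object (HttpRequest), and the data sent back to the client is represented by an HttpResponse object.
Example: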
from wsgiref.simple_server import make_server
def application(environ, start_response):
    # The response body must be bytes
    response_body = b'hello world!'
    status = '200 OK'
    response_headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(response_body)))
    ]
    start_response(status, response_headers)
    return [response_body]
# Middleware: wraps an application and upper-cases every chunk of its body
class Upperware:
    def __init__(self, app):
        self.wrapped_app = app
    def __call__(self, environ, start_response):
        for data in self.wrapped_app(environ, start_response):
            yield data.upper()
wrapped_app = Upperware(application)
httpd = make_server('localhost', 8051, wrapped_app)
httpd.serve_forever()
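Upperware changes the response body. Middleware can also change the status or headers by wrapping the start_response callback before passing it on. The sketch below shows that variation under stated assumptions: AddHeader, the X-Powered-By value, and recording_start_response are all invented for this example, and the application is called directly instead of through a server.

```python
def application(environ, start_response):
    body = b"hello world!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

class AddHeader:
    """Hypothetical middleware that appends one response header."""
    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Forward to the real callback with the extra header appended.
            return start_response(status, list(headers) + [self.header], exc_info)
        return self.app(environ, patched_start_response)

seen = {}

def recording_start_response(status, headers, exc_info=None):
    # Stand-in for the server's callback: remember what was sent.
    seen["status"], seen["headers"] = status, headers

wrapped = AddHeader(application, "X-Powered-By", "wsgi-sketch")
body = b"".join(wrapped({}, recording_start_response))
print(seen["status"], seen["headers"][-1], body)
```

Because AddHeader is itself a WSGI callable, it can be stacked with Upperware or passed straight to make_server in place of the application.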