python3-cookbook Notes: Chapter 2 (Strings and Text)

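Each section of python3-cookbook is organized into three parts (Problem, Solution, and Discussion) and explores the best way to handle a particular kind of problem in Python 3. Put another way, it shows how to make better use of Python 3's own data structures, functions, classes, and other features for that kind of problem. The book is a great help for deepening your understanding of Python 3 and improving your Python programming skills, and it is particularly good on how to improve the performance of Python programs. If you have the time, I strongly recommend reading it.

These are my study notes. They cover only the parts of the book that are relevant to my own work and everyday use, and most of the example code is copied directly from the original. All of it has been verified on Python 3.6. If you are interested, read the full text.

python3-cookbook: https://python3-cookbook.readthedocs.io/zh_CN/latest/index.html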

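2.1 Splitting Strings on Any of Multiple Delimiters

For ordinary splitting, str.split is up to the job. When the delimiters in a piece of text are more complex (for example, several different delimiters with inconsistent whitespace around them), a regular expression is undoubtedly the tool of choice, and the re module has its own split function for this. Note that if the pattern contains a capture group (parentheses), the text matched by the group is also included in the result list.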

>>> import re
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>> fields = re.split(r'(;|,|\s)\s*', line)  # the captured delimiters also appear in the result
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
>>>
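
As the book's discussion notes, the captured delimiters can be useful: they let you rebuild the line afterwards, for example with normalized spacing. If you don't need them, a non-capturing group (?:...) groups the alternatives without adding them to the result. A small sketch reusing the same line:

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# With a capture group, the delimiters land at the odd indices of the result.
fields = re.split(r'(;|,|\s)\s*', line)
values = fields[::2]               # the actual fields
delimiters = fields[1::2] + ['']   # the delimiters, padded so zip() covers the last field

# Reassemble the line, dropping the extra whitespace after each delimiter.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt)  # asdf fjdk;afed,fjek,asdf,foo

# A non-capturing group (?:...) groups the alternatives but does not keep them.
plain = re.split(r'(?:,|;|\s)\s*', line)
print(plain)  # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```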

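2.3 Matching Strings Using Shell Wildcard Patterns

When ordinary string-matching methods aren't enough but a regular expression feels like overkill, consider fnmatch.fnmatch or fnmatch.fnmatchcase. Both match strings against the wildcard patterns commonly used in Unix shells. The difference is that the former follows the case-sensitivity rules of the operating system, while the latter matches case exactly as you write it.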

>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('foo.txt', '?oo.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
>>>
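
The difference between the two functions only shows up with letter case: fnmatchcase() is always case-sensitive, while fnmatch() first normalizes both arguments with os.path.normcase(), so it is case-insensitive on Windows but case-sensitive on Unix-like systems. Because fnmatchcase() behaves the same everywhere, it is also handy for filtering data that isn't file names. A short sketch; the address list is just sample data:

```python
from fnmatch import fnmatch, fnmatchcase

# fnmatchcase() never folds case, on any OS.
print(fnmatchcase('foo.txt', '*.TXT'))  # False

# fnmatch() applies os.path.normcase() first: True on Windows, False on Unix.
print(fnmatch('foo.txt', '*.TXT'))

# Sample data: street addresses rather than file names.
addresses = [
    '5412 N CLARK ST',
    '1060 W ADDISON ST',
    '1039 W GRANVILLE AVE',
    '2122 N CLARK ST',
    '4802 N BROADWAY',
]
streets = [a for a in addresses if fnmatchcase(a, '* ST')]
clark_54xx = [a for a in addresses if fnmatchcase(a, '54[0-9][0-9] *CLARK*')]
print(streets)     # ['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']
print(clark_54xx)  # ['5412 N CLARK ST']
```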

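2.13 Aligning Text Strings

String alignment is part of string formatting. For basic left, right, and center alignment you can use the string methods ljust(), rjust(), and center(), or the built-in format() function and the str.format() method. The book recommends format(), because its formatting features are much richer and more powerful.

String alignment doesn't seem to come up much in day-to-day work, but I did run into one use case: when binary numbers are represented as strings, the string sometimes has to be padded with 0s or 1s to 8 or 16 bits, and that is exactly where string alignment comes in handy.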

>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World         '
>>> text.rjust(20)
'         Hello World'
>>> text.center(20)
'    Hello World     '
>>> text.rjust(20, '=')
'=========Hello World'
>>> text.center(20, '*')
'****Hello World*****'
>>>

>>> # formatting strings
>>> format(text, '>20')
'         Hello World'
>>> format(text, '<20')
'Hello World         '
>>> format(text, '^20')
'    Hello World     '
>>> format(text, '=<20s')
'Hello World========='
>>> format(text, '*^20s')
'****Hello World*****'
>>> # formatting numbers
>>> x = 1.2345
>>> format(x, '^10.2f')
'   1.23   '
>>> # the str.format() method
>>> '{:>10s} {:>10s}'.format('Hello', 'World')
'     Hello      World'
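
For the binary-string case mentioned above, the same tools apply. A minimal sketch; sign_extend() is a hypothetical helper of my own, assuming the bit string is in two's complement, so that padding with its leading bit (0 or 1) preserves the value:

```python
value = 5
bits = bin(value)[2:]               # bin() returns '0b101'; strip the prefix

print(bits.rjust(8, '0'))           # 00000101
print(bits.zfill(16))               # 0000000000000101
print(format(value, '08b'))         # 00000101 (convert and pad in one step)
print('{:016b}'.format(value))      # 0000000000000101


def sign_extend(bits, width):
    """Hypothetical helper: pad a two's-complement bit string to `width`
    by repeating its sign bit (the leftmost bit)."""
    return bits.rjust(width, bits[0])


print(sign_extend('0101', 8))       # 00000101  (+5 stays +5)
print(sign_extend('1101', 8))       # 11111101  (-3 stays -3)
```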

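2.16 Reformatting Text to a Fixed Number of Columns

You may run into this when printing information or displaying it in a terminal. In that case, use the textwrap module to set the width of the output.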

>>> import textwrap
>>> s = "Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under."
>>> print(textwrap.fill(s, 70))
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
>>> print(textwrap.fill(s, 40))
Look into my eyes, look into my eyes,
the eyes, the eyes, the eyes, not around
the eyes, don't look around the eyes,
look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, initial_indent='    '))
    Look into my eyes, look into my
eyes, the eyes, the eyes, the eyes, not
around the eyes, don't look around the
eyes, look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, subsequent_indent='    '))
Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes, not
    around the eyes, don't look around
    the eyes, look into my eyes, you're
    under.
>>>
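
When the output goes to a terminal, the width can be taken from the terminal instead of being hard-coded. The book uses os.get_terminal_size(), which raises OSError when output is not attached to a terminal; the sketch below swaps in shutil.get_terminal_size(), which falls back to a default size instead. textwrap.wrap() returns the wrapped lines as a list if you want to post-process them:

```python
import shutil
import textwrap

s = ("Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, "
     "not around the eyes, don't look around the eyes, look into my eyes, "
     "you're under.")

# Falls back to 80 columns when the size can't be queried (pipes, IDEs, CI);
# the `or 80` guards against a reported width of 0.
width = shutil.get_terminal_size(fallback=(80, 24)).columns or 80
print(textwrap.fill(s, width))

# wrap() produces the same line breaks as fill(), but as a list of strings.
lines = textwrap.wrap(s, 40)
for line in lines:
    print(line)
```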

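2.17 Handling HTML and XML Entities in Text

When working with HTML or XML text, if you want to replace entities such as &entity; or &#code; with the characters they stand for, or go the other way, the utility functions that come with the corresponding parser are all you need. If you know the parser well, there may of course be better ways.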

>>> import html
>>> s = 'Elements are written as "<tag>text</tag>".'
>>> print(s)
Elements are written as "<tag>text</tag>".
>>> print(html.escape(s))
Elements are written as &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.
>>> print(html.escape(s, quote=False))
Elements are written as "&lt;tag&gt;text&lt;/tag&gt;".
>>> 
>>> # HTMLParser().unescape() was deprecated and then removed in Python 3.9; use html.unescape()
>>> s = 'Spicy &quot;Jalape&#241;o&quot.'
>>> html.unescape(s)
'Spicy "Jalapeño".'
>>> 
>>> from xml.sax.saxutils import unescape
>>> t = 'The prompt is &gt;&gt;&gt;'
>>> unescape(t)
'The prompt is >>>'
>>>
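
Going the other way, xml.sax.saxutils.escape() escapes &, < and >, and accepts an optional dict of extra replacements (unescape() takes the same kind of dict). If you need pure-ASCII output, the 'xmlcharrefreplace' error handler turns non-ASCII characters into numeric references such as &#241;. A short sketch:

```python
import html
from xml.sax.saxutils import escape, unescape

# escape() always handles &, <, >; other characters via the optional dict.
xml_text = escape('a < b & "c"', {'"': '&quot;'})
print(xml_text)  # a &lt; b &amp; &quot;c&quot;

# unescape() reverses it; extra entities again come from a dict.
print(unescape(xml_text, {'&quot;': '"'}))  # a < b & "c"

# Encode non-ASCII characters as numeric character references.
ascii_bytes = 'Spicy Jalapeño'.encode('ascii', errors='xmlcharrefreplace')
print(ascii_bytes)  # b'Spicy Jalape&#241;o'

# html.unescape() turns the reference back into the character.
print(html.unescape(ascii_bytes.decode('ascii')))  # Spicy Jalapeño
```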

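2.18 Tokenizing Text

A string can be tokenized using named capture groups in a regular expression, written as (?P<group_name>pattern). This is useful, for example, when parsing formula strings entered by users.

For the scanning itself, consider the pattern object's scanner() method (undocumented, but available in CPython's re module), wrapped in a generator.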

import re

NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'

master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))
scanner = master_pat.scanner('foo = 42')
for m in iter(scanner.match, None):
    print(m.lastgroup, m.group())

NAME foo
WS  
EQ =
WS  
NUM 42
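
The loop above can be packaged into the generator the text describes. In this sketch the Token namedtuple and the generate_tokens() name follow the book's example, and filtering out WS tokens is one simple way to drop whitespace. Keep in mind that scanner.match() returns None as soon as it reaches a character no pattern matches, so unrecognized input silently ends the token stream instead of raising an error:

```python
import re
from collections import namedtuple

Token = namedtuple('Token', ['type', 'value'])

NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'
master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))


def generate_tokens(pat, text):
    """Yield Token(type, value) pairs; stops at the first unmatched character."""
    # Pattern.scanner() is undocumented but available in CPython's re module.
    scanner = pat.scanner(text)
    for m in iter(scanner.match, None):
        yield Token(m.lastgroup, m.group())


# Drop whitespace tokens, keep everything else.
tokens = [tok for tok in generate_tokens(master_pat, 'foo = 42 + x * 2')
          if tok.type != 'WS']
for tok in tokens:
    print(tok)

# Scanning 'foo = 42 $ 1' stops at '$': only the tokens before it are produced.
partial = [tok.value for tok in generate_tokens(master_pat, 'foo = 42 $ 1')
           if tok.type != 'WS']
print(partial)  # ['foo', '=', '42']
```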
