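1. Problem / Requirement
Given an English paragraph or article that spans multiple lines, find every sentence that contains a given keyword.
For example, given the following string: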
text = '''Today I registered my personal blog in the cnblogs and wrote my first essay. The main problem of this essay is to use python regular expression matching to filter out sentences containing keywords in the paper. To solve this problem, I made many attempts and finally found a regular expression matching method that could meet the requirements through testing. So I've documented the problem and the solution in this blog post and shared it for reference to others who are having the same problem. At the same time, this text is also used to test the feasibility of this matching approach. Some additional related thoughts and ideas will be added to this blog later.'''
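the goal is to match the sentences that contain 'blog'.
2. Solution
Because every sentence containing the keyword has to be found, the findall() method of the re library is used. The string being searched contains newline characters '\n', so re.DOTALL is passed to re.compile() so that the '.' character matches every character, including '\n'. (Strictly speaking, the negated character class [^.] used here already matches newlines, and a '.' inside square brackets is a literal period, so the flag is a safeguard rather than a requirement for this pattern.) The Pattern object is created as follows: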
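import re

# An uppercase letter starts the sentence, [^.]* skips any non-period characters,
# 'blog' is the keyword, and [.] matches the closing period.
newre = re.compile(r'[A-Z][^.]*blog[^.]*[.]', re.DOTALL)
newre.findall(text) # run the match
# Result: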
['Today I registered my personal blog in the cnblogs and wrote my first essay.',
"So I've documented the problem and the solution in this blog post and \nshared it for reference to others who are having the same problem.",
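'Some additional \nrelated thoughts and ideas will be added to this blog later.'] # The '\n' here is the newline character. It is invisible when the string is printed, but the result list shows each string's repr, so it appears as an escape sequence.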
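The same idea can be wrapped in a small helper for arbitrary keywords. The following is only a sketch under a few assumptions: the function name find_sentences and the short sample string are made up for illustration and are not from the original post, sentences are assumed to start with an uppercase letter and end with a period, and \b word boundaries are added so that 'blog' no longer matches inside 'cnblogs'. Newlines inside each match are collapsed to single spaces so the results print cleanly.

```python
import re

def find_sentences(text, keyword):
    """Return the sentences in `text` that contain `keyword` as a whole word.

    Hypothetical helper: assumes a sentence starts with an uppercase
    letter and ends with a period.
    """
    # re.escape protects against regex metacharacters in the keyword;
    # \b keeps 'blog' from matching inside a longer word such as 'cnblogs'.
    pattern = re.compile(
        r'[A-Z][^.]*\b' + re.escape(keyword) + r'\b[^.]*[.]'
    )
    # [^.] already matches '\n', so no re.DOTALL is needed here.
    # Collapse newlines and repeated whitespace in each matched sentence.
    return [' '.join(s.split()) for s in pattern.findall(text)]

# Illustrative sample text containing line breaks (not the post's text).
sample = (
    "I keep a blog.\n"
    "The cnblogs site hosts it. Some notes go\n"
    "to this blog later."
)
sentences = find_sentences(sample, 'blog')
print(sentences)
```

Text with abbreviations such as "e.g." or sentences ending in '?' or '!' would need a more careful sentence splitter than this pattern provides.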