
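1. Background

Hi everyone, this is 馬哥python說.

On August 25, 2022, the actress Zhang Tianai (張天愛) released an audio clip online with the message "A repeat offender. I hope all girls keep their eyes open."

It has been viewed 250 million times so far and shot straight onto the trending-search list.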

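2. 微熱點 (Weiredian) Analysis

Data source for this section: 微熱點 (Weiredian)

According to the popularity index trend on this public-opinion analysis site, the popularity of "張天愛" peaked at 92.56 at 22:00 on August 25. (Figure: 張天愛, popularity index trend)

Overall popularity of "張天愛" across the web. (Figure: 張天愛, popularity analysis)

How online media evaluated "張天愛". (Figure: 張天愛, media analysis)

Keyword analysis for "張天愛". (Figure: 張天愛, keyword analysis)

Regional analysis for "張天愛". (Figure: 張天愛, regional analysis)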

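2. Custom Python Public-Opinion Analysis

2.1 Python crawler

First, find the id in the URL of the post: the value of the id parameter in the target link is the id the crawler needs.

Plug that id into my Python crawler code. Part of the crawler code is shown below.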

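The key logic is how max_id is handled:

For the first page, do not pass the max_id parameter.

For any later page, pass the max_id parameter. Its value comes from the previous page's response: r.json()['data']['max_id']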
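
The paging rule above can be sketched as a small loop. This is only a minimal sketch under assumptions: `fetch_page` is a hypothetical callable standing in for the `requests.get(...).json()` call, `build_params` and `crawl_pages` are illustrative names, and stopping when `max_id` is 0 or missing is an assumed end-of-data convention that the original code does not confirm.

```python
def build_params(weibo_id, max_id=None):
    """Build the query parameters for one page of comments."""
    params = {'id': weibo_id}
    if max_id is not None:  # first page: no max_id at all
        params['max_id'] = max_id
    return params


def crawl_pages(weibo_id, fetch_page, max_pages=10):
    """Collect comments page by page, carrying max_id from one response to the next request.

    fetch_page is a hypothetical callable: it takes the params dict and
    returns the parsed JSON body of the response.
    """
    comments = []
    max_id = None
    for _ in range(max_pages):
        body = fetch_page(build_params(weibo_id, max_id))
        comments.extend(body['data']['data'])
        max_id = body['data'].get('max_id')
        if not max_id:  # assumption: 0 or a missing value means there are no more pages
            break
    return comments
```

Passing the fetch step in as a function keeps the paging logic separate from the headers and cookie handling, so it can be exercised with canned responses before hitting the real site.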

First, send a request to the page:

import requests

r = requests.get(url, headers=headers)  # send the request
print(r.status_code)  # check the response status code
print(r.json())  # inspect the response body

Next comes the logic that parses the data:

import re

dr = re.compile(r'<[^>]+>', re.S)  # regex that strips HTML tags from the comment text
datas = r.json()['data']['data']
for data in datas:
	page_list.append(page)
	id_list.append(data['id'])
	text2 = dr.sub('', data['text'])
	text_list.append(text2)  # comment text
	time_list.append(trans_time(v_str=data['created_at']))  # comment time
	like_count_list.append(data['like_count'])  # number of likes on the comment
	source_list.append(data['source'])  # commenter's IP location
	user_name_list.append(data['user']['screen_name'])  # commenter's name
	user_id_list.append(data['user']['id'])  # commenter's id
	user_gender_list.append(tran_gender(data['user']['gender']))  # commenter's gender
	follow_count_list.append(data['user']['follow_count'])  # number of accounts the commenter follows
	followers_count_list.append(data['user']['followers_count'])  # commenter's follower count
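
The parsing code calls two helpers, trans_time and tran_gender, that are not shown in the post. The following is a guess at what they might look like, not the author's actual code. It assumes that created_at arrives in a form like 'Thu Aug 25 22:10:05 +0800 2022' and that gender is encoded as 'm' or 'f'; both formats are assumptions. `strip_tags` is a hypothetical name wrapping the same regex as above.

```python
import re
from datetime import datetime

TAG_RE = re.compile(r'<[^>]+>', re.S)  # same pattern as in the parsing loop


def strip_tags(html_text):
    """Remove HTML tags and keep only the visible text."""
    return TAG_RE.sub('', html_text)


def trans_time(v_str):
    """Convert an assumed 'Thu Aug 25 22:10:05 +0800 2022' timestamp to 'YYYY-MM-DD HH:MM:SS'."""
    dt = datetime.strptime(v_str, '%a %b %d %H:%M:%S %z %Y')
    return dt.strftime('%Y-%m-%d %H:%M:%S')


def tran_gender(gender_tag):
    """Map an assumed 'm' / 'f' code to a readable label; anything else is unknown."""
    return {'m': '男', 'f': '女'}.get(gender_tag, '未知')
```

Note that stripping tags this way also drops emoji that are embedded as img elements, since their alt text lives inside the tag.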

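Finally, the logic that saves the data:

import os
import pandas as pd

df = pd.DataFrame(
	{
		'id': [weibo_id] * len(time_list),
		'評論頁碼': page_list,  # comment page number
		'評論id': id_list,  # comment id
		'評論時間': time_list,  # comment time
		'評論點贊數': like_count_list,  # likes on the comment
		'評論者IP歸屬地': source_list,  # commenter's IP location
		'評論者姓名': user_name_list,  # commenter's name
		'評論者id': user_id_list,  # commenter's id
		'評論者性別': user_gender_list,  # commenter's gender
		'評論者關註數': follow_count_list,  # accounts the commenter follows
		'評論者粉絲數': followers_count_list,  # commenter's follower count
		'評論內容': text_list,  # comment text
	}
)
# write the header row only if the csv file does not exist yet
header = not os.path.exists(v_comment_file)
# append to the csv file (utf_8_sig adds a BOM so that Excel displays the Chinese text correctly)
df.to_csv(v_comment_file, mode='a+', index=False, header=header, encoding='utf_8_sig')
print('Saved successfully: {}'.format(v_comment_file))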

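To keep this post short, other details such as the request headers, the cookie, the page loop and data cleaning are not covered here.

Here is the final data. (Figure: crawl results)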

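2.2 Visualization dashboard

First, here is how the finished interactive dashboard looks:

The dashboard contains 5 charts:

Title banner - Line
Word cloud - WordCloud
Bar chart - Bar
Pie chart - Pie
Map - Map

The code for each one is explained in turn below.

2.2.1 Title banner

pyecharts has no chart type dedicated to titles, so I repurposed the Line component to render the banner.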

from pyecharts import options as opts
from pyecharts.charts import Line
from pyecharts.commons.utils import JsCode

line3 = (
	Line(init_opts=opts.InitOpts(
		width="1000px",  # width
		height="625px",  # height
		# use the JS variable img (defined further down) as a repeating background pattern
		bg_color={"type": "pattern", "image": JsCode("img"), "repeat": "repeat"}))
	.add_xaxis([None])  # insert empty data
	.add_yaxis("", [None])  # insert empty data
	.set_global_opts(
		title_opts=opts.TitleOpts(
			title=v_title,
			pos_left='center',
			title_textstyle_opts=opts.TextStyleOpts(font_size=45, color='#51c2d5', align='left'),
			pos_top='top'),
		yaxis_opts=opts.AxisOpts(is_show=False),  # hide the y axis
		xaxis_opts=opts.AxisOpts(is_show=False))  # hide the x axis
)
# load the background picture into the JS variable img
line3.add_js_funcs(
	"""
	var img = new Image(); img.src = '大屏背景.jpg';
	"""
)
line3.render('大標題.html')
print('Page rendered: 大標題.html')

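The key piece of logic here is how the background picture is handled. I picked a picture of Zhang Tianai (張天愛). (Figure: dashboard background)

The add_js_funcs call injects JavaScript that loads this picture into a variable named img, and bg_color refers to that variable through JsCode("img"), which makes the picture the background of the whole dashboard.

The title banner looks like this: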

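2.2.2 Word cloud

First, extract and clean the comment data:

cmt_list = df['評論內容'].values.tolist()  # convert the comment column to a list
cmt_list = [str(i) for i in cmt_list]  # data cleaning: make sure every entry is a string
cmt_str = ' '.join(cmt_list)  # join everything into a single string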
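
The post does not show how the joined string becomes the (word, count) pairs that the charts consume. One way to do it is sketched below with collections.Counter. This is an assumption rather than the author's code: `word_frequencies` is a hypothetical helper, and the tokenizer is passed in as a parameter so that any segmenter can be used (for Chinese text, `jieba.lcut` would be a natural choice, but the post does not say which one was used).

```python
from collections import Counter


def word_frequencies(text, tokenize, stopwords=frozenset(), min_len=2, top_n=None):
    """Return (word, count) pairs sorted by descending count.

    tokenize  -- callable that splits text into a list of words
    stopwords -- words to ignore
    min_len   -- drop tokens shorter than this (filters out single characters)
    top_n     -- keep only the top_n most frequent words (None keeps all)
    """
    words = [w.strip() for w in tokenize(text)]
    words = [w for w in words if len(w) >= min_len and w not in stopwords]
    return Counter(words).most_common(top_n)
```

With such a helper, the word cloud input could be built as `data = word_frequencies(cmt_str, jieba.lcut)`, and the TOP10 lists for the bar chart as `x_data, y_data = zip(*word_frequencies(cmt_str, jieba.lcut, top_n=10))`.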

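Then feed the cleaned data into the word cloud function. Core code:

from pyecharts import options as opts
from pyecharts.charts import WordCloud

wc = WordCloud(init_opts=opts.InitOpts(width=chart_width, height=chart_height, theme=theme_config, chart_id='wc1'))
wc.add(series_name="辭彙",
       data_pair=data,  # list of (word, count) pairs
       word_gap=1,
       word_size_range=[5, 30],
       mask_image='張天愛背景圖.png',  # picture whose outline shapes the cloud
       )  # add the data
wc.set_global_opts(
	title_opts=opts.TitleOpts(pos_left='center',
	                          title="張天愛評論-詞雲圖",
	                          title_textstyle_opts=opts.TextStyleOpts(font_size=20)  # title settings
	                          ),
	tooltip_opts=opts.TooltipOpts(is_show=True),  # show a tooltip on hover
)
wc.render('張天愛詞雲圖.html')  # generate the html file
print('Rendered: ' + '張天愛詞雲圖.html')

Here is the result. (Figure: word cloud)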

2.2.3 Bar chart

Plot a bar chart of the top 10 most frequent words in the comments.
Core code:

from pyecharts import options as opts
from pyecharts.charts import Bar

bar = Bar(
	init_opts=opts.InitOpts(theme=theme_config, width=chart_width, height=chart_height,
	                        chart_id='bar_cmt'))  # initialize the bar chart
bar.add_xaxis(x_data)  # category data (the keywords)
bar.add_yaxis("數量", y_data)  # value data (the counts)
bar.reversal_axis()  # swap the axes so that the bars are horizontal
bar.set_series_opts(label_opts=opts.LabelOpts(position="right"))  # place the value label to the right of each bar
bar.set_global_opts(
	legend_opts=opts.LegendOpts(pos_left='right'),
	title_opts=opts.TitleOpts(title=v_title, pos_left='center'),  # title
	toolbox_opts=opts.ToolboxOpts(is_show=False),  # hide the toolbox
	xaxis_opts=opts.AxisOpts(name="數量", axislabel_opts={"rotate": 0}),  # x-axis name (count)
	yaxis_opts=opts.AxisOpts(name="關鍵詞",  # y-axis name (keyword)
	                         axislabel_opts=opts.LabelOpts(font_size=9, rotate=0),
	                         ))
bar.render(v_title + ".html")  # generate the html file
print('Rendered: ' + v_title + '.html')

Here is the result. (Figure: bar chart)

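2.2.4 Pie chart (rose chart)

First, run sentiment analysis on the comments with the snownlp library.

from snownlp import SnowNLP

score_list, tag_list = [], []  # per-comment score and label
pos_count = mid_count = neg_count = 0  # counters for each class
for comment in v_cmt_list:
	sentiments_score = SnowNLP(comment).sentiments  # score in [0, 1]; closer to 1 means more positive
	if sentiments_score < 0.4:  # a score below 0.4 counts as negative
		tag = '消極'
		neg_count += 1
	elif 0.4 <= sentiments_score <= 0.6:  # a score in [0.4, 0.6] counts as neutral
		tag = '中性'
		mid_count += 1
	else:  # a score above 0.6 counts as positive
		tag = '積極'
		pos_count += 1
	score_list.append(sentiments_score)  # score
	tag_list.append(tag)  # label
df['情感得分'] = score_list  # sentiment score column
df['分析結果'] = tag_list  # analysis result column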
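
The threshold rule in the loop above can also be pulled out into a small function, which makes the boundary cases easy to check without running snownlp. This is a minimal sketch: `classify_sentiment` and `summarize` are hypothetical names, and the thresholds 0.4 and 0.6 are the ones used above.

```python
from collections import Counter


def classify_sentiment(score, low=0.4, high=0.6):
    """Map a sentiment score in [0, 1] to a label using the same cut-offs as above."""
    if score < low:
        return '消極'  # negative
    if score <= high:
        return '中性'  # neutral (both boundaries included)
    return '積極'  # positive


def summarize(scores):
    """Return the list of labels and a Counter with the number of comments per label."""
    tags = [classify_sentiment(s) for s in scores]
    return tags, Counter(tags)
```

The counts returned by `summarize` correspond to neg_count, mid_count and pos_count in the loop.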

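Then feed the counts into the pie chart function. Part of the core code:

from pyecharts import options as opts
from pyecharts.charts import Pie

# draw the pie chart
pie = (
	Pie(init_opts=opts.InitOpts(theme=theme_config, width=chart_width, height=chart_width, chart_id='pie1'))
	.add(
		series_name="情感分佈",  # series name (sentiment distribution)
		data_pair=[['正能量', pos_count],  # positive
		           ['中性', mid_count],  # neutral
		           ['負能量', neg_count]],  # negative
		rosetype="radius",  # Nightingale rose chart: the angle shows the percentage, the radius shows the magnitude
		radius=["30%", "55%"],  # inner and outer radius of the ring
	)
	.set_global_opts(  # global options
		title_opts=opts.TitleOpts(title=v_title, pos_left='center'),  # title
		legend_opts=opts.LegendOpts(pos_left='right', orient='vertical')  # legend on the right, stacked vertically
	)
	.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))  # label format: name: value
)
pie.render(v_title + '.html')  # generate the html file
print('Rendered: ' + v_title + '.html')

Here is the result: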

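2.2.5 Map

Count the comments by the commenter's IP location and plot the totals on a map.

df['評論者IP歸屬地'] = df['評論者IP歸屬地'].astype(str).str.replace('來自', '')  # data cleaning: strip the '來自' ("from") prefix
loc_grp = df.groupby('評論者IP歸屬地').count()['評論內容']  # number of comments per location
data_list = list(zip(loc_grp.index.tolist(), loc_grp.values.tolist()))  # (location, count) pairs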

Once the data is ready, feed it into the map function. Part of the core code:

from pyecharts import options as opts
from pyecharts.charts import Map

f_map = (
	Map(init_opts=opts.InitOpts(width=chart_width,
	                            height=chart_height,
	                            theme=theme_config,
	                            page_title=v_title,
	                            chart_id='map1',
	                            bg_color=None))
	.add(series_name="評論數量",  # series name (number of comments)
	     data_pair=v_data_list,  # (location, count) pairs
	     maptype="china",  # map type
	     is_map_symbol_show=False)
	.set_global_opts(
		title_opts=opts.TitleOpts(title=v_title,
		                          pos_left="center"),
		legend_opts=opts.LegendOpts(  # legend settings
			is_show=True, pos_top="40px", pos_right="30px"),
		visualmap_opts=opts.VisualMapOpts(  # visual mapping settings
			is_piecewise=True, range_text=['高', '低'], pieces=[  # piecewise display, labelled high / low
				# {"min": 10000, "color": "#751d0d"},
				{"min": 121, "max": 150, "color": "#37561a"},
				{"min": 91, "max": 120, "color": "#006400"},
				{"min": 61, "max": 90, "color": "#4d9116"},
				{"min": 31, "max": 60, "color": "#77bb40"},
				{"min": 11, "max": 30, "color": "#b8db9b"},
				{"min": 0, "max": 10, "color": "#e5edd6"}
			]),
	)
	.set_series_opts(label_opts=opts.LabelOpts(is_show=True, font_size=8),
	                 markpoint_opts=opts.MarkPointOpts(
		                 symbol_size=[90, 90], symbol='circle'),
	                 effect_opts=opts.EffectOpts(is_show=True)
	                 )
)
f_map.render(v_title + '.html')
print('Rendered: ' + v_title + '.html')
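
The piecewise ranges above stop at 150, so before rendering it may be worth checking whether any region's comment count falls outside every piece. The sketch below does that under the assumption that each piece is a closed [min, max] interval, as the list above is written. `piece_color` and `uncovered_regions` are hypothetical helper names, and PIECES simply repeats the active entries from the map code.

```python
PIECES = [
    {"min": 121, "max": 150, "color": "#37561a"},
    {"min": 91, "max": 120, "color": "#006400"},
    {"min": 61, "max": 90, "color": "#4d9116"},
    {"min": 31, "max": 60, "color": "#77bb40"},
    {"min": 11, "max": 30, "color": "#b8db9b"},
    {"min": 0, "max": 10, "color": "#e5edd6"},
]


def piece_color(value, pieces=PIECES):
    """Return the color of the piece that contains value, or None if no piece covers it."""
    for piece in pieces:
        lower = piece.get("min", float("-inf"))  # a missing bound means unbounded on that side
        upper = piece.get("max", float("inf"))
        if lower <= value <= upper:
            return piece["color"]
    return None


def uncovered_regions(data_pairs, pieces=PIECES):
    """List the (region, count) pairs whose count is not covered by any piece."""
    return [(name, n) for name, n in data_pairs if piece_color(n, pieces) is None]
```

Calling `uncovered_regions(data_list)` on the pairs built earlier would list any regions that need an extra open-ended piece, such as the commented-out `{"min": 10000, ...}` entry with a lower threshold.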