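Preface
The text and images in this article come from the internet and are provided for learning and exchange purposes only, with no commercial use intended. Copyright belongs to the original author; if there are any issues, please contact us and we will address them promptly.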
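Time Series
1. Time Series Plot
A time series plot visualizes how a given metric changes over time. Here you can see how air passenger traffic changed between 1949 and 1960.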
```python
import pandas as pd
import matplotlib.pyplot as plt

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:red')

# Decoration: one x tick per year (every 12th monthly row)
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::12]
xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
plt.yticks(fontsize=12, alpha=.7)
plt.title("Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.grid(axis='both', alpha=.3)

# Remove borders
plt.gca().spines["top"].set_alpha(0.0)
plt.gca().spines["bottom"].set_alpha(0.3)
plt.gca().spines["right"].set_alpha(0.0)
plt.gca().spines["left"].set_alpha(0.3)
plt.show()
```
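2. Time Series with Peaks and Troughs Annotated
The time series below marks every peak and trough and annotates a selection of them with their dates.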
```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Get the Peaks and Troughs
data = df['traffic'].values
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1

doublediff2 = np.diff(np.sign(np.diff(-1*data)))
trough_locations = np.where(doublediff2 == -2)[0] + 1

# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic')
plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks')
plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')

# Annotate every 3rd peak and every 5th trough (starting from the 2nd) with its date
for t, p in zip(trough_locations[1::5], peak_locations[::3]):
    plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')
    plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')

# Decoration
plt.ylim(50,750)
xtick_location = df.index.tolist()[::6]
xtick_labels = df.date.tolist()[::6]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
plt.title("Peak and Troughs of Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.yticks(fontsize=12, alpha=.7)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.3)

plt.legend(loc='upper left')
plt.grid(axis='y', alpha=.3)
plt.show()
```
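The peak and trough detection in the code above uses a sign-of-differences trick. `np.diff` gives the step between neighbouring values, and `np.sign` reduces each step to +1 (rising), -1 (falling) or 0 (flat). A second `np.diff` then equals -2 exactly where a rise is followed by a fall, which is a strict local peak, and the `+ 1` shifts the result back to the index of the peak itself. Running the same logic on the negated series finds the troughs. The minimal sketch below isolates that logic on a toy array. The helper name `find_peaks_troughs` is hypothetical.

```python
import numpy as np

def find_peaks_troughs(values):
    """Return indices of strict local peaks and troughs (hypothetical helper)."""
    data = np.asarray(values, dtype=float)
    # +1 where the next value is higher, -1 where it is lower, 0 where equal
    slope_sign = np.sign(np.diff(data))
    # -2 marks a rise followed by a fall; +1 maps back to the peak's own index
    peaks = np.where(np.diff(slope_sign) == -2)[0] + 1
    # Troughs are the peaks of the negated series
    troughs = np.where(np.diff(np.sign(np.diff(-data))) == -2)[0] + 1
    return peaks, troughs

peaks, troughs = find_peaks_troughs([1, 3, 2, 4, 1, 0, 2])
print(peaks, troughs)  # [1 3] [2 5]
```

A flat step has sign 0, so a plateau-shaped peak such as `[0, 1, 1, 0]` never produces -2 and is skipped. If that matters for your data, `scipy.signal.find_peaks` is a more complete alternative that also handles flat peaks.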
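3. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plots
The ACF plot shows the correlation of a time series with lagged copies of itself. Each vertical line (on the autocorrelation plot) represents the correlation between the series and its lag, starting from lag 0. The blue shaded region marks the significance level. Lags whose lines extend beyond it are statistically significant.
So how do you interpret it?
For AirPassengers, up to 14 lags extend beyond the blue region and are therefore significant. Because the data are monthly, this means the traffic seen up to 14 months back is significantly correlated with the traffic seen today.
PACF, on the other hand, shows the autocorrelation between the series and a given lag after removing the contributions of the lags in between.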
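To make the ACF concrete, the sketch below computes the sample autocorrelation directly with NumPy. It compares each lag against an approximate 95% band of ±1.96/√n, which is the usual band under the assumption that the series is white noise. The toy series (a 12-step sine wave plus noise) and the helper name `sample_acf` are assumptions for illustration. For ready-made plots, statsmodels provides `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots`. Depending on the version and options, `plot_acf` may draw a band that widens with the lag (Bartlett's formula) instead of this constant band.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    # Lag k: sum of products of the series with itself shifted by k, over total variation
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])

# Assumed toy data: 144 "monthly" points with a 12-step cycle plus noise
rng = np.random.default_rng(0)
t = np.arange(144)
series = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

acf = sample_acf(series, nlags=24)
band = 1.96 / np.sqrt(series.size)  # approximate 95% band if the series were white noise
significant = np.flatnonzero(np.abs(acf[1:]) > band) + 1  # lag numbers outside the band
print("significant lags:", significant)
```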
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Example: personal savings rate vs. number unemployed on two y-axes sharing one x-axis
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")
x = df['date']
y1 = df['psavert']
y2 = df['unemploy']

# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1,1,figsize=(16,9), dpi= 80)
ax1.plot(x, y1, color='tab:red')

# Plot Line2 (Right Y Axis)
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2, color='tab:blue')

# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Year', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
ax1.grid(alpha=.4)

# ax2 (right Y axis)
ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(0, len(x), 60))
ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10})
ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=22)
fig.tight_layout()
plt.show()
```
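4. Cross-Correlation Plot
A cross-correlation plot shows how strongly two time series are correlated when one is shifted against the other by different lags. This reveals whether, and by how much, one series leads or lags the other.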
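One way to compute what a cross-correlation plot shows is to correlate `x[t]` with `y[t + k]` for a range of lags `k` and look for the lag with the strongest correlation. The sketch below does this with `np.corrcoef` on toy data in which `y` is a copy of `x` delayed by three steps. The helper name `cross_corr` and the data are assumptions for illustration.

```python
import numpy as np

def cross_corr(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + k] for k = -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for k in lags:
        # Trim both series so that x[t] is paired with y[t + k]
        if k >= 0:
            a, b = x[:x.size - k], y[k:]
        else:
            a, b = x[-k:], y[:y.size + k]
        values.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(values)

# Assumed toy data: y repeats x three steps later, plus a little noise
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.roll(x, 3) + 0.1 * rng.normal(size=200)

lags, cc = cross_corr(x, y, max_lag=10)
best_lag = lags[np.argmax(cc)]
print("best lag:", best_lag)  # best lag: 3
```

A positive best lag means `y` follows `x`, and a negative one means `y` leads it. Plotting `cc` against `lags` as vertical lines gives the cross-correlation plot itself.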
```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import sem

# Example: mean order quantity by hour of day with a 95% band (mean +/- 1.96 * SEM)
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv")
df_mean = df.groupby('order_hour_of_day').quantity.mean()
df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)

# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# Orders", fontsize=16)
x = df_mean.index
plt.plot(x, df_mean, color="white", lw=2)
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::2], [str(d) for d in x[::2]] , fontsize=12)
plt.title("User Orders by Hour of Day (95% confidence)", fontsize=22)
plt.xlabel("Hour of Day")

s, e = plt.gca().get_xlim()
plt.xlim(s, e)

# Draw Horizontal Tick lines
for y in range(8, 20, 2):
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()
```
5. Time Series Decomposition Plot
A time series decomposition plot shows a time series broken down into its trend, seasonal, and residual components.
6. Multiple Time Series
You can plot several time series that measure the same quantity on a single chart, so they share one scale and can be compared directly.
7. Dual Y-Axis Plot
If you want to show two time series that measure two different quantities at the same points in time, you can plot the second series on a secondary Y axis on the right.
8. Time Series with Error Bands
If your time series dataset has multiple observations for each time point (date/timestamp), you can build a time series with error bands. Examples include the mean number of orders placed at each hour of the day and the number of orders arriving over a 45-day period.
In this approach, the mean order quantity is drawn as a white line, and a 95% confidence band (the mean plus or minus 1.96 standard errors) is computed and shaded around it.
```python
# Data Source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import sem

# Import Data
df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv', parse_dates=['purchase_time', 'purchase_date'])

# Prepare Data: Daily Mean and SE Bands
df_mean = df_raw.groupby('purchase_date').quantity.mean()
df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)

# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# Daily Orders", fontsize=16)
x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index]
plt.plot(x, df_mean, color="white", lw=2)
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::6], [str(d) for d in x[::6]] , fontsize=12)
plt.title("Daily Order Quantity of Brazilian Retail with Error Bands (95% confidence)", fontsize=20)

# Axis limits
s, e = plt.gca().get_xlim()
plt.xlim(s, e-2)
plt.ylim(4, 10)

# Draw Horizontal Tick lines
for y in range(5, 10, 1):
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()
```
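9. Stacked Area Chart
A stacked area chart shows how much each of several time series contributes to the total, making their contributions easy to compare.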
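With `stackplot`, each series is drawn on top of the ones before it. The upper edge of band i is therefore the cumulative sum of series 0..i, and the top edge is the total. The sketch below shows that relationship with `np.cumsum` on three made-up regional series. The region names and numbers are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Assumed toy data: rows are series (regions), columns are time steps
x = np.arange(6)
visitors = np.array([
    [10, 12, 11, 13, 14, 15],
    [5, 6, 7, 6, 8, 9],
    [2, 2, 3, 4, 3, 5],
])
# Upper edge of each stacked band; the last row is the total at each time step
cumulative = np.cumsum(visitors, axis=0)

fig, ax = plt.subplots(figsize=(10, 5))
polys = ax.stackplot(x, visitors, labels=["Region A", "Region B", "Region C"], alpha=0.8)
ax.legend(loc="upper left")
ax.set_title("Stacked area: each band's thickness is one series")
```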
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/nightvisitors.csv')

# Decide Colors
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive']

# Draw Plot and Annotate
fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80)
columns = df.columns[1:]

# Prepare data: choose a stacking order and keep the legend labels in that same order
x = df['yearmon'].values.tolist()
order = [0, 2, 4, 6, 7, 5, 1, 3]
y = np.vstack([df[columns[i]].values for i in order])
labs = [columns[i] for i in order]

# Plot all series as stacked areas
ax.stackplot(x, y, labels=labs, colors=mycolors, alpha=0.8)

# Decorations
ax.set_title('Night Visitors in Australian Regions', fontsize=18)
ax.set(ylim=[0, 100000])
ax.legend(fontsize=10, ncol=4)
plt.xticks(x[::5], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(10000, 100000, 20000), fontsize=10)
plt.xlim(x[0], x[-1])

# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)

plt.show()
```
10. Area Chart (Unstacked)
An unstacked area chart visualizes how two or more series rise and fall relative to each other. For example, plotting the personal savings rate and the median duration of unemployment this way shows the savings rate falling as the median unemployment duration increases. An unstacked area chart displays this kind of relationship well.
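In an unstacked area chart, each series is filled from zero on its own rather than on top of the others, and transparency keeps the overlapping region visible. The sketch below uses `fill_between` on two made-up series that move in opposite directions. The data and labels are assumptions, not the savings and unemployment figures mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Assumed toy data: two series trending in opposite directions
t = np.arange(60)
series_a = 12 - 0.1 * t + np.sin(t / 4)
series_b = 5 + 0.1 * t + np.cos(t / 5)

fig, ax = plt.subplots(figsize=(12, 5))
# Each area starts at zero independently; alpha keeps the overlap readable
ax.fill_between(t, 0, series_a, color="tab:red", alpha=0.4, label="Series A")
ax.fill_between(t, 0, series_b, color="tab:blue", alpha=0.4, label="Series B")
# Outline each area so its top edge stays visible where the fills overlap
ax.plot(t, series_a, color="tab:red", lw=1)
ax.plot(t, series_b, color="tab:blue", lw=1)
ax.set_ylim(bottom=0)
ax.legend(loc="upper right")
```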
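11. Calendar Heat Map
A calendar heat map is an alternative to a time series plot for visualizing time-based data, not a preferred replacement for it. Although it can be visually appealing, the exact values are hard to read. It is, however, effective at highlighting extreme values and holiday effects.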
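A calendar heat map is essentially a grid with one row per weekday and one column per week, coloured by each day's value. The NumPy sketch below builds such a grid in a GitHub-style layout with Monday on the first row. It illustrates the idea only and is not how calmap is implemented. The helper name `calendar_grid` and the January 2014 toy values are assumptions.

```python
import datetime as dt
import numpy as np

def calendar_grid(start, values):
    """Place consecutive daily values in a 7 x n_weeks grid (rows Monday..Sunday).

    Cells before the first day or after the last day stay NaN.
    """
    offset = start.weekday()          # Monday = 0 ... Sunday = 6
    n_cells = offset + len(values)
    n_weeks = -(-n_cells // 7)        # ceiling division
    grid = np.full((7, n_weeks), np.nan)
    for i, v in enumerate(values):
        cell = offset + i
        grid[cell % 7, cell // 7] = v  # row = weekday, column = week number
    return grid

# Assumed toy data: 31 daily values for January 2014 (1 January 2014 was a Wednesday)
daily = np.arange(1, 32, dtype=float)
grid = calendar_grid(dt.date(2014, 1, 1), daily)
```

The grid can then be drawn with `plt.imshow(grid, cmap="YlOrRd")`. NaN cells, which fall outside the date range, are rendered blank.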
```python
import pandas as pd
import matplotlib.pyplot as plt
import calmap

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv", parse_dates=['date'])
df.set_index('date', inplace=True)

# Plot: calendarplot creates its own figure from fig_kws;
# df.loc['2014', ...] selects the 2014 rows by partial date string
calmap.calendarplot(df.loc['2014', 'VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'Yahoo Stock Prices'})
plt.show()
```
12. Seasonal Plot
A seasonal plot compares how a time series behaved at the same point in previous seasons (year, month, week, etc.).
```python
import pandas as pd
import matplotlib.pyplot as plt
from dateutil.parser import parse

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Prepare data
df['year'] = [parse(d).year for d in df.date]
df['month'] = [parse(d).strftime('%b') for d in df.date]
years = df['year'].unique()

# Draw Plot: one line per year (12 years, so 12 colors)
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive', 'deeppink', 'steelblue', 'firebrick', 'mediumseagreen']
plt.figure(figsize=(16,10), dpi= 80)
for i, y in enumerate(years):
    year_df = df.loc[df.year==y, :]
    plt.plot('month', 'traffic', data=year_df, color=mycolors[i], label=y)
    # Label each line with its year just after its last month
    plt.text(year_df.shape[0]-.9, year_df['traffic'].iloc[-1], y, fontsize=12, color=mycolors[i])

# Decoration
plt.ylim(50,750)
plt.xlim(-0.3, 11)
plt.ylabel('Air Traffic')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Monthly Seasonal Plot: Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.grid(axis='y', alpha=.3)

# Remove borders
plt.gca().spines["top"].set_alpha(0.0)
plt.gca().spines["bottom"].set_alpha(0.5)
plt.gca().spines["right"].set_alpha(0.0)
plt.gca().spines["left"].set_alpha(0.5)
# plt.legend(loc='upper right', ncol=2, fontsize=12)
plt.show()
```
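The seasonal plot above loops over the years and filters the rows for each one. An equivalent approach is to pivot the data into a year-by-month table first, so that each row becomes one line. The sketch below uses three years of made-up monthly values. The column names `value`, `year` and `month` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed toy data: three years of monthly values in long format (one row per month)
dates = pd.date_range("2018-01-01", periods=36, freq="MS")
month_num = dates.month.to_numpy()
values = 100 + 10 * np.arange(36) / 12 + 20 * np.sin(2 * np.pi * month_num / 12)
df = pd.DataFrame({"date": dates, "value": values})

# Reshape: one row per year, one column per month
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
table = df.pivot(index="year", columns="month", values="value")

# Each row (year) becomes one line drawn over months 1..12
fig, ax = plt.subplots(figsize=(10, 5))
for year, row in table.iterrows():
    ax.plot(row.index, row.to_numpy(), label=str(year))
ax.set_xticks(range(1, 13))
ax.set_xlabel("Month")
ax.legend(title="Year")
```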