決策樹演算法簡單應用_ZenDei技術網路在線

決策樹演算法簡單應用

-Advertisement-

採用ID3演算法（信息熵：H(X)=−∑i=0np(xi)log2p(xi)）下載一個決策樹可視化軟體：Graphviz （註意環境變數Path加：C:\Program Files (x86)\Graphviz2.38\bin）代碼：導入需要用到的庫：讀取表格：這裡一些數據（屬性），決定一 ...

採用ID3演算法

（信息熵：H(X)=−∑i=0np(xi)log2p(xi)）

下載一個決策樹可視化軟體：Graphviz

（註意環境變數Path加：C:\Program Files (x86)\Graphviz2.38\bin）

代碼：

導入需要用到的庫：

from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing

讀取表格：

這裡一些數據（屬性），決定一位客戶是否要買這臺電腦

讀取表格並做一些簡單的數據處理：

allElectronicsData = open(r'D:\demo.csv', 'rt')
reader = csv.reader(allElectronicsData)
headers = next(reader)

featureList = []
labelList = []

for row in reader:
    labelList.append(row[len(row)-1])
    rowDict = {}
    for i in range(1, len(row)-1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)

print(featureList)

看一下結果：

[
{'age': 'youth', 'student': 'no', 'income': 'high', 'credit_rating': 'fair'},
 {'age': 'youth', 'student': 'no', 'income': 'high', 'credit_rating': 'excellent'}, 
{'age': 'middle_aged', 'student': 'no', 'income': 'high', 'credit_rating': 'fair'}, 
{'age': 'senior', 'student': 'no', 'income': 'medium', 'credit_rating': 'fair'},
 {'age': 'senior', 'student': 'yes', 'income': 'low', 'credit_rating': 'fair'}, 
{'age': 'senior', 'student': 'yes', 'income': 'low', 'credit_rating': 'excellent'}, 
{'age': 'middle_aged', 'student': 'yes', 'income': 'low', 'credit_rating': 'excellent'}, 
{'age': 'youth', 'student': 'no', 'income': 'medium', 'credit_rating': 'fair'},
 {'age': 'youth', 'student': 'yes', 'income': 'low', 'credit_rating': 'fair'}, 
{'age': 'senior', 'student': 'yes', 'income': 'medium', 'credit_rating': 'fair'},
 {'age': 'youth', 'student': 'yes', 'income': 'medium', 'credit_rating': 'excellent'}, 
{'age': 'middle_aged', 'student': 'no', 'income': 'medium', 'credit_rating': 'excellent'},
 {'age': 'middle_aged', 'student': 'yes', 'income': 'high', 'credit_rating': 'fair'}, 
{'age': 'senior', 'student': 'no', 'income': 'medium', 'credit_rating': 'excellent'}
]

處理的不錯：

調用sklearn的函數進一步處理數據：

vec = DictVectorizer()
dummyX = vec.fit_transform(featureList) .toarray()

lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)

查看下處理的結果：

print("dummyX: \n" + str(dummyX))
print(vec.get_feature_names())

print("labelList: " + str(labelList))
print("dummyY: \n" + str(dummyY))

結果：

註意要把數據轉換成數字矩陣，便於學習

dummyX: 
[[0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]
 [0. 0. 1. 1. 0. 1. 0. 0. 1. 0.]
 [1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]
 [0. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
 [0. 1. 0. 0. 1. 0. 1. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 1. 0. 0. 1.]
 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
 [0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
 [0. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0. 0. 1. 0. 1.]
 [0. 0. 1. 1. 0. 0. 0. 1. 0. 1.]
 [1. 0. 0. 1. 0. 0. 0. 1. 1. 0.]
 [1. 0. 0. 0. 1. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0. 1. 1. 0.]]
['age=middle_aged', 'age=senior', 'age=youth', 'credit_rating=excellent', 'credit_rating=fair', 'income=high', 'income=low', 'income=medium', 'student=no', 'student=yes']
labelList: ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
dummyY: 
[[0]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]]

用決策樹ID3演算法和訓練數據擬合分類器模型：

clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)

可以利用下載的可視化軟體畫圖觀察下：

with open(r"D:\demo.dot", 'w') as f:
    f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

然後調出cmd：

畫好後是pdf形式的，看一下：

模型建好了，我們可以做一個預測：

在第一個數據的基礎上修改下，然後預測是否買電腦：

oneRowX = dummyX[0, :]
print("oneRowX: " + str(oneRowX))

newRowX = oneRowX
newRowX[0] = 1
newRowX[2] = 0
print("newRowX: " + str(newRowX))

predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY: " + str(predictedY))

結果：

oneRowX: [0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]
newRowX: [1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]
predictedY: [1]

結論：這個人要買這臺電腦

您的分享是我們最大的動力!

-Advertisement-

更多相關文章

C++編程實現對工廠產品生產流程的模擬

花費二個多月的時間編寫了可以實時模擬工廠產品生產流程的程式，工廠產品生產流程的模擬，就是計算在工藝文件所規定的工序下，不同種類的多件產品(同一類別的產品可以有多件)在不同類別的多台設備(同一類別的設備可以有多台)上全部生產完畢所需的總時間。每一件產品可以在生產流程中先後多次在同一類設備上生產而且生產 ...
客戶端安全傳輸密碼至服務端的實現改進

兩年前在做Java EE開發平臺時，因為用戶登錄相關的模塊是委托給另一位同事完成的，所以雖然知道大體概念，但是對客戶端怎麼安全傳輸密碼到服務端的具體細節並不甚瞭解。然而這次在做4A系統（認證、授權、監控、審計）時，無論怎樣都繞不過這一塊內容了，於是在仔細研究了一下之前的方案，並參考網上的一些資料後， ...
Python文件處理

Python文件處理 Python文件處理在python中，要對一個文件進行操作，得把文件抽象為Streams流或者說file object或者叫file-like objects。這樣將文件當作一個流對象來處理就方便多了。Stream對象提供了很多操作方法（如read(),write()等）， ...
spring MVC框架入門(外加SSM整合)

spring MVC框架一、什麼是spring MVC Spring MVC屬於SpringFrameWork的後續產品，已經融合在Spring Web Flow裡面。Spring 框架提供了構建 Web 應用程式的全功能 MVC 模塊。使用 Spring 可插入的 MVC 架構，從而在使用Spr ...
Java動態代理模式淺析

"Java代理設計模式靜態代理" "Java中的動態代理調用處理器" 代理設計模式的UML圖：我將首先介紹Java中的各種代理實現方法 Java代理設計模式靜態代理這個例子非常簡單，只有一個方法的介面 : 測試代碼：測試輸出：現在麻煩的是，Jerry的領導因為團隊中的開發者像Jerr ...
電梯模擬C++

1、問題描述與要求模擬某校九層教學樓的電梯系統。該樓有一個自動電梯，能在每層停留，其中第一層是大樓的進出層，即是電梯的“本壘層”，電梯“空閑”時，將來到該層候命。電梯一共有七個狀態，即正在開門（Opening）、已開門（Opened）、正在關門（Closing）、已關門（Closed）、等待（W ...
PHP的錯誤處理和異常處理

由於教程是圍繞著文件打開做的錯誤處理，所以先記錄幾個用於文件處理的一些函數，fopen 用於打開一個文件；file_exists 用於檢查目錄是否存在；fclose（ $變數）用於指定關閉打開的文件； PHP處理錯誤的幾種方式：die（）語句；自定義錯誤和錯誤觸發器；錯誤日誌； die（）語句： ...
利用爬蟲獲取網上醫院藥品價格信息（上）

在對比醫院業務數據中的各類藥品價格的時候，面對著成千上百種的藥品。因而想到使用爬蟲來自動獲取網上的藥品價格，保存下來導入資料庫中就可以方便地比較院方的藥品採購價格了。通過百度搜索“藥品價格查詢”，在眾多的網站中，這裡選擇了藥價查詢網（http://www.china-yao.com/），主要是因為 ...