1.TASKS
PREDICTION and DESCRIPTION
2.COMMON METHODS
CLASSIFICATION ; CLUSTERING ; SEQUENTIAL PATTERN DISCOVERY ; REGRESSION ; ASSOCIATION RULE DISCOVERY ; DEVIATION DETECTION
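Classification; clustering; association (correlation);
Sequence analysis: uses stochastic process theory and mathematical statistics to study the statistical regularities that a sequence of random data follows, in order to solve practical problems. In most problems the random data are ordered in time, which is why it is called time series analysis;
Regression; outlier detection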
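As a small illustration of the time series analysis described above, the sketch below estimates the sample autocorrelation of a time-ordered series, one of the simplest statistical regularities such a series can show. The function name `autocorrelation` and the `readings` data are made up for this example; it uses only the Python standard library and is not meant as a complete time series toolkit.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of a time-ordered series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mu = mean(series)
    # Total squared deviation from the mean (n times the lag-0 autocovariance).
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Co-variation between each value and the value `lag` steps earlier.
    numer = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return numer / denom

# Hypothetical data: a signal that repeats every 4 time steps.
readings = [1.0, 2.0, 3.0, 2.0] * 6
r2 = autocorrelation(readings, 2)  # half a period apart: strongly negative
r4 = autocorrelation(readings, 4)  # one full period apart: strongly positive
```

A value near +1 at some lag suggests the series repeats with that period, while a value near -1 suggests it is in opposite phase at that lag.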
3.A GENERAL PROCEDURE
Specify the problem to deal with -> input -> preprocessing -> other feature-based processing such as sampling and grouping -> modeling -> output -> assessment (e.g. handling overfitting, training set)
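Specify the problem: classification or clustering (if there is a labeled training set, it is classification)
Collect data...
Preprocessing: reliability checks, data integration, redundancy removal, resolving conflicting values, data sampling, data cleaning, missing-value handling, noise handling (tools: Pig, Hive, Spark ; programming languages: Python, SAS, MATLAB, Scala, Java)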
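A few of the preprocessing steps listed above (redundancy removal, missing-value handling, data sampling) can be sketched in plain Python as follows. The `preprocess` function, its parameters, and the `raw` records are hypothetical names chosen for this example. Mean imputation is only one of several possible ways to handle missing values, and in practice tools such as Pig, Hive, or Spark would take over at larger scale.

```python
import random
from statistics import mean

def preprocess(records, numeric_field, sample_size, seed=0):
    """Deduplicate records, impute missing values with the mean, then sample."""
    # Redundancy removal: keep the first occurrence of each identical record.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is left untouched

    # Missing-value handling: replace None with the mean of observed values.
    observed = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    for r in unique:
        if r[numeric_field] is None:
            r[numeric_field] = fill

    # Data sampling: draw a reproducible random subset without replacement.
    rng = random.Random(seed)
    k = min(sample_size, len(unique))
    return rng.sample(unique, k)

# Hypothetical raw data with one exact duplicate and one missing value.
raw = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 50},
]
cleaned = preprocess(raw, "age", sample_size=3)
```

Fixing the random seed keeps the sample reproducible, which makes it easier to compare models trained on the same subset later in the procedure.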
4.REFERENCES
BOOK:
Programming Collective Intelligence (集體智慧編程)
A Programmer's Guide to Data Mining (寫給程式員的數據挖掘指南)
The Beauty of Mathematics (數學之美)
Introduction to Data Mining
Data Mining: Concepts and Techniques
MOOC:
Andrew Ng (Machine Learning) ; Machine Learning Foundations
DOCs:
Statistical Learning Methods (統計學習方法) ; Machine Learning in Action (機器學習實戰) ; scikit-learn documentation
PRACTICE:
Kaggle ; SIGKDD
DEEP LEARNING:
The Elements of Statistical Learning
Pattern Recognition and Machine Learning