Learning ML.NET (1): Building a Machine Learning Pipeline with LearningPipeline

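ML.NET uses the LearningPipeline class to define the steps needed to carry out a machine learning task, which makes the overall workflow easy to follow.

The sample code from the iris flower prediction quickstart is used below to explain how the pipeline works.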

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System;

namespace myApp
{
    class Program
    {
        // STEP 1: Define your data structures

        // IrisData is used to provide training data, and as 
        // input for prediction operations
        // - First 4 properties are inputs/features used to predict the label
        // - Label is what you are predicting, and is only set when training
        public class IrisData
        {
            [Column("0")]
            public float SepalLength;

            [Column("1")]
            public float SepalWidth;

            [Column("2")]
            public float PetalLength;

            [Column("3")]
            public float PetalWidth;

            [Column("4")]
            [ColumnName("Label")]
            public string Label;
        }

        // IrisPrediction is the result returned from prediction operations
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public string PredictedLabels;
        }

        static void Main(string[] args)
        {
            // STEP 2: Create a pipeline and load your data
            var pipeline = new LearningPipeline();

            // If working in Visual Studio, make sure the 'Copy to Output Directory' 
            // property of iris-data.txt is set to 'Copy always'
            string dataPath = "iris-data.txt";
            pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));

            // STEP 3: Transform your data
            // Assign numeric values to text in the "Label" column, because only
            // numbers can be processed during model training
            pipeline.Add(new Dictionarizer("Label"));

            // Puts all features into a vector
            pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));

            // STEP 4: Add learner
            // Add a learning algorithm to the pipeline. 
            // This is a classification scenario (What type of iris is this?)
            pipeline.Add(new StochasticDualCoordinateAscentClassifier());

            // Convert the Label back into original text (after converting to number in step 3)
            pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

            // STEP 5: Train your model based on the data set
            var model = pipeline.Train<IrisData, IrisPrediction>();

            // STEP 6: Use your model to make a prediction
            // You can change these numbers to test different predictions
            var prediction = model.Predict(new IrisData()
            {
                SepalLength = 3.3f,
                SepalWidth = 1.6f,
                PetalLength = 0.2f,
                PetalWidth = 5.1f,
            });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
        }
    }
}

Creating the pipeline instance

First, create a LearningPipeline instance:

var pipeline = new LearningPipeline();

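Adding steps

Next, call the Add method of the LearningPipeline instance to add steps to the pipeline. Every step implements the ILearningPipelineItem interface.

A basic pipeline consists of the following steps, of which data preprocessing and label conversion are optional.

Loading the dataset

Implements the ILearningPipelineLoader interface.

A pipeline must contain at least one data-loading step.

// Load the data with TextLoader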
string dataPath = "iris-data.txt";
pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));

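Data preprocessing

Implements the CommonInputs.ITransformInput interface.

A pipeline may contain zero or more data preprocessing steps, which prepare (normalize) the loaded dataset for training. The sample code contains two such steps.

// Label holds text, which the algorithm cannot process directly, so map it to numeric dictionary keys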
pipeline.Add(new Dictionarizer("Label")); 

// The algorithm reads its input only from the Features column, so concatenate the four measurement columns into Features
pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));
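
To make the two preprocessing steps more concrete, here is a minimal sketch in plain C# of the idea behind them: text labels are mapped to numeric keys, and the four measurements are packed into one feature vector. It does not use ML.NET; the class PreprocessingSketch, its method names, and the choice to start keys at 1 are assumptions made purely for illustration, not the way Dictionarizer or ColumnConcatenator are implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET: illustrates the idea behind
// the Dictionarizer and ColumnConcatenator steps.
public static class PreprocessingSketch
{
    // Give each distinct label a numeric key in order of first appearance.
    // Starting the keys at 1 is an assumption of this sketch.
    public static Dictionary<string, uint> BuildLabelKeys(IEnumerable<string> labels)
    {
        var keys = new Dictionary<string, uint>();
        foreach (var label in labels)
        {
            if (!keys.ContainsKey(label))
            {
                keys[label] = (uint)(keys.Count + 1);
            }
        }
        return keys;
    }

    // Pack the four measurements into a single feature vector,
    // in the same order as the columns passed to ColumnConcatenator.
    public static float[] ConcatFeatures(
        float sepalLength, float sepalWidth, float petalLength, float petalWidth)
    {
        return new[] { sepalLength, sepalWidth, petalLength, petalWidth };
    }
}
```

Calling BuildLabelKeys on the label column yields one key per distinct iris type, and ConcatFeatures turns one row of measurements into the kind of vector a learner would consume.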

Choosing a learning algorithm

Implements the CommonInputs.ITrainerInput interface.

A pipeline must contain exactly one learning algorithm.

// Use a linear classifier
pipeline.Add(new StochasticDualCoordinateAscentClassifier());

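Label conversion

Implements the CommonInputs.ITransformInput interface.

A pipeline may contain zero or more label conversion steps, which turn the predicted label into a value that is easy to read.

// Convert the predicted label from its dictionary key back to the original text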
pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
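
The conversion back to text can be pictured as a reverse lookup in the label dictionary. The sketch below is plain C# without ML.NET; LabelConversionSketch and Invert are hypothetical names, and the dictionary contents in the usage note are made-up sample values rather than keys ML.NET would necessarily assign.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET: shows a reverse lookup from
// numeric key back to the original label text.
public static class LabelConversionSketch
{
    // Invert a label-to-key dictionary so that labels can be found by key.
    public static Dictionary<uint, string> Invert(IDictionary<string, uint> labelKeys)
    {
        var labels = new Dictionary<uint, string>();
        foreach (var pair in labelKeys)
        {
            labels[pair.Value] = pair.Key;
        }
        return labels;
    }
}
```

For example, if a label dictionary maps "Iris-setosa" to 1 and "Iris-virginica" to 2, then Invert(...)[2] gives back "Iris-virginica", which is the kind of readable value the label conversion step produces.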

Running the pipeline

Finally, call the Train method of the LearningPipeline instance to run the pipeline and obtain the prediction model.

var model = pipeline.Train<IrisData, IrisPrediction>();
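
The composition rules stated in this article (at least one data-loading step, exactly one learning algorithm, any number of preprocessing and label conversion steps) can be summarized as a small check. This is only a sketch in plain C#: StepKind and PipelineRulesSketch are hypothetical names, and nothing here claims to reflect how ML.NET itself validates a pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical categories mirroring the four kinds of steps described above.
public enum StepKind { Loader, Transform, Trainer, LabelConversion }

// Hypothetical helper, not part of ML.NET.
public static class PipelineRulesSketch
{
    // Returns the list of violated rules; an empty list means the step
    // list satisfies the rules described in the article.
    public static List<string> Validate(IReadOnlyList<StepKind> steps)
    {
        var problems = new List<string>();

        if (steps.Count(s => s == StepKind.Loader) < 1)
        {
            problems.Add("A pipeline needs at least one data-loading step.");
        }

        if (steps.Count(s => s == StepKind.Trainer) != 1)
        {
            problems.Add("A pipeline needs exactly one learning algorithm.");
        }

        // Transform and LabelConversion steps may appear zero or more times,
        // so they are not checked.
        return problems;
    }
}
```

The iris sample corresponds to the sequence Loader, Transform, Transform, Trainer, LabelConversion, for which Validate returns an empty list.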
