.NET vector type operation results by example: a study aid for the 100+ vector methods of the Vector class

1. Background
2. Writing the demo program (VectorClassDemo)
3. Results
References

1. Background

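Starting with .NET Core 1.0 (or .NET Framework 4.5 and .NET Standard 1.0), .NET has offered vector types with SIMD hardware acceleration.
Of these, the vectors with a hardware-dependent size are the most useful. They consist of the readonly struct Vector<T> and its companion static class Vector.
The readonly struct Vector<T> mainly exposes ordinary arithmetic through operators, so its functionality is limited. The static class Vector supplies a large set of functions for the vector types, which greatly widens where they can be used.
However, the static class Vector has more than one hundred methods and their documentation is terse, which makes them hard to learn.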

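So I wrote a demo program containing test code for every one of the 100+ vector methods that the static class Vector provides. Comparing the test code and its output against the official documentation makes the methods much easier to understand.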
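To make the split between the two types concrete, here is a minimal sketch (not part of the demo project; the class and method names are invented for illustration). It shows addition done with the operator on Vector<T> and a Max operation that is only available through the static class Vector.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration class; not part of VectorClassDemo.
static class OperatorVsStaticDemo {
    // Operator form: element-wise addition defined on Vector<T> itself.
    public static Vector<float> AddWithOperator(Vector<float> a, Vector<float> b) {
        return a + b;
    }

    // Static-class form: element-wise maximum has no operator equivalent.
    public static Vector<float> MaxWithStatic(Vector<float> a, Vector<float> b) {
        return Vector.Max(a, b);
    }

    static void Main() {
        Vector<float> a = new Vector<float>(1.5f); // every element is 1.5
        Vector<float> b = new Vector<float>(2.0f); // every element is 2.0
        Console.WriteLine(AddWithOperator(a, b));
        Console.WriteLine(MaxWithStatic(a, b));
        // Vector.Add and the + operator compute the same element-wise sum.
        Console.WriteLine(Vector.Add(a, b) == a + b);
    }
}
```

The number of elements printed depends on Vector<float>.Count on the machine running the code.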

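2. Writing the demo program (VectorClassDemo)

2.1 Project structure

The solution currently contains these 3 projects:

VectorClassDemo: a shared project that holds the common test code.
VectorClassDemo20: a .NET Core 2.0 console project, used to test behaviour on the older .NET Core 2.0.
VectorClassDemo50: a .NET 5.0 console project, used to test behaviour on newer .NET versions. For example, its target framework can be temporarily changed to ".NET 7.0" to test behaviour under .NET 7.0.

To make it easy to test different target frameworks, the common test code lives in the shared project. This allows the code to be reused and keeps the console projects simple. For example, Program.cs in VectorClassDemo50 is: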

using System;
using System.IO;
using VectorClassDemo;

namespace VectorClassDemo50 {
    class Program {
        static void Main(string[] args) {
            string indent = "";
            TextWriter tw = Console.Out;
            tw.WriteLine("VectorClassDemo50");
            tw.WriteLine();
            VectorDemo.OutputEnvironment(tw, indent);
            tw.WriteLine();
            VectorDemo.Run(tw, indent);
        }
    }
}

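2.2 Outputting environment information (OutputEnvironment)

Because the tests ran on several platforms, each with different environment information, a dedicated function prints that information. The source is below.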

/// <summary>
/// Is release make.
/// </summary>
public static readonly bool IsRelease =
#if DEBUG
    false
#else
    true
#endif
;

/// <summary>
/// Output Environment.
/// </summary>
/// <param name="tw">Output <see cref="TextWriter"/>.</param>
/// <param name="indent">The indent.</param>
public static void OutputEnvironment(TextWriter tw, string indent) {
    if (null == tw) return;
    if (null == indent) indent = "";
    //string indentNext = indent + "\t";
    tw.WriteLine(indent + string.Format("IsRelease:\t{0}", IsRelease));
    tw.WriteLine(indent + string.Format("EnvironmentVariable(PROCESSOR_IDENTIFIER):\t{0}", Environment.GetEnvironmentVariable("PROCESSOR_IDENTIFIER")));
    tw.WriteLine(indent + string.Format("Environment.ProcessorCount:\t{0}", Environment.ProcessorCount));
    tw.WriteLine(indent + string.Format("Environment.Is64BitOperatingSystem:\t{0}", Environment.Is64BitOperatingSystem));
    tw.WriteLine(indent + string.Format("Environment.Is64BitProcess:\t{0}", Environment.Is64BitProcess));
    tw.WriteLine(indent + string.Format("Environment.OSVersion:\t{0}", Environment.OSVersion));
    tw.WriteLine(indent + string.Format("Environment.Version:\t{0}", Environment.Version));
    //tw.WriteLine(indent + string.Format("RuntimeEnvironment.GetSystemVersion:\t{0}", System.Runtime.InteropServices.RuntimeEnvironment.GetSystemVersion())); // Same Environment.Version
    tw.WriteLine(indent + string.Format("RuntimeEnvironment.GetRuntimeDirectory:\t{0}", System.Runtime.InteropServices.RuntimeEnvironment.GetRuntimeDirectory()));
#if (NET47 || NET462 || NET461 || NET46 || NET452 || NET451 || NET45 || NET40 || NET35 || NET20) || (NETSTANDARD1_0)
#else
    tw.WriteLine(indent + string.Format("RuntimeInformation.FrameworkDescription:\t{0}", System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription));
#endif
    tw.WriteLine(indent + string.Format("BitConverter.IsLittleEndian:\t{0}", BitConverter.IsLittleEndian));
    tw.WriteLine(indent + string.Format("IntPtr.Size:\t{0}", IntPtr.Size));
    tw.WriteLine(indent + string.Format("Vector.IsHardwareAccelerated:\t{0}", Vector.IsHardwareAccelerated));
    tw.WriteLine(indent + string.Format("Vector<byte>.Count:\t{0}\t# {1}bit", Vector<byte>.Count, Vector<byte>.Count * sizeof(byte) * 8));
    //tw.WriteLine(indent + string.Format("Vector<float>.Count:\t{0}\t# {1}bit", Vector<float>.Count, Vector<float>.Count * sizeof(float) * 8));
    //tw.WriteLine(indent + string.Format("Vector<double>.Count:\t{0}\t# {1}bit", Vector<double>.Count, Vector<double>.Count * sizeof(double) * 8));
    Assembly assembly;
    //assembly = typeof(Vector4).GetTypeInfo().Assembly;
    //tw.WriteLine(string.Format("Vector4.Assembly:\t{0}", assembly));
    //tw.WriteLine(string.Format("Vector4.Assembly.CodeBase:\t{0}", assembly.CodeBase));
    assembly = typeof(Vector<float>).GetTypeInfo().Assembly;
    tw.WriteLine(indent + string.Format("Vector<T>.Assembly.CodeBase:\t{0}", assembly.CodeBase));

    OutputIntrinsics(tw, indent);
}

/// <summary>
/// Output Intrinsics.
/// </summary>
/// <param name="tw">Output <see cref="TextWriter"/>.</param>
/// <param name="indent">The indent.</param>
public static void OutputIntrinsics(TextWriter tw, string indent) {
    if (null == tw) return;
    if (null == indent) indent = "";
#if NETCOREAPP3_0_OR_GREATER
    tw.WriteLine();
    tw.WriteLine(indent + "[Intrinsics.X86]");
    WriteLineFormat(tw, indent, "Aes.IsSupported:\t{0}", System.Runtime.Intrinsics.X86.Aes.IsSupported);
    WriteLineFormat(tw, indent, "Aes.X64.IsSupported:\t{0}", System.Runtime.Intrinsics.X86.Aes.X64.IsSupported);
    WriteLineFormat(tw, indent, "Avx.IsSupported:\t{0}", Avx.IsSupported);
    WriteLineFormat(tw, indent, "Avx.X64.IsSupported:\t{0}", Avx.X64.IsSupported);
    WriteLineFormat(tw, indent, "Avx2.IsSupported:\t{0}", Avx2.IsSupported);
    WriteLineFormat(tw, indent, "Avx2.X64.IsSupported:\t{0}", Avx2.X64.IsSupported);
#if NET6_0_OR_GREATER
    WriteLineFormat(tw, indent, "AvxVnni.IsSupported:\t{0}", AvxVnni.IsSupported);
    WriteLineFormat(tw, indent, "AvxVnni.X64.IsSupported:\t{0}", AvxVnni.X64.IsSupported);
#endif
    WriteLineFormat(tw, indent, "Bmi1.IsSupported:\t{0}", Bmi1.IsSupported);
    WriteLineFormat(tw, indent, "Bmi1.X64.IsSupported:\t{0}", Bmi1.X64.IsSupported);
    WriteLineFormat(tw, indent, "Bmi2.IsSupported:\t{0}", Bmi2.IsSupported);
    WriteLineFormat(tw, indent, "Bmi2.X64.IsSupported:\t{0}", Bmi2.X64.IsSupported);
    WriteLineFormat(tw, indent, "Fma.IsSupported:\t{0}", Fma.IsSupported);
    WriteLineFormat(tw, indent, "Fma.X64.IsSupported:\t{0}", Fma.X64.IsSupported);
    WriteLineFormat(tw, indent, "Lzcnt.IsSupported:\t{0}", Lzcnt.IsSupported);
    WriteLineFormat(tw, indent, "Lzcnt.X64.IsSupported:\t{0}", Lzcnt.X64.IsSupported);
    WriteLineFormat(tw, indent, "Pclmulqdq.IsSupported:\t{0}", Pclmulqdq.IsSupported);
    WriteLineFormat(tw, indent, "Pclmulqdq.X64.IsSupported:\t{0}", Pclmulqdq.X64.IsSupported);
    WriteLineFormat(tw, indent, "Popcnt.IsSupported:\t{0}", Popcnt.IsSupported);
    WriteLineFormat(tw, indent, "Popcnt.X64.IsSupported:\t{0}", Popcnt.X64.IsSupported);
    WriteLineFormat(tw, indent, "Sse.IsSupported:\t{0}", Sse.IsSupported);
    WriteLineFormat(tw, indent, "Sse.X64.IsSupported:\t{0}", Sse.X64.IsSupported);
    WriteLineFormat(tw, indent, "Sse2.IsSupported:\t{0}", Sse2.IsSupported);
    WriteLineFormat(tw, indent, "Sse2.X64.IsSupported:\t{0}", Sse2.X64.IsSupported);
    WriteLineFormat(tw, indent, "Sse3.IsSupported:\t{0}", Sse3.IsSupported);
    WriteLineFormat(tw, indent, "Sse3.X64.IsSupported:\t{0}", Sse3.X64.IsSupported);
    WriteLineFormat(tw, indent, "Sse41.IsSupported:\t{0}", Sse41.IsSupported);
    WriteLineFormat(tw, indent, "Sse41.X64.IsSupported:\t{0}", Sse41.X64.IsSupported);
    WriteLineFormat(tw, indent, "Sse42.IsSupported:\t{0}", Sse42.IsSupported);
    WriteLineFormat(tw, indent, "Sse42.X64.IsSupported:\t{0}", Sse42.X64.IsSupported);
    WriteLineFormat(tw, indent, "Ssse3.IsSupported:\t{0}", Ssse3.IsSupported);
    WriteLineFormat(tw, indent, "Ssse3.X64.IsSupported:\t{0}", Ssse3.X64.IsSupported);
#if NET5_0_OR_GREATER
    WriteLineFormat(tw, indent, "X86Base.IsSupported:\t{0}", X86Base.IsSupported);
    WriteLineFormat(tw, indent, "X86Base.X64.IsSupported:\t{0}", X86Base.X64.IsSupported);
#endif // NET5_0_OR_GREATER
#if NET7_0_OR_GREATER
    WriteLineFormat(tw, indent, "X86Serialize.IsSupported:\t{0}", X86Serialize.IsSupported);
    WriteLineFormat(tw, indent, "X86Serialize.X64.IsSupported:\t{0}", X86Serialize.X64.IsSupported);
#endif // NET7_0_OR_GREATER
#endif // NETCOREAPP3_0_OR_GREATER

#if NET5_0_OR_GREATER
    tw.WriteLine();
    tw.WriteLine(indent + "[Intrinsics.Arm]");
    WriteLineFormat(tw, indent, "AdvSimd.IsSupported:\t{0}", AdvSimd.IsSupported);
    WriteLineFormat(tw, indent, "AdvSimd.Arm64.IsSupported:\t{0}", AdvSimd.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Aes.IsSupported:\t{0}", System.Runtime.Intrinsics.Arm.Aes.IsSupported);
    WriteLineFormat(tw, indent, "Aes.Arm64.IsSupported:\t{0}", System.Runtime.Intrinsics.Arm.Aes.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "ArmBase.IsSupported:\t{0}", ArmBase.IsSupported);
    WriteLineFormat(tw, indent, "ArmBase.Arm64.IsSupported:\t{0}", ArmBase.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Crc32.IsSupported:\t{0}", Crc32.IsSupported);
    WriteLineFormat(tw, indent, "Crc32.Arm64.IsSupported:\t{0}", Crc32.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Dp.IsSupported:\t{0}", Dp.IsSupported);
    WriteLineFormat(tw, indent, "Dp.Arm64.IsSupported:\t{0}", Dp.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Rdm.IsSupported:\t{0}", Rdm.IsSupported);
    WriteLineFormat(tw, indent, "Rdm.Arm64.IsSupported:\t{0}", Rdm.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Sha1.IsSupported:\t{0}", Sha1.IsSupported);
    WriteLineFormat(tw, indent, "Sha1.Arm64.IsSupported:\t{0}", Sha1.Arm64.IsSupported);
    WriteLineFormat(tw, indent, "Sha256.IsSupported:\t{0}", Sha256.IsSupported);
    WriteLineFormat(tw, indent, "Sha256.Arm64.IsSupported:\t{0}", Sha256.Arm64.IsSupported);
#endif // NET5_0_OR_GREATER
}

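Because the vector types are closely tied to intrinsic functions (intrinsics), this function also prints which intrinsics classes are supported.
During development I noticed that new .NET releases keep adding more intrinsics. For example, .NET 5.0 added a large number of Arm intrinsics, as well as X86Base.
Conditional compilation makes it possible to safely use only the classes that the current .NET version provides.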
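The guard pattern can be reduced to a small sketch like the one below. The helper name DescribeSimd and the strings it returns are invented for illustration. The preprocessor symbols are the same ones used in OutputIntrinsics, and the sketch assumes an SDK that defines the `*_OR_GREATER` symbols.

```csharp
using System;
#if NETCOREAPP3_0_OR_GREATER
using System.Runtime.Intrinsics.X86;
#endif
#if NET5_0_OR_GREATER
using System.Runtime.Intrinsics.Arm;
#endif

// Hypothetical illustration class; not part of VectorClassDemo.
static class IntrinsicsGuardDemo {
    // Compile-time guards decide which classes may be referenced at all;
    // run-time IsSupported checks decide which ones the CPU can execute.
    public static string DescribeSimd() {
#if NET5_0_OR_GREATER
        if (AdvSimd.IsSupported) return "Arm AdvSimd";
#endif
#if NETCOREAPP3_0_OR_GREATER
        if (Avx2.IsSupported) return "x86 AVX2";
        if (Sse2.IsSupported) return "x86 SSE2";
#endif
        return "no supported intrinsics class found";
    }

    static void Main() {
        Console.WriteLine(DescribeSimd());
    }
}
```

On a target framework older than .NET Core 3.0 both guarded regions compile away, so the program still builds and simply reports the fallback string.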

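2.3 Creating test data (CreateVectorUseRotate)

The Vector<T> constructors can only create a value with a single number repeated in every element, or take the numbers one by one from an array (or Span). The former is too rigid and the latter too tedious, because the length of Vector<T> differs between processors.
Currently Vector<T> is 256 bits wide on machines that support the Avx2 instruction set and 128 bits wide otherwise. For example, a 128-bit Vector<T> holds 4 Single values and a 256-bit Vector<T> holds 8. Future versions of Vector<T> may well be 512 bits or wider.
For testing, a repeating sequence of numbers is often all we need: "a,b,c,d" at 128 bits and "a,b,c,d,a,b,c,d" at 256 bits.
So I wrote a function that cycles through a limited list of values to fill every vector element. It takes its values through a params parameter, which makes it very convenient to call. The code is below.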

/// <summary>
/// Create Vector&lt;T&gt; use rotate.
/// </summary>
/// <typeparam name="T">Vector type.</typeparam>
/// <param name="list">Source value list.</param>
/// <returns>Returns Vector&lt;T&gt;.</returns>
static Vector<T> CreateVectorUseRotate<T>(params T[] list) where T : struct {
    if (null == list || list.Length <= 0) return Vector<T>.Zero;
    T[] arr = new T[Vector<T>.Count];
    int idx = 0;
    for (int i = 0; i < arr.Length; ++i) {
        arr[i] = list[idx];
        ++idx;
        if (idx >= list.Length) idx = 0;
    }
    Vector<T> rt = new Vector<T>(arr);
    return rt;
}
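A short usage sketch follows. It repeats the function in compact form (using a modulo index instead of a separate counter) so that the block compiles on its own. The RotateDemo class and the sample values are made up for illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration class; not part of VectorClassDemo.
static class RotateDemo {
    // Same idea as CreateVectorUseRotate: cycle through the list until
    // every element of the vector has been filled.
    public static Vector<T> CreateVectorUseRotate<T>(params T[] list) where T : struct {
        if (null == list || list.Length <= 0) return Vector<T>.Zero;
        T[] arr = new T[Vector<T>.Count];
        for (int i = 0; i < arr.Length; ++i) {
            arr[i] = list[i % list.Length];
        }
        return new Vector<T>(arr);
    }

    static void Main() {
        // Three source values are repeated until the vector is full.
        Vector<int> v = CreateVectorUseRotate(10, 20, 30);
        Console.WriteLine("Vector<int>.Count = {0}", Vector<int>.Count);
        Console.WriteLine(v);
    }
}
```

With a 128-bit Vector<int> the elements are 10, 20, 30, 10. With a 256-bit one they are 10, 20, 30, 10, 20, 30, 10, 20.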

2.4 Starting the tests (Run)

With CreateVectorUseRotate building the test data, the skeleton of the test program is easy to set up. The code is below:

public static void Run(TextWriter tw, string indent) {
    RunType(tw, indent, CreateVectorUseRotate(float.MinValue, float.PositiveInfinity, float.NaN, -1.2f, 0f, 1f, 2f, 4f), new Vector<float>(2.0f));
    RunType(tw, indent, CreateVectorUseRotate(double.MinValue, double.PositiveInfinity, -1.2, 0), new Vector<double>(2.0));
    RunType(tw, indent, CreateVectorUseRotate<sbyte>(sbyte.MinValue, sbyte.MaxValue, -1, 0, 1, 2, 3, 4), new Vector<sbyte>(2));
    RunType(tw, indent, CreateVectorUseRotate<short>(short.MinValue, short.MaxValue, -1, 0, 1, 2, 3, 4, 127, 128), new Vector<short>(2));
    RunType(tw, indent, CreateVectorUseRotate<int>(int.MinValue, int.MaxValue, -1, 0, 1, 2, 3, 32768), new Vector<int>(2));
    RunType(tw, indent, CreateVectorUseRotate<long>(long.MinValue, long.MaxValue, -1, 0, 1, 2, 3), new Vector<long>(2));
    RunType(tw, indent, CreateVectorUseRotate<byte>(byte.MinValue, byte.MaxValue, 0, 1, 2, 3, 4), new Vector<byte>(2));
    RunType(tw, indent, CreateVectorUseRotate<ushort>(ushort.MinValue, ushort.MaxValue, 0, 1, 2, 3, 4, 255, 256), new Vector<ushort>(2));
    RunType(tw, indent, CreateVectorUseRotate<uint>(uint.MinValue, uint.MaxValue, 0, 1, 2, 3, 65536), new Vector<uint>(2));
    RunType(tw, indent, CreateVectorUseRotate<ulong>(ulong.MinValue, ulong.MaxValue, 0, 1, 2, 3), new Vector<ulong>(2));
}

2.5 Testing a specific type (RunType)

RunType is a generic function that can test each numeric type in turn. The main code is below.

/// <summary>
/// Run type demo.
/// </summary>
/// <typeparam name="T">Vector type.</typeparam>
/// <param name="tw">Output <see cref="TextWriter"/>.</param>
/// <param name="indent">The indent.</param>
/// <param name="srcT">Source temp value.</param>
/// <param name="src2">Source 2.</param>
static void RunType<T>(TextWriter tw, string indent, Vector<T> srcT, Vector<T> src2) where T : struct {
    Vector<T> src0 = Vector<T>.Zero;
    Vector<T> src1 = Vector<T>.One;
    Vector<T> srcAllOnes = ~Vector<T>.Zero;
    int elementBitSize = (Vector<byte>.Count / Vector<T>.Count) * 8;
    tw.WriteLine(indent + string.Format("-- {0}, Vector<{0}>.Count={1} --", typeof(T).Name, Vector<T>.Count));
    WriteLineFormat(tw, indent, "srcT:\t{0}", srcT);
    //WriteLineFormat(tw, indent, "src2:\t{0}", src2);
    WriteLineFormat(tw, indent, "srcAllOnes:\t{0}", srcAllOnes);

    // -- Methods --
    #region Methods
    //Abs<T>(Vector<T>) Returns a new vector whose elements are the absolute values of the given vector's elements.
    WriteLineFormat(tw, indent, "Abs(srcT):\t{0}", Vector.Abs(srcT));
    WriteLineFormat(tw, indent, "Abs(srcAllOnes):\t{0}", Vector.Abs(srcAllOnes));

    //Add<T>(Vector<T>, Vector<T>) Returns a new vector whose values are the sum of each pair of elements from two given vectors.
    WriteLineFormat(tw, indent, "Add(srcT, src1):\t{0}", Vector.Add(srcT, src1));
    WriteLineFormat(tw, indent, "Add(srcT, src2):\t{0}", Vector.Add(srcT, src2));

    //AndNot<T>(Vector<T>, Vector<T>) Returns a new vector by performing a bitwise And Not operation on each pair of corresponding elements in two vectors.
    WriteLineFormat(tw, indent, "AndNot(srcT, src1):\t{0}", Vector.AndNot(srcT, src1));
    WriteLineFormat(tw, indent, "AndNot(srcT, src2):\t{0}", Vector.AndNot(srcT, src2));

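The parameter list has 2 test vectors, srcT and src2.
The top of the method defines some commonly used vector values: src0 (every element is 0), src1 (every element is 1) and srcAllOnes (every bit is 1). It then prints srcT and srcAllOnes so that the results can be checked by mental arithmetic.

After that, each method of the static class Vector is tested in turn.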

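2.5.1 Non-generic methods

Most methods of the static class Vector are generic, which makes them easy to call from a generic method such as RunType.
Some of its methods are not generic, however, and instead provide one overload per element type. These are more awkward to call and need branches based on typeof. The code is below.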

//ConvertToDouble(Vector<Int64>) Converts a Vector<Int64> to a Vector<Double>.
//ConvertToDouble(Vector<UInt64>) Converts a Vector<UInt64> to a Vector<Double>.
//ConvertToInt32(Vector<Single>) Converts a Vector<Single> to a Vector<Int32>.
//ConvertToInt64(Vector<Double>) Converts a Vector<Double> to a Vector<Int64>.
//ConvertToSingle(Vector<Int32>) Converts a Vector<Int32> to a Vector<Single>.
//ConvertToSingle(Vector<UInt32>) Converts a Vector<UInt32> to a Vector<Single>.
//ConvertToUInt32(Vector<Single>) Converts a Vector<Single> to a Vector<UInt32>.
//ConvertToUInt64(Vector<Double>) Converts a Vector<Double> to a Vector<UInt64>.
if (typeof(T) == typeof(Double)) {
    WriteLineFormat(tw, indent, "ConvertToInt64(srcT):\t{0}", Vector.ConvertToInt64(Vector.AsVectorDouble(srcT)));
    WriteLineFormat(tw, indent, "ConvertToUInt64(srcT):\t{0}", Vector.ConvertToUInt64(Vector.AsVectorDouble(srcT)));
} else if (typeof(T) == typeof(Single)) {
    WriteLineFormat(tw, indent, "ConvertToInt32(srcT):\t{0}", Vector.ConvertToInt32(Vector.AsVectorSingle(srcT)));
    WriteLineFormat(tw, indent, "ConvertToUInt32(srcT):\t{0}", Vector.ConvertToUInt32(Vector.AsVectorSingle(srcT)));
} else if (typeof(T) == typeof(Int32)) {
    WriteLineFormat(tw, indent, "ConvertToSingle(srcT):\t{0}", Vector.ConvertToSingle(Vector.AsVectorInt32(srcT)));
} else if (typeof(T) == typeof(UInt32)) {
    WriteLineFormat(tw, indent, "ConvertToSingle(srcT):\t{0}", Vector.ConvertToSingle(Vector.AsVectorUInt32(srcT)));
} else if (typeof(T) == typeof(Int64)) {
    WriteLineFormat(tw, indent, "ConvertToDouble(srcT):\t{0}", Vector.ConvertToDouble(Vector.AsVectorInt64(srcT)));
} else if (typeof(T) == typeof(UInt64)) {
    WriteLineFormat(tw, indent, "ConvertToDouble(srcT):\t{0}", Vector.ConvertToDouble(Vector.AsVectorUInt64(srcT)));
}

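2.5.2 Testing control values

Some methods take a control parameter, such as ShiftLeft, which shifts each element left. For these it is best to write a loop that tests several control values. The code is below.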

#if NET7_0_OR_GREATER
//ShiftLeft(Vector<Byte>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<Int16>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<Int32>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<Int64>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<IntPtr>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<SByte>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<UInt16>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<UInt32>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<UInt64>, Int32) Shifts each element of a vector left by the specified amount.
//ShiftLeft(Vector<UIntPtr>, Int32) Shifts each element of a vector left by the specified amount.
int[] shiftCounts = new int[] { 1, elementBitSize - 1, elementBitSize, elementBitSize + 1, -1 };
foreach (int shiftCount in shiftCounts) {
    if (typeof(T) == typeof(Byte)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorByte(srcT), shiftCount));
    } else if (typeof(T) == typeof(Int16)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorInt16(srcT), shiftCount));
    } else if (typeof(T) == typeof(Int32)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorInt32(srcT), shiftCount));
    } else if (typeof(T) == typeof(Int64)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorInt64(srcT), shiftCount));
    } else if (typeof(T) == typeof(IntPtr)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorNInt(srcT), shiftCount));
    } else if (typeof(T) == typeof(SByte)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorSByte(srcT), shiftCount));
    } else if (typeof(T) == typeof(UInt16)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorUInt16(srcT), shiftCount));
    } else if (typeof(T) == typeof(UInt32)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorUInt32(srcT), shiftCount));
    } else if (typeof(T) == typeof(UInt64)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorUInt64(srcT), shiftCount));
    } else if (typeof(T) == typeof(UIntPtr)) {
        WriteLineFormat(tw, indent, "ShiftLeft(srcT, " + shiftCount + "):\t{0}", Vector.ShiftLeft(Vector.AsVectorNUInt(srcT), shiftCount));
    }
}
#endif // NET7_0_OR_GREATER
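The control values elementBitSize, elementBitSize + 1 and -1 probe what happens when the shift count is out of range. As a scalar point of comparison (not taken from the demo project), the C# `<<` operator on a 64-bit operand uses only the low 6 bits of the count. The sketch below prints the scalar results for the same five counts. The UInt64 vector output in section 3 is consistent with the same masking, but the sketch itself only demonstrates the scalar rule.

```csharp
using System;

// Hypothetical illustration class; not part of VectorClassDemo.
static class ShiftCountDemo {
    // Scalar shift: for ulong, C# masks the count with 63 (count & 0x3F).
    public static ulong ShiftLeft64(ulong value, int count) {
        return value << count;
    }

    static void Main() {
        int[] counts = { 1, 63, 64, 65, -1 };
        foreach (int c in counts) {
            Console.WriteLine("1UL << {0} = 0x{1:X16} (effective count {2})",
                c, ShiftLeft64(1UL, c), c & 63);
        }
    }
}
```

A count of 64 therefore behaves like 0, 65 like 1, and -1 like 63.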

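2.5.3 out parameters

Some methods return several values through out parameters, such as Widen, which widens the data. An "if block" can be used to limit the scope of the differently typed variables. The code is below.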

//Widen(Vector<Byte>, Vector<UInt16>, Vector<UInt16>) Widens a Vector<Byte> into two Vector<UInt16> instances.
//Widen(Vector<Int16>, Vector<Int32>, Vector<Int32>) Widens a Vector<Int16> into two Vector<Int32> instances.
//Widen(Vector<Int32>, Vector<Int64>, Vector<Int64>) Widens a Vector<Int32> into two Vector<Int64> instances.
//Widen(Vector<SByte>, Vector<Int16>, Vector<Int16>) Widens a Vector<SByte> into two Vector<Int16> instances.
//Widen(Vector<Single>, Vector<Double>, Vector<Double>) Widens a Vector<Single> into two Vector<Double> instances.
//Widen(Vector<UInt16>, Vector<UInt32>, Vector<UInt32>) Widens a Vector<UInt16> into two Vector<UInt32> instances.
//Widen(Vector<UInt32>, Vector<UInt64>, Vector<UInt64>) Widens a Vector<UInt32> into two Vector<UInt64> instances.
if (typeof(T) == typeof(Single)) {
    Vector<Double> low, high;
    Vector.Widen(Vector.AsVectorSingle(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(SByte)) {
    Vector<Int16> low, high;
    Vector.Widen(Vector.AsVectorSByte(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(Int16)) {
    Vector<Int32> low, high;
    Vector.Widen(Vector.AsVectorInt16(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(Int32)) {
    Vector<Int64> low, high;
    Vector.Widen(Vector.AsVectorInt32(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(Byte)) {
    Vector<UInt16> low, high;
    Vector.Widen(Vector.AsVectorByte(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(UInt16)) {
    Vector<UInt32> low, high;
    Vector.Widen(Vector.AsVectorUInt16(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
} else if (typeof(T) == typeof(UInt32)) {
    Vector<UInt64> low, high;
    Vector.Widen(Vector.AsVectorUInt32(srcT), out low, out high);
    WriteLineFormat(tw, indent, "Widen(srcT).low:\t{0}", low);
    WriteLineFormat(tw, indent, "Widen(srcT).high:\t{0}", high);
}
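For a single, non-generic case, the out-parameter pattern looks like the sketch below. The WidenDemo class and its input data are invented for illustration. It widens a Vector<byte> holding 0, 1, 2, ... so that the split between low and high is easy to see.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration class; not part of VectorClassDemo.
static class WidenDemo {
    // Widen returns two vectors: low holds the first half of the source
    // elements, high holds the second half, each widened to UInt16.
    public static void WidenBytes(Vector<byte> src, out Vector<ushort> low, out Vector<ushort> high) {
        Vector.Widen(src, out low, out high);
    }

    static void Main() {
        byte[] data = new byte[Vector<byte>.Count];
        for (int i = 0; i < data.Length; ++i) data[i] = (byte)i; // 0, 1, 2, ...
        Vector<ushort> low, high;
        WidenBytes(new Vector<byte>(data), out low, out high);
        Console.WriteLine("low:\t{0}", low);
        Console.WriteLine("high:\t{0}", high);
    }
}
```

Each result has Vector<byte>.Count / 2 elements, so low starts at 0 and high starts at Vector<ushort>.Count.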

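2.6 Formatted output (WriteLineFormat)

The readonly struct Vector<T> supports ToString, which prints the numeric value of each element. But in many cases (for example, when a function such as AndNot performs a bitwise operation) we need to inspect the binary data, so the values should be shown in hexadecimal, and Vector<T> does not support the hexadecimal format (X).
So I wrote an overload specifically for Vector<T> that prints its hexadecimal value.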

/// <summary>
/// Get hex string.
/// </summary>
/// <typeparam name="T">Vector value type.</typeparam>
/// <param name="src">Source value.</param>
/// <param name="separator">The separator.</param>
/// <param name="noFixEndian">No fix endian.</param>
/// <returns>Returns hex string.</returns>
private static string GetHex<T>(Vector<T> src, string separator, bool noFixEndian) where T : struct {
    Vector<byte> list = Vector.AsVectorByte(src);
    int unitCount = Vector<T>.Count;
    int unitSize = Vector<byte>.Count / unitCount;
    bool fixEndian = false;
    if (!noFixEndian && BitConverter.IsLittleEndian) fixEndian = true;
    StringBuilder sb = new StringBuilder();
    if (fixEndian) {
        // IsLittleEndian.
        for (int i=0; i < unitCount; ++i) {
            if ((i > 0)) {
                if (!string.IsNullOrEmpty(separator)) {
                    sb.Append(separator);
                }
            }
            int idx = unitSize * (i+1) - 1;
            for(int j = 0; j < unitSize; ++j) {
                byte by = list[idx];
                --idx;
                sb.Append(by.ToString("X2"));
            }
        }
    } else {
        for (int i = 0; i < Vector<byte>.Count; ++i) {
            byte by = list[i];
            if ((i > 0) && (0 == i % unitSize)) {
                if (!string.IsNullOrEmpty(separator)) {
                    sb.Append(separator);
                }
            }
            sb.Append(by.ToString("X2"));
        }
    }
    return sb.ToString();
}

/// <summary>
/// WriteLine with format.
/// </summary>
/// <typeparam name="T">Vector value type.</typeparam>
/// <param name="tw">The TextWriter.</param>
/// <param name="indent">The indent.</param>
/// <param name="format">The format.</param>
/// <param name="src">Source value</param>
private static void WriteLineFormat<T>(TextWriter tw, string indent, string format, Vector<T> src) where T : struct {
    if (null == tw) return;
    string line = indent + string.Format(format, src);
    string hex = GetHex(src, " ", false);
    line += "\t# (" + hex +")";
    tw.WriteLine(line);
}
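The byte reversal that GetHex performs on little-endian machines can be seen on a single scalar. The sketch below is an illustration only, with an invented class and method name. It formats one float so that the most significant byte comes first, which is the form used in the hexadecimal comments of the output.

```csharp
using System;
using System.Text;

// Hypothetical illustration class; not part of VectorClassDemo.
static class EndianHexDemo {
    // Returns the IEEE 754 bit pattern of a float as hex digits,
    // most significant byte first, regardless of machine endianness.
    public static string ToHexMostSignificantFirst(float value) {
        byte[] bytes = BitConverter.GetBytes(value);
        // On little-endian machines the least significant byte is stored first.
        if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
        StringBuilder sb = new StringBuilder();
        foreach (byte b in bytes) sb.Append(b.ToString("X2"));
        return sb.ToString();
    }

    static void Main() {
        Console.WriteLine(ToHexMostSignificantFirst(1.0f)); // 3F800000
        Console.WriteLine(ToHexMostSignificantFirst(2.0f)); // 40000000
    }
}
```

These are the same patterns that appear for the elements 1 and 2 in the Single output of section 3.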

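3. Results

Because the Vector class provides so many vector methods, multiplied by 10 primitive types, the program's output is very long, at more than 90 KB.
To keep the article from getting too long, only the main parts of the output are excerpted here.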

VectorClassDemo50

IsRelease:	False
EnvironmentVariable(PROCESSOR_IDENTIFIER):	Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
Environment.ProcessorCount:	8
Environment.Is64BitOperatingSystem:	True
Environment.Is64BitProcess:	True
Environment.OSVersion:	Microsoft Windows NT 10.0.19044.0
Environment.Version:	7.0.0
RuntimeEnvironment.GetRuntimeDirectory:	C:\Program Files\dotnet\shared\Microsoft.NETCore.App\7.0.0\
RuntimeInformation.FrameworkDescription:	.NET 7.0.0
BitConverter.IsLittleEndian:	True
IntPtr.Size:	8
Vector.IsHardwareAccelerated:	True
Vector<byte>.Count:	32	# 256bit
Vector<T>.Assembly.CodeBase:	file:///C:/Program Files/dotnet/shared/Microsoft.NETCore.App/7.0.0/System.Private.CoreLib.dll

[Intrinsics.X86]
Aes.IsSupported:	True
Aes.X64.IsSupported:	True
Avx.IsSupported:	True
Avx.X64.IsSupported:	True
Avx2.IsSupported:	True
Avx2.X64.IsSupported:	True
AvxVnni.IsSupported:	False
AvxVnni.X64.IsSupported:	False
Bmi1.IsSupported:	True
Bmi1.X64.IsSupported:	True
Bmi2.IsSupported:	True
Bmi2.X64.IsSupported:	True
Fma.IsSupported:	True
Fma.X64.IsSupported:	True
Lzcnt.IsSupported:	True
Lzcnt.X64.IsSupported:	True
Pclmulqdq.IsSupported:	True
Pclmulqdq.X64.IsSupported:	True
Popcnt.IsSupported:	True
Popcnt.X64.IsSupported:	True
Sse.IsSupported:	True
Sse.X64.IsSupported:	True
Sse2.IsSupported:	True
Sse2.X64.IsSupported:	True
Sse3.IsSupported:	True
Sse3.X64.IsSupported:	True
Sse41.IsSupported:	True
Sse41.X64.IsSupported:	True
Sse42.IsSupported:	True
Sse42.X64.IsSupported:	True
Ssse3.IsSupported:	True
Ssse3.X64.IsSupported:	True
X86Base.IsSupported:	True
X86Base.X64.IsSupported:	True
X86Serialize.IsSupported:	False
X86Serialize.X64.IsSupported:	False

[Intrinsics.Arm]
AdvSimd.IsSupported:	False
AdvSimd.Arm64.IsSupported:	False
Aes.IsSupported:	False
Aes.Arm64.IsSupported:	False
ArmBase.IsSupported:	False
ArmBase.Arm64.IsSupported:	False
Crc32.IsSupported:	False
Crc32.Arm64.IsSupported:	False
Dp.IsSupported:	False
Dp.Arm64.IsSupported:	False
Rdm.IsSupported:	False
Rdm.Arm64.IsSupported:	False
Sha1.IsSupported:	False
Sha1.Arm64.IsSupported:	False
Sha256.IsSupported:	False
Sha256.Arm64.IsSupported:	False

-- Single, Vector<Single>.Count=8 --
srcT:	<-3.4028235E+38, ∞, NaN, -1.2, 0, 1, 2, 4>	# (FF7FFFFF 7F800000 FFC00000 BF99999A 00000000 3F800000 40000000 40800000)
srcAllOnes:	<NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN>	# (FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF)
Abs(srcT):	<3.4028235E+38, ∞, NaN, 1.2, 0, 1, 2, 4>	# (7F7FFFFF 7F800000 7FC00000 3F99999A 00000000 3F800000 40000000 40800000)
Abs(srcAllOnes):	<NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN>	# (7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF 7FFFFFFF)
Add(srcT, src1):	<-3.4028235E+38, ∞, NaN, -0.20000005, 1, 2, 3, 5>	# (FF7FFFFF 7F800000 FFC00000 BE4CCCD0 3F800000 40000000 40400000 40A00000)
Add(srcT, src2):	<-3.4028235E+38, ∞, NaN, 0.79999995, 2, 3, 4, 6>	# (FF7FFFFF 7F800000 FFC00000 3F4CCCCC 40000000 40400000 40800000 40C00000)
AndNot(srcT, src1):	<-3.9999998, 2, -3, -2.350989E-39, 0, 0, 2, 2>	# (C07FFFFF 40000000 C0400000 8019999A 00000000 00000000 40000000 40000000)
AndNot(srcT, src2):	<-0.99999994, 1, -1.5, -1.2, 0, 1, 0, 1.1754944E-38>	# (BF7FFFFF 3F800000 BFC00000 BF99999A 00000000 3F800000 00000000 00800000)
BitwiseAnd(srcT, src1):	<0.5, 1, 1, 1, 0, 1, 0, 1.1754944E-38>	# (3F000000 3F800000 3F800000 3F800000 00000000 3F800000 00000000 00800000)
BitwiseAnd(srcT, src2):	<2, 2, 2, 0, 0, 0, 2, 2>	# (40000000 40000000 40000000 00000000 00000000 00000000 40000000 40000000)
BitwiseOr(srcT, src1):	<NaN, ∞, NaN, -1.2, 1, 1, ∞, ∞>	# (FFFFFFFF 7F800000 FFC00000 BF99999A 3F800000 3F800000 7F800000 7F800000)
BitwiseOr(srcT, src2):	<-3.4028235E+38, ∞, NaN, NaN, 2, ∞, 2, 4>	# (FF7FFFFF 7F800000 FFC00000 FF99999A 40000000 7F800000 40000000 40800000)
...
Widen(srcT).low:	<-3.4028234663852886E+38, ∞, NaN, -1.2000000476837158>	# (C7EFFFFFE0000000 7FF0000000000000 FFF8000000000000 BFF3333340000000)
Widen(srcT).high:	<0, 1, 2, 4>	# (0000000000000000 3FF0000000000000 4000000000000000 4010000000000000)
...

-- Double, Vector<Double>.Count=4 --
srcT:	<-1.7976931348623157E+308, ∞, -1.2, 0>	# (FFEFFFFFFFFFFFFF 7FF0000000000000 BFF3333333333333 0000000000000000)
srcAllOnes:	<NaN, NaN, NaN, NaN>	# (FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF)
Abs(srcT):	<1.7976931348623157E+308, ∞, 1.2, 0>	# (7FEFFFFFFFFFFFFF 7FF0000000000000 3FF3333333333333 0000000000000000)
Abs(srcAllOnes):	<NaN, NaN, NaN, NaN>	# (7FFFFFFFFFFFFFFF 7FFFFFFFFFFFFFFF 7FFFFFFFFFFFFFFF 7FFFFFFFFFFFFFFF)
Add(srcT, src1):	<-1.7976931348623157E+308, ∞, -0.19999999999999996, 1>	# (FFEFFFFFFFFFFFFF 7FF0000000000000 BFC9999999999998 3FF0000000000000)
Add(srcT, src2):	<-1.7976931348623157E+308, ∞, 0.8, 2>	# (FFEFFFFFFFFFFFFF 7FF0000000000000 3FE999999999999A 4000000000000000)
AndNot(srcT, src1):	<-3.9999999999999996, 2, -4.4501477170144E-309, 0>	# (C00FFFFFFFFFFFFF 4000000000000000 8003333333333333 0000000000000000)
AndNot(srcT, src2):	<-0.9999999999999999, 1, -1.2, 0>	# (BFEFFFFFFFFFFFFF 3FF0000000000000 BFF3333333333333 0000000000000000)
BitwiseAnd(srcT, src1):	<0.5, 1, 1, 0>	# (3FE0000000000000 3FF0000000000000 3FF0000000000000 0000000000000000)
BitwiseAnd(srcT, src2):	<2, 2, 0, 0>	# (4000000000000000 4000000000000000 0000000000000000 0000000000000000)
BitwiseOr(srcT, src1):	<NaN, ∞, -1.2, 1>	# (FFFFFFFFFFFFFFFF 7FF0000000000000 BFF3333333333333 3FF0000000000000)
BitwiseOr(srcT, src2):	<-1.7976931348623157E+308, ∞, NaN, 2>	# (FFEFFFFFFFFFFFFF 7FF0000000000000 FFF3333333333333 4000000000000000)
...

-- UInt64, Vector<UInt64>.Count=4 --
srcT:	<0, 18446744073709551615, 0, 1>	# (0000000000000000 FFFFFFFFFFFFFFFF 0000000000000000 0000000000000001)
srcAllOnes:	<18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615>	# (FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF)
Abs(srcT):	<0, 18446744073709551615, 0, 1>	# (0000000000000000 FFFFFFFFFFFFFFFF 0000000000000000 0000000000000001)
Abs(srcAllOnes):	<18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615>	# (FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF)
Add(srcT, src1):	<1, 0, 1, 2>	# (0000000000000001 0000000000000000 0000000000000001 0000000000000002)
Add(srcT, src2):	<2, 1, 2, 3>	# (0000000000000002 0000000000000001 0000000000000002 0000000000000003)
AndNot(srcT, src1):	<0, 18446744073709551614, 0, 0>	# (0000000000000000 FFFFFFFFFFFFFFFE 0000000000000000 0000000000000000)
AndNot(srcT, src2):	<0, 18446744073709551613, 0, 1>	# (0000000000000000 FFFFFFFFFFFFFFFD 0000000000000000 0000000000000001)
BitwiseAnd(srcT, src1):	<0, 1, 0, 1>	# (0000000000000000 0000000000000001 0000000000000000 0000000000000001)
BitwiseAnd(srcT, src2):	<0, 2, 0, 0>	# (0000000000000000 0000000000000002 0000000000000000 0000000000000000)
BitwiseOr(srcT, src1):	<1, 18446744073709551615, 1, 1>	# (0000000000000001 FFFFFFFFFFFFFFFF 0000000000000001 0000000000000001)
BitwiseOr(srcT, src2):	<2, 18446744073709551615, 2, 3>	# (0000000000000002 FFFFFFFFFFFFFFFF 0000000000000002 0000000000000003)
...
ShiftLeft(srcT, 1):	<0, 18446744073709551614, 0, 2>	# (0000000000000000 FFFFFFFFFFFFFFFE 0000000000000000 0000000000000002)
ShiftLeft(srcT, 63):	<0, 9223372036854775808, 0, 9223372036854775808>	# (0000000000000000 8000000000000000 0000000000000000 8000000000000000)
ShiftLeft(srcT, 64):	<0, 18446744073709551615, 0, 1>	# (0000000000000000 FFFFFFFFFFFFFFFF 0000000000000000 0000000000000001)
ShiftLeft(srcT, 65):	<0, 18446744073709551614, 0, 2>	# (0000000000000000 FFFFFFFFFFFFFFFE 0000000000000000 0000000000000002)
ShiftLeft(srcT, -1):	<0, 9223372036854775808, 0, 9223372036854775808>	# (0000000000000000 8000000000000000 0000000000000000 8000000000000000)

To see the complete test results, run the program yourself.
Source code:
https://github.com/zyl910/BenchmarkVector/tree/main/VectorClassDemo