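Writing a Typedoc-to-C# code generator by hand in F#, to make it easy for any C# project to wrap a TypeScript project.
Preface
We often run into this situation: we find a library written in TypeScript but want to call it from C#. To do that we have to translate the original TypeScript type declarations into C# code. If the library is a UI component, we also have to host it inside a WebView and use JavaScript/C# interop to call the component's methods, handle its events, and so on.
But this is grunt work, especially the type translation step.
I needed exactly this recently while helping maintain the open-source UWP project monaco-editor-uwp, which wraps Microsoft's monaco editor as a UWP control.
However, its monaco.d.ts is a full 1.5 MB and the API changes frequently. Translating it by hand is not only a lot of work, it also makes it easy to miss new changes; an automatic generator would cut the manual work down considerably.
There is a project on GitHub called QuickType, but its TypeScript support is extremely limited: it is still stuck on TypeScript 3.2, and it fails with an error on any type it does not recognize, such as DOM types.
So I decided to write a code generator by hand, TypedocConverter: https://github.com/hez2010/TypedocConverter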
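Design
My original plan was to start from lexical and semantic analysis of TypeScript, but it turns out that a project called Typedoc already does this step for us and can output the result as JSON (referred to as the JSON schema below). That makes the rest simple: we only need to convert the TypeScript AST into a C# AST and then turn that AST back into code.
Without further ado, let's start writing.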
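Building the TypeScript AST type bindings
Thanks to F#'s more powerful type system, declaring and using types is very simple, with solid support for recursive patterns, pattern matching, option types and so on. That is why this project uses F# rather than C#: C# supports these too and has some FP capability, but it still leans toward OOP and requires a lot of tedious boilerplate.
We define the TypeScript type bindings in Definition.fs; this step is a direct translation of Typedoc's definitions into F#.
First comes the ReflectionKind enum, which represents the kind of each node in the JSON schema: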
    type ReflectionKind =
        | Global = 0
        | ExternalModule = 1
        | Module = 2
        | Enum = 4
        | EnumMember = 16
        | Variable = 32
        | Function = 64
        | Class = 128
        | Interface = 256
        | Constructor = 512
        | Property = 1024
        | Method = 2048
        | CallSignature = 4096
        | IndexSignature = 8192
        | ConstructorSignature = 16384
        | Parameter = 32768
        | TypeLiteral = 65536
        | TypeParameter = 131072
        | Accessor = 262144
        | GetSignature = 524288
        | SetSignature = 1048576
        | ObjectLiteral = 2097152
        | TypeAlias = 4194304
        | Event = 8388608
        | Reference = 16777216
Next is the modifier flags record ReflectionFlags. Note that every member of this record is an option:
    type ReflectionFlags = {
        IsPrivate: bool option
        IsProtected: bool option
        IsPublic: bool option
        IsStatic: bool option
        IsExported: bool option
        IsExternal: bool option
        IsOptional: bool option
        IsReset: bool option
        HasExportAssignment: bool option
        IsConstructorProperty: bool option
        IsAbstract: bool option
        IsConst: bool option
        IsLet: bool option
    }
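Then we come to Reflection. Since every kind of Reflection can be told apart by its ReflectionKind, I chose to merge all kinds into a single record instead of using union types; the latter look cleaner but require a lot of pattern-matching code when actually parsing the AST.
Because some of the records reference each other, we use the and keyword to define recursive records.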
    type Reflection = {
        Id: int
        Name: string
        OriginalName: string
        Kind: ReflectionKind
        KindString: string option
        Flags: ReflectionFlags
        Parent: Reflection option
        Comment: Comment option
        Sources: SourceReference list option
        Decorators: Decorator option
        Decorates: Type list option
        Url: string option
        Anchor: string option
        HasOwnDocument: bool option
        CssClasses: string option
        DefaultValue: string option
        Type: Type option
        TypeParameter: Reflection list option
        Signatures: Reflection list option
        IndexSignature: Reflection list option
        GetSignature: Reflection list option
        SetSignature: Reflection list option
        Overwrites: Type option
        InheritedFrom: Type option
        ImplementationOf: Type option
        ExtendedTypes: Type list option
        ExtendedBy: Type list option
        ImplementedTypes: Type list option
        ImplementedBy: Type list option
        TypeHierarchy: DeclarationHierarchy option
        Children: Reflection list option
        Groups: ReflectionGroup list option
        Categories: ReflectionCategory list option
        Reflections: Map<int, Reflection> option
        Directory: SourceDirectory option
        Files: SourceFile list option
        Readme: string option
        PackageInfo: obj option
        Parameters: Reflection list option
    }
    and DeclarationHierarchy = {
        Type: Type list
        Next: DeclarationHierarchy option
        IsTarget: bool option
    }
    and Type = {
        Type: string
        Id: int option
        Name: string option
        ElementType: Type option
        Value: string option
        Types: Type list option
        TypeArguments: Type list option
        Constraint: Type option
        Declaration: Reflection option
    }
    and Decorator = {
        Name: string
        Type: Type option
        Arguments: obj option
    }
    and ReflectionGroup = {
        Title: string
        Kind: ReflectionKind
        Children: int list
        CssClasses: string option
        AllChildrenHaveOwnDocument: bool option
        AllChildrenAreInherited: bool option
        AllChildrenArePrivate: bool option
        AllChildrenAreProtectedOrPrivate: bool option
        AllChildrenAreExternal: bool option
        SomeChildrenAreExported: bool option
        Categories: ReflectionCategory list option
    }
    and ReflectionCategory = {
        Title: string
        Children: int list
        AllChildrenHaveOwnDocument: bool option
    }
    and SourceDirectory = {
        Parent: SourceDirectory option
        Directories: Map<string, SourceDirectory>
        Groups: ReflectionGroup list option
        Files: SourceFile list
        Name: string option
        DirName: string option
        Url: string option
    }
    and SourceFile = {
        FullFileName: string
        FileName: string
        Name: string
        Url: string option
        Parent: SourceDirectory option
        Reflections: Reflection list option
        Groups: ReflectionGroup list option
    }
    and SourceReference = {
        File: SourceFile option
        FileName: string
        Line: int
        Character: int
        Url: string option
    }
    and Comment = {
        ShortText: string
        Text: string option
        Returns: string option
        Tags: CommentTag list option
    }
    and CommentTag = {
        TagName: string
        ParentName: string
        Text: string
    }
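As a small standalone illustration of recursive record groups, here is a minimal sketch. The Node and Edge types are made up for this example and are not part of TypedocConverter; the point is only that two records that refer to each other have to be declared in one group joined by and.

```fsharp
// Hypothetical types for illustration only; not part of TypedocConverter.
type Node =
    { Name: string
      Edges: Edge list }      // refers to Edge, which is declared below
and Edge =
    { Label: string
      Target: Node option }   // refers back to Node

let leaf = { Name = "leaf"; Edges = [] }
let root = { Name = "root"; Edges = [ { Label = "child"; Target = Some leaf } ] }

// Walk one level down and collect the names of the target nodes.
let childNames =
    root.Edges
    |> List.choose (fun e -> e.Target)
    |> List.map (fun n -> n.Name)

printfn "%A" childNames
```

Declaring Edge in a separate type definition after Node would not compile, because Node would mention a type that does not exist yet.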
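With that, the translation of the type bindings is done. What remains is to deserialize the JSON generated by Typedoc into these types.
Deserialization
Although everything seems to be going smoothly, neither System.Text.Json nor Newtonsoft.Json supports F# option types, so we also need a JsonConverter that handles option types.
This project uses Newtonsoft.Json, because System.Text.Json is not yet mature. Thanks to F#'s compatibility with OOP, implementing an OptionConverter is easy.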
    open System
    open Microsoft.FSharp.Reflection
    open Newtonsoft.Json

    // Note: Type below refers to System.Type, not the Type record from Definition.fs.
    type OptionConverter() =
        inherit JsonConverter()

        // Handle every closed generic type built from option<_>.
        override __.CanConvert(objectType: Type) : bool =
            match objectType.IsGenericType with
            | false -> false
            | true -> typedefof<_ option> = objectType.GetGenericTypeDefinition()

        // None is written as null, Some x is written as x.
        override __.WriteJson(writer: JsonWriter, value: obj, serializer: JsonSerializer) : unit =
            serializer.Serialize(writer,
                if isNull value then null
                else
                    let _, fields = FSharpValue.GetUnionFields(value, value.GetType())
                    fields.[0]
            )

        // null becomes None, anything else is wrapped in Some.
        override __.ReadJson(reader: JsonReader, objectType: Type, _existingValue: obj, serializer: JsonSerializer) : obj =
            let innerType = objectType.GetGenericArguments().[0]
            let value =
                serializer.Deserialize(
                    reader,
                    // Value types are read as Nullable<T> so that a JSON null can be detected.
                    if innerType.IsValueType
                    then (typedefof<_ Nullable>).MakeGenericType([|innerType|])
                    else innerType
                )
            let cases = FSharpType.GetUnionCases objectType
            if isNull value then FSharpValue.MakeUnion(cases.[0], [||])
            else FSharpValue.MakeUnion(cases.[1], [|value|])
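ReadJson relies on option being an ordinary F# union whose cases are None and Some, in that order. The following sketch checks that assumption using only FSharp.Core reflection, without Newtonsoft.Json; the value 42 is just sample data.

```fsharp
open Microsoft.FSharp.Reflection

// Inspect the union cases of int option the same way ReadJson does.
let cases = FSharpType.GetUnionCases typeof<int option>
let caseNames = cases |> Array.map (fun c -> c.Name)

// Build both values reflectively, mirroring the two branches of ReadJson.
let none = FSharpValue.MakeUnion(cases.[0], [||]) |> unbox<int option>
let some = FSharpValue.MakeUnion(cases.[1], [| box 42 |]) |> unbox<int option>

printfn "%A %A %A" caseNames none some
```

This is why index 0 is used for a JSON null and index 1 for any other value.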
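With that, the deserialization work is done.
We can download monaco.d.ts from the monaco-editor repository, generate JSON from it with Typedoc, and test our JSON schema deserializer on the result: the JSON schema is deserialized correctly.
[Figure: deserialization result]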
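Building the C# AST types
Of course, this "AST" is not a real AST. There is no need to go down to statement level, because we are only writing a simple code generator; building the entity structure is enough.
We define the entity structure in Entity.fs. Here we only need to support interface, class and enum, and for class and interface, supporting method, property and event is enough.
The code may also contain generics, which we need to take into account as well.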
    type EntityBodyType = {
        Type: string
        Name: string option
        InnerTypes: EntityBodyType list
    }

    type EntityMethod = {
        Comment: string
        Modifier: string list
        Type: EntityBodyType
        Name: string
        TypeParameter: string list
        Parameter: EntityBodyType list
    }

    type EntityProperty = {
        Comment: string
        Modifier: string list
        Name: string
        Type: EntityBodyType
        WithGet: bool
        WithSet: bool
        IsOptional: bool
        InitialValue: string option
    }

    type EntityEvent = {
        Comment: string
        Modifier: string list
        DelegateType: EntityBodyType
        Name: string
        IsOptional: bool
    }

    type EntityEnum = {
        Comment: string
        Name: string
        Value: int64 option
    }

    type EntityType =
        | Interface
        | Class
        | Enum
        | StringEnum

    type Entity = {
        Namespace: string
        Name: string
        Comment: string
        Methods: EntityMethod list
        Properties: EntityProperty list
        Events: EntityEvent list
        Enums: EntityEnum list
        InheritedFrom: EntityBodyType list
        Type: EntityType
        TypeParameter: string list
        Modifier: string list
    }
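Documentation comment generator
Documentation comments are indispensable too: they make it much easier for developers to use the generated type bindings later, without having to refer to the comments on the original TypeScript declarations.
The code is simple: it only needs to turn the text into XML.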
    // Escape the characters that are special in XML documentation comments.
    let escapeSymbols (text: string) =
        if isNull text then ""
        else
            text
                .Replace("&", "&amp;")
                .Replace("<", "&lt;")
                .Replace(">", "&gt;")

    // Prefix every line of the text with "/// ".
    let toCommentText (text: string) =
        if isNull text then ""
        else
            text.Split "\n"
            |> Array.map (fun t -> "/// " + escapeSymbols t)
            |> Array.reduce(fun accu next -> accu + "\n" + next)

    let getXmlDocComment (comment: Comment) =
        let prefix = "/// <summary>\n"
        let suffix = "\n/// </summary>"
        let summary =
            match comment.Text with
            // The "\n" keeps the long text from being glued onto the last line of the short text.
            | Some text -> prefix + toCommentText comment.ShortText + "\n" + toCommentText text + suffix
            | _ ->
                match comment.ShortText with
                | "" -> ""
                | _ -> prefix + toCommentText comment.ShortText + suffix
        let returns =
            match comment.Returns with
            | Some text -> "\n/// <returns>\n" + toCommentText text + "\n/// </returns>"
            | _ -> ""
        summary + returns
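Type generator
TypeScript's type system is quite flexible and includes union types, intersection types and so on. None of these can be expressed directly even in the current C# 8; they would need language support that has so far only been proposed for future versions. We could of course generate a struct with implicit conversion operator overloads to support union types, but this is not implemented yet, so for now we substitute the first type in the union, and for intersection types we use object for the time being.
Union types have one special case, though: string literal type aliases. They look like this: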
type Size = "XS" | "S" | "M" | "L" | "XL";
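That is, a type alias composed purely of string values. This one we do need to support, because it is used very widely in TypeScript.
How do we support it when C# has no corresponding syntax? Simple: we create an enum containing all the elements of the type and write a JsonConverter for it. This ensures that after serialization the TypeScript side recognizes the type correctly, while the C# side gets a type-safe coding experience.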
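To make the idea concrete, here is a minimal sketch of how the C# enum text for such an alias could be produced. The helpers toIdentifier and renderStringEnum are hypothetical names invented for this example, not functions from the project, and the accompanying JsonConverter that maps each member back to its original string is not shown.

```fsharp
// Hypothetical helper: turn a string literal into a valid C# identifier.
// Characters other than letters, digits and '_' become '_', and a leading digit gets a '_' prefix.
let toIdentifier (literal: string) =
    let cleaned =
        literal
        |> String.map (fun c -> if System.Char.IsLetterOrDigit c || c = '_' then c else '_')
    if cleaned = "" || System.Char.IsDigit cleaned.[0] then "_" + cleaned else cleaned

// Hypothetical helper: emit C# enum source text for a string-literal type alias.
let renderStringEnum (name: string) (literals: string list) =
    let members =
        literals
        |> List.map (fun l -> "    " + toIdentifier l)
        |> String.concat ",\n"
    sprintf "public enum %s\n{\n%s\n}" name members

printfn "%s" (renderStringEnum "Size" [ "XS"; "S"; "M"; "L"; "XL" ])
```

Because literals such as "x-large" cannot be used as identifiers unchanged, the converter has to remember the original string for each member; that mapping is exactly what the JsonConverter is for.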
In addition, we need to provide some common type conversions:
- Array<T> -> T[]
- Set<T> -> System.Collections.Generic.ISet<T>
- Map<K, V> -> System.Collections.Generic.IDictionary<K, V>
- Promise<T> -> System.Threading.Tasks.Task<T>
- callbacks -> System.Func<T...>, System.Action<T...>
- Tuple types
- Other array types such as Uint32Array
- For <void>, the generic argument has to be dropped, i.e. T<void> -> T
The implementation is as follows:
    // printWarning is a logging helper defined elsewhere in the project.
    let rec getType (typeInfo: Type): EntityBodyType =
        let genericType =
            match typeInfo.Type with
            | "intrinsic" ->
                match typeInfo.Name with
                | Some name ->
                    match name with
                    | "number" -> { Type = "double"; InnerTypes = []; Name = None }
                    | "boolean" -> { Type = "bool"; InnerTypes = []; Name = None }
                    | "string" -> { Type = "string"; InnerTypes = []; Name = None }
                    | "void" -> { Type = "void"; InnerTypes = []; Name = None }
                    | _ -> { Type = "object"; InnerTypes = []; Name = None }
                | _ -> { Type = "object"; InnerTypes = []; Name = None }
            | "reference" | "typeParameter" ->
                match typeInfo.Name with
                | Some name ->
                    match name with
                    | "Promise" -> { Type = "System.Threading.Tasks.Task"; InnerTypes = []; Name = None }
                    | "Set" -> { Type = "System.Collections.Generic.ISet"; InnerTypes = []; Name = None }
                    | "Map" -> { Type = "System.Collections.Generic.IDictionary"; InnerTypes = []; Name = None }
                    | "Array" -> { Type = "System.Array"; InnerTypes = []; Name = None }
                    | "BigUint64Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "ulong"; InnerTypes = []; Name = None }]; Name = None }
                    | "Uint32Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "uint"; InnerTypes = []; Name = None }]; Name = None }
                    | "Uint16Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "ushort"; InnerTypes = []; Name = None }]; Name = None }
                    | "Uint8Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "byte"; InnerTypes = []; Name = None }]; Name = None }
                    | "BigInt64Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "long"; InnerTypes = []; Name = None }]; Name = None }
                    | "Int32Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "int"; InnerTypes = []; Name = None }]; Name = None }
                    | "Int16Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "short"; InnerTypes = []; Name = None }]; Name = None }
                    // Int8Array holds signed 8-bit integers, which is sbyte in C#.
                    | "Int8Array" -> { Type = "System.Array"; InnerTypes = [{ Type = "sbyte"; InnerTypes = []; Name = None }]; Name = None }
                    | "RegExp" -> { Type = "string"; InnerTypes = []; Name = None }
                    | x -> { Type = x; InnerTypes = []; Name = None }
                | _ -> { Type = "object"; InnerTypes = []; Name = None }
            | "array" ->
                match typeInfo.ElementType with
                | Some elementType -> { Type = "System.Array"; InnerTypes = [getType elementType]; Name = None }
                | _ -> { Type = "System.Array"; InnerTypes = [{ Type = "object"; InnerTypes = []; Name = None }]; Name = None }
            | "stringLiteral" -> { Type = "string"; InnerTypes = []; Name = None }
            | "tuple" ->
                match typeInfo.Types with
                | Some innerTypes ->
                    match innerTypes with
                    | [] -> { Type = "object"; InnerTypes = []; Name = None }
                    | _ -> { Type = "System.ValueTuple"; InnerTypes = innerTypes |> List.map getType; Name = None }
                | _ -> { Type = "object"; InnerTypes = []; Name = None }
            | "union" ->
                match typeInfo.Types with
                | Some innerTypes ->
                    match innerTypes with
                    | [] -> { Type = "object"; InnerTypes = []; Name = None }
                    | _ ->
                        printWarning ("Taking only the first type " + innerTypes.[0].Type + " for the entire union type.")
                        getType innerTypes.[0] // TODO: generate unions
                | _ -> { Type = "object"; InnerTypes = []; Name = None }
            | "intersection" -> { Type = "object"; InnerTypes = []; Name = None } // TODO: generate intersections
            | "reflection" ->
                // A callback type: map its single signature to System.Action or System.Func.
                match typeInfo.Declaration with
                | Some dec ->
                    match dec.Signatures with
                    | Some [signature] ->
                        let paras =
                            match signature.Parameters with
                            | Some p ->
                                p
                                |> List.map (fun pi ->
                                    match pi.Type with
                                    | Some pt -> Some (getType pt)
                                    | _ -> None)
                                |> List.collect (fun x ->
                                    match x with
                                    | Some s -> [s]
                                    | _ -> [])
                            | _ -> []
                        let rec getDelegateParas (paras: EntityBodyType list): EntityBodyType list =
                            match paras with
                            | [x] -> [{ Type = x.Type; InnerTypes = x.InnerTypes; Name = None }]
                            | (front::tails) -> [front] @ getDelegateParas tails
                            | _ -> []
                        let returnsType =
                            match signature.Type with
                            | Some t -> getType t
                            | _ -> { Type = "void"; InnerTypes = []; Name = None }
                        let typeParas = getDelegateParas paras
                        match typeParas with
                        | [] -> { Type = "System.Action"; InnerTypes = []; Name = None }
                        | _ ->
                            if returnsType.Type = "void"
                            then { Type = "System.Action"; InnerTypes = typeParas; Name = None }
                            else { Type = "System.Func"; InnerTypes = typeParas @ [returnsType]; Name = None }
                    | _ -> { Type = "object"; InnerTypes = []; Name = None }
                | _ -> { Type = "object"; InnerTypes = []; Name = None }
            | _ -> { Type = "object"; InnerTypes = []; Name = None }
        let mutable innerTypes =
            match typeInfo.TypeArguments with
            | Some args -> getGenericTypeArguments args
            | _ -> []
        // Promise<void> becomes the non-generic Task.
        if genericType.Type = "System.Threading.Tasks.Task"
        then
            match innerTypes with
            | (front::_) -> if front.Type = "void" then innerTypes <- [] else ()
            | _ -> ()
        else ()
        {
            Type = genericType.Type
            Name = None
            InnerTypes = if innerTypes = [] then genericType.InnerTypes else innerTypes
        }
    and getGenericTypeArguments (typeInfos: Type list): EntityBodyType list =
        typeInfos |> List.map getType
    and getGenericTypeParameters (nodes: Reflection list) = // TODO: generate constraints
        let types =
            nodes
            |> List.where(fun x -> x.Kind = ReflectionKind.TypeParameter)
            |> List.map (fun x -> x.Name)
        types |> List.map (fun x -> {| Type = x; Constraint = "" |})
Generating generic constraints is not supported yet; I will consider adding it when I have time.
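To show what the EntityBodyType tree is meant to express, here is a minimal sketch of a renderer that turns it into C# type text. renderType and simple are hypothetical helpers written for this example; the project's real printer may work differently. The record is redeclared locally so the snippet stands alone.

```fsharp
// Local copy of the record from Entity.fs so the sketch is self-contained.
type EntityBodyType =
    { Type: string
      Name: string option
      InnerTypes: EntityBodyType list }

// Hypothetical renderer: System.Array with one argument becomes T[],
// any other type with arguments becomes Name<Arg1, Arg2, ...>.
let rec renderType (t: EntityBodyType) : string =
    match t.Type, t.InnerTypes with
    | "System.Array", [ element ] -> renderType element + "[]"
    | "System.Array", _ -> "object[]"
    | name, [] -> name
    | name, inner ->
        let args = inner |> List.map renderType |> String.concat ", "
        name + "<" + args + ">"

let simple name = { Type = name; Name = None; InnerTypes = [] }

// The shape getType would produce for Promise<number[]>.
let task =
    { Type = "System.Threading.Tasks.Task"
      Name = None
      InnerTypes = [ { Type = "System.Array"; Name = None; InnerTypes = [ simple "double" ] } ] }

printfn "%s" (renderType task) // prints System.Threading.Tasks.Task<double[]>
```

Keeping the type as a small tree instead of a preformatted string is what allows special cases such as arrays and the Promise<void> unwrapping to be handled in one place.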
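Modifier generator
Modifiers are keywords such as public, private, protected, static and so on. This step is simple: just convert ReflectionFlags directly. Personally I find that mutable code makes things inelegant, but sometimes it is still needed, otherwise the code becomes much more complex.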
    let getModifier (flags: ReflectionFlags) =
        // `modifier |> List.append [ x ]` puts x in front of the existing list,
        // so keywords added later end up earlier in the result.
        let mutable modifier = []
        match flags.IsPublic with
        | Some flag -> if flag then modifier <- modifier |> List.append [ "public" ] else ()
        | _ -> ()
        match flags.IsAbstract with
        | Some flag -> if flag then modifier <- modifier |> List.append [ "abstract" ] else ()
        | _ -> ()
        match flags.IsPrivate with
        | Some flag -> if flag then modifier <- modifier |> List.append [ "private" ] else ()
        | _ -> ()
        match flags.IsProtected with
        | Some flag -> if flag then modifier <- modifier |> List.append [ "protected" ] else ()
        | _ -> ()
        match flags.IsStatic with
        | Some flag -> if flag then modifier <- modifier |> List.append [ "static" ] else ()
        | _ -> ()
        modifier
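For comparison, the same mapping can be written without mutation. This is only a sketch of an alternative, not the project's code: Flags is a trimmed-down stand-in for ReflectionFlags, and getModifierImmutable is a name made up for this example.

```fsharp
// Trimmed-down stand-in for ReflectionFlags with only the fields used here.
type Flags =
    { IsPublic: bool option
      IsAbstract: bool option
      IsPrivate: bool option
      IsProtected: bool option
      IsStatic: bool option }

// Pair each flag with its keyword and keep the keywords whose flag is Some true.
let getModifierImmutable (flags: Flags) : string list =
    [ (flags.IsPublic, "public")
      (flags.IsAbstract, "abstract")
      (flags.IsPrivate, "private")
      (flags.IsProtected, "protected")
      (flags.IsStatic, "static") ]
    |> List.choose (fun (flag, keyword) ->
        if flag = Some true then Some keyword else None)

let sample =
    { IsPublic = Some true
      IsAbstract = None
      IsPrivate = Some false
      IsProtected = None
      IsStatic = Some true }

printfn "%A" (getModifierImmutable sample) // prints ["public"; "static"]
```

Note that this version returns the keywords in declaration order, whereas the mutable version prepends each keyword and therefore yields them in reverse.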
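Enum generator
We have finally reached the part that parses entities. Let's start with the simplest one: enums. The code is simple: it just converts the enum part of the original AST.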
    let parseEnum (section: string) (node: Reflection): Entity =
        let values =
            match node.Children with
            | Some children ->
                children
                |> List.where (fun x -> x.Kind = ReflectionKind.EnumMember)
            | None -> []
        {
            Type = EntityType.Enum
            Namespace = if section = "" then "TypeDocGenerator" else section
            Modifier = getModifier node.Flags
            Name = node.Name
            Comment =
                match node.Comment with
                | Some comment -> getXmlDocComment comment
                | _ -> ""
            Methods = []
            Properties = []
            Events = []
            InheritedFrom = []
            Enums =
                values
                |> List.map (fun x ->
                    let comment =
                        match x.Comment with
                        | Some comment -> getXmlDocComment comment
                        | _ -> ""
                    let mutable intValue = 0L
                    match x.DefaultValue with // ?????
                    | Some value ->
                        if Int64.TryParse(value, &intValue) then { Comment = comment; Name = toPascalCase x.Name; Value = Some intValue }
                        else
                            match getEnumReferencedValue values value x.Name with
                            | Some t -> { Comment = comment; Name = x.Name; Value = Some (int64 t) }
                            | _ -> { Comment = comment; Name = x.Name; Value = None }
                    | _ -> { Comment = comment; Name = x.Name; Value = None }
                )
            TypeParameter = []
        }
You will notice that I marked one spot above with ?????. What is going on there?
In TypeScript, an enum is recursive: in its definition, one member can reference another member, like this:
enum MyEnum {
A = 1,
B = 2,
C = A
}
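In that case we need to look up the enum value it references. In the example above, when processing C we need to replace its value A with the actual value 1. So we also need a lookup function, which has to be defined before parseEnum: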
    let rec getEnumReferencedValue (nodes: Reflection list) value name =
        let referenced =
            nodes
            |> List.where (fun x ->
                match x.DefaultValue with
                | Some v -> v <> value && not (name = x.Name)
                | _ -> true)
            |> List.where (fun x -> x.Name = value)
            |> List.tryFind (fun x ->
                let mutable intValue = 0
                match x.DefaultValue with
                | Some y -> Int32.TryParse(y, &intValue)
                | _ -> true)
        match referenced with
        | Some t -> t.DefaultValue
        | _ -> None
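The lookup can be tried out on the MyEnum example with a simplified, self-contained model. Member and resolve are hypothetical names for this sketch and stand in for Reflection and the functions above; like the original, the sketch only follows one level of reference.

```fsharp
// Simplified model: each enum member has a name and the raw initializer text, if any.
type Member = { Name: string; Raw: string option }

let tryParseInt64 (text: string) : int64 option =
    match System.Int64.TryParse(text) with
    | true, n -> Some n
    | _ -> None

// A numeric literal is used directly; otherwise the text is treated as the
// name of another member and that member's numeric value is used instead.
let resolve (members: Member list) (m: Member) : int64 option =
    match m.Raw with
    | None -> None
    | Some raw ->
        match tryParseInt64 raw with
        | Some n -> Some n
        | None ->
            members
            |> List.tryFind (fun other -> other.Name = raw && other.Name <> m.Name)
            |> Option.bind (fun other -> other.Raw)
            |> Option.bind tryParseInt64

let myEnum =
    [ { Name = "A"; Raw = Some "1" }
      { Name = "B"; Raw = Some "2" }
      { Name = "C"; Raw = Some "A" } ]

let resolved = myEnum |> List.map (fun m -> (m.Name, resolve myEnum m))
printfn "%A" resolved
```

Here C resolves to the same value as A, which is what the generated C# enum needs, since the generator writes out numeric values rather than references.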
With that, our enum parser is complete.
Interface and class generator
Now for the main event: interface and class are the heart of the type bindings.
Our function signature looks like this:
let parseInterfaceAndClass (section: string) (node: Reflection) (isInterface: bool): Entity = ...
First we look through the Reflection node and produce the comment, modifiers, name, generic parameters, inheritance, methods, properties and events:
    let comment =
        match node.Comment with
        | Some comment -> getXmlDocComment comment
        | _ -> ""
    // Base types: extended types first, then implemented types.
    let exts =
        (match node.ExtendedTypes with
         | Some types -> types |> List.map(fun x -> getType x)
         | _ -> []) @
        (match node.ImplementedTypes with
         | Some types -> types |> List.map(fun x -> getType x)
         | _ -> [])
    let genericType =
        let types =
            match node.TypeParameter with
            | Some tp -> Some (getGenericTypeParameters tp)
            | _ -> None
        match types with
        | Some result -> result
        | _ -> []
    let properties =
        match node.Children with
        | Some children ->
            if isInterface then
                children
                |> List.where(fun x -> x.Kind = ReflectionKind.Property)
                |> List.where(fun x -> x.InheritedFrom = None) // exclude inherited properties
                |> List.where(fun x -> x.Overwrites = None) // exclude overridden properties
            else children |> List.where(fun x -> x.Kind = ReflectionKind.Property)
        | _ -> []
    let events =
        match node.Children with
        | Some children ->
            if isInterface then
                children
                |> List.where(fun x -> x.Kind = ReflectionKind.Event)
                |> List.where(fun x -> x.InheritedFrom = None) // exclude inherited events
                |> List.where(fun x -> x.Overwrites = None) // exclude overridden events
            else children |> List.where(fun x -> x.Kind = ReflectionKind.Event)
        | _ -> []
    let methods =
        match node.Children with
        | Some children ->
            if isInterface then
                children
                |> List.where(fun x -> x.Kind = ReflectionKind.Method)
                |> List.where(fun x -> x.InheritedFrom = None) // exclude inherited methods
                |> List.where(fun x -> x.Overwrites = None) // exclude overridden methods
            else children |> List.where(fun x -> x.Kind = ReflectionKind.Method)
        | _ -> []
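One thing to note: for an interface, a child interface does not need to repeat the members of its parent interface, so inherited and overridden members are excluded.
Then we simply return a record that represents the node's entity.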
{ Type = if isInterface then EntityType.Interface else EntityTy