GraphQuery - Powerful html/xml query language

来源:https://www.cnblogs.com/gomaster/archive/2018/10/25/9850385.html
-Advertisement-
Play Games

GraphQuery " " " " " " " " " " " " GraphQuery is a query language and execution engine tied to any backend service. It is . Project Address: "GraphQue ...


GraphQuery CircleCI Go Report Card Build Status Coverage Status GoDoc Gitter chat

GraphQuery

GraphQuery is a query language and execution engine tied to any backend service. It is back-end language independent.

Project Address: GraphQuery

Related Projects:

Overview

GraphQuery is an easy to use query language, it has built-in Xpath/CSS/Regex/JSONpath selectors and enough built-in text processing functions.
The most amazing thing is that you can use the minimalist GraphQuery syntax to get any data structure you want.

Language-independent

Use GraphQuery to let you unify text parsing logic on any backend language.
You won't need to find implementations of Xpath/CSS/Regex/JSONpath selectors between different languages ​​and get familiar with their syntax or explore their compatibility.

Multiple selector syntax support

You can use GraphQuery to parse any text and use your skilled selector. GraphQuery currently supports the following selectors:

  1. Jsonpath for parsing JSON strings
  2. Xpath and CSS for parsing XML/HTML
  3. Regular expressions for parsing any text.

You can use these selectors in any combination in GraphQuery. The rich built-in selector provides great flexibility for your parsing.

Complete function

Graphquery has some built-in text processing functions like trim, template, replace. If you think these functions don't meet your needs, you can register new custom functions in the pipeline.

Clear data structure & Concise grammar

With GraphQuery, you won't need to look for parsing libraries when parsing text, nor do you need to write complex nesting and traversal. Simple and clear GraphQuery syntax gives you a clear picture of the data structure.

compare

As you can see from the comparison chart above, the syntax of GraphQuery is so simple that even if you are in touch with it for the first time, you can still understand its meaning and get started quickly.

At the same time, GraphQuery is also very easy to integrate into your backend data system (any backend language), let's continue to look down.

Getting Started

GraphQuery consists of query language and pipelines. To guide you through each of these components, we've written an example designed to illustrate the various pieces of GraphQuery. This example is not comprehensive, but it is designed to quickly introduce the core concepts of GraphQuery. The premise of the example is that we want to use GraphQuery to query for information about library books.

1. First example

<library>
<!-- Great book. -->
<book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <quote>I'd dog paddle the deepest ocean.</quote>
    <author id="CMS">
        <?echo "go rocks"?>
        <name>Charles M Schulz</name>
        <born>1922-11-26</born>
        <dead>2000-02-12</dead>
    </author>
    <character id="PP">
        <name>Peppermint Patty</name>
        <born>1966-08-22</born>
        <qualification>bold, brash and tomboyish</qualification>
    </character>
    <character id="Snoopy">
        <name>Snoopy</name>
        <born>1950-10-04</born>
        <qualification>extroverted beagle</qualification>
    </character>
</book>
</library>

Faced with such a text structure, we naturally think of extracting the following data structure from the text :

{
    bookID
    title
    isbn
    quote
    language
    author{
        name
        born
        dead
    }
    character [{
        name
        born
        qualification
    }]
}

This is perfect, when you know the data structure you want to extract, you have actually succeeded 80%, the above is the data structure we want, we call it DDL (Data Definition Language) for the time being. let's see how GraphQuery does it:

{
    bookID `css("book");attr("id")`
    title `css("title")`
    isbn `xpath("//isbn")`
    quote `css("quote")`
    language `css("title");attr("lang")`
    author `css("author")` {
        name `css("name")`
        born `css("born")`
        dead `css("dead")`
    }
    character `xpath("//character")` [{
        name `css("name")`
        born `css("born")`
        qualification `xpath("qualification")`
    }]
}

As you can see, the syntax of GraphQuery adds some strings wrapped in ` to the DDL. These strings wrapped by ` are called Pipeline. We will introduce Pipeline later.
Let's first take a look at what data GraphQuery engine returns to us.

{
    "bookID": "b0836217462",
    "title": "Being a Dog Is a Full-Time Job",
    "isbn": "0836217462",
    "quote": "I'd dog paddle the deepest ocean.",
    "language": "en",
    "author": {
        "born": "1922-11-26",
        "dead": "2000-02-12",
        "name": "Charles M Schulz"
    },
    "character": [
        {
            "born": "1966-08-22",
            "name": "Peppermint Patty",
            "qualification": "bold, brash and tomboyish"
        },
        {
            "born": "1950-10-04",
            "name": "Snoopy",
            "qualification": "extroverted beagle"
        }
    ],
}

Wow, it's wonderful. Just like what we want.
We call the above example Example1, now let's have a brief look at what pipeline is.

2. Pipeline

A pipeline is a collection of functions that use the parent element text as an entry parameter to execute the functions in the collection in sequence.
For example, the language field in our previous example is defined as follows:

language `css("title");attr("lang")`

The language is the field name, css("title"); attr("lang") is the pipeline. In this pipeline, GraphQuery first uses the CSS selector to find the title node from the document, and the title node will be obtained. Pass the obtained node into the attr() function and get its lang attribute. The whole process is as follows:

language: document->css("title")->attr("lang")->en

In Example1, we not only use the css and attr functions, but also xpath(). It is easy to associate, Xpath() is to select elements with the Xpath selector.
The following is a list of the pipeline functions built into the current version of graphquery:

pipeline prototype example introduce
css css(CSSSelector) css("title") Use CSS selector to select elements
json json(JSONSelector) json("title") Use json path to select elements
xpath xpath(XpathSelector) xpath("//title") Use Xpath selector to select elements
regex regex(RegexSelector) regex("") Use Regex selector to select elements
trim trim() trim() Clear spaces and line breaks before and after the string
template template(TemplateStr) template("[{$}]") Add characters before and after variables
attr attr(AttributeName) attr("lang") Extract the property of the current node
eq eq(Index) eq("0") Take the nth element in the current node collection
string string() string() Extract the current node native string
text text() text() Extract the text of the current node
link link(KeyName) link("title") Returns the current text of the specified key
replace replace(A, B) replace("a", "b") Replace all A in the current node to B

More detailed introduction to pipeline and function, please go to docs.

Install

GraphQuery is currently only native to Golang, but for other languages, it can be invoked as a service.

1. Golang:

go get github.com/storyicon/graphquery

Create a new go file :

package main

import (
    "encoding/json"
    "log"

    "github.com/storyicon/graphquery"
)

func main() {
    document := `
        <html>
            <body>
                <a href="01.html">Page 1</a>
                <a href="02.html">Page 2</a>
                <a href="03.html">Page 3</a>
            </body>
        </html>
    `
    expr := "{ anchor `css(\"a\")` [ content `text()` ] }"
    response := graphquery.ParseFromString(document, expr)
    bytes, _ := json.Marshal(response.Data)
    log.Println(string(bytes))
}

Run the go file, the output is as follows :

{"anchor":["Page 1","Page 2","Page 3"]}

2. Other language

We use the HTTP protocol to provide a cross-language solution for developers to query GraphQuery using any back-end language you want to use to access the specified port after starting the service.

GraphQuery-http : Cross language solution for GraphQuery

You can also use RPC for communication, but currently you may need to do this yourself, because the RPC project on GraphQuery is still under development.
At the same time, We welcome the contributors to write native support code for other languages ​​in GraphQuery.


您的分享是我們最大的動力!

-Advertisement-
Play Games
更多相關文章
  • 定義 ECMAScript規範為所有函數都包含兩個方法(這兩個方法非繼承而來), call 和 apply 。這兩個函數都是在特定的作用域中調用函數,能改變函數的作用域,實際上是改變函數體內 this 的值 。 用法 我們看到通過方法本身的call 和 apply 執行了該函數。 我們改變了函數運行 ...
  • JavaScript ES6 數組新方法 學習隨筆 新建數組 includes 方法 includes 查找數組有無該參數 有返回true map方法 map 遍歷處理返回新數組 原數組不會改變 reduce方法 reduce 遍歷處理數組返回結果 prev與next中間的符號以及順序控制處理方式 ...
  • 第四篇打算作為系列最後一篇,這裡嘗試談談緩存的一些併發交互場景,包括與資料庫(特指 RDBMS)交互,和一些獨立的高併發場景相關補充處理方案(若涉及具體應用同樣將主要以Redis舉例)。 另見:分散式系統之緩存的微觀應用經驗談(三)(數據分片和集群篇) (https://yq.aliyun.com/... ...
  • 單體架構 一個歸檔包(例如war格式)包含所有功能的應用程式,通常稱為單體應用。 如圖: 儘管該應用已經使用了MVC分層與模塊化,但是由於所有部件最終都打包在一個war包中,該war包包含了整個系統所有的業務功能,這樣的應用系統稱為單體應用。 單體架構的缺陷: 1.複雜度高:隨著代碼的增多,會導致業 ...
  • 設計模式:觀察者模式 當一個對象的狀態發生改變時,依賴他的對象會全部收到通知,並自動更新。 使用場景 一個事件發生後,要執行一連串更新操作。傳統的編程方式,就是在事件的代碼之後直接加入處理邏輯,當更新得邏輯增多之後,代碼會變得難以維護,這種方式是耦合的,侵入式的,增加新的邏輯需要改變事件主題的代碼。 ...
  • 緩存穿透 概念 訪問一個不存在的key,緩存不起作用,請求會穿透到DB,流量大時DB會掛掉。 解決方案 1. 採用布隆過濾器,使用一個足夠大的bitmap,用於存儲可能訪問的key,不存在的key直接被過濾; 2. 訪問key未在DB查詢到值,也將空值寫進緩存,但可以設置較短過期時間。 緩存雪崩 概 ...
  • 下麵總結並演示了 Redis 的 常用管理命令、key 操作、字元串、集合、列表、散列類型的操作命令。 你需要掌握的 Redis 知識 "史上最全 Redis 高可用解決方案總結" "為什麼分散式一定要有Redis?" "Spring Boot Redis Cluster 實戰乾貨" "Spring ...
  • 系統介紹: 1.系統採用主流的 SSM 框架 jsp JSTL bootstrap html5 (PC瀏覽器使用) 2.springmvc +spring4.3.7+ mybaits3.3 SSM 普通java web(非maven, 附贈pom.xml文件) 資料庫:mysql 3.開發工具:my ...
一周排行
    -Advertisement-
    Play Games
  • 移動開發(一):使用.NET MAUI開發第一個安卓APP 對於工作多年的C#程式員來說,近來想嘗試開發一款安卓APP,考慮了很久最終選擇使用.NET MAUI這個微軟官方的框架來嘗試體驗開發安卓APP,畢竟是使用Visual Studio開發工具,使用起來也比較的順手,結合微軟官方的教程進行了安卓 ...
  • 前言 QuestPDF 是一個開源 .NET 庫,用於生成 PDF 文檔。使用了C# Fluent API方式可簡化開發、減少錯誤並提高工作效率。利用它可以輕鬆生成 PDF 報告、發票、導出文件等。 項目介紹 QuestPDF 是一個革命性的開源 .NET 庫,它徹底改變了我們生成 PDF 文檔的方 ...
  • 項目地址 項目後端地址: https://github.com/ZyPLJ/ZYTteeHole 項目前端頁面地址: ZyPLJ/TreeHoleVue (github.com) https://github.com/ZyPLJ/TreeHoleVue 目前項目測試訪問地址: http://tree ...
  • 話不多說,直接開乾 一.下載 1.官方鏈接下載: https://www.microsoft.com/zh-cn/sql-server/sql-server-downloads 2.在下載目錄中找到下麵這個小的安裝包 SQL2022-SSEI-Dev.exe,運行開始下載SQL server; 二. ...
  • 前言 隨著物聯網(IoT)技術的迅猛發展,MQTT(消息隊列遙測傳輸)協議憑藉其輕量級和高效性,已成為眾多物聯網應用的首選通信標準。 MQTTnet 作為一個高性能的 .NET 開源庫,為 .NET 平臺上的 MQTT 客戶端與伺服器開發提供了強大的支持。 本文將全面介紹 MQTTnet 的核心功能 ...
  • Serilog支持多種接收器用於日誌存儲,增強器用於添加屬性,LogContext管理動態屬性,支持多種輸出格式包括純文本、JSON及ExpressionTemplate。還提供了自定義格式化選項,適用於不同需求。 ...
  • 目錄簡介獲取 HTML 文檔解析 HTML 文檔測試參考文章 簡介 動態內容網站使用 JavaScript 腳本動態檢索和渲染數據,爬取信息時需要模擬瀏覽器行為,否則獲取到的源碼基本是空的。 本文使用的爬取步驟如下: 使用 Selenium 獲取渲染後的 HTML 文檔 使用 HtmlAgility ...
  • 1.前言 什麼是熱更新 游戲或者軟體更新時,無需重新下載客戶端進行安裝,而是在應用程式啟動的情況下,在內部進行資源或者代碼更新 Unity目前常用熱更新解決方案 HybridCLR,Xlua,ILRuntime等 Unity目前常用資源管理解決方案 AssetBundles,Addressable, ...
  • 本文章主要是在C# ASP.NET Core Web API框架實現向手機發送驗證碼簡訊功能。這裡我選擇是一個互億無線簡訊驗證碼平臺,其實像阿裡雲,騰訊雲上面也可以。 首先我們先去 互億無線 https://www.ihuyi.com/api/sms.html 去註冊一個賬號 註冊完成賬號後,它會送 ...
  • 通過以下方式可以高效,並保證數據同步的可靠性 1.API設計 使用RESTful設計,確保API端點明確,並使用適當的HTTP方法(如POST用於創建,PUT用於更新)。 設計清晰的請求和響應模型,以確保客戶端能夠理解預期格式。 2.數據驗證 在伺服器端進行嚴格的數據驗證,確保接收到的數據符合預期格 ...