Oracle 12.1.0.2 New Feature翻譯學習【In-Memory column store記憶體列存儲】【原創】

翻譯沒有追求信達雅，不是為了學英語翻譯，是為了快速瞭解新特性，如有語義理解錯誤可以指正。歡迎加微信12735770或QQ12735770探討oracle技術問題:) In-Memory Column Store記憶體列存儲 Starting in Oracle Database 12c Release ...

翻譯沒有追求信達雅，不是為了學英語翻譯，是為了快速瞭解新特性，如有語義理解錯誤可以指正。歡迎加微信12735770或QQ12735770探討oracle技術問題:)

In-Memory Column Store記憶體列存儲

Starting in Oracle Database 12c Release 1 (12.1.0.2), the In-Memory Column Store (IM column store) is an optional, static SGA pool that stores copies of tables and partitions in a special columnar format optimized for rapid scans.

從12.1.0.2開始，有一個可選項叫In-Memory column store（IM column store），可使靜態SGA存儲為了快速掃描優化過的指定列格式（column format）的表和分區的複製。

The IM column store does not replace the buffer cache, but acts as a supplement so that both memory areas can store the same data in different formats. By default, only objects specified as INMEMORY using DDL are candidates to be populated in the IM column store.

記憶體列存儲不是為了替代Buffer Cache，而是作為補充可以讓所有記憶體區域存儲同樣的數據用不同的格式。預設的，需要用DDL對指明的對象用INMEMEORY選項才會加入隊列等待存入記憶體列存儲中。

It is not necessary for objects populated in the IM column store to be loaded into the database buffer cache.

完全不需要把記憶體中的填充對象載入到buffer cache中。

柱狀格式只存在於記憶體中。下圖展示了SH模式中存儲在記憶體列存儲中的三個表：customers,products,sales。記憶體列存儲使用列存儲數據而不是行。資料庫保證柱狀格式的數據與buffer cache中的數據是一致的，只是存儲格式不同而已。

Benefits of the IM Column Store

使用記憶體列存儲的好處

The IM column store enables the database to perform scans, joins, and aggregates much faster than when it uses the on-disk format exclusively.

使用記憶體列存儲可以使資料庫的掃描scans、連接joins、聚合aggregates比用on-disk format快很多。

In particular, the IM column store is useful for:

特別是記憶體列存儲用在：

l Performing fast full scans of large tables

l 執行大表快速掃描；

l Evaluating predicate filters that use operators such as =, <, >, and IN

l 評估用<>=和in操作的謂語過濾；

l Querying a subset of columns in a table, for example, selecting 5 of 100 columns

l 查詢表中列的子集，如：select 5of 100 columns.

l Accelerating joins by converting predicates on small dimension tables into filters on a large fact table

l 通過轉換小的維度表的謂語到一個大的事實表的過濾來加速join。

Business applications, ad-hoc analytic queries, and data warehouse workloads benefit most. Pure OLTP databases that perform short transactions using index lookups benefit less.

商業應用，即席分析查詢，以及數據倉庫負載收益最多。完全的OLTP資料庫，使用索引查詢短事物，收益較小。

The IM column store also provides the following advantages:

記憶體列存儲優勢：

l 1.All existing database features are supported, including High Availability features (see "Overview of High Availability").

l 支持現有所有資料庫特性，包括高可用。

l No application changes are required.

l 不需要更改應用。

l 2.The optimizer automatically takes advantage of the columnar format.

l 優化器自動利用柱狀格式。

l 3.Configuration is simple.

l 配置簡單。

l The INMEMORY_SIZE initialization parameter specifies the amount of memory reserved for use by the IM column store. DDL statements specify the tablespaces, tables, partitions, or columns to be read into the IM column store.

l 只需要對INMEMORY_SIZE初始化參數指定為使用IM column store保留的記憶體大小。然後DDL語句指定表空間、表、分區或者需要被讀入IM column store的列就可以。

l 4.Compression is optimized for query performance.

l 為查詢性能優化了壓縮。

l These compression techniques increase the effective memory bandwidth by enabling sessions to read more data into memory.

l 這些壓縮技術提高了記憶體帶寬的使用效率，使會話可以讀入更多數據到記憶體中。

l 5.Fewer indexes, materialized views, and OLAP cubes are required.

l 更少的需要索引、物化視圖和OLAP多維數據集。

l The reduction in the number of pre-built objects results in reduced storage space and significantly less processing overhead.

l 預創建對象數量的減少使存儲空間降低並且顯著地減少處理開銷。

DUAL Memory Format：columnar and row

雙記憶體模式：柱模式和行模式

When fetching data, Oracle Database can read either the IM column store or the database buffer cache, or both within the same query.

取數據的時候，oracle既可以讀取記憶體列存儲的，也可以讀取buffer cacho中的，或者同時用他們查詢同樣的數據。

The database transparently sends OLTP queries (such as primary key lookups) to the buffer cache, and analytic and reporting queries to the IM column store. Thus, dual memory formats provide the best of both worlds.

資料庫透明地發送OLTP查詢（如主鍵查詢）到buffer cache中，並且也分析、報告基於記憶體列存儲的查詢。因此，雙記憶體格式提供比兩個單獨的（IM column store/buffer cache）方式更好的方式。

In the execution plan, the operation TABLE ACCESS IN MEMORY FULL indicates use of the IM column store.

在執行計劃中，TABLE ACCESS IN MEMORY FULL表明使用的是記憶體列存儲。

The following figure shows a sample IM column store. The sh.sales table is stored on disk in traditional row format. The SGA stores the data in columnar format in the IM column store, and in row format in the database buffer cache.

下圖展示了記憶體列存儲的樣例，sh.sales表被存在磁碟上以傳統的行格式。SGA在IM column store上以柱狀格式存儲數據，並且在buffer cache中以行格式存儲數據。

How Background Processes Populate IMCUs

後臺進程如何填充IMCUs

在填充數據期間，資料庫從磁碟上以行的形式讀取數據，並且以行為單位創建列，然後壓縮數據到In-Memory-Compression-Units（IMCU），worker process（Wnnn）進程填充數據到記憶體列存儲，每個worker進程操作一個對象上的數據塊子集，填充是一個流機制，壓縮並且同時轉換數據為柱狀格式。

The INMEMORY_MAX_POPULATE_SERVERS initialization parameter specifies the maximum number of worker processes to use for IM column store population. By default, the setting is one half of CPU_COUNT. Set this parameter to an appropriate value for your environment. More worker processes result in faster population, but they use more CPU resources. Fewer worker processes result in slower population, which reduces CPU overhead.

INMEMORY_MAX_POPULATE_SERVERS初始化參數指定用來執行記憶體列存儲填充任務的worker進程最大數量。該參數預設為CPU_COUNT的一半。Worker越多，填充越快，但是也越消耗CPU，所以需要根據實際環境設置該參數。

註意，如果該參數設置為0，則意味著禁用了填充功能。

Population of the IM Column Store at Instance Startup

實例啟動時的記憶體列存儲的填充操作

The database must reconstruct the entire in-memory columnar representation from the on-disk representation every time the database instance restarts. This reconstruction is necessary because the IM column store resides only in memory.

資料庫實例每次重啟都必須要根據磁碟中的representation來重建記憶體中的representation。這個重建過程是必須的，因為記憶體列存儲只能存儲在記憶體中。

Population of the IM Column Store in Response to Queries

查詢響應中的記憶體列存儲填充操作

Setting the INMEMORY attribute on an object means that this object is a candidate for population in the IM column store, not that the database immediately populates the object in memory.

在一個對象上設置了INMEMORY屬性，意味著這個對象已經加入等待填充到記憶體列存儲的隊列中了，並不是立即把這個對象填充進記憶體。

By default (INMEMORY PRIORITY is set to NONE), the database delays population of a table in the IM column store until the database considers it useful. When the INMEMORY attribute is set for an object, the database may choose not to materialize all columns when the database determines that the memory is better used elsewhere. Also, the IM column store may populate a subset of columns from a table.

INMEMORY PRIORITY預設為NONE，資料庫需要先確定使用記憶體列存儲有用才會進行一個表的填充。即使INMEMORY屬性已經設置給對象了，但是資料庫如果覺得記憶體用在其他地方更好，則會選擇不去物化所有的列。記憶體列存儲可能只會填充表上列的部分子集。

Example 14-1 Population of an Object in the IM Column Store

Assume that you connect to the database as administrator. To determine whether data from the sh.customers table has been populated in the IM column store, you execute the following query :

假設你以SYSDBA許可權登錄資料庫，為了查詢是否sh.customres表已經被填充進IMcolumn store中，你執行以下語句：

SQL> SELECT SEGMENT_NAME, POPULATE_STATUS FROM V$IM_SEGMENTS WHERE SEGMENT_NAME = 'CUSTOMERS';

no rows selected

In this case, no segments are populated in the IM column store because sh.customers has not yet been accessed. You query sh.customers, and then query V$IM_SEGMENTS again:

這種情況為什麼沒有段被填充進記憶體列存儲中，是因為sh.customers還沒有被訪問過，當你查詢一次sh.customres，然後在查詢V$IM_SEGMENTS時候：

SQL> SELECT cust_id, cust_last_name, cust_first_name

2 FROM sh.customers WHERE cust_city = 'Hyderabad'

3 AND cust_income_level LIKE 'C%' AND cust_year_of_birth > 1960;

CUST_ID CUST_LAST_NAME CUST_FIRST

---------- --------------- ----------

6284 Hummer Liane

12515 Oppy Linette

39514 Ready Theodric

33292 Weston Theodric

33284 Grigsby Thatcher

27062 Patterson Thatcher

6 rows selected.

SQL> COL SEGMENT_NAME FORMAT a20

SQL> SELECT SEGMENT_NAME, POPULATE_STATUS FROM V$IM_SEGMENTS WHERE SEGMENT_NAME = 'CUSTOMERS';

SEGMENT_NAME POPULATE_STAT

-------------------- -------------

CUSTOMERS COMPLETED

The following query confirms that the database used the IM column store to retrieve the results:

下麵的查詢可以證明是否資料庫使用了記憶體列存儲來檢索結果：

SQL> COL NAME FORMAT a25

SQL> SELECT ul.NAME, ul.DETECTED_USAGES FROM DBA_FEATURE_USAGE_STATISTICS ul

WHERE ul.VERSION=(SELECT MAX(u2.VERSION) FROM DBA_FEATURE_USAGE_STATISTICS u2

WHERE u2.NAME = ul.NAME AND ul.NAME LIKE '%Column Store%');

NAME DETECTED_USAGES

------------------------- ---------------

In-Memory Column Store 1

Population of the IM Column Store in Oracle RAC

RAC中的IM column store填充操作

Each node in an Oracle Real Application Clusters (Oracle RAC) environment has its own IM column store.

RAC每個節點有自己的記憶體列存儲（畢竟每個節點有自己的記憶體）

By default, each in-memory object is distributed among the Oracle RAC instances, effectively employing a share-nothing architecture for the IM column store. When INMEMORY is specified, the DISTRIBUTE and DUPLICATE keywords control the distribution of objects.

每個記憶體對象預設被分佈在RAC實例中，並且記憶體列存儲都設置為share-nothing。當指定INMEMORY，DISTRIBUTE和DUPLICATE關鍵字控制對象的具體分佈。

註意：可以為每個節點設置IMEMORY_SIZE為0，意味著禁用該節點的IM column store。

In-Memory Columnar Compression

記憶體柱狀壓縮

The IM column store uses special compression formats optimized for access speed rather than storage reduction.

記憶體列存儲使用為了提高訪問速度而不是減少磁碟訪問而優化過的特別壓縮格式。

The database increases speed in the following ways:

資料庫通過以下途徑提高訪問速度：

The compression formats enable the database to decrease the amount of memory processed for each column. SQL executes directly on the compressed columns.

壓縮格式使資料庫減少每列的記憶體進程數量，SQL直接執行在壓縮列上。

The database uses SIMD （Single instruction, multiple data）vector (array) instructions to process an array of column values in a single CPU clock cycle. The database can store many values in a vector, which maximizes the performance benefits with SIMD vector processing.

資料庫使用SIMD矢量（陣列）指令來處理單個CPU周期中的列值的陣列。資料庫可以存儲許多矢量中的值，這個矢量最大化了SIMD矢量處理的性能收益。

You can use the MEMCOMPRESS subclause of CREATE and ALTER to select different compression options for each table, partition, or column. For example, you might optimize some columns in a table for scan speed, and others for storage.

你可以使用CREATE和ALTER的MEMCOMPRESS子句選擇不同的壓縮選項給每張表，分區，或者列。例如：你可能需要在這張表上提高掃描速度，而在另一張表上提高存儲性能。

The default compression option is MEMCOMPRESS FOR QUERY LOW. This option provides the fastest read times because the database does not need to decompress the data. The other compression options, such as FOR QUERY HIGH and FOR CAPACITY, use layers of compression, some of which require decompression. Before storing a table in the IM column store, you can use DBMS_COMPRESSION.GET_COMPRESSION_RATIO to report the compression ratio, and thus estimate how much space is required.

預設壓縮選項是MEMCOMPRESS FOR QUERY LOW。這個選項提供最快的讀取速度，因為資料庫不需要解壓數據。

其他壓縮選項，例如：FOR QUERY HIGH，FOR CAPACITY，不同的壓縮層次，有一些需要先解壓。在把表存進記憶體列存儲中之前，可以使用DBMS_COMPRESSION.GET_COMPRESSION_RATIO報告壓縮比例，然後可以評估還需要多少空間。

The in-memory columnar compression is closely related to Hybrid Columnar Compression. Both technologies involve processing sets of column vectors. The primary difference is that the column vectors for the IM column store are optimized for memory storage, whereas the column vectors for Hybrid Columnar Compression are optimized for disk storage.

記憶體列存儲壓縮與Oracle exdata的混合柱壓縮非常相似。兩個技術都包含處理列矢量的集合。主要區別在於記憶體列存儲的列矢量是為記憶體存儲而優化過的，然後混合柱壓縮的列矢量是為磁碟存儲而優化過的。

Scan Optimizations for the IM Column Store

記憶體列存儲的掃描優化

The columnar format enables queries to scan only necessary columns.

柱狀格式使查詢僅掃描需要的列。

For example, suppose a user executes the following ad hoc query:

例如，一個用戶執行以下即席查詢：

SELECT cust_id, time_id, channel_id

FROM sales

WHERE prod_id > 14

AND prod_id < 29

When using the buffer cache, the database would typically scan an index to find the product IDs, use the rowids to fetch the rows from disk into the buffer cache, and then discard the unwanted column values. Scanning data in row format in the buffer cache requires many CPU instructions, and can result in poor CPU efficiency.

當使用buffer cache時，資料庫會掃描索引去查找product IDs，用rowids從磁碟取出數據放到buffer cache，然後丟掉不要的值。行模式下載buffer cache中掃描數據需要CPU執行許多指令，導致CPU效率很低。

When using the IM column store, the database can scan only the requested sales columns, avoiding expensive disk I/O altogether. Scanning data in columnar format pipelines only necessary columns to the CPU, increasing efficiency. Each CPU core scans local in-memory columns using SIMD vector instructions.

當使用記憶體列存儲時，資料庫只需要掃描需要的sales列，完全避免了昂貴的磁碟I/O開銷。柱狀格式掃描數據僅傳輸需要的列給CPU，提高了CPU效率。每個CPU掃描本地記憶體中的列使用SIMD矢量命令。