We want to export data from MySQL into Greenplum. The following steps are all it takes.
1: Export the MySQL table to an external file
Take schema_name.table_name as an example:
select product_id, number, name, english_name, purchase_name, system_name, bar_code,
       category_one, category_two, category_three, parent_id, parent_number, brand_id,
       supplier_id, price, ad_word, give_integral, shelf_life,
       FROM_UNIXTIME(shelve_date),   -- INT Unix timestamp -> DATETIME
       product_area, country, sale_unit, specification, weight, length, width, height,
       storage_conditions, storage, model, refuse_notes, status, is_promote, is_gift,
       is_book, is_outgoing, is_presale, is_fragile, is_have, is_cod, is_return, is_oos,
       is_seasonal, is_multicity, is_package, is_show, click, favorite, min_purchase_unit,
       in_price, refer_in_price, mwaverage_price, is_unique_number, is_batch_number,
       qs_proportion, shelf_life_proportion, box_specification, max_unsalable,
       advent_shelves, pro_warning,
       FROM_UNIXTIME(add_time),      -- INT Unix timestamp -> DATETIME
       operator_id,
       FROM_UNIXTIME(audit_time),    -- INT Unix timestamp -> DATETIME
       remark, price_type, new_tag, product_type, business_model, is_sell, return_policy,
       package, inventory, merchant_number, modified_time,
       now()                         -- extra column, becomes dw_modified_time in Greenplum
from schema_name.table_name
INTO OUTFILE '/tmp/table_name.txt';
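Pay attention to value conversions during the export. For this table the main issue is that several time fields (shelve_date, add_time, audit_time) are stored in MySQL as INT Unix timestamps, so they are converted with FROM_UNIXTIME() before being exported. In addition, the Greenplum table has one extra time column (dw_modified_time), which we fill with the current time by exporting now() as the last column. Export using the statement above.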
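To make the FROM_UNIXTIME() conversion concrete, here is a rough Python analogue. It is a sketch, not MySQL's implementation: it assumes the INT columns hold Unix epoch seconds and that tz stands in for the MySQL session time zone; the helper name from_unixtime is hypothetical. Because the Greenplum columns are timestamp without time zone, whatever wall-clock string MySQL produces under its session time zone is what ends up stored.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def from_unixtime(epoch: Optional[int], tz: timezone) -> Optional[str]:
    """Rough analogue of MySQL FROM_UNIXTIME(epoch) for a session whose
    time_zone is tz (hypothetical helper, for illustration only)."""
    if epoch is None:
        # FROM_UNIXTIME(NULL) is NULL, which INTO OUTFILE writes as \N
        return None
    # Same text layout as MySQL's DATETIME output: 'YYYY-MM-DD hh:mm:ss'
    return datetime.fromtimestamp(epoch, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


# The same INT value exported from sessions in two different time zones
UTC = timezone.utc
UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed session time zone, e.g. +08:00
print(from_unixtime(1500000000, UTC))         # 2017-07-14 02:40:00
print(from_unixtime(1500000000, UTC_PLUS_8))  # 2017-07-14 10:40:00
```

The takeaway is that the exported text, and therefore the value loaded into Greenplum, shifts with the MySQL session time zone, so that setting is worth checking before running the export.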
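2: Copy the file to the Greenplum server and create an external table
First copy the file (INTO OUTFILE writes it on the MySQL server host) into the external table's directory. This part is simple and any method will do. Then create the external table: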
create external table schema_name.table_name_ext (
    product_id int, number varchar(10), name varchar(100), english_name varchar(100),
    purchase_name varchar(100), system_name varchar(100), bar_code varchar(255),
    category_one int, category_two int, category_three int, parent_id int,
    parent_number int, brand_id int, supplier_id int, price int, ad_word varchar(100),
    give_integral int, shelf_life int, shelve_date timestamp without time zone,
    product_area int, country int, sale_unit varchar(20), specification varchar(255),
    weight decimal(10,2), length int, width int, height int,
    storage_conditions varchar(255), storage smallint, model varchar(20),
    refuse_notes varchar(255), status smallint, is_promote smallint, is_gift smallint,
    is_book smallint, is_outgoing smallint, is_presale int, is_fragile smallint,
    is_have smallint, is_cod smallint, is_return smallint, is_oos smallint,
    is_seasonal smallint, is_multicity smallint, is_package smallint, is_show smallint,
    click int, favorite int, min_purchase_unit int, in_price int, refer_in_price int,
    mwaverage_price int, is_unique_number int, is_batch_number int, qs_proportion int,
    shelf_life_proportion DOUBLE PRECISION, box_specification varchar(50),
    max_unsalable int, advent_shelves int, pro_warning int,
    add_time timestamp without time zone, operator_id int,
    audit_time timestamp without time zone, remark varchar(255), price_type smallint,
    new_tag int, product_type int, business_model smallint, is_sell smallint,
    return_policy smallint, package varchar(200), inventory varchar(200),
    merchant_number int, modified_time timestamp without time zone,
    dw_modified_time timestamp without time zone   -- filled by now() in the MySQL export
)
location ('gpfdist://172.16.16.34:9888/table_name.txt')
FORMAT 'TEXT'
SEGMENT REJECT LIMIT 1000000 ROWS;
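In LOCATION we specify 'gpfdist://172.16.16.34:9888/table_name.txt', that is, the IP address and port of the gpfdist server plus the file name. The file must then be placed in the directory that gpfdist serves. Here gpfdist was started with gpfdist -d /tmp -p 9888, so the external file has to be copied into /tmp. FORMAT 'TEXT' defaults to a tab delimiter and \N for NULL, which matches the default output of MySQL's INTO OUTFILE, so no extra format options are needed. SEGMENT REJECT LIMIT 1000000 ROWS lets each segment skip up to 1,000,000 malformed rows instead of failing the whole query. Beyond that, just make sure the columns line up.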
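How the LOCATION URL relates to the gpfdist -d directory can be sketched as follows: gpfdist resolves the URL path relative to the directory it serves. The helper below is hypothetical; it only mirrors that mapping so you can see where a file must be copied, and it does not contact gpfdist.

```python
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlparse


def gpfdist_local_path(location: str, serve_dir: str) -> Tuple[Optional[str], Optional[int], str]:
    """Map a gpfdist:// LOCATION URL to (host, port, local file path),
    assuming gpfdist was started with -d serve_dir (illustrative helper)."""
    url = urlparse(location)
    if url.scheme != "gpfdist":
        raise ValueError(f"not a gpfdist URL: {location}")
    # The URL path is interpreted relative to the directory gpfdist serves (-d)
    local = posixpath.join(serve_dir, url.path.lstrip("/"))
    return url.hostname, url.port, local


host, port, path = gpfdist_local_path(
    "gpfdist://172.16.16.34:9888/table_name.txt", "/tmp")
print(host, port, path)  # 172.16.16.34 9888 /tmp/table_name.txt
```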
Then run a query against the external table. As long as the columns line up, there are generally no problems.
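A quick way to confirm that the columns line up is to count the fields in each record of the export file before querying. The sketch below assumes MySQL's default INTO OUTFILE format (tab-separated fields, newline-terminated records, backslash as the escape character, NULL written as \N) and the 75 columns of this example (74 source columns plus now()). The function names are hypothetical, and UTF-8 is an assumption about how the file was written. Records with the wrong count are among the rows the external table would reject under SEGMENT REJECT LIMIT.

```python
from typing import List, Tuple

EXPECTED_COLUMNS = 75  # 74 columns from the MySQL table + now() for dw_modified_time


def field_counts(data: str, delimiter: str = "\t") -> List[int]:
    """Count fields per record in MySQL INTO OUTFILE default output.

    A backslash escapes the next character, so an escaped tab or newline
    inside a value neither splits the field nor ends the record.
    """
    counts: List[int] = []
    fields, in_record, i = 1, False, 0
    while i < len(data):
        ch = data[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            in_record = True
            continue
        if ch == "\n":
            counts.append(fields)  # end of record
            fields, in_record = 1, False
        else:
            if ch == delimiter:
                fields += 1
            in_record = True
        i += 1
    if in_record:
        counts.append(fields)  # last record had no trailing newline
    return counts


def mismatched_records(data: str, expected: int = EXPECTED_COLUMNS) -> List[Tuple[int, int]]:
    """Return (record_number, field_count) for records with the wrong width."""
    return [(n, c) for n, c in enumerate(field_counts(data), start=1) if c != expected]


def check_file(path: str, expected: int = EXPECTED_COLUMNS) -> List[Tuple[int, int]]:
    # Assumes UTF-8; adjust to the character set the MySQL export actually used
    with open(path, encoding="utf-8") as f:
        return mismatched_records(f.read(), expected)


# Usage with the file from this example: print(check_file("/tmp/table_name.txt"))
```

Reading the whole file into memory keeps the sketch short; for very large exports a streaming version would be preferable.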
3: Load the data into a regular Greenplum table
First create the regular table:
create table schema_name.table_name (
    product_id int, number varchar(10), name varchar(100), english_name varchar(100),
    purchase_name varchar(100), system_name varchar(100), bar_code varchar(255),
    category_one int, category_two int, category_three int, parent_id int,
    parent_number int, brand_id int, supplier_id int, price int, ad_word varchar(100),
    give_integral int, shelf_life int, shelve_date timestamp without time zone,
    product_area int, country int, sale_unit varchar(20), specification varchar(255),
    weight decimal(10,2), length int, width int, height int,
    storage_conditions varchar(255), storage smallint, model varchar(20),
    refuse_notes varchar(255), status smallint, is_promote smallint, is_gift smallint,
    is_book smallint, is_outgoing smallint, is_presale int, is_fragile smallint,
    is_have smallint, is_cod smallint, is_return smallint, is_oos smallint,
    is_seasonal smallint, is_multicity smallint, is_package smallint, is_show smallint,
    click int, favorite int, min_purchase_unit int, in_price int, refer_in_price int,
    mwaverage_price int, is_unique_number int, is_batch_number int, qs_proportion int,
    shelf_life_proportion DOUBLE PRECISION, box_specification varchar(50),
    max_unsalable int, advent_shelves int, pro_warning int,
    add_time timestamp without time zone, operator_id int,
    audit_time timestamp without time zone, remark varchar(255), price_type smallint,
    new_tag int, product_type int, business_model smallint, is_sell smallint,
    return_policy smallint, package varchar(200), inventory varchar(200),
    merchant_number int, modified_time timestamp without time zone,
    dw_modified_time timestamp without time zone
)
distributed by (product_id);   -- rows are hash-distributed across segments by product_id
Then load the data:
insert into schema_name.table_name
select * from schema_name.table_name_ext;
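This moves the external-table data into the internal table, spread across all segments by the hash of product_id (evenly, provided the product_id values are well distributed). Note that schema_name.table_name must have the same structure (column order and types) as schema_name.table_name_ext, because the insert relies on select *.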
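Whether the rows really ended up evenly spread can be checked with a query such as select gp_segment_id, count(*) from schema_name.table_name group by gp_segment_id; where gp_segment_id is Greenplum's system column recording which segment holds each row. The small Python sketch below turns that result into a skew ratio (maximum rows on one segment divided by the average per segment). The function name and the idea of comparing the ratio against 1.0 are illustrative assumptions, not a Greenplum tool.

```python
from typing import Mapping


def skew_ratio(rows_per_segment: Mapping[int, int]) -> float:
    """Max rows on any segment divided by the mean rows per segment.

    1.0 means perfectly even; larger values mean one segment holds more
    than its share (for example, because of a poorly spread distribution key).
    """
    if not rows_per_segment:
        raise ValueError("no segments given")
    counts = list(rows_per_segment.values())
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 1.0  # empty table: nothing is skewed
    return max(counts) / mean


# Hypothetical counts, as the gp_segment_id query might return for a 4-segment cluster
print(skew_ratio({0: 250, 1: 248, 2: 251, 3: 251}))  # close to 1.0
```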