1 LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)]
2
3 LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
在Hive 3.0版本之前的load操作都是纯粹的copy/move操作:move datafiles into locations corresponding to Hive tables
filepath:
被加载的目标位置可以是表或分区,如果是分区必须指定所有分区的具体值(有多少分区就要写多少个分区)
filepath可以是具体的文件,也可以是文件夹,如果是文件夹,hive将会移动该文件夹下的所有文件到hive表中
如果指定了关键字LOCAL
没有指定关键字LOCAL
如果使用了OVERWRITE关键字,目标表或表分区的内容将会被删除,并被替换成filepath指定的内容
对于 INSERT AS SELECT这样额外的load操作在Hive 3.0 及之后版本支持,因为Hive 内部需要重写load
如果表中有分区,但是load指令中没有指定,load方式将会转成INSERT AS SELECT,并认为最后的colums是分区的columns,如果文件不符合期待的模式,将会出错
filepath 可以包含子文件目录,其中的子文件都需符合schema
inputformat 可以是hive的任何输入格式 text, ORC等
serde 可以和Hive的SERDE关联
inputformat 和serde都区分大小写
1 # 标准语法:
2 INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1 FROM from_statement;
3 INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …)] select_statement1 FROM from_statement;
4
5 # 扩展内容(多个insert):
6 FROM from_statement
7 INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1
8 [INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2]
9 [INSERT INTO TABLE tablename2 [PARTITION …] select_statement2] …;
10 FROM from_statement
11 INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …)] select_statement1
12 [INSERT INTO TABLE tablename2 [PARTITION …] select_statement2]
13 [INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2] …;
14
15 Hive extension (dynamic partition inserts):
16 INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] …) select_statement FROM from_statement;
17 INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] …) select_statement FROM from_statement;
INSERT OVERWRITE 将会覆盖表或分区中的任何存在的额数据
INSERT INTO将会追加到表或分区中,保留存在数据的完整性(HIVE 0.8)
hive(0.13.0)之后,建表时指定TBLPROPERTIES ("immutable"="true"),表可以被设置成不可改变 :当表中没有数据时,可以插入;当表中存在数据时,通过insert into追加数据无效,但是INSERT OVERWRITE仍然起作用
目前OVERWRITE 关键字是强制的,也就是说表或分区中的内容会被关联的查询输出内容替代,需要确保查询语句的输出字段内容和目标表字段顺序完全一致
hive(0.14)如果表的OutputFormat实现了AcidOutputFormat,INSERT OVERWRITE操作无效,可以用TRUNCATE TABLE(对于没有分区表)或者DROP PARTITION 加INSERT INTO 实现
hive(1.1.0),TABLE关键字可以不用写
hive(1.2.0), INSERT INTO T可以指定列顺序(INSERT OVERWRITE 不支持),如INSERT INTO T (z, x, c1),查看举例
Configuration property
Default
Note
hive.exec.dynamic.partition
true
Needs to be set to true
to enable dynamic partition inserts
hive.exec.dynamic.partition.mode
strict
In strict
mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict
mode all partitions are allowed to be dynamic
hive.exec.max.dynamic.partitions.pernode
100
Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
hive.exec.max.dynamic.partitions
1000
Maximum number of dynamic partitions allowed to be created in total
hive.exec.max.created.files
100000
Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
hive.error.on.empty.partition
false
Whether to throw an exception if dynamic partition insert generates empty results
举个例子:
1 FROM page_view_stg pvs
2 INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
3 SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
country分区会由最后的字段(pvs.cnt)创建,提示:名称没有被使用,只使用了其中的值,在nonstrict模式中dt分区也可以被动态创建
手机扫一扫
移动阅读更方便
你可能感兴趣的文章