tispark使用时需要注意分区裁剪问题,在tispark中的分区裁剪只能使用在to_day方式进行的分区表。
不能用unix_timestamp限制的。而且在tidb中产生的执行计划和tispark中的执行计划不同。请注意。
文档见如下:
Reading partition table from TiDB
TiSpark reads the range and hash partition table from TiDB.
Currently, TiSpark doesn't support a MySQL/TiDB partition table syntax select col_name from table_name partition(partition_name), but you can still use where condition to filter the partitions.
TiSpark decides whether to apply partition pruning according to the partition type and the partition expression associated with the table. Currently, TiSpark partially apply partition pruning on range partition.
The partition pruning is applied when the partition expression of the range partition is one of the following:
column expression
YEAR(col) and its type is datetime/string/date literal that can be parsed as datetime.
TO_DAYS(col) and its type is datetime/string/date literal that can be parsed as datetime.
If partition pruning is not applied, TiSpark's reading is equivalent to doing a table scan over all partitions.
Write into partition table
Currently, TiSpark only supports writing into the range and hash partition table under the following conditions:
the partition expression is column expression
the partition expression is YEAR($argument) where the argument is a column and its type is datetime or string literal that can be parsed as datetime.
There are two ways to write into partition table:
Use datasource API to write into partition table which supports replace and append semantics.
Use delete statement with Spark SQL.