ACL Management for Spark SQL
Three primary modes for Spark SQL authorization are available with spark-authorizer:
Storage-Based Authorization
Enabling Storage-Based Authorization in the Hive Metastore Server uses HDFS permissions as the main source of verification and allows for a consistent data and metadata authorization policy. Metadata access is controlled by verifying that the user has permission to access the corresponding directories on HDFS. As with HiveServer2, files and directories are translated into Hive metadata objects, such as databases, tables, and partitions, and are protected from end users' queries issued through Spark SQL as a service, such as Kyuubi or Livy.
Storage-Based Authorization offers users coarse-grained access control at the database, table, and partition level.
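The idea of deriving table access from directory permissions can be sketched in a few lines of Python. This is a simplified model, not the Hive Metastore or spark-authorizer implementation: the `DirStatus` type, the `REQUIRED_BIT` mapping, and the sample users and groups are all hypothetical, and real HDFS checks also involve parent directories, ACLs, and superusers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirStatus:
    """Owner, group, and POSIX-style mode bits of a table directory (illustrative)."""
    owner: str
    group: str
    mode: int  # for example 0o750


# Hypothetical mapping from a SQL action to the permission bit it needs:
# reading a table needs the read bit (4), writing needs the write bit (2).
REQUIRED_BIT = {"SELECT": 4, "INSERT": 2}


def is_allowed(user, groups, action, status):
    """Return True if `user` may perform `action` on the directory."""
    need = REQUIRED_BIT[action]
    if user == status.owner:
        bits = (status.mode >> 6) & 7   # owner bits
    elif status.group in groups:
        bits = (status.mode >> 3) & 7   # group bits
    else:
        bits = status.mode & 7          # bits for everyone else
    return bits & need == need


# A table directory owned by "etl", readable by the "analysts" group.
table_dir = DirStatus(owner="etl", group="analysts", mode=0o750)

print(is_allowed("etl", set(), "INSERT", table_dir))
print(is_allowed("alice", {"analysts"}, "SELECT", table_dir))
print(is_allowed("alice", {"analysts"}, "INSERT", table_dir))
print(is_allowed("bob", set(), "SELECT", table_dir))
```

Because the decision depends only on the directory, every column and every row of the table is either accessible or not, which is why this mode is coarse-grained.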
Please refer to the Storage-Based Authorization Guide in the online documentation for an overview on how to configure Storage-Based Authorization for Spark SQL.
SQL-Standard Based Authorization
Enabling SQL-Standard Based Authorization gives users more fine-grained control over access than Storage-Based Authorization. In addition to the capabilities of Storage-Based Authorization, SQL-Standard Based Authorization extends access control to views and to the column level. Unfortunately, Spark SQL does not support the GRANT/REVOKE statements that control access; these can only be issued through HiveServer2. However, spark-authorizer enables Spark SQL to understand the fine-grained access control granted or revoked through Hive.
For Spark SQL client users who can directly access HDFS, SQL-Standard Based Authorization can be easily bypassed.
With Kyuubi, SQL-Standard Based Authorization is guaranteed, because the security configurations, metadata, and storage information are kept hidden from end users.
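To make the column-level control described above more concrete, here is a minimal Python sketch of a privilege check. It is an illustration under stated assumptions, not spark-authorizer's code: the `GRANTS` table, the `denied_columns` helper, and the sample principals and table names are hypothetical stand-ins for privileges that would actually be granted or revoked through HiveServer2.

```python
# Hypothetical record of privileges granted through Hive:
# (principal, table) -> action -> set of columns the principal may use.
GRANTS = {
    ("alice", "sales.orders"): {"SELECT": {"order_id", "amount"}},
}


def denied_columns(user, table, columns, grants=GRANTS):
    """Return the requested columns that `user` may NOT select, sorted by name."""
    allowed = grants.get((user, table), {}).get("SELECT", set())
    return sorted(set(columns) - allowed)


# alice may read order_id but has no grant on customer_email.
print(denied_columns("alice", "sales.orders", ["order_id", "customer_email"]))
# bob has no grants at all, so every requested column is denied.
print(denied_columns("bob", "sales.orders", ["order_id"]))
```

A check like this only protects data when queries are forced through it. A user who can read the underlying files on HDFS directly never reaches the check, which is the bypass described above.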
Please refer to the SQL-Standard Based Authorization Guide in the online documentation for an overview on how to configure SQL-Standard Based Authorization for Spark SQL.
Ranger Security Support
Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform, but its coverage stops short of Spark and Spark SQL. spark-authorizer gives Spark SQL access control by reusing the Ranger plugin for the Hive Metastore. Apache Ranger expands the scope of the existing SQL-Standard Based Authorization but does not support Spark SQL on its own; spark-authorizer ties the two together.
Please refer to the Spark SQL Ranger Security Support Guide in the online documentation for an overview on how to configure Ranger for Spark SQL.