Working with Hudi + S3 + HMS. S3 access error

Executing this Scala Spark code to write a Hudi table to MinIO (S3-compatible storage) and sync it to the Hive Metastore (HMS):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import scala.collection.JavaConversions._

val schema = StructType( Array(
                 StructField("language", StringType, true),
                 StructField("users", StringType, true),
                 StructField("id", StringType, true)
             ))

val rowData = Seq(Row("Java", "20000", "a"),
                  Row("Python", "100000", "b"),
                  Row("Scala", "3000", "c"))

val df = spark.createDataFrame(rowData, schema)

val tableName = "hudi_coders_hive"
val basePath = "s3a://warehouse/hudi_coders/"

df.write.format("hudi").
  option(TABLE_NAME, tableName).
  // Hudi record key, partition path, and precombine (dedup ordering) fields
  option(RECORDKEY_FIELD_OPT_KEY, "id").
  option(PARTITIONPATH_FIELD_OPT_KEY, "language").
  option(PRECOMBINE_FIELD_OPT_KEY, "users").
  option("hoodie.datasource.write.hive_style_partitioning", "true").
  // sync the table definition to the Hive Metastore over thrift
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").
  option("hoodie.datasource.hive_sync.database", "default").
  option("hoodie.datasource.hive_sync.table", tableName).
  option("hoodie.datasource.hive_sync.partition_fields", "language").
  option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.MultiPartKeysValueExtractor").
  option("hoodie.datasource.hive_sync.metastore.uris", "thrift://hive-metastore:9083").
  mode(Overwrite).
  save(basePath)
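
For context, writing to s3a://warehouse means the Spark session itself was already pointed at MinIO. A sketch of that setup, assuming spark-shell (the launch command and Hudi bundle version are not in the original post, so treat these values as illustrative):

# Spark -> MinIO over s3a (endpoint, path-style access, no SSL), plus the Hudi bundle
spark-shell \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.0 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio:9000 \
  --conf spark.hadoop.fs.s3a.access.key=admin \
  --conf spark.hadoop.fs.s3a.secret.key=password \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false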

The relevant error (full log attached as sr-hudi.txt, 57.9 KB):

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XZ6M19JFRH0VB3KW; S3 Extended Request ID: mN8cO8C3wrrbd6WS

The StarRocks external catalog it fails under was created with:

CREATE EXTERNAL CATALOG hudi_catalog_hms
PROPERTIES
(
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://hive-metastore:9083",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "admin",
    "aws.s3.secret_key" = "password",
    "aws.s3.region" = "us-east-1"
);
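
The 403 surfaces when the Hudi table is queried through that catalog; the original post doesn't show the exact query, but something along these lines (table name from the hive_sync options above):

SELECT * FROM hudi_catalog_hms.default.hudi_coders_hive;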

It doesn't seem like a permission issue: the same credentials work from the MinIO client, and the Hudi files are all there:

atwong@Albert-CelerData hudi % mc alias ls
local
  URL       : http://localhost:9000
  AccessKey : admin
  SecretKey : password
  API       : s3v4
  Path      : auto
atwong@Albert-CelerData hudi % mc ls local/warehouse/hudi_coders/.hoodie
[2024-02-02 15:54:52 PST] 5.8KiB STANDARD 20240202235406856.commit
[2024-02-02 15:54:08 PST]     0B STANDARD 20240202235406856.commit.requested
[2024-02-02 15:54:33 PST] 4.2KiB STANDARD 20240202235406856.inflight
[2024-02-02 15:54:16 PST]   819B STANDARD hoodie.properties
[2024-02-02 16:31:25 PST]     0B .aux/
[2024-02-02 16:31:25 PST]     0B .schema/
[2024-02-02 16:31:25 PST]     0B .temp/
[2024-02-02 16:31:25 PST]     0B archived/
[2024-02-02 16:31:25 PST]     0B metadata/
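
One more check worth doing (not in the original post): read the table back from the Spark side. If that works, the problem is the catalog configuration, not MinIO or the data. A sketch using the same session and basePath:

val readDf = spark.read.format("hudi").load(basePath)
// Hudi adds _hoodie_* metadata columns; select just the data fields
readDf.select("language", "users", "id").show()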

The answer: the external catalog also needs the S3 connection info. Without aws.s3.endpoint, the catalog's S3 client goes to the real AWS endpoint, where the MinIO credentials are rejected with a 403 Forbidden. Pointing the catalog at MinIO, with path-style access and SSL disabled, fixes it:

CREATE EXTERNAL CATALOG hudi_catalog_hms
PROPERTIES
(
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://hive-metastore:9083",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "admin",
    "aws.s3.secret_key" = "password",
    "aws.s3.region" = "us-east-1",
    "aws.s3.enable_ssl" = "false",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.endpoint" = "http://minio:9000"
);
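
If the broken catalog already exists, drop it first and re-run the CREATE above; the earlier query then resolves against MinIO instead of AWS:

DROP CATALOG hudi_catalog_hms;
-- re-create with the PROPERTIES above, then:
SELECT * FROM hudi_catalog_hms.default.hudi_coders_hive;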