Loading S3 data into StarRocks on EKS

Hello,

I am deploying StarRocks on EKS using the Kubernetes operator via Helm, with the shared-data model and S3 as the storage backend. I am using IRSA (IAM roles for service accounts) to give the FE and CN pods read-write access to S3.

I am using StarRocks version 3.2-latest.

Most operations are working well. I am able to create tables, insert rows and query data.

However, when I try to load data from S3 files using FILES(), I get the following error:
[42000][1064] Access storage error. Error message: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
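
For reference, the load statement is roughly like the following (table name, bucket, and path are placeholders); no aws.s3.* credential properties are passed, so the client apparently falls back to looking for static keys in the environment:

-- minimal FILES() load with no explicit credential properties
INSERT INTO my_table
SELECT * FROM FILES(
    "path" = "s3://my-bucket/data/*.parquet",
    "format" = "parquet",
    "aws.s3.region" = "us-east-1"
);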

The role attached via IRSA allows access to the files I am trying to load; I tested the access manually from the pods. Perhaps the environment variables requested by the FILES() function are not available because I am using IRSA.

Is there a configuration step that I missed? Or does FILES() only work with access and secret keys?

For information, IRSA injects the following environment variables:

  • AWS_ROLE_ARN
  • AWS_WEB_IDENTITY_TOKEN_FILE

Thank you.

I think you may need to change the credential properties according to the IAM role you are using.

AWS S3
If you use the default authentication credential of the AWS SDK to access S3, set the following properties:

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "true"

If you use IAM user-based credentials (access key and secret key) to access S3, set the following properties:

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "false",
"aws.s3.use_instance_profile" = "false",
"aws.s3.access_key" = "<access_key>",
"aws.s3.secret_key" = "<secrete_key>"

If you use Instance Profile to access S3, set the following properties:

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "false",
"aws.s3.use_instance_profile" = "true"

If you use Assumed Role to access S3, set the following properties:

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "false",
"aws.s3.use_instance_profile" = "true",
"aws.s3.iam_role_arn" = "<role_arn>"

If you use Assumed Role to access S3 from an external AWS account, set the following properties:

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "false",
"aws.s3.use_instance_profile" = "true",
"aws.s3.iam_role_arn" = "<role_arn>",
"aws.s3.external_id" = "<external_id>"

I was using EC2 before and forgot to change my app configuration.

I should’ve used the first auth method you pointed out (it’s working with IRSA after testing):

"aws.s3.region" = "<region>",
"aws.s3.endpoint" = "<endpoint_url>",
"aws.s3.use_aws_sdk_default_behavior" = "true"

Thank you for your help!

Hello @elie,

I am using the same config as you, but I am still unable to make the IRSA auth work.

aws_s3_path = mybucketname/folderprefixname
aws_s3_region = us-east-1
aws_s3_endpoint = https://s3.us-east-1.amazonaws.com

Apart from creating the role and IAM policies with the sts:AssumeRoleWithWebIdentity action in the trust policy, and setting up the service account with the role annotation, can you tell me what other changes you made to get it working?

I tried using a container with the AWS CLI and I am able to list the files in the S3 bucket from there (using the same IRSA config and S3 bucket).

Any idea? I am desperate to make this work.

Many thanks

Found the issue: once the default builtin_storage_volume has been created from the ConfigMap, subsequent changes to the config are no longer reflected in it.
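
In case it helps others, here is a sketch of how the existing volume could be inspected and updated directly instead of via the ConfigMap (assuming the credential properties of an existing volume can be changed in place; check ALTER STORAGE VOLUME for your version):

-- show the properties the default volume was actually created with
DESC STORAGE VOLUME builtin_storage_volume;

-- switch the existing volume to the SDK default credential chain (IRSA)
ALTER STORAGE VOLUME builtin_storage_volume
SET ("aws.s3.use_aws_sdk_default_behavior" = "true");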