Skip to content

[file system] Configurable token delegation process to enable other authentication methods in S3 credential provider chain#1245

Closed
michaelkoepf wants to merge 17 commits into
apache:mainfrom
michaelkoepf:fs/configurable-token-delegation-process
Closed

[file system] Configurable token delegation process to enable other authentication methods in S3 credential provider chain#1245
michaelkoepf wants to merge 17 commits into
apache:mainfrom
michaelkoepf:fs/configurable-token-delegation-process

Conversation

@michaelkoepf

@michaelkoepf michaelkoepf commented Jun 30, 2025

Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #1246

Brief change log

  • Allow to enable/disable token delegation process in Fluss S3 filesystem via config option
  • Pass filesystem config options from Fluss Flink connector to client module and whitelist certain config options with the client prefix in the S3 file system
  • Determine if file system is initialized by server or client based on user config to correctly initialize the file system
  • Update documentation

Tests

Unit Tests

Added

  • fluss-filesystems/fluss-fs-s3/src/test/java/com/alibaba/fluss/fs/s3/S3FileSystemPluginTest.java
  • fluss-filesystems/fluss-fs-s3/src/test/java/com/alibaba/fluss/fs/s3/token/S3DelegationTokenReceiverTest.java

Example usage with MinIO

  1. Checkout PR and build using ./mvnw clean package -DskipTests.

  2. docker directory: Build fluss image with name fluss:local-snapshot. Execute the prepare.sh for the quickstart image, copy the local fluss-fs-s3 from this PR into the lib folder and build the image with name fluss-quickstart-flink:local-snapshot.

  3. docker-compose.yml

services:
  #begin Fluss cluster
  coordinator-server:
    image: fluss:local-snapshot
    command: coordinatorServer
    depends_on:
      - zookeeper
      - minio-init
    environment:
      - ROOT_LOG_LEVEL=DEBUG
      - |
        FLUSS_PROPERTIES=
        zookeeper.address: zookeeper:2181
        bind.listeners: FLUSS://coordinator-server:9123
        remote.data.dir: s3://fluss/remote-data
        fs.s3.enable-token-delegation: false
        s3.access-key: fluss
        s3.secret-key: 12345678
        s3.path-style-access: true
        s3.region: unknown_region
        s3.endpoint: http://minio:9000
        datalake.format: paimon
        datalake.paimon.metastore: filesystem
        datalake.paimon.warehouse: /tmp/paimon
    volumes:
      - shared-tmpfs:/tmp/paimon
  tablet-server:
    image: fluss:local-snapshot
    command: tabletServer
    depends_on:
      - coordinator-server
    environment:
      - ROOT_LOG_LEVEL=DEBUG
      - |
        FLUSS_PROPERTIES=
        zookeeper.address: zookeeper:2181
        bind.listeners: FLUSS://tablet-server:9123
        data.dir: /tmp/fluss/data
        remote.data.dir: s3://fluss/remote-data
        remote.log.task-interval-duration: 15s
        # reduce for testing purposes
        log.segment.file-size: 4k
        fs.s3.enable-token-delegation: false
        s3.access-key: fluss
        s3.secret-key: 12345678
        s3.path-style-access: true
        s3.region: unknown_region
        s3.endpoint: http://minio:9000
        kv.snapshot.interval: 15s
        datalake.format: paimon
        datalake.paimon.metastore: filesystem
        datalake.paimon.warehouse: /tmp/paimon
    volumes:
      - shared-tmpfs:/tmp/paimon
  zookeeper:
    restart: always
    image: zookeeper:3.9.2
  #end
  #begin Flink cluster
  jobmanager:
    image: fluss-quickstart-flink:local-snapshot
    depends_on:
      - minio-init
    ports:
      - "8083:8081"
    command: jobmanager
    environment:
      - ROOT_LOG_LEVEL=DEBUG
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        fluss.client.fs.s3.access-key: fluss
        fluss.client.fs.s3.secret-key: 12345678
  taskmanager:
    image: fluss-quickstart-flink:local-snapshot
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - ROOT_LOG_LEVEL=DEBUG
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 10
        taskmanager.memory.process.size: 2048m
        taskmanager.memory.framework.off-heap.size: 256m
  #end
  #begin minio
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=fluss
      - MINIO_ROOT_PASSWORD=12345678
    volumes:
      - minio-data:/data
  minio-init:
    image: minio/minio:latest
    depends_on:
      - minio
    entrypoint: >
      /bin/sh -c "
      sleep 10;
      /usr/bin/mc alias set minio http://minio:9000 fluss 12345678;
      /usr/bin/mc mb minio/fluss;
      /usr/bin/mc mb minio/paimon;
      exit 0;
      "
  #end

volumes:
  minio-data:
    driver: local
    driver_opts:
      type: "tmpfs"
      device: "tmpfs"
  shared-tmpfs:
    driver: local
    driver_opts:
      type: "tmpfs"
      device: "tmpfs"

Start with docker compose up -d.

Notes:

  • Since we set fs.s3.enable-token-delegation: false, the token delegation process is deactivated, and there will be no exception thrown during obtaining the session token via STS.

  • There is no need to set region, endpoint or path-style-access on the client side, because the information will be distributed from the server to the client even when token delegation is deactivated.

  1. docker compose exec jobmanager ./sql-client

  2. Based on quickstart guide, create catalog and table, and insert data.

CREATE CATALOG fluss_catalog WITH (
    'type' = 'fluss',
    'bootstrap.servers' = 'coordinator-server:9123'
);
USE CATALOG fluss_catalog;
CREATE TABLE fluss_order (
    `order_key` BIGINT,
    `cust_key` INT NOT NULL,
    `total_price` DECIMAL(15, 2),
    `order_date` DATE,
    `order_priority` STRING,
    `clerk` STRING,
    `ptime` AS PROCTIME(),
    PRIMARY KEY (`order_key`) NOT ENFORCED
);
EXECUTE STATEMENT SET
BEGIN
    INSERT INTO fluss_order SELECT * FROM `default_catalog`.`default_database`.source_order;
END;
  1. Go to http://localhost:9001. Log in with user fluss and password 12345678.

  2. Wait until fluss/remote-data contains the log folder, i.e., logs are tiered so we can be sure that we read from MinIO.

  3. SELECT * FROM fluss_order; should successfully read from MinIO.

API and Format

n/a

Documentation

fluss-docs/docs/maintenance/filesystems/s3

Comment thread fluss-client/src/main/java/com/alibaba/fluss/client/FlussConnection.java Outdated
@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch 2 times, most recently from f3981d3 to 92ba1e2 Compare June 30, 2025 16:56
@@ -59,13 +63,8 @@ public static void updateHadoopConfig(org.apache.hadoop.conf.Configuration hadoo
LOG.debug("Provider already exists");
}

// then, set addition info
if (additionInfos == null) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we actually ensure here that there is no race condition? i.e., that additionInfos is set before this method is called?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean race condition? additionInfos may be set before or after this method is called.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, not the entire code block i want to show is visible in the comment.

this code block is from com.alibaba.fluss.fs.s3.token.S3DelegationTokenReceiver#updateHadoopConfig and is only called in file system creation. it expects additionInfos to be set. if not, an exception is thrown.

if (additionInfos == null) {
// if addition info is null, it also means we have not received any token,
    // we throw InvalidCredentialsException
    throw new NoAwsCredentialsException(DynamicTemporaryAWSCredentialsProvider.COMPONENT);
} else {
    for (Map.Entry<String, String> entry : additionInfos.entrySet()) {
        hadoopConfig.set(entry.getKey(), entry.getValue());
    }
}

however, additionInfos is only written to in com.alibaba.fluss.fs.s3.token.S3DelegationTokenReceiver#onNewTokensObtained() which is only called by SecurityTokenReceiverRepository.

  1. how do we ensure that com.alibaba.fluss.fs.s3.token.S3DelegationTokenReceiver#onNewTokensObtained() was called, so we do not end up throwing an exception during file system creation?
  2. couldn't we just move this code block
for (Map.Entry<String, String> entry : additionInfos.entrySet()) {
    hadoopConfig.set(entry.getKey(), entry.getValue());

into com.alibaba.fluss.fs.s3.token.S3DelegationTokenReceiver#onNewTokensObtained() and remove it in com.alibaba.fluss.fs.s3.token.S3DelegationTokenReceiver#updateHadoopConfig?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1: Yes, you are right. it may happen that it may be end up throwing an exception. But I think it happens rarely. If it happens, it can always retry, as a flink job. In the firstly version, I block it to wait until the token is received. But Jark think it's hard to debug if it hangs whiling obtaining token. So, in here, I just throw the exception
2: I'm not sure how it can solve the problem. The Filesystem will call updateHadoopConfig to init a filesytem, even though we move the code block into onNewTokensObtained, but it may still happen that the token is not avaiable, and we still can't init the filsystem.

PropertiesUtils.extractAndRemovePrefix(flinkConfig.toMap(), FLUSS_PREFIX)
.forEach(flussConfig::setString);
} catch (NoSuchMethodError e) {
// Flink 1.18 does not have the toMap() method yet, see

@michaelkoepf michaelkoepf Jun 30, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note in the documentation for this.

@michaelkoepf michaelkoepf changed the title [WIP][file system] Configurable token delegation process to enable other authentication methods in credential provider chain [WIP][file system] Configurable token delegation process to enable other authentication methods in S3 credential provider chain Jun 30, 2025
@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch from 92ba1e2 to 0e29681 Compare June 30, 2025 17:42
@michaelkoepf michaelkoepf changed the title [WIP][file system] Configurable token delegation process to enable other authentication methods in S3 credential provider chain [file system] Configurable token delegation process to enable other authentication methods in S3 credential provider chain Jun 30, 2025
@michaelkoepf michaelkoepf marked this pull request as ready for review June 30, 2025 20:37
@michaelkoepf michaelkoepf requested review from luoyuxia and wuchong July 1, 2025 22:27

byte[] tokenBytes = token.getToken();
if (tokenBytes.length != 0) {
credentials = CredentialsJsonSerde.fromJson(tokenBytes);

@michaelkoepf michaelkoepf Jul 4, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we lock on credentials and additionalInfos to be safe? both fields are accessed from separate threads (one reads, the other one writes) and assignment may not be a single machine level instruction.

@michaelkoepf michaelkoepf marked this pull request as draft July 4, 2025 10:27
@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch 4 times, most recently from cb001e7 to 2c7731b Compare July 11, 2025 21:06
@polyzos polyzos force-pushed the main branch 3 times, most recently from d88c76c to 434a4f4 Compare August 31, 2025 15:13
@ka-steve

Copy link
Copy Markdown

Do you have a planned timeline for the release date of this feature? In 0.8, S3 can only be used with baked-in permanent credentials, which, I assume, prevents most AWS customers using Fluss in production.

@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch 2 times, most recently from a589ea8 to 2e4c770 Compare November 28, 2025 08:40
@michaelkoepf

Copy link
Copy Markdown
Contributor Author

In general, this PR is ready for review/testing. We need to figure out 2 things.

  1. If AWS Credential Providers with Temporary Credentials (e.g., the IAM Credential Provider) finally work. @polyzos can provide AWS access. We need somebody who has experience in setting up different temporary authentication methods with S3 and EC2.
  2. There is some anomaly that I currently cannot make sense of (see below).

The idea of the PR is to deactivate token delegation to be able to use other credential providers. Without token delegation, users should be forced to set the credential provider to avoid misconfiguration or use unsafe credential providers that use long-term credentials. A more detailed description is in the update docs of the PR. To force users to set the credential provider when token delegation is deactivated, I set the credential provider config options to blank, see here.

If you check out the PR and follow the instructions under Example usage with MinIO (build Fluss from source, build Docker images with the specified names, etc.) you can see that s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider is set in the Coordinator and Tablet Server, and everything works. Data is written to MinIO.

Now the anomaly: Shut down the Docker Compose stack. Remove s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider from the Coordinator and Tablet server configuration but leave the credentials (access key, secret) there. Spin up the Docker Compose stack again. Follow the instructions in the example. For some reason, data is still written to MinIO. This is a contradiction to me, because I set the credential providers by default to blank.

@michaelkoepf michaelkoepf marked this pull request as ready for review November 28, 2025 08:46
@polyzos

polyzos commented Nov 28, 2025

Copy link
Copy Markdown
Contributor

@michaelkoepf @luoyuxia I tested it out, and it works. But then I tried to enable both lake tiering.
Seems like im getting different behaviors between runs.. On the first case, although tiering worked fine, lake tiering wasn't adding any data.
On the second run i got this exception

Caused by: org.apache.fluss.exception.FlussRuntimeException: Failed to get snapshot metadata for table bucket TableBucket{tableId=1, bucket=1}, snapshot ID: 0, Table: ecommercedb.users
	at org.apache.fluss.client.table.scanner.TableScan.createBatchScanner(TableScan.java:138)
	at org.apache.fluss.flink.tiering.source.TieringSplitReader.checkSplitOrStartNext(TieringSplitReader.java:196)
	at org.apache.fluss.flink.tiering.source.TieringSplitReader.fetch(TieringSplitReader.java:123)
	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:165)
	... 6 more
Caused by: java.util.concurrent.ExecutionException: org.apache.fluss.exception.KvSnapshotNotExistException: Failed to get kv snapshot metadata for table bucket TableBucket{tableId=1, bucket=1} and snapshot id 0. Error: Kv snapshot not found
	at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.apache.fluss.client.table.scanner.TableScan.createBatchScanner(TableScan.java:135)

Can someone test if you can make it work with both?
PS: On AWS S3, both work fine

@leekeiabstraction

leekeiabstraction commented Nov 28, 2025

Copy link
Copy Markdown
Contributor

Remove s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider from the Coordinator and Tablet server configuration but leave the credentials (access key, secret) there. Spin up the Docker Compose stack again. Follow the instructions in the example. For some reason, data is still written to MinIO. This is a contradiction to me, because I set the credential providers by default to blank.

I might be missing something but the documented behaviour of unsetting fs.s3a.assumed.role.credentials.provider is to fallback to BasicAWSCredentialsProvider.
See here: A list of providers can be set in fs.s3a.assumed.role.credentials.provider; if unset the standard BasicAWSCredentialsProvider credential provider is used, which uses fs.s3a.access.key and fs.s3a.secret.key.

Were fs.s3a.secret.key/access.key in the properties during the second run?

To force users to set the credential provider when token delegation is deactivated, I set the credential provider config options to blank, see here.

Additional question that might help my understanding, what do we expect the user experience to look like here when they disable token delegation and forgot to set credential provider?

@polyzos

polyzos commented Nov 28, 2025

Copy link
Copy Markdown
Contributor

There might be more, but for me it boils down to two things... Currently when configuring Fluss tiering to work with minio there is this exception:

org.apache.fluss.exception.SecurityTokenException: Failed to get file access security token: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: c3cb1c0c-57fc-4386-87ac-436622b40f21; Proxy: null)
taskmanager-1         | 2025-11-27 07:08:05,066 WARN  org.apache.fluss.client.token.DefaultSecurityTokenManager    [] - Failed to update tokens, will try again in 3600000 ms
taskmanager-1         | org.apache.fluss.exception.FlussRuntimeException: org.apache.fluss.exception.SecurityTokenException: Failed to get file access security token: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: c3cb1c0c-57fc-4386-87ac-436622b40f21; Proxy: null)
taskmanager-1      

So my understanding is that when you run against MinIO, you typically use static S3 credentials and a custom endpoint. That works fine with Hadoop S3A if you set the right configs. Fluss’s DefaultSecurityTokenManager still tries to obtain file-access tokens from the server. Those tokens are AWS STS–based. If your environment isn’t wired to AWS STS (as with local MinIO), calls fail.
In the S3 filesystem plugin (S3FileSystemPlugin#setCredentialProvider), if fs.s3a.access.key is present, Fluss does NOT inject the STS-based credential provider. That part is good. However, the client still creates and starts DefaultSecurityTokenManager when the remote file downloader is used FlussConnection#getOrCreateRemoteFileDownloader, which forces the token flow and can fail before you even touch S3..

So I think @michaelkoepf solution should work.. If someone has time to try out different scenarios and see the behavior with some stronger understanding than me, it would be amazing.

The second thing to test and figure out is this: #1911 (comment)
Michael also added a comment there.

@michaelkoepf

michaelkoepf commented Nov 28, 2025

Copy link
Copy Markdown
Contributor Author

@leekeiabstraction very strong catch.

Were fs.s3a.secret.key/access.key in the properties during the second run?

yes, that was the manual test scenario: "but leave the credentials (access key, secret) there". verification check, kind of. but i did miss out on the fs.s3a.assumed.role.credentials.provider behavior, indeed.

Additional question that might help my understanding, what do we expect the user experience to look like here when they disable token delegation and forgot to set credential provider?

we would have treated this as a configuration error, which causes s3 access to fail. this mechanism was introduced as a safety net. users should prefer short-term credentials, as we discourage embedding long-term credentials in the client. if they want to use long-term credentials in the client, they should be explicit about it by setting the respective credential provider.

any ideas how we can achieve this despite the behavior of fs.s3a.assumed.role.credentials.provider? maybe we can set fs.s3a.assumed.role.credentials.provider to a credential provider that uses short-term credentials and remove the long-term credential providers from s3.aws.credentials.provider's default chain (instead of setting it to blank)?

@leekeiabstraction

Copy link
Copy Markdown
Contributor

any ideas how we can achieve this despite the behavior of fs.s3a.assumed.role.credentials.provider? maybe we can set fs.s3a.assumed.role.credentials.provider to a credential provider that uses short-term credentials and remove the long-term credential providers from s3.aws.credentials.provider's default chain (instead of setting it to blank)?

I may need to sync to understand the user experience that we want in terms of happy cases as well.

But if you intend to support this specific scenario ie user should run into error when they disable token delegation without specifying 's3.aws.credentials.provider', my suggestion is that we should check the configurations and fail-fast by throwing an exception at the configuration stage, rather than passing it on and relying on dependency's behaviour.

Also, modifying credentials provider behind the scene might catch user by surprise.

@michaelkoepf

michaelkoepf commented Nov 29, 2025

Copy link
Copy Markdown
Contributor Author

fail-fast by throwing an exception at the configuration stage, rather than passing it on and relying on dependency's behaviour

i think this was considered in the beginning. the problem there was that deactivating the token delegation (fs.s3.enable-token-delegation) is introduces as a server side config. in the client, we always prepend the token delegation credential provider to the chain and at runtime, a valid provider should be figured out automatically. on initialization of the file system, the client is not aware whether token delegation will be used or not. in general, you could also introduce the config on the client side, but keeping a boolean flag (fs.s3.enable-token-delegation: {true|false}`) in sync across clients and server was considered even less user friendly than the solution with setting the credential provider explicitly, iirc.

bottom line: i think the easiest way is to just leave default providers in place, if possible. this would need to be further investigated, because config options are also used to figure out whether we are on the server or client side (see here). but actually, i think checking if fs.s3.enable-token-delegation is present (no matter if deactivated or not) is sufficient for checking if we are on the server or client side, because it's not whitelisted for clients.

so i suggest, if it is possible to leave the credential providers in place, we put a warning in the docs that short-term credentials should be preferred if possible. wdyt based on your experience with hadoop aws in other projects? @leekeiabstraction

@leekeiabstraction

Copy link
Copy Markdown
Contributor

Hey @michaelkoepf

so i suggest, if it is possible to leave the credential providers in place, we put a warning in the docs that short-term credentials should be preferred if possible. wdyt based on your experience with hadoop aws in other projects?

I've not used TD in the past so taking my time to understand the scenarios here.

Warning in logs and also documentation seems sensible to me but before recommending that, I'd like to understand the fuller picture as there are a few components that I do not have context on: MinIO and when/where configurations are defined.

Can you elaborate the following? Happy to sync on huddle

S3

Coordinator Server Tablet Server Client Current client experience Intended client experience after this PR
Token Delegation Enabled? true true true Session delegation token requested by client from STS? Session delegation token requested by client from STS?
Token Delegation Enabled? true true false ? We're discussing this
Token Delegation Enabled? false false false N/A token delegation cannot be disabled ?
Token Delegation Enabled? false false true N/A token delegation cannot be disabled ?

MinIO

Coordinator Server Tablet Server Client Current client experience Intended client experience after this PR
Token Delegation Enabled? true true true ? ?
Token Delegation Enabled? true true false ? We're discussing this
Token Delegation Enabled? false false false N/A token delegation cannot be disabled ?
Token Delegation Enabled? false false true N/A token delegation cannot be disabled ?

@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch from 2e4c770 to cf8ad90 Compare December 28, 2025 17:38
@michaelkoepf michaelkoepf force-pushed the fs/configurable-token-delegation-process branch from 01cc836 to 1423313 Compare December 28, 2025 18:40
// set credential provider
setCredentialProvider(hadoopConfig);
final boolean isClient = isClient(flussConfig);
final boolean useTokenDelegation;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename from useTokenDelegation to shouldServeToken or shouldDelegateToken as technically the client side still uses token delegation if server is configured to perform token delegation.


// pass through all fluss options from flink config
try {
PropertiesUtils.extractAndRemovePrefix(flinkConfig.toMap(), FLUSS_PREFIX)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 243 to 248 already forward table configs with prefix client to fluss config.

This will remove any table specific config if the name clashes. In such cases, should table specific config take precedence over flink config?

// pass through all fluss options from flink config
try {
PropertiesUtils.extractAndRemovePrefix(flinkConfig.toMap(), FLUSS_PREFIX)
.forEach(flussConfig::setString);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce/prependclient. prefix here? Technically, all config in FlinkTableFactory is Fluss client side config. As is, we are leaking config non client config through if user missed out the client part.. I believe that leads to the best effort handling logic that we have within isClient(configuration)

Comment on lines +164 to +167
}

String flussKeyClientPrefix = CLIENT_PREFIX + flussPrefix;
if (flussKey.startsWith(flussKeyClientPrefix)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest the following to avoid unnecessary alloc and comparison.

else if (flussKey.startsWith(CLIENT_PREFIX + flussPrefix)) {

Collections.reverse(credentialProviderPrependOrder);

for (String credentialProviderName : credentialProviderPrependOrder) {
if (!providers.contains(credentialProviderName)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use exact match instead of contains.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if I'm reading correctly, we currently only add a single credential provider Collections.singletonList(DynamicTemporaryAWSCredentialsProvider.NAME) from S3FileSystemPlugin.java:102. Do we need to handle updating and adding a list of credentialProviders anywhere? If not, it might be worth simplifying the code and remove the updating/adding of list of credentialProviders

@binary-signal

Copy link
Copy Markdown
Contributor

Now the anomaly: Shut down the Docker Compose stack. Remove s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider from the Coordinator and Tablet server configuration but leave the credentials (access key, secret) there. Spin up the Docker Compose stack again. Follow the instructions in the example. For some reason, data is still written to MinIO. This is a contradiction to me, because I set the credential providers by default to blank.

Unless you force the containers to recreate, they will keep using the old configuration. That's why you can still write to MinIO.

@michaelkoepf

Copy link
Copy Markdown
Contributor Author

subsumed by #2662

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

6 participants