A56 update: don't fake a report of TF when failover timer fires #509

Merged: markdroth merged 2 commits into grpc:master from markdroth:priority_simplification on Aug 18, 2025
@markdroth

Member

No description provided.

@ejona86 ejona86 left a comment

Member

We probably also need to update the line:

It will also be started if the child reports CONNECTING and it has previously reported READY or IDLE more recently than TRANSIENT_FAILURE.

The easiest change is maybe "when the child first reports CONNECTING"? That language would tempt me to check seenReadyOrIdleSinceTransientFailure only when the state changes to CONNECTING, instead of clearing out the variable (and renaming it) as done in grpc/grpc#40453. I don't really care which way we go, but can_start_failover_timer_ looks awkward to describe here.

@markdroth

Member Author

@ejona86 I've updated the wording to try to clarify that point. PTAL.

ejona86 added a commit to ejona86/grpc-java that referenced this pull request Aug 18, 2025
Since c4256ad we no longer fabricate a TRANSIENT_FAILURE update from
children. However, previously that would have set
seenReadyOrIdleSinceTransientFailure = false and prevented future timer
creation. If a LB policy gives extraneous updates with state CONNECTING,
then it was possible to re-create failOverTimer which would then wait
the 10 seconds for the child to finish CONNECTING. We only want to give
the child one opportunity after transitioning out of READY/IDLE.

grpc/proposal#509
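
The behavior described in the commit message above can be sketched as a small state machine. This is a minimal illustration in Python, not the grpc-java or C-core code: the class `ChildSketch`, its fields, and its method names are hypothetical, and the timer is modeled as a boolean rather than a real 10-second timer. It assumes the flag is cleared when the timer is restarted, which is one of the two options mentioned in the review; checking the flag only on the transition into CONNECTING would be the other.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    READY = auto()
    TRANSIENT_FAILURE = auto()


class ChildSketch:
    """Hypothetical model of one priority child and its failover timer."""

    def __init__(self):
        # Assumption: the failover timer is started once when the child is created.
        self.timer_pending = True
        self.timer_starts = 1
        # Hypothetical flag, playing the role that the discussion assigns to
        # seenReadyOrIdleSinceTransientFailure / can_start_failover_timer_.
        self.can_start_timer = False

    def on_timer_fired(self):
        # No fabricated TRANSIENT_FAILURE report: the timer simply stops pending.
        self.timer_pending = False

    def on_state_update(self, state):
        if state in (State.READY, State.IDLE):
            # The child is usable: cancel the timer and allow one future restart.
            self.timer_pending = False
            self.can_start_timer = True
        elif state is State.TRANSIENT_FAILURE:
            # A real failure: cancel the timer and do not allow a restart.
            self.timer_pending = False
            self.can_start_timer = False
        elif state is State.CONNECTING:
            if self.can_start_timer and not self.timer_pending:
                self.timer_pending = True
                self.timer_starts += 1
                # Consume the single opportunity so that extraneous CONNECTING
                # updates cannot re-create the timer after it fires.
                self.can_start_timer = False


child = ChildSketch()
child.on_state_update(State.READY)        # cancels the initial timer
child.on_state_update(State.CONNECTING)   # restarts the timer once
child.on_timer_fired()                    # timer expires, no fake TF report
child.on_state_update(State.CONNECTING)   # extraneous update, timer stays off
```

In this sketch the second CONNECTING update leaves `timer_starts` unchanged, so the child gets only one opportunity after leaving READY/IDLE, which is the outcome the commit message asks for.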
@markdroth markdroth merged commit 4a8687b into grpc:master Aug 18, 2025
1 check passed
@markdroth markdroth deleted the priority_simplification branch August 18, 2025 23:38
copybara-service Bot pushed a commit to grpc/grpc that referenced this pull request Aug 19, 2025
…0453)

As per grpc/proposal#509.

CC @ejona86 @dfawley

Closes #40453

COPYBARA_INTEGRATE_REVIEW=#40453 from markdroth:priority_simplification 4227145
PiperOrigin-RevId: 796640653
kannanjgithub pushed a commit to grpc/grpc-java that referenced this pull request Aug 19, 2025
kannanjgithub pushed a commit to kannanjgithub/grpc-java that referenced this pull request Aug 19, 2025
…#12289)

ejona86 added a commit to grpc/grpc-java that referenced this pull request Aug 19, 2025
asheshvidyut pushed a commit to asheshvidyut/grpc that referenced this pull request Aug 22, 2025
…pc#40453)

paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Aug 23, 2025
…pc#40453)

asheshvidyut pushed a commit to asheshvidyut/grpc that referenced this pull request Sep 12, 2025
…pc#40453)

AgraVator pushed a commit to AgraVator/grpc-java that referenced this pull request Sep 26, 2025
…#12289)
