TestDatabases/redshift_cluster flakiness #41521

Closed
ravicious opened this issue May 14, 2024 · 9 comments · Fixed by #50605
@ravicious
Member

Failure

Link(s) to logs

Relevant snippet

=== FAIL: e2e/aws TestDatabases/redshift_cluster/auto_user_keep (83.17s)
    redshift_test.go:162: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:275
        	            				/opt/go/src/runtime/asm_amd64.s:1695
        	Error:      	Should be false
        	Messages:   	user "auto_keep_d0fc17" should not be able to login after deactivating
    redshift_test.go:162: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:252
        	            				/__w/teleport/teleport/e2e/aws/redshift_test.go:162
        	            				/__w/teleport/teleport/e2e/aws/redshift_test.go:196
        	Error:      	Condition never satisfied
        	Test:       	TestDatabases/redshift_cluster/auto_user_keep
        	Messages:   	waiting for auto user "auto_keep_d0fc17" to be deactivated
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:33Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":209466,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:33Z","trace.fields":{"cluster":"local-site"}}
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:36Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":187520,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:36Z","trace.fields":{"cluster":"local-site"}}
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:39Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":191492,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:39Z","trace.fields":{"cluster":"local-site"}}
        --- FAIL: TestDatabases/redshift_cluster/auto_user_keep (83.17s)

 === FAIL: e2e/aws TestDatabases/redshift_cluster (7.21s)
    redshift_test.go:116: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:116
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1657
        	Error:      	Received unexpected error:
        	            	ERROR: cannot drop this role since it has been granted on a user (SQLSTATE 0LP01)
        	Test:       	TestDatabases/redshift_cluster
        	Messages:   	test cleanup failed, stmt="DROP ROLE \"auto_role1_42536b\""
    redshift_test.go:116: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:116
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1657
        	Error:      	Received unexpected error:
        	            	ERROR: cannot drop this role since it has been granted on a user (SQLSTATE 0LP01)
        	Test:       	TestDatabases/redshift_cluster
        	Messages:   	test cleanup failed, stmt="DROP ROLE \"auto_role2_8114fd\""
@zmb3
Collaborator

zmb3 commented May 21, 2024

https://github.com/gravitational/teleport/actions/runs/9176973071/job/25233510207?pr=41813

@GavinFrazar can you take a look?

@nklaassen
Contributor

@zmb3
Collaborator

zmb3 commented May 31, 2024

@GavinFrazar
Contributor

GavinFrazar commented Jun 2, 2024

I was out all last week, but I did discover shortly before going on leave that the tests are failing due to a deadlock bug in our auto user provisioning SQL, so I'm pretty confident that these failures are legit, just inconsistent.
Basically what I've found is that concurrent transactions can acquire the same locks out of order, leading to two transactions waiting on each other. The database detects this and aborts one of the transactions, leaving the auto-provisioned user activated instead of deactivated:

{"caller":"postgres/engine.go:142","component":"db:service","db":"ci-database-e2e-tests-redshift-cluster-us-west-2-307493967395","error":"ERROR: deadlock detected (SQLSTATE 40P01)","id":"c5f031d4-1c88-4231-b9b5-0ff070b02e8f","level":"error","message":"Failed to teardown auto user.","timestamp":"2024-05-27T14:21:37-07:00"}

I'll look into fixing that this week

@rosstimothy
Contributor

@ravicious
Member Author

@ibeckermayer
Contributor

@rosstimothy
Contributor

@zmb3
Collaborator

zmb3 commented Dec 19, 2024

v16 hit: https://github.com/gravitational/teleport/actions/runs/12420352680/job/34677692596

    redshift_test.go:130: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:130
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1657
        	Error:      	Received unexpected error:
        	            	ERROR: could not complete because of conflict with concurrent transaction (SQLSTATE XX000)
        	Test:       	TestDatabases/redshift_cluster
        	Messages:   	test cleanup failed, stmt="DROP USER IF EXISTS \"test_admin_443be6\""
