Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSocket: After successful reconnect no proposed transactions are received #3180

Open
SteelBRS opened this issue Dec 7, 2019 · 16 comments
Open
Assignees

Comments

@SteelBRS
Copy link

SteelBRS commented Dec 7, 2019

The websocket regularly disconnects with closed with CloseStatus[code=1008, reason=Policy error: client is too slow.] - they come in bursts but then don't happen for 30-40 minutes, so I suspect it's not my implementation ... but I will refactor and move the processing out of the websocket receipt thread.
Anyway - the bug - at some point the websocket reconnects succesfully, but no proposed transactions are received. Not even the ones with funny money (CNY), which usually flood the network 😄

@SteelBRS SteelBRS changed the title WebSocket: After succesful reconnect no proposed transactions are received WebSocket: After successful reconnect no proposed transactions are received Dec 7, 2019
@intelliot
Copy link
Collaborator

Can you post an example that exhibits this behavior?

When you're disconnected from websocket, any prior subscriptions are canceled. When you reconnect, you won't receive any subscription messages until after you re-subscribe again. This is normal and expected.

@SteelBRS
Copy link
Author

SteelBRS commented Dec 9, 2019

It doesn't work that way in practice ... every time I'm disconnected because of 1008 client is too slow a reconnect resumes the subscription.

Hmm ... this is really odd ... at 23:14 CET last night it just stopped receiving proposed transactions ... without any websocket disconnection. (It wasn't connectivity loss on my end since my Binance websocket kept on receiving)

@SteelBRS
Copy link
Author

SteelBRS commented Dec 10, 2019

That stop of proposed transactions without websocket disconnect happened again here at 03:41:38 CET ... can you see anything in your logs why this is happening?
I'm connecting to wss://s2.ripple.com:443

@nbougalis
Copy link
Contributor

We will look into this. Thank you.

@SteelBRS
Copy link
Author

SteelBRS commented Dec 18, 2019

Are you restarting your websockets daily at around 3:40 - 4:00 CET?
This time I got a timeout, which is a lot better than failing silently 🙂

03:57:11.388 [SimpleAsyncTaskExecutor-1758] ERROR s.n.OpenWebSocketConnectionManager - Failed to connect
javax.websocket.DeploymentException: The HTTP request to initiate the WebSocket connection failed
	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:486)
	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServer(WsWebSocketContainer.java:195)
	at org.springframework.web.socket.client.standard.StandardWebSocketClient.lambda$doHandshakeInternal$0(StandardWebSocketClient.java:150)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: null
	at org.apache.tomcat.websocket.AsyncChannelWrapperSecure$WrapperFuture.get(AsyncChannelWrapperSecure.java:508)
	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:344)
	... 4 common frames omitted

@SteelBRS
Copy link
Author

SteelBRS commented Dec 27, 2019

My logs show there was another silent failure on 25th. of December at 17:01 CET (connection was not closed, it just stopped receiving proposedTransactions)

It doesn't look to be a problem on my side, as the Bitstamp & Binance websockets kept on receiving

@SteelBRS
Copy link
Author

Another silent failure at 06:55:59 CET this morning.

Can you at least tell me if you were restarting the production servers at that time?

@book-2333
Copy link

Are you restarting your websockets daily at around 3:40 - 4:00 CET?

This time I got a timeout, which is a lot better than failing silently 🙂


03:57:11.388 [SimpleAsyncTaskExecutor-1758] ERROR s.n.OpenWebSocketConnectionManager - Failed to connect

javax.websocket.DeploymentException: The HTTP request to initiate the WebSocket connection failed

	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:486)

	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServer(WsWebSocketContainer.java:195)

	at org.springframework.web.socket.client.standard.StandardWebSocketClient.lambda$doHandshakeInternal$0(StandardWebSocketClient.java:150)

	at java.util.concurrent.FutureTask.run(FutureTask.java:266)

	at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.TimeoutException: null

	at org.apache.tomcat.websocket.AsyncChannelWrapperSecure$WrapperFuture.get(AsyncChannelWrapperSecure.java:508)

	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:344)

	... 4 common frames omitted

Please!!! how to solve it

@pdp2121
Copy link

pdp2121 commented Feb 8, 2024

Hi, the validator history service is experiencing the same issue across multiple nodes subscriptions:

2024-02-08 12:41:47.227	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://100.24.12.73:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:47.226Z","v":0}
2024-02-08 12:41:38.889	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://3.142.166.192:6006/ on test with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:38.888Z","v":0}
2024-02-08 12:41:32.666	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://35.85.8.19:51233/ on dev with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:32.665Z","v":0}
2024-02-08 12:41:20.371	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://35.85.20.122:51233/ on dev with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:20.371Z","v":0}
2024-02-08 12:41:19.061	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://3.71.101.152:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:19.061Z","v":0}
2024-02-08 12:41:17.345	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://78.89.200.25:6006/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:17.344Z","v":0}
2024-02-08 12:41:16.857	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://52.37.146.148:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:16.857Z","v":0}
2024-02-08 12:41:12.557	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://52.221.251.6:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:12.556Z","v":0}
2024-02-08 12:41:11.218	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://194.163.173.96:6005/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:11.217Z","v":0}
2024-02-08 12:41:11.060	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://44.208.128.26:6006/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:11.059Z","v":0}
2024-02-08 12:41:07.146	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://18.195.169.174:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:07.146Z","v":0}
2024-02-08 12:41:06.674	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://54.89.86.136:6005/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:06.674Z","v":0}
2024-02-08 12:41:05.514	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://34.208.12.148:51233/ on dev with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:41:05.514Z","v":0}
2024-02-08 12:40:59.097	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://195.201.195.173:6006/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:59.097Z","v":0}
2024-02-08 12:40:56.513	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://35.162.59.23:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:56.513Z","v":0}
2024-02-08 12:40:55.879	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://72.44.58.240:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:55.879Z","v":0}
2024-02-08 12:40:53.318	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://13.212.199.76:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:53.318Z","v":0}
2024-02-08 12:40:45.503	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://44.195.55.196:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:45.503Z","v":0}
2024-02-08 12:40:43.035	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://3.66.84.123:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:43.034Z","v":0}
2024-02-08 12:40:38.105	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://37.27.53.61:6006/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:38.105Z","v":0}
2024-02-08 12:40:28.786	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://136.243.58.171:6005/ on test with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:28.786Z","v":0}
2024-02-08 12:40:24.101	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://3.70.225.46:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:24.101Z","v":0}
2024-02-08 12:40:23.259	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://34.209.3.47:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:23.258Z","v":0}
2024-02-08 12:40:21.751	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://34.202.55.140:6006/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:21.751Z","v":0}
2024-02-08 12:40:21.213	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://52.43.207.32:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:21.213Z","v":0}
2024-02-08 12:40:17.380	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://62.113.194.3:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:17.379Z","v":0}
2024-02-08 12:40:12.651	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://18.236.198.183:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:12.651Z","v":0}
2024-02-08 12:40:09.253	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://52.39.152.66:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:09.253Z","v":0}
2024-02-08 12:40:08.117	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://34.209.245.79:443/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:08.116Z","v":0}
2024-02-08 12:40:05.258	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://54.218.72.107:443/ on dev with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:40:05.258Z","v":0}
2024-02-08 12:39:50.197	
{"name":"connections","hostname":"v-h-s-stg-connections-55bcfc5468-7lrhb","pid":8,"level":50,"msg":"WebSocket closed for ws://54.218.136.118:51233/ on main with code 1008 and reason: Policy error: client is too slow.","time":"2024-02-08T17:39:50.197Z","v":0}

Lots of node connections failed on the first try and VHS would not try again after an hour. My current fix would be to retry whenever the closing code is seen. Just wonder if there's more insight on how to prevent this from happening in the future. Thank you!

@github-project-automation github-project-automation bot moved this to 📋 Backlog in Core Ledger Feb 9, 2024
@pdp2121
Copy link

pdp2121 commented Feb 16, 2024

I'm starting to have the same issue on the VHS as after reconnections, the websockets randomly became silent (no subscription message received, even though they were still open) for certain periods. They started to receive messages again after a while or if I restarted the server. Any insights on this would be appreciated.

@Bronek
Copy link
Collaborator

Bronek commented Feb 27, 2024

@pdp2121 is the problem only showing in subscriptions , or are you missing responses/replies from other methods ?

@pdp2121
Copy link

pdp2121 commented Feb 27, 2024

Hi @Bronek, thanks for working on this issue! Currently we only send other requests based on the subscription stream so I'm not sure if other methods would fail as well. I can try to send a request periodically to test if that would help?

@Bronek
Copy link
Collaborator

Bronek commented Feb 27, 2024

Hi @Bronek, thanks for working on this issue! Currently we only send other requests based on the subscription stream so I'm not sure if other methods would fail as well. I can try to send a request periodically to test if that would help?

it would help me to narrow it down if I know if other methods, such as e.g. server_state produce output as expected, while the subscription does not. Even better if you could create a new subscription (while the old one is still in "not producing expected output" state) and see if the new subscription does work. Thank you !

@pdp2121
Copy link

pdp2121 commented Feb 29, 2024

Hi @Bronek, after testing with server_info request, it seem not to produce result when subscription is interrupted as well, and then get result when subscription resumes (they were on/off together). I will try new subscription when one is idle and let you know. Thanks!

@Bronek
Copy link
Collaborator

Bronek commented Feb 29, 2024

Thank you - could this be a connectivity problem between rippled and your client app ?

@pdp2121
Copy link

pdp2121 commented Mar 1, 2024

@Bronek Communication is via Node.js websocket library:
https://www.npmjs.com/package/ws

Example:

import WebSocket from 'ws'
const ws = new WebSocket(url, { handshakeTimeout: WS_TIMEOUT })

The app connect to all nodes possible on XRPL to get data. This issue happens on both my local machine and staging/ prod servers on GCP. Any suggestion on how we could improve the connectivity would be appreciated! Below is the connection file:
https://github.com/ripple/validator-history-service/blob/main/src/connection-manager/connections.ts
Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

No branches or pull requests

6 participants