-
Notifications
You must be signed in to change notification settings - Fork 4.5k
adds new contact-info with forward compatible sockets #29596
adds new contact-info with forward compatible sockets #29596
Conversation
b74c948
to
50269f6
Compare
@behzadnouri i pushed commits to fix the frozen abi issue with |
@ryoqun Thank you |
552d13d
to
30980d1
Compare
gossip/src/contact_info.rs
Outdated
const SOCKET_TAG_GOSSIP: u8 = 0; | ||
const SOCKET_TAG_REPAIR: u8 = 1; | ||
const SOCKET_TAG_RPC: u8 = 2; | ||
const SOCKET_TAG_RPC_PUBSUB: u8 = 3; | ||
const SOCKET_TAG_SERVE_REPAIR: u8 = 4; | ||
const SOCKET_TAG_TPU: u8 = 5; | ||
const SOCKET_TAG_TPU_QUIC: u8 = 6; | ||
const SOCKET_TAG_TPU_FORWARDS: u8 = 7; | ||
const SOCKET_TAG_TPU_FORWARDS_QUIC: u8 = 8; | ||
const SOCKET_TAG_TPU_VOTE: u8 = 9; | ||
const SOCKET_TAG_TVU: u8 = 10; | ||
const SOCKET_TAG_TVU_FORWARDS: u8 = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider an enum
or type-alias/newtype u8
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opaque u8
here is intentional so that adding new sockets is backward compatible.
An enum
will fail deserialization if there is an unknown tag/enum-variant; as a result you cannot add a new socket in a backward compatible way because nodes not recognizing the enum-variant would fail deserializing the whole contact-info.
You can work around that by adding an Unknown(u8)
variant to the enum but that would also require manual serde implementation which would be counterproductive.
gossip/src/contact_info.rs
Outdated
// TODO: sanity check on the fields. | ||
// TODO: verify port offsets don't overflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
fn new_rand_addr<R: Rng>(rng: &mut R) -> IpAddr { | ||
if rng.gen() { | ||
let addr = Ipv4Addr::new(rng.gen(), rng.gen(), rng.gen(), rng.gen()); | ||
IpAddr::V4(addr) | ||
} else { | ||
let addr = Ipv6Addr::new( | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
); | ||
IpAddr::V6(addr) | ||
} | ||
} | ||
|
||
fn new_rand_port<R: Rng>(rng: &mut R) -> u16 { | ||
let port = rng.gen::<u16>(); | ||
let bits = u16::BITS - port.leading_zeros(); | ||
let shift = rng.gen_range(0u32, bits + 1u32); | ||
port.checked_shr(shift).unwrap_or_default() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these will occasionally draw values that we'd reject on sanitize, right? probably less flaky to generate the deterministically ala Pubkey::new_unique
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what change you are suggesting.
This occasionally creates invalid socket addresses which is intentional to have test coverage over paths which fail sanitize_socket
.
If we want to make the test deterministic we can use a SeedableRng
and a seeded random number generator.
Pubkey::new_unique
approach isn't good because:
https://discord.com/channels/428295358100013066/439194979856809985/881943239823880222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what change you are suggesting. This occasionally creates invalid socket addresses which is intentional to have test coverage over paths which fail
sanitize_socket
.
we shouldn't mix testing and fuzzing like that. tests should be deterministic to avoid flake. if we need to start fuzzing stuff, that's another conversation
If we want to make the test deterministic we can use a
SeedableRng
and a seeded random number generator.Pubkey::new_unique
approach isn't good because: https://discord.com/channels/428295358100013066/439194979856809985/881943239823880222
seedable rng would be fine (would be fine to reimplement Pubkey::new_unique
over one as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if all corner cases are identified, it is impossible to cover all their various combinations using deterministic tests. Using unseeded random number generator here, we are getting a much larger test coverage.
Flaky tests by themselves are not a problem, as long as they are easily reproducible locally; in which case they do identify a legit bug or corner case.
Given that this is a single threaded test and there is no other dependence on runtime environment here, do you see any reason that if this test becomes flaky it won't be locally reproducible, by say running the test 1000 times in a loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about switch to deterministic rng, seed with time and log the seed? so we can immediately reproduce a failure with trivial modification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't answer my question
gossip/src/contact_info.rs
Outdated
let &(_, index, port) = self.sockets.iter().find(|(k, _, _)| *k == tag)?; | ||
let addr = self.addrs.get(usize::from(index))?; | ||
let socket = SocketAddr::new(*addr, port); | ||
let _ = sanitize_socket(&socket).ok()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like this should've been done on construction/push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_socket
does sanitize_socket
.
But I am thinking if we are deserializing another node's contact-info, we probably don't want to throw the entire contact-info out if one of the sockets doesn't sanitize. It would be just enough to filter out that specific socket when needed.
thiserror::Error, | ||
}; | ||
|
||
const SOCKET_TAG_GOSSIP: u8 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a gossip quic entry as well? We want to add Quic Gossip support. CC @lijunwangs on the related Quic repair changes for if we need a port allocated for that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding unused ports now, we can add new ports once the code using those ports is almost ready (could possibly be in the same patch). Even if we add that now it cannot be used yet until LegacyContactInfo => ContactInfo
migration is done.
c9a8be3
to
e52dabb
Compare
gossip/src/contact_info.rs
Outdated
#[serde(with = "short_vec")] | ||
addrs: Cow<'a, [IpAddr]>, | ||
#[serde(with = "short_vec")] | ||
sockets: Cow<'a, [(u8, u8, u16)]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can define a struct to clarify what these three members means. Would this make the sockets dynamic to add additional sockets? Like quic tpu and etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gossip/src/contact_info.rs
Outdated
} | ||
|
||
fn get_socket(&self, tag: u8) -> Option<SocketAddr> { | ||
let &(_, index, port) = self.sockets.iter().find(|(k, _, _)| *k == tag)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put it into a map for quicker lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the list will be short. probably faster to linear probe than do the hashing for a map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual socket lookup is done by self.cache
which is an array.
pub struct ContactInfo {
// ...
#[serde(skip_serializing)]
cache: [SocketAddr; SOCKET_CACHE_SIZE],
}
@lijunwangs @t-nelson The intention of discord message was just to discuss the new |
Should we roll solana/gossip/src/crds_value.rs Lines 414 to 419 in 164c929
|
yup, doing that as well. |
d86d437
to
6b690c7
Compare
754599e
to
8102031
Compare
8102031
to
dfb521c
Compare
@@ -189,6 +189,10 @@ impl CrdsGossipPush { | |||
let crds = crds.read().unwrap(); | |||
let entries = crds | |||
.get_entries(crds_cursor.deref_mut()) | |||
.filter(|entry| { | |||
// Exclude the new ContactInfo until the cluster has upgraded. | |||
!matches!(&entry.value.data, CrdsData::ContactInfo(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gossip/src/contact_info.rs
Outdated
} | ||
} | ||
} | ||
// Verify that all sockets have unique key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate this section in the comment? i understand what's going on here, but not how it relates to the rest of this changeset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key should be unique otherwise with multiple entries for the same key, the socket specification would be ambiguous as which one it is. This is already commented in the struct definition:
pub struct ContactInfo {
// ...
// All sockets have a unique key and a valid IP address index.
#[serde(with = "short_vec")]
sockets: Vec<SocketEntry>,
// ...
}
here I am verifying "that all sockets have unique key".
What needs to be clarified?
fn new_rand_addr<R: Rng>(rng: &mut R) -> IpAddr { | ||
if rng.gen() { | ||
let addr = Ipv4Addr::new(rng.gen(), rng.gen(), rng.gen(), rng.gen()); | ||
IpAddr::V4(addr) | ||
} else { | ||
let addr = Ipv6Addr::new( | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
rng.gen(), | ||
); | ||
IpAddr::V6(addr) | ||
} | ||
} | ||
|
||
fn new_rand_port<R: Rng>(rng: &mut R) -> u16 { | ||
let port = rng.gen::<u16>(); | ||
let bits = u16::BITS - port.leading_zeros(); | ||
let shift = rng.gen_range(0u32, bits + 1u32); | ||
port.checked_shr(shift).unwrap_or_default() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what change you are suggesting. This occasionally creates invalid socket addresses which is intentional to have test coverage over paths which fail
sanitize_socket
.
we shouldn't mix testing and fuzzing like that. tests should be deterministic to avoid flake. if we need to start fuzzing stuff, that's another conversation
If we want to make the test deterministic we can use a
SeedableRng
and a seeded random number generator.Pubkey::new_unique
approach isn't good because: https://discord.com/channels/428295358100013066/439194979856809985/881943239823880222
seedable rng would be fine (would be fine to reimplement Pubkey::new_unique
over one as well)
|
||
#[derive(Copy, Clone, Debug, Eq, PartialEq, AbiExample, Deserialize, Serialize)] | ||
struct SocketEntry { | ||
key: u8, // Protocol identifier, e.g. tvu, tpu, etc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we normalize on key
or tag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is key
everywhere. There is no tag
.
..._TAG_...
is only used for constants (because those are consts not a key variable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, but we assign a *_TAG_*
constant to a key
field. which i find inconsistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took me a while to realize the "key" in this context is the one of the tags defined as well. I think it is clearer to standardize on one only
7d62da2
to
7f7b8b6
Compare
@t-nelson were you going to look at this again or are you ok to merge it? |
The commit implement new ContactInfo where * Ports and IP addresses are specified separately so that unique IP addresses can only be specified once. * Different sockets (tvu, tpu, etc) are specified by opaque u8 tags so that adding and removing sockets is backward and forward compatible. * solana_version::Version is also embedded in so that it won't need to be gossiped separately. * NodeInstance is also rolled in by adding a field identifying when the instance was first created so that it won't need to be gossiped separately. Update plan: * Once the cluster is able to ingest the new type (i.e. this patch), a 2nd patch will start gossiping the new ContactInfo along with the LegacyContactInfo. * Once all nodes in the cluster gossip the new ContactInfo, a 3rd patch will start solely using the new ContactInfo while still gossiping the old LegacyContactInfo. * Once all nodes in the cluster solely use the new ContactInfo, a 4th patch will stop gossiping the old LegacyContactInfo.
7f7b8b6
to
0d35756
Compare
yes. i'll take another pass today |
|
||
#[derive(Copy, Clone, Debug, Eq, PartialEq, AbiExample, Deserialize, Serialize)] | ||
struct SocketEntry { | ||
key: u8, // Protocol identifier, e.g. tvu, tpu, etc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took me a while to realize the "key" in this context is the one of the tags defined as well. I think it is clearer to standardize on one only
key: u8, // Protocol identifier, e.g. tvu, tpu, etc | ||
index: u8, // IpAddr index in the accompanying addrs vector. | ||
#[serde(with = "serde_varint")] | ||
offset: u16, // Port offset with respect to the previous entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the offset is not totally accurate, for the first entry it is the actual port -- so better to reflect it in the comment at least. I wonder what is the benefit of storing the offset than simply the port number itself as most complexity in this code is related to the offset arithmetic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ports tend to be close to each other, so the offsets can be stored in one byte.
For gossip we are consuming a lot of bandwidth so anything which helps in the front can be useful.
Any added complexity is encapsulated inside this struct, and this is not exposed in the public api.
New ContactInfo implemented in solana-labs#29596 specifies sockets for QUIC protocols explicitly. The commit adds tpu_quic field to RpcContactInfo in preparation of LegacyContactInfo => ContactInfo migration.
New ContactInfo implemented in solana-labs#29596 specifies sockets for QUIC protocols explicitly. The commit adds tpu_quic field to RpcContactInfo in preparation of LegacyContactInfo => ContactInfo migration.
Working towards migrating from legacy contact-info to the new contact-info: solana-labs#29596
Working towards migrating from legacy contact-info to the new contact-info: #29596
Working towards migrating from legacy contact-info to the new contact-info: solana-labs/solana#29596
Problem
ContactInfo
in a backward and forward compatible way.LegacyContactInfo
is inefficient and unnecessarily duplicates IP addresses.Summary of Changes
Implemented a new
ContactInfo
whereu8
tags so that adding and removing sockets is backward and forward compatible.solana_version::Version
is also embedded in so that it won't need to be gossiped separately.NodeInstance
crds-value will become redundant.Update plan:
ContactInfo
along with theLegacyContactInfo
.ContactInfo
, a 3rd patch will start solely using the newContactInfo
while still gossiping the oldLegacyContactInfo
.ContactInfo
, a 4th patch will stop gossiping the oldLegacyContactInfo
.