dragonboat: renamed observer to non-voting node
lni committed May 8, 2021
1 parent 884ad09 commit dd856e7
Showing 22 changed files with 544 additions and 516 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,10 @@ Dragonboat v3.4 comes with many improvements. All v3.3.x users are recommended t
- Fixed unreachable notification.
- Upgraded to a more recent version of pebble.

### Other changes

- The Raft observer node has been renamed to non-voting node.

## v3.3 (2021-01-20)

Dragonboat v3.3 is a major release that comes with new features and improvements. All v3.2.x users are recommended to upgrade.
27 changes: 16 additions & 11 deletions config/config.go
@@ -169,14 +169,16 @@ type Config struct {
// disable such auto compactions and use NodeHost.RequestCompaction to
// manually request such compactions when necessary.
DisableAutoCompactions bool
// IsObserver indicates whether this is an observer Raft node without voting
// power. Described as non-voting members in the section 4.2.1 of Diego
// Ongaro's thesis, observer nodes are usually used to allow a new node to
// join the cluster and catch up with other existing nodes without impacting
// the availability. Extra observer nodes can also be introduced to serve
// read-only requests without affecting system write throughput.
//
// Observer support is currently experimental.
// IsNonVoting indicates whether this is a non-voting Raft node. Described as
// non-voting members in section 4.2.1 of Diego Ongaro's thesis, they are
// used to allow a new node to join the cluster and catch up with other
// existing nodes without impacting availability. Extra non-voting nodes
// can also be introduced to serve read-only requests.
IsNonVoting bool
// IsObserver indicates whether this is a Raft node without voting
// power.
//
// Deprecated: use IsNonVoting instead.
IsObserver bool
// IsWitness indicates whether this is a witness Raft node without actual log
// replication and without a state machine. It is mentioned in the section
@@ -225,8 +227,11 @@ func (c *Config) Validate() error {
if c.IsWitness && c.SnapshotEntries > 0 {
return errors.New("witness node can not take snapshot")
}
if c.IsWitness && c.IsObserver {
return errors.New("witness node can not be an observer")
if c.IsObserver {
c.IsNonVoting = true
}
if c.IsWitness && c.IsNonVoting {
return errors.New("witness node can not be a non-voting node")
}
return nil
}
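The Validate change above keeps the deprecated flag working by folding it into the new one before any other check runs. A minimal sketch of that pattern follows; the `Config` type here is a trimmed-down, hypothetical stand-in with only the three fields this hunk touches, not the real `config.Config`.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for dragonboat's
// config.Config; only the fields touched by this hunk are kept.
type Config struct {
	IsNonVoting bool
	IsObserver  bool // Deprecated: use IsNonVoting instead.
	IsWitness   bool
}

// Validate mirrors the logic shown in the diff: the deprecated flag is
// folded into IsNonVoting first, so later checks only look at one field.
func (c *Config) Validate() error {
	if c.IsObserver {
		c.IsNonVoting = true
	}
	if c.IsWitness && c.IsNonVoting {
		return errors.New("witness node can not be a non-voting node")
	}
	return nil
}

func main() {
	// A caller still using the deprecated field is accepted and normalized.
	legacy := Config{IsObserver: true}
	fmt.Println(legacy.Validate(), legacy.IsNonVoting)

	// The witness check also catches the deprecated spelling.
	bad := Config{IsWitness: true, IsObserver: true}
	fmt.Println(bad.Validate())
}
```

Because the alias is resolved inside Validate, code that reads the flag afterwards (such as the `!config.IsNonVoting` check in `Launch`) only needs to consult the new field, provided Validate has been called first.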
@@ -284,7 +289,7 @@ type NodeHostConfig struct {
// by their NodeHostID values. This feature is usually used when only dynamic
// addresses are available. When enabled, NodeHostID values should be used
// as the target parameter when calling NodeHost's StartCluster,
// RequestAddNode, RequestAddObserver and RequestAddWitness methods.
// RequestAddNode, RequestAddNonVoting and RequestAddWitness methods.
//
// Enabling AddressByNodeHostID also enables the internal gossip service,
// NodeHostConfig.Gossip must be configured to control the behaviors of the
4 changes: 2 additions & 2 deletions config/config_test.go
@@ -106,8 +106,8 @@ func TestIsValidAddress(t *testing.T) {
}
}

func TestWitnessNodeCanNotBeAnObserver(t *testing.T) {
cfg := Config{IsWitness: true, IsObserver: true}
func TestWitnessNodeCanNotBeNonVoting(t *testing.T) {
cfg := Config{IsWitness: true, IsNonVoting: true}
if err := cfg.Validate(); err == nil {
t.Fatalf("witness node can not be a non-voting node")
}
2 changes: 1 addition & 1 deletion docs/devops.CHS.md
@@ -6,7 +6,7 @@
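* Local disks must be used. Enterprise NVME SSDs with high write endurance are recommended; avoid NFS, CIFS, Samba, Ceph, or any other form of network shared storage.
* Data stored by Dragonboat must never be backed up or restored by directly copying or overwriting files or directories. Doing so permanently corrupts the Raft groups involved.
* Each Raft group already has multiple replicas, and increasing the number of replicas is the best way to avoid service unavailability and data loss caused by partial node failures. For example, 5 replicas tolerate up to 2 simultaneous node failures, which gives stronger data safety and availability guarantees than 3 replicas.
* After individual nodes fail, if the quorum still exists and the Raft group is still available, first add an Observer node so that it immediately starts syncing the Raft group state, promote it to a regular node as soon as syncing completes, and then remove the failed node through a membership change. For intermittent, immediately recoverable hardware failures, such as a brief network partition or power loss, you can try to repair the failed machine right away.
* After individual nodes fail, if the quorum still exists and the Raft group is still available, first add a non-voting node so that it immediately starts syncing the Raft group state, promote it to a regular node as soon as syncing completes, and then remove the failed node through a membership change. For intermittent, immediately recoverable hardware failures, such as a brief network partition or power loss, you can try to repair the failed machine right away.
* On disk damage, such as checksum errors on disk data, once software faults have been ruled out as the cause, stop the node immediately, replace the disk, and replace the permanently failed node using the membership change method described above. Before restarting a node with a confirmed disk failure, make sure all Dragonboat data has been fully cleared by replacing the disk. The node using the new disk should use a new RaftAddress value; when the IP or DNS name cannot be changed, a different port can be used to ensure the RaftAddress is updated.
* In extreme cases, when a majority of nodes fail permanently at the same time and cannot be repaired, the Raft group becomes unavailable. The ImportSnapshot tool provided by the github.com/lni/dragonboat/tools package must then be used to repair the damaged Raft group. This requires users to routinely export and back up snapshots with NodeHost's ExportSnapshot method for this disaster recovery purpose.
* By default, a NodeHost's RaftAddress must not change across restarts; otherwise an error is reported.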
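The replica-count claim in the list above (5 replicas tolerate 2 simultaneous failures, 3 replicas tolerate 1) follows from majority quorum arithmetic. A small sketch, assuming only voting replicas are counted; the function names `Quorum` and `Tolerated` are illustrative and not part of Dragonboat.

```go
package main

import "fmt"

// Quorum returns the number of voting replicas that must agree
// for a majority-based Raft group to make progress.
func Quorum(voters int) int {
	return voters/2 + 1
}

// Tolerated returns how many voting replicas can fail at the same
// time while a quorum is still reachable.
func Tolerated(voters int) int {
	return voters - Quorum(voters)
}

func main() {
	for _, n := range []int{3, 4, 5, 7} {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n",
			n, Quorum(n), Tolerated(n))
	}
}
```

Non-voting nodes are left out of this count on purpose: they replicate the log but do not vote, so adding one does not change the quorum size.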
2 changes: 1 addition & 1 deletion docs/devops.md
@@ -6,7 +6,7 @@ This document describes the DevOps requirements for operating Dragonboat based a
* It is recommended to use enterprise NVME SSDs with a high write endurance rating. You must use local disks and avoid NFS, CIFS, Samba, Ceph, or other similar shared storage.
* Never try to back up or restore Dragonboat data by directly operating on Dragonboat data files or directories. Doing so can immediately corrupt your Raft clusters.
* Each Raft group has multiple replicas; the best way to safeguard the availability of your services and data is to increase the number of replicas. As an example, a Raft group can tolerate 2 node failures when there are 5 replicas, while it can only tolerate 1 node failure when using 3 replicas.
* On node failure, the Raft group remains available as long as it still has a quorum. To handle such failures, you can add an Observer node to start replicating data to it; once it is in sync with the other replicas, you can promote the Observer to a regular node and remove the failed node by using the membership change APIs. For nodes that failed due to intermittent issues such as a short-term network partition or power loss, you should resolve the network or power issue and try restarting the affected nodes.
* On node failure, the Raft group remains available as long as it still has a quorum. To handle such failures, you can add a non-voting node to start replicating data to it; once it is in sync with the other replicas, you can promote the non-voting node to a regular node and remove the failed node by using the membership change APIs. For nodes that failed due to intermittent issues such as a short-term network partition or power loss, you should resolve the network or power issue and try restarting the affected nodes.
* On disk failure, such as when experiencing data integrity check errors or write failures, it is important to immediately replace the failed disk and remove the failed node using the membership change method described above. To restart nodes with such disk failures, it is important to have the failed disk replaced first to ensure corrupted data is removed. As a refreshed node with no existing data, that node must be assigned a new RaftAddress value to avoid confusing other nodes.
* When a quorum of nodes is gone, you will not be able to resolve it without losing data. The github.com/lni/dragonboat/tools package provides the ImportSnapshot method to import a previously exported snapshot to repair such a failed Raft cluster.
* By default, the RaftAddress value cannot be changed between restarts; otherwise the system will panic with an error message.
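The node replacement procedure in the list above (add a non-voting node, wait until it is in sync, promote it, then remove the failed node) can be sketched as an ordered sequence of steps. The `Membership` interface and the in-memory `fakeGroup` below are hypothetical and exist only for illustration; they are not Dragonboat's NodeHost API, whose real method names and signatures should be taken from its documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// Membership is a hypothetical abstraction over membership change calls.
type Membership interface {
	AddNonVoting(nodeID uint64, addr string) error
	InSync(nodeID uint64) bool
	Promote(nodeID uint64) error
	Remove(nodeID uint64) error
}

// ReplaceFailedNode follows the documented order: the replacement joins
// without voting power, is promoted only once it has caught up, and the
// failed node is removed last. Real code would sleep between polls.
func ReplaceFailedNode(m Membership, failedID, newID uint64, addr string, maxPolls int) error {
	if err := m.AddNonVoting(newID, addr); err != nil {
		return fmt.Errorf("add non-voting node: %w", err)
	}
	synced := false
	for i := 0; i < maxPolls && !synced; i++ {
		synced = m.InSync(newID)
	}
	if !synced {
		return errors.New("replacement node did not catch up in time")
	}
	if err := m.Promote(newID); err != nil {
		return fmt.Errorf("promote node: %w", err)
	}
	return m.Remove(failedID)
}

// fakeGroup is an in-memory stand-in used to exercise the sequence.
type fakeGroup struct {
	voters     map[uint64]bool
	nonVotings map[uint64]bool
	lag        int // number of InSync polls that still report "behind"
}

func (g *fakeGroup) AddNonVoting(id uint64, _ string) error {
	if g.voters[id] || g.nonVotings[id] {
		return errors.New("node already exists")
	}
	g.nonVotings[id] = true
	return nil
}

func (g *fakeGroup) InSync(id uint64) bool {
	if !g.nonVotings[id] {
		return false
	}
	if g.lag > 0 {
		g.lag--
		return false
	}
	return true
}

func (g *fakeGroup) Promote(id uint64) error {
	if !g.nonVotings[id] {
		return errors.New("not a non-voting node")
	}
	delete(g.nonVotings, id)
	g.voters[id] = true
	return nil
}

func (g *fakeGroup) Remove(id uint64) error {
	delete(g.voters, id)
	delete(g.nonVotings, id)
	return nil
}

func main() {
	g := &fakeGroup{
		voters:     map[uint64]bool{1: true, 2: true, 3: true},
		nonVotings: map[uint64]bool{},
		lag:        2,
	}
	err := ReplaceFailedNode(g, 3, 4, "host4:63001", 5)
	fmt.Println(err, len(g.voters), g.voters[4], g.voters[3])
}
```

Removing the failed node last means the number of voting members never drops below its original size during the swap, which is the point of bringing the replacement in as a non-voting node first.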
2 changes: 1 addition & 1 deletion docs/overview.CHS.md
@@ -165,6 +165,6 @@ The Gossip service is itself a fully distributed network service; users only need to use NodeH

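Dragonboat provides the following other commonly used features through NodeHost:

* Observer nodes. An observer node does not take part in Leader election or in deciding whether a proposal can be committed; it only receives and applies the proposals committed by the Raft group. An observer node's state machine is the same as a regular node's and normally holds complete, identical state machine state, so it can serve as an extra read-only node from which users read state machine state with consistency guarantees. Another major use of observer nodes is to let a newly added node join the Raft group as an observer and be promoted to a regular node after it has gradually obtained all state machine state. Issue a SyncRead or a GetClusterMembership on the NodeHost hosting the observer node; if it returns successfully, the ReadIndex protocol has completed a full round, which means the observer node already has essentially all Log Entries and is eligible to be promoted to a regular node.
* Non-Voting nodes. An observer node does not take part in Leader election or in deciding whether a proposal can be committed; it only receives and applies the proposals committed by the Raft group. An observer node's state machine is the same as a regular node's and normally holds complete, identical state machine state, so it can serve as an extra read-only node from which users read state machine state with consistency guarantees. Another major use of observer nodes is to let a newly added node join the Raft group as an observer and be promoted to a regular node after it has gradually obtained all state machine state. Issue a SyncRead or a GetClusterMembership on the NodeHost hosting the observer node; if it returns successfully, the ReadIndex protocol has completed a full round, which means the observer node already has essentially all Log Entries and is eligible to be promoted to a regular node.
* Leader transfer. Normally the Leader is elected in a way that is transparent to the user program. Users can use the RequestLeaderTransfer method provided by NodeHost to try to transfer leadership to a specified node.
* NodeHost also provides the GetNodeHostInfo and GetClusterMembership methods for querying information about the Raft groups managed by each NodeHost.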
4 changes: 2 additions & 2 deletions internal/raft/peer.go
@@ -71,7 +71,7 @@ func Launch(config config.Config,
p := Peer{raft: r}
p.raft.events = events
_, lastIndex := logdb.GetRange()
if newNode && !config.IsObserver && !config.IsWitness {
if newNode && !config.IsNonVoting && !config.IsWitness {
r.becomeFollower(1, NoLeader)
}
if initial && newNode {
@@ -184,7 +184,7 @@ func (p *Peer) Handle(m pb.Message) error {
panic("local message sent to Step")
}
_, rok := p.raft.remotes[m.From]
_, ook := p.raft.observers[m.From]
_, ook := p.raft.nonVotings[m.From]
_, wok := p.raft.witnesses[m.From]
if rok || ook || wok || !isResponseMessageType(m.Type) {
return p.raft.Handle(m)
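The condition in this hunk accepts any message from a known peer, whether it is a regular remote, a non-voting node, or a witness, and accepts messages from unknown senders only when they are not responses. A self-contained sketch of that filter follows; the `membership` type and its method names are hypothetical stand-ins for the maps on the internal raft struct.

```go
package main

import "fmt"

// membership is a hypothetical stand-in for the three peer maps that the
// hunk reads from the raft struct (remotes, nonVotings, witnesses).
type membership struct {
	remotes    map[uint64]struct{}
	nonVotings map[uint64]struct{}
	witnesses  map[uint64]struct{}
}

// known reports whether the sender belongs to any of the three peer sets.
func (m membership) known(from uint64) bool {
	_, rok := m.remotes[from]
	_, nok := m.nonVotings[from]
	_, wok := m.witnesses[from]
	return rok || nok || wok
}

// shouldHandle mirrors the condition in the hunk: messages from known
// peers are always handled, and messages from unknown peers are handled
// only when they are not responses.
func (m membership) shouldHandle(from uint64, isResponse bool) bool {
	return m.known(from) || !isResponse
}

func main() {
	m := membership{
		remotes:    map[uint64]struct{}{1: {}, 2: {}},
		nonVotings: map[uint64]struct{}{3: {}},
		witnesses:  map[uint64]struct{}{},
	}
	fmt.Println(m.shouldHandle(3, true))  // response from a non-voting peer
	fmt.Println(m.shouldHandle(9, true))  // response from an unknown peer
	fmt.Println(m.shouldHandle(9, false)) // request from an unknown peer
}
```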
4 changes: 2 additions & 2 deletions internal/raft/peer_test.go
@@ -210,7 +210,7 @@ func TestRaftAPIProposeAndConfigChange(t *testing.T) {
testRaftAPIProposeAndConfigChange(pb.AddNode, NoLeader, t)
testRaftAPIProposeAndConfigChange(pb.AddNode, 2, t)
testRaftAPIProposeAndConfigChange(pb.RemoveNode, 2, t)
testRaftAPIProposeAndConfigChange(pb.AddObserver, 2, t)
testRaftAPIProposeAndConfigChange(pb.AddNonVoting, 2, t)
}

func TestGetUpdateIncludeLastAppliedValue(t *testing.T) {
@@ -364,7 +364,7 @@ func TestRaftAPIProposeAddDuplicateNode(t *testing.T) {
}
cc3 := pb.ConfigChange{Type: pb.RemoveNode, NodeID: 2}
ne(rawNode.ApplyConfigChange(cc3), t)
cc4 := pb.ConfigChange{Type: pb.AddObserver, NodeID: 3}
cc4 := pb.ConfigChange{Type: pb.AddNonVoting, NodeID: 3}
ne(rawNode.ApplyConfigChange(cc4), t)
cc5 := pb.ConfigChange{Type: pb.RemoveNode, NodeID: NoLeader}
ne(rawNode.ApplyConfigChange(cc5), t)