Twins: BFT Systems Made Robust

Shehar Bano !

Facebook Novi, United Kingdom

Alberto Sonnino !

Facebook Novi, United Kingdom

Andrey Chursin !

Facebook Novi, United States

Dmitri Perelman !

Facebook Novi, United States

Zekun Li !

Facebook Novi, United States

Avery Ching !

Facebook Novi, United States

Dahlia Malkhi !

Facebook Novi, United States

Abstract

This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements

three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal

state such as forgetting “locks” guarding voted values. To emulate interesting attacks by a Byzantine

node, it instantiates twin copies of the node instead of one, giving both twins the same identities

and network credentials. To the rest of the system, the twins appear indistinguishable from a

single node behaving in a “questionable” manner. Twins can systematically generate Byzantine

attack scenarios at scale, execute them in a controlled manner, and examine their behavior. Twins

scenarios iterate over protocol rounds and vary the communication patterns among nodes. Twins

runs in a production setting within DiemBFT where it can execute 44M Twins-generated scenarios

daily. Whereas the system at hand did not manifest errors, subtle safety bugs that were deliberately

injected for the purpose of validating the implementation of Twins itself were exposed within minutes.

Twins can prevent developers from regressing correctness when updating the codebase, introducing

new features, or performing routine maintenance tasks. Twins only requires a thin wrapper over

DiemBFT, we thus envision other systems using it. Building on this idea, one new attack and several

known attacks against other BFT protocols were materialized as Twins scenarios. In all cases, the

target protocols break within fewer than a dozen protocol rounds, hence it is realistic for the Twins

approach to expose the problems.

2012 ACM Subject Classiﬁcation Security and privacy → Distributed systems security

Keywords and phrases Distributed Systems, Byzantine Fault Tolerance, Real-World Deployment

Digital Object Identiﬁer 10.4230/LIPIcs.CVIT.2016.23

Supplementary Material

All artifacts presented in this paper are made publicly available. Speciﬁ-

cally, this includes: (i) the Rust implementation of LibTwins, the Twins framework we implemented

for DiemBFT (Section 5); (ii) the artifacts (the AWS orchestration scripts, and microbenchmarking

scripts and data) used to evaluate LibTwins (Section 6); and (iii) the Python simulator and Twins

instantiation of safety ﬂaw in Fast-HotStuﬀ (Section 3).

Funding This work is funded by Novi, a subsidiary of Facebook.

Acknowledgements

The authors would like to thank Ben Maurer, David Dill, Daniel Xiang, Kartik

Nayak, Ling Ren, and Scott Stoller for feedback on late manuscript, and George Danezis for comments

Dahlia Malkhi;

licensed under Creative Commons License CC-BY 4.0

42nd Conference on Very Important Topics (CVIT 2016).

Editors: John Q. Open and Joan R. Access; Article No. 23; pp. 23:1–23:29

Leibniz International Proceedings in Informatics

Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

arXiv:2004.10617v2 [cs.CR] 14 Jan 2022

23:2 Twins: BFT Systems Made Robust

on early manuscript. We also thank the Novi Research and Engineering teams for valuable feedback.

1 Introduction

Byzantine Fault Tolerant (BFT) protocols introduced in the seminal work of Lamport et

al. [

] are designed to withstand attacks or arbitrary malfunction of internal nodes. However,

creating Byzantine attacks in order to validate a BFT system is challenging: (i) Byzantine

behavior is unconstrained and (ii) developers may be tainted by what they think that the

system is designed to tolerate. Last, as a pragmatical consideration, developing code that

implements Byzantine attacks might be risky.

This paper introduces Twins, a principled approach for eﬀectuating Byzantine attacks

on BFT systems and examining their behavior. Instead of coding incorrect behavior, Twins

creates faulty behavior testing from the correct behavior itself, simply by duplicating

correct

and unmodiﬁed node behavior. This works as follows.

Twins creates a "faulty" nodes by deploying two (or generally,

) instances, both having

the same credentials/signing-keys but running autonomously. Thus, for example, both nodes

can send messages in the same protocol round, but these messages will carry conﬂicting

proposals or votes; to the rest of the system, this twins behavior will appear indistinguishable

from an equivocating behavior by a single node. In another example, one twin may send a

vote in one round, and its twin will “forget” it has voted in the next round; again, to the rest

of the system, this will appear indistinguishable from a single node violating safety rules.

Twins is based on the insight that most interesting Byzantine attacks are internal and

leverage knowledge of the expected behavior of participants, hence they go unnoticed. In

particular, Twins foregoes trivial attacks such as sending semantically invalid messages,

or sending a message without justiﬁcation. Thus, leveraging existing code, Twins can

automatically cover material Byzantine behaviors. Indeed, Section 3 demonstrates one new,

and several known, attacks on BFT protocols materialized as Twins attacks. Crucially, in all

cases, protocols break within fewer than a dozen protocol steps, hence Twins successfully

exposes them. Note that Twins scenarios systematically iterate over protocol rounds and

vary the communication patterns among nodes. While inherently exponential, in the above

attacks, it took Twins only minutes to discover protocol ﬂaws that in some cases, took the

community decades to surface.

Twins has been incorporated into a production setting, DiemBFT [

], in which Twins can

execute 44M Twins-generated scenarios daily. Whereas the system at hand did not manifest

errors, subtle safety bugs that were deliberately injected for the purpose of validating the

implementation of Twins itself were exposed within minutes. Twins can prevent developers

from regressing correctness when updating the codebase, introducing new features, or

performing routine maintenance tasks.

Twins & attacks on BFT replication.

Twins arises in the context of BFT replication

protocols. In this domain, several worrisome safety and liveness vulnerabilities were exposed

recently [

] in both known protocols [

] and in new ones [

]. One reason that

BFT replication lends itself well to analysis via Twins is as follows. A common paradigm

underlying practical BFT replication protocols is a view-by-view design. Each view is driven

by a designated leader proposing to the nodes and going through voting rounds by the nodes.

If a leader is successful, a consensus decision is reached in the view. If not, nodes give up after

a timeout and move to the next view. Transitioning to the new view/leader is tricky: A new

leader must discover if the previous leader was successful, but it may be able to communicate

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:3

only with a subset of the nodes. The transition logic turns out to be the source of problems

in all the above cases, hence exposing the ﬂaw requires only one or two leader rotations.

Twins implementation.

Twins eﬀectuates a Byzantine attack by a Byzantine node via

instantiating twin copies of the node instead of one, giving both twins the same identities

and network credentials. To the rest of the system, the twins appear indistinguishable from

a single node behaving in a “questionable” manner. Twins minutely interacts with existing

code to control message delivery and schedule various coarse-steps such as protocol rounds.

It is practical to deploy in real systems as it uses existing node code, easily keeping up with

an evolving software project.

We built an attack generator based on the Twins approach in the DiemBFT open-source

project, the BFT replication core of the Diem payment system [

]. Implementing Twins in

DiemBFT consists of two principal parts.

The ﬁrst is a scenario executor that deploys a network conﬁguration where some nodes

have twins. The scenario executor hides twins behind a thin multiplexing wrapper; to the

rest of the system, each pair of twins appear as a single entity. The scenario executor controls

the scheduling of message deliveries according to a prescribed scenario. This is accomplished

through a transport emulator in the DiemBFT repository called Network Playground.

The second part is a scenario generator. The scenario generator enumerates scenarios by

varying the number of nodes and the message delivery schedule, then feeding the scenarios

to the scenario executor. We describe in the paper several strategies for drastically reducing

the number of scenarios through aggressive trimming of symmetrical scenarios. Among

these strategies, one minimally “opens” the DiemBFT implementation and lets the scenario

executor determine when a node acts as a leader in the consensus protocol. This removes

duplicate scenarios that diﬀer only in their leaders. Another strategy may allow only faulty

nodes to become leader. Section 6 reports on our experience with Twins in DiemBFT.

Coverage.

What attacks does the Twins approach capture? Developing a rigorous theory

that answers this question is an intriguing question left for future work. Here, we provide

anecdotal evidence of coverage in three forms:

(i) Section 2 brings intuition and experience of several decades of work in the ﬁeld. There

are only a handful of ways in which a Byzantine attacker can materially deviate from the

safety rules imposed by its protocol. For example, it can equivocate and send diﬀerent

proposals to diﬀerent groups of recipients, or it can pretend it did not send/receive a message

and propose or vote in a manner that conﬂicts with such a message.

(ii) Evaluating within the DiemBFT production system Section 6 provides compelling

validation of the Twins approach. Whereas the system at hand did not manifest errors,

self-injected subtle safety bugs—for the purpose of validating the implementation of Twins

itself—were exposed within minutes. In particular, we created a simple safety-violating

setting by deploying

+ 1 (instead of

) nodes with Twins, which led to an expected

consistency violation within seconds. We further injected three subtle logical bugs, which

only slightly deviated from the original speciﬁcation. In all three cases, with only

twins

(faults), Twins successfully exposed safety violations.

(iii) Section 3.1 shows how Twins can instantiate a safety violation in a new protocol

described in a recent manuscript [

]. This highlights the importance of systematically

analyzing the properties of BFT protocols using Twins to expose subtle ﬂaws. Section 3

reinstates several known attacks on BFT protocols using the Twins approach. These attacks

cover a broad spectrum of vulnerabilities, e.g., safety, liveness, timing, and responsiveness.

In some protocol steps, a node may wait for messages to determine its next action. Under

Twins, the node is forced to act according to the messages it received, as if the node provided

C V I T 2 0 1 6

23:4 Twins: BFT Systems Made Robust

a justiﬁcation for each step in form of the history of messages it received. Deviating from

this behavior was not required to reinstate any of the attacks discussed in Section 3, though

in principle, various deviating behaviors would not be covered by Twins. Another coverage

challenge emerges in synchronous protocols because a node behavior may be based on real

time. In such protocols, Twins essentially forces a node to behave in a timely manner.

We tackle this case in one of the attacks investigated in Section 3 and demonstrate that

nonetheless, a slight adaptation of the original attack reinstates the attack in Twins. However,

we do not know yet which timing attacks may not be covered. We discuss some concrete

future directions in Section 8 for extending Twins in the settings we explore as well as others.

2 Motivating the Twins Approach

We open this section with a quick primer on the Byzantine Fault Tolerant (BFT) replication

problem, and describe the notation that will be used to describe attacks through the rest

of this paper. We then provide high-level intuition on why Twins is a viable approach by

showing the diﬀerent kinds of Byzantine behaviors that can be captured by Twins. (Concrete

attacks using Twins are described later in Section 3 and Section 6.1.)

BFT Replication

. The goal of BFT replication is for a group of nodes to provide a

fault-tolerant service through redundancy. Clients submit requests to the service. These

requests are collectively sequenced by the nodes; this enables all nodes to execute the same

chain of requests and hence agree on their (deterministic) output.

Except when speciﬁcally noted, we consider protocols that maintain safety against

arbitrary delays in message transmissions. That is, we assume an asynchronous network

setting. The main challenge is to drive agreement on a chain of requests (and their output)

among all non-faulty nodes despite node failures. It is common to rely on leaders to populate

the network with a unique proposal. During periods in which the leader is non-faulty and

communication among the leader and non-faulty nodes is timely, this regime can drive

consensus quickly. This approach is called partial synchrony, indicating that it maintains

safety at all times and progress only during periods of synchrony.

In the Byzantine fault model, a node may crash or arbitrarily deviate from the protocol.

In this setting, a BFT replication system implements a fault tolerant service via

nodes,

of which a threshold

f < n/

3 may be Byzantine. As Byzantine behavior is deﬁned rather

vaguely, there is no principled way to evaluate BFT systems. Twins is a new approach to

systematically generate Byzantine attacks. The main idea of Twins is the following: running

two (generally, up to

) autonomous instances of a node that both use correct code and

share the same identity, allows us to emulate most interesting Byzantine attacks. Two nodes

share the same identity when they share the same credentials and signing keys.

Notation

. Nodes are represented by capital alphabets (e.g.,

) and the twin of a node is

represented by the same alphabet with the prime symbol (e.g.,

′

). When referring to a set

of nodes, we enclose them in parentheses e.g., (

A, B, B

′

). We underline a node that is serving

as the leader, e.g.,

. The adversary can delay and ﬁlter messages between nodes. We denote

partitions of nodes by enclosing them in braces, e.g.,

{A, B, C, D}

and

{E, F, G}

and reserve the capital letter

to denote them. Additionally, to show messages allowed in

a given direction, we use the symbols

→

and

↔

. For example,

A →

(

B, C

) means

can

send messages to

and

; similarly,

A ↔ P

means

can send messages to and receive

messages from any node of the partition

. The scenarios described below use a network

conﬁguration of 7 nodes, (

A, B, C, D, E, F, G

). Byzantine nodes have twins denoted with

′

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:5

as in

′

. To experiment with any of the deviating behaviors described below, one can

increase the number of Byzantine faults to

+ 1 (say

E, F, G

have twins

′

, F

′

, G

′

) and

expect to see conﬂicting commits.

Equivocation

. A quintessential Byzantine behavior is for a node to equivocate. That is, in

the same step, a Byzantine node might send diﬀerent messages to diﬀerent recipients.

Twins covers equivocation by splitting honest nodes between two partitions, each one

communicating with only one twin of each pair. For example, we can split the system into

{A, B, C, D, F }, P

{C, D, E, F

′

, G}

. The leader(s)

and

′

execute correct leader

code but nevertheless may generate conﬂicting proposals due to diﬀerent inputs or randomness

seeds. If there is a protocol ﬂaw then these conﬂicting proposals could respectively commit

in P

and P

, hence safety breaks.

Amnesia

. An important role that nodes have in agreement protocols is vote for a single

proposal per view. However, a Byzantine node might vote for a proposal and then ‘forget’

that it has voted and vote again. Twins covers amnesia by letting one of the twins vote on one

proposal. Since the other twin is oblivious to the vote happening, it may nevertheless—albeit

executing correct code—vote on a diﬀerent proposal.

More concretely, as in the scenario above, we can split the nodes into two partitions,

{A, B, E, F, G}, P

{C, D, E, F

′

, G

′

}

. If there is a protocol ﬂaw then this double-

voting behavior may result in conﬂicting commits in P

and P

, hence safety breaks.

Losing internal states

. Another notable deviation for Byzantine nodes is to lose their

internal state, particularly a lock that guards a value they voted for. Twins covers this

deviation by letting one of the twins get locked on a value in one view, but in some subsequent

view, bring the other twin who is ignorant that a lock exists.

More concretely, we can split the nodes into two partitions

{A, B, E, F, G}, P

{C, D, E, F

′

, G

′

}

. In one view, the adversary relays messages only among

. In the next view,

it switches to

, causing

′

, G

′

—albeit executing correct code—to ignore their ‘previous’

actions. This can repeat any number of times. If there is a protocol ﬂaw then conﬂicting

proposals may commit in diﬀerent views, hence safety breaks.

3 Attacks Materialized in Twins

In this section, we demonstrate one new, and several known, attacks on BFT replication

protocols, expressed as Twins scenarios. We provide insight into the attacks and defer

the details of all but the linear leader-replacement attack to an appendix, due to space

constraints.

3.1 New Attack

Fast-HotStuﬀ [

] is a new protocol, described in a recent manuscript. It is similar to

HotStuﬀ [

], except with a 2-phase commit rule. The safety violation we reveal using Twins

is possible because Fast-HotStuﬀ does not require consecutive rounds in order to commit.

Speciﬁcally, Quorum Certiﬁcates (QCs) [

] formed by some of the (partitioned) nodes

do not reach the other nodes, resulting in two parallel branches that eventually commit

two conﬂicting blocks. We instantiate this safety violation with Twins (using only network

partitions in a network with 4 nodes and within 11 rounds). This highlights the eﬃcacy of

systematically analyzing the properties of BFT protocols via Twins to expose subtle ﬂaws.

More details are provided in Appendix F.

C V I T 2 0 1 6

23:6 Twins: BFT Systems Made Robust

We implemented the Fast-HotStuﬀ BFT consensus algorithm in a Python simulator which

we release as open source

. The simulator then executes Twins scenarios over the algorithm.

3.2 Reinstated Attacks

We present several known attacks on BFT protocols, expressed as Twins scenarios. In all

cases, exposing vulnerabilities requires only a small number of nodes, partitions, rounds

and leader rotations. It is worth noting that later, our evaluation (Section 6) of LibTwins,

Twins implemented for DiemBFT, shows that running an automated scenario generator

(Section 4.2) with these conﬁgurations would cover the described attacks within minutes. We

did not undertake to re-implement all these protocols and apply a Twins scenario generator

to them; our implementation covers only DiemBFT [13].

Safety attack on Zyzzyva

. Zyzzyva broke new ground in BFT replication with the

introduction of an optimistic single phase “fast track” commit. Eleven years elapsed from

its publication until a safety ﬂaw in Zyzzyva was discovered [

], during which numerous

research project and systems were built on it. Twins generates a scenario that exposes the

ﬂaw with 4 nodes and two leader rotations: the ﬁrst leader equivocates via a twin, and the

next two leaders drop messages to/from some nodes. The details of this attack using Twins

is described in Appendix C.

Liveness attack on FaB

. FaB [

], a precursor to Zyzzyva, is a view-based protocol with

an optimistic fast track. Not surprisingly, a similar problem arises in FaB due to a ﬂawed

leader replacement protocol [

], albeit manifesting as a liveness bug. Twins exposes this bug

in a short scenario with

= 4 and three leader rotations, leading to a complete absence of

leader proposals. The detailed attack using Twins is described in Appendix D.

Timing attack on Sync HotStuﬀ

. Force-Locking Attack [

] is a timing attack on a

preliminary version of a synchronous BFT protocol named Sync HotStuﬀ [

] (which was

subsequently updated to resist the attack). As before, Twins captures this attack with only

a small system size,

= 5, and two leader rotations. However, in order to create timing

attacks, Twins needs to be aware of timing information for protocol steps and messages

deliveries. Extending Twins with timing data is left for future work. In the speciﬁc attack at

hand, course-grain timing at ﬁxed intervals—fewer than ten—suﬃce to reinstate the attack.

The detailed attack using Twins is described in Appendix E.

Non-Responsiveness attack on linear leader-replacement

. Practical Byzantine Fault

Tolerance (PBFT) [

] is a seminal work that was designed to work eﬃciently in the asyn-

chronous setting. Carrying the classical PBFT solution to the blockchain world, Ten-

dermint [

] and Capser [

] introduced a simpliﬁed linear strategy for leader-replacement.

However, it has been observed [

] that this strategy forgoes an important property of

asynchronous protocols—Responsiveness—the ability of a leader to advance as soon as it

receives messages from 2f + 1 nodes.

Indeed, bringing linear leader-replacement approach

into PBFT, we demonstrate a liveness attack using a Twins scenario. Lack of progress is

detected by observing that two consecutive views with honest leaders whose communication

with a quorum is timely do not produce a decision. We present the details of this attack

using Twins in the next section.

https://github.com/asonnino/twins-simulator/tree/master/fhs

Tendermint is a precursor to HotStuﬀ [

] and DiemBFT [

] which operates in two-phase views, but

has no Responsiveness. HotStuﬀ/DiemBFT solve this by adding a third phase.

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:7

3.3 Non-Responsiveness Attack

We now describe in more detail the non-Responsiveness attack above on linear leader-

replacement. The seminal PBFT solution operates two-phase views. A simpliﬁed, linear

leader-replacement works as follows. A leader proposes to extend the highest quorum

certiﬁcate (QC) it knows. A QC is formed on a proposed value if it gathers 2

+ 1 votes from

nodes. Nodes vote on the leader proposal if it extends the highest QC they know. A commit

decision on the leader proposal forms if 2

+ 1 nodes form a QC, and then 2

+ 1 nodes

vote for the QC. Progress is hinged on leaders obtaining the highest QC from the system,

otherwise liveness is broken.

Using the notation from Section 2, the liveness attack here uses 4 replicas (

D, E, F, G

where

has a twin

′

. In the ﬁrst view,

and

′

generate equivocating proposals. Only

D, E

receive a QC for

’s proposal. The next leader is

who proposes to re-propose the

proposal by

′

, which

and

do not vote for because they already have a QC for that

height. Only

and

′

receive a QC for

’s proposal. This scenario repeats indeﬁnitely,

resulting in loss of liveness. More speciﬁcally, this attack works as follows:

View 1: Initialize D and D

′

with diﬀerent inputs v

and v

Create the partitions P

= {D, E, G}, P

= {D

′

, F }.

Let

and

′

run as leaders for one round.

proposes

and gathers votes

from P

creating QC(v

). D

′

proposes v

to P

and gathers votes but not a QC.

Create the following partitions:

{D, E}

′

, F }

{G}

broadcasts

QC(v

), which only reaches P

i.e., (D, E).

View 2: Drop all proposals from D and D

′

until View 2 starts.

Remove all partitions, i.e., P = {D, D

′

, E, F , G}.

Let

run as leader for one round.

re-proposes

(i.e.,

′

’s proposal in the previous

round) to

. (

D, E

) do not vote as they already have

(

) for that height.

gathers votes from the other nodes and forms QC(v

Create partitions P

= {D, E}, P

= {D

′

, F }, P

= {G}.

F broadcasts QC(v

), which only reaches P

View 3: Drop all proposals from F until View 3 starts.

Create the partitions P

= {D, E, G}, P

= {D

′

, F }.

Let

run as leader for one round.

proposes

which extends the highest QC it

knows,

(

). As before,

manages to form

(

), but as a result of a partition,

the QC will only reach (

D, E

). Next, there is a view-change,

is the new leader, and

there are no partitions.

proposes

which extends

(

), the highest QC it knows.

However, (

D, E

) do not vote because

does not extend their highest QC i.e.,

(

This scenario can repeat indeﬁnitely, resulting in the loss of liveness.

4 Systematic Scenario Generation

Whereas the previous section demonstrated manually crafted Twins attack scenarios, this

section presents a framework for systematically generating such scenarios.

Systematically and eﬃciently generating Twins scenarios that provide good coverage

requires tailoring to the speciﬁc BFT protocol settings. We develop the Twins framework

which generates and executes scenarios that describe the node and network conﬁgurations.

Speciﬁcally, the Twins framework is comprised of two components as shown in Figure 1: (i)

the scenario executor, and (ii) the scenario generator. The scenario executor runs a single

scenario and generates output logs, while the scenario generator produces various scenarios

C V I T 2 0 1 6

23:8 Twins: BFT Systems Made Robust

Scenario Generator

Scenarios?

Scenario Executor

Scenario

Yes

FinishedNo

Output

Logs

Persistent

Storage

Figure 1 Twins high-level design.

that are fed to the scenario executor to check for violations. The following design goals

underlie the Twins framework:

Generic & Modular.

Twins is modular with respect to the particular BFT protocol im-

plementation being analyzed, imposes as little complexity as possible on the development,

and easily keeps up with code changes.

Parametrizable.

The network setup (i.e., the number of nodes, leaders per round, and

network conﬁguration per round) and adversarial assumptions (i.e., how many Byzantine

faults are tolerated) is conﬁgurable.

Feasible.

Twins allows pruning duplicate scenarios in order to provide coverage of

material attacks.

Customizable Coverage.

The coverage of scenarios, i.e., the subset of all possible

scenarios to choose for execution, is conﬁgurable by randomly sampling scenarios to run

among all possible enumeration.

Reproducible.

Twins writes logs to persistent storage, containing suﬃcient information

to detect and reproduce any safety violations.

Next, we describe the two main components (Figure 1) of Twins—the scenario executor

and the scenario generator—in detail.

4.1 Scenario Executor

In every Twins scenario, a threshold of the nodes are ‘misconﬁgured’ to have a twin instance

with identical transport endpoint credential and secret keys. The Twins scenario executor

gets as input a scenario consisting of a node-set, a subset of which are marked compromised

(representing Byzantine nodes); and a round-by-round message delivery schedule. The

scenario executor sets up a network of nodes with a given number of compromised nodes

and per round partitions and leaders. The compromised nodes correspond to the nodes for

which the scenario executor creates twins (i.e., identical instances with the same credentials

and signing keys), thereby emulating misbehavior.

As mentioned above, we address BFT replication protocols that proceed in rounds

initiated by a designated leader, each round representing a state transition in the protocol’s

state machine replicated on each node. For each round, the scenario executor creates a given

network partition and assigns given leaders to the round. The scenario executor runs the

BFT protocol among nodes for a pre-speciﬁed number of rounds, at the end of which, the

scenario executor checks for violations. Speciﬁcally, protocol guarantees can be violated in

two principal ways, safety and liveness. A safety violation is detected if two nodes commit

to conﬂicting decisions. A liveness violation can be detected if the protocol fails to commit

within a certain number of steps or a certain duration bound.

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:9

4.2 Scenario Generator

We build a scenario generator of round-by-round scenarios: for each round, the scenario

generator enumerates possible leaders and message delivery schedules among nodes. The sce-

nario generator produces various scenarios to be fed into the scenario executor. Each scenario

represents a unique instance of executor conﬁguration parameters, i.e., the compromised

nodes and per round network partitions and leaders. Scenarios are generated systematically

as follows (see notations in Section 2):

Step 1. The scenario generator ﬁrst produces the set of all possible partitions of nodes

(called partition scenarios). For example, for a network of 4 nodes (

A, B, C, D

), possible

partition scenarios (

) include

{A, D}, {B, C}}

, and

{{A}, {B, C, D}}

. This

problem relates to the Stirling Number of the Second Kind [

] which enumerates the

ways in which a set of

nodes can be divided up into

non-empty partitions, where

ranges from

(i.e., each node is self-isolated) to 1 (i.e., fully connected network without

partitions).

Step 2.

Next the scenario generator assigns each partition scenario to all possible leaders

i.e., the set of

nodes assuming any of those can be a potential leader. For example,

for the example partition scenario above

{A, D}, {B, C}}

for a network of nodes

(

A, B, C, D

), possible leader-partition combinations include

{A, P

}

{B, P

}

{C, P

}

{D, P

}

. Each leader-partition combination fully describes the Twins conﬁguration

required for each round.

Step 3.

The scenario generator lists scenarios by enumerating all possible ways in which

the leader-partition pairs generated in the previous step can be arranged over

rounds

(i.e., permutation, with or without replacement).

The scenario generator iterates over the generated scenarios linearly, and invokes the

scenario executor for each scenario. For safety analysis, usually a small number of rounds

(

10) suﬃces to expose logical bugs in the protocol. Scenario generators therefore need to

enumerate a reasonable number of combinations.

Pruning scenarios

. Important to the success of the approach is for the scenario generator

to avoid duplicate scenarios (e.g., in symmetry or node label

rotation) and generate only

materially diﬀerent scenarios. The implementation we describe in the Evaluation section

of this paper (Section 6) employs aggressively such pruning. Certain heuristics further

substantially reduce the number of scenario conﬁgurations. For example, in most safety

violations the set of honest parties is split into two, hence it suﬃces to play with two or

three partitions per round. These optimizations make it feasible to cover a broad range of

meaningful scenarios. For analyzing liveness, many scenarios will obviously fail to make

progress because there does not exist a super-majority quorum that has reliable and timely

communication among its members. Hence, for liveness analysis the scenario generator must

guarantee that eventually such a quorum exists.

Message delays and timeouts

. We note that the scenario generator does not address

message delays and timeouts, only the dropping of messages and their relative delivery order.

Because the BFT protocol may employ timers, the dropping of messages implicitly implies

Nodes can have designated roles in the protocol, referred to as node labels. Twins incorporates the

label ‘leader’, which is the case for standard BFT protocols. Extensions of these protocols might have

further hierarchy e.g., primary and secondary leaders. This is currently not supported, but the scenario

generator can be easily extended to support diﬀerent node labels.

C V I T 2 0 1 6

23:10 Twins: BFT Systems Made Robust

Node 0

Node 1

Node n

NetworkPlayground:::wait_for_messages

(Network Playground intercepts messages here)

deliver_message( )

Node 0

Node 1

Node n

Inbound messages to nodesOutbound messages from nodes

Figure 2 Design of DiemBFT’s Network Playground.

that relevant endpoint incur a violation of presumed bounds on transmission delays. Future

work may incorporate explicit message delays into the scenario generator to check speciﬁc

timing violations and also to analyze BFT protocols in the synchronous model (Section 8).

5 Implementation

We implemented the Twins framework for DiemBFT, which we call LibTwins. Appendix A

provides an overview of DiemBFT. As described in Section 4, an implementation consists

of two principal ingredients, a scenario generator and an scenario executor (Figure 1). We

ﬁrst describe the scenario executor implementation which leverages a network emulator

in DiemBFT referred to as the network playground. We then proceed to describe the

scenario generator implementation. For completeness, the Rust code and interfaces for the

main functions of LibTwins,

execute_scenario

and

scenario_generator

, are provided in

Appendix B. We are open sourcing the Rust implementation of LibTwins

5.1 Scenario Executor

The LibTwins scenario executor leverages the network emulator of DiemBFT, network

playground

. Network playground provides an apparatus for running single-host DiemBFT

deployments, emulating a network and intercepting all messages exchanged between nodes.

Scenarios can be written to manipulate the intercepted messages (e.g., by dropping certain

messages) and observe node response. Figure 2 shows the design of the network playground.

Nodes are represented by processes run on diﬀerent threads (that run the full consensus

protocol), and network links between them are expressed as Rust channels that provide

asynchronous unidirectional communication between threads. In DiemBFT, nodes are

identiﬁed by their Account Address (a public key that uniquely identiﬁes a node). Channels

are associated with their respective account addresses (nodes). When a node starts a new

round, it checks whether it is leader for this round; if yes, then it generates on the ﬂy a block

to propose using a mock block generator. Each call to the mock block generator produces a

diﬀerent block. This has important implication for LibTwins, as we require a node and its

twin to propose diﬀerent blocks at the same round to emulate equivocation.

The scenario executor component (Section 4) of LibTwins is built on top of network

playground. This required the following modiﬁcations and extensions to the original library:

https://github.com/diem/diem

https://github.com/diem/diem/blob/master/consensus/src/network_tests.rs

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:11

Adding twins.

We wrote a new method to add nodes to the network that supports

twins. The method takes ‘compromised nodes’ as a parameter to refer to the nodes for

which to create twins. For each target node, a duplicate instance is created with the

same credentials and signing keys. Consequently, in the eyes of the other nodes the

compromised node and its twin are indistinguishable.

Inferring rounds.

LibTwins requires to apply a number of ﬁltering policies at the round

level. Network playground does not have a notion of rounds—it only supports static

conﬁgurations that remain unchanged throughout protocol execution. There is no global

notion of rounds in a distributed system with partial synchrony; instead, nodes have their

own view of which round they are in, which they include in their messages. We enable

network playground to extract round from intercepted messages and accordingly apply

ﬁltering criteria.

Round-based message ﬁltering.

Network playground allows writing rules to drop

intercepted messages that meet certain criteria, i.e., messages to or from speciﬁed nodes

and messages of speciﬁed types e.g., votes or proposals. LibTwins extends network

playground to drop intercepted messages per round, which allows emulating diﬀerent

network partitions per round. The message dropping rules treat compromised nodes and

their twins diﬀerently—the rules apply to account addresses (which uniquely identify

nodes), not public keys (which are the same for a target node and its twins).

Deterministic multi-leader election.

DiemBFT currently uses a non-deterministic

leader election algorithm. LibTwins requires leader election at a ﬁner granularity, i.e.,

assigning a speciﬁed leader to each round, potentially assigning multiple leaders to a

round (because if a compromised node is elected as a round leader, its twins becomes

leader too). We wrote a new leader election algorithm for DiemBFT that supports these

requirements.

To emulate running the protocol for a given number of rounds, we approximate rounds

by the number of messages emitted by nodes. Note that in a system with partial synchrony,

we can only make guesses about rounds as there is no global notion of rounds. Using

message-count per-round (without partitions) as an ‘over-guesstimate’, we let the nodes vote

for 3 extra rounds. Over-running a scenario has no consequence on the results of LibTwins

(other than longer scenario execution time) because any safety violations would have already

been detected in earlier rounds.

5.2 Scenario Generator

The scenario generator produces scenarios in three main steps. First, it generates all the

possible ways in which a set of

nodes can be split into

partitions (partition scenarios).

Second, it generates all possible ways in which

leaders can be combined with the partitions

generated in the previous step. Finally, it generates all the possible ways in which the partition-

leader combinations can be permuted over

rounds of consensus protocol execution. The

scenario generator can operate in online or oﬄine modes. In the online mode, scenarios

are generated on the ﬂy and fed to the scenario executor. The scenario generator can be

conﬁgured to write the scenarios to a ﬁle. In the oﬄine mode, the scenario generator reads

previously generated scenarios from a ﬁle and feeds them to the scenario executor.

Pruning scenarios

. A naïve enumeration of all combinations of

partitions,

leaders, and

rounds may explode quickly (see Table 1). In order to constrain the number of generated

scenarios in a particular run, we provide hooks to control the number of

partitions,

the number of

leader-partition pairs, and the number of leader-partition conﬁguration

C V I T 2 0 1 6

23:12 Twins: BFT Systems Made Robust

assignments to rounds. For all three cases, we specify whether the selection is deterministic—

ﬁrst

—or randomized—an

sample. In the third case—conﬁguration assignment to

rounds—the total combination space to select from is large. Therefore, the scenario generator

allows randomizing the per-round conﬁguration selection, rather than sampling over the

entire space of assignments.

6 Evaluation

We validate the capability of LibTwins to model and detect attacks, present microbenchmarks

for the main components of LibTwins, and describe our experiments at scale using Amazon

Web Services (AWS) [

]. We are open sourcing the implementation of LibTwins, AWS

orchestration scripts, and microbenchmarking scripts and data to enable reproducible results

All our evaluations correspond to 4–7 nodes, 4–7 rounds and 2–3 partitions. Intuitively,

these conﬁgurations seem suﬃcient to expose any safety violations. Indeed, the known

attacks on BFT protocols described in Section 3 were exposed with only a small number

of nodes, partitions and leader rotations. A recent work [

] on the coverage of random

scenarios to detect crash faults shows that coverage depends on the number of partitions and

node labels (in our case, the leaders), but not on the number of nodes. For Jepsen [

], all

the bugs that provide meaningful coverage have a small number of rounds, and 2–3 partitions

and roles [

]. Using higher values for these parameters leads to a very large number of

scenarios, which cannot be feasibly executed without some sort of ﬁltering (Section 5.2). It

is an interesting open question whether increasing the value of these parameters has a higher

chance of exposing safety violations.

6.1 Validation

We deliberately introduce bugs to DiemBFT, and validate that LibTwins is able to model

and detect attacks that exploit the injected vulnerabilities. This approach is similar to

mutation testing, a well-known technique to evaluate the quality of existing tests in terms of

whether they can detect programs with deliberately injected modiﬁcations (called “mutants”).

While approaches such as automated mutation testing can help us to exhaustively introduce

mutants, this is computationally expensive and not practical for large, complex systems.

We select bugs to inject into DiemBFT based on their ability to compromise the program’s

functional correctness. We note that this choice is based on our intuition and experience, and

does not provide any coverage guarantees. The validation approach we use is to: (i) inject

the bug into DiemBFT; and (ii) generate scenarios using the LibTwins scenario generator,

checking for any safety violations. We instantiate the scenario generator with diﬀerent

conﬁgurations and vary them until a safety violation is exposed.

We begin with the base case: can LibTwins generate a scenario that violates safety when

the BFT threshold is exceeded (i.e.,

> f

Byzantine nodes)? We discovered a safety violation

with 4 nodes and 2 twins (

A, B, C, D, A

′

, B

′

), 7 rounds, and static scenario conﬁguration (i.e.,

each partition-leader combination is run for all

rounds). LibTwins executed 62 scenarios

of which 8 led to safety violation within 86s.

Changing quorum size to 2f

. BFT protocols consider a state transition safe if it receives

votes from an honest majority of nodes (i.e., quorum). We change DiemBFT’s quorum size

https://github.com/libra/libra

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:13

(a) 4 Nodes, 2 Partitions, 4 Rounds. (b) 4 Nodes, 2 Partitions, 7 Rounds.

Figure 3

Time taken by the scenario generator to produce LibTwins scenarios. Each data point is the

average of 10 runs; error bars represent one standard deviation.

from 2

+1 to 2

. LibTwins detects a safety violation with 4 nodes and 1 twin (

A, B, C, D, A

′

7 rounds, and static scenario conﬁguration (i.e., where each partition-leader combination is

run for all the

rounds). Within 20s, LibTwins executes 14 scenarios of which 6 lead to

safety violation. These scenarios have the same pattern: Nodes are split into two partitions

of size 2 and 3, with

in one partition and

′

in the other. As nodes in the two partitions

can form quorum, oblivious to each other they continue to generate quorum certiﬁcates on

blocks proposed by

and

′

, respectively. Ultimately, nodes in the two partitions commit

two diﬀerent blocks.

Accepting conﬂicting votes

. Upon receiving a proposal, nodes vote for it only if the

block_round

is greater than the

last_voted_round

(Safety Rule 1, Appendix A). We intro-

duce a subtle bug to DiemBFT by changing this rule, so that a node votes for a block if the

block_round

is greater than or equal to the

last_voted_round

. LibTwins detects the safety

violation within a few seconds, with 4 nodes and 1 twin

{A, B, C, D, A

′

}

, and 7 rounds. This

safety bug was detected in one-shot, with 0 partitions. Nodes vote on proposals from both A

and A

′

and quickly end up committing two diﬀerent proposals for the same round.

Forgetting to update preferred round

. Upon receiving a proposal, nodes vote for the

block if the

block_round

is greater than

last_voted_round

, and the block’s

parent_round

is greater than or equal to

preferred_round

(Safety rules 1 and 2, Appendix A). We disable

the ﬁrst check, and bypass the second check by never updating

preferred_round

so it

permanently remains at 0 (Update rule 2, Section A). The main ingredient of an attack

that exploits the bug described above is to propose a block in an old round, and get the

nodes to over-write committed blocks (safety violation). The challenge for LibTwins is that

as a twin node runs correct code, it cannot be made to propose blocks in arbitrary rounds.

One option is to partition the twin node in an old round, and bring it back up in a later

round, so it starts proposing blocks from where it left. This is, however, not possible in a

‘full disclosure’ protocol like DiemBFT where each quorum certiﬁcate (or timeout certiﬁcate)

contains the full history of previous messages that led to the certiﬁcate. That is, as soon

′

recovers from the partition, it receives a quorum certiﬁcate (or timeout certiﬁcate)

from other nodes and advances its round. To emulate

′

going back in time and proposing a

block for an older round, we let it run as leader for a few rounds, crash it, and then recover

it again as leader. When

′

comes back up again it starts from round 0, proposing a block

that builds on the genesis block (the ﬁrst committed block). Because of our modiﬁcations to

the preferred_round and last_voted_round checks, the nodes re-write history.

C V I T 2 0 1 6

23:14 Twins: BFT Systems Made Robust

Nodes Twins Partitions Rounds Step 1 Step 2

Step 3

No Repl. Repl. Static

4 1 2 4 15 15 ∼ 3 × 10

∼ 5 × 10

4 1 3 4 25 25 ∼ 3 × 10

∼ 4 × 10

4 1 2 7 15 15 ∼ 3 × 10

∼ 2 × 10

4 1 3 7 25 25 ∼ 2 × 10

∼ 6 × 10

7 2 2 4 255 510 ∼ 7 × 10

∼ 7 × 10

510

7 2 3 4 3,025 6,050 ∼ 1 × 10

∼ 1 × 10

6,050

7 2 2 7 255 510 ∼ 9 × 10

∼ 9 × 10

510

7 2 3 7 3,025 6,050 ∼ 3 × 10

∼ 3 × 10

6,050

Table 1

The number of LibTwins scenarios generated for various conﬁgurations. Steps 1, 2 and 3

correspond to the scenario generation pipeline described in Section 4.

Step 1:

The number of ways

in which

nodes can be distributed among

partitions.

Step 2:

The number of ways in which the

partitions generated in Step 1 can be combined with leaders.

Step 3:

The number of ways in which the

partition-leader pairs generated in Step 2 can be permuted (with and without replacement) over

rounds.

In Static conﬁgurations, each partition-leader pair is statically conﬁgured for all the R rounds.

6.2 Microbenchmarks

We present microbenchmarks for the two main components of LibTwins: scenario generator

(Section 5.2) and scenario executor (Section 5.1). The microbenchmarks are run on an Apple

laptop (MacBook Pro) with a 2.9 GHz Intel Core i9 (6 physical and 12 logical cores), and 32

GB 2400 MHz DDR4 RAM.

Scenario generator microbenchmarks

. The scenario generator incurs a one-time com-

putational cost—once the scenarios are generated, the scenario generator feeds them one

by one to the scenario executor. Table 1 shows the number of scenarios generated with

diﬀerent conﬁgurations. We observe that the number of nodes and the number of rounds

signiﬁcantly increase the output of Step 1, which increases proportionally in the number of

twins (as we only conﬁgure nodes with twins to become leaders). We ﬁnd that non-static

conﬁgurations in Step 3 cause the number of scenarios to explode. Therefore, of the various

ﬁlters implemented for the scenario generator (Section 5.2), we ﬁnd the ﬁlter at Step 2 to

be most useful. We use this ﬁlter to make our at-scale Twins analysis (Section 6.3) feasible.

Note that this inevitably comes at the cost of completeness of coverage—a trade-oﬀ that

we cannot completely eliminate. Figure 3 shows how long the scenario generator takes to

produce scenarios for the same number of nodes (4) and partitions (2), and 4 (Figure 3a) and

7 (Figure 3b) rounds. We observe that while it expectedly takes longer to generate scenarios

for 7 rounds vs. 4 rounds due to a larger number of permutations, for each case the time

taken increases linearly in the number of scenarios. We observe a similar linear trend in

our microbenchmarks for other conﬁgurations with varying number of nodes and partitions

(ﬁgures not included due to space constraints).

Scenario executor microbenchmarks

. Table 2 shows the time the scenario executor

takes to execute a scenario. We repeat each measurement over 100 randomly selected

scenarios from a conﬁguration with 2 partitions, and varying number of nodes (4 and 7) and

rounds (4–12). We observe that for 4 nodes, the execution time ranges from 234–465ms for

4–12 rounds, with a maximum standard deviation of 314ms. For 7 nodes, the execution time

ranges from 547–748ms for 4–12 rounds, with a maximum standard deviation of ∼ 1.2s.

The variation observed above in execution times is expected because of how DiemBFT

handles timeouts (Appendix A). For each scenario, LibTwins runs DiemBFT until it has

observed a given number of messages (proposals and votes), which roughly corresponds to

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:15

Rounds

4 Nodes 7 Nodes

Mean (ms) Std. (ms) Mean (ms) Std. (ms)

4 239 314 547 1,286

5 250 87 555 1,059

6 284 88 555 802

7 296 87 559 752

8 334 209 647 810

9 363 175 643 557

10 398 222 653 539

11 433 168 718 570

12 465 179 748 223

Table 2

The time scenario executor takes to execute a scenario for 4 and 7 nodes, over varying number

of rounds and ﬁxed partitions (=2). Each measurement is repeated for 100 randomly selected scenarios.

the number of rounds. In some scenarios, LibTwins can quickly pull out the given number of

messages and ﬁnish the scenario in a timely manner. In other scenarios, we might end up

with partitions where the nodes are not able to make progress and advance rounds, due to

frequent round failures and increased timeout values. Some scenarios may take longer to run,

waiting for the network to emit enough messages to conclude the scenario. The execution of

scenarios has negligible (< 0.1%) memory and CPU footprints.

6.3 Running Scenarios at Scale

We evaluate LibTwins at scale, by running it against the correct code of DiemBFT. We

executed 44M scenarios which were randomly selected from the 200M scenarios corresponding

to the third row of Table 1 (that is, with 4 nodes, 2 partitions, 7 rounds, permuted with

replacement). We ﬁrst generated all the 200M scenarios and randomly selected 44M samples.

We ran the scenario generator in oﬄine mode so the scenarios are written to ﬁle rather than

being passed to the scenario executor. We then split the generated scenarios into 20 shards.

The scenarios can be easily sharded, as the scenarios are independent of each other—this

implies that subject to the availability of computing power to generate and execute scenarios,

LibTwins can be scaled up arbitrarily via sharding. We execute the sharded scenarios over

20 parallel instances of LibTwins on AWS. We use

t3.2xlarge

instances with 8 vCPUs, 2.5

GHz, Intel Skylake P-8175; 32 GB of RAM, and 300 GB of SSD storage. All machines run a

fresh installation of Ubuntu 18.04. We did not observe any safety violations.

7 Related Work

There are two typical approaches to validate distributed systems. The ﬁrst approach is to

oﬀer strong guarantees by building a fully veriﬁed system from the ground up [

], or to

show the absence or presence of bugs [

] by exhaustively enumerating the space

of system behaviors [5, 30] under systematically injected faults [3, 20].

Fully veriﬁed systems do not scale to systems deployed in the real world. Model checking

and exhaustive enumeration of distributed system faults (especially, Byzantine arbitrary

behavior) leads to state explosion (despite partial order reduction techniques [

]), resulting in

low performance. This motivates the second approach of random validation, which underlies

the discipline of Chaos Engineering, exempliﬁed by systems like Chaos Monkey [

]. The

main idea is to analyze the resiliency of a distributed system by randomly injecting faults (e.g.,

terminating processes). Turret [

] reﬁnes this idea by focusing on performance attacks. It

runs an attack-ﬁnding algorithm using diﬀerent strategies, ranging from simple brute-force to

C V I T 2 0 1 6

23:16 Twins: BFT Systems Made Robust

more sophisticated “greedy search” algorithms. Jepsen [

] is a blackbox analysis framework

that runs processes with a random, auto-generated workload and randomly injected network

partitions. A related approach is to subject the system being evaluated to trials by ﬁre such

as Cosmos Game of Stakes [

], i.e., ﬁnancially incentivizing the community to attack the

‘mock’ network, and analyzing successful attacks to harden the network. Random validation

is eﬀective and scalable—but it is not comprehensive or reproducible, and cannot be used to

evaluate distributed systems in an ongoing fashion.

Prior work (with the exception of Jepsen) focused on crash faults. Twins is a new,

principled approach to validate BFT systems by emulating Byzantine behavior via twins—

copies of ‘compromised’ nodes that can send duplicate or conﬂicting messages. Twins

advances state-of-the-art by providing a framework to systematically generate scenarios

with conﬁgurable coverage, and only modeling correct executions (thus avoiding the state

explosion problem associated with formal methods). We show with extensive evaluations

that Twins is suitable for evaluating real-world systems, and can be scaled up arbitrarily for

larger scenario coverage. Twins automatically generate scenarios that modify the interaction

of components with the environment, without opening the code.

8 Future Work & Conclusion

Twins is a novel approach to systematically analyze BFT systems. It provides coverage

for many, but not all, Byzantine attacks. The paper demonstrated anecdotal evidence of

coverage with respect to several known Byzantine attacks, and an implementation of Twins

for DiemBFT that exposes misconﬁguration and purposely injected logical bugs within

minutes. Many directions are left open for future extensions.

Theory of Twins coverage.

As mentioned in the Introduction, it is left open to rigorously

characterize the attacks that Twins can cover. In particular, we conjecture that Twins covers

all Byzantine behaviors in a class of protocols that have ‘full disclosure’: each message

includes a reference to its entire causal past and any source of non-determinism (such as local

coin ﬂips), and nodes act deterministically according to their causal past. It would seem that

this class of protocols is fully covered by Twins since the only possible attack by Byzantine

nodes is to select diﬀerent subsets of messages to report to diﬀerent targets. Similarly,

we conjecture that Twins can cover timing violations in a class of ‘lock-step’ synchronous

protocols. Increasing coverage of Twins in the settings we explore as well as others, and

providing a formal treatment of coverage remain interesting open challenges.

Checking additional properties.

A diﬀerent dimension for extension is the type of

guarantees which Twins scenarios. While this paper focused squarely on safety of the core

consensus protocol, the Twins approach can be extended to validate ancillary components of

BFT systems. For example, DiemBFT switches to a new set of nodes by committing a special

block that includes the new set of nodes and signals the reconﬁguration event. It would be

useful to investigate if Twins can cause a safety violation by creating an inconsistent node

change (i.e., parts of the network believe in diﬀerent nodes). Similarly, DiemBFT’s smart

contract execution engine is re-instantiated via a similar mechanism, and can be subjected

to a similar Twins-based attack.

Extending Twins implementation.

With respect to the concrete DiemBFT Twins

implementation presented in Section 5, several extensions are left for future work, including:

(i) tackling more than a pair of twins; (ii) detecting liveness violations; and (iii) implementing

process-level twins over TCP/IP.

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:17

References

Ittai Abraham, Guy Gueta, Dahlia Malkhi, and Jean-Philippe Martin. Revisiting Fast Practical

Byzantine Fault Tolerance: Thelma, Velma, and Zelma. arXiv preprint arXiv:1801.10022,

2018.

Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, and Maofan Yin. Sync HotStuﬀ:

Simple and Practical Synchronous State Machine Replication. In IEEE Symposium on Security

and Privacy, 2020.

Peter Alvaro, Joshua Rosen, and Joseph M. Hellerstein. Lineage-Driven Fault Injection. In

SIGMOD International Conference on Management of Data, 2015.

Inc. Amazon Web Services. AWS Whitepapers.

https://aws.amazon.com/whitepapers

, 2017.

Christel Baier and Joost-Pieter Katoen. Principles of Model Checking (Representation and

Mind Series). The MIT Press, 2008.

Ethan Buchman. Tendermint: Byzantine Fault Tolerance in the Age of Blockchains.

https:

//cdn.relayto.com/media/files/LPgoWO18TCeMIggJVakt_tendermint.pdf, 2016.

Ethan Buchman, Jae Kwon, and Zarko Milosevic. The Latest Gossip on BFT Consensus.

arXiv preprint arXiv:1807.04938, 2018.

Vitalik Buterin and Virgil Griﬃth. Casper the Friendly Finality Gadget. arXiv preprint

arXiv:1710.09437, 2017.

Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance. In USENIX

Symposium on Operating Systems Design and Implementation, 1999.

Ang Chen, W Brad Moore, Hanjun Xiao, Andreas Haeberlen, Linh Thi Xuan Phan, Micah

Sherr, and Wenchao Zhou. Detecting Covert Timing Channels with Time-Deterministic Replay.

In USENIX Symposium on Operating Systems Design and Implementation, pages 541–554,

2014.

Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou, and Boon Thau Loo. The Good,

the Bad, and the Diﬀerences: Better Network Diagnostics with Diﬀerential Provenance. In

ACM SIGCOMM Conference, 2016.

12 Cosmos. Cosmos Game of Stakes, 2018. https://github.com/cosmos/game-of-stakes.

13 Diem. DiemBFTBFT. https://github.com/diem/diem.

Patrice Godefroid, J. van Leeuwen, J. Hartmanis, G. Goos, and Pierre Wolper. Partial-Order

Methods for the Veriﬁcation of Concurrent Systems: An Approach to the State-Explosion

Problem. Springer-Verlag, 1996.

Mohammad M Jalalzai, Jianyu Niu, and Chen Feng. Fast-hotstuﬀ: A fast and resilient hotstuﬀ

protocol. arXiv preprint arXiv:2010.11454, 2020.

16 Jepsen. Distributed Systems Safety Research. https://jepsen.io.

Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. Zyzzyva:

Speculative Byzantine Fault Tolerance. In ACM SIGOPS Symposium on Operating Systems

Principles, 2007.

Leslie Lamport. The Temporal Logic of Actions. ACM Transactions on Programming Languages

and Systems, May 1994.

Leslie Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM Transac-

tions on Programming Languages and Systems, 4:382–401, 1982.

Hyojeong Lee, Jeﬀ Seibert, Endadul Hoque, Charles Killian, and Cristina Nita-Rotaru. Turret:

A platform for automated attack ﬁnding in unmodiﬁed distributed system implementations. In

2014 IEEE 34th International Conference on Distributed Computing Systems, pages 660–669.

IEEE, 2014.

Chia-Chi Lin, Virajith Jalaparti, Matthew Caesar, and Jacobus Van der Merwe. DEFINED:

Deterministic Execution for Interactive Control-Plane Debugging. In USENIX Technical

Conference, 2013.

J-P Martin and Lorenzo Alvisi. Fast Byzantine Consensus. IEEE Transactions on Dependable

and Secure Computing, 3(3):202–215, 2006.

C V I T 2 0 1 6

23:18 Twins: BFT Systems Made Robust

Atsuki Momose and Jason Paul Cruz. Force-Locking Attack on Sync Hotstuﬀ. IACR Cryptology

ePrint Archive, 2020.

24 Netﬂix. Chaos Monkey. https://netflix.github.io/chaosmonkey/.

Filip Niksic. Combinatorial Constructions for Eﬀective Testing. Doctoral thesis, Technische

Universität Kaiserslautern, 2019.

Santhosh Prabhu, Kuan Yen Chou, Ali Kheradmand, Brighten Godfrey, and Matthew Caesar.

Plankton: Scalable Network Conﬁguration Veriﬁcation Through Model Checking. In USENIX

Symposium on Networked Systems Design and Implementation, 2020.

Basil Cameron Rennie and Annette Jane Dobson. On Stirling Numbers of the Second Kind.

Journal of Combinatorial Theory, 7(2):116–121, 1969.

The Diem Team. State Machine Replication in the Libra

Blockchain.

https://developers.libra.org/docs/assets/papers/

libra-consensus-state-machine-replication-in-the-libra-blockchain/2019-11-08.

pdf, 2019.

Yang Wu, Mingchen Zhao, Andreas Haeberlen, Wenchao Zhou, and Boon Thau Loo. Diagnosing

Missing Events in Distributed Systems with Negative Provenance. ACM SIGCOMM Computer

Communication Review, 44(4):383–394, 2014.

Maysam Yabandeh, Nikola Knežević, Dejan Kostić, and Viktor Kuncak. Predicting and

Preventing Inconsistencies in Deployed Distributed Systems. ACM Transactions on Computer

Systems (TOCS), 28(1):1–49, 2010.

Maofan Yin, Dahlia Malkhi, Michael K Reiter, Guy Golan Gueta, and Ittai Abraham. Hotstuﬀ:

BFT Consensus in the Lens of Blockchain. arXiv preprint arXiv:1803.05069, 2018.

Maofan Yin, Dahlia Malkhi, Michael K Reiter, Guy Golan Gueta, and Ittai Abraham. Hotstuﬀ:

BFT Consensus with Linearity and Responsiveness. In ACM Symposium on Principles of

Distributed Computing, 2019.

A Overview of DiemBFT

We now shift our attention to utilizing Twins for validating BFT replication in DiemBFT [

We discuss our implementation and evaluation of Twins for DiemBFT in Sections 5 and 6. In

this section, we provide an overview of DiemBFT (for details, see the technical report [

]).

DiemBFT operates in a round-by-round manner, electing leaders in each round among

the nodes to balance node participation. Rounds are slightly diﬀerent from conventional

“views” because it takes multiple rounds to reach a decision, but leaders are rotated in each

round. The leader protocol is quite simple. A leader proposes an extension to the longest

chain of requests that it knows already. Usually leaders collect batches of requests to propose,

referred to as blocks, hence the DiemBFT protocol forms a chain of blocks (or a blockchain).

Nodes vote for a proposed block, unless it conﬂicts with a longer chain that they believe may

have reached consensus already. Nodes send their votes to the next leader to help the leader

learn the longest safe chain. If there are three consecutive blocks in the chain,

k+1

k+2

, which are proposed in consecutive rounds,

k+1

k+2

, and each block has votes

from 2

+ 1 nodes (gathered in a data structure called the quorum certiﬁcate, or QC), then

the protocol has reached consensus on block B

If 2

+ 1 send votes to the next leader in a timely manner, a QC is formed by the leader

and it sends the next proposal. Nodes maintain a timer to track progress. When the timer

expires and a node still has not received a proposal, it broadcasts a timeout vote on a Nil

block. When a node gathers enough timeout votes to form a timeout certiﬁcate, it advances

its round. Every time a round fails, timeout periods are increased, allowing lagging nodes to

catch up and enabling the protocol to eventually reach a decision.

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:19

As brieﬂy alluded to in the Introduction, the trickiest part of BFT replication is to manage

leader transition. DiemBFT maintains four parameters to ensure safety, and at the same time

facilitate progress: (i)

current_round

, the node’s current round; (ii)

last_voted_round

, the

last round for which the node voted; (iii)

parent_round

, the round of the block certiﬁed by

the QC attached with the block being processed; (iv)

grandparent_round

, the parent of the

block certiﬁed by the QC; and (iv)

preferred_round

, the highest known grandparent round.

Note that as a QC serves as a pointer to the previous certiﬁed block,

parent_round

and

grandparent_round

do not need to be explicitly tracked; these can be derived from the QC

carried by a block.

Upon Receiving a Proposal.

. Upon receiving proposal for a block, a node processes the

certiﬁcates it carries, and votes for the proposed block if it satisﬁes a simple voting rule: If a

node voted for

k+2

, it prefers the sub-tree of proposals rooted at block

(regardless of

round numbers). A node will not vote for a block

that does not belong to its preferred

sub-tree rooted at

, unless

’s parent has votes from 2

+ 1 nodes at a higher round than

. Concretely:

Safety Rule 1. The block_round is greater than last_voted_round.

Safety Rule 2.

The block’s

parent_round

is greater than or equal to

preferred_round

If the node decides to vote for the proposed block, it updates its state as follows:

Update Rule 1. Update last_voted_round to round of the proposed block.

Update Rule 2.

Update the node’s

preferred_round

to the proposed block’s

grandparent_round

if the latter is higher.

Update Rule 3.

Update the node’s

current_round

to the

parent_round

+1, if the

latter is higher.

Upon Receiving a Vote.

. For every round, the nodes send their votes to the leader of the

next round. When the leader receives a vote, it performs the following safety checks:

Safety Rule 3.

If a vote from the same node was previously received for the same block

and round, the leader rejects the vote and generates a ‘duplicate vote’ warning.

Safety Rule 4.

If a vote from the same node was previously received for a diﬀerent

block but same round, the leader rejects the vote and generates an ‘equivocating vote’

warning.

If a vote passes both these checks, the leader considers it as valid and checks if it has

enough votes to form a QC. When a QC has been formed, the leader generates a new round

event, broadcasts a new block proposal and updates its state.

Update rule 4.

When a leader gathers enough votes to form a QC, it broadcasts a new

proposal and increments current_round.

Spoiler alert: In our evaluation in Section 6.1, we are going to deliberately modify the

above rules. We will see that this enables safety violations that the Twins framework will

expose.

C V I T 2 0 1 6

23:20 Twins: BFT Systems Made Robust

round

Vote

Preferred

Commit

2f+1 votes

Figure 4 Consensus and preferred sub-trees in DiemBFT.

B LibTwins Implementation of Scenario Executor and Scenario

Generator

This section provides the Rust code for the two main functions of Twins,

execute_scenario

and

scenario_generator

. The code listings in Figure 5 and Figure 6 present simpliﬁed

Twins interfaces, i.e., we omit Rust-speciﬁc features such as explicit typing, details of error

messages returned, de-referencing, and managing variable ownership.

The scenario executor, implemented by

execute_scenario

(Figure 5), executes scenarios

generated by the scenario generator. This function takes as input the number of nodes and

twins, and the leaders and partitions for each round. It creates a network with the given

inputs, and starts running the protocol until the nodes have emitted a given number of

messages, which approximate the number of rounds for which the protocol has been run.

The

execute_scenario

function exposes a simple interface, abstracting complex under-

lying network and SMR conﬁgurations. To demonstrate the simplicity and ﬂexibility of

execute_scenario

, we show how to implement a simple scenario (Figure 6) where no quo-

rum can be formed, and therefore no block gets committed. We set up a network with 4

honest nodes (

, n

3), and 1 twin (

twin

0). We split the network into two partitions

{n0, twin0, n

}

and

, n

}

. For each round

0 and

twin

0 (in partition 1) are leaders. We

then run the protocol for enough rounds (at least 3 in DiemBFT) to get a commit on a

block. In partition 1, both

0 and

twin

0 propose diﬀerent blocks for the same rounds.

will only vote for one of the two proposals because the second proposal is for a round that is

not greater than its

last_voted_round

(Safety rule 1, Section A). The second partition does

not have enough nodes to form quorum. Consequently, no blocks are committed.

C Detailed Safety Attack on Zyzzyva

We present a summary of Zyzzyva, and use Twins to reinstate a known safety attack [

] on

Zyzzyva [17]. We use the notation described in Section 2.

C.1 Summary of Zyzzyva

Zyzzyva is an SMR protocol in the same settings as DiemBFT (partial synchrony and

= 3

+1). It operates in a view-by-view manner. Each view has a designated leader. Nodes

vote on the leader proposal if they consider it valid (we describe the validity criteria below,

which has a ﬂaw that enables the safety attack). A commit decision on the leader proposal

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:21

fn e x ec ut e _s ce n ar io (

num_ nodes , // n um ber of no des

tar get_n ode s , // the nodes for which to cre ate t win s

round _p ar ti ti on s , // Vec tor of p ar ti ti on s for e ach rou nd

ro un d_ l ea de rs // V ector of lead er s for ea ch roun d

) {

l e t ru ntime = co ns e ns us_ ru n ti me () ;

l e t pl ay gr ound = N et wo r kP lay gr o un d :: new ( ru ntime . ha nd le () ) ;

// Sta rt nod es and t win s

l e t nod es = SM RN od e :: st ar t _n u m_ n od e s_ w it h _t w in s (

num_ nodes ,

& t arg et _no des ,

& playgrou nd ,

ro un d _p ro pos er s

);

// Creat e pa rt it io ns

cr ea t e_ par ti t io ns (& pl ayg rou nd , r ou nd_ pa rti ti o ns );

// Sta rt ru nning the pr ot oc ol and sendi ng m es sa ges

block _o n (asyn c move {

l e t pr op os al s = p la yg ro un d

. wa it_ fo r _m es s ag es (2 , N e tw ork Pl a yg ro u nd :: pro p os al s_o nl y :: < Payload >)

. await ;

// Pull en ough vot es to get a c ommit on the firs t bloc k

l e t vot es : Vec < Vote Msg > = pl ay gr ou nd

. wa it_ fo r _m es s ag es ( num_nod es * num _o f_r ou nds , Ne t wo rk P la ygr ou n d :: votes_ on ly

:: <Payload >) )

. c ol lect () ;

}) ;

// Che ck that the branc he s are consis te nt at all hei gh ts

l e t al l_b ra nc he s = vec ![];

for i in 0.. nodes .len () {

nodes [i ]. comm it _ cb _r e ce i ve r . clo se () ;

l e t no de_ co mm it s = vec ![];

while l e t no d e_ co mmi t_ id = nodes [ i ]. com mi t _c b_r ec e iv er . try _n ex t () {

no de _c omm it s .p ush ( no de_ co mm i t_ id );

}

al l_ br anc he s .p ush ( no de _co mm it s );

}

ass ert !( is_s af e ( al l_ br an che s )) ;

// Stop all node s

for eac h_ no de in nodes {

ea ch_nod e . stop () ;

}

Figure 5 The execute_scenario function which executes scenarios.

forms in either of two tracks, fast and two-phase. In the fast track, all

nodes vote for the

leader proposal to commit it. In the two-phase track, 2

+ 1 nodes form a commit-certiﬁcate

(CC), then 2f + 1 nodes vote for the CC to commit the proposal.

At the beginning of the view, nodes send the new leader a signed NEW-VIEW status

message. The leader’s ﬁrst proposal carries the status of 2

+ 1 nodes at the beginning of the

view to prove the proposal validity. The (ﬂawed) deﬁnition in Zyzzyva for a valid proposal

upon view change is as follows. For each sequence slot:

Validity Rule 1

The leader picks among the states of 2

+ 1 nodes, the

from the

highest view, if one exists.

Validity Rule 2

Otherwise, the leader picks a proposal that has

+ 1 votes from the

highest view, if one exists.

Validity Rule 3 Finally, if none of the above exist, the leader creates a Nil proposal.

The ﬂaw is to prioritize Validity Rule 1 over Validity Rule 2, which causes the leader to

C V I T 2 0 1 6

23:22 Twins: BFT Systems Made Robust

fn t w in s _n o _q u or u m_ s ce nar io () {

l e t ru ntime = co ns e ns us_ ru n ti me () ;

l e t pl ay gr ound = N et wo r kP lay gr o un d :: new ( ru ntime . ha nd le () ) ;

l e t nu m_ no de s = 4;

// 4 ho nest nod es

l e t n0 = 0 , n1 = 1, n2 = 2, n3 = 3;

// twin of n0

l e t twi n0 = n od e_ to_ tw in . get ( n0 ) ;

// twin of n1

l e t twi n1 = n od e_ to_ tw in . get ( n1 ) ;

// Ind ex # s of node s for which we wi ll cre ate t win s

l e t ta rge t_ no de s = vec ![0 ];

// Sp ecify round lea ders

l e t rou nd _l ead er s = Hash Ma p :: new () ;

for i in 1 ..1 0 {

// Inser t (n0 , twin0 , n3 ) as le ade rs for rou nd i

ro un d_ l ea de rs . i nse rt (i , vec ![ n0 , twin0 , n3 ]) ;

}

// Sp ecify round partit io ns

l e t rou nd _ pa rt i ti on s = H as hMap :: new () ;

for r in 0 ..1 0 {

// Inser t pa rt it io ns for rou nd r

ro un d _p ar t it io n s .ins ert (

r ,

vec ![

vec ![ n0 , twin0 , n1 ] ,

vec ![ n2 , n3 ] ,

);

}

ex ec u te _s c en ari o (

num_ nodes ,

& t arg et _no des ,

& r ou nd _p ar titio ns ,

& ro un d_ l ea de rs

);

}

Figure 6 Twins ‘No Quorum’ scenario.

prefer CC even if generated in a lower view than f + 1 votes.

C.2 Safety Attack on Zyzzyva

The Zyzzyva ﬂawed scenario safety demonstrated in [

] goes through a succession of three

views. In the ﬁrst view, a faulty leader generates conﬂicting proposals

, v

and splits honest

nodes between

+ 1 that vote for

and

that vote for

. The faulty leader gathers a

but does not send it to other nodes. In the second view, a good leader adopts

and drives agreement in the fast track. In the third view,

faulty nodes join the

+ 1

honest nodes that voted for

in the ﬁrst view. They send the leader a

for

, hence

the protocol proceeds with

, in conﬂict with the

commit. The attack on Zyzzyva needs

only

= 4 nodes, of which

= 1 is faulty, and it is fairly easy to re-instate using the Twins

framework. There are four nodes, (

D, E, F, G

). To model the case that

is Byzantine, it

has a twin

′

initialized with diﬀerent input. We drive the execution creating partitions and

electing leaders at each step, according to the attack described above. We describe below

the detailed attack using Twins.

Step 1 Initialize D and D

′

with diﬀerent inputs v

and v

Step 2 During View 1:

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:23

Create the following partitions: P

= {D, E, F }, P

= {D

′

, G}.

Let

run as leader for one round.

proposes

and gathers votes from

creating a CC.

Create the following partitions: P

= {E, F }, P

= {D

′

, G}, P

= {D}.

As a result, D does not get to share CC on v

with E and F .

Similarly, for one round let D

′

propose v

to P

and gather votes from P

Step 3 Delay all messages until a new view starts. View 2:

Create the following partitions: P

= {D

′

, E, G}, P

= {D, F }.

Run

as leader, and let it collect (NEW-VIEW) messages from

′

and

. Using

Validity Rule 2 (Appendix C.1), G decides to propose for v

Remove all partitions, i.e., P = {D, D

′

, E, F, G}.

G proposes v

, and collects votes from everyone. This leads to a commit of v

Step 4 Delay all further messages until new view starts. View 3:

Create the following partitions: P

= {D, E, F }, P

= {D

′

, G}.

Run

as leader, and collect (NEW-VIEW) messages from

and

. Note that

sends the

(from view 1) to

. Using Validity Rule 1 (Appendix C.1),

decides to propose v

proposes

, and gathers votes from

and

(who empty their local logs,

undoing v

). This leads both E and F to commit v

, a safety violation.

D Detailed Liveness Attack on FaB

We present a summary of FaB, and use Twins to reinstate a known liveness attack on FaB [

We use the notation described in Section 2.

D.1 Summary of FaB

FaB is a single-shot consensus protocol for the partial synchrony setting with n = 3f + 1.

A precursor to Zyzzyva, FaB is a view-based protocol with an optimistic fast track. A

leader drives a decision in the fast track if all nodes vote for it, and in the two-phase track

if 2

+ 1 nodes vote for a (2

+ 1) commit-certiﬁcate (

). When a new leader is elected,

it picks a valid proposal that does not conﬂict with neither

+ 1 votes nor a

in the

previous view.

FaB is actually designed for a parameterized model with

= 3

+ 2

+ 1, with safety guaranteed against

Byzantine failures and fast track guaranteed against

. For brevity and uniformity, we ignore

here

and set t = 0.

C V I T 2 0 1 6

23:24 Twins: BFT Systems Made Robust

D.2 Liveness Attack on FaB

The (ﬂawed) selection criterion above leads an execution in the following scenario to become

stuck. A faulty leader equivocates and proposes

, v

to 2

+ 1 and

honest nodes,

respectively. In transitioning to the next view, there is a commit-certiﬁcate for

and

+ 1

votes for

(including an equivocation by one faulty), hence neither is safe, and the new

leader is stuck. The attack on FaB needs only

= 4 nodes, of which

= 1 is faulty, and

it can be easily re-instated using Twins. There are four nodes, (

A, B, C, D

) with

as a

Byzantine node, for which we create a twin

′

initialized with diﬀerent input. We describe

below the attack using Twins.

Step 1 Initialize D and D

′

with diﬀerent inputs v

and v

Step 2 During View 1:

Create the following partitions: P

= {A, B, D}, P 2 = {C , D

′

}

Run D as leader for one round. D proposes v

to P

which decides to vote on v

Insert the following rule in

: (

B, D

)

→ A

. That is, the only messages allowed are

those from B and D, to A.

and

send their votes which only reach

. Thus, only

produces a

for

Meanwhile, the leader D

′

proposes v

to P

Step 3

Delay all further messages until new view starts. Create the partitions:

{A, C, D

′

}

{B, D}

. Let the new leader

collect NEW-VIEW status messages from

. These status

messages block

from proposing both

and

due to the FaB proposal validity rule.

The rule states that a proposal is valid if it does not conﬂict with neither

+ 1 votes nor

in the previous view, which is not the case for

(has a

) and

(has

+ 1

votes) as described below:

From A, the NEW-VIEW message contains the value v

, and a CC for it.

From C, the NEW-VIEW message contains the value v

, and no CC.

From D

′

, the NEW-VIEW message contains the value v

, and no CC.

E Detailed Liveness Attack on Sync HotStuﬀ

We present a summary of Sync HotStuﬀ, and use Twins to reinstate the force-locking

attack [

] on a preliminary version of Sync HotStuﬀ (which was ﬁxed in an updated version).

We use the notation described in Section 2.

E.1 Summary of Sync HotStuﬀ

The preliminary version of Sync HotStuﬀ [

] is an SMR solution in the synchronous model

with n = 2f + 1 parties.

In synchronous protocols like Sync HotStuﬀ, nodes execute the protocol in terms of ∆,

which is the known bound assumed on maximal network transmission delay. Sync HotStuﬀ

operates in a view-by-view regime —in each view there is a designated leader which proposes

The description here covers the ﬁrst of three variants in that paper; two other variants are designed for

slightly diﬀerent synchrony assumptions, but the attacks on them are similarly covered by the Twins

approach.

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:25

values to nodes. If a node accepts the proposed value, it broadcasts its vote. A node creates

a commit certiﬁcate (

) for a proposed value if it receives

+ 1 votes on it. Nodes track

the highest

, and only vote on a proposed value if it: (i) extends the highest

known

to the node, and (ii) does not equivocate another value proposed for the same height.

A node creates and broadcasts a blame against a leader: (i) if the leader does not propose

a value for 3∆, or (ii) the leader proposes an equivocating value. If a node observes

+ 1

blames against the leader in the current view, it broadcasts the

+ 1 blames, then waits ∆

(to allow the blames to reach all honest nodes), and moves to the new view. In the new view,

it immediately sends the new leader the highest CC it knows of.

After a view change, the new leader waits for ∆ to receive node status messages (carrying

the highest

known to them). The leader then proposes a value that extends the highest

from among the received status messages. Nodes proceed in the new view as previously

described.

E.2 Implementing Synchrony Attacks in Twins

Due to the synchronous settings and the nature of the attack which heavily leverages

synchrony assumptions, in this case a Twins scheduler must control message delivery timing.

More precisely, rather than only specifying whether a message is delivered to a party or

dropped, attacks on synchronous protocols require the Twins scheduler to deliver messages

to speciﬁc parties at speciﬁed times. While this is captured by the Twins approach, our

current implementation (Section 5) does not support this feature (this will be implemented

in future Twins extensions).

Generally, we expect that the granularity of the scheduler timing can be fairly coarse.

In particular, there is a known parameter ∆, the bound presumed by the algorithm on

message transmission delays and hard-coded into it. Indeed, the force-locking attack needs

to deliver messages at 0

5∆ increments, e.g., at times 0

5∆

∆

5∆

0∆

, ..

. Therefore, a

Twins network emulator could operate in discrete lock-step at 0

5∆ increments. With this

capability in place, the force-locking attack can be re-instated in the Twins approach as

described below.

E.3 Safety Attack on Sync HotStuﬀ

We now rebuild the force-locking attack on the preliminary version of Sync HotStuﬀ using

Twins. The crux of the attack is for a faulty leader to generate a last-minute proposal that

reaches only half of the honest nodes. The other half trigger a view change, and now the

system becomes split. The ﬁrst half continues to commit the ﬁrst leader proposal with “help”

from Byzantine nodes. The second half starts a new view and fork the chain. This attack

can be reinstated with Twins using 5 nodes (

A, B, C, D, E

), of which (

A, B

) are faulty and

have twins (A

′

, B

′

Notation

. We extend the notation described in Section 2 to capture message transmission

in the synchronous setting as follows:

−→

′

denotes the transmission of a value

from a

set of nodes

that generate the value at time

, to a set of nodes

′

that receive the value at

time

′

. If a value is broadcast, we use the

⋆

symbol instead of a set: For example,

−→

⋆

means that

broadcasts a value

at time

. Additionally, to highlight the ‘send’ or ‘receive’

action on a value, we use bold text on the left or right side of the arrow, respectively. For

example, S

−→

′

means that S sends v to S

′

(message arrival time is not known).

To reinstate this attack with Twins, we deploy 5 nodes (

A, B, C, D, E

), of which (

A, B

) are

faulty and have twins (

′

, B

′

). Here,

= 5,

= 2, and quorum size is 3 (since synchronous

C V I T 2 0 1 6

23:26 Twins: BFT Systems Made Robust

BFT protocols tolerate f Byzantine nodes for n = 2f + 1). We describe below the detailed

attack using Twins.

At time 1.5∆ :

is the leader, and broadcasts a proposal with

delay

= ∆ for the value

which

extends v

(A)

1.5∆

propose(v

)

−−−−−−−→

⋆

At time 2.5∆ :

C receives V

, and broadcasts its vote.

(A)

1.5∆

propose(v

)

−−−−−−−→

(C)

2.5∆

(C)

2.5∆

v ote(v

)

−−−−−→

⋆

At time 3∆ :

blames

since it did not receive a proposal from

within 3∆. Twins (

′

, B

′

) also

did not receive a proposal from

, hence they also blame with

. (

D, A

′

, B

′

) broadcast

their blames with delay = 0, receive f + 1 blames from each other, and start waiting

for ∆.

(D, A

′

, B

′

)

3∆

blame(A)

−−−−−−→

⋆

(D, A

′

, B

′

)

3∆

blame(A)

−−−−−−→

(D, A

′

, B

′

)

3∆

At time 3.5∆ :

receives

’s vote on

, but it cannot create a

since it has less than

+ 1

votes.

(C)

2.5∆

v ote(v

)

−−−−−→

(D)

3.5∆

(

A, B

) broadcast their votes on

, which arrive at

with delay 0. As a result,

gathers f + 1 votes on v

and creates CC(v

(A, B)

3.5∆

v ote(v

)

−−−−−→

⋆

(A, B)

3.5∆

v ote(v

)

−−−−−→

(C)

3.5∆

At time 4∆ :

receives

+ 1 blame messages from (

D, A

′

, B

′

), broadcasts all blame messages, and

starts waiting for ∆.

(D, A

′

, B

′

)

3∆

blame(A)

−−−−−−→

(C)

4∆

(C)

4∆

blame(A)

−−−−−−→

⋆

has waited for ∆ since it quit the old view

with leader

, so it starts the next

view

+ 1 and sends its highest commit certiﬁcate

(

) along with

+ 1 blames

on A to the next leader B, with delay = 0.

(D)

4∆

CC(v

),blame(A)

−−−−−−−−−−−→

(B)

4∆

The new leader B receives CC(v

) from D and f + 1 blames on A, and broadcasts a

proposal for value v

′

extending V

. Note that B does not know about CC(v

(D)

4∆

CC(v

),blame(A)

−−−−−−−−−−−→

(B)

4∆

(B)

4∆

propose(v

′

)

−−−−−−−−→

⋆

receives the proposal

′

from

, and broadcasts its vote with delay ∆, then it sets

its commit timer to 2∆ and starts counting down.

(B)

4∆

propose(v

′

)

−−−−−−−−→

(D)

4∆

(D)

4∆

v ote(v

′

)

−−−−−−→

⋆

At time 4.5∆ :

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:27

receives votes on

from (

A, B

); as it has now gathered

+ 1 votes on

it creates

CC(v

). However, this certiﬁcate is too late, as we will see in the following steps.

(A, B)

3.5∆

v ote(v

)

−−−−−→

(D)

4.5∆

At time 5∆ :

has waited for ∆ since it quit the old view with leader

, so it starts the next view

w + 1 and sends its highest certiﬁcate CC(v

) to the new leader B.

(C)

5∆

CC(v

)

−−−−−→

(B)

receives

’s vote on

′

but does not vote since

′

(which extends

(

)) does

not extend its highest certiﬁcate CC(V

(D)

4∆

v ote(v

′

)

−−−−−−→

(C)

5∆

At time 6∆ :

commits

′

since it ﬁnished waiting for 2∆ and observed no equivocation or blame

in the view w + 1. However, D’s highest certiﬁcate is CC(v

) (see time 4.5∆).

Now if the current leader

goes oﬄine, this will result in a view change to view

+ 2

and the new leader will extend the blockchain from the highest certiﬁcate from the

previous view,

(

). But

has committed

′

conﬂicting with

, hence safety is

violated.

F Attack on Fast-HotStuﬀ

We present a safety attack against Fast-HotStuﬀ [15] and express it using Twins.

F.1 Summary of Fast-HotStuﬀ

Fast-HotStuﬀ is essentially HotStuﬀ [

] with a 2-phase commit rule. In the happy-path, if

the leader of round

is successful, the leader of round

+ 1 performs the same protocol as

HotStuﬀ, namely, it collects a QC from previous round and embeds it in the

+ 1 proposal.

In the unhappy-path, if the leader of round

is unsuccessful, the protocol for leader

+ 1

(

x >

0) provides a proof in the

+ 1 proposal that it is using the highest QC from 2

+ 1

validators. This proof incurs quadratic communication complexity. Moreover, Fast-HotStuﬀ

claims it does not require consecutive rounds in order to commit.

The beneﬁts of Fast-HotStuﬀ are twofold. It provides a fast 2-phase track for HotStuﬀ

whenever the leader is successful in obtaining a QC for the previous round (happy-path).

Fast-HotStuﬀ is also faster both in phases (2 phases instead of 3) and in getting to a scenario

that guarantees progress, namely, it requires 3 consecutive honest leaders (instead of 4).

Requiring a leader proof for the unhappy-path prevents a proposal that conﬂicts with an

uncommitted and unlocked tail of a chain that already has a QC. Thus, dishonest leaders

cannot intentionally slow down progress by overriding the latest tail.

Fast-HotStuﬀ is however ﬂawed as explained in Appendix F.2.

F.2 Safety Attack on Fast-HotStuﬀ

Figure 7 illustrates the safety attack against Fast-HotStuﬀ that we implement using Twins.

There are four nodes (

A, B, C, D

) all of which are honest—the safety attack can be executed

leveraging only network partitions. Blocks are represented by rectangles (which are annotated

with the nodes that receive the block). Block proposers are indicated as ‘authors’. Diamonds

C V I T 2 0 1 6

23:28 Twins: BFT Systems Made Robust

round: 3

author: A

round: 4

author: A

round: 5

author: B

round: 8

author: B

round: 9

author: B

round: 6

author: A

round: 7

author: C

round: 10

author: C

round: 11

author: C

QC QC QC

QCQC

QC QC

A, B, C, D

A, B, D

A, B, C, D B

A, C, D C

A, C, D A, C, D

Figure 7 Example of safety attack on Fast-HotStuﬀ.

refers to QCs (which are embedded into blocks). The arrows indicate the block that a QC

refers to.

We execute the safety attack in 11 rounds starting at round 3 (rounds 2 carries QC for

the genesis block).

Round 3: Initially there are no partitions, i.e., P = {A, B, C, D}.

proposes a block. Nodes send their votes on this proposal to the leader of the next

round, node A.

Round 4:

gathers votes from the previous round, forms a QC, and includes the QC in a new

block proposal. Nodes send their votes on the new proposal to the leader of the next

round, node B.

Round 5: Set node B as leader, i.e., P = {A, B, C, D}.

gathers votes from the previous round, forms a QC, and includes the QC in a new

block proposal.

Create the following partitions: P

= {A, C, D} and P

= {B}.

The partitions prevent

from broadcasting the new block (and the newly formed QC

it embeds). B is thus the only node knowing the QC certifying the block of round 4.

Nodes of

time out, and send a NEW-VIEW message to the leader of the next round

(node A) containing their highest known QC.

Round 6: Set node A as leader, i.e., P

= {A, C, D} and P

= {B}.

selects the highest QC from the NEW-VIEW messages (i.e., the QC certifying the

block of round 3), and embeds it in a new block proposal. All nodes of

vote on this

proposal and send their votes to the leader of the next round (node C).

Round 7: Set node C as leader, i.e., P

= {A, C, D} and P

= {B}.

gathers votes from the previous round, forms a QC, and includes the QC in a new

block proposal.

Create partitions: P

= {A, B, D} and P

= {C}.

These partitions prevent

from broadcasting the new block (and the newly formed

QC it embeds).

is thus the only node knowing the QC certifying the block of round

Nodes of

time out and send a NEW-VIEW message to the leader of the next round

(node B) containing their highest known QC.

Round 8: Set node B as leader, i.e., P

= {A, B, D} and P

= {C}.

selects the highest QC from the NEW-VIEW messages (i.e., the QC certifying the

block of round 4, presented by

), and embeds it in a new block proposal. All nodes

vote on this proposal and send their votes to the leader of the next round (node B).

Round 9:

S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching and D. Malkhi 23:29

gathers all votes from the previous round, forms a QC, and includes the QC in a

new block proposal.

Create partitions P

= {A, C, D} and P

= {B}.

The partitions prevent

from broadcasting its newly block(and the newly formed QC

it embeds).

is thus the only node knowing the QC certifying the block of round 8

and committing the block at round 4.

Nodes of

time out and send a NEW-VIEW message to the leader of the next round

(node C) containing their highest known QC.

Round 10: Set node C as leader, i.e., P

= {A, C, D} and P

= {B}.

selects the highest QC from the NEW-VIEW messages from the previous round

(the QC certifying the block of round 6, presented by

), and embeds it in its new

block proposal. The highest QC in the NEW-VIEW messages.

All nodes of

vote on this proposal and send their votes to the leader of the next

round (node C).

Round 11: Set node C as leader, i.e., P

= {A, C, D} and P

= {B}.

assembles votes from the previous round into a QC certifying the block of round 10,

thus committing the block of round 6.

The safety violation appears at round 11 when node C commits the block of round 6 while

node

previously committed the block of round 4: both blocks have the same height and

fork from the block of round 3.

F.3 Implementation of the Attack

We implemented a Python simulator of Fast-HotStuﬀ using the discrete event simulator

simpy. We demonstrate the safety violation by running a manually-crafted scenario in the

simulator. We are open sourcing our Fast-HotStuﬀ simulator as well as our Twins scenario

used for the attack

https://github.com/asonnino/twins-simulator

C V I T 2 0 1 6