Cuckoo Cycle: Memory Bandwidth Bound PoW Mining

An algorithm that seeks to minimize computation in GPU PoW mining.

For some time now, the æternity community has been discussing several subjects, such as AE Token issuance (the tokens are just about to arrive), potential use cases, State Channels, and the oracle, along with a few more. One especially relevant subject, however, seems not to have received much attention so far: the Proof-of-Work (PoW) algorithm.

As many already know, the æternity blockchain will implement two algorithms: PoW (for mining) and PoS (for staking on forks). In this post, we’ll talk about æternity’s PoW algorithm, Cuckoo Cycle. We want to address some theories about one of the main topics in mining groups: the most efficient mining rig.

So far, æternity’s TestNet has not implemented Cuckoo Cycle, which is why the data we are sharing is based only on theoretical calculations. In the coming months, after the implementation, we’ll start running tests and share more accurate information.

According to the description by the algorithm’s developer:

Cuckoo Cycle is the first graph-theoretic proof-of-work, and the most memory bound, yet with instant verification.

He also mentions the fact that this algorithm “avoids the traditional ASICs arm race since it only takes a few dozen tiny siphash computing cores to saturate the DRAM memory bandwidth, at which point any further performance improvements will go to waste as the ASIC sits idle waiting for memory”.

In short, it will be possible to mine with your GPU, but your rig’s profit is bounded by its memory bandwidth.

In that regard, Cuckoo Cycle is rather similar to the Equihash algorithm used as the Zcash PoW. Both use bucket sort to find matches in large amounts of data. Where Zcash uses a million outputs of the blake2b hash function as random data, Cuckoo Cycle uses a billion outputs of siphash24. The most efficient Zcash GPUs, in terms of expected energy use to find one solution, appear to be the various incarnations of NVIDIA GTX 1080 and 1070, and it’s reasonable to assume those would also be the best performers on Cuckoo Cycle.
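To make the "find matches in hashed data" idea concrete, here is a toy sketch in Python. It is not the reference miner. It assumes each edge nonce is hashed twice to pick one node on each side of a bipartite graph, and that an edge can be discarded when one of its endpoints is shared with no other edge, since such an edge cannot lie on a cycle. Keyed BLAKE2b from the standard library stands in for siphash24, a hash-map counter stands in for the bucket sort, and the names `endpoint`, `make_edges` and `trim` are made up for this illustration.

```python
import hashlib
from collections import Counter


def endpoint(key: bytes, nonce: int, side: int, n_nodes: int) -> int:
    """Map (nonce, side) to a node index on one side of the graph.

    Keyed BLAKE2b is only a stand-in for siphash24, and hashing the value
    2 * nonce + side is an assumption of this sketch.
    """
    msg = (2 * nonce + side).to_bytes(8, "little")
    digest = hashlib.blake2b(msg, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_nodes


def make_edges(key: bytes, n_edges: int, n_nodes: int) -> dict:
    """Generate the pseudo-random edge list: nonce -> (u, v)."""
    return {
        nonce: (endpoint(key, nonce, 0, n_nodes), endpoint(key, nonce, 1, n_nodes))
        for nonce in range(n_edges)
    }


def trim(edges: dict, rounds: int) -> dict:
    """Alternate between the two sides, dropping unmatched edges each round."""
    alive = dict(edges)
    for r in range(rounds):
        side = r % 2
        # Count how many live edges touch each node on this side.
        degree = Counter(uv[side] for uv in alive.values())
        # Keep only edges whose endpoint is shared with another live edge.
        alive = {n: uv for n, uv in alive.items() if degree[uv[side]] > 1}
    return alive


# Toy size; the real graph has on the order of a billion hash outputs.
edges = make_edges(b"toy header key", n_edges=1 << 10, n_nodes=1 << 10)
survivors = trim(edges, rounds=8)
print(len(edges), "edges generated,", len(survivors), "left after trimming")
```

Each trimming round is a full pass over the remaining edges with very little arithmetic per edge, which is the kind of workload that ends up limited by memory bandwidth rather than by hashing speed.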

Where Cuckoo Cycle differs is that there is an alternative algorithm that uses an order of magnitude less memory while suffering nearly an order of magnitude slowdown. In fact, this is a simpler algorithm, but instead of being bandwidth bound, it is memory latency bound. Although RAM stands for random access memory, existing DRAM is in fact optimized for sequential access. Cuckoo Cycle thus encourages the development of memory that is truly random access, as it would make the simple algorithm not only more memory efficient, but more energy efficient as well.
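The latency-bound behaviour of the simple approach can be illustrated with a small sketch. This is an interpretation, not the reference code: it assumes the simple algorithm adds edges one at a time to a forest stored as a parent table and follows parent pointers from both endpoints to detect when a new edge closes a cycle. The function names and the node numbering (one side as 0, 1, 2, the other as 10, 11, 12) are hypothetical.

```python
def path_to_root(parent: dict, node: int) -> list:
    """Follow parent pointers; each step is a dependent random lookup."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def find_cycle_lengths(edges: list) -> list:
    """Return the length of every cycle closed while inserting edges."""
    parent = {}
    lengths = []
    for u, v in edges:
        pu = path_to_root(parent, u)
        pv = path_to_root(parent, v)
        if pu[-1] == pv[-1]:
            # Same tree: the new edge closes a cycle. Strip the shared tail
            # of both paths; what remains, plus the new edge, is the cycle.
            while pu and pv and pu[-1] == pv[-1]:
                pu.pop()
                pv.pop()
            lengths.append(len(pu) + len(pv) + 1)
            continue  # do not store the edge, so the table stays a forest
        # Different trees: re-root the shorter path, then link the trees.
        if len(pu) > len(pv):
            u, v, pu = v, u, pv
        for i in range(len(pu) - 1, 0, -1):
            parent[pu[i]] = pu[i - 1]
        parent[u] = v
    return lengths


# A 6-cycle on nodes 0, 1, 2 (one side) and 10, 11, 12 (other side).
six_cycle = [(0, 10), (1, 10), (1, 11), (2, 11), (2, 12), (0, 12)]
print(find_cycle_lengths(six_cycle))
```

Every step of `path_to_root` depends on the result of the previous lookup, so at real graph sizes the time is spent waiting on random memory accesses. That is why this variant needs less memory but is bound by latency instead of bandwidth.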

æternity is still exploring the prospect of applying Cuckoo Cycle and working closely with John Tromp to find the best path forward. Although GPU mining is preferable to ASIC mining in terms of centralization, we will continue to look for even more decentralized alternatives.

Q&A with John Tromp at Reddit

We are organizing a Q&A on r/aeternity with Mr. John Tromp, the creator of Cuckoo Cycle. You can learn more about how to participate in this post:

If you have any comments for us about this information, we will be happy to discuss them, here or on any of our channels.

GitHub | Reddit | Telegram | Twitter | Facebook | Mail

You can also visit the Cuckoo Cycle project page at GitHub.
