STARK @ Home: StarkWare’s Lightning-Fast Next-Gen Prover

StarkWare has recently open-sourced the next-generation STARK prover, Stwo. Estimated to be at least 100 times faster than Stone, our current prover, Stwo is poised to become the fastest validity prover available.

Join us and the team from Polygon at the next STARK @ Home event to delve into the mathematics driving Stwo’s performance:

Performance:

  • The impact of small field sizes on proof generation time
  • Real-world measurements of Stwo

Light Math:

  • GKR: Balancing round complexity and hash complexity
  • Mixed degree: Transitioning from one-size-fits-all to custom-tailored solutions
  • Circle STARK: Discovering recursive patterns in unexpected places

Deep Math:

  • A detailed exploration of the Circle STARK white paper

Transcript:

Eli Ben Sasson:
Hello. I guess my first question would be, can you see and hear us? Please write in the comments. Yes. Thank you, Patrick. Okay. So, terrific. And thank you, Mark. Okay. It’s always confusing with these things. So, my name is Eli Ben Sasson. I’m CEO at Starkware. And welcome everyone to another STARK at Home episode. We have here Ulrich Habock, an applied cryptographer from Polygon Labs, Shahar Papini and Andrew Milson from Starkware. And today, we’re going to discuss Circle STARK and Stwo. So, the schedule is going to be like this. We’ll start with a very brief introduction. We’ll hand it over to see some performance metrics and why at least we are very excited about what’s coming. And then, we’ll talk first a little bit gently about the math and innovation in there. And then, time permitting, we’ll go deeper and deeper into the math, as much as you have patience for that.
I urge you all to post questions. I think you can use the comments for it. So, here’s a question and from time to time we will try to answer them. So, about four years ago, we came out with our first production system and it was serviced by a prover that we call Stone. This prover was used to settle over a trillion dollars’ worth of volume on top of Ethereum and settle hundreds of millions of transactions. And it’s been working pretty nicely for the past four years or more in production.
And it is open source now and it works very well. But, now we’re actually building a new open source prover called Stwo. And its development is being done open source. We can share with you the link. And it’s going to be much, much better and much faster. And to explain what we mean by much better and much faster, I’m going to hand it over to Andrew Milson to share some numbers. And again, after that, after discussing the performance and also offering some interesting research and collaboration avenues, we’re going to try and explain why is it a little bit faster. So please, Andrew, take us through it.

Andrew Milson:
Awesome. Well thanks, Eli. And hey, everyone. So, yeah. What I’d like to speak about right now is a couple of things. I want to talk about the high-level design of Stwo, how we’ve gone about designing it so that it’s fast on multiple different architectures and different kinds of hardware. And then I want to talk about the performance compared to Stone. And at the end, I’d like to talk about some exciting directions, or some exciting things that we can collectively build as a community with Stwo. So, yeah. Let me just share my screen, actually. Okay. So, can everyone see that? Can everyone see my screen? Is that good?

Eli Ben Sasson:
I can answer, but yeah, let’s wait for the people from the audience.

Shahar Papini:
Yeah. The audience seems to see the screen.

Andrew Milson:
Okay. Cool. So, yeah. So, I want to start with the high-level design of Stwo. So, our goal is to make Stwo as fast as possible. And to do that, what we’ve done is we’ve written the protocol in Rust and then we’ve made it generic over a backend. And a backend is a type that defines two things: how it stores its data and how it operates on that data. And currently in Stwo, we actually have two backends. We originally wrote the protocol and we created a CPU backend, which was our naive implementation, where we wrote the backend without trying to optimize it for any specific kind of architecture or hardware. And that was good; we already saw a great performance increase over Stone from that alone.
But, then we created another backend, a SIMD backend. And for those who may not be familiar with SIMD, it stands for Single Instruction, Multiple Data, and it’s a class of machine instructions that, instead of targeting one piece of data, target multiple. So, an example would be an instruction that performs an addition on two numbers; with SIMD you can perform an addition on 16 numbers in more or less the same amount of time. So, when you utilize something like SIMD on a machine, you can get a really great performance increase. So, yeah. So, we built the SIMD backend and we saw a really great performance increase from that. So yeah, that’s a little bit about the high-level design of Stwo: we wrote the protocol and then you can plug in whatever backend you want, whatever hardware you want to support. And now I’d like to talk about the performance.
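As a rough picture of the backend abstraction described above, here is a minimal Rust sketch of a prover primitive that is generic over a backend. The trait, its methods, and the CpuBackend shown here are illustrative stand-ins, not Stwo’s actual API; the real traits live in the open-source Stwo repository.

```rust
// Sketch: protocol code written against a generic Backend trait. Each backend
// decides how a column of field elements is stored and how bulk operations
// on it are executed (scalar loop, SIMD lanes, GPU kernels, ...).

trait Backend {
    type Column; // backend-specific storage for a column of M31 elements
    fn from_vec(data: Vec<u32>) -> Self::Column;
    fn add_columns(a: &Self::Column, b: &Self::Column) -> Self::Column;
}

/// Naive element-by-element backend, analogous to the "CPU backend".
struct CpuBackend;

impl Backend for CpuBackend {
    type Column = Vec<u32>;

    fn from_vec(data: Vec<u32>) -> Self::Column {
        data
    }

    fn add_columns(a: &Self::Column, b: &Self::Column) -> Self::Column {
        const P: u64 = (1 << 31) - 1; // M31 modulus
        a.iter()
            .zip(b.iter())
            .map(|(&x, &y)| ((x as u64 + y as u64) % P) as u32)
            .collect()
    }
}

/// Protocol code only ever talks to the trait, so a SIMD or GPU backend can
/// be plugged in without touching this function.
fn pointwise_sum<B: Backend>(a: Vec<u32>, b: Vec<u32>) -> B::Column {
    B::add_columns(&B::from_vec(a), &B::from_vec(b))
}

fn main() {
    let sum = pointwise_sum::<CpuBackend>(vec![1, 2, 3], vec![4, 5, 6]);
    println!("{:?}", sum);
}
```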

Shahar Papini:
I’d just like to add that we do expect other backends to also be implemented for all kinds of hardware, like GPUs, FPGAs, whatever you want, basically.

Andrew Milson:
Absolutely. So, I don’t have the data on how fast you can prove Cairo with Stwo; we’re still working on that. But what I have here is just the single-thread FFT performance of Stwo, with both of our backends, compared to Stone. So, the blue line here at the bottom is Stone, and all of the dots at the top here are Stwo. And you’re going to have to believe me that the FFT is a big bottleneck for the prover: if you squint hard enough at the protocol, it’s just a bunch of FFTs and hashes and polynomial evaluations. So, improving the performance of FFTs and some of the other primitives translates directly into performance improvements for the entire protocol. So, I just want to say a couple of other things about this. These red and pink dots at the bottom, this is the CPU backend I just talked about. And all of the green, the orange, the purple, and the brown, these are all the SIMD backends.
Actually, something that maybe the engineers will find interesting: we originally wrote a backend for AVX-512 specifically. We just focused on AVX-512 instructions. And then we switched it out for Rust’s portable SIMD library and we saw no degradation in performance. And what we got along with that was not only an implementation for AVX-512 SIMD instructions on Intel machines, but also really good performance for AVX2, and even WebAssembly and the NEON instructions on an M1 Max. And the difference between the M1 Max performance, the native performance with NEON instructions, and WebAssembly … There’s only about a 30% loss in WebAssembly. I thought that was quite interesting. And also, if the log scale doesn’t make it clear, the difference between the Stone FFT performance and the Stwo FFT performance is 200x. So, it’s a real jump in performance.
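As a small sketch of the “write it once with portable SIMD” point: the same Rust code lowers to AVX-512, AVX2, NEON, or WebAssembly SIMD depending on the compilation target. This snippet assumes nightly Rust (the portable_simd feature), and the M31 addition below is a simplified illustration, not Stwo’s production code.

```rust
#![feature(portable_simd)]
use std::simd::u32x16;

const P: u32 = (1 << 31) - 1; // the Mersenne prime 2^31 - 1

/// Add 16 pairs of M31 elements in one shot. Inputs are assumed < 2^31.
/// The result stays in [0, P], where P is a redundant encoding of zero;
/// a real implementation would also normalize that edge case.
fn add_m31x16(a: u32x16, b: u32x16) -> u32x16 {
    let p = u32x16::splat(P);
    let sum = a + b; // < 2^32, so the u32 lanes cannot overflow
    // Since 2^31 ≡ 1 (mod P), fold the top bit back onto the low 31 bits.
    (sum & p) + (sum >> u32x16::splat(31))
}

fn main() {
    let a = u32x16::splat(P - 1);
    let b = u32x16::splat(5);
    // Each lane: (P - 1 + 5) mod P = 4.
    println!("{:?}", add_m31x16(a, b));
}
```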
And this is just the implementation of an FFT, but in Stwo we haven’t just optimized the implementation, we’re also using more optimal algorithms. So, yeah. So Shahar will touch on how we’re using GKR for things like lookups, and mixed degree, to get improved performance there. Okay, so that’s a little bit about the performance. And, yeah. The other thing I just want to say is that the backend is modular: you can create your own backend in your own repository and plug it directly into Stwo. And if anyone’s excited about this, I think it would just be so awesome to have more backends, as Shahar mentioned.
So, the backends can define how they store memory and operate on it. So, anything from an Nvidia GPU, or some crypto-specific hardware like a Fabric VPU, or even really fast implementations in the browser using WebGPU, can all be supported in Stwo, and people can create their own backends for this. And I think that’s really exciting. I think we’ll see more orders of magnitude of performance on top of the SIMD backend if we see these backends. So, I encourage anyone who’s interested in that to explore it, and feel free to send a message about that or anything like that. So, yeah. Anything you’d want to add, Eli, Shahar or Ulrich?

Eli Ben Sasson:
Do you see the question? There’s a question by Patrick Grinoway that Shahar already answered, but maybe you can answer it for the audience. “How should we interpret the throughput of the FFT? Does this directly translate to the performance of the LDE step?” Patrick-

Andrew Milson:
[inaudible 00:11:36] Yeah. Exactly. So, the throughput here is field elements per second. So, in terms of Stwo, this is the Mersenne field and in terms of Stone, it’s the field used with Cairo. But, yeah. So, that’s how we’re looking at throughput. But, for instance, let’s take Cairo for example. You’ll have an LDE over the addresses used in Cairo, which could be stored in a Mersenne element. So, although the field sizes are different, you don’t need a large field to store a lot of the things that you need in an [inaudible 00:12:22] for a VM or something like that. So, yeah. I just want to mention the throughput is for field elements, Mersenne vs the Cairo field.

Eli Ben Sasson:
Okay.

Shahar Papini:
Yes. I want to emphasize that the 200x you see here is, again, for field elements. And it really depends on the statement you’re trying to prove. For some statements, if you go to a smaller field, you don’t need to add more field elements; in some you do. So, it depends on the statement you’re trying to prove. But, like Andrew said, usually in VMs you get this 200x benefit, because you usually are not using big values in your elements.

Ulrich Habock:
I have a question, Shahar, because I don’t have a clue about either of your implementations. So, the Stone prover, is that in Rust?

Shahar Papini:
It’s C++.

Ulrich Habock:
C++ even. Okay. And does it also use SIMD?

Shahar Papini:
Well, it’s a good question. Using SIMD did not provide performance gains, because we are using another feature of the CPU, specifically of Intel CPUs, for these multiplications. It’s not vectorized, but it is using a more efficient instruction for the multiplication and carry propagation. It’s like [inaudible 00:14:13], something like that. So, for the big field, we did implement it as performantly as we could; using vectorization did not make it more performant in that case.

Ulrich Habock:
Okay. I’m not a developer, so I’m quite surprised. Couldn’t you have just scaled, say, across columns? That means, row-wise, you always do the same operation, even if the field is large?

Shahar Papini:
But, we don’t have the same operations in vectorized form that you have on single elements. Those single-element operations did more than the plain vectorized versions.

Ulrich Habock:
Okay. Thanks.

Shahar Papini:
Sure.

Eli Ben Sasson:
Okay. So Andrew, are we done with the performance discussion?

Andrew Milson:
Yeah. Yeah. I’m happy to pass it over now.

Eli Ben Sasson:
Okay. So, maybe let’s move on to talking gently and lightly about some of the math. We saw here the FFT numbers; I guess some of that is the effect of moving to a smaller field, and we’ll talk about that a little bit later. But, there are a number of other improvements that will go into Stwo; most notably, two of them are the use of GKR and mixed degrees. So, I asked Shahar in advance to prepare a gentle, non-math explanation of what each one of these is, why we should care, and how they will improve performance. So, let’s start with GKR. And I encourage the listeners, if things are not clear, please ask questions. Yeah. So Shahar, what is GKR? How should we think about it?

Shahar Papini:
Yeah. Okay. Okay. First of all, STARKs began in a really specific way. In STARKs, pretty much, there’s a table of values, you have some constraints on them, and that is what you prove. That is how we started, that’s how Stone works, and that’s fine. There are other proof systems, which are not STARKs, which use other kinds of mechanisms instead of this constraint system that is proven using STARKs. They use sumchecks, things like that. And the problem was that sumcheck needed some primitives that weren’t in STARKs. To be a bit more technical, STARKs use univariate polynomials and sumchecks need multilinear polynomials, that is, polynomials with a lot of variables but of degree one in each. And these things didn’t really fit together.
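For reference, the distinction in symbols (notation added here, not from the talk): a univariate polynomial looks like f(x) = a_0 + a_1 x + a_2 x^2 + … + a_d x^d, while a multilinear polynomial in, say, two variables looks like g(x_1, x_2) = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_1 x_2, with degree at most one in each variable, however many variables there are.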
However, it turns out they can work together, and that is exactly what we are doing in Stwo, and I’ll explain the benefits. Sumcheck enables this GKR thing. It’s a cool algorithm, a cool protocol for proving some computations. But, instead of, like in a STARK, taking your table of values and committing to it using a hash, hashing it and sending it as the proof … Instead, it uses multiple rounds between the prover and the verifier to prove some kind of arithmetic circuit: you have these inputs, you do some operations, layer after layer of operations, and you get some output.
So, it’s a trade-off: instead of committing to all these intermediate layers, it’s doing more rounds of pulling randomness from the verifier and sending things from the prover. That is the trade-off of GKR. Sometimes it’s worth taking, sometimes it’s not. It seems like the best way is to combine both: we have a single AIR, some of the witness we are committing to, and some of the computations on this witness we are proving using GKR. And we get this combination that makes our AIRs more efficient. Specifically, we took as an example proving the BLAKE hash. BLAKE is a hardware-friendly hash; it’s fast for hardware and for CPUs, but it’s not as efficient to use inside STARK proofs. But, using both GKR and regular STARKs, we were able to cut down the cost of the proof by about 2x, 3x, something like that. And that was enabled by this way to combine sumcheck, GKR and STARKs. And so, that is one thing we have in Stwo. I think Andrew worked on implementing it, so maybe he can also provide some details on its concrete performance. Yeah. Okay. That is one.
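To make the “more rounds instead of more commitments” trade-off a bit more concrete, here is a toy sumcheck loop in Rust over the Mersenne field M31: per round the prover sends just two field elements describing a linear polynomial, instead of committing to an intermediate layer. This is a sketch only; a real protocol derives the challenges via Fiat-Shamir and checks the final claim against a polynomial commitment, and none of this is Stwo’s actual code.

```rust
// Toy sumcheck: prove the sum of a multilinear polynomial over {0,1}^n,
// given its 2^n evaluations, with one round per variable.

const P: u64 = (1 << 31) - 1; // M31

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

fn main() {
    // 2^3 evaluations of a multilinear polynomial g(x1, x2, x3) on {0,1}^3.
    let mut evals: Vec<u64> = vec![3, 7, 1, 4, 9, 2, 8, 5];
    // The claim being proven: the sum of g over the whole boolean cube.
    let mut claim: u64 = evals.iter().fold(0, |s, &v| add(s, v));

    // Stand-in verifier challenges (Fiat-Shamir hashes in a real protocol).
    let challenges = [11u64, 29, 31];

    for &r in &challenges {
        let half = evals.len() / 2;
        // Round polynomial h(t) = sum over the remaining boolean variables of
        // g(t, rest). It is linear in t, so h(0) and h(1) determine it.
        let h0 = evals[..half].iter().fold(0, |s, &v| add(s, v));
        let h1 = evals[half..].iter().fold(0, |s, &v| add(s, v));
        assert_eq!(add(h0, h1), claim, "round check failed");

        // Verifier updates the claim to h(r) = h(0) + r * (h(1) - h(0)).
        claim = add(h0, mul(r, sub(h1, h0)));

        // Prover fixes the first free variable to r and halves the table.
        evals = (0..half)
            .map(|i| add(evals[i], mul(r, sub(evals[i + half], evals[i]))))
            .collect();
    }

    // Final check: claim must equal g(r1, r2, r3). Here we read it off the
    // folded table; in a real protocol it comes from a commitment opening.
    assert_eq!(evals.len(), 1);
    assert_eq!(claim, evals[0]);
    println!("sumcheck verified, final claim = {}", claim);
}
```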

Eli Ben Sasson:
How much does it save, this move to more rounds of interaction but fewer uses of hashes?

Andrew Milson:
So, what we saw is that with grand products, I think it was about a 2x performance improvement. And I think we expected it to be faster initially, but the FFT … So, previously we’d done lookups with FFTs and hashing, and those are so fast in Stwo that maybe we were expecting a 5x or a 10x, but it was really only a 2x, because in Stwo the FFT and hashing are just so fast anyway. With the lookup arguments, I think we see more like a 3x or 4x improvement. And these translate to big performance improvements overall, because something like BLAKE2 would have thousands of lookups.

Shahar Papini:
Yeah.

Eli Ben Sasson:
There’s a question here. Is there any significant … Again, Patrick. Patrick, thanks for asking an excellent question. “Is there any significant effect on prover memory usage with GKR?”

Shahar Papini:
Yeah. It’s a good question. There is some kind of trade off you can do there. If you do it naively, you still need to keep everything in memory. You can do some trade off that, for the intermediate values, you can keep only some of them in the memory at each point of time. But, then you pay with more rounds or more overhead later. So, if your main concern is memory, you can use this to reduce the memory. But again, it’s a trade-off-

Eli Ben Sasson:
But, aren’t they, all of them like sort of O of N, anyways?

Shahar Papini:
It depends. If you’re bounded by the maximum memory, then you don’t want to keep all of it together. Let’s say your witness is of size 100, but your intermediate [inaudible 00:22:24] computations are of size 400. So, instead of keeping 500 in memory, you could maybe reduce it to 200 at most. And by splitting the 400 into four parts, each one of size 100, you can do things like that. Okay-

Eli Ben Sasson:
[inaudible 00:22:45] Which hash are you using?

Shahar Papini:
We are using the BLAKE hash for the prover as well. We are proving a statement about BLAKE hashes, and the proof itself also uses the BLAKE hash for the commitments. That is to allow us to do recursive proofs. The reason we chose-

Eli Ben Sasson:
I guess I’ll ask, Shahar. Isn’t there a trade-off where it’s very efficient in terms of CPU to hash, but then very inefficient to recursively prove, because you use BLAKE?

Shahar Papini:
It is. It is. But okay. If your main concern is to reduce the overall proving time, and you assume that you have enough things to prove, then you can always … Okay. Because the verification is sub-linear, you can always take computations which are so big that they make the recursion insignificant anyway, even if it costs 2x or 3x. So, it still pays off if your intention is to reduce the overall cost of proving, if you want to make the prover as fast as possible. And another point is, assuming you are doing recursion, at the end, at the last proof, you can start to do recursive proofs that are more beneficial for the verifier, to make the final proof verifier-efficient, so you can somehow enjoy both worlds.

Eli Ben Sasson:
Okay. There was another question, now. A lot of very … Oops, sorry-

Ulrich Habock:
[inaudible 00:24:24] I ran into here, because I was trying to answer one of the questions; hope I did it. So, where are the places where GKR is a benefit versus a classical univariate strategy? The main benefit, as Shahar was pointing out, is … Or, let’s say, the trade-off you face is: on the one hand, a multi-round protocol with this many interactions, but intermediate results are not committed, so no hashing of intermediate states; versus a univariate strategy, where you commit all the intermediate states and use univariate techniques to prove it.
So, in particular, it’s the cost of the GKR protocol itself, which uses a lot of linear extrapolation steps; that’s the linear and multilinear protocol. That costs a lot, because you do it over large domains. But, you don’t commit, and in the end it’s really about, first of all, which hash function you use. Is it something like Poseidon, which is more costly as a primitive, or is it something like BLAKE, which is very fast as a primitive but more costly than Poseidon in circuit? Things like this, versus how fast the arithmetic steps are that you do in every single round of the GKR protocol.

Eli Ben Sasson:
So, there’s another question here. Logan, by the way, terrific questions. Keep them coming. “Does the fact that GKR is an interactive protocol lead to a downturn in performance, in memory?”

Shahar Papini:
The interaction is first of all a verifier penalty: the verifier needs to do a lot more hashes in its verification, for each round, basically. So, that’s the first drawback. In terms of performance and memory: the performance is actually better, assuming your computation is in the extension field. It’s a bit of a fine point. If the computations are in the base field, then there is still stuff you can do; it’s at the edge of research, we don’t have any of that yet in Stwo. But when the computation is in the extension field, then, like Andrew mentioned, the performance is better. The memory, naively, isn’t better, but you can do another trade-off that makes the memory a bit better, and you again pay more for the verifier and a bit more for the prover. So, you have two trade-off spaces you can play with in GKR, which generally gives better results.

Eli Ben Sasson:
I want to add here, and I think this was mentioned before, and Ulrich also wrote this: GKR or sumchecks are very useful when what you’re verifying or proving with them is a very shallow and very wide circuit. In the extreme case, it’s something like a lookup table, where basically it’s depth one, because the complexity grows with the depth of the circuit or the computation being made. So, probably for general-purpose sequential computation, GKR doesn’t work as well as using AIRs. But, for things like lookup arguments or logup arguments, and very, very shallow circuits, it works very well.

Shahar Papini:
Right. [inaudible 00:28:23]

Eli Ben Sasson:
Okay. “I remember it was claimed 1.6x speed-up in the paper for some memory issues. What is the obstacle for the claimed 100x speed-up in Stwo? Is that solved?” I’m not sure I understand exactly the question. Maybe you guys understand.

Shahar Papini:
It seems like you are mixing memory with runtime performance?

Eli Ben Sasson:
No. No. I think the 1.6 was in the white paper. I think there’s a comparison of Circle STARK to some other small field like BabyBear-

Shahar Papini:
BabyBear.

Eli Ben Sasson:
And here we’re talking about Stone, which is over a much larger field. Yeah, exactly. Oh, sorry. Ulrich has … So, 1.6 in the paper is compared to-

Shahar Papini:
BabyBear. Right.

Eli Ben Sasson:
BabyBear, which is another 32-bit prime, or 31-bit prime. Okay. So, let’s move on to the next improvement, which is: what do you mean by mixed degree? Give us some informal explanation of what’s going on.

Shahar Papini:
It’s not something that hasn’t been done before; some people did it, I think even Polygon had some version of it. But, basically, remember I said that in STARKs we usually have a table of values? Well, sometimes you can’t really fit your statement in this rectangular table of height by width. Sometimes you have columns of different lengths, for example. Maybe I’m doing 1,000 hashes, but only 200 multiplications, or something like that. I want things to not all be of the same degree, and I don’t want all the columns to be of the same length. So, what we can do, and what we do in Stwo, is basically have multiple … It’s like multiple tables of different heights, but we’re not committing to them separately. We’re committing to them together using Merkle tree shenanigans and FRI shenanigans. FRI, for example, is a protocol that is based on layers: you have this big layer and then you fold it to a smaller layer, big layer, smaller layer, and then you fold it to a smaller layer.
So, you can imagine that if I start with a layer of size, let’s say, 1,000, and then the next layer is of size 500, then at that jump I can add another column that is shorter. If I had a column of size 1,000 and another column of size 500, I can add the second one later in the FRI protocol, or later in the Merkle tree. So, that is what mixed degree means. A, it gives us better packing of the data we want inside the commitment we have, so we need to do fewer hashes and don’t waste commitment space. And, B, the degrees of all kinds of things, like constraints and polynomials in FRI, can be lower than they would have been before.
And that actually affects the performance directly, because for a constraint you need to do some work in the STARK, and this work is proportional to the total degree of the constraint. So, if you have a constraint that works on smaller-degree polynomials, and its total degree is smaller, then you do less work for it. Yeah. That is mixed degree, basically. Let’s see, are there any questions? Is there a soundness consequence to mixing within FRI? Not that I know of. Ulrich [inaudible 00:32:46].

Ulrich Habock:
It’s negligible. It’s negligible. If you look carefully at the formula from the proximity gaps paper for batched FRI … It’s always about random linear combinations: either the big, huge batch at the very beginning, or the smaller batches of just all the even functions in the folding cascade of FRI. But, essentially, there is an influence of the size of the batch, that is, how many functions are subject to the linear combination, on the soundness error. But, it’s negligible. You lose maybe one bit, two bits, something like this, two bits of security. So, it’s typically something which is not of concern.
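Going back to the mixed-degree idea Shahar described, here is a toy Rust sketch of mixing a shorter column into a later folding layer, so that both columns share the remaining layers and commitments. The fold rule is a simplified stand-in (a pairwise random linear combination), not the actual circle FRI fold, and the sizes and challenges are made up for illustration.

```rust
// Toy mixed-degree folding: fold a long column, and once the working layer
// shrinks to the length of a shorter column, mix that column in with a fresh
// random challenge, so both columns share all subsequent layers.

const P: u64 = (1 << 31) - 1;
fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// One folding step: halve the layer by combining adjacent pairs with alpha.
fn fold(layer: &[u64], alpha: u64) -> Vec<u64> {
    layer
        .chunks(2)
        .map(|pair| add(pair[0], mul(alpha, pair[1])))
        .collect()
}

fn main() {
    // Stand-ins for the "size 1,000" and "size 500" columns from the talk.
    let long_column: Vec<u64> = (0..16).collect();
    let short_column: Vec<u64> = (100..108).collect();

    // Stand-in verifier challenges (Fiat-Shamir in a real prover).
    let alphas = [7u64, 13, 17, 19];
    let mixin_challenge = 23u64;

    let mut layer = long_column;
    for &alpha in &alphas {
        layer = fold(&layer, alpha);
        // As soon as the layer matches the short column's length, fold the
        // short column into it; from here on they are committed together.
        if layer.len() == short_column.len() {
            layer = layer
                .iter()
                .zip(&short_column)
                .map(|(&a, &b)| add(a, mul(mixin_challenge, b)))
                .collect();
        }
    }
    println!("final layer ({} values): {:?}", layer.len(), layer);
}
```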

Eli Ben Sasson:
Yeah. I’ll say that when it comes to soundness, for mysterious … Well, I guess mysterious reasons, up to the fact that that’s just what the math shows. There are certain parts with very, very good soundness, like the linear mixing of various commitments, where you pay pretty much the minimum possible that you could hope for. So, if you mix K things and you select randomness over a field of size Q, then you lose something like K over Q in soundness, which is very, very negligible, because often K is small, let’s say less than 1,000, and Q is, by this stage, really large, like two to the 128, or maybe even larger. So, you lose very little soundness.
There are things where you lose in a much worse way, which is more correlated to the rate of the polynomial, or the blow-up factor, and that is related to the queries. And there, actually, soundness isn’t as good: it is the ratio, or the square root of the ratio, between the degree of the polynomials and the code length. And that’s a place where things are not as good. But, luckily, the mixing of mixed degrees is on the part where it costs you almost nothing, as Ulrich said. There’s another question Logan is asking: “Are there constraints between the subtables with polynomials of different degree?”

Shahar Papini:
Yeah. That is also a good question. You can do them; we try to avoid them. The reason is that it adds inefficiencies. Let’s say you have a constraint between a column of size 100 and one of size 500; then the total degree of the constraint would be 500, because it has this polynomial of degree 500 in it. So you would have to evaluate it at 500 points in the STARK protocol. And that means that the small column of only size 100 suddenly has to be evaluated at a lot of points. So, it adds these kinds of inefficiencies. Sometimes we have to do it, but usually we try to avoid it, and we link different-size tables using lookups, which is usually more efficient.

Eli Ben Sasson:
So, did we measure Circle STARK performance for other fields? I know the short answer is no, because it’s not that easy to implement any sort of FFT efficiently. Right? So, the answer is no.

Shahar Papini:
Mm-hmm. Yeah. We haven’t.

Eli Ben Sasson:
Okay. By the way, even M127 is about half the size of the Stone prime, so you’d expect it to be a bit … M17 is more interesting. All of these Mersenne primes, there are so many of them. Okay. Okay. So, let’s move on to a third part of Circle STARKs. I’ll try to give an explanation of something that I actually wrote in a blog post. I put it here in a comment; you’re welcome to read it later and share it with your friends. So, I attempted to explain why I am excited about Circle STARKs and what is even going on. So, let me try to run this explanation, this very informal explanation, by you.
So, one part that is obvious, and I think is a major part of the explanation for the 200x improvement from moving from Stone to Stwo, is just that you get to work with smaller numbers. Right? So, the Stone system works over 252-bit numbers, and there each multiplication and addition takes many cycles. We counted around 40 cycles to do one multiplication. And when you move to these 32-bit integers, especially … Well, not all 32-bit integers apparently behave the same when it comes to things like multiplication and addition.
And the reason for that has to do with their binary representation. And in this case, Mersenne primes are very special, not just because they’re named after Mersenne, but because … Well, think of M31. It is the prime two to the 31 minus one. So, if you’re working modulo this prime, which is what you do when you work in this finite field, it means that the number two to the 31 is equal to one. So, a one with a lot of zeros, 31 zeros, is equivalent to one. Which means that when you’re doing modulo operations and reductions after you do multiplications or additions, you get something that is very efficient and involves a very small number of cyclic shifts. So, not all 32-bit primes are the same, and ideally you would like to work with something like a Mersenne prime. It would be really great.
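A minimal sketch of why this reduction is cheap, assuming both factors are below 2^31 (illustrative only, not Stwo’s code): because 2^31 ≡ 1 (mod 2^31 - 1), the high bits of a product are simply folded back onto the low bits, so there is no division anywhere.

```rust
const P: u32 = (1 << 31) - 1; // M31 = 2^31 - 1

/// Reduce a value x < 2^62 modulo M31 using only shifts, masks and adds.
fn reduce(x: u64) -> u32 {
    let p = P as u64;
    // x = hi * 2^31 + lo, and 2^31 ≡ 1 (mod P), so x ≡ hi + lo.
    let folded = (x & p) + (x >> 31);
    // Fold once more to absorb the carry, then one conditional subtraction.
    let folded = (folded & p) + (folded >> 31);
    let mut r = folded as u32;
    if r >= P {
        r -= P;
    }
    r
}

fn mul_m31(a: u32, b: u32) -> u32 {
    reduce(a as u64 * b as u64)
}

fn main() {
    let (a, b) = (123_456_789u32, 1_987_654_321u32);
    // Sanity check against a direct (slow) modular reduction.
    assert_eq!(mul_m31(a, b), ((a as u64 * b as u64) % P as u64) as u32);
    println!("{} * {} mod M31 = {}", a, b, mul_m31(a, b));
}
```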
However, there’s another thing that you want from the modulus you’re working with, and this has to do with the structure of groups over this finite field. So, I’m assuming that if you’ve stayed with us this long, you’re probably not afraid of hearing the term “FFT,” the Fast Fourier Transform. And I’m guessing that a lot of you probably remember learning the FFT algorithm or reading about it someplace. It’s this beautiful, super-efficient algorithm that basically moves from one representation of a signal to another. And when you first heard about it, you were probably given a signal transformed by evaluating it over roots of unity, and the number of roots of unity was a power of two.
And the reason you want an exact power of two is that you have this beautiful recursive structure. If you look at the number, let’s say, two to the 10 … If you divide it by two, you get another number of the same form but smaller: two to the nine. You divide it again by two, you get two to the eight, and you can repeat this time and again. And if you remember how the FFT works, you take a sequence of size N and you break it into two subproblems of size N over two. And if you want to repeat this, time and again, so you want to take something of size N and break it into two subproblems of size N over two … Well, for that to happen you want N to be even.
But, you want to do this again. So, for each one of these N-over-two chunks, you want to be able to break it into two equal-size parts of size N over four. So, you want four to divide N. And then you want eight to divide N. And it’s not hard to see that you would like the number of evaluation points to be an exact power of two. And really, I think pretty much all the really fast FFTs that are used in industry, for signal processing or machine learning or whatever they are used for, take signals of size two to the N. Well, it turns out that when you do our low-degree extensions, you want a similar structure. You would very much like the number of points you evaluate your functions over to be an exact power of two, so you can do things like the FFT. And this sort of breaking up into two chunks of even size even shows up in things like FRI.
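As a reminder, the recursion being described is the standard radix-2 FFT split (notation added here): write f(x) = f_even(x^2) + x · f_odd(x^2); then evaluating f of degree less than N over the N-th roots of unity reduces to evaluating f_even and f_odd, each of degree less than N/2, over the (N/2)-th roots of unity, and this recursion only keeps going all the way down if N is a power of two.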
The problem is that the Mersenne prime doesn’t have a set of points that both forms a group and whose size is an exact power of two. And the reason for that is that the multiplicative group of the Mersenne prime has order two to the 31 minus two, which, if you factor it, is two times some odd number. And because of this, you’re very far from having a group with the right FFT structure. Okay. Now, this problem was already tackled previously in the context of STARKs, in a mathematical paper about using elliptic curves.
And the key idea was to say, well, we really want a group of size two to the K that is defined over the field, and we would like it to be an algebraic group. So, the brilliant three co-authors of the Circle STARK paper said, “Aha! There is a very elegant algebraic structure called the circle, which is defined by the equation X squared plus Y squared minus one. It’s the good old circle that we all know from high school.” And it turns out that for the points on the circle defined over the Mersenne prime, the number of points is a very high power of two; I forget if it’s two to the 30 or two to the 31. So, you have this structure that is a group, is of the right size, and is defined over this Mersenne prime that allows you to have fast arithmetic.
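For reference, this is the curve and group law from the Circle STARK paper, written out here: the circle is C(F_p) = { (x, y) : x^2 + y^2 = 1 } over p = 2^31 - 1, the group operation is (x_1, y_1) · (x_2, y_2) = (x_1 x_2 - y_1 y_2, x_1 y_2 + x_2 y_1), exactly like multiplying complex numbers on the unit circle, and the number of points is p + 1 = 2^31, a power of two, which is precisely the FFT-friendly structure that the multiplicative group of order 2^31 - 2 lacks.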
So, if you want to read more about this or explain it to your friends, because I’m assuming a lot of our listeners actually understand this and didn’t need my explanation … You can go and point them to this blog post that I wrote about it. But now, unless you have questions about this explanation, I’ll hand it over to Ulrich. So, first of all, I’ll ask: do you have questions about what I said? And if not, I’ll hand it over to Ulrich to give a little bit more mathematical rigor on what Circle STARK is and why we should care. So, Konstantin is asking something in Russian, I think, but I don’t … So, if someone can translate it-

Shahar Papini:
The first word is “super,” no? Super. And then [inaudible 00:45:27].

Eli Ben Sasson:
[foreign language 00:45:27]. Okay. Sorry.

Ulrich Habock:
Yeah. Okay. Well, if someone can translate it, we can try to answer. But the extension … Well, I guess, what exactly do you mean by the extension fields? First of all, the circle. Okay, the circle. If we look at X squared plus Y squared minus one, it has a bunch of solutions over the integers. I think it has four solutions, right? (0, 1), (1, 0), (-1, 0) and (0, -1), those four points. It has an infinite number of solutions over the field of rational numbers, and it has many, many more solutions over the reals, and even more … well, the same, how do you say, the same cardinality, but a lot more solutions over the complex numbers. What I am saying is, it’s one equation, right? As you move from the field of rational numbers to the real numbers, to the complex numbers, you get more solutions. And the same thing happens, of course, with Circle STARK. So, as you go from Mersenne to some finite extension of Mersenne, to the algebraic closure, you’ll have many, many more solutions and also a beautiful group structure. So yes, and maybe it will even come in handy one day. Yeah, good question.
Yeah. Actually, we are just one off from the case we’re used to. With classical univariate STARKs, the line: how many points does it have? As many points as the field is large. So, that’s the size of the field. Right? I’m trying to-

Eli Ben Sasson:
Okay. I assume-

Ulrich Habock:
[inaudible 00:47:11] Do something also in the chat here. And if you take-

Eli Ben Sasson:
Another good question.

Ulrich Habock:
An extension, that’s exactly what you were targeting. If you take, let’s say, an extension field of degree E … Wait, I’m typing it. So then, you just have the [inaudible 00:47:31] power of that field. So that’s what we’re used to. And with the circle it’s quite similar; it’s just that instead of the size of F, it’s always the size of the field plus one. So, we are just one off, but otherwise the same. So, the Mersenne field M31 has two to the 31 minus one points; the circle has one point more than the field is large, which is exactly two to the 31. If you have a quadratic extension of it, with … Yeah, with quadratically many points, in short, then the circle over that extension has the size of the quadratic extension plus one. So, essentially the same behavior. It means that for security, if you’re used to scaling up to a degree-four, degree-five extension, or beyond, depending on the base field, you face the same blow-up from small field to soundness field with the Circle STARK.
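Concretely, and just as back-of-the-envelope arithmetic rather than a quote from the talk: the degree-four extension of M31 has (2^31 - 1)^4 ≈ 2^124 elements, so challenges sampled from it make batching-type errors of the form K over the field size come out on the order of K · 2^-124, while the trace and the FFTs can stay over the 31-bit base field.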

Eli Ben Sasson:
So, there’s another question that I can try to answer. “I assume you need a larger field than 31 bits for soundness. I was curious how this is accomplished?” Well, it goes like this. For the basic arithmetization, if you take your computation and write down its trace, you need your field to be just as … For perfect soundness, you just need your field to be as large as the size of the computation. For the later interactive steps, you’re right that you need to sample randomness from a large enough field, because part of your error, as we said before, when you do batching … So, you pay roughly, well, if the number of batched things is K, your soundness error is going to be like K over the size of the field. So, you do need a larger field at that point.
And yes, you can then move to the proper extension field. In this case it would be the degree-four extension; you’d have 124 bits, which is probably good enough. And the nice thing is that all the steps before that, where you do the FFTs, those parts you can do over the base field and not pay this extra factor. Great question. By the way, terrific questions. “In the paper it states that the FFT will be the same as radix-2 with different twiddles. Can you provide a little more detail on this? I mean, the butterfly diagram would look the same as radix-2?”

Shahar Papini:
Yes. I tried to answer in the comments. Yes. It’s exactly the same network, just with different values. I also sent the link to the implementation if you want to look. But, it’s just a radix-2 FFT.

Eli Ben Sasson:
So, I guess the answer to this is yes. Right? For Circle STARK you need projective space and to add a complex number. I guess, yes. And then, well, the last one, which is about speculation: “How does Stwo impact gas fees?” I’m sure it’s going to do a lot of wonders. Right now on Starknet, gas fees are very, very low already, whatever, sub-cent for … So, it would probably bring them down. I don’t know, by a factor of 100? I’m guessing. I don’t know. We’re still very far from knowing the exact effect of it, but it won’t make things worse. This is interesting, but we’re in math mode. Yeah. Sorry, dude. I feel for you, but this is for math discussions. Okay. So, Ulrich, all yours. Can you walk us through just a little bit of the math of Circle STARK?

Ulrich Habock:
Yes. So, actually, in a nutshell, the Circle STARK is a simpler construction than the ECFFT, the elliptic curve STARK that Eli mentioned before. So, the ECFFT STARK: two papers, about 70, 80 pages each, a lot of algebraic geometry, because elliptic curves. Of course, everybody works with elliptic curves, even developers or [inaudible 00:52:13] people like me. But a really deep understanding of the theory behind it and why things look the way they do … At least I don’t have it. So, the main benefit is that the entire construction that was done with the ECFFT and the ECFFT STARK, we just took over to something that is so easy to understand: the circle.
And moreover, because the circle has such a simple structure, you can even take this as an occasion to learn, or to approach algebraic geometry, if you like. You don’t have to, it’s not necessary, but I learned basic notions of algebraic geometry through it. And, yeah, with this simple structure, I think the paper is much more readable. But, to be clear, when it comes to performance there is no benefit: you could run M31 elliptic curve STARKs and it would be essentially the same speed as the Circle STARK. But, the Circle STARK is much easier to implement, much easier to understand, much easier to audit, and things like this.
And the reason why it’s so much simpler is the group. Every FFT is related to certain symmetry groups, like the rotation group we use in the classical FFT. And the same holds for the circle. It’s a group that is not as complicated as in the ECFFT. So, to just drop some names here, and it’s probably going too far: it’s an affine group. Consequently, the objects you encode witness data into … What are they in the univariate STARK? Polynomials. You place your values over the trace domain and you encode them into polynomials, as we’re used to. Over the ECFFT, you end up with rational functions. Still algebraic, of low degree, but rational functions; they’re a bit harder to understand. With the circle, again you end up with polynomials. These polynomials are polynomials over the circle, meaning there are two variables. That’s the only difference.
And actually, because x squared plus y squared is equal to one, that’s the circle identity, you can replace every quadratic power of Y with a quadratic polynomial in X. And if you do this repeatedly, you can boil down the powers: you can kill off all the high powers of Y, ending up with just a linear power in Y. That’s already going too much into the details; let me rewind here and close Pandora’s box. So, it’s a much simpler construction than the ECFFT. What we end up encoding into is again polynomials, and as a consequence we can carry over very directly the intuition we gained from univariate STARKs, as we know them, to the Circle STARK. That’s, in short, the big advantage of it. I think we are already running out of time now.
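In symbols, the reduction just described (a standard fact from the Circle STARK paper): on the circle, y^2 = 1 - x^2, so by repeatedly substituting y^2 with 1 - x^2, any polynomial f(x, y) on the circle can be rewritten as f(x, y) = f_0(x) + y · f_1(x), that is, at most linear in y.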

Eli Ben Sasson:
Questions?

Ulrich Habock:
Oh, that’s bad. So, I talked too much.

Eli Ben Sasson:
No. No.

Ulrich Habock:
No questions. It’s never good.

Eli Ben Sasson:
Maybe everyone understood everything. But, let’s see. I’ll ask a question. So, we now know that you can use, let’s say, groups-

Ulrich Habock:
There is a question.

Eli Ben Sasson:
Okay.

Ulrich Habock:
There is a question-

Eli Ben Sasson:
[inaudible 00:56:40] Exactly the question I wanted to ask. So, I’ll just say it in more words, because we know that … Thank you, Yao. So, we know that you can use the group of an elliptic curve to do some stuff. And now we know that you can use circles, which are genus-zero curves. There are so many algebraic curves defined over finite fields. Will we see more?

Ulrich Habock:
That’s a good question. As-

Eli Ben Sasson:
I can try to answer-

Ulrich Habock:
[inaudible 00:57:15] Let me give my view first, and you probably have the deeper answer here, really. But, just from the applied-crypto side, or let’s say from the implementation perspective, as I know it from our team at Polygon. Because I was always asking: M17? Is that of use? Because there are many traces where it makes sense to just go up to M17, lookup arguments for 16-bit range proofs, exactly. It seems to be a sweet spot from the mathematical point of view. Nevertheless, on standard CPU architectures it doesn’t help: you really don’t gain any performance advantage there anymore. So, that’s one thing. So, I would guess there is already the ECFFT. In terms of performance, I wouldn’t see any improvement with any other type of group here than the classical FFT as we know it, being log-linear, to be short. But, yeah. Eli better-

Eli Ben Sasson:
[inaudible 00:58:35] Yeah. I have another answer. I have another answer. So, there’s a story with it. So, this was before StarkWare. I was doing a lot of research on proofs of various sorts, and there’s this, let’s call it one-round IOP, which is called a PCP. And there are many, many terrific questions about PCPs, and one of them is: can you get PCPs with very good parameters that are also defined over a constant-size field? Whatever, a five-bit field, even if your computation size is really large? And so, we realized, this was probably more than 10 years ago, that there’s a way it can be done, but only if you use these very exotic algebraic groups that are formed from towers of fields and stuff that I completely don’t understand. And all of my co-authors didn’t understand it either, but we knew who to ask.
We needed these very weird, exotic structures. We needed groups that are transitive, over a small alphabet, and over these asymptotically good codes. So, we wrote to Henning Stichtenoth, who was, and still is, one of the leading authorities on these algebraic geometry codes. And we were expecting him to answer either “Yes, and here’s a result,” or “No, and it’s probably a deep open problem.” But, he wrote back to us after we asked this question and said, “I know how to do it, but I never bothered to write it down.” And we were like, “Really? Man, this is a really important result. You should write it down.” He said, “Well, I didn’t. Whatever. I didn’t have the time,” or something. So, we ended up having this paper that has an appendix by Henning Stichtenoth that uses towers of these fields, and curves over them, that give you something of benefit. It allows you to use a very, very tiny group, a very tiny field, and to have very short proofs.
All this to say that math always surprises you and there’s more to do. And I’m sure that more algebraic groups and structures are going to be used for a lot of beautiful things. If I had to guess about one particular thing that may be of interest: right now we’re dealing a lot with very sequential computations, and for that, all you need is a cyclic group, and maybe between the finite field, the circle and the elliptic curve, you’re done. But, if you want things that are massively parallelizable, then maybe you want some other group that is not cyclic, but has a few different generators. And then you’re going to need to dig deeper into algebraic groups. Okay. We definitely went on a deep tangent, I think. So, each one of you, please summarize: why are you excited by Circle STARK? Who wants to go first? We’ll go backwards. So, Ulrich, then Shahar, then Andrew. I already said why I’m excited.

Ulrich Habock:
Yeah. So, why am I excited? Because, mainly, we at Polygon, for lack of understanding of the ECFFT back then, tried simpler approaches for M31. Daniel Lubarov was very much pushing, from the early days after I joined, that we need to move on to smaller fields like M31. And in that little draft, that write-up, we did these simpler constructions. We were very enthusiastic at the beginning, and then we discovered, okay, there are some caveats. And the caveats are, in short, when it comes to constraint evaluation. And besides that, that FFT is not as fast as the Circle FFT. So, the Circle FFT, and working together with Shahar here and David, that was super exciting for me personally because, first of all, I could learn algebraic geometry. Second, I could tell Daniel that M31 is the way to go for us, at least midterm. And the third one I forgot. But, for sure, I forgot because I’m too excited.

Eli Ben Sasson:
Okay-

Shahar Papini:
[inaudible 01:03:42] Yeah. I think Circle STARK and Stwo will open up so many possibilities with this great performance, and not just for scaling Starknet. It’s open source, so every project that ever needs to prove something will have a fast prover, and people will come up with these crazy ideas for how to use ZK technology. It opens up client-side proving, which would be very efficient. It just opens up so many things that we can’t even imagine right now. That’s how I see it.

Eli Ben Sasson:
Andrew?

Andrew Milson:
Yeah. I guess I’m excited because the Mersenne prime seems to be the most optimal for CPUs and GPUs, at least. And I just think that the kind of performance we’ll get will be insane, and that’s pretty exciting. It’s also just cool stuff. For me, a lot of the math related to it is new, and it’s exciting and fun to learn about. There’s something magical about it. I don’t know. So, yeah. I’m pretty excited.

Eli Ben Sasson:
Yeah. I’ll say that I’m so happy that we’re moving to small fields of any kind. The original ZK-STARK white paper was over a 64-bit field, and even a binary one. And I guess I have this obsession with small fields, because they’re just so efficient. We can talk some other time about why it was definitely the right choice to start with a simpler, very large prime field for Stone. But, I’m really happy that the performance demands, and basically our collective understanding of how to build really efficient STARK provers, allow us now to use much more efficient and smaller fields. They’re just so fast and efficient. So, that’s why I’m really excited about it.
And also, the beauty of this integration of math and engineering is just so powerful. And it’s just so fucking amazing that we take these amazing high-level math things and then bring them into code that is used massively, that is going to be used massively by humanity. It’s just mind-blowing, and it’s such a beautiful thing for cryptography, science and math to be done in this way. I’m just so happy to be observing it. Okay. Thank you, everyone. Hope to see you again at another STARK at Home, and take care.

Andrew Milson:
Awesome. Thanks, everyone.

Ulrich Habock:
Bye. Thanks for having me.

Eli Ben Sasson:
Thanks for joining.

Shahar Papini:
Bye.

Eli Ben Sasson:
Bye.
