Intel probably wants AVX-512 just for benchmarks

First things first, you might not know what AVX is, so let's start there. Advanced Vector Extensions (AVX) are instructions that extend the x86_64 architecture; they belong to a class we call SIMD: Single Instruction, Multiple Data.

When we have an array of N items and we want to perform the same operation on every one of them, SIMD lets us process them in batches instead of looping one by one. The older 256-bit AVX (AVX2 for integer operations) can process 256 bits of data at once; that could mean 32 items of 8 bits each in one go.
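To make this concrete, here is a minimal C sketch of the kind of loop SIMD targets; the function and array names are just for illustration. Compiled as scalar code it does one 8-bit addition per iteration, while a 256-bit SIMD version retires 32 of them per instruction, and a 512-bit version 64.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar baseline: one 8-bit addition per loop iteration.
   A 256-bit SIMD version handles 32 of these per instruction;
   AVX-512 handles 64. */
void add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```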

AVX-512 enables the CPU to process double that amount of data in roughly the same time. Sounds amazing, doesn't it?

Not so fast. Most programs don't use SIMD instructions at all. Using AVX-512 also means that new binaries need to be compiled, and support needs to be added to compilers and programs. If only a few CPUs support AVX-512, shipping those opcodes just for them means bigger and generally slower binaries for everyone.
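For the few programs that do care, the usual workaround is runtime dispatch: ship both code paths and pick one by checking the CPU at startup. Here's a minimal sketch using GCC/Clang's __builtin_cpu_supports; sum_avx512 and sum_generic are hypothetical stand-ins for a real hot loop. Note that both paths still live in the binary, which is exactly the size cost I mentioned.

```c
#include <stdio.h>

/* Hypothetical stand-ins: in a real program these would be the AVX-512
   and portable versions of the same hot loop. */
static void sum_avx512(void)  { puts("taking the AVX-512 path"); }
static void sum_generic(void) { puts("taking the generic path"); }

int main(void)
{
    /* GCC/Clang builtin that checks CPUID feature bits at runtime;
       "avx512f" is the AVX-512 Foundation feature. */
    if (__builtin_cpu_supports("avx512f"))
        sum_avx512();
    else
        sum_generic();
    return 0;
}
```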

I first heard about AVX-512 from Linus Torvalds, and I was puzzled reading: "I hope AVX-512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on."

As he states, these instructions eat up a lot of space on the CPU die, which means fewer cores or other more meaningful features can be built in. In the end, that translates to less performance overall.

AVX-512 is most useful for benchmarks: number-crunching applications from which most users get no real benefit aside from a displayed score.

Here's what I think happened: Intel hasn't been able to win almost any benchmark with its processors lately and had to turn to esoteric applications to claim that its processors beat AMD's. Hence, someone probably figured that just implementing AVX-512 and partnering with benchmark vendors to update their tools would make Intel CPUs win on benchmarks.

But this win would exist only on paper; on real tasks the CPU would be slower than depicted. That includes games, browsing, office tools, compiling, and so on.

Some applications might see a boost from it, but it really depends on their internals and design. Video encoding and photo manipulation might benefit.

But even then, by how much? SIMD instructions generate a lot of heat. Now imagine an 8-core / 16-thread CPU trying to crunch 16 AVX-512 instructions at once; it will most probably hit a thermal limit. Yes, it could do really well in a single thread… but by definition, if you have an algorithm that is a good fit for AVX, then you can also parallelize it. Let me rephrase that: if a program or algorithm is by design incompatible with multithreading, then AVX is also out of the question.
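The preconditions really are the same: independent iterations. As a sketch, assuming OpenMP is available, the same loop annotation can request vectorization and multithreading at once, which is why "vectorizable but not parallelizable" is a contradiction for this kind of code:

```c
#include <stddef.h>

/* Independent iterations can be both vectorized (simd) and spread
   across cores (parallel for). Build with e.g. gcc -O2 -fopenmp. */
void scale(float *x, float k, size_t n)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}
```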

Therefore, single-core performance lifts from AVX-512 are pointless, and multi-core gains are limited by thermal output and memory bandwidth.

Most compilers don't emit AVX unless explicitly told to; sometimes it has to be added manually, and other times the compiler skips it because of some conservative assumption. SIMD instructions aren't exactly "automatically added": if you want them, you at least need to verify that the compiler is actually emitting them. You can't assume that your code will be automatically translated to use SIMD.
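If you don't want to rely on autovectorization, the manual route is intrinsics. Here's a minimal AVX2 version of the earlier byte-add sketch; it assumes n is a multiple of 32 so the tail handling can be omitted, and it needs to be built with AVX2 enabled (e.g. gcc -mavx2). On the verification side, GCC can report which loops it vectorized with -fopt-info-vec, which is one way to do the checking I mentioned.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Explicit AVX2 version of the earlier byte-add: 32 additions per
   instruction. Assumes n % 32 == 0; a real version needs a scalar
   tail loop for the leftover elements. */
void add_bytes_avx2(uint8_t *dst, const uint8_t *a,
                    const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i),
                            _mm256_add_epi8(va, vb));
    }
}
```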

So yeah, Intel might come out winning benchmarks in the next generation. So what? I want to see actual performance on actual programs.