Again, and this needs to be reiterated sufficiently often, please do no forget to `make clean` after (re-)configuration and before building (again) using `make`.
The easiest way to boosting speed is by allowing the compiler to apply optimization to the code. To let the compiler know, the configuration process can take in the optionally specified compiler flag `-O3`:
`./configure CFLAGS="-O3"`
The `tools/n2n-benchmark` tool reports speed-ups of 200% or more! There is no known risk in terms of instable code or so.
## Hardware Features
Some parts of the code significantly benefit from compiler optimizations (`-O3`) and platform features
such as NEON, SSE and AVX. It needs to be decided at compile-time. Hence if compiling for a specific
platform with known features (maybe the local one), it should be specified to the compiler – for
example through the `-march=sandybridge` (you name it) or just `-march=native` for local use.
So far, the following portions of n2n's code benefit from hardware features:
```
AES: AES-NI
ChaCha20: SSE2, SSSE3
SPECK: SSE2, SSSE3, AVX2, AVX512, (NEON)
Random Numbers: RDSEED, RDRND (not faster but more random seed)
```
The compilations flags could easily be combined:
`./configure CFLAGS="-O3 -march=native"`.
There are reports of compile errors showing `n2n_seed': random_numbers.c:(.text+0x214): undefined reference to _rdseed64_step'` even though the CPU should support it, see #696. In this case, best solution found so far is to disable `RDSEED` support by adding `-U__RDSEED__` to the `CFLAGS`.
## SPECK – ARM NEON Hardware Acceleration
By default, SPECK does not take advantage of ARM NEON hardware acceleration even if compiled with `-march=native`. The reason is that the NEON implementation proved to be slower than the 64-bit scalar code on Raspberry Pi 3B+, see [here](https://github.com/ntop/n2n/issues/563).
Your specific ARM mileage may vary, so it can be enabled by configuring the definition of the `SPECK_ARM_NEON` macro:
`./configure CFLAGS="-DSPECK_ARM_NEON"`
Just make sure that the correct architecture is set, too. `-march=native` usually works quite well.