Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Call _mm256_zeroupper() when leaving avx512 code
Carsten Steger says: simd-avx512.h defines VLEAVE as nothing in FFTW 3.3.7. However, the current Intel® 64 and IA-32 Architectures Optimization Reference Manual, chapter 15.18, recommends the following: - When you have to mix group B instructions with Intel SSE instructions, or you suspect that such a mixture might occur, use the VZEROUPPER instruction whenever a transition is expected. - Add VZEROUPPER after group B instructions were executed and before any function call that might lead to Intel SSE instruction execution. - Add VZEROUPPER at the end of any function that uses group B instructions. - Add VZEROUPPER before thread creation if not already in a clean state so that the thread does not inherit Dirty Upper State. (Group B are instruction types that modify bits 128-511 of vector registers 0-15.) Therefore, I believe it would be prudent to define VLEAVE as _mm256_zeroupper in simd-avx512.h (see the attached patch). At https://software.intel.com/en-us/forums/intel-isa-extensions/topic/704023 Mark Charney says: To be clear, we very much still recommend using VZEROUPPER on Skylake. Even though it does not have the same penalties as earlier designs in that family for mixing AVX and SSE code, we definitely recommend using VZEROUPPER on Skylake. Yes it would obviously be better if there were one solution. For code that has to run on both families, the "common code" solution is to use the Xeon guidelines. If Mark Charney recommends VZEROUPPER, that's good enough for me.
- Loading branch information