forked from FFTW/fftw3
TODO before FFTW-$2\pi$:
* MPI version
* DCT/DST codelets? which kinds?
* investigate the addition-chain trig computation
* I can't believe that there isn't a closed form for the omega
  array in Rader.
* merge genfft-k7 generator with the main genfft branch.
  genfft-k7 was written by Stefan Kral based on the fftw-2.1 genfft.
  (The k7 stuff is becoming obsolete because it is not 64-bit clean.
  We should phase it out in the next release.)
* implement rdft/problem2 for even radices other than 2.
* convolution problem type(s)
* Explore the idea of having n < 0 in tensors, possibly to mean
  inverse DFT.
* better estimator: possibly, let "other" cost be coef * n, where
  coef is a per-solver constant determined via some big numerical
  optimization/fit.
* vector radix, multidimensional codelets
* it may be a good idea to unify all those little loops that do
  copying, (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]),
  and multiplication of vectors by twiddle factors.
* Pruned FFTs (basically, a vecloop that skips zeros).
* Try FFTPACK-style back-and-forth (Stockham) FFT. (We tried this a
  few years ago and it was slower, but perhaps matters have changed.)
* dif, difsq SIMD codelets
* Generate assembly directly for more processors, or maybe fork gcc. =)
* ensure that threaded solvers generate (block_size % 4 == 0)
  to allow SIMD to be used.
* consider whether it would be simpler to have a uniform description
  (IO, m0, m1, W) for twiddle problems, where IO always points
  to the beginning of the array, W always points to the beginning
  of the twiddle table, and the problem is solved in the
  range m \in [m0, m1). Currently we have a messy situation in which
  we sometimes use [m0, m1), sometimes we use [mstart, mstart+mcount),
  and we have to adjust W with the X(twiddle_shift)() hack. Codelets
  should obey the uniform protocol as well.
* memoize triggen.
* orb problems with FFTW_PRESERVE_INPUT ought to use SIMD somehow.
  (Currently they reduce to an r2r problem that does not use SIMD.)
* eliminate alignment hacks, which ought to be obsolete by now.
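
Sketch for the loop-unification item above: a minimal, stand-alone C
version of the (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]) pass,
only to pin down what a shared helper might look like. The name
sumdiff_inplace, the double element type, and the unit stride are all
assumptions made for illustration; none of this is existing FFTW code,
and a real helper would presumably take strides and use the library's
own real type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): in-place sum/difference pass
 *   (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i])
 * for every i with 0 < i < n - i.  X[0] and, when n is even, X[n/2]
 * have no distinct partner and are left untouched. */
void sumdiff_inplace(double *X, size_t n)
{
    size_t i;

    assert(X != NULL || n == 0);

    for (i = 1; i + i < n; ++i) {
        double a = X[i];
        double b = X[n - i];
        X[i] = a + b;      /* sum goes to the low index */
        X[n - i] = a - b;  /* difference goes to the mirrored index */
    }
}
```

Both values are read before either is written, so the pass is safe to
run in place.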