Potential performance improvement? Something similar to Julia's @simd? #1493
Comments
The correct way to achieve that goal the Pythran way would be through OpenMP simd directives. I'll have a look!
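For context, Pythran already picks up OpenMP directives written as plain comments; here is a minimal sketch of what a simd-annotated loop could look like (the `#omp simd` spelling is an assumption, not a confirmed Pythran feature, and `saxpy` is just an example kernel):

```python
# pythran export saxpy(float64[:], float64[:], float64)
def saxpy(x, y, a):
    # Assumption: Pythran maps "#omp" comments to OpenMP pragmas, so an
    # OpenMP 4 simd directive might be spelled like this.
    #omp simd
    for i in range(x.shape[0]):
        y[i] += a * x[i]
    return y
```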
@paugier: using the appropriate vector form:

This gets vectorized with [edited]
Note:
works just fine too.
And I particularly like this form, even if it's arguably less intuitive:
Nice new versions! Unfortunately, I don't get any speedup compared to the original code. Worse. Without:

And with:
Are you using the fixed-size version in the Pythran annotation?
The previous results were with the fixed-size version in the Pythran annotation. With the standard annotations, it gives the following. Without:

With:

With:
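For readers following along, a hedged illustration of the two annotation styles being compared; the literal-dimension syntax is an assumption based on this discussion, and `grad` is a hypothetical function name:

```python
# Standard annotation: accepts any 2-D float64 array.
# pythran export grad(float64[:, :])
#
# Fixed-size alternative (the variant discussed above): the shape becomes
# part of the signature, which can help the compiler vectorize.
# pythran export grad(float64[3, 3])
def grad(m):
    # Hypothetical kernel, just to make the snippet self-contained.
    return m[1:, :] - m[:-1, :]
```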
Hi, there is also the HOPE library, which achieves this with a JIT written in pure Python. I found it quite amazing because the code is really simple.

```python
import numpy as np
import hope
# The original snippet omitted its imports; the timer and plotting helpers
# are assumed to come from timeit and matplotlib.
from timeit import default_timer as timer
from matplotlib.pyplot import imshow, show

@hope.jit
def mandel(x, y, max_iters):
"""
Given the real and imaginary parts of a complex number,
determine if it is a candidate for membership in the Mandelbrot
set given a fixed number of iterations.
"""
c = complex(x, y)
z = 0.0j
for i in range(max_iters):
z = z*z + c
if (z.real*z.real + z.imag*z.imag) >= 4:
return i
return max_iters
@hope.jit
def create_fractal(min_x, max_x, min_y, max_y, image, iters, w, h):
height = h
width = w
pixel_size_x = (max_x - min_x) / width
pixel_size_y = (max_y - min_y) / height
for x in range(width):
real = min_x + x * pixel_size_x
for y in range(height):
imag = min_y + y * pixel_size_y
color = mandel(real, imag, iters)
image[y, x] = color

image = np.zeros((1024, 1536), dtype=np.uint8)
start = timer()
# The w and h arguments were missing from the original call; they are added
# here to match the signature of create_fractal above.
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 50, 1536, 1024)
dt = timer() - start
print("Mandelbrot created in %f s" % dt)
imshow(image)
show()
```

But using its own examples, it appears to be nearly 2 times faster than Numba's default jit:

```python
import numpy as np
from scipy.integrate import odeint
import hope
import numba

P, d, B, G, A = 0, 0.0001, 0.0095, 0.0001, 0.0001
N = 10**3
def f_nat(y, t):
dy = np.empty(3)
dy[0] = P - B*y[0]*y[1] - d*y[0]
dy[1] = B*y[0]*y[1] + G*y[2] - A*y[0]*y[1]
dy[2] = d*y[0] + A*y[0]*y[1] - G*y[2]
return dy
y0, t = np.array([500., 0., 5.]), np.linspace(0, 5., N)
@hope.jit
def f_hope(y, t, P, d, B, G, A):
dy = np.empty(3)
dy[0] = P - B*y[0]*y[1] - d*y[0]
dy[1] = B*y[0]*y[1] + G*y[2] - A*y[0]*y[1]
dy[2] = d*y[0] + A*y[0]*y[1] - G*y[2]
return dy
@hope.jit
def f_opt(y, t, dy, P, d, B, G, A):
dy[0] = P - B*y[0]*y[1] - d*y[0]
dy[1] = B*y[0]*y[1] + G*y[2] - A*y[0]*y[1]
dy[2] = d*y[0] + A*y[0]*y[1] - G*y[2]
return dy
@numba.jit
def f_numb(y, t, P, d, B, G, A):
dy = np.empty(3)
dy[0] = P - B*y[0]*y[1] - d*y[0]
dy[1] = B*y[0]*y[1] + G*y[2] - A*y[0]*y[1]
dy[2] = d*y[0] + A*y[0]*y[1] - G*y[2]
return dy
@numba.jit
def f_numb_opt(y, t, dy, P, d, B, G, A):
dy[0] = P - B*y[0]*y[1] - d*y[0]
dy[1] = B*y[0]*y[1] + G*y[2] - A*y[0]*y[1]
dy[2] = d*y[0] + A*y[0]*y[1] - G*y[2]
return dy
dy = np.empty(3)
print ("native python")
%timeit odeint(f_nat, y0, t)
print ("hope")
%timeit odeint(f_hope, y0, t, args=(P, d, B, G, A))
print ("hope without allocation")
%timeit odeint(f_opt, y0, t, args=(dy, P, d, B, G, A))
print ("numba")
%timeit odeint(f_numb, y0, t, args=(P, d, B, G, A))
print ("numba without allocation")
%timeit odeint(f_numb_opt, y0, t, args=(dy, P, d, B, G, A))
Using a bigger value for N, this gap disappears on my laptop, showing no difference between native Python, Numba, or HOPE. But there may be a nice optimization to find in the HOPE JIT. Honestly, I have not investigated further how the library works. Hope it can help anyway.
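If anyone wants to dig into where the difference comes from, here is a minimal sketch (assuming the definitions above, and plain `timeit` instead of the IPython magic) that times the bare right-hand-side functions outside `odeint`, separating callback-dispatch overhead from the integration itself:

```python
import timeit

# Time the raw kernels directly, without odeint's callback machinery.
n_calls = 100_000
for name, fn, args in [
    ("native python", f_nat, (y0, 0.0)),
    ("hope", f_hope, (y0, 0.0, P, d, B, G, A)),
    ("numba", f_numb, (y0, 0.0, P, d, B, G, A)),
]:
    total = timeit.timeit(lambda: fn(*args), number=n_calls)
    print("%s: %.0f ns per call" % (name, total / n_calls * 1e9))
```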
Hi Serge,
I hope you are fine at home.
I investigated the bad result for Pythran mentioned in https://github.com/Thierry-Dumont/BenchmarksPythonJuliaAndCo/wiki/5-The-FeStiff-benchmark (see the resulting PR Thierry-Dumont/BenchmarksPythonJuliaAndCo#12).
I isolated two issues. The first one is about the lack of a native struct (so a method call goes through Python `getattr` and 2 function calls instead of 1). Using `__slots__` and PyPy removes most of the overhead (a minimal `__slots__` sketch follows below).

Then, even just for the numerical kernel (the code is here: https://github.com/paugier/bench_integrate_callback/tree/master/struct_simd/only_simd), Pythran is slower than Julia on this example (ratio Julia/Pythran ~ 2.4).
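For the first issue, a minimal sketch of the `__slots__` workaround mentioned above (the class name and attributes are illustrative, not the benchmark's actual code):

```python
class Element:
    # __slots__ replaces the per-instance __dict__ with fixed attribute
    # storage, so attribute lookups avoid a dict access; PyPy can then
    # specialize them further.
    __slots__ = ("nodes", "weights")

    def __init__(self, nodes, weights):
        self.nodes = nodes
        self.weights = weights
```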
The implementation in Julia contains the same information as the one in Python, except for a macro `@simd` (https://docs.julialang.org/en/v1/base/base/#Base.SimdLoop.@simd), which is used to "annotate a for loop to allow the compiler to take extra liberties to allow loop re-ordering".

Even without `@simd`, Julia is faster than Pythran, but it seems that the macro helps Julia to be even faster (here, Pythran runs in 0.30 µs):

`@simd` is somehow similar to OpenMP comments, so I was wondering if we could also add such information in a comment (something like `# simd for`) just before the for loop. Could Pythran do something useful with that?
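To make the proposal concrete, here is a hypothetical illustration of the suggested hint (the `# simd for` comment is only the suggestion from this issue, not an existing Pythran feature, and `dot` is just an example kernel):

```python
# pythran export dot(float64[:], float64[:])
def dot(x, y):
    s = 0.0
    # simd for   <-- hypothetical hint, modelled on Julia's @simd
    for i in range(x.shape[0]):
        s += x[i] * y[i]
    return s
```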