Use LLVM memset intrinsic for compatibility and performance #795

Merged: 2 commits merged into goplus:main on Sep 17, 2024


cpunion (Contributor) commented Sep 14, 2024

The runtime.Zeroinit function is not compatible with ssa.NaiveForm mode (which fully disables SSA optimizations to aid debugging), because in that mode the compiler generates recursive code.

Another reason is performance: the disassembly shows that runtime.Zeroinit is not inlined and simply calls the C function memset (the _memset stub). Calls to runtime.Zeroinit should be replaced with the LLVM memset intrinsic so that they can be compiled to inline code. See https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics

c.Memset may also need an llgo:link directive that rewrites it to the memset intrinsic.

Test code:

package main

type E struct{}

type F struct {
	e *E
	i int
	f float64
}

func Bar() {
	f := F{e: &E{}}
	println(f.e)
}

func main() {
	Bar()
}

Before (LLVM IR, then arm64 disassembly):

define void @main.Bar() {
_llgo_0:
  %0 = alloca %main.F, align 8
  %1 = call ptr @"github.com/goplus/llgo/internal/runtime.Zeroinit"(ptr %0, i64 24)
  %2 = getelementptr inbounds %main.F, ptr %1, i32 0, i32 0
  %3 = call ptr @"github.com/goplus/llgo/internal/runtime.AllocZ"(i64 0)
  store ptr %3, ptr %2, align 8
  %4 = getelementptr inbounds %main.F, ptr %1, i32 0, i32 0
  %5 = load ptr, ptr %4, align 8
  call void @"github.com/goplus/llgo/internal/runtime.PrintPointer"(ptr %5)
  call void @"github.com/goplus/llgo/internal/runtime.PrintByte"(i8 10)
  ret void
}

_main.Bar:
00000001000007d0	sub	sp, sp, #0x30
00000001000007d4	stp	x29, x30, [sp, #0x20]
00000001000007d8	add	x0, sp, #0x8
00000001000007dc	mov	w8, #0x18
00000001000007e0	mov	x1, x8
00000001000007e4	bl	"_github.com/goplus/llgo/internal/runtime.Zeroinit"
00000001000007e8	str	x0, [sp]
00000001000007ec	mov	x0, #0x0
00000001000007f0	bl	"_github.com/goplus/llgo/internal/runtime.AllocZ"
00000001000007f4	mov	x8, x0
00000001000007f8	ldr	x0, [sp]
00000001000007fc	str	x8, [x0]
0000000100000800	ldr	x0, [x0]
0000000100000804	bl	"_github.com/goplus/llgo/internal/runtime.PrintPointer"
0000000100000808	mov	w0, #0xa
000000010000080c	bl	"_github.com/goplus/llgo/internal/runtime.PrintByte"
0000000100000810	ldp	x29, x30, [sp, #0x20]
0000000100000814	add	sp, sp, #0x30
0000000100000818	ret

_github.com/goplus/llgo/internal/runtime.Zeroinit:
0000000100003a48	stp	x29, x30, [sp, #-0x10]!
0000000100003a4c	mov	x2, x1
0000000100003a50	mov	w1, #0x0
0000000100003a54	bl	0x10000ad98 ; symbol stub for: _memset
0000000100003a58	ldp	x29, x30, [sp], #0x10
0000000100003a5c	ret

After (LLVM IR, then arm64 disassembly):

define void @main.Bar() {
_llgo_0:
  %0 = alloca %main.F, align 8
  call void @llvm.memset(ptr %0, i8 0, i64 24, i1 false)
  %1 = getelementptr inbounds %main.F, ptr %0, i32 0, i32 0
  %2 = call ptr @"github.com/goplus/llgo/internal/runtime.AllocZ"(i64 0)
  store ptr %2, ptr %1, align 8
  %3 = getelementptr inbounds %main.F, ptr %0, i32 0, i32 0
  %4 = load ptr, ptr %3, align 8
  call void @"github.com/goplus/llgo/internal/runtime.PrintPointer"(ptr %4)
  call void @"github.com/goplus/llgo/internal/runtime.PrintByte"(i8 10)
  ret void
}

_main.Bar:
00000001000007d0	sub	sp, sp, #0x30
00000001000007d4	stp	x29, x30, [sp, #0x20]
00000001000007d8	str	xzr, [sp, #0x8]
00000001000007dc	str	xzr, [sp, #0x10]
00000001000007e0	str	xzr, [sp, #0x18]
00000001000007e4	mov	x0, #0x0
00000001000007e8	bl	"_github.com/goplus/llgo/internal/runtime.AllocZ"
00000001000007ec	str	x0, [sp, #0x8]
00000001000007f0	ldr	x0, [sp, #0x8]
00000001000007f4	bl	"_github.com/goplus/llgo/internal/runtime.PrintPointer"
00000001000007f8	mov	w0, #0xa
00000001000007fc	bl	"_github.com/goplus/llgo/internal/runtime.PrintByte"
0000000100000800	ldp	x29, x30, [sp, #0x20]
0000000100000804	add	sp, sp, #0x30
0000000100000808	ret

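The intrinsic call (call void @llvm.memset(ptr %0, i8 0, i64 24, i1 false)) is compiled to three str xzr instructions, each zeroing one 8-byte word of the 24-byte struct, so the calls to runtime.Zeroinit and _memset disappear.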

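A memset over a larger memory range is not expanded inline; the backend emits a library call instead. For example, zeroing 1304 (0x518) bytes compiles to a call to _bzero: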

call void @llvm.memset(ptr %0, i8 0, i64 1304, i1 false)
00000001000007d8	sub	sp, sp, #0x520
00000001000007dc	add	x0, sp, #0x8
00000001000007e0	mov	w8, #0x518
00000001000007e4	mov	x1, x8
00000001000007e8	bl	0x10000a964 ; symbol stub for: _bzero
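The two listings illustrate a size-based trade-off: a small constant-length memset becomes a few stores, while a large one becomes a library call. The sketch below is a simplified, hypothetical model of that decision, not LLVM's actual lowering logic. The function name lowerMemset and the threshold value are assumptions for illustration; the real limit is target-specific (LLVM's TargetLowering exposes it as MaxStoresPerMemset) and also depends on alignment and the available store widths.

```go
package main

import "fmt"

// lowerMemset is a simplified, hypothetical model of how a backend might lower
// llvm.memset with a constant length: expand it into word-sized stores when
// few enough are needed, otherwise emit a library call.
func lowerMemset(size, maxStores uint64, zero bool) string {
	const word = 8 // assume 8-byte stores, as in the 24-byte listing
	stores := (size + word - 1) / word
	if stores <= maxStores {
		return fmt.Sprintf("inline: %d stores", stores)
	}
	// The arm64 listing calls _bzero for a zero fill; a non-zero fill
	// would need memset.
	if zero {
		return "call bzero"
	}
	return "call memset"
}

func main() {
	const maxStores = 16 // hypothetical threshold, not LLVM's actual value
	for _, size := range []uint64{24, 1304} {
		fmt.Printf("%d bytes -> %s\n", size, lowerMemset(size, maxStores, true))
	}
}
```

With this made-up threshold, 24 bytes map to 3 inline stores and 1304 bytes to a bzero call, mirroring the two listings. Other targets may call memset even for a zero fill.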


codecov bot commented Sep 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.54%. Comparing base (9f8b9ea) to head (c0e1e31).
Report is 5 commits behind head on main.

@@           Coverage Diff           @@
##             main     #795   +/-   ##
=======================================
  Coverage   97.53%   97.54%           
=======================================
  Files          19       19           
  Lines        4668     4683   +15     
=======================================
+ Hits         4553     4568   +15     
  Misses         97       97           
  Partials       18       18           


xushiwei merged commit 7803477 into goplus:main on Sep 17, 2024 (6 checks passed).