How to target Wizard #54

Open
6 of 10 tasks
ejrgilbert opened this issue Jun 12, 2024 · 4 comments

Comments

@ejrgilbert
Owner

ejrgilbert commented Jun 12, 2024

Progress Tracker

  • extend CLI
  • Can iterate over AST and collect necessary args to pass
  • Can generate a module that encodes exports correctly
  • split out dynamic and static parts of the predicate
  • Handle provided functions (provider, package, event, probe)
  • Fix assumption on DataType for special "argN"/"immN"/etc.
  • Library linking
  • Test infrastructure for wizard target
  • Allocation function generation
  • Report variables (can use END event on Engine to flush final state!)

Overview

The emitter that targets Wizard for instrumentation could emit a Wasm module that specifies where a probe should be attached to a program by placing the pattern matching rule in the name of a function export.
All exported functions are these pattern matching rules, e.g.:

(export "match_rule / pred_func(pred_args) / (probe_args)" (func 0))

Wizard will read this in and run the pattern matching rules to find all places in the program to call func 0. Note that in whamm a predicate can include static AND dynamic data. The static portion of the predicate will reside in the match rule's predicate logic, and the dynamic portion will be "pushed down" into the probe's function (in this case func 0).
The predicate therefore needs to be split out, so that static predication is reachable through the export name and dynamic predication lives in the func 0 instructions!
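
To make the split concrete, here is a minimal Python sketch. It is not whamm's actual implementation. It assumes the predicate is a pure `&&` conjunction (a predicate containing `||` cannot be split term by term like this), that each term is a hypothetical `(variable, operator, constant)` tuple, and that `fid` and `pc` are the only statically known names.

```python
# Assumption: these names can be resolved at match time; everything else
# (e.g. arg0) is only known while the application is running.
STATIC_VARS = {"fid", "pc"}


def split_predicate(terms):
    """Split the terms of an `&&` conjunction into (static, dynamic) lists.

    Each term is a hypothetical (variable, operator, constant) tuple.
    """
    static, dynamic = [], []
    for term in terms:
        variable = term[0]
        if variable in STATIC_VARS:
            static.append(term)   # goes into the predicate function
        else:
            dynamic.append(term)  # pushed down into the probe function
    return static, dynamic


terms = [("fid", "==", 3), ("arg0", "==", 1), ("pc", "!=", 2)]
static, dynamic = split_predicate(terms)
print(static)   # terms checked once per candidate site at match time
print(dynamic)  # terms checked every time the probe fires
```

Because every term of a conjunction must hold, checking the static terms first and the dynamic terms later gives the same result as checking them together.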

Breaking Down the Export Name

match_rule / pred_func(pred_args) / (probe_args)

match_rule

Purpose: This is used to specify where to attach the probe in the application module.

This will basically just be the normal probe specification, but without the 'mode' portion, e.g.:
"wasm:opcode:call"

pred_func

Purpose: This function is used to further predicate on what constitutes a match site at match time.

This will be a pointer to the function in mon.wasm that contains the static predication logic. It could either be a function name OR a function ID preceded by $, e.g.:
"$call_predicate", or "$1"

pred_args

Purpose: This is used to list the arguments that the engine will need to pass when invoking the predicate function.

The ordering of this is important! The predicate function expects the same ordering as requested in the match rule. An example:
"fid, pc"

probe_args

Purpose: This is used to list the arguments that the engine will need to pass when invoking the probe function.

The ordering of this is important! The probe function expects the same ordering as requested in the match rule. An example:
"fid, pc, arg0"

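The four parts of the export name can be pulled apart with a small parser. The following Python sketch is illustrative only: the regular expression, `ProbeExport`, `parse_export`, and `format_export` are hypothetical names, and it only handles the full three-part form (a `$`-prefixed predicate function followed by both argument lists).

```python
import re
from typing import List, NamedTuple


class ProbeExport(NamedTuple):
    match_rule: str        # e.g. "wasm:opcode:call"
    pred_func: str         # e.g. "$1" or "$call_predicate"
    pred_args: List[str]   # order matters: passed to the predicate function
    probe_args: List[str]  # order matters: passed to the probe function


# Shape: match_rule / pred_func(pred_args) / (probe_args)
EXPORT_RE = re.compile(
    r"^\s*(?P<rule>[^/]+?)\s*/\s*"
    r"(?P<func>\$[^\s(]+)\s*\((?P<pred_args>[^)]*)\)\s*/\s*"
    r"\((?P<probe_args>[^)]*)\)\s*$"
)


def split_args(text):
    """Split a comma-separated argument list, preserving its order."""
    return [arg.strip() for arg in text.split(",") if arg.strip()]


def parse_export(name):
    """Parse an export name; raise ValueError if it is not a probe export."""
    m = EXPORT_RE.match(name)
    if m is None:
        raise ValueError(f"not a probe export: {name!r}")
    return ProbeExport(
        m["rule"], m["func"],
        split_args(m["pred_args"]), split_args(m["probe_args"]),
    )


def format_export(export):
    """Render a ProbeExport back into the export-name string."""
    return "{} / {}({}) / ({})".format(
        export.match_rule, export.pred_func,
        ", ".join(export.pred_args), ", ".join(export.probe_args),
    )


parsed = parse_export("wasm:opcode:call / $1(fid, pc) / (arg0)")
print(parsed)
print(format_export(parsed))
```

Ordinary exports such as "get_count" do not match the pattern and raise `ValueError`, which gives a reader of the module a simple way to tell match rules apart from regular exports.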
The provided globals that will need to be supported by wizard

To see this, use the info utility of whamm:

cargo run -- info -fg --spec "wasm:opcode:*"

Note: There is still some work to be done; this list is not exhaustive.

An Example

The whamm script:

i32 count;

wasm:opcode:call:before / (fid == 3 && pc != 2) / {
   if (arg0 == 1) {
       count++;
   }
}

The resulting mon.wasm:

(module
  (type (;0;) (func (result i32)))
  (type (;1;) (func (param i32 i32) (result i32)))
  (type (;2;) (func (param i32)))
  (func $get_count (;0;) (type 0) (result i32)
    global.get 0
  )
  (func (;1;) (type 1) (param i32 i32) (result i32)
    local.get 0
    i32.const 3
    i32.eq
    local.get 1
    i32.const 2
    i32.ne
    i32.and
  )
  (func (;2;) (type 2) (param i32)
    local.get 0
    i32.const 1
    i32.eq
    if ;; label = @1
      global.get 0
      i32.const 1
      i32.add
      global.set 0
    end
  )
  (global (;0;) (mut i32) i32.const 0)
  (export "get_count" (func $get_count))
  (export "wasm:opcode:call / $1(fid, pc) / (arg0)" (func 2))
)
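
To illustrate when each half of the predicate runs, here is a small Python model of the module above. It is a sketch, not the engine's behavior: the call sites and the `arg0` values are made-up data, and `static_pred`, `Monitor`, and `probe` are hypothetical names that mirror func 1, global 0, and func 2.

```python
def static_pred(fid, pc):
    """Mirrors func 1: evaluated once per candidate site at match time."""
    return fid == 3 and pc != 2


class Monitor:
    def __init__(self):
        self.count = 0  # mirrors global 0

    def probe(self, arg0):
        """Mirrors func 2: the dynamic check runs each time the site executes."""
        if arg0 == 1:
            self.count += 1


# Hypothetical `call` sites in the application, as (fid, pc) pairs.
call_sites = [(3, 2), (3, 7), (4, 7)]

# Match time: only sites that pass the static predicate get the probe attached.
attached = [site for site in call_sites if static_pred(*site)]

# Run time: suppose site (3, 7) executes three times with these arg0 values.
mon = Monitor()
for arg0 in [1, 0, 1]:
    mon.probe(arg0)

print(attached, mon.count)  # [(3, 7)] 2
```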

@ejrgilbert ejrgilbert added the enhancement New feature or request label Jun 12, 2024
@ahuoguo
Collaborator

ahuoguo commented Jul 2, 2024

A more detailed writeup of what we are doing here

Instead of emitting Virgil source code and loading it into src/monitors, we are going to emit Wasm code (whamm.wasm) and run it with wizeng --monitors=whamm.wasm test/monitors/app.wasm.

When the option provided to --monitors= ends with .wasm, it automatically loads a WhammMonitor, which defines how we want to interface with the Wizard engine.

For now, we can create new whamm.wasm files by running v3c-dev -target=wasm test/whamm/PcTracer.v3. The probe action corresponds to trace_pc, and the export name defines where the probe fires. Only wasm:bytecode:<pattern>:before is supported for now (the before mode is implicit).

Discussion of dynamic info

For now, we only support probes whose match sites are determined statically. To support a probe whose predicate depends on dynamic data, e.g.

i32 i;
wasm:bytecode:call:before /
    arg0 == 1
/ {
    i++;
}

we want to define a placeholder externref to access the FrameAccessor. The following is an example PcTracer.wat that tries to get the real pc. WhammMonitor.v3 still needs to be extended to support this.

(module
 (type (;0;) (func (param i32 i32)))
 (type (;1;) (func (param i32)))
 (type (;2;) (func))
 (type (;3;) (func (param i32 i32 i32)))
 (import "wizeng" "puts" (func $wizeng.puts (type 0)))
 (import "wizeng" "puti" (func $wizeng.puti (type 1)))
 (import "wizeng:frame_accessor" "get_pc" (func $get_pc (param externref) (result i32)))
 (func $PcTracer.main (type 2)
  return)
 (func $PcTracer.trace_pc (param externref)
  i32.const 65536
  call $PcTracer.puts
  (call $get_pc (local.get 0))
  call $wizeng.puti
  i32.const 65552
  call $PcTracer.puts
  return)
 (func $PcTracer.puts (type 1) (param i32) ... )
 (func (;5;) (type 0) (param i32 i32)
  local.get 1
  call $PcTracer.puts)
 (func (;6;) (type 0) (param i32 i32)
  local.get 1
  call $wizeng.puti)
 (func (;7;) (type 3) (param i32 i32 i32)
  local.get 1
  local.get 2
  call $wizeng.puts)
 (table (;0;) 4 4 funcref)
 (memory (;0;) 3 3)
 (export "main" (func $PcTracer.main))
 (export "memory" (memory 0))
 (export "wasm:bytecode:loop" (func $PcTracer.trace_pc))
 (elem (;0;) (i32.const 1) func 5 6 7)
 (data (;0;) (i32.const 65536) "\05\00\00\00\08\00\00\00hello @ \05\00\00\00\01\00\00\00\0a\00\00\00"))

@ejrgilbert
Owner Author

[Image: PXL_20240717_150241265]

@ahuoguo
Collaborator

ahuoguo commented Jul 31, 2024

a

@ejrgilbert ejrgilbert self-assigned this Oct 8, 2024
Repository owner deleted a comment from ahuoguo Oct 8, 2024
Repository owner deleted a comment from ahuoguo Oct 8, 2024
@ejrgilbert
Owner Author

ejrgilbert commented Nov 21, 2024

Report Variables on Wizard

Report variables will be flushed at the end of program execution. Program exit is observable by the engine itself, so on exit, functions will be called that flush the data.

The functions

There will be one function per report variable type used in the script. So, if the script only uses i32 report variables, there will be one function called to flush the final state.

Each function will contain the logic to handle its respective datatype (one for i32, one for map, etc.).

There will be 2 global variables used per data type as well:

  • one global that points to the memory location of the first allocated report variable of this type
  • one global that points to the most-recently allocated report variable of this type

The global variables are necessary since they will be used both during variable allocation and during variable flush.

Memory layout

We will have a linked list in memory of the report variables. One linked list per type.

| next            | value                  |
| mem_offset: i32 | var_data: datatype_len |

Each variable location will have two pieces of information:

  1. mem_offset: i32: The memory offset to the next report variable of this type, measured from the current memory location.
  2. var_data: datatype_len: The actual value of the variable, the type/len is dependent on the type of the variable. Since a function is called that iterates over each datatype's linked list of report variables, it will know how to parse the variable contents!

Variable allocation

On the first allocation for a datatype, the global that points to the first memory location is updated to point to the memory address.

When a new variable is allocated in memory, the global containing the memory address of the most-recently allocated variable of that type is used to find the previous variable, whose next slot is set to the difference between the new and the previous memory address (the offset). Then, that global is updated to the new memory address.

The value of the variable is then placed in the var_data slot.
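
The allocation steps above can be sketched in Python over a flat byte array. This is only a model under assumptions that are not part of the design so far: a hypothetical bump allocator hands out addresses, i32 nodes are 8 bytes (4-byte next slot, then the value), and a next offset of 0 marks the end of the list. `ReportList` and its fields are made-up names standing in for the two globals.

```python
import struct


class ReportList:
    """Linked list of i32 report variables inside a flat byte memory."""

    NODE_SIZE = 8  # 4-byte next offset followed by a 4-byte i32 value

    def __init__(self, mem):
        self.mem = mem
        self.first = None  # models the global pointing at the first variable
        self.last = None   # models the global pointing at the newest variable
        self.top = 0       # hypothetical bump pointer, not part of the design

    def alloc(self, value):
        addr = self.top
        self.top += self.NODE_SIZE
        # New node: next offset 0 (assumed end-of-list marker), then the value.
        struct.pack_into("<ii", self.mem, addr, 0, value)
        if self.first is None:
            # First allocation for this datatype: record the start of the list.
            self.first = addr
        else:
            # Link the previous node to this one via the address difference.
            struct.pack_into("<i", self.mem, self.last, addr - self.last)
        self.last = addr
        return addr


mem = bytearray(64)
i32_vars = ReportList(mem)
a = i32_vars.alloc(10)
b = i32_vars.alloc(20)
print(struct.unpack_from("<ii", mem, a))  # (next offset, value) of first node
print(struct.unpack_from("<ii", mem, b))  # (next offset, value) of last node
```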

Flush function logic

For each datatype, there will be a function that does something like the following pseudocode:

func report_DT() {
    // Start at the address of the first report variable for this datatype
    let addr = FIRST_VAR;
    let curr_var;

    do {
        curr_var = mem.get(addr);
        flush_DT(curr_var.val);
        // next is an offset from the current memory location
        addr = addr + curr_var.next;
    } while curr_var.next != NULL;
}

func flush_DT(val: DT) {
    // logic to flush this value
}

Note that this logic can't be factored out into a shared function, since Wasm function calls are strictly typed. Consider the following WAT fleshing some of this out:

(module
    ;; ... $alloc function that allocates memory for i32s

    (memory 1)
    ;; Address of the first i32 report variable (mutable so allocation can set it)
    (global $i32_start (mut i32) (i32.const 0))
    (func $iter_i32s
        (local $addr i32)   ;; address of the current variable
        (local $next i32)   ;; offset from $addr to the next variable
        (local $curr i32)   ;; value of the current variable
        (local $offset i32)
        (local.set $offset (i32.const 4)) ;; offset to skip over $next
        (local.set $addr (global.get $i32_start))

        (block $finished
            (loop $continue
                ;; Load the offset to the next variable
                (local.set $next (i32.load (local.get $addr)))
                ;; Load the current value (specific to the i32 DT)
                (local.set $curr (i32.load (i32.add (local.get $offset) (local.get $addr))))

                ;; Flush the $curr value
                (call $flush_i32 (local.get $curr))

                ;; If $next is null, we're finished!
                ;; (assumption: a zero offset encodes NULL)
                (br_if $finished (i32.eqz (local.get $next)))

                ;; Otherwise, advance to the next variable and repeat
                (local.set $addr (i32.add (local.get $addr) (local.get $next)))
                (br $continue)
            )
        )
    )
    (func $flush_i32 (param $val i32)
        ;; logic that flushes an i32 report variable
    )

    (func $on_end
        (call $iter_i32s)
    )
)
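
The same traversal can be modeled in Python to check the pointer arithmetic. This is a sketch under two assumptions: the next slot holds an offset relative to the current node (as described in the memory layout), and an offset of 0 stands in for NULL. `iter_i32s` here is a hypothetical Python stand-in, and the memory contents are hand-built test data.

```python
import struct


def iter_i32s(mem, start):
    """Yield each i32 report value in the list whose first node is at `start`."""
    addr = start
    while True:
        # Each node is a 4-byte next offset followed by the 4-byte value.
        next_offset, value = struct.unpack_from("<ii", mem, addr)
        yield value
        if next_offset == 0:  # assumption: a zero offset marks the last node
            return
        addr += next_offset   # offsets are relative to the current node


# Hand-built memory with nodes at addresses 0, 8, and 24.
mem = bytearray(32)
struct.pack_into("<ii", mem, 0, 8, 10)    # node at 0: next is 8 bytes ahead
struct.pack_into("<ii", mem, 8, 16, 20)   # node at 8: next is 16 bytes ahead
struct.pack_into("<ii", mem, 24, 0, 30)   # node at 24: last node

print(list(iter_i32s(mem, 0)))  # [10, 20, 30]
```

Like the pseudocode, this assumes at least one report variable of the type exists; an empty list would need a separate check before the first load.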
