forked from BinaryAnalysisPlatform/bap
[pull] master from BinaryAnalysisPlatform:master #114
Open

pull wants to merge 257 commits into r4b3rt:master from BinaryAnalysisPlatform:master
Conversation
The new version might give us much better performance; see the [discussion][1] for further details.

[1]: https://discuss.ocaml.org/t/ann-set-up-ocaml-2-0-0-alpha/7895
* Use semver-style version matching on CI workflow
* Update docker/build-push-action to v2
* Format GitHub workflow files
* Enable dune cache on the Ubuntu runners

Signed-off-by: Sora Morimoto <[email protected]>
* switches to FrontC.4.x and adds the preprocessing option

  Since FrontC.4.x supports the new types from C99+, we had to update BAP as well, both its representation of C types and its parsing procedure. We also added an option (disabled by default) to apply preprocessing, with the ability to specify the path to the preprocessor, since FrontC.4.1+ is much more robust and is capable of chewing through Linux system headers.

* adds the FrontC constraint
The old script was following the old share folder layout and, as a result, bap did not work from Debian, as it was missing the necessary lisp files.
The docker images are no longer pushed to Docker Hub after #1312: first, because the secret names were accidentally changed, and second, because the `push` parameter defaults to `false` in the `v2` version of the action.
This file assumed the presence of the testing snapshot opam repository, which is not always available. It is also no longer required for getting ppx_bap, which has long been available in the upstream repository.
We reuse the existing ARM and Thumb ABIs but make sure that the correct architecture is passed to the memory-related operations.
* initial scaffolding
* adds support for nameless registers in the disasm backend

  We don't want to model unique (temporary) registers as normal physical registers, so we add an option to set the name to -1 to indicate that the register is a unique anonymous variable.

* the minimal implementation
* introduces sub-instructions to the disassembler

  There was some provision for this in the backend, but it was never fully implemented. This feature enables seamless integration of the ghidra backend but may also be used for VLIW architectures; e.g., the hexagon llvm backend is using it.

* fixes the handling of the unique namespace
* implements semantics of sequential instructions
* specifies semantics for most of the pcode instructions
* passes the basic instruction info to the sequential semantics value
* fixes the semantics of LOAD
* improves handling of pcode namespaces

  Fixes varnode classification (no longer treating addresses as registers). Also, to prevent name clashes between virtual variables from different scopes, we use unique prefixes (aka shortcuts in ghidra's parlance) to distinguish between them. This also makes the generated code more readable and closer to the originally generated pcode.

* fixes the subpiece implementation
* puts all pcode opcodes into the pcode namespace

  Since they are the same for each architecture, and now we have this feature.

* implements support for user-defined opcodes

  We translate `CALLOTHER(<name>,<arg1>,...,<argN>)` to `<name>(<arg1>,...,<argN>)`.

* translates local (intra-instruction) branches into GOTOs

  Branches in p-code are overloaded by the type of the destination. If the destination is a virtual address, then it is a normal branch; if it is a constant (a varnode from the constant namespace), then it is an intra-instruction branch that represents the inner instruction logic.
* fixes the satisfies function to account for subinstructions

  Before that it was only looking into the top-level instruction.

* an attempt to pack subinstructions inside an instruction (breaks lots of stuff)
* publishes subinstructions

  I will probably forfeit this approach; still investigating.

* introduces the null object to the knowledge base

  It was actually already there, hidden in the [obj] domain. Now it is properly documented with well-defined semantics.

* uses the null object to represent unlabeled blocks in the lifters

  Also, more lifters now respect the label passed to the blk operator.

* removes the subinstruction slot from instruction
* adds labels reification to BIL semantics, also reifies gotos, all using a special encoding
* implements intra-instruction gotos
* adds the sequence number documentation
* fixes error handling in the goto-subinstruction primitive
* implements a proper disassembler factory that scans for ldefs files

  So far it is not working quite correctly, as the default variables (like word size, etc.) are not properly set. Investigating...

* adds a proper processor initialization
* implements a proper command-line interface

  Now the plugin is able to list the targets and pass the path to the ghidra root.

* adds a tentative --x86-backend option to enable ghidra for x86
* fixes offset and address calculation
* adds the `is-symbol` semantic primitive
* fixes overloading of the Primus Lisp semantic definitions

  The overloading was prevented by the attributes computation, which expected no overloads. Also makes the error message more readable.

* tries to overload p-code operations based on their operand types
* passes operand types per each operand, removes extra opcodes

  It looks like not only branches are overloaded in p-code but all operations; e.g., we can have `INT_ADD (mem:x) (mem:y) (mem:z)` that represents `mem[x] <- mem[y] + mem[z]`.
We could also resolve this overloading by adding suffixes to operations, e.g., `INT_ADDmr`, `INT_ADDmm`, and so on, but that would explode the number of opcodes, especially in the presence of user-defined opcodes.

* catches bad or unimplemented instructions during decoding

  ghidra raises an exception if an instruction is not valid or there is no semantics for it.

* adds signed ordering Primus Lisp primitives and implements the corresponding pcode operations
* passes full information about the operand type from the backend

  In p-code the semantics of an operation is defined by the types of its operands. In the case of unique variables the type is not known to us, so we have to pass it. This commit extends the previous approach, where we were passing only the kind of the operand (mem vs imm), to the full type qualification, where the type of memory is represented with Nil and the type of immediates is represented by its size in the number of bits.

* adds the missing BOOL_NEGATE operation
* fixes the negation operator (pcode represents bool as a byte)
* removes aliased registers from the register table
* fixes the selection of the default backend for x86

  It should be llvm if not specified otherwise.

* switches to caseless ordering of variables

  Changes and documents the ordering of variables. Variables are now compared caselessly, and the ordering is made loosely compatible with the caseless lexicographical ordering of the textual representation of variables' identifiers, e.g.,
  ```
  ```

* enables the ghidra backend for the arm targets

  Right now it is disabled by default; use `--arm-backend=ghidra` to enable.

* uses pcode-x86 as the CT language for pcode in x86 targets
* adds the ghidra backend to mips
* minor pretty-printing tweaks to make things more readable
* improves primitives performance in Primus Lisp

  With this optimization Primus Lisp-based lifters run five times faster. This is especially important for the ghidra backend lifters, which are fully dependent on Primus Lisp.
The idea is to let the primitive implementors provide the body of their primitive, so that every primitive is not computed via the semantics promise but is invoked directly. Another big idea is to provide an interface that allows factoring out computations that are common to the target. The same idea could be extrapolated to all promises.

* optimization: improves name resolution in Primus Lisp

  Uses maps for names, not lists. Surprisingly, not much of an improvement, about 5%.

* optimizes unit computation by hoisting it out of the loop
* optimization: do not request lisp arguments if not necessary

  To compute lisp arguments we need to invoke the theory and reflect them into it. The resulting work is discarded if the name is not bound. The optimization checks if the name is bound by the program and only after that asks for the list of arguments.

* adds ghidra to powerpc (works out of the box)
* adds the `--x86-64-backend` option

  It is just an alias for `--x86-backend`, for consistency.

* enables ghidra for riscv (doesn't run out of the box)
* enables the ghidra backend in CI/CD

  We will build Ghidra only on supported targets; right now that is Ubuntu Bionic. We will soon add more targets and more packages. The bap packages will now be split into `bap-core`, `bap`, and `bap-extra`. The `bap-core` package will contain the minimal part of the platform without analyses. The `bap` package will include most of the analyses; finally, `bap-extra` will include the heavy-weight analyses (in terms of extra dependencies and build times), e.g., the symbolic executor and the ghidra backend.

* moves the ghidra install section lower
* downgrades the ubuntu version on CI/CD
* tries to fix the macOS build
* disables ghidra in the opam/opam used in CI
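As a rough illustration of the caseless ordering of variables described above, identifiers could be compared by their lowercased textual representation. This is an illustrative sketch in plain OCaml, not BAP's actual implementation:

```ocaml
(* A sketch of caseless comparison of variable identifiers:
   compare the lowercased textual representations. Illustration
   only, not BAP's actual ordering. *)
let compare_vars x y =
  String.compare (String.lowercase_ascii x) (String.lowercase_ascii y)

let () =
  assert (compare_vars "R0" "r0" = 0);   (* equal caselessly *)
  assert (compare_vars "R1" "r10" < 0)   (* lexicographic on the rest *)
```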
The `popn` was always popping from the first register, and `align_even` was also doing `popn` instead of aligning.
It should be using the address size everywhere (instead it was using the number of bits in the architecture, which is 8 for AVR8).
The `getBytesInAddress` virtual member function for ELF files is implemented incorrectly in LLVM:

```cpp
template <class ELFT>
uint8_t ELFObjectFile<ELFT>::getBytesInAddress() const {
  return ELFT::Is64Bits ? 8 : 4;
}
```

It is effectively a static function that returns 4 bytes for ELF32 and 8 for ELF64, which has nothing to do with the actual bitness of the target architecture's addresses. The fix uses the information obtained from the target (triple) and falls back to `getBytesInAddress` only if the target is not 16, 32, or 64 bit (the target interface is also strange, as it doesn't allow any other values).
* Added flatten/unroll pass.
* Update plugins/flatten/flatten_main.ml
  Co-authored-by: Ivan Gotovchits <[email protected]>
* Update plugins/flatten/flatten_main.ml
* Update plugins/flatten/flatten_main.ml
* Update plugins/flatten/flatten_main.ml
* applies `make indent`
* adds the missing core_kernel dependency
For context: since the Ghidra PR, the semantics of the [blk] operation in Core Theory was refined and the corresponding theories were updated. Before #1326, all theories simply ignored the labels passed to the [blk] operator. The reason is that the [blk] operation serves two purposes: it enables concatenation of data and control flow effects, and it attaches a label to that sequence so that it can be referenced elsewhere. Before #1326 it was only used for the first purpose, i.e., to merge data and control effects into a single effect. But for Ghidra we needed the ability to create labels everywhere (as ghidra relies on Branch/CBranch instructions), even to express intra-instruction logic, not real control flow.

Now the theories have to take into account the label passed to the blk operation when they produce their denotations, unless the label is `Label.null`. If the label is `Label.null`, then the operation denotes just a sequence of data and control flow effects; moreover, denotations are allowed to coalesce several blocks together. But if the label is non-null, then the denotation has to preserve it.

Before this PR the BIR theory wasn't fully respecting the passed labels and was sometimes optimizing them away, for example, when the label was attached to an empty denotation. This PR takes care of keeping the passed labels while preserving the minimal form of the generated IR, assuming, of course, that lifters use the `blk` operation correctly, i.e., that they are not passing non-null labels to blocks that they do not plan to invoke later. In other words, if you have a lifter that uses the `blk` operation, you need to update it and pass `Theory.Label.null` there instead of a fresh label. This will have the same semantics as before; passing a non-null label now has a different semantics.
The problem was not in the heuristic per se, but in a cyclic dependency between the abi pass that discovers __libc_start_main and main, the abi passes, and the api passes. Ideally, we should reimplement our ABI/API infrastructure using the Knowledge Base, which was originally designed to handle such dependencies, but that will take much more time than we currently have. Therefore, right now we discover `__libc_start_main` in the ABI pass, to be sure that we have types applied to it (which we need, as we will use them later to get the storage for the first argument), and we delay the `main` function discovery until after the api pass. After we find main we also want types to be applied. Unfortunately, we can't call the api pass again (which is responsible for that), but we know the prototype of main, so we don't really need to parse the C headers anymore and can manually apply the prototype and translate it to the arg terms. Now the main function discovery works perfectly for programs that use the glibc runtime, so yeah, we can finally execute `/bin/true` and `/bin/false` :)

```
$ bap /bin/false --run --primus-print-obs=call-return | grep main
(call-return (main 0 0x40000008 1))
$ bap /bin/true --run --primus-print-obs=call-return | grep main
(call-return (main 0 0x40000008 0))
```
This theory is used for debugging and introspection purposes. The bug was in the denotation of the `branch` and `repeat` operations, which were denoted as empty if any of their effects were empty. It is now corrected.
This operation doesn't make sense for graphs with unlabeled nodes and shall return the graph unchanged. A corresponding clarification comment is added. Also, cleans up the documentation a little bit.
* Introduces context variables to the knowledge library
* makes theory a part of the context

  Introduces the [Theory.current] and [Theory.with_current] functions.
* adds register aliasing
* fixes the msb function in the lisp library
* uses sign extension during operand unification in Primus Lisp

  Before that it was unsigned, but in general we should put more control on it, as right now we have 32-bit aarch64 operations performed in 64 bits.
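To illustrate why the choice of extension matters when a 32-bit operand is unified into a 64-bit computation, here is a sketch in plain OCaml, independent of BAP's APIs:

```ocaml
(* Zero vs sign extension of a [width]-bit value into 64 bits.
   A plain OCaml illustration, independent of BAP's APIs. *)
let zero_extend ~width x =
  Int64.logand x (Int64.sub (Int64.shift_left 1L width) 1L)

let sign_extend ~width x =
  let n = 64 - width in
  Int64.shift_right (Int64.shift_left x n) n  (* arithmetic shift *)

let () =
  (* -1 as a 32-bit operand: the two extensions disagree *)
  let x = 0xFFFFFFFFL in
  assert (zero_extend ~width:32 x = 0xFFFFFFFFL);
  assert (sign_extend ~width:32 x = -1L)
```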
The Primus Lisp interpreter uses some state and was previously implemented as the knowledge monad transformed into the state monad. The new implementation just uses the newly introduced context variables, which is much easier and a little more performant. It also enables us to persist the interpreter state between calls so that we can generate more efficient code (propagate consts between instructions), especially for the ghidra backend.
Although ocaml/odoc#539 was fixed a long time ago, the fix never landed in the 1.x branch of odoc, which is the stable, released version. The beta versions do not have the stack overflow bug on core_kernel but render unreadable documentation (tried 2.0.0-beta through the latest master). Therefore, we had to fall back to the `--no-pkg-deps` workaround, so that we can get at least the BAP documentation. Note that this workaround means that links to core_kernel and other external libraries will be broken; only links across the bap universe will work.
which is presumably fixed, but the fix is only available in master and has not been released. We will start with the master branch and, if it breaks, stick to the last working sha.
The run plugin was resetting the state of the toplevel knowledge base to the initial state, effectively discarding the knowledge that was accumulated when the Program.t and Symtab.t were constructed, as well as discarding any information that might have been added to the knowledge base in project passes. The run pass was capturing the state at a time when the Project.t was not fully ready, and at the end was using `Toplevel.set` to reset any accumulated knowledge. Effectively, this bug is a race condition, which we could prevent if the toplevel interface provided safe functions. This will be addressed in the next couple of pull requests (they will be more aggressive and change the semantics of several constructs, so I decided to split them off from this hotfix). Big thanks to @anwarmamat for providing a minimal reproducible example; without it, it would have been nearly impossible to identify the bug.
…1559)

* Fix not propagating term attrs in `mangle_names` during lifting
* Fix style
* Update testsuite revision
* Log linking of lisp definitions
* Emit relocations for data symbols in LLVM
* KB interface for image loaders
* KB-ify the LLVM loader and prefer existing bias for the unit
* Add deps for `bap_llvm` library
* Add bitvec dep for `bap-std`
* Allocate libraries automatically

  We will just round up to the nearest page after the previous binary. Note that the main executable can go wherever it wants.

* Log the mangling of names in `stub-resolver`
* Load and map library segments in the Primus loader + apply relocations
* Ignore empty filename
* Rename and also log the steps of the Primus loader
* Adds `relative-relocation` to the schema
* Fix style
* Mark `llvm:relative-relocation` only when a symbol is unavailable
* Remove unused headers
* Fix for LLVM >= 12

Co-authored-by: bmourad01 <[email protected]>
* fixes raw recipes (a plain file, not a directory or zip)
* drops the llvm depext (as it should already come from bap-conf-llvm)
* switches from ubuntu-latest to focal
* downgrades macos workers from latest to 11
* Fixes the stub resolver tests
* Preserves transitivity
* Drop `bap-relation`

Co-authored-by: bmourad01 <[email protected]>
…1567)

When we perform relocations on object files, we do the linker's work, not the loader's; therefore we do not need to check whether the memory is writable or even loadable. The easiest option is to extend the Primus memory interface with primitives that perform writing without any checks.
Switches to dune as the main build system but keeps support for OASIS with the ocamlbuild and omake backends. The implementation utilizes the dune sites and plugins system, but it still supports plugins that are built with bapbuild. To ease the transition, the OASIS-built BAP is also able to load dune plugins. Later, the OASIS build system will be removed from BAP, but support for plugins bundled with bapbuild will remain. Since dune sites impose their own restrictions, we had to introduce a new bap-common package that manages all the sites, starting from plugins and ending with semantic files.
Removes duplicate packages, provided that they have the same implementation. When dune executes a binary (either via `dune exec` or `dune test`) it sets an internal `DUNE_DIR_LOCATIONS` environment variable, which affects the selection of plugins in bap, even if bap is installed from an independent repository. I am still not sure if this is a bug or a feature of dune, but to prevent double-loading we have to deduplicate plugins. We silently remove duplicate names only if all implementations that match the name are the same. We compare implementations using the sum of the md5 sums of the files that comprise the plugin folder. The choice of an md5 sum as the implementation witness, instead of parsing the META files, allows us to preserve the abstraction of dune plugins, as the contents of the folder are an implementation detail of dune. This increases the probability that this code will still work when the implementation changes. In addition, this change works around another subtle bug in the dune plugin system: when a plugin is removed with `opam remove`, opam leaves an empty folder in the plugin directory, which we should ignore; otherwise the dune plugin loader will fail on it.
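The implementation-witness idea described above can be sketched with the OCaml standard library's Digest module. This is a sketch only; the real plugin code may differ in how it enumerates and combines the files:

```ocaml
(* Sketch: the implementation witness of a plugin folder is the
   digest of the concatenated md5 digests of its files. Two plugins
   with equal witnesses are treated as the same implementation.
   [files] is assumed to be sorted, so the witness does not depend
   on enumeration order. *)
let implementation_witness files =
  files
  |> List.map Digest.file   (* md5 of each file *)
  |> String.concat ""
  |> Digest.string          (* combine into a single witness *)

let same_implementation fs1 fs2 =
  Digest.equal (implementation_witness fs1) (implementation_witness fs2)
```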
Normalized Keys
---------------

1) Keys passed to `ocaml tools/configure.ml` can now use dashes or underscores interchangeably.
2) The keys are case-insensitive.
3) The keys can use either underscores or dashes, and any letter case. Basically, the comparison function makes the following keys equal: `foo-bar`, `FOO_BAR`, `FOO-bar`, and `foo bar`.

New Configuration Parameters
----------------------------

Publishes several configuration parameters via Bap_main.Configuration and makes them available via `bap config`, e.g.,

```
$ bap config
version: 2.6.0-alpha
build_id: f4a6c54
datadir: /home/ivg/.local/share/bap
cachedir: /home/ivg/.cache/bap
sysdatadir: /home/ivg/factory/bap-dune/_opam/share/bap-common
libdir: /home/ivg/factory/bap-dune/_opam/lib
confdir: /home/ivg/factory/bap-dune/_opam/etc/bap-common
bindir: /home/ivg/factory/bap-dune/_opam/bin
plugindir: /home/ivg/factory/bap-dune/_opam/lib/bap-common/plugins
```

Updates bapbundle to Handle dune Plugins
----------------------------------------

```
bapbundle remove foo
```

will now correctly delete a plugin `foo` even if it is a dune plugin. (Note that it will only remove the folder from the plugins directory and won't remove any installed libraries; for that, use opam or your build system.)
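The key comparison described above can be sketched as a small normalization function. This is illustrative only, not the actual code in tools/configure.ml:

```ocaml
(* Normalize a configuration key: compare caselessly, treating
   '-', '_', and ' ' as the same separator. Illustrative sketch,
   not the actual tools/configure.ml implementation. *)
let normalize_key key =
  String.lowercase_ascii key
  |> String.map (function '-' | ' ' -> '_' | c -> c)

let equal_keys a b = String.equal (normalize_key a) (normalize_key b)

let () =
  assert (equal_keys "foo-bar" "FOO_BAR");
  assert (equal_keys "FOO-bar" "foo bar")
```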
When bap is installed from the testing repository it fails on frontends due to a missing dependency:

```
Bap_plugins_loader_backend referenced from /home/runner/work/bap/bap/_opam/lib/bap-plugins/bap_plugins.cmxa(Bap_common)
```

Also improves the opam workflow and adds functional checks.
This will break too much downstream code, as well as our own tooling like bapbuild
If bundled plugins are loaded before dune-packaged plugins, the latter will fail on modules that were linked by the former. The old bap plugin system has no way to notify the dune plugin system about which units are loaded (no worries, the dynlink system used by both will prevent double-loading). Instead of catching the Unit_already_loaded exception from dune, it is better to first load all dune packages and only after that handle the old bundled plugins with our old loader, which is capable of handling this exception correctly.
It's unclear how it was passing the tests, but there is still a missing module in the bap-plugins OASIS specification that breaks downstream builds.
* Fix use of `set-nzcv-from-registers` in Thumb `cmp`
* Fix docstring
* Fix `t2ADDrs`
* Adds missing POSIX API definition for `strchrnul`
* Adds `__stack_chk_fail`
* Adds compatibility with LLVM 15
* Update `bap-llvm.opam`
This must be manually enabled by setting the variable `llvm-shared` to a true value in the current opam switch, similar to the existing `llvm-config` variable used by conf-bap-llvm.

* bap_llvm/config/llvm_configurator.ml: use `llvm-config --shared-mode` to get the appropriate linking mode
* bap_llvm/llvm_disasm.cpp: add missing include for Optional.h
Some were deprecated, some removed, and others just failing.
Occasionally, llvm refines their instruction definitions and changes the number of operands; for example, this [commit][1] added an extra destination register operand. For the old lifters we used pretty strict rules for operand deconstruction, and for the bts, btc, and btr instructions we were expecting two operands (now there are three of them). I decided to relax the deconstructor instead of changing the definition of the instructions: first, to preserve backward compatibility with older versions of llvm, and second, because many other instructions could be affected by the same changes on the llvm side, and I don't see any harm in accepting instructions with one more argument, knowing that destinations are always prepended. [1]: llvm/llvm-project@fe96ff7
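The relaxed deconstruction can be sketched with an or-pattern that tolerates a prepended destination operand. The `op` type here is illustrative, not BAP's actual operand representation:

```ocaml
(* Sketch of relaxed operand deconstruction for bts/btc/btr-style
   instructions: newer llvm prepends a destination register, older
   llvm passes just two operands. The [op] type is illustrative. *)
type op = Reg of string | Imm of int

let deconstruct_bt = function
  | [Reg _; Reg base; Imm off]   (* newer llvm: destination prepended *)
  | [Reg base; Imm off] ->       (* older llvm: two operands *)
    Some (base, off)
  | _ -> None

let () =
  assert (deconstruct_bt [Reg "RAX"; Imm 3] = Some ("RAX", 3));
  assert (deconstruct_bt [Reg "RAX"; Reg "RAX"; Imm 3] = Some ("RAX", 3))
```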
Eases local dune-based development, so that you can easily start local development without too much hassle with external dependencies. Allows you to disable the bap-extra packages and just do `dune build`, and everything should work. For context, dune doesn't provide nice options for multi-project monorepos. The dune files represent static information about the project, and it is presumed that there is an external tool that generates them; cf. the tezos manifest that generates dune files from an OCaml EDSL, or whatever they use at Janestreet (jenga?). Originally, the plan was to implement the same mechanism, e.g., generate the dune-project file during the configuration phase. But right now I don't have enough time for that, so I am pushing a simple workaround. All packages in bap-extra are now marked as allow_empty, indicating that they may produce no artifacts. Their corresponding libraries are now optional (if a dependency is not met, the library is silently not built). The only special case is ghidra, which doesn't have dependencies trackable by dune (i.e., OCaml libraries, besides base and bap), so to prevent it from building I added a small check that the ghidra library is installed.
Since v4 they all must have distinct names across all workflows.
as per a workaround suggested in ocaml/setup-ocaml#678
See Commits and Changes for more details.
Created by
pull[bot]