Attempt to wrangle docs into a meaningful shape
rhelmot committed Aug 9, 2017
1 parent 056f193 commit 6f8c83e
Showing 13 changed files with 122 additions and 361 deletions.
6 changes: 6 additions & 0 deletions INSTALL.md
@@ -131,3 +131,9 @@ Moving `libcapstone.so` to the same directory as that of Python files will fix t
Are you running Ubuntu 12.04? If so, please stop using a 5 year old operating system! Upgrading is free!

You can also try upgrading pip (`pip install -U pip`), which might solve the issue.

## AttributeError: 'FFI' object has no attribute 'unpack'

You have an outdated version of the `cffi` Python module. angr now requires at least version 1.7 of cffi.
Try `pip install --upgrade cffi`. If the problem persists, make sure your operating system hasn't pre-installed an old version of cffi, which pip may refuse to uninstall.
If you're using a Python virtual environment with the pypy interpreter, ensure you have a recent version of pypy, as it includes a version of cffi which pip will not upgrade.
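To see which case applies before reinstalling anything, a quick check along these lines may help. This is only a sketch and not part of the commit: the helper name `version_at_least` is made up for illustration, and the 1.7 threshold is the one stated above.

```python
import re


def version_at_least(version, minimum):
    """Return True if a dotted version string is >= the `minimum` tuple."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component ("0rc1" -> 0).
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts) >= tuple(minimum)


try:
    import cffi
except ImportError:
    print("cffi is not installed in this interpreter")
else:
    ok = version_at_least(cffi.__version__, (1, 7))
    print("cffi", cffi.__version__, "is new enough" if ok else "is too old for angr")
```

Run it with the same interpreter (or virtualenv) that you use for angr, since a system-wide cffi and a virtualenv cffi can differ.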
35 changes: 20 additions & 15 deletions SUMMARY.md
@@ -6,24 +6,29 @@
* [How to Contribute](HACKING.md)
* [What to Contribute](HELPWANTED.md)
* [Core Concepts](docs/toplevel.md)
* [Solver Engine](docs/solver.md)
* [Loading a binary](docs/loading.md)
* [Program State](docs/states.md)
* [Intermediate Representation](docs/ir.md)
* [Symbolic Execution](docs/symbolic.md)
* [The Execution Engine](docs/simuvex.md)
* [Controlling Execution](docs/paths.md)
* [Bulk Execution - Path Groups](docs/pathgroups.md)
* [Bulk Execution - Surveyors](docs/surveyors.md)
* [The Whole Pipeline](docs/pipeline.md)
* [Working with Data and Conventions](docs/structured_data.md)
* [Loading a binary](docs/loading.md)
* [Solver Engine](docs/solver.md)
* [Program State](docs/states.md)
* [Simulation Managers](docs/pathgroups.md)
* [Execution Engines](docs/simuvex.md)
* [Analyses](docs/analyses.md)
* [CFGAccurate](docs/analyses/cfg_accurate.md)
* [Backward Slicing](docs/analyses/backward_slice.md)
* [Speed Considerations](docs/speed.md)
* [Programming SimProcedures](docs/simprocedures.md)
* Advanced Topics
* [Gotchas](docs/gotchas.md)
* [The Whole Pipeline](docs/pipeline.md)
* [Speed Considerations](docs/speed.md)
* [Intermediate Representation](docs/ir.md)
* [Working with Data and Conventions](docs/structured_data.md)
* [Claripy](docs/claripy.md)
* [Symbolic Memory Addressing](docs/concretization_strategies.md)
* Extending angr
* [Programming SimProcedures](docs/simprocedures.md)
* [Extending the Environment Model](docs/environment.md)
* [Writing Exploration Techniques](docs/otiegnqwvk.md)
* [Writing Analyses](docs/analysis_writing.md)
* [Adding Support for New Platforms](docs/angr-bf.md)
* [Examples](docs/examples.md)
* [FAQ](docs/faq.md)
* [Gotchas](docs/gotchas.md)
* [Changelog](CHANGELOG.md)

* [Migrating from angr 6](MIGRATION.md)
81 changes: 23 additions & 58 deletions docs/faq.md
@@ -1,59 +1,43 @@
# FAQ
# Frequently Asked Questions

This is a collection of commonly-asked "how do I do X?" questions and other general questions about angr, for those too lazy to read this whole document.

## How do I load a binary?
If your question is of the form "how do I fix X issue", see also the Troubleshooting section of the [install instructions](../INSTALL.md).

A binary is loaded by doing:
## Why is it named angr?
The core of angr's analysis is on VEX IR, and when something is vexing, it makes you angry.

```python
p = angr.Project("/path/to/your/binary")
```

## Why am I getting terrifying error messages from LibVEX printed to stderr?

This is something that LibVEX does when it gets fed invalid instructions.
VEX is not designed for static analysis; it's designed for instrumentation, so its mode of handling bad data is to freak out as badly as it possibly can.
There's no way of shutting it up, short of patching it.

We've already patched VEX so that instead of exiting, bringing down the python interpreter with it, it sends up a message that turns into a python exception that can later be caught by analysis.
Long story short, *this should not affect your analysis if you're just using builtin angr routines.*

## How can I get verbose debug messages for specific angr modules?
## How should "angr" be stylized?
All lowercase, even at the beginning of sentences. It's an anti-proper noun.

## How can I get diagnostic information about what angr is doing?
angr uses the standard `logging` module for logging, with every package and submodule creating a new logger.

### Debug messages for everything
The most simple way to get a debug output is the following:
The simplest way to get debug output is the following:
```python
import logging
logging.basicConfig(level=logging.DEBUG) # adjust to the desired debug level
logging.getLogger('angr').setLevel('DEBUG')
```

You may want to use `logging.INFO` or whatever else instead.

### More granular control
Each angr module has its own logger string, usually all the python modules
above it in the hierarchy, plus itself, joined with dots. For example,
`angr.analyses.cfg`. Because of the way the python logging module works, you
can set the verbosity for all submodules in a module by setting a verbosity
level for the parent module. For example, `logging.getLogger('angr.analyses').setLevel(logging.INFO)`
will make the CFG, as well as all other analyses, log at the INFO level.

### Automatic log settings
If you're using angr through IPython, you can add a startup script in your
IPython profile to set various logging levels.
You may want to use `INFO` or whatever else instead.
By default, angr will enable logging at the `WARNING` level.

Each angr module has its own logger string, usually all the python modules above it in the hierarchy, plus itself, joined with dots.
For example, `angr.analyses.cfg`.
Because of the way the python logging module works, you can set the verbosity for all submodules in a module by setting a verbosity level for the parent module.
For example, `logging.getLogger('angr.analyses').setLevel('INFO')` will make the CFG, as well as all other analyses, log at the INFO level.
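The parent/child behaviour can be checked with the standard `logging` module alone, without importing angr. In this sketch the logger names are just strings; `angr.engines` is used purely as an illustrative sibling name and is an assumption, not something this page documents.

```python
import logging

# Mirror the default described above, then raise verbosity for one subtree.
logging.getLogger('angr').setLevel('WARNING')
logging.getLogger('angr.analyses').setLevel('INFO')

# Child loggers with no level of their own inherit from the nearest parent.
cfg_logger = logging.getLogger('angr.analyses.cfg')
engine_logger = logging.getLogger('angr.engines')  # illustrative name

cfg_level = logging.getLevelName(cfg_logger.getEffectiveLevel())
engine_level = logging.getLevelName(engine_logger.getEffectiveLevel())
print(cfg_level, engine_level)
```

The CFG logger picks up the level set on `angr.analyses`, while loggers outside that subtree keep the level set on `angr`.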

## Why is a CFG taking forever to construct?
You want to load the binary without shared libraries loaded. If they are loaded,
like they are by default, the analysis will try to construct a CFG through your
libraries, which is almost always a really bad idea. Add the following option
to your `Project` constructor call: `load_options={'auto_load_libs': False}`
## Why is angr so slow?
[It's complicated!](speed.md)

## How do I find bugs using angr?
It's complicated!
The easiest way to do this is to define a "bug condition", for example, "the instruction pointer has become a symbolic variable", and run symbolic exploration until you find a state matching that condition, then dump the input as a testcase.
However, you will quickly run into the state explosion problem.
How you address this is up to you.
Your solution may be as simple as adding an `avoid` condition or as complicated as implementing CMU's MAYHEM system as an [Exploration Technique](otiegnqwvk.md).
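As a rough sketch of the "symbolic instruction pointer" bug condition, the function below steps a simulation manager until a state lands in the `unconstrained` stash, then dumps that state's stdin. It assumes the post-migration API (`simulation_manager`, `save_unconstrained`, `posix.dumps`); the function name and the `max_steps` cutoff are made up for illustration, and it makes no attempt to deal with state explosion.

```python
def find_unconstrained_input(binary_path, max_steps=1000):
    """Step until some state has a symbolic instruction pointer; return its stdin."""
    import angr  # imported lazily so the sketch can be read without angr installed

    proj = angr.Project(binary_path, load_options={'auto_load_libs': False})
    # save_unconstrained=True keeps states whose instruction pointer became
    # symbolic in the 'unconstrained' stash instead of discarding them.
    simgr = proj.factory.simulation_manager(
        proj.factory.entry_state(), save_unconstrained=True)

    for _ in range(max_steps):
        if simgr.unconstrained:
            crashing = simgr.unconstrained[0]
            return crashing.posix.dumps(0)  # concrete stdin reaching this state
        if not simgr.active:
            break  # nothing left to explore
        simgr.step()
    return None
```

Calling `find_unconstrained_input('/path/to/your/binary')` returns a testcase or `None` if the step budget runs out first.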

## Why did you choose VEX instead of another IR (such as LLVM, REIL, BAP, etc)?

We had two design goals in angr that influenced this choice:

1. angr needed to be able to analyze binaries from multiple architectures. This mandated the use of an IR to preserve our sanity, and required the IR to support many architectures.
@@ -76,7 +60,6 @@ To support multiple IRs, we'll either want to abstract these things or translate


### My load options are ignored when creating a Project.

CLE options are an optional argument. Make sure you call Project with the following syntax:

```python
@@ -89,29 +72,11 @@ b = angr.Project('/bin/true', load_options)
```

## Why are some ARM addresses off-by-one?

In order to encode THUMB-ness of an ARM code address, we set the lowest bit to one.
This convention comes from LibVEX, and is not entirely our choice!
If you see an odd ARM address, that just means the code at `address - 1` is in THUMB mode.
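The convention amounts to a single bit test. The two helpers below are a minimal sketch with hypothetical names; they are not part of angr's API.

```python
def is_thumb(addr):
    """True if the lowest bit marks this ARM code address as THUMB."""
    return (addr & 1) == 1


def code_address(addr):
    """Clear the THUMB bit to get the address where the code actually lives."""
    return addr & ~1


addr = 0x10001
print(is_thumb(addr), hex(code_address(addr)))
```

An even address passes through `code_address` unchanged, so it is safe to apply to any ARM code address.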

## I get an exception that says ```AttributeError: 'FFI' object has no attribute 'unpack'``` What do I do?

You have an outdated version of the `cffi` Python module. angr now requires at least version 1.7 of cffi.
Try `pip install --upgrade cffi`. If the problem persists, make sure your operating system hasn't pre-installed an old version of cffi, which pip may refuse to uninstall.
If you're using a Python virtual environment with the pypy interpreter, ensure you have a recent version of pypy, as it includes a version of cffi which pip will not upgrade.

## When importing angr, I get an exception that says `ImportError: ERROR: fail to load the dynamic library.`

The capstone pip package is sometimes broken. You can reinstall capstone with:

```bash
pip install -I --no-use-wheel capstone
```

This will rebuild the dynamic library and your install will work.

## How do I serialize angr objects?

[Pickle](https://docs.python.org/2/library/pickle.html) will work.
However, python will default to using an extremely old pickle protocol that does not support more complex python data structures, so you must specify a [more advanced data stream format](https://docs.python.org/2/library/pickle.html#data-stream-format).
The easiest way to do this is `pickle.dumps(obj, -1)`.
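A round trip looks like this. The sketch uses a plain dictionary as a stand-in, since the same calls apply to any picklable object; passing `-1` selects the highest protocol the running interpreter supports.

```python
import pickle

# Stand-in object; an angr object would be passed the same way.
obj = {'name': 'example', 'values': [1, 2, 3]}

blob = pickle.dumps(obj, -1)   # -1 is equivalent to pickle.HIGHEST_PROTOCOL
restored = pickle.loads(blob)  # the protocol is detected automatically on load

print(restored == obj)
```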
15 changes: 8 additions & 7 deletions docs/ir.md
@@ -1,6 +1,9 @@
# Intermediate Representation

Because angr deals with widely diverse architectures, it must carry out its analysis on an intermediate representation. We use Valgrind's IR, "VEX", for this. The VEX IR abstracts away several architecture differences when dealing with different architectures, allowing a single analysis to be run on all of them:
In order to be able to analyze and execute machine code from different CPU architectures, such as MIPS, ARM, and PowerPC in addition to the classic x86, angr performs most of its analysis on an _intermediate representation_, a structured description of the fundamental actions performed by each CPU instruction.
By understanding angr's IR, VEX \(which we borrowed from Valgrind\), you will be able to write very quick static analyses and have a better understanding of how angr works.

The VEX IR abstracts away several differences between architectures, allowing a single analysis to be run on all of them:

- **Register names.** The quantity and names of registers differ between architectures, but modern CPU designs hold to a common theme: each CPU contains several general purpose registers, a register to hold the stack pointer, a set of registers to store condition flags, and so forth. The IR provides a consistent, abstracted interface to registers on different platforms. Specifically, VEX models the registers as a separate memory space, with integer offsets (e.g., AMD64's `rax` is stored starting at address 16 in this memory space).
- **Memory access.** Different architectures access memory in different ways. For example, ARM can access memory in both little-endian and big-endian modes. The IR abstracts away these differences.
@@ -54,7 +57,7 @@ Becomes this VEX IR:
PUT(16) = t3
PUT(68) = 0x59FC8:I32

Now that you understand VEX, you can actually play with some VEX in angr: We use a library called PyVEX (https://github.com/angr/pyvex) that exposes VEX into Python. In addition, PyVEX implements its own pretty-printing so that it can show register names instead of register offsets in PUT and GET instructions.
Now that you understand VEX, you can actually play with some VEX in angr: We use a library called [PyVEX](https://github.com/angr/pyvex) that exposes VEX into Python. In addition, PyVEX implements its own pretty-printing so that it can show register names instead of register offsets in PUT and GET instructions.

PyVEX is accessible in angr through the `Project.factory.block` interface. There are many different representations you could use to access syntactic properties of a block of code, but they all have in common the trait of analyzing a particular sequence of bytes. Through the `factory.block` constructor, you get a `Block` object that can be easily turned into several different representations. Try `.vex` for a PyVEX IRSB, or `.capstone` for a Capstone block.

@@ -67,12 +70,12 @@ Let's play with PyVEX:
>>> b = angr.Project("/bin/true")

# translate the starting basic block
>>> irsb = b.factory.block(b.entry).vex
>>> irsb = proj.factory.block(proj.entry).vex
# and then pretty-print it
>>> irsb.pp()

# translate and pretty-print a basic block starting at an address
>>> irsb = b.factory.block(0x401340).vex
>>> irsb = proj.factory.block(0x401340).vex
>>> irsb.pp()

# this is the IR Expression of the jump target of the unconditional exit at the end of the basic block
@@ -100,7 +103,7 @@ Let's play with PyVEX:
... print ""

# pretty-print the condition and jump target of every conditional exit from the basic block
... for stmt in irsb.statements:
>>> for stmt in irsb.statements:
... if isinstance(stmt, pyvex.IRStmt.Exit):
... print "Condition:",
... stmt.guard.pp()
@@ -115,5 +118,3 @@ Let's play with PyVEX:
# here is one way to get the type of temp 0
>>> print irsb.tyenv.types[0]
```

Keep in mind that this is a *syntactic* representation of a basic block. That is, it'll tell you what the block means, but you don't have any context to say, for example, what *actual* data is written by a store instruction. We'll get to that next.
2 changes: 2 additions & 0 deletions docs/loading.md
@@ -273,6 +273,8 @@ True
```

Furthermore, you can use `proj.hook_symbol(name, hook)`, providing the name of a symbol as the first argument, to hook the address where the symbol lives.
One very important usage of this is to extend the behavior of angr's built-in library SimProcedures.
Since these library functions are just classes, you can subclass them, overriding pieces of their behavior, and then use your subclass in a hook.

## So far so good!

10 changes: 0 additions & 10 deletions docs/overview.md
@@ -2,15 +2,5 @@

These are blurbs describing each of the sections that need to be rewritten.

### Program States

So far, we've only used angr's simulated program states \(SimState objects\) in the barest possible way in order to demonstrate basic concepts about angr's operation. Here, you'll learn about the structure of a state object and how to interact with it in a variety of useful ways.

### The Simulation Manager

The most important control interface in angr is the SimulationManager, which allows you to control symbolic execution over groups of states simultaneously, applying search strategies to explore a program's state space. Here, you'll learn how to use it.

### Intermediate Representation

In order to be able to analyze and execute machine code from different CPU architectures, such as MIPS, ARM, and PowerPC in addition to the classic x86, angr performs most of its analysis on an _intermediate representation_, a structured description of the fundamental actions performed by each CPU instruction. By understanding angr's IR, VEX \(which we borrowed from Valgrind\), you will be able to write very quick static analyses and have a better understanding of how angr works.

6 changes: 4 additions & 2 deletions docs/pathgroups.md
@@ -1,5 +1,7 @@
Bulk Execution and Exploration - Path Groups
============================================
# Simulation Managers

The most important control interface in angr is the SimulationManager, which allows you to control symbolic execution over groups of states simultaneously, applying search strategies to explore a program's state space.
Here, you'll learn how to use it.

Path groups are just a bunch of paths being executed at once. They are also the future.

4 changes: 4 additions & 0 deletions docs/paths.md
@@ -1,3 +1,7 @@
**Congratulations! You found this page! Please leave.**

This interface no longer exists.

Program Paths - Controlling Execution
=====================================

6 changes: 4 additions & 2 deletions docs/solver.md
@@ -232,7 +232,7 @@ Variables are not tied to any one state, and can exist freely.

## More Solving Methods

TODO: write this as soon as the new API exists
TODO: write this as soon as the new API exists. Include any_n_int, solve-as-string

## Floating point numbers

@@ -249,7 +249,6 @@ TODO
| Concat | Concatenates any number of expressions together into a new expression. | `x.concat(y, ...)` |
| RotateLeft | Rotates an expression left. | `x.RotateLeft(8)` |
| RotateRight | Rotates an expression right. | `x.RotateRight(8)` |
| Reverse | Reverses the bytes of an expression. | `x.reversed` |
| And | Logical And (on boolean expressions) | `solver.And(x == y, x > 0)` |
| Or | Logical Or (on boolean expressions) | `solver.Or(x == y, y < 10)` |
| Not | Logical Not (on a boolean expression) | `solver.Not(x == y)` is the same as `x != y` |
@@ -263,3 +262,6 @@ TODO
| SGE | Signed greater than or equal to. | Check if x is greater than or equal to y: `x.SGE(y)` |
| SGT | Signed greater than. | Check if x is greater than y: `x.SGT(y)` |

## Extra Operations

TODO: document fun stuff like chop, reverse, variables, symbolic...
2 changes: 1 addition & 1 deletion docs/speed.md
@@ -11,7 +11,7 @@ Regardless, there are a lot of optimizations and tweaks you can use to make angr
- *Don't load shared libraries unless you need them*.
The default setting in angr is to try at all costs to find shared libraries that are compatible with the binary you've loaded, including loading them straight out of your OS libraries.
This can complicate things in a lot of scenarios.
If you're performing an analysis that's anything more abstract than bare-bones symbolic execution, you might want to make the tradeoff of sacrificing accuracy for tractability.
If you're performing an analysis that's anything more abstract than bare-bones symbolic execution, ESPECIALLY control-flow graph construction, you might want to make the tradeoff of sacrificing accuracy for tractability.
angr does a reasonable job of making sane things happen when the binary calls library functions that haven't been loaded.
- *Use hooking and SimProcedures*.
If you're enabling shared libraries, then you definitely want to have SimProcedures written for any complicated library function you're jumping into.