Attempt to wrangle docs into a meaningful shape

c0lorw0lf · Aug 9, 2017 · 6f8c83e
1 parent 056f193
commit 6f8c83e

Showing 13 changed files with 122 additions and 361 deletions.
diff --git a/INSTALL.md b/INSTALL.md
@@ -131,3 +131,9 @@ Moving `libcapstone.so` to the same directory as that of Python files will fix t
 Are you running Ubuntu 12.04? If so, please stop using a 5 year old operating system! Upgrading is free!
 
 You can also try upgrading pip (`pip install -U pip`), which might solve the issue.
+
+## AttributeError: 'FFI' object has no attribute 'unpack'
+
+You have an outdated version of the `cffi` Python module.  angr now requires at least version 1.7 of cffi.
+Try `pip install --upgrade cffi`.  If the problem persists, make sure your operating system hasn't pre-installed an old version of cffi, which pip may refuse to uninstall.
+If you're using a Python virtual environment with the pypy interpreter, ensure you have a recent version of pypy, as it includes a version of cffi which pip will not upgrade.
diff --git a/SUMMARY.md b/SUMMARY.md
@@ -6,24 +6,29 @@
   * [How to Contribute](HACKING.md)
   * [What to Contribute](HELPWANTED.md)
 * [Core Concepts](docs/toplevel.md)
-* [Solver Engine](docs/solver.md)
-* [Loading a binary](docs/loading.md)
-* [Program State](docs/states.md)
-* [Intermediate Representation](docs/ir.md)
-* [Symbolic Execution](docs/symbolic.md)
-  * [The Execution Engine](docs/simuvex.md)
-  * [Controlling Execution](docs/paths.md)
-  * [Bulk Execution - Path Groups](docs/pathgroups.md)
-  * [Bulk Execution - Surveyors](docs/surveyors.md)
-  * [The Whole Pipeline](docs/pipeline.md)
-* [Working with Data and Conventions](docs/structured_data.md)
+  * [Loading a binary](docs/loading.md)
+  * [Solver Engine](docs/solver.md)
+  * [Program State](docs/states.md)
+  * [Simulation Managers](docs/pathgroups.md)
+  * [Execution Engines](docs/simuvex.md)
 * [Analyses](docs/analyses.md)
   * [CFGAccurate](docs/analyses/cfg_accurate.md)
   * [Backward Slicing](docs/analyses/backward_slice.md)
-* [Speed Considerations](docs/speed.md)
-* [Programming SimProcedures](docs/simprocedures.md)
+* Advanced Topics
+  * [Gotchas](docs/gotchas.md)
+  * [The Whole Pipeline](docs/pipeline.md)
+  * [Speed Considerations](docs/speed.md)
+  * [Intermediate Representation](docs/ir.md)
+  * [Working with Data and Conventions](docs/structured_data.md)
+  * [Claripy](docs/claripy.md)
+  * [Symbolic Memory Addressing](docs/concretization_strategies.md)
+* Extending angr
+  * [Programming SimProcedures](docs/simprocedures.md)
+  * [Extending the Environment Model](docs/environment.md)
+  * [Writing Exploration Techniques](docs/otiegnqwvk.md)
+  * [Writing Analyses](docs/analysis_writing.md)
+  * [Adding Support for New Platforms](docs/angr-bf.md)
 * [Examples](docs/examples.md)
 * [FAQ](docs/faq.md)
-* [Gotchas](docs/gotchas.md)
 * [Changelog](CHANGELOG.md)
-
+  * [Migrating from angr 6](MIGRATION.md)
diff --git a/docs/faq.md b/docs/faq.md
@@ -1,59 +1,43 @@
-# FAQ
+# Frequently Asked Questions
 
 This is a collection of commonly-asked "how do I do X?" questions and other general questions about angr, for those too lazy to read this whole document.
 
-## How do I load a binary?
+If your question is of the form "how do I fix X issue", see also the Troubleshooting section of the [install instructions](../INSTALL.md).
 
-A binary is loaded by doing:
+## Why is it named angr?
+angr's core analysis operates on VEX IR, and when something is vexing, it makes you angry.
 
-```python
-p = angr.Project("/path/to/your/binary")
-```
-
-## Why am I getting terrifying error messages from LibVEX printed to stderr?
-
-This is something that LibVEX does when it gets fed invalid instructions.
-VEX is not designed for static analysis, it's designed for instrumentation, so it's mode of handling bad data is to freak out as badly as it possibly can.
-There's no way of shutting it up, short of patching it.
-
-We've already patched VEX so that instead of exiting, bringing down the python interpreter with it, it sends up a message that turns into a python exception than can later be caught by analysis.
-Long story short, *this should not affect your analysis if you're just using builtin angr routines.*
-
-## How can I get verbose debug messages for specific angr modules ?
+## How should "angr" be stylized?
+All lowercase, even at the beginning of sentences. It's an anti-proper noun.
 
+## How can I get diagnostic information about what angr is doing?
 angr uses the standard `logging` module for logging, with every package and submodule creating a new logger.
 
-### Debug messages for everything
-The most simple way to get a debug output is the following:
+The simplest way to get debug output is the following:
 ```python
 import logging
-logging.basicConfig(level=logging.DEBUG) # ajust to the wanted debug level
+logging.getLogger('angr').setLevel('DEBUG')
 ```
 
-You may want to use `logging.INFO` or whatever else instead.
-
-### More granular control
-Each angr module has its own logger string, usually all the python modules
-above it in the hierarchy, plus itself, joined with dots. For example,
-`angr.analyses.cfg`. Because of the way the python logging module works, you
-can set the verbosity for all submodules in a module by setting a verbosity
-level for the parent module. For example, `logging.getLogger('angr.analyses').setLevel(logging.INFO)`
-will make the CFG, as well as all other analyses, log at the INFO level.
-
-### Automatic log settings
-If you're using angr through IPython, you can add a startup script in your
-IPython profile to set various logging levels.
+You may want to use `INFO` or whatever else instead.
+By default, angr will enable logging at the `WARNING` level.
 
+Each angr module has its own logger string, usually all the python modules above it in the hierarchy, plus itself, joined with dots.
+For example, `angr.analyses.cfg`.
+Because of the way the python logging module works, you can set the verbosity for all submodules in a module by setting a verbosity level for the parent module.
+For example, `logging.getLogger('angr.analyses').setLevel('INFO')` will make the CFG, as well as all other analyses, log at the INFO level.
 
-## Why is a CFG taking forever to construct?
-You want to load the binary without shared libraries loaded. If they are loaded,
-like they are by default, the analysis will try to construct a CFG through your
-libraries, which is almost always a really bad idea. Add the following option
-to your `Project` constructor call: `load_options={'auto_load_libs': False}`
+## Why is angr so slow?
+[It's complicated!](speed.md)
 
+## How do I find bugs using angr?
+It's complicated!
+The easiest way to do this is to define a "bug condition", for example, "the instruction pointer has become a symbolic variable", and run symbolic exploration until you find a state matching that condition, then dump the input as a testcase.
+However, you will quickly run into the state explosion problem.
+How you address this is up to you.
+Your solution may be as simple as adding an `avoid` condition or as complicated as implementing CMU's MAYHEM system as an [Exploration Technique](otiegnqwvk.md).
 
 ## Why did you choose VEX instead of another IR (such as LLVM, REIL, BAP, etc)?
-
 We had two design goals in angr that influenced this choice:
 
 1. angr needed to be able to analyze binaries from multiple architectures. This mandated the use of an IR to preserve our sanity, and required the IR to support many architectures.
@@ -76,7 +60,6 @@ To support multiple IRs, we'll either want to abstract these things or translate
 
 
 ### My load options are ignored when creating a Project.
-
 CLE options are an optional argument. Make sure you call Project with the following syntax:
 
 ```python
@@ -89,29 +72,11 @@ b = angr.Project('/bin/true', load_options)
 ```
 
 ## Why are some ARM addresses off-by-one?
-
 In order to encode THUMB-ness of an ARM code address, we set the lowest bit to one.
 This convention comes from LibVEX, and is not entirely our choice!
 If you see an odd ARM address, that just means the code at `address - 1` is in THUMB mode.
 
-## I get an exception that says ```AttributeError: 'FFI' object has no attribute 'unpack'``` What do I do?
-
-You have an outdated version of the `cffi` Python module.  angr now requires at least version 1.7 of cffi.
-Try `pip install --upgrade cffi`.  If the problem persists, make sure your operating system hasn't pre-installed an old version of cffi, which pip may refuse to uninstall.
-If you're using a Python virtual environment with the pypy interpreter, ensure you have a recent version of pypy, as it includes a version of cffi which pip will not upgrade.
-
-## When importing angr, I get an exception that says `ImportError: ERROR: fail to load the dynamic library.`
-
-The capstone pip package is sometimes broken. You can reinstall capstone with:
-
-```bash
-pip install -I --no-use-wheel capstone
-```
-
-This will rebuild the dynamic library and your install will work.
-
 ## How do I serialize angr objects?
-
 [Pickle](https://docs.python.org/2/library/pickle.html) will work.
 However, python will default to using an extremely old pickle protocol that does not support more complex python data structures, so you must specify a [more advanced data stream format](https://docs.python.org/2/library/pickle.html#data-stream-format).
 The easiest way to do this is `pickle.dumps(obj, -1)`.
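
As a minimal sketch of the serialization advice in the FAQ entry above: the snippet round-trips a value through `pickle.dumps(obj, -1)`, where `-1` selects the highest protocol the running interpreter supports. The dictionary is only a hypothetical stand-in for an angr object, chosen so the example runs without angr installed.

```python
import pickle

# Hypothetical stand-in for an angr object; any picklable value works here.
snapshot = {"addr": 0x401340, "visited": [0x401000, 0x401340], "label": "entry"}

# -1 selects the highest pickle protocol available in this interpreter.
blob = pickle.dumps(snapshot, -1)

# Loading does not need the protocol number; it is recorded in the stream.
restored = pickle.loads(blob)
```

Passing `pickle.HIGHEST_PROTOCOL` instead of `-1` has the same effect and may read more clearly.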
diff --git a/docs/ir.md b/docs/ir.md
@@ -1,6 +1,9 @@
 # Intermediate Representation
 
-Because angr deals with widely diverse architectures, it must carry out its analysis on an intermediate representation. We use Valgrind's IR, "VEX", for this. The VEX IR abstracts away several architecture differences when dealing with different architectures, allowing a single analysis to be run on all of them:
+In order to be able to analyze and execute machine code from different CPU architectures, such as MIPS, ARM, and PowerPC in addition to the classic x86, angr performs most of its analysis on an _intermediate representation_, a structured description of the fundamental actions performed by each CPU instruction.
+By understanding angr's IR, VEX \(which we borrowed from Valgrind\), you will be able to write very quick static analyses and have a better understanding of how angr works.
+
+The VEX IR abstracts away several differences between architectures, allowing a single analysis to be run on all of them:
 
 - **Register names.** The quantity and names of registers differ between architectures, but modern CPU designs hold to a common theme: each CPU contains several general purpose registers, a register to hold the stack pointer, a set of registers to store condition flags, and so forth. The IR provides a consistent, abstracted interface to registers on different platforms. Specifically, VEX models the registers as a separate memory space, with integer offsets (e.g., AMD64's `rax` is stored starting at address 16 in this memory space).
 - **Memory access.** Different architectures access memory in different ways. For example, ARM can access memory in both little-endian and big-endian modes. The IR abstracts away these differences.
@@ -54,7 +57,7 @@ Becomes this VEX IR:
     PUT(16) = t3
     PUT(68) = 0x59FC8:I32
 
-Now that you understand VEX, you can actually play with some VEX in angr: We use a library called PyVEX (https://github.com/angr/pyvex) that exposes VEX into Python. In addition, PyVEX implements its own pretty-printing so that it can show register names instead of register offsets in PUT and GET instructions.
+Now that you understand VEX, you can actually play with some VEX in angr: We use a library called [PyVEX](https://github.com/angr/pyvex) that exposes VEX into Python. In addition, PyVEX implements its own pretty-printing so that it can show register names instead of register offsets in PUT and GET instructions.
 
 PyVEX is accessible in angr through the `Project.factory.block` interface. There are many different representations you could use to access syntactic properties of a block of code, but they all have in common the trait of analyzing a particular sequence of bytes. Through the `factory.block` constructor, you get a `Block` object that can be easily turned into several different representations. Try `.vex` for a PyVEX IRSB, or `.capstone` for a Capstone block.
 
@@ -67,12 +70,12 @@ Let's play with PyVEX:
 >>> proj = angr.Project("/bin/true")
 
 # translate the starting basic block
->>> irsb = b.factory.block(b.entry).vex
+>>> irsb = proj.factory.block(proj.entry).vex
 # and then pretty-print it
 >>> irsb.pp()
 
 # translate and pretty-print a basic block starting at an address
->>> irsb = b.factory.block(0x401340).vex
+>>> irsb = proj.factory.block(0x401340).vex
 >>> irsb.pp()
 
 # this is the IR Expression of the jump target of the unconditional exit at the end of the basic block
@@ -100,7 +103,7 @@ Let's play with PyVEX:
 ...         print ""
 
 # pretty-print the condition and jump target of every conditional exit from the basic block
-... for stmt in irsb.statements:
+>>> for stmt in irsb.statements:
 ...     if isinstance(stmt, pyvex.IRStmt.Exit):
 ...         print "Condition:",
 ...         stmt.guard.pp()
@@ -115,5 +118,3 @@ Let's play with PyVEX:
 # here is one way to get the type of temp 0
 >>> print irsb.tyenv.types[0]
 ```
-
-Keep in mind that this is a *syntactic* respresentation of a basic block. That is, it'll tell you what the block means, but you don't have any context to say, for example, what *actual* data is written by a store instruction. We'll get to that next.
diff --git a/docs/loading.md b/docs/loading.md
@@ -273,6 +273,8 @@ True
 ```
 
 Furthermore, you can use `proj.hook_symbol(name, hook)`, providing the name of a symbol as the first argument, to hook the address where the symbol lives.
+One very important usage of this is to extend the behavior of angr's built-in library SimProcedures.
+Since these library functions are just classes, you can subclass them, overriding pieces of their behavior, and then use your subclass in a hook.
 
 ## So far so good!
 

diff --git a/docs/overview.md b/docs/overview.md
@@ -2,15 +2,5 @@
 
 These are blurbs describing each of the sections that need to be rewritten.
 
-### Program States
-
-So far, we've only used angr's simulated program states \(SimState objects\) in the barest possible way in order to demonstrate basic concepts about angr's operation. Here, you'll learn about the structure of a state object and how to interact with it in a variety of useful ways.
-
 ### The Simulation Manager
 
-The most important control interface in angr is the SimulationManager, which allows you to control symbolic execution over groups of states simultaneously, applying search strategies to explore a program's state space. Here, you'll learn how to use it.
-
-### Intermediate Representation
-
-In order to be able to analyze and execute machine code from different CPU architectures, such as MIPS, ARM, and PowerPC in addition to the classic x86, angr performs most of its analysis on an _intermediate representation_, a structured description of the fundamental actions performed by each CPU instruction. By understanding angr's IR, VEX \(which we borrowed from Valgrind\), you will be able to write very quick static analyses and have a better understanding of how angr works.
-
diff --git a/docs/pathgroups.md b/docs/pathgroups.md
@@ -1,5 +1,7 @@
-Bulk Execution and Exploration - Path Groups
-============================================
+# Simulation Managers
+
+The most important control interface in angr is the SimulationManager, which allows you to control symbolic execution over groups of states simultaneously, applying search strategies to explore a program's state space.
+Here, you'll learn how to use it.
 
 Path groups are just a bunch of paths being executed at once. They are also the future.
 

diff --git a/docs/paths.md b/docs/paths.md
@@ -1,3 +1,7 @@
+**Congratulations! You found this page! Please leave.**
+
+This interface no longer exists.
+
 Program Paths - Controlling Execution
 =====================================
 

diff --git a/docs/solver.md b/docs/solver.md
@@ -232,7 +232,7 @@ Variables are not tied to any one state, and can exist freely.
 
 ## More Solving Methods
 
-TODO: write this as soon as the new API exists
+TODO: write this as soon as the new API exists. Include any_n_int, solve-as-string
 
 ## Floating point numbers
 
@@ -249,7 +249,6 @@ TODO
 | Concat | Concatenates any number of expressions together into a new expression. | `x.concat(y, ...)` |
 | RotateLeft | Rotates an expression left. | `x.RotateLeft(8)` |
 | RotateRight | Rotates an expression right. | `x.RotateRight(8)` |
-| Reverse | Reverses the bytes of an expression. | `x.reversed` |
 | And | Logical And (on boolean expressions) | `solver.And(x == y, x > 0)` |
 | Or | Logical Or (on boolean expressions) | `solver.Or(x == y, y < 10)` |
 | Not | Logical Not (on a boolean expression) | `solver.Not(x == y)` is the same as `x != y` |
@@ -263,3 +262,6 @@ TODO
 | SGE | Signed greater than or equal to. | Check if x is greater than or equal to y: `x.SGE(y)` |
 | SGT | Signed greater than. | Check if x is greater than y: `x.SGT(y)` |
 
+## Extra Operations
+
+TODO: document fun stuff like chop, reverse, variables, symbolic...
diff --git a/docs/speed.md b/docs/speed.md
@@ -11,7 +11,7 @@ Regardless, there are a lot of optimizations and tweaks you can use to make angr
 - *Don't load shared libraries unless you need them*.
   The default setting in angr is to try at all costs to find shared libraries that are compatible with the binary you've loaded, including loading them straight out of your OS libraries.
   This can complicate things in a lot of scenarios.
-  If you're performing an analysis that's anything more abstract than bare-bones symbolic execution, you might want to make the tradeoff of sacrificing accuracy for tractability.
+  If you're performing an analysis that's anything more abstract than bare-bones symbolic execution, ESPECIALLY control-flow graph construction, you might want to make the tradeoff of sacrificing accuracy for tractability.
   angr does a reasonable job of keeping things sane when the program calls library functions that were never loaded.
 - *Use hooking and SimProcedures*.
   If you're enabling shared libraries, then you definitely want to have SimProcedures written for any complicated library function you're jumping into.
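
A minimal sketch of the first tip in the docs/speed.md hunk above, assuming angr is installed. The `auto_load_libs` option is the one named in the FAQ entry this commit removes; `load_without_libs` is a hypothetical helper name used only for illustration.

```python
# Loader options passed to the Project constructor.
# 'auto_load_libs': False tells the loader not to pull in shared libraries.
load_options = {'auto_load_libs': False}

def load_without_libs(path):
    """Hypothetical helper: create a Project for `path` without shared libraries."""
    # Imported here so the options above can be inspected without angr installed.
    import angr
    return angr.Project(path, load_options=load_options)
```

With angr available, `load_without_libs("/bin/true")` would return a project whose library calls are left unresolved rather than followed into the system libraries.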