Implement scan_python to process flow by external python script (simsong#164)

* Implement scan_python to process flow by external python script

Several authors have participated in this effort:

- @jakesmo   https://github.com/jakesmo
- @lassimus  https://github.com/lassimus
- @olibre    https://github.com/olibre

The objective is to make tcpflow extensible through Python scripts.

The original work is available on @lassimus' fork:
https://github.com/lassimus/tcpflow/commits/master

@olibre continued the work and heavily refactored the original source code from @jakesmo and @lassimus.
Instead of adding a new option -P, this commit reuses option "-e python" and adds three parameters:

- -S py_path=...
- -S py_module=...
- -S py_function=....
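As a rough illustration of how the three parameters combine with `-e python`, the sketch below assembles a tcpflow command line from Python. The helper name `build_tcpflow_command` is hypothetical and not part of tcpflow; only the flags `-r`, `-e python`, and `-S key=value` are taken from this commit.

```python
def build_tcpflow_command(pcap_file, py_path, py_module, py_function):
    """Return the argv list for running tcpflow with the python scanner.

    Hypothetical helper for illustration: it only assembles arguments
    and does not run tcpflow.
    """
    return [
        "tcpflow",
        "-r", pcap_file,                     # read packets from a capture file
        "-e", "python",                      # enable the python scanner
        "-S", "py_path=" + py_path,          # directory holding the module
        "-S", "py_module=" + py_module,      # module name, without ".py"
        "-S", "py_function=" + py_function,  # function called with the payload
    ]


if __name__ == "__main__":
    cmd = build_tcpflow_command(
        "my.cap", "python/plugins", "samplePlugin", "sampleFunction")
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.call`, assuming a tcpflow binary built with Python support is on the `PATH`.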

The Autotools/Automake files have also been fixed relative to the original source code from @jakesmo and @lassimus.
CMake files have been updated.

On the Autotools/Automake side, the project builds fine with and without the package python-devel.
For the CMake build, however, the package python-devel is required.
This will be improved in a future pull request about CMake.

The resulting tcpflow executable has been tested in several ways:

- built with and without python-devel installed,
- tested with and without options -a, -e python,
- tested in situations where parameters were inconsistent,
- tested with mistakes in parameters,
- ...

There are also some TODOs within the source code assigned to @simsong:

    TODO simsong#1 When the scanner cannot initialize it, should we use sp.info->flags = scanner_info::SCANNER_DISABLED?
    TODO simsong#2 Why PHASE_THREAD_BEFORE_SCAN never called?
    TODO simsong#3 Similar to TODO simsong#1

This new feature broadens what can be done with tcpflow's output data 😃

* Replace XML tag <scan_python_result> by <tcpflow:result>

For more information, see:
dfxml-working-group/dfxml_schema#24

* Fix XML tag name <tcpflow:result>

* Avoid symbols "<" and ">" in XML value

* Rename XML attribute py_function -> function
olibre authored and simsong committed Jul 20, 2017
1 parent bf068fc commit 657ffeb
Showing 12 changed files with 370 additions and 14 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@
*.o
*.so
*.a
*.pyc

# Packages #
############
7 changes: 1 addition & 6 deletions CMakeLists.txt
@@ -5,19 +5,14 @@ project(tcpflow VERSION 1.4.6 LANGUAGES CXX C)
# 1. find_package(Threads) fails using only CXX on cmake-3.3 and previous
# 2. CMake files use CMAKE_C_COMPILER_ID instead of CMAKE_CXX_COMPILER_ID

# find_package(pcap) -> cmake/FindPCAP.cmake
# The following line is for find_package(pcap) -> cmake/FindPCAP.cmake
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)


include(cmake/options.cmake) # Set default CMake options
include(cmake/coverage.cmake) # Configure the build "Coverage"
include(cmake/compilation-flags.cmake) # Compiler & Linker flags
include(cmake/warning-flags.cmake) # Compiler & Linker warnings





# Source code
add_subdirectory(src)

21 changes: 18 additions & 3 deletions configure.ac
@@ -311,9 +311,9 @@ if test $cairo = test ; then
*** cairo libraries not detected.
*** Please install cairo-devel to get 1-page PDF summary generation.
])
Fmissing_library="cairo-devel $missing_library "
Umissing_library="libcairo2-dev $missing_library "
Mmissing_library="cairo-devel "
Fmissing_library="$Fmissing_library cairo-devel "
Umissing_library="$Umissing_library libcairo2-dev "
Mmissing_library="$Mmissing_library cairo-devel "
])
fi

@@ -549,6 +549,21 @@ AC_CHECK_TYPES([sa_family_t], [], [],
### ]]
### )

################################################################
# Plugin scanner_python.cpp requires header "Python.h"
# If the header is not present => Disable the source code of the plugin
#
AC_CHECK_HEADERS(python2.7/Python.h) # ==> #define HAVE_PYTHON2_7_PYTHON_H
AC_CHECK_LIB(python2.7,Py_Initialize,,[
AC_MSG_WARN([
*** Cannot find python library.
*** Please install python-devel to enable scanner python.
])
Fmissing_library="$Fmissing_library python-devel " # Validated on Fedora 25
Umissing_library="$Umissing_library libpython2.7-dev" # Should be OK: https://packages.ubuntu.com/yakkety/libpython2.7-dev
Mmissing_library="$Mmissing_library python27" # Not sure: https://github.com/macports/macports-ports/blob/master/lang/python27/Portfile
])

############## drop optimization flags if requested ################

# Should we disable optimization?
19 changes: 19 additions & 0 deletions doc/tcpflow.1.in
@@ -185,6 +185,25 @@ hash value, is also written to the
.B DFXML report
file.
.TP
.B \-e python \-S py_path=path \-S py_module=module \-S py_function=foo
Post-process TCP payload by an external python function.
.RS
.PP
The python function must take a single string parameter.
The python function may return a string; otherwise it must not return a value.
The returned string (if any) is written in the
.B DFXML report
file inside the XML tag \fB<scan_python_result>...</scan_python_result>\fP.
A sample python script is available within the tcpflow source code
in directory \fBpython/plugins\fP.
.PP
Example:
.PP
.nf
\fBtcpflow -r my.cap -e python -S py_path=python/plugins -S py_module=samplePlugin -S py_function=sampleFunction\fP
.fi
.RE
.TP
.B \-F[format]
Specifies format for output filenames.
.RS
16 changes: 16 additions & 0 deletions python/plugins/README.md
@@ -0,0 +1,16 @@
To execute customizable python plugins:

1. Check examples in directory `tcpflow/python/plugins`.

2. Create a python script with the following properties:

- The script contains one or more functions for tcpflow usage.
- Each intended function must take a single string parameter.
This parameter will hold the contents of the application data captured by tcpflow.
- If an intended function returns, it must return a string,
which will then be added to the report.xml file with the "plugindata" tag.

3. Execute the `tcpflow` command line with arguments `-e python -S py_path=path -S py_module=module -S py_function=foo`.
Example:

    tcpflow -r my.cap -o flows -e python -S py_path=python/plugins -S py_module=samplePlugin -S py_function=sampleFunction
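A minimal sketch of a plugin function that follows the contract described in this README: it takes a single string and returns a string. The function name `requestLine` is hypothetical and is not part of the tcpflow sources. The code avoids version-specific syntax, on the assumption that it runs under the Python 2.7 interpreter this commit links against.

```python
def requestLine(appData):
    """Return the first CRLF-terminated line of the payload.

    For an HTTP request this is the request line, for example
    "GET /index.html HTTP/1.1". An empty string is returned when the
    payload contains no CRLF, so the return value is always a string.
    """
    end = appData.find("\r\n")
    if end == -1:
        return ""
    return appData[:end]
```

Saved in a module under the directory given by `py_path`, such a function would be selected with `-S py_module=<module name> -S py_function=requestLine`.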
49 changes: 49 additions & 0 deletions python/plugins/samplePlugin.py
@@ -0,0 +1,49 @@
## Example of a python plugin for tcpflow.
## This sample contains three functions.

## The first function takes a string and returns a sample message.
## The input string contains the application data from tcpflow's buffer.

def sampleFunction(appData):
    return "This message appears in the XML tag 'tcpflow:result' of report.xml (DFXML)."

## The second function takes a string (application data)
## and appends the application (HTTP) header data to the file
## myOutput.txt located in the python directory.
## This function returns nothing; it only prints a message to stdout.

def headerWriter(appData):
    fName = "myOutput.txt"
    # Index just past the blank line that ends the HTTP headers.
    headerFinish = appData.find("\r\n\r\n") + 4
    headerData = appData[:headerFinish]
    with open("python/" + fName, 'a') as f:
        f.write(headerData)
    print("Wrote data to " + fName)

## The third function takes a string (application data),
## extracts the HTTP message body (without headers),
## performs a bitwise xor operation with a key defined in the function
## and returns the text corresponding to this binary result.

def xorOp(appData):
    # Assume appData includes message data after the HTTP headers.
    dataStart = appData.find("\r\n\r\n") + 4
    httpData = appData[dataStart:]
    binaryData = ''.join(format(ord(x), 'b') for x in httpData)
    if len(binaryData) < 1:
        # Return a string, as the plugin contract requires.
        return ""

    key = "01101011101"
    keyLen = len(key)
    newKey = ""
    # Repeat the whole key while it still fits...
    while len(newKey) + keyLen <= len(binaryData):
        newKey += key
    # ...then pad bit by bit up to the length of the data.
    i = 0
    while len(newKey) < len(binaryData):
        if i == keyLen:
            i = 0
        newKey += key[i]
        i += 1
    xorRes = int(binaryData,2) ^ int(newKey,2)
    return '{0:b}'.format(xorRes)
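The xorOp function above works on concatenated bit strings of varying width. As an alternative sketch, not the method used in this commit, the same idea can be expressed byte by byte with a repeating key. The name `xorBodyHex`, the default key, and the hex output format are all assumptions made for illustration.

```python
def xorBodyHex(appData, key="k"):
    """XOR the HTTP body with a repeating key and return it as hex text.

    Hypothetical variant of xorOp: every character after the header
    separator is XORed with the matching character of the (non-empty)
    key, and the result is rendered as two hex digits per character.
    """
    sep = appData.find("\r\n\r\n")
    if sep == -1:
        # No header/body separator found: nothing to process.
        return ""
    body = appData[sep + 4:]
    return "".join(
        "%02x" % (ord(ch) ^ ord(key[i % len(key)]))
        for i, ch in enumerate(body)
    )
```

Returning hex digits keeps the value free of characters such as "<" and ">", which this commit avoids in XML values.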
Binary file added samplePcaps/jpegs.cap
Binary file not shown.
15 changes: 10 additions & 5 deletions src/CMakeLists.txt
@@ -5,6 +5,7 @@ find_package(Boost) #TODO(olibre): COMPONENTS program_options system)
find_package(PCAP)
find_package(OpenSSL)
find_package(Threads)
find_package(PythonLibs)


# TODO(olibre): Use target_link_libraries() instead of include_directories()
@@ -84,10 +85,12 @@ check_include_files(unordered_map HAVE_UNORDERED_MAP)
check_include_files(unordered_set HAVE_UNORDERED_SET)
check_include_files(winsock2.h HAVE_WINSOCK2_H)
check_include_files(zlib.h HAVE_ZLIB_H)
# The following command lines list the rest of the #define that are used but not yet implemented using CMake directives
# sed 's|/\* ||' config.h | awk '$1 ~ /#undef|#define/{print $2}' | sort -u | while read w ; do grep -wB1 $w config.h | grep '[^ ]*> header' -q && echo $w; done > already-implemented-using-cmake-directives
# ( sed 's|/\* ||' config.h | awk '$1 ~ /#undef|#define/{print $2}' | sort -u | while read w ; do find src -name 'config.h' -o -regex '.*.h\|.*.cpp' -exec fgrep -Iowqm1 $w {} '+' && echo "$w" ; done ) > used-in-source-code
# comm -13 already-implemented-using-cmake-directives used-in-source-code | grep -wf - config.h -B1 --color=always > rest-to-implement-using-cmake-directives
check_include_files(python2.7/Python.h PYTHON2_7_PYTHON_H) # TODO(olibre): Use instead PYTHON_INCLUDE_DIRS
# There are many other #define not (yet) implemented by above CMake directives.
# To list the #define use the following command lines:
# sed 's|/\* ||' config.h | awk '$1 ~ /#undef|#define/{print $2}' | sort -u | while read w ; do grep -wB1 $w config.h | grep '[^ ]*> header' -q && echo $w; done > already-implemented-using-cmake-directives
# ( sed 's|/\* ||' config.h | awk '$1 ~ /#undef|#define/{print $2}' | sort -u | while read w ; do find src -name 'config.h' -o -regex '.*.h\|.*.cpp' -exec fgrep -Iowqm1 $w {} '+' && echo "$w" ; done ) > used-in-source-code
# comm -13 already-implemented-using-cmake-directives used-in-source-code | grep -wf - config.h -B1 --color=always > rest-to-implement-using-cmake-directives
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/../config.h.in ${CMAKE_CURRENT_BINARY_DIR}/config.h)

# When all #define from config.h are implemented using CMake directives the following line...
@@ -102,6 +105,7 @@ file (GLOB netviz_h netviz/*.h)
source_group("netviz headers" FILES ${netviz_h})
add_library (netviz ${netviz_cpp} ${netviz_h})
target_include_directories(netviz PUBLIC netviz)
target_link_libraries(netviz cairo)

# add_subdirectory(dfxml/src)
set(dfxml_writer_h dfxml/src/dfxml_writer.h dfxml/src/hash_t.h)
@@ -172,6 +176,7 @@ set (tcpflow_cpp datalink.cpp flow.cpp
util.cpp
scan_md5.cpp
scan_http.cpp # Depends on zlib
scan_python.cpp # Depends on PYTHON_LIBRARIES
scan_tcpdemux.cpp
scan_netviz.cpp
pcap_writer.h
@@ -187,4 +192,4 @@ set (tcpflow_h
)
source_group("tcpflow headers" FILES ${tcpflow_h})
add_executable(tcpflow ${tcpflow_cpp} ${tcpflow_h})
target_link_libraries(tcpflow netviz wifipcap be13_api dfxml_writer http-parser z pcap)
target_link_libraries(tcpflow netviz wifipcap be13_api dfxml_writer http-parser z pcap ${PYTHON_LIBRARIES}) # add also ${PYTHON_INCLUDE_PATH}
1 change: 1 addition & 0 deletions src/Makefile.am
@@ -85,6 +85,7 @@ tcpflow_SOURCES = \
tcpflow.h util.cpp \
scan_md5.cpp \
scan_http.cpp \
scan_python.cpp \
scan_tcpdemux.cpp \
scan_netviz.cpp \
pcap_writer.h \