Add sample Python auto generation #205
Signed-off-by: David Korczynski <[email protected]>
nice! just initial comments.
I wonder if there's a way to refactor this to make all of this a bit neater in the future. Right now the concept of a benchmark is quite ingrained in these classes, so we need to untie that, or alternatively somehow create a synthetic "benchmark" based on the Python project we're generating targets for.
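One possible shape for the synthetic "benchmark" idea, as a minimal sketch only. `SyntheticBenchmark` and `synthetic_benchmark_from_repo` are hypothetical names invented for illustration; they are not existing classes in this repository, and the real benchmark type likely carries more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticBenchmark:
  """Hypothetical stand-in for a benchmark, derived from a repository URL."""
  project: str   # Name used for the generated OSS-Fuzz project.
  language: str  # Always 'python' in this sketch.
  repo_url: str  # Repository the targets are generated for.


def synthetic_benchmark_from_repo(github_repo: str) -> SyntheticBenchmark:
  """Builds a benchmark-like record from the last path component of the URL."""
  name = github_repo.rstrip('/').rsplit('/', 1)[-1]
  if name.endswith('.git'):
    name = name[:-len('.git')]
  return SyntheticBenchmark(project=name.lower(),
                            language='python',
                            repo_url=github_repo)
```

With something like this, the builder classes could keep accepting a benchmark-shaped object while the Python flow supplies one generated on the fly.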
@@ -0,0 +1,9 @@
# Python auto-gen
nit: can we place this in `/languages/python` instead?
def create_sample_harness(github_repo: str, func_elem):

  prompt_template = """Hi, I'm looking for your help to write a Python fuzzing harness for the %s Python project. The project is located at %s and I would like you to write a harness targeting this module. You should use the Python Atheris framework for writing the fuzzer. Could you please show me the source code for this harness?
Can we put this into a file?
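A minimal sketch of what moving the prompt into a file could look like. The function names are hypothetical, and the sketch assumes the template keeps exactly two `%s` placeholders (project name, then repository URL); the full template is not visible in this hunk, so adjust if it has more.

```python
from pathlib import Path


def load_prompt_template(template_path: Path) -> str:
  """Reads the prompt template text from disk."""
  return template_path.read_text(encoding='utf-8')


def render_prompt(template: str, project_name: str, github_repo: str) -> str:
  """Fills the template's two %s placeholders (an assumption, see above)."""
  return template % (project_name, github_repo)
```

Keeping the text in its own file would also let the prompt be edited without touching the Python code.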
@@ -119,6 +119,13 @@ def _pre_build_check(self, target_path: str,
      return False
    return True

  def build_and_run_python(self, generated_project: str, target_path: str):
Can we avoid adding language-specific methods here? We should ideally make this class language-agnostic.
@@ -145,6 +152,36 @@ def build_and_run(self, generated_project: str, target_path: str,
        generated_project, benchmark_target_name))
    return build_result, run_result

  def run_target_local_python(self, generated_project: str, target_name: str,
This is very similar to `run_target_local`. Is the only difference that we're not passing things from `self.benchmark`?
We should find some better way to refactor things so there's less duplication here. One simple way I can think of now is to perhaps factor out a general `run_oss_fuzz_helper(...)` which is called by both `run_target_local` and python_fuzzgen.
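A rough sketch of the shared helper suggested above, under stated assumptions: the signature, the `oss_fuzz_dir` parameter, and the injectable `runner` are all hypothetical and not taken from this PR. The only thing taken from the diff is the `python3 infra/helper.py <subcommand> <project>` command shape.

```python
import subprocess
from typing import Callable, List


def oss_fuzz_helper_command(subcommand: str, project: str,
                            *extra_args: str) -> List[str]:
  """Builds the argument list for OSS-Fuzz's infra/helper.py."""
  return ['python3', 'infra/helper.py', subcommand, project, *extra_args]


def run_oss_fuzz_helper(subcommand: str,
                        project: str,
                        *extra_args: str,
                        oss_fuzz_dir: str = '.',
                        runner: Callable = subprocess.run) -> bool:
  """Runs helper.py from the OSS-Fuzz checkout; True on exit code 0."""
  command = oss_fuzz_helper_command(subcommand, project, *extra_args)
  # check=False: report failure through the return value, not an exception.
  result = runner(command, cwd=oss_fuzz_dir, check=False)
  return result.returncode == 0
```

Both callers could then differ only in where the project and target names come from, for example `run_oss_fuzz_helper('run_fuzzer', generated_project, target_name)`.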
@@ -199,6 +237,14 @@ def build_target_local(self,
      print(f'Failed to build image for {generated_project}')
      return False

    if language == 'python':
      command = 'python3 infra/helper.py build_fuzzers %s' % (generated_project)
Can you use f-strings here to be consistent?
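For reference, the f-string form of the flagged line would look roughly like this. The project name below is a placeholder chosen for illustration only.

```python
generated_project = 'example-project'  # Placeholder value for illustration.

# %-formatting, as in the diff under review.
percent_style = 'python3 infra/helper.py build_fuzzers %s' % (generated_project)

# Equivalent f-string, matching the style used elsewhere in the file.
fstring_style = f'python3 infra/helper.py build_fuzzers {generated_project}'

# Both spellings produce the same command string.
same_command = percent_style == fstring_style
```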
@@ -199,6 +237,14 @@ def build_target_local(self,
      print(f'Failed to build image for {generated_project}')
      return False

    if language == 'python':
What breaks if we let this run through the existing code from line 248 instead? Is there a way to make this work by changing the env vars being set there instead?
Use f-strings here to be consistent?
Sample for auto-generating an OSS-Fuzz project for a given Python project.
This differs a bit from the existing setup. The approach in this PR relies on cloning the Python repository within an OSS-Fuzz base-builder image. Fuzz Introspector is cloned within the same image, and a Fuzz Introspector analysis based purely on static analysis is performed to extract details about the Python library under analysis.
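The image setup described above could be sketched as a generated Dockerfile along these lines. This is an illustration under assumptions, not the code in this PR: the function name, the `checkout_dir` parameter, the shallow-clone flags, and the choice of the `base-builder-python` image variant are mine. The Fuzz Introspector invocation is deliberately left out because the description does not show it.

```python
FUZZ_INTROSPECTOR_REPO = 'https://github.com/ossf/fuzz-introspector'
# Assumption: the Python variant of the OSS-Fuzz base-builder image.
BASE_IMAGE = 'gcr.io/oss-fuzz-base/base-builder-python'


def render_dockerfile(github_repo: str, checkout_dir: str) -> str:
  """Renders Dockerfile text that clones the target and Fuzz Introspector."""
  lines = [
      f'FROM {BASE_IMAGE}',
      # Clone the Python project under analysis into the image.
      f'RUN git clone --depth 1 {github_repo} $SRC/{checkout_dir}',
      # Clone Fuzz Introspector next to it for the static analysis step.
      f'RUN git clone --depth 1 {FUZZ_INTROSPECTOR_REPO} $SRC/fuzz-introspector',
      '# Static analysis step omitted: its invocation is not shown here.',
  ]
  return '\n'.join(lines) + '\n'
```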
It's a bit raw at this stage and can likely be better integrated into OSS-Fuzz-gen. However, I'm not 100% sure what the smartest next steps are, so I'm sharing it in case there are opinions. It may be smarter to operate on two tracks in parallel and merge them later, once it's better known what works and what doesn't.