Process Hangs Due to Recursive Code Transformations in Rule Graphs #709
Alright! My first suggestion is to avoid using tree-sitter queries and instead use "concrete syntax," unless you truly need the full expressiveness of tree-sitter queries (such as for alternations or similar complex patterns). For example:
Now, regarding the infinite loop issue: Piranha operates by applying transformations iteratively until it reaches a fixpoint, meaning it will continue transforming code until there are no further changes. If the code still matches after a rewrite, the same rule will trigger again. This approach is effective for feature-flag cleanup, but not ideal for migrations. To prevent repeated matches, we use what we call filters. Take a look at your code with these changes:
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=True,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])}  # here is the filter that says "only apply this rule once per assignment"
    )
    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")
    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge])
    )
    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
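The fixpoint behaviour described above can be illustrated with a small toy model. This is only a sketch using the standard `re` module, not Piranha's implementation: `rewrite_once`, `run_to_fixpoint`, and the `max_steps` cap are hypothetical names introduced here, and the guard plays the role that a `not_contains` filter plays in the real rule.

```python
import re

# Toy stand-in for the rule: match `<name>.config(` and prepend a new call.
RULE = re.compile(r'(\w+)\.config\(')
ADDED = '.config("config3", "1")'


def rewrite_once(code: str) -> str:
    """Apply the toy rule to the first match only."""
    return RULE.sub(lambda m: f"{m.group(1)}{ADDED}.config(", code, count=1)


def run_to_fixpoint(code: str, guard=None, max_steps: int = 5):
    """Rewrite until nothing changes, the guard rejects, or max_steps is hit.

    max_steps exists only so that this demo terminates; an engine without
    such a cap would keep rewriting forever.
    """
    steps = 0
    while steps < max_steps:
        if guard is not None and not guard(code):
            break  # the filter says: do not apply the rule here
        new_code = rewrite_once(code)
        if new_code == code:
            break  # fixpoint reached
        code = new_code
        steps += 1
    return code, steps


if __name__ == "__main__":
    snippet = 'sp = builder.config("config1", "5")'
    # Without a guard the output of the rule matches the rule again.
    print(run_to_fixpoint(snippet))
    # With a guard the rule is applied exactly once.
    print(run_to_fixpoint(snippet, guard=lambda c: ADDED not in c))
```

The unguarded run only stops because of the artificial step cap, which mirrors the hang; the guarded run stops after a single rewrite.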
Thank you for the suggestions! I'll try using concrete syntax for now, but I might need to explore Tree-sitter queries in the future. The looping issue has been resolved with the filter.
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
def main():
    code = """sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=True,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])}  # here is the filter that says "only apply this rule once per assignment"
    )
    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")
    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge])
    )
    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
Expected output
Actual output
The second rule needs to be declared with is_seed_rule=False.
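Why the seed flag matters can be shown with a toy scheduler. This is a sketch under simplifying assumptions, not Piranha's actual algorithm: it ignores scopes entirely, uses plain regexes instead of real queries, and `triggered_rules`, `make_rules`, and the rule names are hypothetical.

```python
import re
from collections import deque


def triggered_rules(code, rules, edges):
    """Return the names of rules that fire, in order.

    rules: name -> (regex pattern, is_seed)
    edges: name -> names of rules to try once this rule has matched
    Seed rules are always tried; other rules only when reached via an edge.
    """
    queue = deque(name for name, (_, is_seed) in rules.items() if is_seed)
    seen, fired = set(), []
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        pattern, _ = rules[name]
        if re.search(pattern, code):
            fired.append(name)
            queue.extend(edges.get(name, []))
    return fired


EDGES = {"find_import": ["extend_chain"]}


def make_rules(extend_is_seed):
    return {
        "find_import": (r"from tensorflow\.keras import layers", True),
        "extend_chain": (r"\.config\(", extend_is_seed),
    }


if __name__ == "__main__":
    no_import = 'sp = layers.builder.config("config1", "5")'
    with_import = "from tensorflow.keras import layers\n" + no_import
    print(triggered_rules(no_import, make_rules(True), EDGES))     # seed: fires anyway
    print(triggered_rules(no_import, make_rules(False), EDGES))    # non-seed: stays silent
    print(triggered_rules(with_import, make_rules(False), EDGES))  # reached via the edge
```

In this simplified picture, a rule marked as a seed is tried on its own regardless of the import, while a non-seed rule only runs after the import rule has matched.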
Thank you so much for your help in figuring this out. I tried setting is_seed_rule=False:
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=False,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])}  # here is the filter that says "only apply this rule once per assignment"
    )
    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")
    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge])
    )
    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
I suspect the reason is the scope. Rules in PolyglotPiranha trigger other rules within scopes. In your code, the rule PolyglotPiranha can support multiple scopes (they are user defined for each language). For example, for
Unfortunately, we don't have scopes defined for Python (we would need to add it to the
piranha_arguments = PiranhaArguments(
    code_snippet=code,
    language="python",
    rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge]),
    number_of_ancestors_in_parent_scope=30  # set a big number here
)
Let me know if this helps!
Thank you again. I tried increasing the value of number_of_ancestors_in_parent_scope:
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=False,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])}  # here is the filter that says "only apply this rule once per assignment"
    )
    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge]),
        number_of_ancestors_in_parent_scope=255  # set a big number here
    )
    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
You're right; I had forgotten. Unfortunately, parent scope only applies to nodes that are actual parents and not their context recursively, so the current workaround is to set it to global. As for the required changes, I would be happy to review your PRs, but I currently don’t have write access to this repository, so I can't promise it will be merged. The change is pretty straightforward; you just have to create a scope config file and then add its relative path to the corresponding language, as shown here: piranha/src/models/language.rs, line 201 in 6f2e9bc
In
# Scope generator for python files
[[scopes]]
name = "File"
[[scopes.rules]]
enclosing_node = """
(module) @p_m
"""
scope = "(module) @python_module"
You may also add other scopes, such as class, function, method, etc.
@danieltrt Thank you for your help. I created the file and successfully defined
Below is the
# Scope generator for python files
[[scopes]]
name = "Function"
[[scopes.rules]]
enclosing_node = """
(
  (function_definition
    name: (_) @n
    parameters: (parameters) @fp
  ) @xdn
)"""
scope = """
(
  [
    (function_definition
      name: (_) @z
      parameters: (parameters) @tp
    )
    (#eq? @z "@n")
    (#eq? @tp "@fp")
  ]
) @qdn
"""
[[scopes]]
name = "Class"
[[scopes.rules]]
enclosing_node = """
((class_definition name: (_) @n) @c)
"""
scope = """
(
  ((class_definition
    name: (_) @z) @qc)
  (#eq? @z "@n")
)
"""
[[scopes]]
name = "File"
[[scopes.rules]]
enclosing_node = """
(module) @p_m
"""
scope = "(module) @python_module"
That seems right to me for your example. PolyglotPiranha cannot find an enclosing class in
The function scope seems to have a little problem. This version works for me:
[[scopes]]
name = "Function"
[[scopes.rules]]
enclosing_node = """
(
  (function_definition
    name: (_) @n
    parameters: (parameters) @fp
  ) @xdn
)"""
scope = """
(
  [
    (function_definition
      name: (_) @z
      parameters: (parameters) @tp
    ) @qdn
    (#eq? @z "@n")
    (#eq? @tp "@fp")
  ]
)
"""
Try running it on this code instead:
def test_fn(x):
    from tensorflow.keras import layers
    other = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()
first = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()
Hopefully you will observe the expected behavior 😃
It works now, thanks a lot for your help. However, if PolyglotPiranha can’t find an enclosing class or function, should it print a warning or error message instead of raising an exception? Is this the expected behavior?
I agree; I think a warning would have been a better solution here. I don't remember the rationale for this design decision.
The PR has been opened. Thank you once again for your help.
I think we can add a flag that optionally prints a warning rather than throwing an exception.
Sure, if the expectation is to apply each rule at least once, that makes sense to me.
Hi @danieltrt Thank you again for helping me understand how to run Piranha. I have a follow-up question regarding specifying the
I am trying to make the following change:
input code
from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create(dt)
def function1():
    sp = layer.builder.config("config1", "5")
def function2():
    sp = layer.builder.config("config1", "5")
expected code
from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")
def function2():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")
I tried to implement this change as described below. I have a query to identify the imports and another query to identify
I tried using
below is the output with incorrect indentation, if I use
from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
def function2():
    sp = layer.builder.config("config1", "5")
sp.enable("config1")
I browsed through almost all the examples in the repository. This change involves adding a statement, which is slightly different from most of the examples I found. I came across one example that involves adding an import statement, which usually does not require indentation because imports are typically located at the top of the file. However, I could not find an example demonstrating how to add other statements within the code with proper indentation. Do you have any suggestions or ideas on how to resolve this? Below is the complete code:
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
from collections import Counter
def main():
    code = """from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
def function2():
    sp = layer.builder.config("config1", "5")"""
    expected_code = """from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")
def function2():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    find_from_pandas_import_data = Rule(
        name="find_from_pandas_import_data",
        query="""rgx from pandas import data""",  # rgx indicates regex match
        is_seed_rule=False
    )
    find_layer_builder_config = Rule(
        name="layer_builder_config",
        query="""(function_definition
  body: (block
    (expression_statement
      (assignment
        left: (identifier) @var_name
        right: (call
          function: (attribute
            object: (attribute
              object: (identifier) @object_name
              attribute: (identifier) @builder_name
            )
            attribute: (identifier) @config_name
          )
        )
      )
      (#eq? @object_name "layer")
      (#eq? @builder_name "builder")
      (#eq? @config_name "config")
    )@ass
  )
)
""",
        replace_node="ass",
        replace="@ass\nsp.enable(\"config1\")",
        is_seed_rule=False,
        # # holes={"ass"},
        filters={Filter(enclosing_node="(module) @call",
                        not_contains=["cs sp.enable(\"config1\")"])}
    )
    edge1 = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["find_from_pandas_import_data"], scope="File")
    edge2 = OutgoingEdges("find_from_pandas_import_data", to=["layer_builder_config"], scope="File")
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, find_from_pandas_import_data, find_layer_builder_config], edges=[edge1, edge2]),
    )
    piranha_summary = execute_piranha(piranha_arguments)
    rule_match_counter = Counter([m[0] for m in piranha_summary[0].matches])
    print(rule_match_counter)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
Hey @maldil, I believe the issue here is that Piranha isn't handling indentation properly. Unfortunately, this version of Piranha doesn’t natively support indentation. Since Uber primarily uses Piranha for cleanups in Java or other languages where indentation isn’t critical, adding support for Python indentation hasn’t been a priority for them. To address this, I’ve implemented indentation support in my own custom branch, which you can find here. I’ve also made it possible to perform single-pass transformations, which helps avoid looping issues. This is particularly useful for my research, as I primarily use Piranha for migrations rather than cleanup tasks. In this version, you don’t even need filters!
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo build --no-default-features
maturin build --release
Here's the modified script, which works the way you want in my version of Piranha:
from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter
from collections import Counter
def main():
    code = """from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
def function2():
    sp = layer.builder.config("config1", "5")"""
    expected_code = """from tensorflow.keras import layers
from pandas import data
dt = data.build()
layer = layers.create()
def function1():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")
def function2():
    sp = layer.builder.config("config1", "5")
    sp.enable("config1")"""
    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )
    find_from_pandas_import_data = Rule(
        name="find_from_pandas_import_data",
        query="""rgx from pandas import data""",  # rgx indicates regex match
        is_seed_rule=False
    )
    find_layer_builder_config = Rule(
        name="layer_builder_config",
        query="""cs :[var] = layer.builder.config("config1", :[val])""",
        replace_node="*",
        replace=""":[var] = layer.builder.config("config1", :[val])\nsp.enable(\"config1\")""",
        is_seed_rule=False,
    )
    edge1 = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["find_from_pandas_import_data"], scope="File")
    edge2 = OutgoingEdges("find_from_pandas_import_data", to=["layer_builder_config"], scope="File")
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="py",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, find_from_pandas_import_data, find_layer_builder_config], edges=[edge1, edge2]),
    )
    piranha_summary = execute_piranha(piranha_arguments)
    rule_match_counter = Counter([m[0] for m in piranha_summary[0].matches])
    print(rule_match_counter)
    print(piranha_summary[0].content)
if __name__ == "__main__":
    main()
I'd be happy to clean up this version of the code if you are interested in using it. I'm not sure how quickly we would be able to merge it into this repository, though.
Thanks a lot for sharing; let me try this.
@danieltrt Thanks, it works! I noticed it also prints some debugging-related text, which I believe can be removed. Since this approach works for concrete syntax, do you have any insights into the effort and required changes to adapt it for Tree-sitter queries?
I think the logic would be pretty similar. The way it works is by identifying the depth at which the matched code exists and treating it as a "box" structure. For example:
I used this concept to extract "unindented" code. During replacements, we just need to check where the tag (e.g., stmt) is and ensure the replacement respects its indentation. For example:
Here we just need to apply the indentation of "stmt" to the captured box block, resulting in
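The indentation idea described above can be sketched in a few lines of plain Python. This is only an illustration of the general approach (strip the indentation from the replacement, then re-apply the indentation of the line being replaced); it is not the code from the custom branch, and `replace_with_indent` is a hypothetical helper that matches whole lines rather than syntax nodes.

```python
import textwrap


def replace_with_indent(source: str, target_line: str, replacement: str) -> str:
    """Replace the first line equal to target_line (ignoring indentation).

    The replacement block is dedented first, then every line of it is
    re-indented with the leading whitespace of the line it replaces.
    """
    out, replaced = [], False
    for line in source.splitlines():
        if not replaced and line.strip() == target_line:
            indent = line[: len(line) - len(line.lstrip())]
            block = textwrap.dedent(replacement).strip("\n")
            out.append(textwrap.indent(block, indent))
            replaced = True
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    source = 'def function1():\n    sp = layer.builder.config("config1", "5")'
    replacement = 'sp = layer.builder.config("config1", "5")\nsp.enable("config1")'
    print(replace_with_indent(source, 'sp = layer.builder.config("config1", "5")', replacement))
```

With this approach the added `sp.enable("config1")` line ends up inside the function body instead of at column zero.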
Thank you for your help. I’m not sure if I should open a feature request to track this, but since the primary issue of the process hang has been resolved, I will go ahead and close this ticket.
Thank you again for the great work. I have a question to ensure I fully understand the expectations of the rule graphs. What I'm trying to do is the following change:
before code =
after code =
Objective:
The goal is to add .config("config3", "1") to the config chain only if the import statement from tensorflow.keras import layers exists in the file. The change should not occur if the import is different, for example from pytorch.ll import layers. I created a rule graph with two nodes:
.config("config3", "1") to the chain.
Question 1: Infinite Transformation Loop When Adding .config("config3", "1")
Observation:
The process hangs when I add .config("config3", "1") because it matches the search query again. If I change the replace parameter in the extend_method_chain rule to .config("1"), the code changes successfully.
Question 2: Transformation Applied Regardless of Import Presence
Observation:
The second rule, extend_method_chain, is executed and .config("config3", "1") is added to the method chain even when from tensorflow.keras import layers is not present in the file. What I want is the change to happen only if the import exists. Additionally, I also tried to change the scope of the rule to File, expecting to fix this error, and got the error below:
pyo3_runtime.PanicException: Could not create scope query for "File"
Below is the code that hangs
Do you have any suggestions or workarounds to fix both errors? Or am I doing something unexpected?