Function Repository Resource:

ASTPattern

Source Notebook

Convert a pattern to match a corresponding CodeParser AST node

Contributed by: Richard Hennigan (Wolfram Research)

ResourceFunction["ASTPattern"][patt]

converts patt into a pattern suitable for matching against "CodeParser" AST nodes.

ResourceFunction["ASTPattern"][patt,meta]

uses the pattern meta to match against node metadata.

Details

AST stands for abstract syntax tree. An abstract syntax tree for Wolfram Language code can be generated with CodeParse.
ResourceFunction["ASTPattern"] creates a pattern such that if MatchQ[expr,patt], then MatchQ[node,ResourceFunction["ASTPattern"][patt]] where node is a part of an abstract syntax tree that would parse to expr.
Not all raw patterns are supported by ResourceFunction["ASTPattern"].
Combinations of multiple PatternTest, Condition, and/or repeated Pattern bindings should be considered experimental and unlikely to perform well.

Examples

Basic Examples (13) 

Create an AST pattern that will match any CallNode or LeafNode generated by CodeParse:

In[1]:=
ResourceFunction["ASTPattern"][_]
Out[1]=

Specify a head:

In[2]:=
ResourceFunction["ASTPattern"][_head]
Out[2]=

A function with two arguments:

In[3]:=
ResourceFunction["ASTPattern"][f[_, _]]
Out[3]=

A function with any number of arguments:

In[4]:=
ResourceFunction["ASTPattern"][f[___]]
Out[4]=

A named Pattern:

In[5]:=
ResourceFunction["ASTPattern"][a_Integer]
Out[5]=

A Repeated pattern:

In[6]:=
ResourceFunction[
 "ASTPattern"][{(_Integer?IntegerQ | _String?StringQ) ..}]
Out[6]=

A Verbatim pattern:

In[7]:=
ResourceFunction["ASTPattern"][Verbatim[_Integer]]
Out[7]=

A PatternSequence:

In[8]:=
ResourceFunction["ASTPattern"][{___, PatternSequence[x, y], ___}]
Out[8]=

A pattern for unevaluated code wrapped in HoldPattern:

In[9]:=
ResourceFunction["ASTPattern"][HoldPattern[1 + 1]]
Out[9]=

A PatternTest:

In[10]:=
ResourceFunction["ASTPattern"][_?test]
Out[10]=

A Condition:

In[11]:=
ResourceFunction["ASTPattern"][a_ /; test[a]]
Out[11]=

Duplicate pattern bindings:

In[12]:=
ResourceFunction["ASTPattern"][{a_, a_, a_}]
Out[12]=

Nested patterns:

In[13]:=
ResourceFunction["ASTPattern"][
 expr : f[_, ResourceFunction["ASTPattern"][arg_], _]]
Out[13]=

Scope (3) 

Use the second argument of ASTPattern to match node metadata:

In[14]:=
Cases[CodeParser`CodeParse["f[x_]:=x+y;\ny=1;"], ResourceFunction["ASTPattern"][_, KeyValuePattern["Definitions" -> defs_]] :> defs, Infinity]
Out[14]=

Match only nodes that start on a particular line:

In[15]:=
Cases[CodeParser`CodeParse["1+1\n2+2\n3+3"], ResourceFunction["ASTPattern"][_, KeyValuePattern[CodeParser`Source -> {{2, _}, _}]], Infinity]
Out[15]=

Bind parts of metadata in pattern names:

In[16]:=
ast = CodeParser`CodeParse["1+1"]
Out[16]=
In[17]:=
rule = ResourceFunction["ASTPattern"][node_Plus, meta : KeyValuePattern[CodeParser`Source -> src_]] :> <|
   "String" -> CodeParser`ToFullFormString[node], "Source" -> src, "Metadata" -> meta|>
Out[17]=
In[18]:=
Cases[ast, rule, Infinity]
Out[18]=

Applications (2) 

Parse a Wolfram Language file to get an abstract syntax tree:

In[19]:=
Short[ast = CodeParser`CodeParse[File["ExampleData/Collatz.m"], "SourceConvention" -> "SourceCharacterIndex"]]
Out[19]=

Extract all nodes from the AST that correspond to a pattern:

In[20]:=
nodes = Cases[ast, ResourceFunction["ASTPattern"][Collatz[_Plus | _Times]], Infinity]
Out[20]=

Convert to expressions:

In[21]:=
ToExpression[CodeParser`ToFullFormString /@ nodes, InputForm, Hold]
Out[21]=

Extract source positions:

In[22]:=
pos = Cases[ast, ResourceFunction["ASTPattern"][Collatz[_], KeyValuePattern[CodeParser`Source -> src_]] :> src, Infinity]
Out[22]=

Extract the corresponding parts of the file as strings:

In[23]:=
StringTake[ReadString["ExampleData/Collatz.m"], pos]
Out[23]=

Create a CodeCases function:

In[24]:=
CodeCases[file_File, patt_] := CodeCases[ReadString[file], patt];
CodeCases[string_String, patt_] := StringTake[string, Cases[CodeParser`CodeParse[string, "SourceConvention" -> "SourceCharacterIndex"], ResourceFunction["ASTPattern"][patt, KeyValuePattern[CodeParser`Source -> src_]] :> src, Infinity]];
In[25]:=
CodeCases[File["ExampleData/Collatz.m"], _Collatz]
Out[25]=

Compare to string matching:

In[26]:=
StringCases[ReadString["ExampleData/Collatz.m"], Shortest["Collatz[" ~~ ___ ~~ "]"]]
Out[26]=

This incorrectly matched part of the usage message as code:

In[27]:=
ReadString["ExampleData/Collatz.m"]
Out[27]=

When matching against the AST, there's no need to account for different types of input syntax:

In[28]:=
CodeCases[File["ExampleData/Collatz.m"], And[_, _]]
Out[28]=

Properties and Relations (4) 

The single-argument form of ASTPattern is effectively invisible to an outer ASTPattern:

In[29]:=
ResourceFunction["ASTPattern"][
 f[ResourceFunction["ASTPattern"][_], ResourceFunction["ASTPattern"][_]]]
Out[29]=
In[30]:=
ResourceFunction["ASTPattern"][f[_, _]]
Out[30]=
In[31]:=
% === %%
Out[31]=

The two-argument form inserts the corresponding metadata patterns:

In[32]:=
ResourceFunction["ASTPattern"][
 f[ResourceFunction["ASTPattern"][_, a_], ResourceFunction["ASTPattern"][_, b_]], c_]
Out[32]=

Get an AST node corresponding to a simple expression:

In[33]:=
node = CodeParser`CodeParse["f[1]"][[2, 1]]
Out[33]=

For simple patterns, the performance of matching an ASTPattern is comparable to normal pattern matching:

In[34]:=
simple = ResourceFunction["ASTPattern"][f[_Integer]]
Out[34]=
In[35]:=
RepeatedTiming[MatchQ[node, simple]]
Out[35]=
In[36]:=
RepeatedTiming[MatchQ[f[1], f[_Integer]]]
Out[36]=

Matching patterns that require evaluation typically incur significant cost as an ASTPattern:

In[37]:=
patt = ResourceFunction["ASTPattern"][f[x_Integer /; IntegerQ[x]]]
Out[37]=
In[38]:=
RepeatedTiming[MatchQ[node, patt]]
Out[38]=

Compare to normal pattern matching:

In[39]:=
RepeatedTiming[MatchQ[f[1], f[x_Integer /; IntegerQ[x]]]]
Out[39]=

Some pattern tests allow for optimized patterns:

In[40]:=
ResourceFunction["ASTPattern"][_Integer?IntegerQ]
Out[40]=
In[41]:=
ResourceFunction["ASTPattern"][_Integer?AtomQ]
Out[41]=
In[42]:=
ResourceFunction["ASTPattern"][_String?StringQ]
Out[42]=
In[43]:=
ResourceFunction["ASTPattern"][_String?AtomQ]
Out[43]=

This will only work for certain pattern tests that appear literally:

In[44]:=
myIntegerQ = IntegerQ;
ResourceFunction["ASTPattern"][_Integer?myIntegerQ]
Out[38]=

Pattern bindings that stay within ASTPattern correspond to expressions:

In[45]:=
ast = CodeParser`CodeParse["1+2"];
Cases[ast, ResourceFunction["ASTPattern"][
  x_Integer /; (Echo[x, "Inside"]; True)], Infinity]
Out[41]=

Once a value passes outside of ASTPattern, the binding corresponds to the AST node:

In[46]:=
Cases[ast, ResourceFunction["ASTPattern"][x_Integer] /; (Echo[x, "Outside"]; True), Infinity]
Out[46]=

Note that all instances of the pattern symbol appear below ASTPattern in this case:

In[47]:=
TreeForm[
 Hold[ResourceFunction["ASTPattern"][
   x_Integer /; (Echo[x, "Inside"]; True)]]]
Out[47]=

In this case, there is an x that's "lifted" out of the ASTPattern, which is why it appears as an AST node:

In[48]:=
TreeForm[
 Hold[ResourceFunction["ASTPattern"][
    x_Integer] /; (Echo[x, "Outside"]; True)]]
Out[48]=

Possible Issues (4) 

ASTPattern cannot generalize to all possible WL patterns:

In[49]:=
ast = CodeParser`CodeParse["{f[1],f[1],f[1]}"]
Out[49]=
In[50]:=
FreeQ[ast, ResourceFunction["ASTPattern"][{f[a_] ..}]]
Out[50]=
In[51]:=
FreeQ[{f[1], f[1], f[1]}, {f[a_] ..}]
Out[51]=

In this case, the pattern binding prevents matching, since the metadata will always be different between nodes:

In[52]:=
Cases[ast, ResourceFunction["ASTPattern"][{f[_] ..}], Infinity]
Out[52]=
In[53]:=
%[[1, 2, All, 3]]
Out[53]=

Without any repeated binding this pattern will match:

In[54]:=
FreeQ[ast, ResourceFunction["ASTPattern"][{f[_] ..}]]
Out[54]=

Not all raw patterns are supported by ASTPattern:

In[55]:=
node = CodeParser`CodeParse["f[x -> 1]"][[2, 1]]
Out[55]=
In[56]:=
patt = ResourceFunction["ASTPattern"][f[OptionsPattern[]]]
Out[56]=
In[57]:=
MatchQ[node, patt]
Out[57]=
In[58]:=
MatchQ[f[x -> 1], f[OptionsPattern[]]]
Out[58]=

Unsupported patterns are represented verbatim:

In[59]:=
MatchQ[CodeParser`CodeParse["f[OptionsPattern[]]"][[2, 1]], patt]
Out[59]=
In[60]:=
MatchQ[f[OptionsPattern[]], Verbatim[f[OptionsPattern[]]]]
Out[60]=

ASTPattern ignores the Unevaluated wrapper:

In[61]:=
patt = ResourceFunction["ASTPattern"][Unevaluated[1 + 1]]
Out[61]=
In[62]:=
FreeQ[CodeParser`CodeParse["Hold[1+1]"], patt]
Out[62]=

Use HoldPattern instead:

In[63]:=
patt = ResourceFunction["ASTPattern"][HoldPattern[1 + 1]]
Out[63]=
In[64]:=
FreeQ[CodeParser`CodeParse["Hold[1+1]"], patt]
Out[64]=

ASTPattern must evaluate in order to create the desired pattern, so it should not be wrapped in HoldPattern:

In[65]:=
MatchQ[CodeParser`CodeParse["1+1"][[2, 1]], HoldPattern[ResourceFunction["ASTPattern"][1 + 1]]]
Out[65]=

Use HoldPattern within ASTPattern instead:

In[66]:=
MatchQ[CodeParser`CodeParse["1+1"][[2, 1]], ResourceFunction["ASTPattern"][HoldPattern[1 + 1]]]
Out[66]=

Version History

  • 1.0.0 – 01 March 2022

Related Resources

License Information