Function Repository Resource:

CodeStructure

Analyze C code in a variety of forms

Contributed by: Wolfram Research

ResourceFunction["CodeStructure"][File[filename]]

gives the syntax tree of the C file filename.

ResourceFunction["CodeStructure"]["string"]

gives the syntax tree of the C program given by string.

ResourceFunction["CodeStructure"]["string", form]

generates a representation of the C program given by string in the format specified by form.

ResourceFunction["CodeStructure"][File[filename],form]

generates a representation of the C file filename in the format specified by form.

Details

The possible values of form include the following:
"SyntaxTree"        syntax tree of the code as a graph
"SyntaxAnnotation"  annotation of how the syntax tree matches up with a C program
"SourceAnnotation"  annotation of how the syntax tree matches up exactly with the source code of a C program
"TokenAnnotation"   annotation of the tokens of the C program
"CallGraph"         basic call graph of the C program
"FileCallGraph"     basic call graph of the C program where the files are the nodes and the function calls are the edges
The following option can be given for all forms:
"CommandLineArguments"  arguments to give to the compiler
The following option can be given if form is "SyntaxTree" or is absent:
"IncludeTypeInformation"  whether to include type information in the syntax tree
The following option is required if form is "CallGraph" or "FileCallGraph" and filename is a directory:
"BinaryLocation"  location of the binary expected from the build process within the directory
The following options can be given if form is "CallGraph" or "FileCallGraph":
"ClangBinariesDirectory"  location of the directory containing the clang and clang++ binaries (required if they are not on PATH)
"DropExternalFunctions"   whether to drop functions that seem to have been defined outside the project to be analyzed
"NoPostProcessing"        whether to return a raw call graph with no post-processing
"Recursive"               whether to recursively search for C source files when analyzing a directory that does not contain a CMakeLists.txt or Makefile
The following option can be given if form is "CallGraph":
"IncludeFilePrefixes"  whether to include in the label of a function the file it was defined in
The following option can be given if form is "FileCallGraph":
"ShowFunctionCount"  whether to show how many times a file calls a function
If form is "CallGraph" or "FileCallGraph" and filename is a directory, ResourceFunction["CodeStructure"] will attempt to use any build system it can detect. If a CMakeLists.txt is present, ResourceFunction["CodeStructure"] will use filename/codeanalysis-build as the build directory. If a Makefile is present, ResourceFunction["CodeStructure"] will attempt to build the project in the project directory. If neither a CMakeLists.txt nor a Makefile is present, ResourceFunction["CodeStructure"] will attempt to auto-discover C source files and will then attempt to compile them so that they can be analyzed.
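
The build-system detection described above can be pictured with a short sketch. This is not the resource function's actual implementation: detectBuildSystem and findCSources are hypothetical helpers written only for illustration, and checking for CMakeLists.txt before Makefile is an assumption, since the text does not say which takes priority when both are present.

```wolfram
(* Hypothetical helper, not part of CodeStructure: classify a project
   directory by the build files it contains. The CMake-before-Make
   order is an assumption. *)
detectBuildSystem[dir_String] := Which[
  FileExistsQ[FileNameJoin[{dir, "CMakeLists.txt"}]], "CMake",
  FileExistsQ[FileNameJoin[{dir, "Makefile"}]], "Make",
  True, "AutoDiscover"
]

(* Hypothetical helper: list candidate C source files, optionally
   descending into subdirectories, loosely mirroring what the
   "Recursive" option is described as controlling. *)
findCSources[dir_String, recursive : (True | False) : False] :=
  FileNames["*.c", dir, If[recursive, Infinity, 1]]

(* Example use on the current working directory *)
detectBuildSystem[Directory[]]
findCSources[Directory[], True]
```

A result of "AutoDiscover" corresponds to the case where no build system is found and the C source files have to be located and compiled directly.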

Examples

Basic Examples

Create two C source files:

In[1]:=
Export["main.c", "void func(void);\n\nint main(void)\n{\n\tfunc();\n\treturn 1;\n}\n", "String"];
Export["func.c", "void func(void)\n{\n}\n", "String"];

Get the call graph of the two C source files and use the directory HeadersFolder as an included directory:

In[2]:=
ResourceFunction[
 "CodeStructure"][{File["main.c"], File["func.c"]}, "CallGraph", "CommandLineArguments" -> {"-IHeadersFolder"}]
Out[2]=

Drop the file prefixes by setting the option "IncludeFilePrefixes" to False:

In[3]:=
ResourceFunction[
 "CodeStructure"][{File["main.c"], File["func.c"]}, "CallGraph", "CommandLineArguments" -> {"-IHeadersFolder"}, "IncludeFilePrefixes" -> False]
Out[3]=

Get the call graph of a short C program:

In[4]:=
ResourceFunction["CodeStructure"]["#include <stdio.h>

int main(int argc, char **argv)
{
	(void) printf(\"%s\\n\", \"I am a short function!\");
}", "CallGraph", "DropExternalFunctions" -> False]
Out[4]=

Scope

Use the form "SyntaxAnnotation" to display how the syntax tree of a C program lines up with its source code:

In[5]:=
ResourceFunction[
 "CodeStructure"]["/* add.c -- Read a sequence of positive integers and print them \n *          out together with their sum. Use a Sentinel value\n *          (say 0) to determine when the sequence has terminated.\n */\n\n#include <stdio.h>\n#define SENTINEL 0\n\nint main(void) {\n  int sum = 0; /* The sum of numbers already read */\n  int current; /* The number just read */\n\n  do {\n    printf(\"\\nEnter an integer > \");\n    scanf(\"%d\", &current);\n    if (current > SENTINEL)\n      sum = sum + current;\n  } while (current > SENTINEL);\n  printf(\"\\nThe sum is %d\\n\", sum);\n}\n", "SyntaxAnnotation"]
Out[5]=

Use the form "CallGraph" to show the call graph of a directory containing a complete C project using its CMakeLists.txt:

In[6]:=
ResourceFunction["CodeStructure"][
 File["/Users/wolfram/wstpserver"], "CallGraph", "BinaryLocation" -> "wstpserver"]
Out[6]=

Use the form "FileCallGraph" to show the call graph of a C project where the files are the nodes and the function calls are the edges:

In[7]:=
ResourceFunction["CodeStructure"][
 File["/Users/wolfram/wstpserver"], "FileCallGraph", "BinaryLocation" -> "wstpserver", "ShowFunctionCount" -> True]
Out[7]=

Use Wolfram Language graph functions to further refine the file call graph:

In[8]:=
SimpleGraph[%, GraphLayout -> "LayeredDigraphEmbedding"]
Out[8]=
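
Because the example above passes the file call graph to SimpleGraph, the other built-in graph functions should apply in the same way. The sketch below uses a small stand-in Graph with hypothetical file names rather than real CodeStructure output, and it assumes the real result is an ordinary directed Graph whose vertices are file names.

```wolfram
(* Stand-in for a "FileCallGraph" result; the file names and edges are
   hypothetical. The repeated edge models a file calling into another
   file more than once. *)
g = Graph[{"main.c" -> "func.c", "main.c" -> "util.c",
    "func.c" -> "util.c", "main.c" -> "util.c"}];

(* Collapse repeated edges so that each file-to-file dependency appears once *)
simple = SimpleGraph[g];

(* Number of distinct files that call into each file *)
AssociationThread[VertexList[simple], VertexInDegree[simple]]

(* Files reachable from func.c by following calls *)
VertexOutComponent[simple, "func.c"]
```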

Get the syntax tree of a short C program:

In[9]:=
ResourceFunction["CodeStructure"][
 "int printf(const char *format, ...);

int main(int argc, char **argv)
{
	(void) printf(\"%s\\n\", \"I am a very short program!\");
}"]
Out[9]=

Display the syntax tree of a C program using the form "SyntaxTree":

In[10]:=
ResourceFunction[
 "CodeStructure"]["/* add.c -- Read a sequence of positive integers and print them \n *          out together with their sum. Use a Sentinel value\n *          (say 0) to determine when the sequence has terminated.\n */\n\n#include <stdio.h>\n#define SENTINEL 0\n\nint main(void) {\n  int sum = 0; /* The sum of numbers already read */\n  int current; /* The number just read */\n\n  do {\n    printf(\"\\nEnter an integer > \");\n    scanf(\"%d\", &current);\n    if (current > SENTINEL)\n      sum = sum + current;\n  } while (current > SENTINEL);\n  printf(\"\\nThe sum is %d\\n\", sum);\n}\n", "SyntaxTree"]
Out[10]=

Display tags for the types of tokens in a C program using the form "TokenAnnotation":

In[11]:=
ResourceFunction[
 "CodeStructure"]["/* add.c -- Read a sequence of positive integers and print them \n *          out together with their sum. Use a Sentinel value\n *          (say 0) to determine when the sequence has terminated.\n */\n\n#include <stdio.h>\n#define SENTINEL 0\n\nint main(void) {\n  int sum = 0; /* The sum of numbers already read */\n  int current; /* The number just read */\n\n  do {\n    printf(\"\\nEnter an integer > \");\n    scanf(\"%d\", &current);\n    if (current > SENTINEL)\n      sum = sum + current;\n  } while (current > SENTINEL);\n  printf(\"\\nThe sum is %d\\n\", sum);\n}\n", "TokenAnnotation"]
Out[11]=

Use the form "SourceAnnotation" to display how the syntax tree of a C program lines up exactly with its source code:

In[12]:=
ResourceFunction[
 "CodeStructure"]["/* add.c -- Read a sequence of positive integers and print them \n *          out together with their sum. Use a Sentinel value\n *          (say 0) to determine when the sequence has terminated.\n */\n\n#include <stdio.h>\n#define SENTINEL 0\n\nint main(void) {\n  int sum = 0; /* The sum of numbers already read */\n  int current; /* The number just read */\n\n  do {\n    printf(\"\\nEnter an integer > \");\n    scanf(\"%d\", &current);\n    if (current > SENTINEL)\n      sum = sum + current;\n  } while (current > SENTINEL);\n  printf(\"\\nThe sum is %d\\n\", sum);\n}\n", "SourceAnnotation"]
Out[12]=

Possible Issues

Call graph generation will fail if the clang and clang++ binaries are not on PATH, as is typical on Linux, where gcc is the default compiler:

In[13]:=
ResourceFunction[
 "CodeStructure"]["int main(int argc, char **argv){return 1;}", "CallGraph"]
Out[13]=

Check the value of CodeAnalysis`$BuildError to find the cause of the error:

In[14]:=
CodeAnalysis`$BuildError
Out[14]=

Manually specify the location of the clang and clang++ binaries using the "ClangBinariesDirectory" option:

In[15]:=
ResourceFunction[
 "CodeStructure"]["int main(int argc, char **argv){return 1;}", "CallGraph", "ClangBinariesDirectory" -> "/Users/wolfram/my-llvm-clang/bin"]
Out[15]=
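
Before setting "ClangBinariesDirectory", it may help to check whether the binaries are visible on PATH at all. The following is a minimal sketch, assuming a Unix-like layout where the executables are named clang and clang++ (on Windows they would typically carry an .exe suffix). findOnPath is a hypothetical helper and is not part of the resource function.

```wolfram
(* Hypothetical helper: return the first file called name found in a
   directory listed in the PATH environment variable, or a Missing
   object if there is none. *)
findOnPath[name_String] := Module[{path = Environment["PATH"], sep},
  If[StringQ[path],
   sep = If[$OperatingSystem === "Windows", ";", ":"];
   SelectFirst[
    FileNameJoin[{#, name}] & /@ StringSplit[path, sep],
    FileExistsQ,
    Missing["NotFound"]],
   Missing["NoPath"]]
  ]

(* Check both binaries that call graph generation relies on *)
findOnPath /@ {"clang", "clang++"}
```

If either entry comes back as Missing["NotFound"], passing the directory that contains the binaries through "ClangBinariesDirectory", as in the example above, is the documented workaround.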

Call graph generation will fail if filename is a directory that contains a CMakeLists.txt that requires certain cache variables to be defined:

In[16]:=
ResourceFunction["CodeStructure"][
 File["/Users/wolfram/wstpserver"], "CallGraph", "BinaryLocation" -> "wstpserver"]
Out[16]=

Check the value of CodeAnalysis`$BuildError to find the cause of the error:

In[17]:=
CodeAnalysis`$BuildError
Out[17]=

To resolve the issue, manually define any required CMake cache variables in the codeanalysis-build subdirectory:

$ cd /Users/wolfram/wstpserver/codeanalysis-build

$ cmake -DMATHLINK_INCLUDE_DIRECTORY=/Users/wolfram/mathlink/build/include .


Code analysis can fail or produce an incorrect result if required command-line arguments are missing:

In[18]:=
ResourceFunction["CodeStructure"][
 "int get_command_line_number(void)
{
	return COMMAND_LINE_NUMBER;
}", "CallGraph"]
Out[18]=

Check the value of CodeAnalysis`$BuildError to find the cause of the error:

In[19]:=
CodeAnalysis`$BuildError
Out[19]=

Pass the definition of the COMMAND_LINE_NUMBER macro to the compiler using the "CommandLineArguments" option:

In[20]:=
ResourceFunction["CodeStructure"][
 "int get_command_line_number(void)
{
	return COMMAND_LINE_NUMBER;
}", "CallGraph", "CommandLineArguments" -> {"-DCOMMAND_LINE_NUMBER=42"}]
Out[20]=

Publisher

Christopher Cooley

Version History

  • 1.0.2 – 07 October 2022
  • 1.0.1 – 21 July 2021
