6. P4 language definition
The P4 language can be viewed as having several distinct components, which we describe separately:
- The core language, comprising types, variables, scoping, declarations, statements, expressions, etc. We start by describing this part of the language.
- A sub-language for expressing parsers, based on state machines (Section [#sec-packet-parsing]).
- A sub-language for expressing computations using match-action units, based on traditional imperative control-flow (Section [#sec-control]).
- A sub-language for describing architectures (Section [#sec-arch-desc]).
Grammar
The complete grammar of P4_16 is given in Appendix [#sec-grammar], using the Yacc/Bison grammar description language. This text is based on the same grammar. We adopt several standard conventions when we provide excerpts from the grammar:
- `UPPERCASE` symbols denote terminals in the grammar.
- Excerpts from the grammar are given in BNF notation as follows:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:p4program]
~ End P4Grammar

Pseudo-code (mostly used for describing the semantics of various P4 constructs) is shown with fixed-size fonts as in the following example:
~ Begin P4Pseudo
ParserModel.verify(bool condition, error err) {
    if (condition == false) {
        ParserModel.parserError = err;
        goto reject;
    }
}
~ End P4Pseudo
Semantics and the P4 abstract machines
We describe the semantics of P4 in terms of abstract machines executing traditional imperative code. There is an abstract machine for each P4 sub-language (parser, control). The abstract machines are described in this text in pseudo-code and English.
P4 compilers are free to reorganize the code they generate in any way, as long as the externally visible behaviors of the P4 programs are preserved as described by this specification, where externally visible behavior is defined as:
- The input/output behavior of all P4 blocks, and
- The state maintained by extern blocks.
To aid composition of programs from multiple source files, P4 compilers should support the following subset of the C preprocessor functionality:

- `#define` for defining macros (without arguments)
- `#undef`
- `#if #else #endif #ifdef #ifndef #elif`
- `#include`
The preprocessor should also remove the sequence backslash newline (ASCII codes 92, 10) to facilitate splitting content across multiple lines when convenient for formatting.
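As an illustration only, the line-splicing rule can be modelled in a few lines of Python. The function name `splice_lines` is hypothetical and not part of any P4 tool, and the sketch assumes the source is available as a single string that uses `\n` (ASCII 10) as its line terminator.

```python
def splice_lines(source: str) -> str:
    """Remove every backslash (ASCII 92) immediately followed by a newline (ASCII 10)."""
    return source.replace("\\\n", "")


# A macro definition split across two physical lines becomes one logical line.
spliced = splice_lines("#define LONG_NAME 1 + \\\n 2\n")
```

Only the two-character sequence is removed; a backslash that is not directly followed by a newline is left untouched.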
Additional C preprocessor capabilities may be supported, but are not guaranteed (e.g., macros with arguments). Similar to C, `#include` can specify a file name either within double quotes or within `<>`.

~ Begin P4Example
# include <system_file>
# include "user_file"
~ End P4Example
The difference between the two forms is the order in which the preprocessor searches for header files when the path is incompletely specified.
P4 compilers should correctly handle #line directives that may be
generated during preprocessing. This functionality allows P4 programs to
be built from multiple source files, potentially produced by different
programmers at different times:
- the P4 core library, defined in this document,
- the architecture, defining data plane interfaces and extern blocks,
- user-defined libraries of useful components (e.g. standard protocol header definitions), and
- the P4 programs that specify the behavior of each programmable block.
The P4 language specification defines a core library that includes several common programming constructs. A description of the core library is provided in Appendix [#sec-p4-core-lib]. All P4 programs must include the core library. Including the core library is done with
~ Begin P4Example
# include <core.p4>
~ End P4Example
All P4 keywords use only ASCII characters. All P4 identifiers must use only ASCII characters. P4 compilers should correctly handle strings containing 8-bit characters in comments and string literals. P4 is case-sensitive. Whitespace characters, including newlines, are treated as token separators. Indentation is free-form; however, P4 has C-like block constructs, and all our examples use C-style indentation. Tab characters are treated as spaces.
The lexer recognizes the following kinds of terminals:
- `IDENTIFIER`: start with a letter or underscore, and contain letters, digits and underscores
- `TYPE_IDENTIFIER`: identifier that denotes a type name
- `INTEGER`: integer literals
- `DONTCARE`: a single underscore
- Keywords such as `RETURN`. By convention, each keyword terminal corresponds to a language keyword with the same spelling but using lowercase. For example, the `RETURN` terminal corresponds to the `return` keyword.
Identifiers
P4 identifiers may contain only letters, digits, and the underscore
character `_`, and must start with a letter or underscore. The special
identifier consisting of a single underscore `_` is reserved to indicate
a “don’t care” value; its type may vary depending on the context.
Certain keywords (e.g., `apply`) can be used as identifiers if the
context makes it unambiguous.
~ Begin P4Grammar
[INCLUDE=grammar.mdk:nonTypeName]

name
    : nonTypeName
    | TYPE_IDENTIFIER
    ;
~ End P4Grammar
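The lexical rule for identifiers can be sketched as a small Python check. This is a minimal model, not the lexer of any actual compiler: the names `classify` and `_IDENT` are hypothetical, keywords are not recognized, and the distinction between `IDENTIFIER` and `TYPE_IDENTIFIER` is ignored because it depends on previously seen declarations.

```python
import re

# ASCII letter or underscore first, then ASCII letters, digits, or underscores.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(token: str) -> str:
    """Return the terminal kind for a candidate identifier token."""
    if token == "_":
        return "DONTCARE"        # the reserved "don't care" identifier
    if _IDENT.fullmatch(token):
        return "IDENTIFIER"
    return "INVALID"
```

Because the character classes are spelled out explicitly, non-ASCII letters are rejected, which matches the requirement that identifiers use only ASCII characters.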
Comments
P4 supports several kinds of comments:
- Single-line comments, introduced by `//` and spanning to the end of line,
- Multi-line comments, enclosed between `/*` and `*/`,
- Nested multi-line comments are not supported.
- Javadoc-style comments, starting with `/**` and ending with `*/`
Use of Javadoc-style comments is strongly encouraged for the tables and actions that are used to synthesize the interface with the control-plane.
P4 treats comments as token separators, and no comment is allowed
within a token: for example, `bi/**/t` is parsed as two tokens, `bi` and `t`, and
not as a single token `bit`.
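The following Python sketch illustrates the effect of treating comments as token separators. It is a simplification under stated assumptions: the helper name `strip_comments` is hypothetical, and the sketch ignores string literals, so comment-like sequences inside a string would be handled incorrectly.

```python
import re

# A single-line comment runs to the end of the line; a multi-line comment
# ends at the FIRST "*/", which is why nested comments are not supported.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Replace every comment with one space so that it separates tokens."""
    return _COMMENT.sub(" ", source)


tokens = strip_comments("bi/**/t").split()
leftover = strip_comments("/* a /* b */ c */")
```

Here `tokens` holds the two tokens `bi` and `t`, and `leftover` still contains the text `c */`, showing that the inner `/*` did not open a nested comment.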
Literal constants
Boolean literals
There are two Boolean literal constants: `true` and `false`.
Integer literals
Integer literals are non-negative arbitrary-precision integers. By default, literals are represented in base 10. The following prefixes must be employed to specify the base explicitly:
- `0x` or `0X` indicates base 16 (hexadecimal)
- `0o` or `0O` indicates base 8 (octal)
- `0d` or `0D` indicates base 10 (decimal)
- `0b` or `0B` indicates base 2 (binary)

The width of a numeric literal in bits can be specified by a prefix consisting of an unsigned number of bits followed by a signedness indicator:

- `w` indicates unsigned numbers
- `s` indicates signed numbers

Note that a leading zero by itself does not indicate an octal (base 8) constant. The underscore character is considered a digit within number literals but is ignored when computing the value of the parsed number. This allows long constant numbers to be more easily read by grouping digits together. The underscore cannot be used in the width specification or as the first character of an integer literal. No comments or whitespace are allowed within a literal. Here are some examples of numeric literals:
~ Begin P4Example
32w255         // a 32-bit unsigned number with value 255
32w0d255       // same value as above
32w0xFF        // same value as above
32s0xFF        // a 32-bit signed number with value 255
8w0b10101010   // an 8-bit unsigned number with value 0xAA
8w0b_1010_1010 // same value as above
8w170          // same value as above
8s0b1010_1010  // an 8-bit signed number with value -86
16w0377        // 16-bit unsigned number with value 377 (not 255!)
16w0o377       // 16-bit unsigned number with value 255 (base 8)
~ End P4Example
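The rules above can be checked against the examples with a short Python model. This is a sketch rather than a reference lexer: `Literal`, `parse_literal`, and the regular expression are hypothetical names introduced here, and truncating an oversized value to the low `width` bits is an assumption of the sketch, not something this section specifies.

```python
import re
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    value: int
    width: Optional[int]   # None when the literal has no width prefix
    signed: bool


_PATTERN = re.compile(
    r"(?:(?P<width>[0-9]+)(?P<sign>[ws]))?"   # optional width and signedness
    r"(?:0(?P<base>[xXoOdDbB]))?"             # optional base prefix
    r"(?P<digits>[0-9A-Fa-f_]+)"              # digits; underscores allowed
)
_BASES = {"x": 16, "o": 8, "d": 10, "b": 2}


def parse_literal(text: str) -> Literal:
    m = _PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    digits = m["digits"]
    if m["base"] is None and digits.startswith("_"):
        raise ValueError("a literal cannot start with an underscore")
    # A leading zero alone does not select octal: the base stays 10.
    base = _BASES[m["base"].lower()] if m["base"] else 10
    value = int(digits.replace("_", ""), base)   # underscores carry no value
    if m["width"] is None:
        return Literal(value, None, False)
    width = int(m["width"])
    signed = m["sign"] == "s"
    value &= (1 << width) - 1                    # assumption: keep the low bits
    if signed and value >= 1 << (width - 1):
        value -= 1 << width                      # two's-complement reading
    return Literal(value, width, signed)
```

For instance, `parse_literal("8s0b1010_1010")` reads the bit pattern `0xAA` as an 8-bit two's-complement number, which is how the example above arrives at -86.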
String literals
String literals are specified as an arbitrary sequence of 8-bit
characters, enclosed within double quote characters " (ASCII code 34).
Strings start with a double quote character and extend to the first
double quote sign which is not immediately preceded by an odd number of
backslash characters (ASCII code 92). P4 does not make any validity
checks on strings (i.e., it does not check that strings represent legal
UTF-8 encodings).
Since P4 does not provide any operations on strings, string literals are generally passed unchanged through the P4 compiler to other third-party tools or compiler-backends, including the terminating quotes. These tools can define their own handling of escape sequences (e.g., how to specify Unicode characters, or handle unprintable ASCII characters).
Here are 3 examples of string literals:

~ Begin P4Example
"simple string"
"string \" with \" embedded \" quotes"
"string with embedded
line terminator"
~ End P4Example
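The termination rule for string literals can be sketched in Python as a scan that tracks the length of the current run of backslashes. The function name `string_literal_end` is hypothetical; the sketch only locates the closing quote and, in keeping with the text above, performs no validation or escape processing.

```python
def string_literal_end(source: str, start: int) -> int:
    """Return the index just past the closing quote of the literal at `start`."""
    if source[start] != '"':
        raise ValueError("a string literal must start with a double quote")
    backslashes = 0                      # length of the backslash run just seen
    for i in range(start + 1, len(source)):
        ch = source[i]
        if ch == "\\":
            backslashes += 1
            continue
        if ch == '"' and backslashes % 2 == 0:
            return i + 1                 # quote not preceded by an odd run
        backslashes = 0                  # any other character ends the run
    raise ValueError("unterminated string literal")


text = r'"string \" with quotes" rest'
end = string_literal_end(text, 0)
```

A quote preceded by an even number of backslashes (including zero) terminates the literal, so a literal whose content ends in two backslashes still closes at the following quote.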
Optional trailing commas
The P4 grammar allows several kinds of comma-separated lists to end in an optional comma.
~ Begin P4Grammar
[INCLUDE=grammar.mdk:optTrailingComma]
~ End P4Grammar
For example, the following declarations are both legal, and have the same meaning:
~ Begin P4Example
enum E { a, b, c }

enum E { a, b, c, }
~ End P4Example
This is particularly useful in combination with preprocessor directives:
~ Begin P4Example
enum E {
#if SUPPORT_A
    a,
#endif
    b,
    c,
}
~ End P4Example
P4 provides a rich assortment of types. Base types include bit-strings, numbers, and errors. There are also built-in types for representing constructs such as parsers, pipelines, actions, and tables. Users can construct new types based on these: structures, enumerations, headers, header stacks, header unions, etc.
In this document we adopt the following conventions:
- Built-in types are written with lowercase characters, e.g., `int<20>`,
- User-defined types are capitalized, e.g., `IPv4Address`,
- Type variables are always uppercase, e.g., `parser P<H, IH>()`,
- Variables are uncapitalized, e.g., `ipv4header`,
- Constants are written with uppercase characters, e.g., `CPU_PORT`, and
- Errors and enumerations are written in camel-case, e.g., `PacketTooShort`.
A P4 program is a list of declarations:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:p4program]

[INCLUDE=grammar.mdk:declaration]
~ End P4Grammar

An empty declaration is indicated with a single semicolon. (Allowing
empty declarations accommodates the habits of C/C++ and Java
programmers; for example, certain constructs, like `struct`, do not require a
terminating semicolon.)
Scopes
Some P4 constructs act as namespaces that create local scopes for names including:
- Derived type declarations (`struct`, `header`, `header_union`, `enum`), which introduce local scopes for field names,
- Block statements, which introduce local lexically-enclosed scopes,
- `parser`, `table`, `action`, and `control` blocks, which introduce local scopes,
- Declarations with type variables, which introduce a new scope for those variables. For example, in the following `extern` declaration, the scope of the type variable `H` extends to the end of the declaration:

~ Begin P4Example
extern E<H>(/* parameters omitted */) { /* body omitted */ }  // scope of H ends here
~ End P4Example
The order of declarations is important; with the exception of parser states, all uses of a symbol must follow the symbol’s declaration. (This is a departure from P4_14, which allows declarations in any order. This requirement significantly simplifies the implementation of compilers for P4, allowing compilers to use additional information about declared identifiers to resolve ambiguities.)
Stateful elements
Most P4 constructs are stateless: given some inputs they produce a result that solely depends on these inputs. There are only two stateful constructs that may retain information across packets:
- `table`s: Tables are read-only for the data plane, but their entries can be modified by the control plane,
- `extern` objects: many objects have state that can be read and written by the control plane and data plane. All constructs from the P4_14 language version that encapsulate state (e.g., counters, meters, registers) are represented using `extern` objects in P4_16.
In P4 all stateful elements must be explicitly allocated at compilation-time through the process called “instantiation”.
In addition, parsers, control blocks, and packages may contain
stateful element instantiations. Thus, they are also treated as stateful
elements, even if they appear to contain no state, and must be
instantiated before they can be used. However, although they are
stateful, tables do not need to be instantiated explicitly—declaring a
table also creates an instance of it. This convention is designed to
support the common case, since most tables are used just once. To have
finer-grained control over when a table is instantiated, a programmer
can declare it within a control.
Recall the example in Section [#sec-vss-all]: TopParser, TopPipe,
TopDeparser, Checksum16, and Switch are types. There are two
instances of Checksum16, one in TopParser and one in TopDeparser,
both called ck. The TopParser, TopDeparser, TopPipe, and
Switch are instantiated at the end of the program, in the declaration
of the main instance object, which is an instance of the Switch type
(a package).
L-values are expressions that may appear on the left side of an
assignment operation or as arguments corresponding to out and inout
function parameters. An l-value represents a storage reference. The
following expressions are legal l-values:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:prefixedNonTypeName]

[INCLUDE=grammar.mdk:lvalue]
~ End P4Grammar

- Identifiers of a base or derived type.
- Structure, header, and header union field member access operations (using the dot notation).
- References to elements within header stacks (see Section [#sec-expr-hs]): indexing, and references to `last` and `next`.
- The result of a bit-slice operator `[m:l]`.

The following is a legal l-value: `headers.stack[4].field`. Note that
method and function calls cannot return l-values.
P4 provides multiple constructs for writing modular programs: extern methods, parsers, controls, actions. All these constructs behave similarly to procedures in standard general-purpose programming languages:
- They have named and typed parameters.
- They introduce a new local scope for parameters and local variables.
- They allow arguments to be passed by binding them to their parameters.
Invocations are executed using copy-in/copy-out semantics.
Each parameter may be labeled with a direction:
- `in` parameters are read-only. It is an error to use an `in` parameter on the left-hand side of an assignment or to pass it to a callee as a non-`in` argument. `in` parameters are initialized by copying the value of the corresponding argument when the invocation is executed.
- `out` parameters are, with a few exceptions listed below, uninitialized and are treated as l-values (see Section [#sec-lvalues]) within the body of the method or function. An argument passed as an `out` parameter must be an l-value; after the execution of the call, the value of the parameter is copied to the corresponding storage location for that l-value.
- `inout` parameters behave like a combination of `in` and `out` parameters simultaneously: on entry the value of the arguments is copied to the parameters. On return the value of the parameters is copied back to the arguments. In consequence, an argument passed as an `inout` parameter must be an l-value.
- The meaning of parameters with no direction depends upon the kind of entity the parameter is for:
    - For anything other than an action, e.g. a control, parser, or function, a directionless parameter means that the value supplied as an argument in a call must be a compile-time known value (see Section [#sec-compile-time-known]).
    - For an action, a directionless parameter indicates that it is “action data”. See Section [#sec-actions] for the meaning of action data, but its meaning includes the following possibilities:
        - The parameter’s value is provided in the P4 program. In this case, the parameter behaves as if the direction were `in`. Such an argument expression need not be a compile-time known value.
        - The parameter’s value is provided by the control plane software when an entry is added to a table that uses that action. See Section [#sec-actions].
A directionless parameter of extern object type is passed by reference.
Direction `out` parameters are always initialized at the beginning of
execution of the portion of the program that has the `out` parameters
(e.g., a `control`, `parser`, `action`, or function). This initialization
is not performed for parameters with any direction that is not `out`.

- If a direction `out` parameter is of type `header` or `header_union`, it is set to “invalid”.
- If a direction `out` parameter is of type header stack, all elements of the header stack are set to “invalid”, and its `nextIndex` field is initialized to 0 (see Section [#sec-expr-hs]).
- If a direction `out` parameter is a compound type, e.g. a struct or tuple, other than one of the types listed above, then apply these rules recursively to its members.
- If a direction `out` parameter has any other type, e.g. `bit<W>`, an implementation need not initialize it to any predictable value.

For example, if a direction `out` parameter named `p` has type `s2_t`:

~ Begin P4Example
header h1_t {
    bit<8> f1;
    bit<8> f2;
}
struct s1_t {
    h1_t   h1a;
    bit<3> a;
    bit<7> b;
}
struct s2_t {
    h1_t   h1b;
    s1_t   s1;
    bit<5> c;
}
~ End P4Example

then at the beginning of execution of the part of the program that has
the `out` parameter `p`, it must be initialized so that `p.h1b` and
`p.s1.h1a` are invalid. No other parts of `p` are required to be
initialized.
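The recursive initialization rules for `out` parameters can be modelled with a small Python sketch. Everything here is an illustrative assumption: the type descriptors (`Bits`, `Header`, `Stack`, `Struct`), the `UNSPECIFIED` marker, and the dictionary representation of values are hypothetical and do not correspond to any compiler data structure. `Header` stands for both `header` and `header_union` types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bits:
    width: int


@dataclass(frozen=True)
class Header:          # also stands in for header_union in this sketch
    fields: tuple      # tuple of (name, type) pairs


@dataclass(frozen=True)
class Stack:
    element: Header
    size: int


@dataclass(frozen=True)
class Struct:
    fields: tuple      # tuple of (name, type) pairs


UNSPECIFIED = object()  # marks storage with no predictable initial value


def init_out(t):
    """Build the initial value of an `out` parameter of type `t`."""
    if isinstance(t, Header):
        return {"valid": False}
    if isinstance(t, Stack):
        return {"nextIndex": 0,
                "elements": [init_out(t.element) for _ in range(t.size)]}
    if isinstance(t, Struct):
        return {name: init_out(ft) for name, ft in t.fields}
    return UNSPECIFIED


# The types from the example above.
h1_t = Header((("f1", Bits(8)), ("f2", Bits(8))))
s1_t = Struct((("h1a", h1_t), ("a", Bits(3)), ("b", Bits(7))))
s2_t = Struct((("h1b", h1_t), ("s1", s1_t), ("c", Bits(5))))
p = init_out(s2_t)
```

In this model `p["h1b"]` and `p["s1"]["h1a"]` are marked invalid, while the `bit<W>` members `p["c"]`, `p["s1"]["a"]`, and `p["s1"]["b"]` carry the `UNSPECIFIED` marker.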
Arguments are evaluated from left to right prior to the invocation of the function itself. The order of evaluation is important when the expression supplied for an argument can have side-effects. Consider the following example:
~ Begin P4Example
extern void f(inout bit x, in bit y);
extern bit g(inout bit z);
bit a;
f(a, g(a));
~ End P4Example
Note that the evaluation of g may mutate its argument a, so the
compiler has to ensure that the value passed to f for its first
parameter is not changed by the evaluation of the second argument. The
semantics for evaluating a function call is given by the following
algorithm (implementations can be different as long as they provide the
same result):
1. Arguments are evaluated from left to right as they appear in the function call expression.
2. If a parameter has a default value and no corresponding argument is supplied, the default value is used as an argument.
3. For each `out` and `inout` argument the corresponding l-value is saved (so it cannot be changed by the evaluation of the following arguments). This is important if the argument contains indexing operations into a header stack.
4. The value of each argument is saved into a temporary.
5. The function is invoked with the temporaries as arguments. We are guaranteed that the temporaries that are passed as arguments are never aliased to each other, so this “generated” function call can be implemented using call-by-reference if supported by the architecture.
6. On function return, the temporaries that correspond to `out` or `inout` arguments are copied in order from left to right into the l-values saved in Step 3.
According to this algorithm, the previous function call is equivalent to the following sequence of statements:
~ Begin P4Example
bit tmp1 = a;     // evaluate a; save result
bit tmp2 = g(a);  // evaluate g(a); save result; modifies a
f(tmp1, tmp2);    // evaluate f; modifies tmp1
a = tmp1;         // copy inout result back into a
~ End P4Example
To see why Step 3 in the above algorithm is important, consider the following example:
~ Begin P4Example
header H { bit z; }
H[2] s;
f(s[a].z, g(a));
~ End P4Example
The evaluation of this call is equivalent to the following sequence of statements:
~ Begin P4Example
bit tmp1 = a;          // save the value of a
bit tmp2 = s[tmp1].z;  // evaluate first argument
bit tmp3 = g(a);       // evaluate second argument; modifies a
f(tmp2, tmp3);         // evaluate f; modifies tmp2
s[tmp1].z = tmp2;      // copy inout result back; dest is not s[a].z
~ End P4Example
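The call algorithm can also be exercised with an executable model. The Python sketch below is an approximation under stated assumptions: `Ref`, `invoke`, and the bodies given to `f` and `g` are hypothetical (the text declares them as `extern` and does not define their behavior), default values (Step 2) are omitted, and each callee receives its temporaries as a mutable list.

```python
class Ref:
    """A saved l-value: a container plus the key selecting one slot in it."""

    def __init__(self, container, key):
        self.container, self.key = container, key

    def load(self):
        return self.container[self.key]

    def store(self, value):
        self.container[self.key] = value


def invoke(callee, directions, thunks):
    """Call `callee` with copy-in/copy-out semantics.

    Each thunk is a zero-argument function: for an `in` argument it yields
    a value, for an `out` or `inout` argument it yields a Ref.
    """
    lvalues, temps = [], []
    for direction, thunk in zip(directions, thunks):   # Step 1: left to right
        evaluated = thunk()
        if direction == "in":
            lvalues.append(None)
            temps.append(evaluated)                    # Step 4
        else:
            lvalues.append(evaluated)                  # Step 3: save the l-value
            temps.append(evaluated.load() if direction == "inout" else None)
    result = callee(temps)                             # Step 5: temporaries only
    for lvalue, temp in zip(lvalues, temps):           # Step 6: copy out in order
        if lvalue is not None:
            lvalue.store(temp)
    return result


def g(params):               # models: extern bit g(inout bit z)
    params[0] = 1            # hypothetical body: z = 1
    return 1


def f(params):               # models: extern void f(inout bit x, in bit y)
    params[0] = params[1]    # hypothetical body: x = y


env = {"a": 0}
s = [{"z": 0}, {"z": 0}]

# Models the call f(s[a].z, g(a)).
invoke(f, ["inout", "in"], [
    lambda: Ref(s[env["a"]], "z"),
    lambda: invoke(g, ["inout"], [lambda: Ref(env, "a")]),
])
```

After the call, `a` has become 1, yet the result of `f` is written to `s[0].z` and `s[1].z` is untouched: the destination was fixed when the first argument was evaluated, before `g` modified `a`.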
When used as arguments, extern objects can only be passed as
directionless parameters—e.g., see the packet argument in the very
simple switch example.
Justification
The main reason for using copy-in/copy-out semantics (instead of the
more common call-by-reference semantics) is for controlling the
side-effects of extern functions and methods. extern methods and
functions are the main mechanism by which a P4 program communicates with
its environment. With copy-in/copy-out semantics extern functions
cannot hold references to P4 program objects; this enables the compiler
to limit the side-effects that extern functions may have on the P4
program both in space (they can only affect out parameters) and in
time (side-effects can only occur at function call time).
In general, extern functions are arbitrarily powerful: they can store
information in global storage, spawn separate threads, “collude” with
each other to share information — but they cannot access any variable in
a P4 program. With copy-in/copy-out semantics the compiler can still
reason about P4 programs that invoke extern functions.
There are additional benefits of using copy-in/copy-out semantics:
- It enables P4 to be compiled for architectures that do not support references (e.g., where all data is allocated to named registers). Such architectures may require indices into header stacks that appear in a program to be compile-time known values.
- It simplifies some compiler analyses, since function parameters can never alias to each other within the function body.
~ Begin P4Grammar
[INCLUDE=grammar.mdk:parameterList]

[INCLUDE=grammar.mdk:nonEmptyParameterList]

[INCLUDE=grammar.mdk:parameter]

[INCLUDE=grammar.mdk:direction]
~ End P4Grammar
Following is a summary of the constraints imposed by the parameter directions:
- When used as arguments, extern objects can only be passed as directionless parameters.
- All constructor parameters are evaluated at compilation-time, and in consequence they must all be directionless (they cannot be `in`, `out`, or `inout`); this applies to `package`, `control`, `parser`, and `extern` objects. Expressions for these parameters must be supplied at compile-time, and they must evaluate to compile-time known values. See Section [#sec-parameterization] for further details.
- For actions all directionless parameters must be at the end of the parameter list. When an action appears in a `table`’s `actions` list, only the parameters with a direction must be bound. See Section [#sec-actions] for further details.
- Actions can also be explicitly invoked using function call syntax, either from a control block or from another action. In this case, values for all action parameters must be supplied explicitly, including values for the directionless parameters. The directionless parameters in this case behave like `in` parameters. See Section [#sec-invoke-actions] for further details.
- Default expressions are only allowed for `in` or directionless parameters, and the expressions supplied as defaults must be compile-time known values.
- If parameters with default values do not appear at the end of the list of parameters, invocations that use the default values must use named arguments, as in the following example:
~ Begin P4Example
extern void f(in bit a, in bit<3> b = 2, in bit<5> c);

void g() {
    f(a = 1, b = 2, c = 3);  // ok
    f(a = 1, c = 3);         // ok, equivalent to the previous call, b uses default value
    f(1, 2, 3);              // ok, equivalent to the previous call
    // f(1, 3);              // illegal, since the parameter b is not the last in the list
}
~ End P4Example
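The interaction between default values and named arguments can be sketched as an argument-binding routine in Python. This is a simplified model: `bind` and `REQUIRED` are hypothetical names, and the rule that one call does not mix positional and named arguments is an assumption of the sketch rather than something stated in this section.

```python
REQUIRED = object()   # marks a parameter that has no default value


def bind(params, positional=(), named=None):
    """Bind call arguments to `params`, a list of (name, default) pairs."""
    named = dict(named or {})
    if positional and named:
        raise TypeError("assumed rule: positional and named arguments are not mixed")
    if len(positional) > len(params):
        raise TypeError("too many arguments")
    bound = {}
    for index, (name, default) in enumerate(params):
        if index < len(positional):
            bound[name] = positional[index]   # positional: strictly left to right
        elif name in named:
            bound[name] = named.pop(name)
        elif default is not REQUIRED:
            bound[name] = default             # fall back to the default value
        else:
            raise TypeError(f"no value for parameter {name!r}")
    if named:
        raise TypeError(f"unknown parameters: {sorted(named)}")
    return bound


# Models: extern void f(in bit a, in bit<3> b = 2, in bit<5> c);
F = [("a", REQUIRED), ("b", 2), ("c", REQUIRED)]
by_name = bind(F, named={"a": 1, "c": 3})
by_position = bind(F, positional=(1, 2, 3))
```

With positional binding, the call `f(1, 3)` assigns 3 to `b` and leaves `c` without a value, which is why the sketch rejects it just as the commented-out call above is rejected.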
Optional parameters
A parameter that is annotated with the @optional annotation is
optional: the user may omit the value for that parameter in an
invocation. Optional parameters can only appear for arguments of:
packages, parser types, control types, extern functions, extern methods,
and extern object constructors. Optional parameters cannot have default
values. If a procedure-like construct has both optional parameters and
default values then it can only be called using named arguments. It is
recommended, but not mandatory, for all optional parameters to be at the
end of a parameter list.
The implementation of such objects is not expressed in P4, so the meaning and implementation of optional parameters should be specified by the target architecture. For example, we can imagine a two-stage switch architecture where the second stage is optional. This could be declared as a package with an optional parameter:
~ Begin P4Example
package pipeline(/* parameters omitted */);
package switch(pipeline first, @optional pipeline second);

pipeline(/* arguments omitted */) ingress;
switch(ingress) main;  // a switch with a single-stage pipeline
~ End P4Example
Here the target architecture could implement the elided optional argument using an empty pipeline.
The following example shows optional parameters and parameters with default values.
~ Begin P4Example
extern void h(in bit<32> a, in bool b = true);  // default value

// function calls
h(10);                // same as h(10, true);
h(a = 10);            // same as h(10, true);
h(a = 10, b = true);

struct Empty {}
control nothing(inout Empty h, inout Empty m) {
    apply {}
}

parser parserProto<H, M>(packet_in p, out H h, inout M m);
control controlProto<H, M>(inout H h, inout M m);

package pack<HP, MP, HC, MC>(
    @optional parserProto<HP, MP> _parser,        // optional parameter
    controlProto<HC, MC> _control = nothing());   // default parameter value

pack() main;  // No value for _parser, _control is an instance of nothing()
~ End P4Example
P4 objects that introduce namespaces are organized in a hierarchical fashion. There is a top-level unnamed namespace containing all top-level declarations.
Identifiers prefixed with a dot are always resolved in the top-level namespace.
~ Begin P4Example
const bit<32> x = 2;
control c() {
    int<32> x = 0;
    apply {
        x = x + (int<32>).x;  // x is the int<32> local variable,
                              // .x is the top-level bit<32> variable
    }
}
~ End P4Example
Identifier resolution proceeds inside-out, starting with the current scope and continuing through all lexically enclosing scopes. The compiler may provide a warning if multiple resolutions are possible for the same name (name shadowing).
~ Begin P4Example
const bit<4> x = 1;
control p() {
    const bit<8> x = 8;  // x declaration shadows global x
    const bit<4> y = .x; // reference to top-level x
    const bit<8> z = x;  // reference to p's local x
    apply {}
}
~ End P4Example
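The two resolution rules (inside-out lookup, and the leading dot that forces the top-level namespace) can be modelled with a chain of scopes. The Python sketch below is illustrative only: the `Scope` class and the use of type names as stored values are hypothetical, and the rule that a use must follow its declaration is not modelled.

```python
class Scope:
    """One namespace in a chain of lexically nested scopes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        self.names[name] = value

    def resolve(self, name):
        if name.startswith("."):
            scope = self
            while scope.parent is not None:   # climb to the top-level namespace
                scope = scope.parent
            return scope.names[name[1:]]
        scope = self
        while scope is not None:              # inside-out: innermost scope first
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise KeyError(name)


top = Scope()
top.declare("x", "bit<4>")        # const bit<4> x = 1;
control_p = Scope(parent=top)
control_p.declare("x", "bit<8>")  # const bit<8> x = 8; shadows the global x
```

Resolving `x` from `control_p` finds the local declaration, whereas resolving `.x` from the same scope skips it and returns the top-level one.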
Identifiers defined in the top-level namespace are globally visible.
Declarations within a parser or control are private and cannot be
referred to from outside of the enclosing parser or control.