6. P4 language definition

The P4 language can be viewed as having several distinct components, which we describe separately:

- The core language, comprising types, variables, scoping, declarations, statements, expressions, etc. We start by describing this part of the language.
- A sub-language for expressing parsers, based on state machines (Section [#sec-packet-parsing]).
- A sub-language for expressing computations using match-action units, based on traditional imperative control-flow (Section [#sec-control]).
- A sub-language for describing architectures (Section [#sec-arch-desc]).

Grammar

The complete grammar of P4₁₆ is given in Appendix [#sec-grammar], using the Yacc/Bison grammar description language. This text is based on the same grammar. We adopt several standard conventions when we provide excerpts from the grammar:

- UPPERCASE symbols denote terminals in the grammar.
- Excerpts from the grammar are given in BNF notation as follows:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:p4program]
~ End P4Grammar

Pseudo-code (mostly used for describing the semantics of various P4 constructs) is shown in a fixed-width font, as in the following example:

~ Begin P4Pseudo
ParserModel.verify(bool condition, error err) {
    if (condition == false) {
        ParserModel.parserError = err;
        goto reject;
    }
}
~ End P4Pseudo

Semantics and the P4 abstract machines

We describe the semantics of P4 in terms of abstract machines executing traditional imperative code. There is an abstract machine for each P4 sub-language (parser, control). The abstract machines are described in this text in pseudo-code and English.

P4 compilers are free to reorganize the code they generate in any way, as long as the externally visible behaviors of the P4 programs are preserved as described by this specification, where externally visible behavior is defined as:

- The input/output behavior of all P4 blocks, and
- The state maintained by extern blocks.

To aid composition of programs from multiple source files, P4 compilers should support the following subset of the C preprocessor functionality:

- #define for defining macros (without arguments)
- #undef
- #if, #else, #endif, #ifdef, #ifndef, #elif
- #include

The preprocessor should also remove the sequence backslash newline (ASCII codes 92, 10) to facilitate splitting content across multiple lines when convenient for formatting.
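As a rough illustration of this preprocessor subset, the following sketch combines a macro without arguments, conditional compilation, and a backslash-newline continuation. The names MAX_PORTS, ENABLE_WIDE_PORTS, PortId_t, and PORT_LIMIT are hypothetical and are not defined by this specification.

```p4
#define MAX_PORTS 64          // macro without arguments (hypothetical name)

#ifdef ENABLE_WIDE_PORTS      // hypothetical symbol, assumed to be defined elsewhere
typedef bit<16> PortId_t;
#else
typedef bit<9> PortId_t;
#endif

// The backslash-newline pair is removed during preprocessing,
// so the next two lines form a single declaration.
const bit<32> PORT_LIMIT = \
    MAX_PORTS;

#undef MAX_PORTS              // MAX_PORTS is no longer defined past this point
```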

Additional C preprocessor capabilities may be supported, but are not guaranteed—e.g., macros with arguments. Similar to C, #include can specify a file name either within double quotes or within <>.

~ Begin P4Example
#include <system_file>
#include "user_file"
~ End P4Example

The difference between the two forms is the order in which the preprocessor searches for header files when the path is incompletely specified.

P4 compilers should correctly handle #line directives that may be generated during preprocessing. This functionality allows P4 programs to be built from multiple source files, potentially produced by different programmers at different times:

- the P4 core library, defined in this document,
- the architecture, defining data plane interfaces and extern blocks,
- user-defined libraries of useful components (e.g. standard protocol header definitions), and
- the P4 programs that specify the behavior of each programmable block.

The P4 language specification defines a core library that includes several common programming constructs. A description of the core library is provided in Appendix [#sec-p4-core-lib]. All P4 programs must include the core library. The core library is included with:

~ Begin P4Example
#include <core.p4>
~ End P4Example

All P4 keywords use only ASCII characters. All P4 identifiers must use only ASCII characters. P4 compilers should correctly handle strings containing 8-bit characters in comments and string literals. P4 is case-sensitive. Whitespace characters, including newlines, are treated as token separators. Indentation is free-form; however, P4 has C-like block constructs, and all our examples use C-style indentation. Tab characters are treated as spaces.

The lexer recognizes the following kinds of terminals:

- IDENTIFIER: starts with a letter or underscore, and contains letters, digits, and underscores
- TYPE_IDENTIFIER: identifier that denotes a type name
- INTEGER: integer literals
- DONTCARE: a single underscore
- Keywords such as RETURN. By convention, each keyword terminal corresponds to a language keyword with the same spelling but using lowercase. For example, the RETURN terminal corresponds to the return keyword.

Identifiers

P4 identifiers may contain only letters, numbers, and the underscore character _, and must start with a letter or underscore. The special identifier consisting of a single underscore _ is reserved to indicate a “don’t care” value; its type may vary depending on the context. Certain keywords (e.g., apply) can be used as identifiers if the context makes it unambiguous.

~ Begin P4Grammar
[INCLUDE=grammar.mdk:nonTypeName]

name
    : nonTypeName
    | TYPE_IDENTIFIER
    ;
~ End P4Grammar
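The following sketch illustrates these identifier rules. All names are hypothetical, and read_counters is an assumed extern function introduced only to show the "don't care" identifier used as an argument whose result is ignored.

```p4
const bit<8> _tmp1  = 1;      // legal: starts with an underscore
const bit<8> Port_2 = 2;      // legal: letters, digits, and underscores
// const bit<8> 2port = 3;    // illegal: an identifier cannot start with a digit

// Hypothetical extern function with two out parameters.
extern void read_counters(out bit<32> packets, out bit<32> bytes);

control c() {
    apply {
        bit<32> packets;
        read_counters(packets, _);   // _ : the byte count is not needed here
    }
}
```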

Comments

P4 supports several kinds of comments:

- Single-line comments, introduced by // and spanning to the end of line,
- Multi-line comments, enclosed between /* and */ (nested multi-line comments are not supported),
- Javadoc-style comments, starting with /** and ending with */

Use of Javadoc-style comments is strongly encouraged for the tables and actions that are used to synthesize the interface with the control-plane.

P4 treats comments as token separators and no comments are allowed within a token—e.g. bi/**/t is parsed as two tokens, bi and t, and not as a single token bit.
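A minimal sketch of the three comment forms follows. The action decrement_ttl and the constant ONE are hypothetical names used only for illustration.

```p4
// Single-line comment: runs to the end of this line.

/* Multi-line comment:
   spans several lines, but cannot contain another multi-line comment. */

/**
 * Javadoc-style comment attached to an action that a table may expose
 * to the control plane (decrement_ttl is a hypothetical name).
 */
action decrement_ttl(inout bit<8> ttl) {
    ttl = ttl - 1;
}

const bit/* a comment separates tokens */<8> ONE = 1;   // legal: comment between tokens
```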

Literal constants

Boolean literals

There are two Boolean literal constants: true and false.

Integer literals

Integer literals are non-negative arbitrary-precision integers. By default, literals are represented in base 10. The following prefixes must be employed to specify the base explicitly:

- 0x or 0X indicates base 16 (hexadecimal)
- 0o or 0O indicates base 8 (octal)
- 0d or 0D indicates base 10 (decimal)
- 0b or 0B indicates base 2 (binary)

The width of a numeric literal in bits can be specified by an unsigned number prefix consisting of a number of bits and a signedness indicator:

- w indicates unsigned numbers
- s indicates signed numbers

Note that a leading zero by itself does not indicate an octal (base 8) constant. The underscore character is considered a digit within number literals but is ignored when computing the value of the parsed number. This allows long constant numbers to be more easily read by grouping digits together. The underscore cannot be used in the width specification or as the first character of an integer literal. No comments or whitespace are allowed within a literal. Here are some examples of numeric literals:

~ Begin P4Example
32w255         // a 32-bit unsigned number with value 255
32w0d255       // same value as above
32w0xFF        // same value as above
32s0xFF        // a 32-bit signed number with value 255
8w0b10101010   // an 8-bit unsigned number with value 0xAA
8w0b_1010_1010 // same value as above
8w170          // same value as above
8s0b1010_1010  // an 8-bit signed number with value -86
16w0377        // 16-bit unsigned number with value 377 (not 255!)
16w0o377       // 16-bit unsigned number with value 255 (base 8)
~ End P4Example
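For context, the sketch below shows how such literals might appear in constant declarations. The constant names are hypothetical; the values in the comments follow from the base and width rules above.

```p4
const int     MILLION = 1_000_000;        // no width prefix: arbitrary-precision integer
const bit<32> NETMASK = 32w0xFFFF_FF00;   // 32-bit unsigned, hexadecimal
const bit<8>  FLAGS   = 8w0b0001_0010;    // 8-bit unsigned, binary (value 18)
const int<8>  LIMIT   = 8s0x7F;           // 8-bit signed, value 127
const bit<16> MODE    = 16w0o755;         // 16-bit unsigned, octal (value 493)
const bit<16> PLAIN   = 16w0755;          // a leading zero is not octal: value 755
```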

String literals

String literals are specified as an arbitrary sequence of 8-bit characters, enclosed within double quote characters " (ASCII code 34). Strings start with a double quote character and extend to the first double quote sign which is not immediately preceded by an odd number of backslash characters (ASCII code 92). P4 does not make any validity checks on strings (i.e., it does not check that strings represent legal UTF-8 encodings).

Since P4 does not provide any operations on strings, string literals are generally passed unchanged through the P4 compiler to other third-party tools or compiler-backends, including the terminating quotes. These tools can define their own handling of escape sequences (e.g., how to specify Unicode characters, or handle unprintable ASCII characters).

Here are 3 examples of string literals:

~ Begin P4Example
"simple string"
"string \" with \" embedded \" quotes"
"string with
embedded
line terminator"
~ End P4Example

Optional trailing commas

The P4 grammar allows several kinds of comma-separated lists to end in an optional comma.

~ Begin P4Grammar
[INCLUDE=grammar.mdk:optTrailingComma]
~ End P4Grammar

For example, the following declarations are both legal, and have the same meaning:

~ Begin P4Example
enum E { a, b, c }

enum E { a, b, c, }
~ End P4Example

This is particularly useful in combination with preprocessor directives:

~ Begin P4Example
enum E {
#if SUPPORT_A
    a,
#endif
    b,
    c,
}
~ End P4Example

P4 provides a rich assortment of types. Base types include bit-strings, numbers, and errors. There are also built-in types for representing constructs such as parsers, pipelines, actions, and tables. Users can construct new types based on these: structures, enumerations, headers, header stacks, header unions, etc.

In this document we adopt the following conventions:

- Built-in types are written with lowercase characters, e.g., int<20>,
- User-defined types are capitalized, e.g., IPv4Address,
- Type variables are always uppercase, e.g., parser P<H, IH>(),
- Variables are uncapitalized, e.g., ipv4header,
- Constants are written with uppercase characters, e.g., CPU_PORT, and
- Errors and enumerations are written in camel-case, e.g., PacketTooShort.

A P4 program is a list of declarations:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:p4program]

[INCLUDE=grammar.mdk:declaration]
~ End P4Grammar

An empty declaration is indicated with a single semicolon. (Allowing empty declarations accommodates the habits of C/C++ and Java programmers: certain P4 constructs, like struct, do not require a terminating semicolon, so a semicolon written after them is treated as an empty declaration.)
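The sketch below shows a small list of top-level declarations written according to the naming conventions above, including an empty declaration. The names IPv4Address, CPU_PORT, ChecksumMismatch, and Metadata are used purely for illustration.

```p4
typedef bit<32> IPv4Address;         // user-defined type: capitalized
const bit<9> CPU_PORT = 255;         // constant: uppercase

error { ChecksumMismatch }           // error: camel-case (hypothetical error name)

struct Metadata {
    IPv4Address nextHop;             // fields and variables: uncapitalized
};                                   // the semicolon after } is an empty declaration
```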

Scopes

Some P4 constructs act as namespaces that create local scopes for names, including:

- Derived type declarations (struct, header, header_union, enum), which introduce local scopes for field names,
- Block statements, which introduce local lexically-enclosed scopes,
- parser, table, action, and control blocks, which introduce local scopes,
- Declarations with type variables, which introduce a new scope for those variables. For example, in the following extern declaration, the scope of the type variable H extends to the end of the declaration:

~ Begin P4Example
extern E<H>(/* parameters omitted */) { /* body omitted */ }  // scope of H ends here.
~ End P4Example

The order of declarations is important; with the exception of parser states, all uses of a symbol must follow the symbol’s declaration. (This is a departure from P4₁₄, which allows declarations in any order. This requirement significantly simplifies the implementation of compilers for P4, allowing compilers to use additional information about declared identifiers to resolve ambiguities.)

Stateful elements

Most P4 constructs are stateless: given some inputs they produce a result that solely depends on these inputs. There are only two stateful constructs that may retain information across packets:

- tables: Tables are read-only for the data plane, but their entries can be modified by the control plane,
- extern objects: many objects have state that can be read and written by the control plane and data plane. All constructs from the P4₁₄ language version that encapsulate state (e.g., counters, meters, registers) are represented using extern objects in P4₁₆.

In P4, all stateful elements must be explicitly allocated at compile time through a process called “instantiation”.

In addition, parsers, control blocks, and packages may contain stateful element instantiations. Thus, they are also treated as stateful elements, even if they appear to contain no state, and must be instantiated before they can be used. However, although they are stateful, tables do not need to be instantiated explicitly—declaring a table also creates an instance of it. This convention is designed to support the common case, since most tables are used just once. To have finer-grained control over when a table is instantiated, a programmer can declare it within a control.
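As a sketch of these instantiation rules, the example below instantiates an extern object and a control explicitly, while the table is instantiated simply by being declared. Counter, Inner, Outer, pktCount, and t are hypothetical names; in practice extern objects such as counters are supplied by the target architecture, and NoAction is assumed to come from the core library.

```p4
#include <core.p4>

// Hypothetical extern object with state; a real one comes from the architecture.
extern Counter {
    Counter(bit<32> size);
    void count(in bit<32> index);
}

control Inner(inout bit<8> x) {
    apply { x = x + 1; }
}

control Outer(inout bit<8> x) {
    Counter(32w1024) pktCount;   // explicit instantiation of an extern object
    Inner() inner;               // controls must be instantiated before use

    table t {                    // declaring a table also creates its instance
        actions = { NoAction; }
        default_action = NoAction();
    }

    apply {
        pktCount.count(32w0);
        inner.apply(x);
        t.apply();
    }
}
```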

Recall the example in Section [#sec-vss-all]: TopParser, TopPipe, TopDeparser, Checksum16, and Switch are types. There are two instances of Checksum16, one in TopParser and one in TopDeparser, both called ck. The TopParser, TopDeparser, TopPipe, and Switch are instantiated at the end of the program, in the declaration of the main instance object, which is an instance of the Switch type (a package).

L-values are expressions that may appear on the left side of an assignment operation or as arguments corresponding to out and inout function parameters. An l-value represents a storage reference. The following expressions are legal l-values:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:prefixedNonTypeName]

[INCLUDE=grammar.mdk:lvalue]
~ End P4Grammar

- Identifiers of a base or derived type.
- Structure, header, and header union field member access operations (using the dot notation).
- References to elements within header stacks (see Section [#sec-expr-hs]): indexing, and references to last and next.
- The result of a bit-slice operator [m:l].

The following is a legal l-value: headers.stack[4].field. Note that method and function calls cannot return l-values.
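The following sketch exercises several of the l-value forms listed above. The types H and Headers, the control c, and the extern function g are hypothetical and serve only to make the example self-contained.

```p4
header H { bit<8> field; }
struct Headers { H[8] stack; }

extern bit<8> g();   // hypothetical extern function

control c(inout Headers headers) {
    bit<8> tmp;
    apply {
        tmp = 0;                             // identifier of a base type
        headers.stack[4].field = tmp;        // field of a header stack element
        headers.stack[4].field[3:0] = 4w2;   // bit slice of a field
        // g() = tmp;                        // illegal: a call does not return an l-value
    }
}
```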

P4 provides multiple constructs for writing modular programs: extern methods, parsers, controls, actions. All these constructs behave similarly to procedures in standard general-purpose programming languages:

- They have named and typed parameters.
- They introduce a new local scope for parameters and local variables.
- They allow arguments to be passed by binding them to their parameters.

Invocations are executed using copy-in/copy-out semantics.

Each parameter may be labeled with a direction:

- in parameters are read-only. It is an error to use an in parameter on the left-hand side of an assignment or to pass it to a callee as a non-in argument. in parameters are initialized by copying the value of the corresponding argument when the invocation is executed.
- out parameters are, with a few exceptions listed below, uninitialized and are treated as l-values (see Section [#sec-lvalues]) within the body of the method or function. An argument passed as an out parameter must be an l-value; after the execution of the call, the value of the parameter is copied to the corresponding storage location for that l-value.
- inout parameters behave like a combination of in and out parameters simultaneously: On entry the value of the arguments is copied to the parameters. On return the value of the parameters is copied back to the arguments. In consequence, an argument passed as an inout parameter must be an l-value.
- The meaning of parameters with no direction depends upon the kind of entity the parameter is for:
  - For anything other than an action, e.g. a control, parser, or function, a directionless parameter means that the value supplied as an argument in a call must be a compile-time known value (see Section [#sec-compile-time-known]).
  - For an action, a directionless parameter indicates that it is “action data”. See Section [#sec-actions] for the meaning of action data, but its meaning includes the following possibilities:
    - The parameter’s value is provided in the P4 program. In this case, the parameter behaves as if the direction were in. Such an argument expression need not be a compile-time known value.
    - The parameter’s value is provided by the control plane software when an entry is added to a table that uses that action. See Section [#sec-actions].

A directionless parameter of extern object type is passed by reference.
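A minimal sketch of the directions described above follows. The control Demo and the action set_c are hypothetical; the commented-out line shows an assignment that a compiler is expected to reject.

```p4
// Hypothetical control showing each parameter direction.
control Demo(in bit<8> a, out bit<8> b, inout bit<8> c) {
    // Directionless action parameter: action data.
    action set_c(bit<8> value) {
        c = value;
    }
    apply {
        b = a;          // out: written here, copied to the caller's l-value on return
        c = c + a;      // inout: copied in on entry, copied back on return
        // a = 0;       // illegal: in parameters are read-only
        set_c(a);       // value supplied by the P4 program: behaves like an in parameter
    }
}
```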

Direction out parameters are always initialized at the beginning of execution of the portion of the program that has the out parameters, e.g. control, parser, action, function, etc. This initialization is not performed for parameters with any direction that is not out.

- If a direction out parameter is of type header or header_union, it is set to “invalid”.
- If a direction out parameter is of type header stack, all elements of the header stack are set to “invalid”, and its nextIndex field is initialized to 0 (see Section [#sec-expr-hs]).
- If a direction out parameter is a compound type, e.g. a struct or tuple, other than one of the types listed above, then apply these rules recursively to its members.
- If a direction out parameter has any other type, e.g. bit<W>, an implementation need not initialize it to any predictable value.

For example, if a direction out parameter named p has type s2_t:

~ Begin P4Example
header h1_t {
    bit<8> f1;
    bit<8> f2;
}
struct s1_t {
    h1_t h1a;
    bit<3> a;
    bit<7> b;
}
struct s2_t {
    h1_t h1b;
    s1_t s1;
    bit<5> c;
}
~ End P4Example

then at the beginning of execution of the part of the program that has the out parameter p, it must be initialized so that p.h1b and p.s1.h1a are invalid. No other parts of p are required to be initialized.

Arguments are evaluated from left to right prior to the invocation of the function itself. The order of evaluation is important when the expression supplied for an argument can have side-effects. Consider the following example:

~ Begin P4Example
extern void f(inout bit x, in bit y);
extern bit g(inout bit z);
bit a;
f(a, g(a));
~ End P4Example

Note that the evaluation of g may mutate its argument a, so the compiler has to ensure that the value passed to f for its first parameter is not changed by the evaluation of the second argument. The semantics for evaluating a function call is given by the following algorithm (implementations can be different as long as they provide the same result):

1. Arguments are evaluated from left to right as they appear in the function call expression.
2. If a parameter has a default value and no corresponding argument is supplied, the default value is used as an argument.
3. For each out and inout argument the corresponding l-value is saved (so it cannot be changed by the evaluation of the following arguments). This is important if the argument contains indexing operations into a header stack.
4. The value of each argument is saved into a temporary.
5. The function is invoked with the temporaries as arguments. We are guaranteed that the temporaries that are passed as arguments are never aliased to each other, so this “generated” function call can be implemented using call-by-reference if supported by the architecture.
6. On function return, the temporaries that correspond to out or inout arguments are copied in order from left to right into the l-values saved in Step 3.

According to this algorithm, the previous function call is equivalent to the following sequence of statements:

~ Begin P4Example
bit tmp1 = a;     // evaluate a; save result
bit tmp2 = g(a);  // evaluate g(a); save result; modifies a
f(tmp1, tmp2);    // evaluate f; modifies tmp1
a = tmp1;         // copy inout result back into a
~ End P4Example

To see why Step 3 in the above algorithm is important, consider the following example:

~ Begin P4Example
header H { bit z; }
H[2] s;
f(s[a].z, g(a));
~ End P4Example

The evaluation of this call is equivalent to the following sequence of statements:

~ Begin P4Example
bit tmp1 = a;          // save the value of a
bit tmp2 = s[tmp1].z;  // evaluate first argument
bit tmp3 = g(a);       // evaluate second argument; modifies a
f(tmp2, tmp3);         // evaluate f; modifies tmp2
s[tmp1].z = tmp2;      // copy inout result back; dest is not s[a].z
~ End P4Example

When used as arguments, extern objects can only be passed as directionless parameters—e.g., see the packet argument in the very simple switch example.

Justification

The main reason for using copy-in/copy-out semantics (instead of the more common call-by-reference semantics) is for controlling the side-effects of extern functions and methods. extern methods and functions are the main mechanism by which a P4 program communicates with its environment. With copy-in/copy-out semantics extern functions cannot hold references to P4 program objects; this enables the compiler to limit the side-effects that extern functions may have on the P4 program both in space (they can only affect out parameters) and in time (side-effects can only occur at function call time).

In general, extern functions are arbitrarily powerful: they can store information in global storage, spawn separate threads, “collude” with each other to share information — but they cannot access any variable in a P4 program. With copy-in/copy-out semantics the compiler can still reason about P4 programs that invoke extern functions.

There are additional benefits of using copy-in/copy-out semantics:

- It enables P4 to be compiled for architectures that do not support references (e.g., where all data is allocated to named registers). Such architectures may require indices into header stacks that appear in a program to be compile-time known values.
- It simplifies some compiler analyses, since function parameters can never alias to each other within the function body.

~ Begin P4Grammar
[INCLUDE=grammar.mdk:parameterList]

[INCLUDE=grammar.mdk:nonEmptyParameterList]

[INCLUDE=grammar.mdk:parameter]

[INCLUDE=grammar.mdk:direction]
~ End P4Grammar

Following is a summary of the constraints imposed by the parameter directions:

- When used as arguments, extern objects can only be passed as directionless parameters.
- All constructor parameters are evaluated at compilation-time, and in consequence they must all be directionless (they cannot be in, out, or inout); this applies to package, control, parser, and extern objects. Expressions for these parameters must be supplied at compile-time, and they must evaluate to compile-time known values. See Section [#sec-parameterization] for further details.
- For actions all directionless parameters must be at the end of the parameter list. When an action appears in a table’s actions list, only the parameters with a direction must be bound. See Section [#sec-actions] for further details.
- Actions can also be explicitly invoked using function call syntax, either from a control block or from another action. In this case, values for all action parameters must be supplied explicitly, including values for the directionless parameters. The directionless parameters in this case behave like in parameters. See Section [#sec-invoke-actions] for further details.
- Default expressions are only allowed for in or directionless parameters, and the expressions supplied as defaults must be compile-time known values.
- If parameters with default values do not appear at the end of the list of parameters, invocations that use the default values must use named arguments, as in the following example:

~ Begin P4Example
extern void f(in bit a, in bit<3> b = 2, in bit<5> c);

void g() {
    f(a = 1, b = 2, c = 3);  // ok
    f(a = 1, c = 3);         // ok, equivalent to the previous call, b uses default value
    f(1, 2, 3);              // ok, equivalent to the previous call
    // f(1, 3);              // illegal, since the parameter b is not the last in the list
}
~ End P4Example
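The constraints on action parameters listed above can be sketched as follows. The action set_port, the control Forward, and the table t are hypothetical names chosen for illustration.

```p4
// Hypothetical action: the directional parameter comes first,
// the directionless (action data) parameter comes last.
action set_port(inout bit<9> port, bit<9> value) {
    port = value;
}

control Forward(inout bit<9> outPort) {
    table t {
        actions = { set_port(outPort); }          // only the directional parameter is bound
        default_action = set_port(outPort, 0);    // all arguments supplied
    }
    apply {
        set_port(outPort, 1);   // direct call: every argument is given explicitly
        t.apply();
    }
}
```

In the direct call, the directionless parameter value behaves like an in parameter; for entries in t, its value would instead be supplied by the control plane.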

Optional parameters

A parameter that is annotated with the @optional annotation is optional: the user may omit the value for that parameter in an invocation. Optional parameters can only appear for arguments of: packages, parser types, control types, extern functions, extern methods, and extern object constructors. Optional parameters cannot have default values. If a procedure-like construct has both optional parameters and default values then it can only be called using named arguments. It is recommended, but not mandatory, for all optional parameters to be at the end of a parameter list.

The implementation of such objects is not expressed in P4, so the meaning and implementation of optional parameters should be specified by the target architecture. For example, we can imagine a two-stage switch architecture where the second stage is optional. This could be declared as a package with an optional parameter:

~ Begin P4Example
package pipeline(/* parameters omitted */);
package Switch(pipeline first, @optional pipeline second);

pipeline(/* arguments omitted */) ingress;
Switch(ingress) main;  // a switch with a single-stage pipeline
~ End P4Example

Here the target architecture could implement the elided optional argument using an empty pipeline.

The following example shows optional parameters and parameters with default values.

~ Begin P4Example
extern void h(in bit<32> a, in bool b = true);  // default value

// function calls
h(10);                 // same as h(10, true);
h(a = 10);             // same as h(10, true);
h(a = 10, b = true);

struct Empty {}
control nothing(inout Empty h, inout Empty m) {
    apply {}
}

parser parserProto<H, M>(packet_in p, out H h, inout M m);
control controlProto<H, M>(inout H h, inout M m);

package pack<HP, MP, HC, MC>(
    @optional parserProto<HP, MP> _parser,       // optional parameter
    controlProto<HC, MC> _control = nothing());  // default parameter value

pack() main;  // No value for _parser, _control is an instance of nothing()
~ End P4Example

P4 objects that introduce namespaces are organized in a hierarchical fashion. There is a top-level unnamed namespace containing all top-level declarations.

Identifiers prefixed with a dot are always resolved in the top-level namespace.

~ Begin P4Example
const bit<32> x = 2;
control c() {
    int<32> x = 0;
    apply {
        x = x + (int<32>).x;  // x is the int<32> local variable,
                              // .x is the top-level bit<32> variable
    }
}
~ End P4Example

Name resolution proceeds inside-out: the compiler first looks in the current scope and then in each lexically enclosing scope in turn. The compiler may provide a warning if multiple resolutions are possible for the same name (name shadowing).

~ Begin P4Example
const bit<4> x = 1;
control p() {
    const bit<8> x = 8;  // x declaration shadows global x
    const bit<4> y = .x; // reference to top-level x
    const bit<8> z = x;  // reference to p's local x
    apply {}
}
~ End P4Example

Identifiers defined in the top-level namespace are globally visible. Declarations within a parser or control are private and cannot be referred to from outside of the enclosing parser or control.