13. Packet parsing

This section describes the P4 constructs specific to parsing network packets.

~ Figure { #fig-parserstatemachine; caption: "Parser FSM structure." }
![parserstatemachine]
~
[parserstatemachine]: figs/parserstatemachine.png { height: 5cm; page-align: here }

A P4 parser describes a state machine with one start state and two final states. The start state is always named start. The two final states are named accept (indicating successful parsing) and reject (indicating a parsing failure). The start state is part of the parser, while the accept and reject states are distinct from the states provided by the programmer and are logically outside of the parser. Figure [#fig-parserstatemachine] illustrates the general structure of a parser state machine.

A parser declaration comprises a name, a list of parameters, an optional list of constructor parameters, local elements, and parser states (as well as optional annotations).

~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserTypeDeclaration]

[INCLUDE=grammar.mdk:parserDeclaration]

[INCLUDE=grammar.mdk:parserLocalElements]

[INCLUDE=grammar.mdk:parserStates]
~ End P4Grammar

For a description of optConstructorParameters, which are useful for building parameterized parsers, see Section [#sec-parameterization].

Unlike parser type declarations, parser declarations may not be generic—e.g., the following declaration is illegal:

~ Begin P4Example
parser P<H>(inout H data) { /* body omitted */ }
~ End P4Example

Hence, when used in the context of a parserDeclaration, the production rule parserTypeDeclaration should not yield type parameters.

At least one state, named start, must be present in any parser. A parser may not define two states with the same name. It is also illegal for a parser to give explicit definitions for the accept and reject states—those states are logically distinct from the states defined by the programmer.

State declarations are described below. Preceding the parser states, a parser may also contain a list of local elements. These can be constants, variables, or instantiations of objects that may be used within the parser. Such objects may be instantiations of extern objects, or other parsers that may be invoked as subroutines. However, it is illegal to instantiate a control block within a parser.

~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserLocalElement]
~ End P4Grammar

The states and local elements are all in the same namespace, thus the following example will produce an error:

~ Begin P4Example
// erroneous example
parser p() {
    bit<4> t;
    state start {
        t = 1;
        transition t;
    }
    state t { // error: name t is duplicated
        transition accept;
    }
}
~ End P4Example

For an example containing a complete declaration of a parser see Section [#sec-vss-all].

The semantics of a P4 parser can be formulated in terms of an abstract machine that manipulates a ParserModel data structure. This section describes this abstract machine in pseudo-code.

A parser starts execution in the start state and ends execution when one of the reject or accept states has been reached.

~ Begin P4Pseudo
ParserModel {
    error parserError;
    onPacketArrival(packet p) {
        ParserModel.parserError = error.NoError;
        goto start;
    }
}
~ End P4Pseudo

An architecture must specify the behavior when the accept and reject states are reached. For example, an architecture may specify that all packets reaching the reject state are dropped without further processing. Alternatively, it may specify that such packets are passed to the next block after the parser, with intrinsic metadata indicating that the parser reached the reject state, along with the error recorded.

A parser state is declared with the following syntax:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserState]
~ End P4Grammar

Each state has a name and a body. The body consists of a sequence of statements that describe the processing performed when the parser transitions to that state, including:

- Local variable declarations,
- Assignment statements,
- Method calls, which serve several purposes:
    - Invoking functions (e.g., using verify to check the validity of data already parsed), and
    - Invoking methods (e.g., extracting data out of packets or computing checksums) and other parsers (see Section [#sec-invoke-subparser]),
- Conditional statements, and
- Transitions to other states (discussed in Section [#sec-transition]).
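The statement kinds listed above can be seen together in a single state. The following is a minimal sketch, not a normative example: the header layout, the parser name StatementKinds, the isVlan output, and the error name UnsupportedEtherType are all hypothetical and chosen only for illustration.

```p4
#include <core.p4>

// Hypothetical header and struct types, used only for illustration.
header Ethernet_h {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct Headers {
    Ethernet_h ethernet;
}

error { UnsupportedEtherType }  // hypothetical user-defined error

parser StatementKinds(packet_in b, out Headers h, out bit<1> isVlan) {
    state start {
        // Method call on the packet_in argument.
        b.extract(h.ethernet);
        // Local variable declaration.
        bit<16> t = h.ethernet.etherType;
        // Assignment statement.
        isVlan = 0;
        // Conditional statement.
        if (t == 0x8100) {
            isVlan = 1;
        }
        // Function-style call: this sketch treats values below 0x0600 as unsupported.
        verify(t >= 0x0600, error.UnsupportedEtherType);
        // Transition; when present, it is the last statement of the state.
        transition accept;
    }
}
```

Whether a given target accepts conditional statements or local variables in a parser depends on the architecture.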

The syntax for parser statements is given by the following grammar rules:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserStatements]

[INCLUDE=grammar.mdk:parserStatement]

[INCLUDE=grammar.mdk:parserBlockStatement]
~ End P4Grammar

Architectures may place restrictions on the expressions and statements that can be used in a parser—e.g., they may forbid the use of operations such as multiplication or place restrictions on the number of local variables that may be used.

In terms of the ParserModel, the statements in a state are executed sequentially.

The last statement in a parser state is an optional transition statement, which transfers control to another state, possibly accept or reject. A transition statement is written using the following syntax:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:transitionStatement]

[INCLUDE=grammar.mdk:stateExpression]
~ End P4Grammar

The execution of the transition statement causes stateExpression to be evaluated, and transfers control to the resulting state.

In terms of the ParserModel, the semantics of a transition statement can be formalized as follows:

~ Begin P4Example
goto eval(stateExpression)
~ End P4Example

For example, this statement:

~ Begin P4Example
transition accept;
~ End P4Example

terminates execution of the current parser and transitions immediately to the accept state.

If the body of a state block does not end with a transition statement, the implied statement is

~ Begin P4Example
transition reject;
~ End P4Example

A select expression evaluates to a state. The syntax for a select expression is as follows:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:selectExpression]

[INCLUDE=grammar.mdk:selectCaseList]

[INCLUDE=grammar.mdk:selectCase]
~ End P4Grammar

Each expression in the expressionList must have a type of bit<W>, int<W>, bool, enum, serializable enum, or a tuple type with fields of one of the above types.

In a select expression, if the expressionList has type tuple<T>, then each keysetExpression must have type set<tuple<T>>. In particular, if a set is specified as a range or mask expression, the endpoints of the range and mask expression are implicitly cast to type T using the standard rules for casts.
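A select over more than one expression can illustrate the typing rule above. The sketch below assumes two hypothetical, abbreviated headers (Proto_h and Ports_h) and hypothetical state names; it only shows how range, mask, and don't-care keysets may appear as components of a tuple keyset.

```p4
#include <core.p4>

// Hypothetical, abbreviated headers used only for illustration.
header Proto_h {
    bit<8> protocol;
}
header Ports_h {
    bit<16> srcPort;
    bit<16> dstPort;
}
struct Headers {
    Proto_h proto;
    Ports_h ports;
}

parser TupleSelect(packet_in b, out Headers h) {
    state start {
        b.extract(h.proto);
        b.extract(h.ports);
        // The select argument has type tuple<bit<8>, bit<16>>.
        transition select(h.proto.protocol, h.ports.dstPort) {
            // Range keyset in the second position.
            (8w6,  16w0 .. 16w1023)     : low_port;
            // Mask keyset that matches exactly the value 53.
            (8w17, 16w53 &&& 16w0xFFFF) : dns;
            // Don't-care in the second position.
            (8w6,  _)                   : other_tcp;
            default                     : accept;
        }
    }
    state low_port  { transition accept; }
    state dns       { transition accept; }
    state other_tcp { transition accept; }
}
```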

In terms of the ParserModel, the meaning of a select expression:

~ Begin P4Example
select(e) {
    ks[0]: s[0];
    ks[1]: s[1];
    /* more labels omitted */
    ks[n-2]: s[n-2];
    _ : sd;  // ks[n-1] is default
}
~ End P4Example

is defined in pseudo-code as:

~ Begin P4Pseudo
key = eval(e);
for (int i=0; i < n; i++) {
    keyset = eval(ks[i]);
    if (keyset.contains(key)) return s[i];
}
verify(false, error.NoMatch);
~ End P4Pseudo

Some targets may require that all keyset expressions in a select expression be compile-time known values. Keysets are evaluated in order, from top to bottom as implied by the pseudo-code above; the first keyset that includes the value in the select argument provides the result state. If no label matches, the execution triggers a runtime error with the standard error code error.NoMatch.

Note that this implies that all cases after a default or _ label are unreachable; the compiler should emit a warning if it detects unreachable cases. This constitutes an important difference between select expressions and the switch statements found in many programming languages since the keysets of a select expression may “overlap”.
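The first-match rule can be illustrated with two overlapping keysets. This is a sketch under stated assumptions: Tcp_ports_h is a hypothetical, abbreviated header, and the state names are invented for the example.

```p4
#include <core.p4>

// Hypothetical, abbreviated TCP header: only the port fields.
header Tcp_ports_h {
    bit<16> srcPort;
    bit<16> dstPort;
}
struct Headers {
    Tcp_ports_h tcp;
}

parser FirstMatch(packet_in b, out Headers h) {
    state start {
        b.extract(h.tcp);
        transition select(h.tcp.dstPort) {
            // Tested first, so port 80 always goes to http.
            16w80              : http;
            // This set also contains 80, but it is only reached for
            // the other ports below 1024.
            16w0 &&& 16w0xFC00 : well_known;
            // Matches everything else; a case written after this
            // label could never be selected.
            default            : other;
        }
    }
    state http       { transition accept; }
    state well_known { transition accept; }
    state other      { transition accept; }
}
```

Swapping the first two cases would make the http case unreachable, because the mask keyset would then match port 80 first.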

The typical way to use a select expression is to compare the value of a recently-extracted header field against a set of values, as in the following example:

~ Begin P4Example
header IPv4_h { bit<8> protocol; /* more fields omitted */ }
struct P { IPv4_h ipv4; /* more fields omitted */ }
P headers;
select (headers.ipv4.protocol) {
    8w6  : parse_tcp;
    8w17 : parse_udp;
    _    : accept;
}
~ End P4Example

For example, to detect TCP reserved ports (< 1024) one could write:

~ Begin P4Example
select (p.tcp.port) {
    16w0 &&& 16w0xFC00: well_known_port;
    _: other_port;
}
~ End P4Example

The expression 16w0 &&& 16w0xFC00 describes the set of 16-bit values whose most significant six bits are zero.

Some targets may support parser value sets; see Section [#sec-value-set]. Given a type T for the type parameter of the value set, the type of the value set is set<T>. The type of the value set must match the type of all other keysetExpressions in the same select expression. If there is a mismatch, the compiler must raise an error. The type of the values in the set must be either bit<>, int<>, tuple, struct, or serializable enum.

For example, to allow the control plane API to specify TCP reserved ports at runtime, one could write:

~ Begin P4Example
struct vsk_t {
    @match(ternary)
    bit<16> port;
}
value_set<vsk_t>(4) pvs;
select (p.tcp.port) {
    pvs: runtime_defined_port;
    _: other_port;
}
~ End P4Example

The above example allows the runtime API to populate up to 4 different keysetExpressions in the value_set. If the value_set takes a struct as type parameter, the runtime API can use the struct field names to name the objects in the value set. The match type of the struct field is specified with the @match annotation. If the @match annotation is not specified on a struct field, by default it is assumed to be @match(exact). A single non-exact field must be placed into a struct by itself, with the desired @match annotation.

The verify statement provides a simple form of error handling. verify can only be invoked within a parser; it is used syntactically as if it were a function with the following signature:

~ Begin P4Example
extern void verify(in bool condition, in error err);
~ End P4Example

If the first argument is true, then executing the statement has no side-effect. However, if the first argument is false, it causes an immediate transition to reject, which causes immediate parsing termination; at the same time, the parserError associated with the parser is set to the value of the second argument.

In terms of the ParserModel the semantics of a verify statement is given by:

~ Begin P4Pseudo
ParserModel.verify(bool condition, error err) {
    if (condition == false) {
        ParserModel.parserError = err;
        goto reject;
    }
}
~ End P4Pseudo

The P4 core library contains the following declaration of a built-in extern type called packet_in that represents incoming network packets. The packet_in extern is special: it cannot be instantiated by the user explicitly. Instead, the architecture supplies a separate instance for each packet_in argument to a parser instantiation.

~ Begin P4Example
extern packet_in {
    void extract<T>(out T headerLvalue);
    void extract<T>(out T variableSizeHeader, in bit<32> varFieldSizeBits);
    T lookahead<T>();
    bit<32> length();  // This method may be unavailable in some architectures
    void advance(bit<32> bits);
}
~ End P4Example

To extract data from a packet represented by an argument b with type packet_in, a parser invokes the extract methods of b. There are two variants of the extract method: a one-argument variant for extracting fixed-size headers, and a two-argument variant for extracting variable-sized headers. Because these operations can cause runtime verification failures (see below), these methods can only be executed within parsers.

When extracting data into a bit-string or integer, the first packet bit is extracted to the most significant bit of the integer.
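The bit ordering can be made concrete with a one-byte header. In this sketch FirstByte_h is a hypothetical header modeling only the first byte of an IPv4 header, and the byte value 0x45 is an assumed input chosen for illustration.

```p4
#include <core.p4>

// Hypothetical header: only the first byte of an IPv4 header.
header FirstByte_h {
    bit<4> version;
    bit<4> ihl;
}
struct Headers {
    FirstByte_h first;
}

parser BitOrder(packet_in b, out Headers h) {
    state start {
        // Suppose the next byte in the packet is 0x45 (binary 0100 0101).
        // The first four packet bits (0100) fill version, most
        // significant bit first, so version == 4. The next four bits
        // (0101) fill ihl in the same way, so ihl == 5.
        b.extract(h.first);
        transition accept;
    }
}
```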

Some targets may perform cut-through packet processing, i.e., they may start processing a packet before its length is known (that is, before all bytes have been received). On such a target, calls to the packet_in.length() method cannot be implemented. Attempts to call this method should be flagged as errors (either at compilation time by the compiler back-end, or when attempting to load the compiled P4 program onto a target that does not support this method).

In terms of the ParserModel, the semantics of packet_in can be captured using the following abstract model of packets:

~ Begin P4Pseudo
packet_in {
    unsigned nextBitIndex;
    byte[] data;
    unsigned lengthInBits;
    void initialize(byte[] data) {
        this.data = data;
        this.nextBitIndex = 0;
        this.lengthInBits = data.sizeInBytes * 8;
    }
    bit<32> length() { return this.lengthInBits / 8; }
}
~ End P4Pseudo

Fixed-width extraction

The single-argument extract method handles fixed-width headers, and is declared in P4 as follows:

~ Begin P4Example
void extract<T>(out T headerLeftValue);
~ End P4Example

The expression headerLeftValue must evaluate to an l-value (see Section [#sec-lvalues]) of type header with a fixed width. If this method executes successfully, on completion headerLeftValue is filled with data from the packet and its validity bit is set to true. This method may fail in various ways, e.g., if there are not enough bits left in the packet to fill the specified header.

For example, the following program fragment extracts an Ethernet header:

~ Begin P4Example
struct Result { Ethernet_h ethernet; /* more fields omitted */ }
parser P(packet_in b, out Result r) {
    state start {
        b.extract(r.ethernet);
    }
}
~ End P4Example

In terms of the ParserModel, the semantics of the single-argument extract is given in terms of the following pseudo-code method, using data from the packet class defined above. We use the special valid$ identifier to indicate the hidden valid bit of a header, isNext$ to indicate that the l-value was obtained using next, and nextIndex$ to indicate the corresponding header or header union stack properties.

~ Begin P4Pseudo
void packet_in.extract<T>(out T headerLValue) {
    bitsToExtract = sizeofInBits(headerLValue);
    lastBitNeeded = this.nextBitIndex + bitsToExtract;
    ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
    headerLValue = this.data.extractBits(this.nextBitIndex, bitsToExtract);
    headerLValue.valid$ = true;
    if headerLValue.isNext$ {
        verify(headerLValue.nextIndex$ < headerLValue.size, error.StackOutOfBounds);
        headerLValue.nextIndex$ = headerLValue.nextIndex$ + 1;
    }
    this.nextBitIndex += bitsToExtract;
}
~ End P4Pseudo

Variable-width extraction

The two-argument extract handles variable-width headers, and is declared in P4 as follows:

~ Begin P4Example
void extract<T>(out T headerLvalue, in bit<32> variableFieldSize);
~ End P4Example

The expression headerLvalue must be an l-value representing a header that contains exactly one varbit field. The expression variableFieldSize must evaluate to a bit<32> value that indicates the number of bits to be extracted into the unique varbit field of the header (i.e., this size is not the size of the complete header, just the varbit field).

In terms of the ParserModel, the semantics of the two-argument extract is captured by the following pseudo-code:

~ Begin P4Pseudo
void packet_in.extract<T>(out T headerLvalue, in bit<32> variableFieldSize) {
    // targets are allowed to include the following line, but need not
    // verify(variableFieldSize[2:0] == 0, error.ParserInvalidArgument);
    bitsToExtract = sizeOfFixedPart(headerLvalue) + variableFieldSize;
    lastBitNeeded = this.nextBitIndex + bitsToExtract;
    ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
    ParserModel.verify(bitsToExtract <= headerLvalue.maxSize, error.HeaderTooShort);
    headerLvalue = this.data.extractBits(this.nextBitIndex, bitsToExtract);
    headerLvalue.varbitField.size = variableFieldSize;
    headerLvalue.valid$ = true;
    if headerLvalue.isNext$ {
        verify(headerLvalue.nextIndex$ < headerLvalue.size, error.StackOutOfBounds);
        headerLvalue.nextIndex$ = headerLvalue.nextIndex$ + 1;
    }
    this.nextBitIndex += bitsToExtract;
}
~ End P4Pseudo

The following example shows one way to parse IPv4 options—by splitting the IPv4 header into two separate headers:

~ Begin P4Example
// IPv4 header without options
header IPv4_no_options_h {
    bit<4>   version;
    bit<4>   ihl;
    bit<8>   diffserv;
    bit<16>  totalLen;
    bit<16>  identification;
    bit<3>   flags;
    bit<13>  fragOffset;
    bit<8>   ttl;
    bit<8>   protocol;
    bit<16>  hdrChecksum;
    bit<32>  srcAddr;
    bit<32>  dstAddr;
}
header IPv4_options_h {
    varbit<320> options;
}

struct Parsed_headers {
    // Some fields omitted
    IPv4_no_options_h ipv4;
    IPv4_options_h    ipv4options;
}

error { InvalidIPv4Header }

parser Top(packet_in b, out Parsed_headers headers) {
    // Some states omitted

    state parse_ipv4 {
        b.extract(headers.ipv4);
        verify(headers.ipv4.ihl >= 5, error.InvalidIPv4Header);
        transition select (headers.ipv4.ihl) {
            5: dispatch_on_protocol;
            _: parse_ipv4_options;
        }
    }

    state parse_ipv4_options {
        // use information in the ipv4 header to compute the number of bits to extract
        b.extract(headers.ipv4options,
                  (bit<32>)(((bit<16>)headers.ipv4.ihl - 5) * 32));
        transition dispatch_on_protocol;
    }
}
~ End P4Example

Lookahead

The lookahead method provided by the packet_in packet abstraction evaluates to a set of bits from the input packet without advancing the nextBitIndex pointer. Similar to extract, it will transition to reject and set the error if there are not enough bits in the packet. When lookahead returns a value that contains headers (e.g., a header type, or a struct containing headers), the header values in the returned result are always valid (otherwise lookahead must have transitioned to the reject state).

The lookahead method can be invoked as follows:

~ Begin P4Example
b.lookahead<T>()
~ End P4Example

where T must be a type with fixed width. On success, lookahead evaluates to a value of type T.

In terms of the ParserModel, the semantics of lookahead is given by the following pseudocode:

~ Begin P4Pseudo
T packet_in.lookahead<T>() {
    bitsToExtract = sizeof(T);
    lastBitNeeded = this.nextBitIndex + bitsToExtract;
    ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
    T tmp = this.data.extractBits(this.nextBitIndex, bitsToExtract);
    return tmp;
}
~ End P4Pseudo

The TCP options example from Section [#sec-expr-hu] also illustrates how lookahead can be used:

~ Begin P4Example
state start {
    transition select(b.lookahead<bit<8>>()) {
        0: parse_tcp_option_end;
        1: parse_tcp_option_nop;
        2: parse_tcp_option_ss;
        3: parse_tcp_option_s;
        5: parse_tcp_option_sack;
    }
}

// Some states omitted

state parse_tcp_option_sack {
    bit<8> n = b.lookahead<Tcp_option_sack_top>().length;
    b.extract(vec.next.sack, (bit<32>) (8 * n - 16));
    transition start;
}
~ End P4Example

Skipping bits

P4 provides two ways to skip over bits in an input packet without assigning them to a header:

One way is to extract to the underscore identifier, explicitly specifying the type of the data:

~ Begin P4Example
b.extract<T>(_)
~ End P4Example

Another way is to use the advance method of the packet when the number of bits to skip is known.

In terms of the ParserModel, the meaning of advance is given in pseudocode as follows:

~ Begin P4Pseudo
void packet_in.advance(bit<32> bits) {
    // targets are allowed to include the following line, but need not
    // verify(bits[2:0] == 0, error.ParserInvalidArgument);
    lastBitNeeded = this.nextBitIndex + bits;
    ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
    this.nextBitIndex += bits;
}
~ End P4Pseudo
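As an illustration of advance, a parser that is not interested in IPv4 options might skip them instead of extracting them. The following is a sketch only: the parser name SkipOptions, the struct Headers, and the error name IPv4IhlTooSmall are hypothetical, and some targets may restrict the arithmetic used to compute the argument.

```p4
#include <core.p4>

// Fixed part of an IPv4 header (20 bytes).
header IPv4_fixed_h {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}
struct Headers {
    IPv4_fixed_h ipv4;
}

error { IPv4IhlTooSmall }  // hypothetical user-defined error

parser SkipOptions(packet_in b, out Headers h) {
    state start {
        b.extract(h.ipv4);
        verify(h.ipv4.ihl >= 5, error.IPv4IhlTooSmall);
        // ihl counts 32-bit words, so the options occupy
        // (ihl - 5) * 32 bits, which is between 0 and 320.
        b.advance((bit<32>)(h.ipv4.ihl - 5) * 32);
        transition accept;
    }
}
```

If the packet ends before the skipped bits, the pseudo-code above implies a transition to reject with error.PacketTooShort.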

A header stack has two properties, next and last, which can be used in parsing. Consider the following declaration, which defines a stack for representing the headers of a packet with at most ten MPLS headers:

~ Begin P4Example
header Mpls_h {
    bit<20> label;
    bit<3>  tc;
    bit     bos;
    bit<8>  ttl;
}
Mpls_h[10] mpls;
~ End P4Example

The expression mpls.next represents an l-value of type Mpls_h that references an element in the mpls stack. Initially, mpls.next refers to the first element of the stack. It is automatically advanced on each successful call to extract. The mpls.last property refers to the element immediately preceding next if such an element exists. Attempting to access the mpls.next element when the stack’s nextIndex counter is greater than or equal to size causes a transition to reject and sets the error to error.StackOutOfBounds. Likewise, attempting to access mpls.last when the nextIndex counter is equal to 0 causes a transition to reject and sets the error to error.StackOutOfBounds.

The following example shows a simplified parser for MPLS processing:

~ Begin P4Example
struct Pkthdr {
    Ethernet_h ethernet;
    Mpls_h[3] mpls;
    // other headers omitted
}

parser P(packet_in b, out Pkthdr p) {
    state start {
        b.extract(p.ethernet);
        transition select(p.ethernet.etherType) {
            0x8847: parse_mpls;
            0x0800: parse_ipv4;
        }
    }
    state parse_mpls {
        b.extract(p.mpls.next);
        transition select(p.mpls.last.bos) {
            0: parse_mpls; // This creates a loop
            1: parse_ipv4;
        }
    }
    // other states omitted
}
~ End P4Example

P4 allows parsers to invoke the services of other parsers, similar to subroutines. To invoke the services of another parser, the sub-parser must be first instantiated; the services of an instance are invoked by calling it using its apply method.

The following example shows a sub-parser invocation:

~ Begin P4Example
parser callee(packet_in packet, out IPv4 ipv4) { /* body omitted */ }
parser caller(packet_in packet, out Headers h) {
    callee() subparser;  // instance of callee
    state subroutine {
        subparser.apply(packet, h.ipv4);  // invoke sub-parser
        transition accept;  // accept if sub-parser ends in accept state
    }
}
~ End P4Example

The semantics of a sub-parser invocation can be described as follows:

- The state invoking the sub-parser is split into two half-states at the parser invocation statement.
- The top half includes a transition to the sub-parser start state.
- The sub-parser’s accept state is identified with the bottom half of the current state.
- The sub-parser’s reject state is identified with the reject state of the current parser.

~ Figure { #fig-subparser; caption: "Semantics of invoking a sub-parser: top: original program, bottom: equivalent program." }
![subparser]
~
[subparser]: figs/subparser.png { width: 60%; page-align: here }

Figure [#fig-subparser] shows a diagram of this process.

Note that since P4 requires definitions to precede uses, it is impossible to create recursive (or mutually recursive) parsers.

When a parser is instantiated, local instantiations of stateful objects are evaluated recursively. That is, each instantiation of a parser has a unique set of local parser value sets, extern objects, inner parser instances, etc. Thus, in general, invoking a parser instance twice is not the same as invoking two copies of the same parser instance. Note however that local variables do not persist across invocations of the parser. This semantics also applies to direct invocation (see Section [#sec-direct-invocation]).
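The per-instance nature of local stateful objects can be sketched with two instances of one sub-parser, each of which has its own parser value set (see Section [#sec-value-set]). All names here (Tag_h, TagParser, known_ids, Outer) are hypothetical and serve only as an illustration.

```p4
#include <core.p4>

// Hypothetical header carrying a 16-bit tag.
header Tag_h {
    bit<16> id;
}
struct Headers {
    Tag_h outer;
    Tag_h inner;
}

parser TagParser(packet_in b, out Tag_h t) {
    // Each instance of TagParser has its own copy of this value set.
    value_set<bit<16>>(4) known_ids;
    state start {
        b.extract(t);
        transition select(t.id) {
            known_ids : accept;
            default   : reject;
        }
    }
}

parser Outer(packet_in b, out Headers h) {
    TagParser() outerTags;  // first instance
    TagParser() innerTags;  // second instance, with a separate value set
    state start {
        outerTags.apply(b, h.outer);
        innerTags.apply(b, h.inner);
        transition accept;
    }
}
```

In this sketch the control plane could populate the two value sets with different entries, so the same tag value may be accepted by one instance and rejected by the other.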

Architectures may impose (static or dynamic) constraints on the number of parser states that can be traversed for processing each packet. For example, a compiler for a specific target may reject parsers containing loops that cannot be unrolled at compilation time or that may contain cycles that do not advance the cursor. If a parser aborts execution dynamically because it exceeded the time budget allocated for parsing, the parser should transition to reject and set the standard error error.ParserTimeout.

In some cases, the values that determine the transition from one parser state to another need to be determined at run time. MPLS is one example where the value of the MPLS label field is used to determine what headers follow the MPLS tag and this mapping may change dynamically at run time. To support this functionality, P4 supports the notion of a Parser Value Set. This is a named set of values with a run time API to add and remove values from the set.

Value sets are declared locally within a parser. They should be declared before being referenced in a parser keysetExpression and can be used as a label in a select expression.

The syntax for declaring value sets is:

~ Begin P4Grammar
[INCLUDE=grammar.mdk:valueSetDeclaration]
~ End P4Grammar

Parser Value Sets support a size argument to provide hints to the compiler to reserve hardware resources to implement the value set. For example, this parser value set:

~ Begin P4Example
value_set<bit<16>>(4) pvs;
~ End P4Example

creates a value_set of size 4 with entries of type bit<16>.
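A value set declared this way might be used in a complete parser as sketched below. The parser name WithValueSet, the header layout, and the state names are assumptions made for the example; whether value sets are available at all depends on the target.

```p4
#include <core.p4>

// Hypothetical header and struct types, used only for illustration.
header Ethernet_h {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}
struct Headers {
    Ethernet_h ethernet;
}

parser WithValueSet(packet_in b, out Headers h) {
    // Declared before its use below; holds up to 4 entries that are
    // supplied by the control plane at run time.
    value_set<bit<16>>(4) pvs;

    state start {
        b.extract(h.ethernet);
        transition select(h.ethernet.etherType) {
            pvs     : runtime_selected;  // any value currently in the set
            0x0800  : parse_ipv4;
            default : accept;
        }
    }
    state runtime_selected { transition accept; }
    state parse_ipv4       { transition accept; }
}
```

Because keysets are tested in order, an entry for 0x0800 added to pvs at run time would take precedence over the fixed 0x0800 case.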

The semantics of the size argument is similar to the size property of a table. If a value set has a size argument with value N, it is recommended that a compiler choose a data plane implementation that is capable of storing N value set entries. See “Size property of P4 tables and parser value sets” [@P4SizeProperty] for further discussion on the implementation of parser value set size.

The value set is populated by the control plane by methods specified in the P4Runtime specification[4].