13. Packet parsing
This section describes the P4 constructs specific to parsing network packets.
~ Figure { #fig-parserstatemachine; caption: “Parser FSM structure.” }
![parserstatemachine]
~
[parserstatemachine]: figs/parserstatemachine.png { height: 5cm; page-align: here }
A P4 parser describes a state machine with one start
state and two final states. The start state is always named start. The
two final states are named accept (indicating successful parsing) and
reject (indicating a parsing failure). The start state is part of
the parser, while the accept and reject states are distinct from the
states provided by the programmer and are logically outside of the
parser. Figure [#fig-parserstatemachine] illustrates the general
structure of a parser state machine.
A parser declaration comprises a name, a list of parameters, an optional list of constructor parameters, local elements, and parser states (as well as optional annotations).
~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserTypeDeclaration]
[INCLUDE=grammar.mdk:parserDeclaration]
[INCLUDE=grammar.mdk:parserLocalElements]
[INCLUDE=grammar.mdk:parserStates]
~ End P4Grammar
For a description of optConstructorParameters, which are useful for
building parameterized parsers, see Section [#sec-parameterization].
Unlike parser type declarations, parser declarations may not be generic—e.g., the following declaration is illegal:
~ Begin P4Example
parser P<H>(inout H data) { /* body omitted */ }
~ End P4Example
Hence, when used in the context of a parserDeclaration, the production rule
parserTypeDeclaration should not yield type parameters.
At least one state, named start, must be present in any parser. A
parser may not define two states with the same name. It is also illegal
for a parser to give explicit definitions for the accept and reject
states—those states are logically distinct from the states defined by
the programmer.
State declarations are described below. Preceding the parser states, a
parser may also contain a list of local elements. These can be
constants, variables, or instantiations of objects that may be used
within the parser. Such objects may be instantiations of extern
objects, or other parsers that may be invoked as subroutines. However,
it is illegal to instantiate a control block within a parser.
~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserLocalElement]
~ End P4Grammar
The states and local elements are all in the same namespace, thus the following example will produce an error:
~ Begin P4Example
// erroneous example
parser p() {
    bit<4> t;
    state start {
        t = 1;
        transition t;
    }
    state t {  // error: name t is duplicated
        transition accept;
    }
}
~ End P4Example
For an example containing a complete declaration of a parser see Section [#sec-vss-all].
The semantics of a P4 parser can be formulated in terms of an abstract
machine that manipulates a ParserModel data structure. This section
describes this abstract machine in pseudo-code.
A parser starts execution in the start state and ends execution when
one of the reject or accept states has been reached.
~ Begin P4Pseudo
ParserModel {
    error parserError;

    onPacketArrival(packet p) {
        ParserModel.parserError = error.NoError;
        goto start;
    }
}
~ End P4Pseudo
An architecture must specify the behavior when the accept and reject
states are reached. For example, an architecture may specify that all
packets reaching the reject state are dropped without further
processing. Alternatively, it may specify that such packets are passed
to the next block after the parser, with intrinsic metadata indicating
that the parser reached the reject state, along with the error
recorded.
A parser state is declared with the following syntax:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserState]
~ End P4Grammar
Each state has a name and a body. The body consists of a sequence of statements that describe the processing performed when the parser transitions to that state, including:
- Local variable declarations,
- Assignment statements,
- Method calls, which serve several purposes:
    - Invoking functions (e.g., using verify to check the validity of data already parsed), and
    - Invoking methods (e.g., extracting data out of packets or computing checksums) and other parsers (see Section [#sec-invoke-subparser]),
- Conditional statements, and
- Transitions to other states (discussed in Section [#sec-transition]).
The syntax for parser statements is given by the following grammar
rules:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:parserStatements]
[INCLUDE=grammar.mdk:parserStatement]
[INCLUDE=grammar.mdk:parserBlockStatement]
~ End P4Grammar
Architectures may place restrictions on the expressions and statements that can be used in a parser—e.g., they may forbid the use of operations such as multiplication or place restrictions on the number of local variables that may be used.
In terms of the ParserModel, the statements in a state are
executed sequentially.
The last statement in a parser state is an optional transition
statement, which transfers control to another state, possibly accept
or reject. A transition statement is written using the following
syntax:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:transitionStatement]
[INCLUDE=grammar.mdk:stateExpression]
~ End P4Grammar
The execution of the transition statement causes stateExpression to be
evaluated, and transfers control to the resulting state.
In terms of the ParserModel, the semantics of a transition statement
can be formalized as follows:
~ Begin P4Example
goto eval(stateExpression)
~ End P4Example
For example, this statement:
~ Begin P4Example
transition accept;
~ End P4Example
terminates execution of the current parser and transitions immediately
to the accept state.
If the body of a state block does not end with a transition statement,
the implied statement is
~ Begin P4Example
transition reject;
~ End P4Example
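The control flow described so far can be mimicked in a few lines of Python. This is only an illustrative sketch: the names `run_parser`, `states`, and the step budget `max_steps` are assumptions made for the example and are not part of P4. Each state is modelled as a function that returns the name of the next state, or `None` when its body has no transition statement, in which case the implied `transition reject;` applies.

```python
# Hypothetical model of the parser abstract machine; all names are illustrative.
ACCEPT, REJECT = "accept", "reject"


def run_parser(states, max_steps=100):
    """Run from "start" until accept or reject is reached.

    `states` maps a state name to a function that executes the state's body
    and returns the next state name, or None if the body has no transition
    statement. `max_steps` is an assumed budget, not part of the P4 semantics.
    """
    current = "start"
    trace = [current]
    for _ in range(max_steps):
        if current in (ACCEPT, REJECT):
            return current, trace
        next_state = states[current]()
        # A body without a transition statement behaves as `transition reject;`.
        current = REJECT if next_state is None else next_state
        trace.append(current)
    raise RuntimeError("step budget exceeded")


states = {
    "start": lambda: "parse_payload",
    "parse_payload": lambda: None,  # no transition statement in this state
}
result, trace = run_parser(states)
```

Here `parse_payload` ends without a transition, so the run finishes in the reject state after visiting `start` and `parse_payload`.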
A select expression evaluates to a state. The syntax for a select
expression is as follows:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:selectExpression]
[INCLUDE=grammar.mdk:selectCaseList]
[INCLUDE=grammar.mdk:selectCase]
~ End P4Grammar
Each expression in the expressionList must have a type of bit<W>,
int<W>, bool, enum, serializable enum, or a tuple type with
fields of one of the above types.
In a select expression, if the expressionList has type tuple<T>,
then each keysetExpression must have type set<tuple<T>>. In
particular, if a set is specified as a range or mask expression, the
endpoints of the range and mask expression are implicitly cast to type
T using the standard rules for casts.
In terms of the ParserModel, the meaning of a select expression:
~ Begin P4Example
select(e) {
    ks[0]: s[0];
    ks[1]: s[1];
    /* more labels omitted */
    ks[n-2]: s[n-2];
    _ : sd;  // ks[n-1] is default
}
~ End P4Example
is defined in pseudo-code as:
~ Begin P4Pseudo
key = eval(e);
for (int i=0; i < n; i++) {
    keyset = eval(ks[i]);
    if (keyset.contains(key)) return s[i];
}
verify(false, error.NoMatch);
~ End P4Pseudo
Some targets may require that all keyset expressions in a select
expression be compile-time known values. Keysets are evaluated in order,
from top to bottom as implied by the pseudo-code above; the first keyset
that includes the value in the select argument provides the result
state. If no label matches, the execution triggers a runtime error with
the standard error code error.NoMatch.
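The first-match rule can be made concrete with a small Python sketch. It is a hedged illustration only: `select`, `DEFAULT`, and the `NoMatch` exception are names invented for this example, and keysets are modelled as ordinary Python collections rather than P4 set types.

```python
class NoMatch(Exception):
    """Stands in for error.NoMatch."""


DEFAULT = object()  # stands in for the `_` / default keyset


def select(key, cases):
    """Return the state of the first case whose keyset contains `key`.

    `cases` is an ordered list of (keyset, state) pairs, where a keyset is
    either DEFAULT or any collection that supports the `in` operator.
    """
    for keyset, state in cases:
        if keyset is DEFAULT or key in keyset:
            return state
    raise NoMatch(key)


cases = [
    ({6}, "parse_tcp"),
    (range(0, 16), "parse_low"),  # overlaps with {6}; the earlier case wins
    (DEFAULT, "accept"),
    ({17}, "parse_udp"),          # never reached: it follows the default
]
```

With these cases, key 6 selects `parse_tcp` even though it also lies in the second keyset, and key 17 selects `accept` because the default label precedes the `{17}` case.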
Note that this implies that all cases after a default or _ label are
unreachable; the compiler should emit a warning if it detects
unreachable cases. This constitutes an important difference between
select expressions and the switch statements found in many
programming languages since the keysets of a select expression may
“overlap”.
The typical way to use a select expression is to compare the value of
a recently-extracted header field against a set of values, as in the
following example:
~ Begin P4Example
header IPv4_h { bit<8> protocol; /* more fields omitted */ }
struct P { IPv4_h ipv4; /* more fields omitted */ }
P headers;
select (headers.ipv4.protocol) {
    8w6  : parse_tcp;
    8w17 : parse_udp;
    _    : accept;
}
~ End P4Example
For example, to detect TCP reserved ports (< 1024) one could write:
~ Begin P4Example
select (p.tcp.port) {
    16w0 &&& 16w0xFC00: well_known_port;
    _: other_port;
}
~ End P4Example
The expression 16w0 &&& 16w0xFC00 describes the set of 16-bit values
whose most significant six bits are zero.
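The arithmetic behind this mask can be checked with a short Python sketch. The helper name `mask_matches` is an assumption for the example; it models a mask keyset `value &&& mask` as the set of keys that agree with `value` on every bit where `mask` is 1.

```python
def mask_matches(key, value, mask):
    """True if `key` belongs to the set described by `value &&& mask`."""
    return (key & mask) == (value & mask)


VALUE, MASK = 0x0000, 0xFC00  # 0xFC00 selects the six most significant bits

# Enumerate every 16-bit port that the keyset 16w0 &&& 16w0xFC00 contains.
matching = [port for port in range(1 << 16) if mask_matches(port, VALUE, MASK)]
```

The matching ports are exactly 0 through 1023, since any port of 1024 or more has at least one of its top six bits set.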
Some targets may support parser value sets; see Section
[#sec-value-set]. Given a type T for the type parameter of the
value set, the type of the value set is set<T>. The type of the value
set must match the type of all other keysetExpressions in the same
select expression. If there is a mismatch, the compiler must raise an
error. The type of the values in the set must be either bit<>,
int<>, tuple, struct, or serializable enum.
For example, to allow the control plane API to specify TCP reserved ports at runtime, one could write:
~ Begin P4Example
struct vsk_t {
    @match(ternary)
    bit<16> port;
}
value_set<vsk_t>(4) pvs;
select (p.tcp.port) {
    pvs: runtime_defined_port;
    _: other_port;
}
~ End P4Example
The above example allows the runtime API to populate up to 4 different
keysetExpressions in the value_set. If the value_set takes a
struct as type parameter, the runtime API can use the struct field names
to name the objects in the value set. The match type of the struct field
is specified with the @match annotation. If the @match annotation is
not specified on a struct field, by default it is assumed to be
@match(exact). A single non-exact field must be placed into a struct
by itself, with the desired @match annotation.
The verify statement provides a simple form of error handling.
verify can only be invoked within a parser; it is used syntactically
as if it were a function with the following signature:
~ Begin P4Example
extern void verify(in bool condition, in error err);
~ End P4Example
If the first argument is true, then executing the statement has no
side-effect. However, if the first argument is false, it causes an
immediate transition to reject, which causes immediate parsing
termination; at the same time, the parserError associated with the
parser is set to the value of the second argument.
In terms of the ParserModel the semantics of a verify statement is
given by:
~ Begin P4Pseudo
ParserModel.verify(bool condition, error err) {
    if (condition == false) {
        ParserModel.parserError = err;
        goto reject;
    }
}
~ End P4Pseudo
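The following Python sketch models this behavior under stated assumptions: the immediate transition to reject is represented by raising a hypothetical `Reject` exception, errors are plain strings, and `parse` is an invented state body that checks a single field.

```python
class Reject(Exception):
    """Models `goto reject`; raising it ends parsing immediately."""


class ParserModel:
    def __init__(self):
        self.parser_error = "NoError"

    def verify(self, condition, err):
        # A true condition has no side effect; a false one records the
        # error and transfers control to reject.
        if not condition:
            self.parser_error = err
            raise Reject(err)


def parse(model, ihl):
    """Hypothetical state body: check one header field, then accept."""
    try:
        model.verify(ihl >= 5, "InvalidIPv4Header")
    except Reject:
        return "reject"
    return "accept"


ok, bad = ParserModel(), ParserModel()
outcomes = (parse(ok, 5), ok.parser_error, parse(bad, 4), bad.parser_error)
```

The first run accepts and leaves the error at `NoError`; the second run ends in reject with the error set to the second argument of `verify`.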
The P4 core library contains the following declaration of a built-in
extern type called packet_in that represents incoming network
packets. The packet_in extern is special: it cannot be instantiated by
the user explicitly. Instead, the architecture supplies a separate
instance for each packet_in argument to a parser instantiation.
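~ Begin P4Example
extern packet_in {
    void extract<T>(out T headerLvalue);
    void extract<T>(out T variableSizeHeader, in bit<32> varFieldSizeBits);
    T lookahead<T>();
    bit<32> length();  // This method may be unavailable in some architectures
    void advance(bit<32> bits);
}
~ End P4Example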
To extract data from a packet represented by an argument b with type
packet_in, a parser invokes the extract methods of b. There are
two variants of the extract method: a one-argument variant for
extracting fixed-size headers, and a two-argument variant for extracting
variable-sized headers. Because these operations can cause runtime
verification failures (see below), these methods can only be executed
within parsers.
When extracting data into a bit-string or integer, the first packet bit is extracted to the most significant bit of the integer.
Some targets may perform cut-through packet processing, i.e., they may
start processing a packet before its length is known (i.e., before all
bytes have been received). On such a target calls to the
packet_in.length() method cannot be implemented. Attempts to call this
method should be flagged as errors (either at compilation time by the
compiler back-end, or when attempting to load the compiled P4 program
onto a target that does not support this method).
In terms of the ParserModel, the semantics of packet_in can be
captured using the following abstract model of packets:
~ Begin P4Pseudo
packet_in {
    unsigned nextBitIndex;
    byte[] data;
    unsigned lengthInBits;
    void initialize(byte[] data) {
        this.data = data;
        this.nextBitIndex = 0;
        this.lengthInBits = data.sizeInBytes * 8;
    }
    bit<32> length() { return this.lengthInBits / 8; }
}
~ End P4Pseudo
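As a rough executable counterpart to this abstract model, the Python sketch below tracks a bit cursor over a byte string. It is an assumption-laden simplification: headers are reduced to plain integers of a given bit width, the class and method names are chosen for the example, and `PacketTooShort` is a Python exception standing in for the corresponding P4 error. It also shows the bit order stated above, where the first packet bit becomes the most significant bit of the result.

```python
class PacketTooShort(Exception):
    """Stands in for error.PacketTooShort."""


class PacketIn:
    """Illustrative model of packet_in; fields mirror the pseudo-code."""

    def __init__(self, data: bytes):
        self.data = data
        self.next_bit_index = 0
        self.length_in_bits = len(data) * 8

    def length(self) -> int:
        return self.length_in_bits // 8

    def _check(self, nbits: int) -> None:
        if self.length_in_bits < self.next_bit_index + nbits:
            raise PacketTooShort()

    def lookahead(self, nbits: int) -> int:
        """Read nbits at the cursor without moving it (MSB first)."""
        self._check(nbits)
        value = 0
        for i in range(self.next_bit_index, self.next_bit_index + nbits):
            bit = (self.data[i // 8] >> (7 - i % 8)) & 1
            value = (value << 1) | bit
        return value

    def extract(self, nbits: int) -> int:
        """Read nbits at the cursor and move the cursor past them."""
        value = self.lookahead(nbits)
        self.next_bit_index += nbits
        return value

    def advance(self, nbits: int) -> None:
        """Skip nbits without returning them."""
        self._check(nbits)
        self.next_bit_index += nbits


pkt = PacketIn(bytes([0x45, 0x00]))
version = pkt.extract(4)  # first four bits of 0x45
ihl = pkt.extract(4)      # next four bits of 0x45
```

For the byte `0x45`, the two 4-bit extractions yield 4 and then 5, which matches the usual layout of the first IPv4 header byte.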
Fixed-width extraction
The single-argument extract method handles fixed-width headers, and is
declared in P4 as follows:
~ Begin P4Example
void extract<T>(out T headerLeftValue);
~ End P4Example
The expression headerLeftValue must evaluate to an l-value (see
Section [#sec-lvalues]) of type header with a fixed width. If this
method executes successfully, on completion headerLeftValue is filled
with data from the packet and its validity bit is set to true. This
method may fail in various ways, e.g., if there are not enough bits left
in the packet to fill the specified header.
For example, the following program fragment extracts an Ethernet header:
~ Begin P4Example
struct Result { Ethernet_h ethernet;  /* more fields omitted */ }
parser P(packet_in b, out Result r) {
    state start {
        b.extract(r.ethernet);
    }
}
~ End P4Example
In terms of the ParserModel, the semantics of the single-argument
extract is given in terms of the following pseudo-code method, using
data from the packet class defined above. We use the special valid$
identifier to indicate the hidden valid bit of a header, isNext$ to
indicate that the l-value was obtained using next, and nextIndex$ to
indicate the corresponding header or header union stack properties.
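~ Begin P4Pseudo
void packet_in.extract<T>(out T headerLValue) {
   bitsToExtract = sizeofInBits(headerLValue);
   lastBitNeeded = this.nextBitIndex + bitsToExtract;
   ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
   headerLValue = this.data.extractBits(this.nextBitIndex, bitsToExtract);
   headerLValue.valid$ = true;
   if headerLValue.isNext$ {
     verify(headerLValue.nextIndex$ < headerLValue.size, error.StackOutOfBounds);
     headerLValue.nextIndex$ = headerLValue.nextIndex$ + 1;
   }
   this.nextBitIndex += bitsToExtract;
}
~ End P4Pseudo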
Variable-width extraction
The two-argument extract handles variable-width headers, and is declared in P4 as follows:
~ Begin P4Example
void extract<T>(out T headerLvalue, in bit<32> variableFieldSize);
~ End P4Example
The expression headerLvalue must be an l-value representing a header
that contains exactly one varbit field. The expression
variableFieldSize must evaluate to a bit<32> value that indicates
the number of bits to be extracted into the unique varbit field of the
header (i.e., this size is not the size of the complete header, just the
varbit field).
In terms of the ParserModel, the semantics of the two-argument
extract is captured by the following pseudo-code:
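~ Begin P4Pseudo
void packet_in.extract<T>(out T headerLvalue,
                          in bit<32> variableFieldSize) {
   // targets are allowed to include the following line, but need not
   // verify(variableFieldSize[2:0] == 0, error.ParserInvalidArgument);
   bitsToExtract = sizeOfFixedPart(headerLvalue) + variableFieldSize;
   lastBitNeeded = this.nextBitIndex + bitsToExtract;
   ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
   ParserModel.verify(bitsToExtract <= headerLvalue.maxSize, error.HeaderTooShort);
   headerLvalue = this.data.extractBits(this.nextBitIndex, bitsToExtract);
   headerLvalue.varbitField.size = variableFieldSize;
   headerLvalue.valid$ = true;
   if headerLvalue.isNext$ {
     verify(headerLvalue.nextIndex$ < headerLvalue.size, error.StackOutOfBounds);
     headerLvalue.nextIndex$ = headerLvalue.nextIndex$ + 1;
   }
   this.nextBitIndex += bitsToExtract;
}
~ End P4Pseudo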
The following example shows one way to parse IPv4 options—by splitting the IPv4 header into two separate headers:
~ Begin P4Example
// IPv4 header without options
header IPv4_no_options_h {
    bit<4>   version;
    bit<4>   ihl;
    bit<8>   diffserv;
    bit<16>  totalLen;
    bit<16>  identification;
    bit<3>   flags;
    bit<13>  fragOffset;
    bit<8>   ttl;
    bit<8>   protocol;
    bit<16>  hdrChecksum;
    bit<32>  srcAddr;
    bit<32>  dstAddr;
}
header IPv4_options_h {
    varbit<320> options;
}

struct Parsed_headers {
    // Some fields omitted
    IPv4_no_options_h ipv4;
    IPv4_options_h    ipv4options;
}

error { InvalidIPv4Header }

parser Top(packet_in b, out Parsed_headers headers) {
    // Some states omitted

    state parse_ipv4 {
        b.extract(headers.ipv4);
        verify(headers.ipv4.ihl >= 5, error.InvalidIPv4Header);
        transition select (headers.ipv4.ihl) {
            5: dispatch_on_protocol;
            _: parse_ipv4_options;
        }
    }

    state parse_ipv4_options {
        // use information in the ipv4 header to compute the number of bits to extract
        b.extract(headers.ipv4options,
                  (bit<32>)(((bit<16>)headers.ipv4.ihl - 5) * 32));
        transition dispatch_on_protocol;
    }
}
~ End P4Example
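The size computation in parse_ipv4_options can be traced with a small Python sketch. The function name `options_size_bits` is hypothetical; the masking with `0xFFFF` is meant to imitate the wraparound of 16-bit unsigned arithmetic in the expression `(bit<32>)(((bit<16>)headers.ipv4.ihl - 5) * 32)`.

```python
def options_size_bits(ihl):
    """Mirror (bit<32>)(((bit<16>)ihl - 5) * 32) using 16-bit wraparound."""
    widened = ihl & 0xFFFF            # (bit<16>)ihl
    diff = (widened - 5) & 0xFFFF     # subtraction wraps modulo 2**16
    return (diff * 32) & 0xFFFF       # product wraps too; the final cast to
                                      # bit<32> only widens the value


sizes = {ihl: options_size_bits(ihl) for ihl in (5, 6, 15)}
wrapped = options_size_bits(4)  # what would happen without the verify call
```

Since ihl is a 4-bit field, its largest value is 15, which gives 320 bits and matches the declared `varbit<320>` bound. The `wrapped` value illustrates why the `verify(headers.ipv4.ihl >= 5, ...)` check comes first: for ihl equal to 4 the subtraction would wrap around to a very large size instead of a negative one.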
Lookahead
The lookahead method provided by the packet_in packet abstraction
evaluates to a set of bits from the input packet without advancing the
nextBitIndex pointer. Similar to extract, it will transition to
reject and set the error if there are not enough bits in the packet.
When lookahead returns a value that contains headers (e.g., a header
type, or a struct containing headers), the header values in the
returned result are always valid (otherwise lookahead must have
transitioned to the reject state).
The lookahead method can be invoked as follows:
~ Begin P4Example
b.lookahead<T>()
~ End P4Example
where T must be a type with fixed width. On success, lookahead
evaluates to a value of type T.
In terms of the ParserModel, the semantics of lookahead is given by
the following pseudocode:
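~ Begin P4Pseudo
T packet_in.lookahead<T>() {
   bitsToExtract = sizeof(T);
   lastBitNeeded = this.nextBitIndex + bitsToExtract;
   ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
   T tmp = this.data.extractBits(this.nextBitIndex, bitsToExtract);
   return tmp;
}
~ End P4Pseudo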
The TCP options example from Section [#sec-expr-hu] also
illustrates how lookahead can be used:
~ Begin P4Example
state start {
    transition select(b.lookahead<bit<8>>()) {
        0: parse_tcp_option_end;
        1: parse_tcp_option_nop;
        2: parse_tcp_option_ss;
        3: parse_tcp_option_s;
        5: parse_tcp_option_sack;
    }
}

// Some states omitted

state parse_tcp_option_sack {
    bit<8> n = b.lookahead<Tcp_option_sack_top>().length;
    b.extract(vec.next.sack, (bit<32>) (8 * n - 16));
    transition start;
}
~ End P4Example
Skipping bits
P4 provides two ways to skip over bits in an input packet without assigning them to a header:
One way is to extract to the underscore identifier, explicitly
specifying the type of the data:
~ Begin P4Example
b.extract<T>(_)
~ End P4Example
Another way is to use the advance method of the packet when the number
of bits to skip is known.
In terms of the ParserModel, the meaning of advance is given in
pseudocode as follows:
~ Begin P4Pseudo
void packet_in.advance(bit<32> bits) {
   // targets are allowed to include the following line, but need not
   // verify(bits[2:0] == 0, error.ParserInvalidArgument);
   lastBitNeeded = this.nextBitIndex + bits;
   ParserModel.verify(this.lengthInBits >= lastBitNeeded, error.PacketTooShort);
   this.nextBitIndex += bits;
}
~ End P4Pseudo
A header stack has two properties, next and last, which can be used
in parsing. Consider the following declaration, which defines a stack
for representing the headers of a packet with at most ten MPLS headers:
~ Begin P4Example
header Mpls_h {
    bit<20> label;
    bit<3>  tc;
    bit     bos;
    bit<8>  ttl;
}
Mpls_h[10] mpls;
~ End P4Example
The expression mpls.next represents an l-value of type Mpls_h that
references an element in the mpls stack. Initially, mpls.next refers
to the first element of the stack. It is automatically advanced on each
successful call to extract. The mpls.last property refers to the
element immediately preceding next if such an element exists.
Attempting to access the mpls.next element when the stack’s nextIndex
counter is greater than or equal to size causes a transition to
reject and sets the error to error.StackOutOfBounds. Likewise,
attempting to access mpls.last when the nextIndex counter is equal
to 0 causes a transition to reject and sets the error to
error.StackOutOfBounds.
The following example shows a simplified parser for MPLS
processing:
~ Begin P4Example
struct Pkthdr {
    Ethernet_h ethernet;
    Mpls_h[3]  mpls;
    // other headers omitted
}

parser P(packet_in b, out Pkthdr p) {
    state start {
        b.extract(p.ethernet);
        transition select(p.ethernet.etherType) {
            0x8847: parse_mpls;
            0x0800: parse_ipv4;
        }
    }
    state parse_mpls {
        b.extract(p.mpls.next);
        transition select(p.mpls.last.bos) {
            0: parse_mpls; // This creates a loop
            1: parse_ipv4;
        }
    }
    // other states omitted
}
~ End P4Example
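The interplay of next, last, and the parse_mpls loop can be sketched in Python as follows. This is a simplified model with assumed names: headers are dictionaries that are already decoded, `HeaderStack` tracks only the nextIndex counter, and running out of input is reported as `PacketTooShort`.

```python
class StackOutOfBounds(Exception):
    """Stands in for error.StackOutOfBounds."""


class HeaderStack:
    def __init__(self, size):
        self.size = size
        self.elements = [None] * size
        self.next_index = 0

    def extract_next(self, header):
        # Accessing `next` when nextIndex >= size is an error.
        if self.next_index >= self.size:
            raise StackOutOfBounds()
        self.elements[self.next_index] = header
        self.next_index += 1  # advanced on each successful extract

    @property
    def last(self):
        # Accessing `last` when nextIndex == 0 is an error.
        if self.next_index == 0:
            raise StackOutOfBounds()
        return self.elements[self.next_index - 1]


def parse_mpls(headers, size=3):
    """Loop like state parse_mpls; returns (final state, error name)."""
    stack = HeaderStack(size)
    remaining = iter(headers)
    try:
        while True:
            stack.extract_next(next(remaining))
            if stack.last["bos"] == 1:
                return "parse_ipv4", "NoError"
    except StackOutOfBounds:
        return "reject", "StackOutOfBounds"
    except StopIteration:
        return "reject", "PacketTooShort"


two_labels = [{"label": 1, "bos": 0}, {"label": 2, "bos": 1}]
four_labels = [{"label": i, "bos": 0} for i in range(4)]
```

Calling `parse_mpls(two_labels)` leaves the loop at the bottom-of-stack label, while `parse_mpls(four_labels)` overflows the three-element stack on the fourth extraction and ends in reject.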
P4 allows parsers to invoke the services of other parsers, similar to
subroutines. To invoke the services of another parser, the sub-parser
must be first instantiated; the services of an instance are invoked by
calling it using its apply method.
The following example shows a sub-parser invocation:
~ Begin P4Example
parser callee(packet_in packet, out IPv4 ipv4) { /* body omitted */ }
parser caller(packet_in packet, out Headers h) {
    callee() subparser;  // instance of callee
    state subroutine {
        subparser.apply(packet, h.ipv4);  // invoke sub-parser
        transition accept;  // accept if sub-parser ends in accept state
    }
}
~ End P4Example
The semantics of a sub-parser invocation can be described as follows:
- The state invoking the sub-parser is split into two half-states at the parser invocation statement.
- The top half includes a transition to the sub-parser start state.
- The sub-parser’s accept state is identified with the bottom half of the current state.
- The sub-parser’s reject state is identified with the reject state of the current parser.
~ Figure { #fig-subparser; caption: “Semantics of invoking a sub-parser: top: original program, bottom: equivalent program.” }
![subparser]
~
[subparser]: figs/subparser.png { width: 60%; page-align: here }
Figure [#fig-subparser] shows a diagram of this process.
Note that since P4 requires definitions to precede uses, it is impossible to create recursive (or mutually recursive) parsers.
When a parser is instantiated, local instantiations of stateful objects are evaluated recursively. That is, each instantiation of a parser has a unique set of local parser value sets, extern objects, inner parser instances, etc. Thus, in general, invoking a parser instance twice is not the same as invoking two copies of the same parser instance. Note however that local variables do not persist across invocations of the parser. This semantics also applies to direct invocation (see Section [#sec-direct-invocation]).
Architectures may impose (static or dynamic) constraints on the number
of parser states that can be traversed for processing each packet. For
example, a compiler for a specific target may reject parsers containing
loops that cannot be unrolled at compilation time or that may contain
cycles that do not advance the cursor. If a parser aborts execution
dynamically because it exceeded the time budget allocated for parsing,
the parser should transition to reject and set the standard error
error.ParserTimeout.
In some cases, the values that determine the transition from one parser state to another need to be determined at run time. MPLS is one example where the value of the MPLS label field is used to determine what headers follow the MPLS tag and this mapping may change dynamically at run time. To support this functionality, P4 supports the notion of a Parser Value Set. This is a named set of values with a run time API to add and remove values from the set.
Value sets are declared locally within a parser. They should be declared
before being referenced in a parser keysetExpression and can be used as
a label in a select expression.
The syntax for declaring value sets is:
~ Begin P4Grammar
[INCLUDE=grammar.mdk:valueSetDeclaration]
~ End P4Grammar
Parser Value Sets support a size argument to provide hints to the
compiler to reserve hardware resources to implement the value set. For
example, this parser value set:
~ Begin P4Example
value_set<bit<16>>(4) pvs;
~ End P4Example
creates a value_set of size 4 with entries of type bit<16>.
The semantics of the size argument is similar to the size property
of a table. If a value set has a size argument with value N, it is
recommended that a compiler choose a data plane implementation
that is capable of storing N value set entries. See “Size property of
P4 tables and parser value sets” [@P4SizeProperty]
for further discussion on the implementation of parser value set size.
The value set is populated by the control plane by methods specified in the P4Runtime specification[4].
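To illustrate how a value set changes parsing decisions at run time, here is a minimal Python sketch. Everything in it is hypothetical: the `ValueSet` class, its `add` and `remove` methods, and the state names are invented for the example and do not reflect the P4Runtime API. The size is stored only as a hint and is not enforced, since the text above describes it as a recommendation to the compiler.

```python
class ValueSet:
    """Hypothetical stand-in for a parser value set of integer values."""

    def __init__(self, size):
        self.size = size      # capacity hint only; not enforced here
        self.entries = set()

    def add(self, value):     # assumed control-plane operation
        self.entries.add(value)

    def remove(self, value):  # assumed control-plane operation
        self.entries.discard(value)

    def __contains__(self, key):
        return key in self.entries


def select_port(port, pvs):
    """Model of a select with the value set as its first label."""
    return "special_port" if port in pvs else "other_port"


pvs = ValueSet(4)
before = select_port(8080, pvs)  # value set is still empty
pvs.add(8080)
after = select_port(8080, pvs)   # same packet field, different outcome
```

The parser logic itself is unchanged between the two calls; only the contents of the value set differ.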