P4 syntax for a static LPM entry in a table

This is a question about the P4 language, the language for programming the data plane in networks.
Imagine I have the following simple header:
header ipv4_header_t {
    bit<8>  ttl;
    bit<32> dst_addr;
}

struct headers_t {
    ipv4_header_t ipv4_header;
}
And imagine I have a simple table that does a longest prefix match (LPM) lookup on the destination address.
table ipv4_fib {
    key = {
        headers.ipv4_header.dst_addr: lpm;
    }
    actions = {
        act_miss;
        act_hit;
    }
    const default_action = act_miss();
}
What is the P4 syntax for adding some static LPM entries to the table?
entries = {
    ????: act_hit(); // Want entry for 0.0.0.0/0
    ????: act_hit(); // Want entry for 10.0.0.0/8
    ????: act_hit(); // Want entry for 10.1.2.3/32
}

The following answer is courtesy of Vladimir Gurevich, who answered the question in a conversation on the P4 Slack channel (see discussion https://p4-lang.slack.com/archives/C8ZR5EN3F/p1587830767153100)
To add static entries to a longest prefix match (lpm) table, you must use the same syntax as for a ternary entry, using the &&& operator to provide a value and a mask:
table ipv4_fib {
    key = {
        headers.ipv4_header.dst_addr: lpm;
    }
    actions = {
        act_miss;
        act_hit;
    }
    const default_action = act_miss();
    const entries = {
        32w0x0a010203 &&& 32w0xffffffff: act_hit(1); // 10.1.2.3/32 -> 1
        32w0x0a010200 &&& 32w0xffffff00: act_hit(2); // 10.1.2.0/24 -> 2
        32w0x0a010000 &&& 32w0xffff0000: act_hit(3); // 10.1.0.0/16 -> 3
        32w0x0a000000 &&& 32w0xff000000: act_hit(4); // 10.0.0.0/8  -> 4
    }
}
A longest prefix match can be considered to be a special case of a ternary match, where the mask consists of a contiguous series of ones (1s) followed by a contiguous series of zeroes (0s).
On some (but not all) platforms a longest prefix match is actually implemented as a ternary match "under the hood".
Note 1: when you provide static entries in a table, the entries MUST be const, and hence it is no longer possible for the software to add or remove dynamic entries to or from the table. Since lpm tables are most typically used for dynamic forwarding tables, it is quite rare to see an lpm table with static entries.
Note 2: I have been told that some platforms use the order of the entries in an lpm table as the priority order for matching the entries. Thus, it is important to put the more specific entries (e.g. 10.1.0.0/16) before the less specific aggregate entries (e.g. 10.0.0.0/8). Technically, this could be considered a "bug", because in an lpm table the longest prefix match must always be preferred. This behavior is due to the fact that on some platforms an lpm table is actually implemented under the hood as a ternary table. I have also been told that the open source v1model does match the longest prefix (most specific) key, regardless of the order of the entries in the table (i.e. it does not have the "bug").
Note 3: if you try to add a default entry with an all-zeroes mask (example below) you will get an error (also given below). Use the default_action instead. There is a subtle difference, though, between a default entry in the table and a default action: in the former case table.apply() will indicate a hit, and in the latter case it will indicate a miss. An alternative approach is to use the key expression _. The fact that a default entry causes an error might be considered a bug, and if so, that bug could be fixed in a later release of the P4 compiler:
    const entries = {
        32w0x0a010203 &&& 32w0xffffffff: act_hit(1); // 10.1.2.3/32 -> 1
        32w0x0a000000 &&& 32w0xff000000: act_hit(2); // 10.0.0.0/8  -> 2
        32w0x00000000 &&& 32w0x00000000: act_hit(3); // 0.0.0.0/0   -> 3
    }
}
The error is:
$ p4c complex.p4
./complex.p4i(838): [--Werror=invalid] error: &&&: Invalid mask for LPM key
32w0x00000000 &&& 32w0x00000000: act_hit(3); // 0.0.0.0/0 -> 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
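For illustration, here is a sketch of the key _ alternative mentioned in Note 3 (untested; whether a particular compiler release accepts _ for an lpm key entry may vary):
    const entries = {
        32w0x0a010203 &&& 32w0xffffffff: act_hit(1); // 10.1.2.3/32 -> 1
        32w0x0a000000 &&& 32w0xff000000: act_hit(2); // 10.0.0.0/8  -> 2
        _                              : act_hit(3); // 0.0.0.0/0   -> 3, matches anything
    }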

Related

F#: To design or not design with mutually dependent record types

I try to model trees with their nodes using F# records.
Here is an abstraction of what my design looks like:
type Tree = { Root: Node }
and Node = { Tree: Tree }
(There are other record fields, of course.)
The rationale is that given a tree I want to quickly access its root node; furthermore, I want to know the associated tree for any given node (including a root node).
Initializing a tree with its root works:
let rec newTree = { Root = newRoot }
and newRoot = { Tree = newTree }
Now, I read in this Stack Overflow post that this only works because of some accidental internal visibility on the backing fields, which also means that any function initializing such tree/root records must reside in the same assembly as the type definitions (not too great, but I can live with that).
This post describes using options for the mutually dependent fields, but I really want to model that each tree has a root node (no empty trees in my system) and each node has a tree, without having to test for Some/None.
Is my design approach sound? Or should I rather model the intrinsic bound between trees and their nodes in another way?
There is no need to declare a type for the root: from a type perspective, a tree is a node. The canonical way to define a tree in F# is like so:
type Node =
    | N of int * Node list // (inner) node
    | L of int             // leaf

let tree = N(3, [L(5); L(7)])
It's your choice whether you define a separate case for the leaf or simply use
type Node = N of int * Node list
Here int is the node data type; you can customize this or even make it generic, as sketched below.
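For example, a generic version (a minimal sketch) might look like this:
type Node<'T> =
    | N of 'T * Node<'T> list

let t = N("root", [N("left", []); N("right", [])])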
I often use mutable children collections; then I use records like
type Node = { data: int; mutable children: Node list }
let root = { data=3; children=[] }
root.children <- [{ data=7; children=[] }]
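Traversal then works by ordinary recursion. A minimal sketch continuing from the snippet above (sumTree is my own helper name):
// Sum the data fields of a node and all of its descendants.
let rec sumTree (n: Node) =
    n.data + List.sumBy sumTree n.children

printfn "%d" (sumTree root) // 3 + 7 = 10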

Stata: perform a foreach loop to calculate kappa across a large data file

I have a data file in Stata with 50 variables:
j-r-hp j-p-hp j-m-hp p-c-hp p-r-hp p-p-hp p-m-hp ... etc.
I want to perform a weighted kappa between pairs, so that the first might be
kap j-r-hp j-p-hp, wgt(w2)
and the next would be
kap j-r-hp j-m-hp, wgt(w2)
I am new to Stata. Is there a straightforward way to use a loop for this, like a foreach loop?
Your variable names are not legal names in Stata, so I've changed the hyphens to underscores in the example below. Also, I don't know what it means to 'perform a weighted kappa', so my answer uses random normal variables and the corr[elate] command. You can use the results that Stata leaves behind in r() (see return list) to gather the results for the separate analyses.
The idea is to gather the variables in a list using a local, then to loop over each element in that list (but skipping the repeated pairs using continue). If you have many variables with structured names, you could instead use ds, which leaves the variable list behind in r(varlist). Have a look at the help file for macros (help macro and help extended_fcn), especially the section on 'Macro extended functions for parsing'. Hope this helps.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
    gen `var' = rnormal()
}
forval ii = 1/`: word count `vars'' {
    forval jj = 1/`: word count `vars'' {
        if `ii' <= `jj' continue // skip self-pairs as well as repeated pairs
        corr `: word `ii' of `vars'' `: word `jj' of `vars''
    }
}
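If you want to collect the results rather than just display them, you can harvest r(rho) after each call. A minimal sketch that fills a results matrix (the matrix name R is my own choice):
local n : word count `vars'
matrix R = J(`n', `n', .)
forval ii = 1/`n' {
    forval jj = 1/`n' {
        if `ii' <= `jj' continue
        quietly corr `: word `ii' of `vars'' `: word `jj' of `vars''
        matrix R[`ii',`jj'] = r(rho)
    }
}
matrix list R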
You can take advantage of the user-written command tuples (run ssc install tuples):
clear
set more off

*----- example data -----
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
    gen `var' = abs(round(rnormal()*100))
}

*----- what you want -----
tuples `vars', min(2) max(2)
forvalues i = 1/`ntuples' {
    display _newline(3) "variables `tuple`i''"
    kappa `tuple`i''
}
How you get the variable names together to feed them into tuples will depend on the dataset; see the sketch below.
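For instance, if the relevant variables share a naming pattern, ds can build the list for you (a minimal sketch, assuming names ending in _hp as in the question):
ds *_hp                 // ds leaves the matching names in r(varlist)
local vars `r(varlist)'
tuples `vars', min(2) max(2)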
This is a variation on the helpful answer by @Matthijs, but it really won't fit well into a comment. The main extra twists are
The use of tokenize to avoid repeated use of word # of. After tokenize, the separate words of the argument (here the separate variable names) are held in the numbered local macros 1, 2, 3, and so on. Thus tokenize a b c puts a in local macro 1, b in local macro 2, and c in local macro 3. Nested macro references are evaluated exactly like parenthesised expressions in elementary algebra: what is on the inside is evaluated first.
Focusing directly on part of the notional matrix of results on one side of the diagonal. The small trick is to ensure that one matrix subscript exceeds the other subscript.
Random normal input doesn't make sense for kap, but you will be using your own data any way.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
    gen `var' = rnormal()
}
tokenize `vars'
local p : word count `vars'
local pm1 = `p' - 1
forval i = 1/`pm1' {
    local ip1 = `i' + 1
    forval j = `ip1'/`p' {
        di "``i'' and ``j''"
        kap ``i'' ``j''
        di
    }
}
I thought I might add my own answer to highlight a few things.
The first thing to note is that for a new user, the most "straightforward" way to do it would likely involve hard-coding all the variables into a local to use in a loop (as the other answers suggest), or referencing them with a wildcard and writing a separate loop for each group. See the example below for how you might use a wildcard:
clear *
sysuse auto

/* Rename variables to match your .dta file and identify groups */
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)

/* Loop over all variables beginning with j and ending hp */
foreach x of varlist j*hp {
    foreach i of varlist j*hp {
        if "`i'" > "`x'" { // this ensures you get only unique pairs of x & i
            kap `x' `i'
        }
    }
}

/* Loop over all variables beginning with p and ending hp */
foreach x of varlist p*hp {
    * something involving x
}
* etc.
Now, depending on how many groups you have or how many variables you have, this might not seem straightforward after all.
This brings up the second thing I would like to mention. In cases where hard-coding many variables or many repeated commands becomes cumbersome, I tend to favor a programmatic solution. This will often involve writing more code up front, but in many cases tends to be at least quasi-generalizable, and will allow you to easily evaluate hundreds of variables if you ever have the need without having to write them all out.
The code below uses the returned results from describe, along with some foreach loops and some extended macro functions to execute the kappa command over your variables without having to store them in a local manually.
clear *
sysuse auto
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)

/*
use gear_ratio as an arbitrary weight; order it first to easily extract
it from the local containing the varlist
*/
order gear_ratio, first
qui describe, varlist
local Varlist `r(varlist)' // store varlist in a local macro
preserve // preserve data so changes can be reverted
foreach x of local Varlist {
    capture confirm numeric variable `x'
    if _rc {
        drop `x' // keep only numeric variables to use in kappa
    }
}
qui describe, varlist // refresh the local macro with the now numeric-only varlist
local Varlist `r(varlist)'
gettoken weight vars : Varlist // split off gear_ratio (ordered first) as the weight; the rest is the analysis varlist
foreach x of local vars {
    foreach i of local vars {
        if "`i'" > "`x'" { // unique pairs only
            gettoken leftx : x, parse("_")
            gettoken lefti : i, parse("_")
            if "`leftx'" == "`lefti'" { // same group prefix
                kap `x' `i'
            }
        }
    }
}
restore
There of course will be a learning curve here for new users, but I've found the use of macros, loops, and returned results to be wonderfully effective in adding flexibility to my programs and do-files. I would highly suggest that anybody using Stata at least study the basics of these three topics.

string comparison against factors in Stata

Suppose I have a factor variable with labels "a", "b", and "c" and want to see which observations have a label of "b". Stata refuses to parse
gen isb = myfactor == "b"
Sure, there is literally a "type mismatch", since my factor is encoded as an integer and so cannot be compared to the string "b". However, it wouldn't kill Stata to (i) perform the obvious parse or (ii) provide a translator function so I can write the comparison as label(myfactor) == "b". Using decode to (re)create a string variable defeats the purpose of encoding, which is to save space and make computations more efficient, right?
I hadn't really expected the comparison above to work, but I at least figured there would be a one- or two-line approach. Here is what I have found so far. There is a nice macro ("extended") function that maps the other way (from an integer to a label, seen below as local labi: label ...). Here's the solution using it:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)

// first, how many groups are there?
by myfactor, sort: gen ng = _n == 1
replace ng = sum(ng)
scalar ng = ng[_N]
drop ng

// now, which code corresponds to "b"?
forvalues i = 1/`=ng' {
    local labi : label myfactor `i'
    if "b" == "`labi'" {
        scalar bcode = `i'
        continue, break // plain 'break' is not a Stata command
    }
}
di bcode
The second step is what irks me, but I'm sure there's also a faster, more idiomatic way of performing the first step. Can I grab the length of the label vector, for example?
An example:
clear all
set more off
sysuse auto
gen isdom = 1 if foreign == "Domestic":`:value label foreign'
list foreign isdom in 1/60
This creates a variable called isdom and it will equal 1 if foreign's value label is equal to "Domestic". It uses an extended macro function.
From [U] 18.3.8 Macro expressions:
Also, typing
command that makes reference to `:extended macro function'
is equivalent to
local macroname : extended macro function
command that makes reference to `macroname'
This explains one of the two : in the offered syntax. The other can be explained by
... to specify value labels directly in an expression, rather than through the underlying numeric value ... You specify the label in double quotes (""), followed by a colon (:), followed by the name of the value label.
The quote is from Stata tip 14: Using value labels in expressions, by Kenneth Higbee, The Stata Journal (2004). Freely available at http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Edit
On computing the number of distinct observations, another way is:
by myfactor, sort: gen ng = _n == 1
count if ng
scalar sc_ng = r(N)
display sc_ng
But yours is fine. In fact, it is documented here: http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/, along with more methods and comments.
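Another way to attack both steps at once is the built-in levelsof, which collects the distinct underlying codes in a local (a minimal sketch):
levelsof myfactor, local(levels)
local ng : word count `levels'  // number of distinct levels
foreach l of local levels {
    if "`: label myfactor `l''" == "b" {
        scalar bcode = `l'
    }
}
di bcode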

How to declare an immutable graph with circular references?

I want to declare a graph of all states where the edges represent contiguous states. I think what I am trying to do might be called "tying the knot" (not sure about that though). It's not working like I expected, and I have a couple of questions.
First, I want a State type that has a string name and a list of contiguous states. But this declaration gives compiler error "...immediate cyclic reference...":
type State = string * (State list)
This way works:
type State(name: string, contigs: State list) =
    let name = name
    let contigs = contigs
But it's really not a requirement to name the members. A tuple is fine. How can I make that terse syntax work?
Second, the following code attempts to declare what should be three graphs of contiguous states (HI and AK are graphs consisting of a single node, all the remaining states constitute the last graph), followed by a list of all nodes. (For brevity I've only actually declared a handful of states here):
let rec hi = State("hi", [])
and mo = State("mo", [il ia])
and il = State("il", [mo])
and ia = State("ia", [mo])
and states = [hi,mo,il,ia]
This gives a variety of errors though, including "mo will eventually be evaluated as part of its own definition" and "expression was expected to have type 'a->'b but here has type State". I thought the 'rec' and 'and' keywords would allow this to work. Can I define this self-referencing graph? If so, how?
The problems are your data structure and invalid list element delimiters (they should be semicolons). This works: (see edit)
type State =
    | State of string * State list

let rec hi = State("hi", [])
and mo = State("mo", [il; ia])
and il = State("il", [mo])
and ia = State("ia", [mo])
let states = [hi; mo; il; ia]
Recursive references will be materialized as thunks (lazy). So you could, with a bit more typing, do the same thing yourself with mutable lazy values. Just FYI: what you have is idiomatic.
EDIT
IntelliSense didn't have a problem with it, but the compiler says
Recursive values cannot appear directly as a construction of the type 'List`1' within a recursive binding. This feature has been removed from the F# language. Consider using a record instead.
You can fix this by using seq instead of list.
type State =
    | State of string * State seq

let rec hi = State("hi", [])
and mo = State("mo", seq { yield il; yield ia })
and il = State("il", seq { yield mo })
and ia = State("ia", seq { yield mo })
let states = [hi; mo; il; ia]
Although what Daniel says is correct, I would contest the assertion that it is "idiomatic", because that does not produce a very useful data structure for representing graphs in the general case. Specifically, it only permits the addition of new vertices and edges from them, but not adding or removing edges between existing vertices. In particular, this basically means your graph must be statically defined as a constant in your source code, so you cannot easily load such a graph from disk.
The idiomatic purely functional representation of a graph is to replace dereferences with dictionary lookups. For example, represent the graph as a Map from vertices to Sets of vertices to which there are edges:
> let g =
    Map["hi", set[]; "mo", set["il"; "ia"]; "il", set["mo"]; "ia", set["mo"]];;
val g : Map<string,Set<string>> =
  map
    [("hi", set []); ("ia", set ["mo"]); ("il", set ["mo"]);
     ("mo", set ["ia"; "il"])]
For example, you can lookup the vertices directly reachable via edges from mo like this:
> g.["mo"];;
val it : Set<string> = set ["ia"; "il"]
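Adding an edge between existing vertices is then an ordinary map update. A minimal sketch (addEdge is my own helper, not a library function):
// Add a directed edge v1 -> v2, creating the vertex entry if needed.
let addEdge v1 v2 (g: Map<string, Set<string>>) =
    let edges = defaultArg (g.TryFind v1) Set.empty
    g.Add(v1, edges.Add v2) // Map.Add replaces any existing binding

let g2 = g |> addEdge "hi" "mo"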
This is easier to debug than the mutable representation but it has significant disadvantages:
Lookup in a purely functional dictionary like Map is at least 200× slower than dereferencing a pointer for traversing graphs (according to a quick test here).
The garbage collector no longer reclaims unreachable subgraphs for you. The imperative solution is to use a weak dictionary but there are no known purely functional weak dictionaries.
So this is only feasible if performance and leaks will not be a problem. This is most commonly the case when your graphs are small or static.

Linked-list representation of disjoint sets - omission in Intro to Algorithms text?

Having had success with my last CLRS question, here's another:
In Introduction to Algorithms, Second Edition, pp. 501-502, a linked-list representation of disjoint sets is described, wherein for each list member the following three fields are maintained:
set member
pointer to next object
pointer back to first object (the set representative).
Although linked lists could be implemented by using only a single "Link" object type, the textbook shows an auxiliary "Linked List" object that contains a pointer to the "head" link and the "tail" link. Having a pointer to the "tail" facilitates the Union(x, y) operation, so that one need not traverse all of the links in a larger set x in order to start appending the links of the smaller set y to it.
However, to obtain a reference to the tail link, it would seem that each link object needs to maintain a fourth field: a reference to the Linked List auxiliary object itself. In that case, why not drop the Linked List object entirely and use that fourth field to point directly to the tail?
Would you consider this an omission in the text?
I just opened the text and the textbook description seems fine to me.
From what I understand the data-structure is something like:
struct Set {
    LinkedListObject *head;
    LinkedListObject *tail;
};

struct LinkedListObject {
    Value set_member;
    Set *representative;
    LinkedListObject *next;
};
The textbook does not talk of any "auxiliary" linked-list structure in the book I have (second edition). Can you post the relevant paragraph?
Doing a Union would be something like:
// No error checks.
Set *Union(Set *x, Set *y) {
    x->tail->next = y->head;
    x->tail = y->tail;
    LinkedListObject *tmp = y->head;
    while (tmp) {
        tmp->representative = x;
        tmp = tmp->next;
    }
    return x;
}
why not drop the Linked List object entirely and use that fourth field to point directly to the tail?
An insight can be taken from path compression. There, every element is supposed to point to the head of the list; if one does not, the find-set operation repairs it (by updating p[x] and returning the result). You are proposing the same for the tail: a per-element tail pointer would need the same kind of lazily-maintained fix-up, so it is only usable if such a function is implemented.
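For comparison, this is roughly what a lazily-maintained pointer looks like in the forest representation with path compression (a sketch with hypothetical field names, not the linked-list scheme from the text):
struct Element {
    struct Element *parent; /* p[x] in CLRS notation; a root points to itself */
};

/* Find the representative, re-pointing every visited element at the root. */
struct Element *find_set(struct Element *x) {
    if (x->parent != x)
        x->parent = find_set(x->parent);
    return x->parent;
}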
