Scaling IP Lookup to Large Databases using the CRAM Lens Robert Chang, Pradeep Dogga, Andy Fingerhut, Victor Rios, and George Varghese NSDI'25
IP Lookup
This paper introduces a new computational model (CRAM) and then applies that model to the problem of IP lookup. IP lookup is simply a mapping of an IP address to arbitrary data (e.g., the address of the next hop that a packet should be routed to). The mapping is described by (relatively static) routing tables.
The tricky part is that the keys in a routing table can contain wildcards. Table 1 contains a simple example:
Content Addressable Memory
Here is a refresher if you’ve forgotten the difference between SRAM, CAM, and TCAM. All three are types of on-chip memories.
SRAM behaves like an array: it maps integer keys to values. SRAM is dense: if the key width is 10 bits, then the SRAM contains 1024 entries.
CAM behaves like a hash table. It maps integer keys to values, but it contains sparse storage. For example, a CAM could have an input key width of 10 bits but only have storage for 64 entries.
A TCAM is like a CAM but supports wildcards in the key bits. For example, a CAM could have 10-bit input keys and contain a key→value mapping like (1010****11 → 35). The input key 1010000011 would produce a value of 35, and an input key 1010110111 would also.
RMT and dRMT
The CRAM model introduced by this paper is a simplified computational model for hardware which works well for this type of application. The elevator pitch for this model is that it is simple, and yet accurately predicts performance for two widely used network processing hardware architectures: RMT and dRMT.
The idea is that a hardware architect, or network engineer could use the CRAM model to do performance analysis before diving into the weeds.
Fig. 11 illustrates the RMT and dRMT hardware architectures:
In the context of IP lookup, the four hardware blocks illustrated in Fig. 11 perform the following tasks:
Match compares packet addresses against addresses in the routing table (key comparison)
Action forwards packets based on the data read from the routing table
CAM represents CAM or TCAM memories
RAM represents SRAM
RMT is a more rigid pipeline, whereas dRMT is more similar to a multi-core processor where each core has access to a shared pool of CAM and RAM.
CRAM Model
The core of the CRAM model is a DAG. Each vertex represents a step of the computation. Data flows between steps in a fixed-size array of fixed-sized registers.
A step comprises an optional table lookup, followed by a sequence of statements.
The lookup table can be SRAM, CAM or TCAM. Table lookup keys come from registers, and table lookup results are stored in registers. The lookup table itself is coupled with the step. There is no way for step N
to access the lookup table associated with step M
. Note that in the strict CRAM model, tables cannot be stored off-chip (e.g., DRAM).
Each statement looks more or less like a typical RISC instruction (e.g., R1 = R2 ^ R3
). All statements within a step run in parallel (after the table lookup).
Design Space Exploration
Section 2.2 lists eight idioms which can be used to explore the design space of CRAM graphs (I think of them as CRAM programs). Examples include:
Converting a TCAM to an SRAM by duplicating entries, or visa-versa
Using a large SRAM to handle common (and simple) cases, and a small TCAM to handle uncommon (and complex) cases
Breaking a single large lookup table into multiple small lookup tables, with each small lookup table handling a subset of the key bits
Results
There are two types of results in this paper:
New algorithms discovered by the authors using design-space exploration with the CRAM model
The predictive accuracy of the CRAM model vs the real world
RESAIL
The paper describes three new algorithms, RESAIL is one of them. RESAIL is a incremental improvement to SAIL (a state-of-the-art IPv4 lookup algorithm which requires DRAM). RESAIL is based on the fact that most IPv4 addresses (keys) in real-world routing tables have a suffix of wildcards that is at least 8 bits wide (e.g., 54.135.54.*). The prefix length of an address in the routing table is number of bits before the wildcard suffix.
RESAIL uses a small (~3KiB) TCAM to handle routing table entries with a prefix length greater than 24. The set of routing table entries with long prefix lengths is a good fit for TCAM because it is sparse.
For narrower prefix lengths, RESAIL uses two data structures in SRAM. The first is a set of 24 bitmaps. Bitmap i
has length 2^i
. i
prefix bits from an IPv4 address are used to check if that prefix is present in the routing table. All bitmaps can be checked in parallel. If there are multiple matches, then the longest prefix is chosen.
These bitmaps are also sparse (a whole lot of zeros), but it is still better to use SRAM vs TCAM because each entry is only a single bit wide.
If the bitmap lookup results in a hit, then a hash table (also stored in SRAM) is used to lookup the value associated with the IPv4 address. The prefix length from the bitmap lookup is used when constructing the key used for the hash table lookup. This is how wildcard support is added to a traditional hash table. The paper mentions that a hash table with low probability of collisions is used, but does not go into more detail about how collisions are handled.
Appendix A.5 contains pseudocode for RESAIL:
Predictive Accuracy
Table 10 compares TCAM and SRAM usage predicted by the CRAM model vs an ideal and actual RMT architecture. I suppose the unspoken assumption on all of this is that networking is all about on-chip memory, logic isn’t significant.
Dangling Pointers
It is a bit odd that CAM and TCAM are so important for networking hardware, but not other chips. Maps/Dictionaries are extremely common in general software, so why don’t general purpose processors have dedicated support? There is probably a lot of work to design an architecture which allows for virtualization and composition of libraries.
In some sense, a CPU cache acts a bit like a CAM. I wonder if the same hardware could be reused to support an architecture which had explicit CAM support.