Discussion about this post

User's avatar
Yannis Smaragdakis's avatar

As a heavy Datalog user, here's my opinion of the paper.

Generally, my objection to this work would be that almost all "Datalog" programs they evaluate (7 programs in total) are either just SQL (i.e., they only have joins, no recursion) or they are transitive closure (i.e., they only have linear recursion). This is not Datalog. This is *provably* (one of the few things we can prove in complexity theory) a lower complexity class than Datalog. So, there is very strong reason to believe that these results do *not* translate to full Datalog. Transitive closure can be sped up arbitrarily much, compared to more complex recursion.

The authors only evaluate *one* program (out of the 7) that has complex recursion. That one is "context-sensitive program analysis" (CSPA). However:

a) the souffle version is not optimized for complex recursion, I strongly suspect it can be sped up by 2 orders of magnitude, so I really doubt the GPU speedup for this one

b) despite the name, the analysis is not "context-sensitive", but let's just say this is an oversight

c) it's still just a 10-line program, hardly representative of full Datalog.

Methodologically, I cannot reproduce anything from this paper, at least not without talking to the authors directly. The artifact link in the paper (https://file.io/YZE8MKx12iqX) is broken already. In the repo, most of the large inputs are replaced with git lfs links and the github repo doesn't seem to serve git lfs, e.g., https://github.com/harp-lab/gdlog/blob/main/data/cspa/httpd/assign.facts

Expand full comment

No posts