Delta Lake

Hilbert Curves

When you want to cluster data together over multiple dimensions, you can use Z-Order. An alternative that generally preserves locality better is the Hilbert curve, a space-filling fractal that maps multi-dimensional points onto a 1-dimensional line while making a best attempt to keep points that are neighbours on the line close together in the original space. The Databricks Liquid Clustering design doc gives a graphical representation of what it looks like (dotted-line squares represent files). A Hilbert curve has the property that consecutive points along the curve (the red line in that figure) are always a distance of 1 apart.
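To make the distance-of-1 property concrete, here is a minimal sketch in Python. It uses the common iterative index-to-coordinate conversion for a 2-D Hilbert curve on an n x n grid (n a power of two) and a plain bit de-interleave for Z-Order. The function names `hilbert_d2xy`, `z_d2xy` and `max_step` are made up for this illustration, and nothing here is taken from the Delta Lake or Databricks implementation.

```python
def hilbert_d2xy(n, d):
    """Map index d on the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def z_d2xy(d):
    """Map index d on the Z-Order curve to (x, y) by de-interleaving its bits."""
    x = y = 0
    bit = 0
    while d:
        x |= (d & 1) << bit         # even bits of d belong to x
        y |= ((d >> 1) & 1) << bit  # odd bits of d belong to y
        d >>= 2
        bit += 1
    return x, y


def max_step(points):
    """Largest Manhattan distance between consecutive points on a curve."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


n = 8
hilbert = [hilbert_d2xy(n, d) for d in range(n * n)]
z_order = [z_d2xy(d) for d in range(n * n)]

print("Hilbert max step:", max_step(hilbert))
print("Z-Order max step:", max_step(z_order))
```

On the Hilbert curve every step moves to a directly neighbouring cell, whereas Z-Order occasionally jumps across the grid when it moves from one quadrant to the next. Those jumps are what can pull unrelated points into the same file when the 1-dimensional ordering is cut into files.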