Show HN: Cachey, a Read-Through Cache for S3 (github.com)

1 point | by shikhar 4 hours ago

2 comments

  • shikhar 4 hours ago
    How we run it:

    Auto-scaled Kubernetes deployments, one per availability zone, currently on m*gd instances, which give us local NVMe. The pods can easily push GiBps while using only 1-2 CPUs; network is the bottleneck, so we made it a scaling dimension (thanks, KEDA).

    On the client side, each gateway process uses kube.rs to watch ready endpoints in its own zone, and frequently polls the /stats endpoint Cachey exposes for recent network throughput as a load signal.
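
    A minimal sketch of that watch in Rust, assuming the kube, k8s-openapi, futures, and anyhow crates; the Service name "cachey" is an assumption, and the /stats polling is only noted in a comment since its response shape isn't shown here:

      use futures::TryStreamExt;
      use k8s_openapi::api::discovery::v1::EndpointSlice;
      use kube::{api::Api, runtime::{watcher, WatchStreamExt}, Client};

      async fn watch_same_zone(my_zone: &str) -> anyhow::Result<()> {
          let client = Client::try_default().await?;
          let slices: Api<EndpointSlice> = Api::default_namespaced(client);
          // EndpointSlices belonging to a Service carry this well-known label.
          let cfg = watcher::Config::default().labels("kubernetes.io/service-name=cachey");
          watcher(slices, cfg)
              .applied_objects()
              .try_for_each(|slice| async move {
                  // Keep only ready endpoints in our own zone.
                  let ready: Vec<&str> = slice
                      .endpoints
                      .iter()
                      .filter(|ep| {
                          ep.conditions.as_ref().and_then(|c| c.ready) == Some(true)
                              && ep.zone.as_deref() == Some(my_zone)
                      })
                      .flat_map(|ep| ep.addresses.iter().map(String::as_str))
                      .collect();
                  // A real client would refresh its node set here, then poll
                  // each address's /stats for the throughput load signal.
                  println!("ready cachey endpoints in {my_zone}: {ready:?}");
                  Ok(())
              })
              .await?;
          Ok(())
      }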

    To improve hit rates through key affinity, clients pick a node with rendezvous hashing under bounded load (https://arxiv.org/abs/1608.01350): if a node exceeds a predetermined throughput limit, the key's next choice is picked instead (sketched below).
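
    A minimal sketch of that node selection in plain Rust; the Node shape, the GiBps load field, and falling back to the key's top choice when every node is over the limit are assumptions (DefaultHasher also stands in for whatever stable hash a real client would pin):

      use std::collections::hash_map::DefaultHasher;
      use std::hash::{Hash, Hasher};

      struct Node {
          addr: String,
          recent_gibps: f64, // load signal, e.g. polled from /stats
      }

      // Rendezvous score: hash the (key, node) pair so each key gets a
      // stable preference order over the nodes.
      fn score(key: &str, addr: &str) -> u64 {
          let mut h = DefaultHasher::new();
          (key, addr).hash(&mut h);
          h.finish()
      }

      fn pick_node<'a>(key: &str, nodes: &'a [Node], limit_gibps: f64) -> Option<&'a Node> {
          let mut ranked: Vec<&Node> = nodes.iter().collect();
          // Highest score first: the key's first choice.
          ranked.sort_by_key(|n| std::cmp::Reverse(score(key, &n.addr)));
          // Bounded load: skip overloaded nodes, falling through to the
          // key's next choice; if all are over the limit, take the top pick.
          ranked
              .iter()
              .find(|n| n.recent_gibps < limit_gibps)
              .copied()
              .or_else(|| ranked.first().copied())
      }

    For example, pick_node("my/object/key", &nodes, 10.0) walks the key's preference order and returns the first node reporting under 10 GiBps.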

    We may move towards consistent hashing – it would be a great problem to have if we needed so many Cachey pods in a zone that O(n) hashing became meaningful overhead! An advantage of the current approach is that it does not suffer from the cascaded overflow problem (https://arxiv.org/abs/1908.08762).

  • whyandgrowth 4 hours ago
    To be honest: as a local cache / S3 accelerator for large files, it's fine. The API is simple but flexible. The only caveat is that the documentation is English-only, and you need to understand how hedged fetches work.