2 comments

  • phanisaimuni116 2 hours ago
    Hey HN , I built glide after getting tired of Claude Opus stalling mid-task under peak load.

    It's a transparent proxy that tracks rolling p95 TTFT per model and cascades to a faster model before you ever hit a timeout. Three routing strategies: solo (primary healthy), hedge (race two models, stream the winner), skip (both slow → sequential cascade).

    Stack: Python, FastAPI, asyncio streaming, SQLite for persistence, Prometheus metrics endpoint. Provider-agnostic — mix Anthropic, OpenAI, Gemini, Ollama in one cascade.

    Happy to answer questions on the implementation, especially the mid-stream SSE abort for TTT (time-to-think) enforcement.

  • phanisaimuni116 2 hours ago
    [dead]