Skip to main content

Architecture

Tech Stack

ComponentTechnologyPurpose
FrameworkFastify 5 (slim-bp-v2)REST API
PersistenceMongoDB / MongooseCanvases, widgets, conversations
Query EngineDuckDB.duckdb file queries (energy audit data)
AI (Primary)Configurable via DEFAULT_LLMSQL + chart config generation
AI (Fallback)Configurable via FALLBACK_LLMGuards against transient API errors
AuthJWT (Ed25519) + CerbosSession tokens + role-based policies

AI Strategy

Model Selection

  • Primary: Configurable via DEFAULT_LLM env var — proven 97% success rate (31/32 queries) on energy audit SQL generation
  • Fallback: Configurable via FALLBACK_LLM env var — triggered if primary returns no tool use or a transient API error (2-attempt flow: primary → fallback)
  • Store which model generated each widget's SQL (useful for quality debugging)

No Streaming

  • Simple JSON request/response — no SSE/WebSocket streaming
  • AI responses are short (sentence + chips, or SQL + config) — ~50-200 tokens
  • Streaming complexity not justified for this use case
  • Can be added later if latency becomes an issue

Structured Output

AI responses follow a structured format via tool use:

// Clarification response
{
"type": "clarification",
"message": "What month range should this cover?"
}

// Chart generation response
{
"type": "chart",
"title": "TC Trends — Line Chart",
"conversationName": "TC trends over time",
"chartType": "line",
"sql": "SELECT ...",
"chartConfig": { "xAxis": "month", "yAxis": ["audited", "total", "unaudited"] }
}

// Table generation response
{
"type": "table",
"title": "Section-wise Loss Breakdown",
"conversationName": "Loss percentage breakdown",
"sql": "SELECT ...",
"columns": [{ "key": "section", "label": "Section" }, ...]
}

Data Limits

  • Charts: Max 10 data points per axis
    • Prompt engineering: AI is instructed to write SQL that returns max 10 data points
    • Backend safety net: After DuckDB execution, if rows > 10 for a chart, do not create the widget. Return a clarification to the user explaining the data has too many points for a chart, suggest narrowing the request or switching to a table. Conversation stays active so the user can adjust
  • Tables: Server-side pagination (10 rows per page via LIMIT/OFFSET in DuckDB). No AI-side limit needed

DuckDB Error Handling

  • If AI-generated SQL fails DuckDB execution (syntax error, invalid column, etc.), do 1 retry — send the DuckDB error message back to AI so it can fix the SQL
  • If the retry also fails, return an error to the user

Role-Based Scoping

  • AI determines data scope based on user type/hierarchy
  • WHERE clause injection for role-based filtering
  • AEs: scoped to their section
  • CIO and above: can span multiple sections, divisions, circles

DuckDB Integration

Architecture

  • DuckDB runs in-process (Node.js native binding)
  • Pre-built .duckdb file opened in read-only mode (tables/views already exist)
  • Read-only queries only (no writes)
  • Connection pool managed per Cloud Run instance

Performance Benchmarks

Tested against ~43M rows (Sep-Dec 2025 data, months 2509-2512)

ComplexityQueriesAvg ColdAvg WarmMax Cold
Simple12~15ms~13ms27ms
Moderate10~70ms~62ms180ms
Complex10~160ms~150ms637ms
  • 31/32 queries succeeded (1 failed due to VARCHAR→DOUBLE cast edge case)
  • SQL generated by Claude Haiku 4.5 (benchmark model — app now uses configurable DEFAULT_LLM)

Benchmark Gaps (Follow-up)

  • Concurrent query load (multi-user simulation)
  • Memory/RAM consumption per query (Cloud Run sizing)
  • Large result sets (10K+ rows for tables)
  • Data scaling (12+ months of data)
  • Role-scoped query overhead (additional WHERE/JOIN)

Latency Considerations

HopLatency
Mumbai → Claude API (US)~200-300ms network RTT
LLM response (DEFAULT_LLM)~1-3s (50-200 tokens)
DuckDB query execution~15-200ms (typical)
Total per AI turn~1.5-3.5s
  • For a 2-3 turn conversation: ~5-10s total before widget appears
  • Network latency is a small fraction of total wait
  • If problematic: consider Bedrock in ap-south-1 (Mumbai) or ap-northeast-1 (Tokyo)

Data Flow

Table Pagination

  • Tables re-execute the stored SQL with LIMIT / OFFSET appended
  • Backend wraps stored SQL: SELECT * FROM ({stored_sql}) LIMIT {limit} OFFSET {offset}
  • Also returns total count for frontend pagination UI