ajan-sql: Schema-Aware Read-Only SQL Guard for AI Access

ajan-sql gives LLMs schema-aware, read-only SQL access with guarded execution, structured machine-readable outputs, and safer database inspection for analytics.

ajan-sql emerged to address a common tension in AI-driven data tooling: models can generate useful SQL, but handing them direct, unrestricted access to a live database is risky. Built as a lightweight MCP-compatible server that communicates over stdio and connects using a DATABASE_URL, ajan-sql exposes a controlled surface of schema information and query tools so language models can inspect and analyze data without altering it or consuming excessive resources.

Below I walk through what ajan-sql provides, how it enforces safety, why schema-aware tooling matters for LLMs, how developers can integrate it into workflows, and what its arrival means for teams building AI‑assisted analytics, internal tools, and developer utilities.

What ajan-sql exposes to clients

At its core, ajan-sql is not a full database proxy; it is a read-only gatekeeper that deliberately limits what an AI agent can see and do. The server presents a small set of tools designed for exploration and analysis rather than modification: list_tables, describe_table, list_relationships, run_readonly_query, explain_query, and sample_rows. In addition to these callable tools, ajan-sql publishes schema resources such as schema://snapshot and schema://table/{name}, giving clients a stable snapshot of the database schema for reasoning.

Those primitives are intentionally focused: list_tables surfaces visible tables and helpful metadata (schema name, table comment, and estimated row counts); describe_table returns column definitions, types, nullability, defaults, and index/constraint information; and list_relationships enumerates foreign key relationships. When a model needs to run SQL, run_readonly_query executes guarded SELECT statements and returns normalized SQL, timing, column metadata, and the result rows. explain_query leverages EXPLAIN (FORMAT JSON) to surface planner timing and a lightweight summary of the root plan node. sample_rows provides a small representative set of rows from a table, optionally limited to selected columns.
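As a rough illustration of how a client might invoke one of these tools, the sketch below builds an MCP tools/call request as a JSON-RPC 2.0 message. The tool name comes from the list above; the argument key "sql" is an assumption, since the tools' input schemas are not described here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one line of JSON for a stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "sql" is a hypothetical argument name; check the tool's published input schema.
request_line = build_tool_call("run_readonly_query", {"sql": "SELECT id FROM orders"})
print(request_line)
```

A real client would complete the MCP initialize handshake before sending this message, and most integrators would use an MCP client library rather than hand-writing requests.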

This combination—schema discovery plus safe, constrained query execution—lets an LLM reason about the shape of the data and produce informed queries while keeping control over the actual execution.

How the safety model constrains risk

The most important design decision in ajan-sql is refusing to equate convenience with permission. The safety model enforces read-only behavior and a set of hard query restrictions so that models can inspect but not mutate or degrade a database.

By default ajan-sql enforces:

SELECT-only execution: INSERT/UPDATE/DELETE/DROP/ALTER/TRUNCATE and other mutating statements are rejected.
No multi-statement SQL: statements containing more than one command are disallowed.
No SQL comments: comments are stripped or rejected to avoid injection vectors.
Default LIMIT 100 when a LIMIT is missing, preventing accidentally massive result sets.
A maximum query timeout (5 seconds) to cap resource usage.
Result-size checks to stop oversized payloads from being returned.

These constraints aim to strike a practical balance: allow meaningful queries for analysis while preventing simple mistakes or prompt ambiguity from causing data loss, long-running scans, or exfiltration of more rows than intended. The server still returns normalized SQL and timing so clients can audit and reason about what ran.
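To make the rules above concrete, here is a minimal sketch of a guard that applies the same checks: SELECT-only, single statement, no comments, and a default LIMIT of 100. It is not ajan-sql's implementation; the function and class names are hypothetical, and the keyword matching is deliberately conservative (it will also reject a semicolon or a mutating keyword inside a string literal). A production guard would normally rely on a real SQL parser.

```python
import re

DEFAULT_LIMIT = 100  # matches the default described above

# Keywords that indicate a mutating or privilege-changing statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


class RejectedQuery(ValueError):
    """Raised when a statement violates one of the read-only rules."""


def guard_sql(sql: str) -> str:
    """Return a normalized SELECT with a LIMIT, or raise RejectedQuery."""
    text = sql.strip()
    if "--" in text or "/*" in text:
        raise RejectedQuery("SQL comments are not allowed")
    text = text.rstrip(";").strip()  # tolerate a trailing semicolon
    if ";" in text:
        raise RejectedQuery("multiple statements are not allowed")
    if not re.match(r"(?i)select\b", text):
        raise RejectedQuery("only SELECT statements are allowed")
    if FORBIDDEN.search(text):
        raise RejectedQuery("mutating keyword found")
    if not re.search(r"(?i)\blimit\s+\d+", text):
        text = f"{text} LIMIT {DEFAULT_LIMIT}"
    return text


print(guard_sql("SELECT * FROM users;"))
```

The timeout and result-size checks are not shown because they belong to the execution layer (for example, a statement timeout set on the database session) rather than to text validation.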

Why schema-aware tooling changes the LLM experience

As AI assistants are increasingly used to write SQL, the gap between “the model knows SQL syntax” and “the model understands the actual database it’s querying” becomes the limiting factor. A model that can only guess column names and table relationships will produce brittle or inefficient queries. Giving LLMs schema awareness (visibility into table names, column types, primary keys, foreign keys, and indexes) changes their outputs in three ways:

Better query generation: When the model knows column names and datatypes it can craft safer WHERE clauses, correct JOIN predicates, and generate properly typed aggregations.
Fewer guesswork cycles: Schema discovery reduces back-and-forth prompt loops where the assistant issues a query, sees an error, and revises it.
More reliable analysis: With EXPLAIN plans and sampled rows, models can reason about performance and data characteristics instead of making blind assumptions.
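A small sketch shows why relationship metadata removes guesswork: once a foreign key is known, the JOIN predicate can be derived rather than inferred from naming conventions. The Relationship record and its field names are illustrative assumptions, not the shape ajan-sql actually returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One foreign key; field names are illustrative, not ajan-sql's schema."""
    from_table: str
    from_column: str
    to_table: str
    to_column: str


def join_clause(relationships: list[Relationship], left: str, right: str) -> str:
    """Build a JOIN between two tables from a known foreign key, in either direction."""
    for rel in relationships:
        if {rel.from_table, rel.to_table} == {left, right}:
            return (
                f"JOIN {right} ON {rel.from_table}.{rel.from_column} = "
                f"{rel.to_table}.{rel.to_column}"
            )
    raise LookupError(f"no foreign key links {left} and {right}")


rels = [Relationship("orders", "customer_id", "customers", "id")]
print(join_clause(rels, "orders", "customers"))
# JOIN customers ON orders.customer_id = customers.id
```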

Ajan-sql wraps schema discovery in tools designed to be machine-friendly: each returns a human-readable summary alongside a structuredContent payload intended for programmatic consumption. That structured output is useful when integrating with developer tools, automation platforms, or dashboards that expect predictable JSON rather than ad hoc text.

Tool-by-tool: practical developer expectations

Integrators and developers should treat each exposed tool as a capability that supports specific tasks:

list_tables: Use this to populate UI pickers, autocomplete, or for the model to enumerate candidate sources for a question. Including estimated row counts helps the assistant decide whether to sample or run full queries.
describe_table: Essential for type-aware transformations, formatting, and guarding against NULL surprises. A model that sees primary key and unique constraint metadata can skip unnecessary DISTINCTs and avoid incorrect JOINs.
list_relationships: Enables LLMs to generate correct JOINs and to propose entity-centric views for reporting.
run_readonly_query: The guarded executor—suitable for final-answer queries that users request or for programmatic analysis. Consumers should expect normalized SQL, execution duration, and a reasonable default LIMIT.
explain_query: Use this when performance-sensitive queries are generated or when the assistant needs to surface why a query might be slow.
sample_rows: Best for previewing data, generating schema-based examples, or producing short tables for a chat response.

Because outputs supply both a text "content" summary and a structuredContent representation, clients can either display a crisp human summary to an end user or wire the structured JSON into downstream pipelines, charts, or monitoring logs.
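The sketch below shows one way a client might consume both halves of a tool result. The top-level content and structuredContent keys follow the description above; the fields inside structuredContent (sql, columns, rows) and the row layout are assumptions made for illustration.

```python
def rows_as_dicts(result: dict) -> list[dict]:
    """Zip column names with row values from a structuredContent payload."""
    payload = result["structuredContent"]
    names = [col["name"] for col in payload["columns"]]
    return [dict(zip(names, row)) for row in payload["rows"]]


def text_summary(result: dict) -> str:
    """Join the text parts of the human-readable content list."""
    return "\n".join(p["text"] for p in result["content"] if p.get("type") == "text")


# Hand-written sample result; field names inside structuredContent are assumptions.
sample = {
    "content": [{"type": "text", "text": "2 rows returned"}],
    "structuredContent": {
        "sql": "SELECT id, status FROM orders LIMIT 100",
        "columns": [{"name": "id", "type": "int4"}, {"name": "status", "type": "text"}],
        "rows": [[1, "paid"], [2, "open"]],
    },
}
print(text_summary(sample))   # shown to the end user
print(rows_as_dicts(sample))  # passed to charts or pipelines
```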

Integrating ajan-sql into AI workflows

For teams building AI assistants, BI augmentation, or developer utilities, ajan-sql fits between the model and the database as a curated interface. Typical integration patterns include:

Assistant sandboxing: Attach the model to ajan-sql instead of directly to a DB connection so the assistant only ever sees guarded, read-only access. The assistant can combine schema calls with natural language instructions to iteratively build queries.
Preflight and validation: Use describe_table and explain_query to validate model-generated SQL before exposing results to end users or running heavier analytics jobs.
Flow-based analytics: In an automation pipeline (e.g., where an LLM creates reports), ajan-sql can be the trusted executor that returns normalized results and execution metadata for downstream transformation steps.
Developer tooling: Ship ajan-sql alongside internal tools that let engineers ask natural-language questions about telemetry, logs, or staging databases without risking accidental schema changes.

Setup is intentionally lightweight: ajan-sql runs as an MCP service over stdio and reads DATABASE_URL for connection parameters. For quick testing, the package can be installed globally via npm and started in environments where MCP clients are used.
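As a hedged example of what that setup might look like, the snippet below generates a client configuration entry in the mcpServers layout that several MCP clients accept. The executable name "ajan-sql" is an assumption about what the global npm install provides, and the connection string is a placeholder that assumes a PostgreSQL-style URL.

```python
import json


def mcp_server_entry(database_url: str) -> dict:
    """Build a client config entry that launches the server over stdio."""
    return {
        "mcpServers": {
            "ajan-sql": {
                "command": "ajan-sql",  # assumed executable name after a global npm install
                "args": [],
                "env": {"DATABASE_URL": database_url},
            }
        }
    }


# Placeholder credentials; point this at a read-only role in practice.
config = mcp_server_entry("postgres://readonly_user:secret@localhost:5432/appdb")
print(json.dumps(config, indent=2))
```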

Security, governance, and operational controls

Ajan-sql is not a panacea for broader data governance concerns, but it provides a useful boundary that simplifies several operational issues:

Auditability: Returning normalized SQL and execution metadata gives ops teams an auditable trail of what queries were executed by AI clients. Integrators should log those normalized queries and map them back to user sessions.
Least privilege: Because ajan-sql only needs read permissions, you can provision a DB role with narrow SELECT privileges for the ajan-sql connection. That reduces blast radius if credentials are leaked.
Rate limiting and quotas: Pair ajan-sql with request-level throttles and quota enforcement so that model-driven queries can’t overwhelm databases under heavy usage.
Row- and column-level controls: While ajan-sql enforces statement-level restrictions, teams should still rely on database-native row-level security, views, or column redaction to protect sensitive fields such as PII.
Monitoring: Track query timeouts, result-size rejections, and unusual schema inspection patterns to detect misuse or misconfiguration by model clients.

These measures let operations teams treat AI-driven explorations as first-class traffic while minimizing risk.
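For the rate-limiting point, a token bucket placed in front of the server is one common option. The class below is a generic sketch, not an ajan-sql feature; the capacity and refill rate are values the integrator would choose.

```python
import time


class TokenBucket:
    """Allow up to `capacity` queries in a burst, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should back off."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per agent or user session: bursts of 5 queries, 1 query per second sustained.
bucket = TokenBucket(capacity=5, rate=1.0)
if bucket.allow():
    print("forward the query to ajan-sql")
```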

Where ajan-sql fits among related technologies

Ajan-sql occupies a middle ground between raw database connectors and full-featured query mediators. It is lighter than a full proxy layer or managed data governance platform, yet more protective than allowing models direct DB access. For teams already using automation platforms, developer tools, or AI orchestration libraries, ajan-sql can serve as a safe backend for SQL generation workflows. It complements ecosystems like internal data catalogs and analytics tooling by offering a machine-friendly schema snapshot endpoint and constrained execution for exploratory tasks.

Because it focuses on read-only interactions and machine-readable outputs, ajan-sql pairs well with:

AI toolchains that orchestrate multiple helpers (e.g., a prompt manager, reasoning agent, and a guarded SQL executor).
BI platforms that want to allow natural language queries while retaining control over query safety.
Developer toolsets that generate code or diagnostics from live schema metadata.

Performance considerations and limitations

Ajan-sql’s default limits (LIMIT 100, a 5-second timeout, and result-size checks) are conservative by design. They prevent runaway queries but can also frustrate use cases that require larger scans or longer-running analytical queries. Teams should treat ajan-sql as a fit for interactive exploration, prototyping, and answering questions with small result sets, not for heavy ETL or large-scale analytics.

If your workflow needs broader capabilities, consider:

Running larger queries through a separate analytics pipeline with explicit human review.
Using materialized views or precomputed aggregates that ajan-sql can safely query within its limits.
Adjusting timeouts and limits only after assessing resource impact and implementing robust monitoring.

Developer experience: structured outputs and machine-friendly payloads

A key practical design in ajan-sql is returning both a concise human-facing content summary and a structuredContent JSON payload. This dual-output approach solves two frequent integration problems:

Chat UIs and notebooks get readable summaries they can present to users without additional formatting.
Downstream automation and developer tooling receive predictable fields (columns, types, rows, row counts, normalized SQL) that can be programmatically consumed without brittle parsing.

That pattern reduces the engineering work to bind model outputs to charts, export routines, or logging systems and makes the server a practical component for production-grade AI assistants.

Real-world use cases where guarded SQL is valuable

Several concrete scenarios benefit from ajan-sql’s model:

Internal analytics chatbots: Enable non-technical stakeholders to ask questions about sales, support loads, or inventory without exposing write access or large exports.
Data exploration in staging: Allow product teams to sample and inspect staging data safely as part of feature reviews.
Developer audit assistants: Automate routine checks—like schema drift detection or index usage hints—by letting agents inspect schema and run explain plans.
Compliance reporting: Let auditors pull limited, well-formed reports without risking changes to production datasets.

Across these use cases, the combination of schema discovery and execution constraints means assistants can be helpful while remaining within acceptable operational controls.
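The schema drift check mentioned above can be approximated by diffing two schema snapshots taken at different times. The snapshot shape used here, a mapping of table name to column name to type, is a simplification assumed for the example; the actual schema://snapshot payload may be structured differently.

```python
def schema_drift(old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]) -> list[str]:
    """Compare two {table: {column: type}} snapshots and describe the differences."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        if table not in new:
            changes.append(f"table dropped: {table}")
        elif table not in old:
            changes.append(f"table added: {table}")
        else:
            before, after = old[table], new[table]
            for col in sorted(before.keys() | after.keys()):
                # A missing column shows up as None on the side where it is absent.
                if before.get(col) != after.get(col):
                    changes.append(f"{table}.{col}: {before.get(col)} -> {after.get(col)}")
    return changes


yesterday = {"orders": {"id": "int4", "total": "numeric"}, "legacy": {"id": "int4"}}
today = {"orders": {"id": "int8", "total": "numeric", "note": "text"}, "users": {"id": "int4"}}
for change in schema_drift(yesterday, today):
    print(change)
```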

Broader implications for AI, developer tooling, and enterprise data

The emergence of tools like ajan-sql highlights a broader shift: AI models are moving from code-generation curiosities to integrated components in production developer and analytics flows. That shift forces teams to codify safety and observability patterns the way they already do for human developers. Guarded interfaces, machine-readable schema snapshots, and strict execution policies are likely to become standard primitives in internal AI platforms.

For developers, this means rethinking how assistants are treated: not as free-roaming agents with DB credentials but as capability-limited tools whose actions are mediated by purpose-built APIs. For businesses, the availability of read-only, schema-aware layers lowers the barrier to adopting AI-assisted analytics without wholesale changes to data governance.

At the same time, the approach surfaces new responsibilities: maintaining curated schema snapshots, defining acceptable default limits, and building monitoring and auditing into agent workflows. Governance teams will need to adapt policies to account for algorithmic access patterns and automated query generation.

Practical advice for teams adopting ajan-sql

If you’re evaluating ajan-sql for your stack, consider these practical steps:

Start with a dedicated read-only database role scoped to non-sensitive schemas.
Integrate logging immediately: capture normalized SQL, execution times, and the agent identity that requested the query.
Use schema snapshots to drive user-facing autocomplete and to reduce unnecessary queries.
Pair with row-level security or views to enforce PII protections independent of the ajan-sql safeguards.
Test the default limits with representative workloads to avoid surprising user friction.

These steps help you get safe value from LLM-driven exploration without opening wider risks.
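The logging step can start as simply as emitting one JSON line per executed query. In the sketch below, the record fields, the logger name, and the idea that the caller supplies an agent identifier are all assumptions; only the normalized SQL and timing are described as coming back from the server.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ajan_sql.audit")  # hypothetical logger name


def audit_record(agent_id: str, normalized_sql: str, duration_ms: float, row_count: int) -> str:
    """Serialize one executed query as a JSON line suitable for log aggregation."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "sql": normalized_sql,
        "duration_ms": duration_ms,
        "row_count": row_count,
    }
    return json.dumps(record, sort_keys=True)


line = audit_record("report-bot", "SELECT id FROM orders LIMIT 100", 12.4, 100)
logger.info(line)
```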

The next wave of developer tooling will treat guarded data access as a composable capability. Tools like ajan-sql show it’s possible to give LLMs the context they need—schema awareness, explain plans, and sample rows—while enforcing the operational constraints that teams require. Expect integration patterns to converge around machine-readable outputs, auditable execution, and least-privilege connections as organizations scale AI access across product, analytics, and developer teams.

Looking ahead, we can anticipate richer integrations: model-directed query synthesis combined with policy engines that enforce per-user limits; hybrid deployments that route heavy analytics to data warehouses while letting agents query materialized summaries; and tighter IDE integration so developers can ask a contextual assistant about schema and performance directly while coding database interactions. As these patterns iterate, guarded interfaces like ajan-sql will be an important building block for practical, safe AI augmentation of data platforms.