SQL optimization examples

See How Easily AI Optimizes SQL

Discover how AI can optimize your SQL queries and identify rewrites that improve performance without changing their intent. Each example includes the original query, the optimized version, and a detailed explanation of the improvements.

  • Compare original SQL and optimized SQL side by side
  • Understand why each rewrite preserves or improves behavior
  • Review optional index strategies for faster execution plans
  • Use with any database engine

What these SQL optimization examples cover

This page focuses on real PostgreSQL optimization patterns: predicate tuning, correlated subquery rewrites, semi-join usage, aggregation simplification, and index guidance. You can inspect each original query and compare it to a rewritten version with detailed reasoning.

The examples use PostgreSQL, but all common database engines are supported, e.g. MySQL, Oracle, and MS SQL Server.

SQL optimizations

Explore how our SQL optimizer improves different SQL queries.

Example 1

Original SQL query

SELECT
    user_id,
    username,
    full_name,
    email
FROM
    users
WHERE
    UPPER(email) = 'JOHN.DOE@EXAMPLE.COM';

Optimizer output

🚀 Optimized SQL query

-- Keep the same logic; rely on a functional index on upper(email)
SELECT
    user_id,
    username,
    full_name,
    email
FROM
    users
WHERE
    UPPER(email) = 'JOHN.DOE@EXAMPLE.COM';

💬 Explanation

1. The current predicate UPPER(email) = 'JOHN.DOE@EXAMPLE.COM' defines case-insensitive equality by uppercasing the column. Changing it to LOWER(...) or citext may subtly change semantics for non-ASCII data, so we keep it exactly as-is to guarantee identical results.
2. Performance should be addressed by indexing the expression used in the WHERE clause so the planner can perform an index scan instead of a full table scan.
3. No query rewrite is necessary; with the functional index in place, the optimizer can use it. Optionally, INCLUDE columns to enable index-only scans.

๐Ÿ› ๏ธ Database indexes (optional)

-- Primary: functional index that matches the predicate
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_upper_email
  ON users (UPPER(email));

-- Optional: allow index-only scans for this query shape
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_upper_email_cover
  ON users (UPPER(email)) INCLUDE (user_id, username, full_name, email);

-- Refresh stats so the planner recognizes index selectivity
ANALYZE users;

Alternative options (only if you can change semantics/implementation consistently across the app):

  • Use LOWER(email) with a matching functional index on lower(email).
  • Convert email to citext and create a plain B-tree index for simpler case-insensitive comparisons.
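
A minimal sketch of the citext alternative, assuming the table and column names used on this page and that no other code depends on email being plain text:

```sql
-- Enable the case-insensitive text type (once per database)
CREATE EXTENSION IF NOT EXISTS citext;

-- Convert the column; comparisons on citext are case-insensitive by default
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- A plain B-tree index now serves case-insensitive equality
CREATE INDEX IF NOT EXISTS idx_users_email ON users (email);

-- The query simplifies to a direct, index-friendly comparison
SELECT user_id, username, full_name, email
FROM users
WHERE email = 'john.doe@example.com';
```

Note that citext compares values via lower() internally; as mentioned above, this can differ from UPPER()-based matching for some non-ASCII data, so verify against your dataset first.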

๐Ÿ“ Notes and considerations

  • Build indexes CONCURRENTLY on large/active tables to avoid long write locks.
  • Ensure autovacuum/analyze is healthy; stale stats can push the planner toward sequential scans.
  • If you parameterize the value (recommended), still keep UPPER(email) = UPPER($1) to preserve identical semantics and enable the same functional index.

Example 2

Original SQL query

SELECT
    p.product_id,
    p.product_name,
    p.price,
    (
        SELECT MAX(o.order_date)
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_id
        WHERE oi.product_id = p.product_id
    ) AS last_ordered_date
FROM
    products p;

Optimizer output

🚀 Optimized SQL query

SELECT
    p.product_id,
    p.product_name,
    p.price,
    lo.last_ordered_date
FROM products AS p
LEFT JOIN (
    -- Aggregate once: exact same semantics as the correlated MAX() per product
    SELECT
        oi.product_id,
        MAX(o.order_date) AS last_ordered_date
    FROM order_items AS oi
    JOIN orders AS o ON o.order_id = oi.order_id
    GROUP BY oi.product_id
) AS lo
  ON lo.product_id = p.product_id;

💬 Explanation

1. The correlated subquery ran once per product. We precompute MAX(order_date) per product_id exactly once, then LEFT JOIN it to products. Products with no orders still return NULL for last_ordered_date, preserving semantics.
2. The join between orders and order_items is executed once inside the aggregate subquery and grouped by product_id, eliminating repeated work.
3. Using LEFT JOIN (not INNER) matches the original behavior, where the scalar subquery returns NULL for products with no matching order_items.
4. MAX(o.order_date) over the same join inputs yields the same value as the per-row correlated MAX; duplicates in order_items don't affect the maximum.
5. The rewrite enables PostgreSQL to use hash/merge/parallel aggregates and set-based plans, typically reducing I/O and CPU vs. N-times correlated execution.

🛠️ Database indexes (optional)

-- Choose based on your join/scan direction; start with the first, add the second if needed:

-- Supports grouping by product_id and subsequent join to orders via order_id
CREATE INDEX IF NOT EXISTS idx_order_items_product_order ON order_items (product_id, order_id);

-- If you often join from orders -> order_items by order_id, this helps that direction
CREATE INDEX IF NOT EXISTS idx_order_items_order_product ON order_items (order_id, product_id);

-- Ensure orders has a PK on (order_id). To enable index-only access to order_date via that join:
-- INCLUDE keeps order_id as the key while storing order_date in the index payload.
CREATE INDEX IF NOT EXISTS idx_orders_orderid_incl_date ON orders (order_id) INCLUDE (order_date);

-- After creating indexes:
ANALYZE order_items;
ANALYZE orders;

📝 Notes and considerations

  • If this query runs very frequently over large historical data, consider a summary object:
    • CREATE MATERIALIZED VIEW product_last_order AS SELECT oi.product_id, MAX(o.order_date) AS last_ordered_date FROM order_items oi JOIN orders o USING (order_id) GROUP BY oi.product_id;
    • REFRESH MATERIALIZED VIEW CONCURRENTLY product_last_order; and join to it instead of raw tables.
  • Compare plans with EXPLAIN (ANALYZE, BUFFERS) before/after; keep only the indexes that measurably help.

Example 3

Original SQL query

SELECT
  user_id,
  username,
  registration_date
FROM
  users
WHERE
  user_id IN (
    SELECT DISTINCT o.user_id
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE oi.product_id = 12345
);

Optimizer output

🚀 Optimized SQL query

SELECT
    u.user_id,
    u.username,
    u.registration_date
FROM users AS u
WHERE EXISTS (  -- same semantics as IN (SELECT DISTINCT ...), but allows short-circuit and avoids materialization
    SELECT 1
    FROM order_items AS oi
    JOIN orders AS o ON o.order_id = oi.order_id
    WHERE
        oi.product_id = 12345
        AND o.user_id = u.user_id
);

💬 Explanation

1. Rewrote IN with a DISTINCT subquery to a correlated EXISTS. This preserves exact semantics (including NULL handling) while avoiding duplicate elimination/materialization.
2. EXISTS naturally implements a semi-join: it stops after the first qualifying row, which is typically faster than building a DISTINCT list of user_ids.
3. The join order in the EXISTS lets PostgreSQL start from order_items filtered by product_id and then check the matching order via its PK, minimizing work.
4. Semantics check: users with NULL user_id will not match (same as original), and subquery rows with NULL user_id do not cause matches (same as original). Results are identical.

🛠️ Database indexes (optional)

-- Drives the lookup by product_id and supplies order_id to join back to orders
CREATE INDEX IF NOT EXISTS order_items_product_id_order_id_idx
  ON order_items (product_id, order_id);

-- Usually orders.order_id is the PK; ensure it's present
-- CREATE UNIQUE INDEX IF NOT EXISTS orders_pkey ON orders(order_id); -- if not already PK

-- Optional: if you observe plans starting from users and probing orders by user_id
-- this can help (not always necessary if starting from order_items is selective)
CREATE INDEX IF NOT EXISTS orders_user_id_order_id_idx
  ON orders (user_id, order_id);

-- If product_id=12345 is a very common, repeated predicate,
-- a partial index can be even smaller/faster:
-- CREATE INDEX IF NOT EXISTS order_items_p12345_order_id_partial_idx
--   ON order_items(order_id)
--   WHERE product_id = 12345;

📝 Notes and considerations

  • Run ANALYZE on orders and order_items after creating indexes to give the planner good stats.
  • Validate with EXPLAIN (ANALYZE, BUFFERS) before/after to ensure the plan becomes a semi-join with index scans.
  • If product_id's data type differs from the literal's type, keep the same literal form (to preserve semantics) but ensure your client binds it with the correct type to avoid implicit casts that could inhibit index use.

Example 4

Original SQL query

SELECT
  u.username,
  COUNT(l.log_id) AS log_count
FROM
  users u,
  logs l
WHERE
  u.role = 'Admin'
GROUP BY
  u.username;

Optimizer output

🚀 Optimized SQL query

-- Preserves the original Cartesian-product semantics without materializing it
WITH log_stats AS (
  SELECT
    COUNT(*)        AS total_logs_rows,   -- used only to detect empty logs table
    COUNT(l.log_id) AS nonnull_logids     -- matches COUNT(l.log_id) in the original
  FROM logs l
)
SELECT
  u.username,
  COUNT(*) * ls.nonnull_logids AS log_count
FROM users u
CROSS JOIN log_stats ls
WHERE
  u.role = 'Admin'
  AND ls.total_logs_rows > 0   -- important: return no rows if logs is empty (exactly like the original)
GROUP BY
  u.username, ls.nonnull_logids;

💬 Explanation

1. The original query creates a Cartesian product between filtered admins and all logs, then counts l.log_id per username. The rewrite computes the two needed counts separately: the number of admin rows per username (COUNT(*) on users after filtering) and the number of non-NULL log_id rows in logs (COUNT(l.log_id)), then multiplies them, which is exactly equivalent mathematically.
2. If logs has zero rows, the original FROM u, l yields no rows and thus no groups. The condition ls.total_logs_rows > 0 preserves this behavior precisely, while still returning rows (with count 0) when logs has rows but all log_id values are NULL.
3. Grouping remains on u.username as in the original, preserving the semantics where multiple users sharing the same username are combined.
4. Uses an explicit CROSS JOIN of a one-row derived table instead of implicit comma joins, avoiding row explosion while keeping identical results. The COUNTs are bigint in PostgreSQL; their product remains bigint.

🛠️ Database indexes (optional)

-- Helps filter admins and aggregate by username efficiently
CREATE INDEX IF NOT EXISTS users_admin_username_idx
  ON users (username)
  WHERE role = 'Admin';

-- If log_id is not already a PRIMARY KEY or indexed, consider:
-- This can enable an index-only scan for COUNT(log_id) when pages are all-visible
CREATE INDEX IF NOT EXISTS logs_log_id_idx ON logs (log_id);

📝 Notes and considerations

  • Important: Introducing a join between users and logs (e.g., ON l.user_id = u.user_id) would change the result set and does not preserve the original query's semantics. If your intent was to count logs per admin user, use an explicit JOIN on the appropriate key and GROUP BY user_id, username instead.

Example 5

Original SQL query

SELECT
  product_id,
  product_name,
  stock_quantity
FROM
  products
WHERE
  product_name ILIKE '%obsolete%'
  OR product_name ILIKE '%discontinued%'
  OR product_name ILIKE '%legacy%'
  OR product_name ILIKE '%deprecated%'
  OR product_name ILIKE '%oldmodel%'
  OR product_name ILIKE '%v1_deprecated%';

Optimizer output

🚀 Optimized SQL query

SELECT
    product_id,
    product_name,
    stock_quantity
FROM
    products
WHERE
    -- Same case-insensitive substring matches, preserved 1:1
    product_name ILIKE ANY (ARRAY[
        '%obsolete%',
        '%discontinued%',
        '%legacy%',
        '%deprecated%',
        '%oldmodel%',
        '%v1_deprecated%' -- note: "_" is a single-character wildcard in ILIKE (same as original)
    ]);

💬 Explanation

1. Replaced the long OR chain with a single ILIKE ANY array predicate. This preserves identical semantics (including ILIKE wildcard behavior for % and _) while letting the planner combine conditions efficiently (typically as a bitmap OR).
2. This form is directly indexable by a pg_trgm GIN index on product_name, enabling fast substring search despite leading wildcards.
3. Using ILIKE ANY avoids subtle semantic changes that could occur if rewritten as a regex; notably, in ILIKE the underscore "_" is a single-character wildcard, whereas a literal "_" in a regex matches only an underscore.

🛠️ Database indexes (optional)

-- 1) Enable pg_trgm once per database
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 2) Trigram GIN index to accelerate ILIKE/LIKE/regex substring search
CREATE INDEX IF NOT EXISTS idx_products_product_name_trgm
  ON products
  USING GIN (product_name gin_trgm_ops);

-- Refresh stats after creating the index (optional but recommended)
ANALYZE products;

📝 Notes and considerations

  • The query's filtering logic is unchanged, including that "_" in '%v1_deprecated%' matches any single character (not a literal underscore). If you intended a literal underscore, escape it, e.g. '%v1\_deprecated%' ESCAPE '\'.
  • A GIN pg_trgm index cannot be "covering" for the selected columns; the executor will still visit the heap for product_id, product_name, and stock_quantity. This is normal for trigram indexes.
  • You can also test a regex alternative for readability: product_name ~* '(obsolete|discontinued|legacy|deprecated|oldmodel|v1.deprecated)'. It is semantically equivalent to the original (since '%deprecated%' already covers '%v1_deprecated%'), but keep in mind the underscore nuance if you rely on it elsewhere. Use EXPLAIN ANALYZE to compare.
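
If you do intend a literal underscore, a brief sketch of the escaped form, assuming the default backslash escape character:

```sql
-- The escaped "\_" matches only a literal underscore, not any single character
SELECT product_id, product_name, stock_quantity
FROM products
WHERE product_name ILIKE '%v1\_deprecated%' ESCAPE '\';
```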

Example 6

Original SQL query

WITH order_summary AS (
SELECT
    order_id::TEXT,
    'high_value' AS category,
    SUM(quantity * unit_price) AS total
FROM
    order_items
WHERE
    (quantity * unit_price) >= 1000
GROUP BY
    order_id

UNION ALL

SELECT
    order_id::TEXT,
    'low_value' AS category,
    SUM(quantity * unit_price) AS total
FROM
    order_items
WHERE
    (quantity * unit_price) < 1000
GROUP BY
    order_id
)
SELECT
o.order_date,
o.user_id,
os.category,
os.total
FROM
orders o
JOIN
order_summary os ON o.order_id = os.order_id::INTEGER;

Optimizer output

🚀 Optimized SQL query

WITH order_summary AS (
  SELECT
    oi.order_id,  -- keep native type to allow index usage on joins
    -- Single scan with conditional aggregation
    SUM(oi.quantity * oi.unit_price) FILTER (WHERE (oi.quantity * oi.unit_price) >= 1000) AS high_total,
    SUM(oi.quantity * oi.unit_price) FILTER (WHERE (oi.quantity * oi.unit_price) <  1000) AS low_total
  FROM order_items AS oi
  GROUP BY oi.order_id
)
SELECT
  o.order_date,
  o.user_id,
  v.category,
  v.total
FROM orders AS o
JOIN order_summary AS os
  ON o.order_id = os.order_id
CROSS JOIN LATERAL (
  VALUES
    ('high_value'::text, os.high_total),
    ('low_value'::text, os.low_total)
) AS v(category, total)
WHERE v.total IS NOT NULL;  -- preserve original behavior: only emit categories that exist

💬 Explanation

1. Replaced the double scan and UNION ALL with a single aggregation over order_items using FILTER. This computes both category totals in one pass.
2. Removed the TEXT/INTEGER round-trip casting. order_id stays INTEGER throughout, enabling index usage and avoiding cast-related join penalties.
3. The LATERAL VALUES unpivots the two totals into two rows per order, then WHERE v.total IS NOT NULL ensures we only output categories that actually exist (identical to the original semantics, which emitted no row for a category with no qualifying items).

🛠️ Database indexes (optional)

-- For the join/grouping
CREATE INDEX IF NOT EXISTS idx_order_items_order_id ON order_items(order_id);

-- If you frequently re-use (quantity * unit_price) elsewhere, consider:
-- 1) A generated column for line_total (Postgres 12+):
ALTER TABLE order_items
  ADD COLUMN line_total numeric GENERATED ALWAYS AS (quantity * unit_price) STORED;

-- 2) Indexes on the generated column (useful if you ever filter by thresholds in other queries):
CREATE INDEX IF NOT EXISTS idx_order_items_line_total ON order_items(line_total);
CREATE INDEX IF NOT EXISTS idx_order_items_line_total_high ON order_items(line_total) WHERE line_total >= 1000;
CREATE INDEX IF NOT EXISTS idx_order_items_line_total_low  ON order_items(line_total) WHERE line_total < 1000;

-- After schema/index changes:
ANALYZE order_items;

📝 Notes and considerations

  • The optimized query returns exactly the same rows and totals as the original: one row per (order, category) only when the order has at least one line in that category; totals are identical.
  • If the workload runs this aggregation often and data is large but relatively stable, consider a materialized view of per-order high/low totals, refreshed on schedule or after ETL. Use indexes on the MV for fast joins.
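
One hypothetical shape for such a summary object (the view name is illustrative):

```sql
-- Per-order category totals, computed once and refreshed on a schedule
CREATE MATERIALIZED VIEW order_value_summary AS
SELECT
  order_id,
  SUM(quantity * unit_price) FILTER (WHERE quantity * unit_price >= 1000) AS high_total,
  SUM(quantity * unit_price) FILTER (WHERE quantity * unit_price <  1000) AS low_total
FROM order_items
GROUP BY order_id;

-- A unique index is required for REFRESH ... CONCURRENTLY and speeds up joins
CREATE UNIQUE INDEX ON order_value_summary (order_id);

REFRESH MATERIALIZED VIEW CONCURRENTLY order_value_summary;
```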

SQL optimization checklist

Use this checklist to review SQL query performance improvements before merging query changes into production.

  • Start with EXPLAIN (ANALYZE, BUFFERS) before rewriting SQL to confirm the real bottleneck.
  • Preserve query semantics first, then reduce scans, sorts, and repeated work.
  • Use indexes that match your predicates, join keys, and grouping strategy.
  • Re-analyze tables after major index changes so the planner has fresh statistics.
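
As a concrete starting point for the first item, wrap the candidate query like this (the query shown is illustrative, reusing Example 1):

```sql
-- ANALYZE runs the query for real and reports actual timings and row counts;
-- BUFFERS adds shared-buffer hit/read statistics to spot I/O-bound steps
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, username, full_name, email
FROM users
WHERE UPPER(email) = 'JOHN.DOE@EXAMPLE.COM';
```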

After optimization, use the SQL validator, SQL formatter, and SQL explainer to verify syntax, readability, and query intent.

SQL optimizer FAQ

What does an SQL optimizer change?

A query optimizer rewrite changes SQL structure to reduce unnecessary work while keeping the same result set. Typical changes include replacing correlated subqueries, improving predicates, and simplifying joins.

Do I always need new indexes after SQL rewrites?

Not always. Some rewrites improve performance without schema changes. Add indexes when execution plans still show expensive scans or when repeated query patterns benefit from targeted index support.

How do I validate an optimization safely?

Compare row counts and key aggregates between old and new queries, then compare execution plans with EXPLAIN (ANALYZE, BUFFERS). Keep only rewrites and indexes that improve real workload performance.
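
A simple sketch of the row-set comparison, using hypothetical old and new query shapes; empty output from both EXCEPT directions indicates identical row sets:

```sql
-- Rows in the old result but not the new one ...
(SELECT user_id, username FROM users WHERE UPPER(email) = 'JOHN.DOE@EXAMPLE.COM'
 EXCEPT
 SELECT user_id, username FROM users WHERE email ILIKE 'john.doe@example.com')
UNION ALL
-- ... plus rows in the new result but not the old one
(SELECT user_id, username FROM users WHERE email ILIKE 'john.doe@example.com'
 EXCEPT
 SELECT user_id, username FROM users WHERE UPPER(email) = 'JOHN.DOE@EXAMPLE.COM');
```

Note that EXCEPT compares distinct rows; if duplicate multiplicity matters, also compare COUNT(*) per group between the two queries.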