What does an SQL optimizer change?
A query optimizer rewrite changes SQL structure to reduce unnecessary work while keeping the same result set. Typical changes include replacing correlated subqueries, improving predicates, and simplifying joins.
This page focuses on real PostgreSQL optimization patterns: predicate tuning, correlated subquery rewrites, semi-join usage, aggregation simplification, and index guidance. You can inspect each original query and compare it to a rewritten version with detailed reasoning.
The examples are for PostgreSQL database but all common databases are support, e.g. MySQL, Oracle, MS SQL Sever.
Explore how our SQL optimizer improves different SQL queries.
Original SQL query
SELECT
user_id,
username,
full_name,
email
FROM
users
WHERE
UPPER(email) = 'JOHN.DOE@EXAMPLE.COM'; Optimizer output
-- Keep the same logic; rely on a functional index on upper(email)
SELECT
user_id,
username,
full_name,
email
FROM
users
WHERE
UPPER(email) = 'JOHN.DOE@EXAMPLE.COM';
| # | Explanation |
|---|---|
| 1 | The current predicate UPPER(email) = 'JOHN.DOE@EXAMPLE.COM' defines case-insensitive equality via uppercasing the column. Changing it to LOWER(...) or citext may subtly change semantics for non-ASCII data. Therefore, we keep it exactly as-is to guarantee identical results. |
| 2 | Performance should be addressed by indexing the expression used in the WHERE clause so the planner can perform an index scan instead of a full table scan. |
| 3 | No query rewrite is necessary; with the functional index in place, the optimizer can use it. Optionally, INCLUDE columns to enable index-only scans. |
-- Primary: functional index that matches the predicate
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_upper_email
ON users (UPPER(email));
-- Optional: allow index-only scans for this query shape
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_upper_email_cover
ON users (UPPER(email)) INCLUDE (user_id, username, full_name, email);
-- Refresh stats so the planner recognizes index selectivity
ANALYZE users;
Alternative options (only if you can change semantics/implementation consistently across the app):
Original SQL query
SELECT
p.product_id,
p.product_name,
p.price,
(
SELECT MAX(o.order_date)
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.product_id = p.product_id
) AS last_ordered_date
FROM
products p; Optimizer output
SELECT
p.product_id,
p.product_name,
p.price,
lo.last_ordered_date
FROM products AS p
LEFT JOIN (
-- Aggregate once: exact same semantics as the correlated MAX() per product
SELECT
oi.product_id,
MAX(o.order_date) AS last_ordered_date
FROM order_items AS oi
JOIN orders AS o ON o.order_id = oi.order_id
GROUP BY oi.product_id
) AS lo
ON lo.product_id = p.product_id;
| # | Explanation |
|---|---|
| 1 | The correlated subquery ran once per product. We precompute MAX(order_date) per product_id exactly once, then LEFT JOIN it to products. Products with no orders still return NULL for last_ordered_date, preserving semantics. |
| 2 | The join between orders and order_items is executed once inside the aggregate subquery and grouped by product_id, eliminating repeated work. |
| 3 | Using LEFT JOIN (not INNER) matches the original behavior where the scalar subquery returns NULL for products with no matching order_items. |
| 4 | MAX(o.order_date) over the same join inputs yields the same value as the per-row correlated MAX; duplicates in order_items donโt affect the maximum. |
| 5 | The rewrite enables PostgreSQL to use hash/merge/parallel aggregates and set-based plans, typically reducing I/O and CPU vs. N-times correlated execution. |
-- Choose based on your join/scan direction; start with the first, add the second if needed:
-- Supports grouping by product_id and subsequent join to orders via order_id
CREATE INDEX IF NOT EXISTS idx_order_items_product_order ON order_items (product_id, order_id);
-- If you often join from orders -> order_items by order_id, this helps that direction
CREATE INDEX IF NOT EXISTS idx_order_items_order_product ON order_items (order_id, product_id);
-- Ensure orders has a PK on (order_id). To enable index-only access to order_date via that join:
-- INCLUDE keeps order_id as the key while storing order_date in the index payload.
CREATE INDEX IF NOT EXISTS idx_orders_orderid_incl_date ON orders (order_id) INCLUDE (order_date);
-- After creating indexes:
ANALYZE order_items;
ANALYZE orders;
Original SQL query
SELECT
user_id,
username,
registration_date
FROM
users
WHERE
user_id IN (
SELECT DISTINCT o.user_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.product_id = 12345
); Optimizer output
SELECT
u.user_id,
u.username,
u.registration_date
FROM users AS u
WHERE EXISTS ( -- same semantics as IN (SELECT DISTINCT ...), but allows short-circuit and avoids materialization
SELECT 1
FROM order_items AS oi
JOIN orders AS o ON o.order_id = oi.order_id
WHERE
oi.product_id = 12345
AND o.user_id = u.user_id
);
| # | Explanation |
|---|---|
| 1 | Rewrote IN with a DISTINCT subquery to a correlated EXISTS. This preserves exact semantics (including NULL handling) while avoiding duplicate elimination/materialization. |
| 2 | EXISTS naturally implements a semi-join: it stops after the first qualifying row, which is typically faster than building a DISTINCT list of user_ids. |
| 3 | The join order in the EXISTS lets PostgreSQL start from order_items filtered by product_id and then check the matching order via its PK, minimizing work. |
| 4 | Semantics check: Users with NULL user_id will not match (same as original). Subquery NULL user_id rows do not cause matches (same as original). Results are identical. |
-- Drives the lookup by product_id and supplies order_id to join back to orders
CREATE INDEX IF NOT EXISTS order_items_product_id_order_id_idx
ON order_items (product_id, order_id);
-- Usually orders.order_id is the PK; ensure it's present
-- CREATE UNIQUE INDEX IF NOT EXISTS orders_pkey ON orders(order_id); -- if not already PK
-- Optional: if you observe plans starting from users and probing orders by user_id
-- this can help (not always necessary if starting from order_items is selective)
CREATE INDEX IF NOT EXISTS orders_user_id_order_id_idx
ON orders (user_id, order_id);
-- If product_id=12345 is a very common, repeated predicate,
-- a partial index can be even smaller/faster:
-- CREATE INDEX IF NOT EXISTS order_items_p12345_order_id_partial_idx
-- ON order_items(order_id)
-- WHERE product_id = 12345;
Original SQL query
SELECT
u.username,
COUNT(l.log_id) AS log_count
FROM
users u,
logs l
WHERE
u.role = 'Admin'
GROUP BY
u.username; Optimizer output
-- Preserves the original Cartesian-product semantics without materializing it
WITH log_stats AS (
SELECT
COUNT(*) AS total_logs_rows, -- used only to detect empty logs table
COUNT(l.log_id) AS nonnull_logids -- matches COUNT(l.log_id) in the original
FROM logs l
)
SELECT
u.username,
COUNT(*) * ls.nonnull_logids AS log_count
FROM users u
CROSS JOIN log_stats ls
WHERE
u.role = 'Admin'
AND ls.total_logs_rows > 0 -- important: return no rows if logs is empty (exactly like the original)
GROUP BY
u.username, ls.nonnull_logids;
| # | Explanation |
|---|---|
| 1 | The original query creates a Cartesian product between filtered admins and all logs, then counts l.log_id per username. The rewrite computes the two needed counts separately: number of admins per username (COUNT(*) on users after filtering) and number of non-null log_id rows in logs (COUNT(l.log_id)), then multiplies them โ exactly equivalent mathematically. |
| 2 | If logs has zero rows, the original FROM u, l yields no rows and thus no groups. The condition ls.total_logs_rows > 0 preserves this behavior precisely, while still returning rows (with count 0) when logs has rows but all log_id are NULL. |
| 3 | Grouping remains on u.username as in the original, preserving the semantics where multiple users sharing the same username are combined. |
| 4 | Uses explicit CROSS JOIN of a one-row derived table instead of implicit comma joins, avoiding row explosion while keeping identical results. The COUNTs are bigint in PostgreSQL; their product remains bigint. |
-- Helps filter admins and aggregate by username efficiently
CREATE INDEX IF NOT EXISTS users_admin_username_idx
ON users (username)
WHERE role = 'Admin';
-- If log_id is not already a PRIMARY KEY or indexed, consider:
-- This can enable an index-only scan for COUNT(log_id) when pages are all-visible
CREATE INDEX IF NOT EXISTS logs_log_id_idx ON logs (log_id);
Original SQL query
SELECT
product_id,
product_name,
stock_quantity
FROM
products
WHERE
product_name ILIKE '%obsolete%'
OR product_name ILIKE '%discontinued%'
OR product_name ILIKE '%legacy%'
OR product_name ILIKE '%deprecated%'
OR product_name ILIKE '%oldmodel%'
OR product_name ILIKE '%v1_deprecated%'; Optimizer output
SELECT
product_id,
product_name,
stock_quantity
FROM
products
WHERE
-- Same case-insensitive substring matches, preserved 1:1
product_name ILIKE ANY (ARRAY[
'%obsolete%',
'%discontinued%',
'%legacy%',
'%deprecated%',
'%oldmodel%',
'%v1_deprecated%' -- note: "_" is a single-character wildcard in ILIKE (same as original)
]);
| # | Explanation |
|---|---|
| 1 | Replaced the long OR chain with a single ILIKE ANY array predicate. This preserves identical semantics (including ILIKE wildcard behavior for % and _) while letting the planner combine conditions efficiently (typically as a bitmap OR). |
| 2 | This form is directly indexable by a pg_trgm GIN index on product_name, enabling fast substring search despite leading wildcards. |
| 3 | Using ILIKE ANY avoids subtle semantic changes that could occur if rewritten as a regex; notably, in ILIKE the underscore "" is a single-character wildcard, which would not be true if naively written as a literal "" in a regex. |
-- 1) Enable pg_trgm once per database
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- 2) Trigram GIN index to accelerate ILIKE/LIKE/regex substring search
CREATE INDEX IF NOT EXISTS idx_products_product_name_trgm
ON products
USING GIN (product_name gin_trgm_ops);
-- Refresh stats after creating the index (optional but recommended)
ANALYZE products;
Original SQL query
WITH order_summary AS (
SELECT
order_id::TEXT,
'high_value' AS category,
SUM(quantity * unit_price) AS total
FROM
order_items
WHERE
(quantity * unit_price) >= 1000
GROUP BY
order_id
UNION ALL
SELECT
order_id::TEXT,
'low_value' AS category,
SUM(quantity * unit_price) AS total
FROM
order_items
WHERE
(quantity * unit_price) < 1000
GROUP BY
order_id
)
SELECT
o.order_date,
o.user_id,
os.category,
os.total
FROM
orders o
JOIN
order_summary os ON o.order_id = os.order_id::INTEGER; Optimizer output
WITH order_summary AS (
SELECT
oi.order_id, -- keep native type to allow index usage on joins
-- Single scan with conditional aggregation
SUM(oi.quantity * oi.unit_price) FILTER (WHERE (oi.quantity * oi.unit_price) >= 1000) AS high_total,
SUM(oi.quantity * oi.unit_price) FILTER (WHERE (oi.quantity * oi.unit_price) < 1000) AS low_total
FROM order_items AS oi
GROUP BY oi.order_id
)
SELECT
o.order_date,
o.user_id,
v.category,
v.total
FROM orders AS o
JOIN order_summary AS os
ON o.order_id = os.order_id
CROSS JOIN LATERAL (
VALUES
('high_value'::text, os.high_total),
('low_value'::text, os.low_total)
) AS v(category, total)
WHERE v.total IS NOT NULL; -- preserve original behavior: only emit categories that exist
| # | Explanation |
|---|---|
| 1 | Replaced the double scan and UNION ALL with a single aggregation over order_items using FILTER. This computes both category totals in one pass. |
| 2 | Removed the TEXT/INTEGER round-trip casting. order_id stays INTEGER throughout, enabling index usage and avoiding cast-related join penalties. |
| 3 | The LATERAL VALUES unpivots the two totals into two rows per order, then WHERE v.total IS NOT NULL ensures we only output categories that actually exist (identical to original semantics, which emitted no row for a category with no qualifying items). |
-- For the join/grouping
CREATE INDEX IF NOT EXISTS idx_order_items_order_id ON order_items(order_id);
-- If you frequently re-use (quantity * unit_price) elsewhere, consider:
-- 1) A generated column for line_total (Postgres 12+):
ALTER TABLE order_items
ADD COLUMN line_total numeric GENERATED ALWAYS AS (quantity * unit_price) STORED;
-- 2) Indexes on the generated column (useful if you ever filter by thresholds in other queries):
CREATE INDEX IF NOT EXISTS idx_order_items_line_total ON order_items(line_total);
CREATE INDEX IF NOT EXISTS idx_order_items_line_total_high ON order_items(line_total) WHERE line_total >= 1000;
CREATE INDEX IF NOT EXISTS idx_order_items_line_total_low ON order_items(line_total) WHERE line_total < 1000;
-- After schema/index changes:
ANALYZE order_items;
Use this checklist to review SQL query performance improvements before merging query changes into production.
After optimization, use the SQL validator, SQL formatter, and SQL explainer to verify syntax, readability, and query intent.
A query optimizer rewrite changes SQL structure to reduce unnecessary work while keeping the same result set. Typical changes include replacing correlated subqueries, improving predicates, and simplifying joins.
Not always. Some rewrites improve performance without schema changes. Add indexes when execution plans still show expensive scans or when repeated query patterns benefit from targeted index support.
Compare row counts and key aggregates between old and new queries, then compare execution plans with EXPLAIN (ANALYZE, BUFFERS). Keep only rewrites and indexes that improve real workload performance.
Get started for free
Generate, validate, optimize, and explain queries in seconds. Bring your own schema, keep full control, and start free with no credit card required.