
query-optimizer

CuriousLearner

About

The query-optimizer skill analyzes and improves SQL query performance by identifying bottlenecks like inefficient joins, missing indexes, and full table scans. It provides actionable recommendations for index creation, query rewriting, and database tuning. Developers should use this skill to optimize slow queries and enhance overall database efficiency.

Documentation

Query Optimizer Skill

Analyze and optimize SQL queries for better performance and efficiency.

Instructions

You are a database performance optimization expert. When invoked:

  1. Analyze Query Performance:

    • Use EXPLAIN/EXPLAIN ANALYZE to understand the execution plan
    • Identify slow queries from logs
    • Measure query execution time
    • Detect full table scans and missing indexes
  2. Identify Bottlenecks:

    • Find N+1 query problems
    • Detect inefficient JOINs
    • Identify missing or unused indexes
    • Spot suboptimal WHERE clauses
  3. Optimize Queries:

    • Add appropriate indexes
    • Rewrite queries for better performance
    • Suggest caching strategies
    • Recommend query restructuring
  4. Provide Recommendations:

    • Index creation suggestions
    • Query rewriting alternatives
    • Database configuration tuning
    • Monitoring and alerting setup

Supported Databases

  • SQL: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite
  • Analysis Tools: EXPLAIN, EXPLAIN ANALYZE, Query Profiler
  • Monitoring: pg_stat_statements, slow query log, performance schema

Usage Examples

@query-optimizer
@query-optimizer --analyze-slow-queries
@query-optimizer --suggest-indexes
@query-optimizer --explain SELECT * FROM users WHERE email = 'john@example.com'
@query-optimizer --fix-n-plus-one

Query Analysis Tools

PostgreSQL - EXPLAIN ANALYZE

-- Basic EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM orders
WHERE user_id = 123
  AND created_at >= '2024-01-01';

Reading EXPLAIN Output:

Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=32)
  Filter: (active = true)

-- Seq Scan = Sequential Scan (full table scan) - BAD for large tables
-- cost=0.00..1234.56 = startup cost..total cost
-- rows=10000 = estimated rows
-- width=32 = average row size in bytes
Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=32)
  Index Cond: (email = 'john@example.com'::text)

-- Index Scan = Using index - GOOD
-- Much lower cost than Seq Scan
-- rows=1 = accurate estimate

MySQL - EXPLAIN

-- MySQL EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with execution stats (MySQL 8.0+)
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123;

-- Show warnings for optimization info
EXPLAIN
SELECT * FROM users WHERE email = 'john@example.com';
SHOW WARNINGS;
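
MySQL 8.0.16+ also supports a tree-format plan that reads much like PostgreSQL's output:

-- Tree-format execution plan (MySQL 8.0.16+)
EXPLAIN FORMAT=TREE
SELECT * FROM users WHERE email = 'john@example.com';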

MySQL EXPLAIN Output:

+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL    | NULL    | NULL  | 1000 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+

-- type=ALL means full table scan - BAD
-- key=NULL means no index used

+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys   | key             | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | idx_users_email | idx_users_email | 767     | const |    1 | NULL  |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+

-- type=ref means index lookup - GOOD
-- key shows index being used

Common Performance Issues

1. Missing Indexes

Problem:

-- Slow query - full table scan
SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN shows:
-- Seq Scan on users (cost=0.00..1500.00 rows=1 width=100)
--   Filter: (email = 'john@example.com')

Solution:

-- Add index on email column
CREATE INDEX idx_users_email ON users(email);

-- Now EXPLAIN shows:
-- Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=100)
--   Index Cond: (email = 'john@example.com')

-- Typically orders of magnitude faster, depending on table size

2. N+1 Query Problem

Problem:

// ORM code causing N+1 queries
const users = await User.findAll(); // 1 query

for (const user of users) {
  const orders = await Order.findAll({
    where: { userId: user.id }  // N queries (one per user)
  });
  console.log(`${user.name}: ${orders.length} orders`);
}

// Total: 1 + N queries for N users
// For 100 users = 101 queries!

Solution:

// Use eager loading - single query with JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    attributes: ['id', 'total_amount']
  }]
});

for (const user of users) {
  console.log(`${user.name}: ${user.orders.length} orders`);
}

// Total: 1 query regardless of user count

SQL Equivalent:

-- Instead of multiple queries:
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... (N more queries)

-- Use single JOIN query:
SELECT
  u.id,
  u.name,
  COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
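
Some ORMs batch the child lookup instead, which caps the work at two queries no matter how many users there are (a sketch; the id list would be collected from the first result):

-- Batched alternative: two queries total, not 1 + N
SELECT * FROM users;
SELECT * FROM orders WHERE user_id IN (1, 2, 3); -- ids gathered from the first query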

3. SELECT * Inefficiency

Problem:

-- Fetching all columns when only need few
SELECT * FROM products
WHERE category_id = 5;

-- Fetches: id, name, description (large text), image_url, specs (json),
--         price, stock, created_at, updated_at, etc.

Solution:

-- Only select needed columns
SELECT id, name, price, stock
FROM products
WHERE category_id = 5;

-- Benefits:
-- - Less data transferred
-- - Faster query execution
-- - Lower memory usage
-- - Can use covering indexes

4. Inefficient Pagination

Problem:

-- OFFSET becomes slow with large offsets
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Database must:
-- 1. Sort all rows
-- 2. Skip 10,000 rows
-- 3. Return next 20
-- Gets slower as offset increases

Solution:

-- Use cursor-based (keyset) pagination
-- Cursor values come from the last row of the previous page
SELECT * FROM users
WHERE created_at < '2024-01-01 12:00:00'
   OR (created_at = '2024-01-01 12:00:00' AND id < 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Or with indexed column:
SELECT * FROM users
WHERE id < 10000
ORDER BY id DESC
LIMIT 20;

-- Benefits:
-- - Consistent performance regardless of page
-- - Uses index efficiently
-- - No need to skip rows
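
On PostgreSQL, the two-column cursor can also be written as a row-value comparison, which pairs naturally with a matching composite index (a sketch, assuming the same users table as above):

-- Row-value keyset cursor (PostgreSQL)
CREATE INDEX idx_users_created_id ON users(created_at DESC, id DESC);

-- (a, b) < (x, y) means: a < x OR (a = x AND b < y)
SELECT * FROM users
WHERE (created_at, id) < ('2024-01-01 12:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;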

5. Function on Indexed Column

Problem:

-- Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

-- EXPLAIN shows Seq Scan (index not used)

Solution 1 - Store lowercase:

-- Add a generated column (PostgreSQL 12+ / MySQL 5.7+)
ALTER TABLE users ADD COLUMN email_lower VARCHAR(255)
  GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Query:
SELECT * FROM users
WHERE email_lower = 'john@example.com';

Solution 2 - Functional index (PostgreSQL):

-- Create index on function result
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now original query uses index
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

Solution 3 - Case-insensitive type (PostgreSQL citext):

-- PostgreSQL - use the citext type (requires the citext extension)
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Query without LOWER:
SELECT * FROM users WHERE email = 'john@example.com';
-- Automatically case-insensitive

6. Inefficient JOINs

Problem:

-- Multiple JOINs without proper indexes
SELECT
  u.username,
  o.id as order_id,
  p.name as product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE u.email = 'john@example.com';

-- Slow if missing indexes on:
-- - users.email
-- - orders.user_id
-- - order_items.order_id
-- - order_items.product_id

Solution:

-- Add necessary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Now query uses indexes for all JOINs
-- EXPLAIN will show "Index Scan" for each table

7. OR Conditions

Problem:

-- OR prevents efficient index usage
SELECT * FROM users
WHERE username = 'john' OR email = 'john@example.com';

-- May not use indexes optimally

Solution:

-- Use UNION for better index usage
SELECT * FROM users WHERE username = 'john'
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- Each subquery uses its own index
-- Deduplicates results automatically
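
When the two branches cannot return the same row, UNION ALL skips the deduplication step entirely (a sketch; the extra predicate guarantees disjoint results):

-- UNION ALL avoids the dedup sort when overlap is impossible
SELECT * FROM users WHERE username = 'john'
UNION ALL
SELECT * FROM users WHERE email = 'john@example.com' AND username <> 'john';
-- note: <> also drops NULL usernames; use username IS DISTINCT FROM 'john' if NULLs exist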

8. NOT IN with Subquery

Problem:

-- Slow subquery execution
SELECT * FROM users
WHERE id NOT IN (
  SELECT user_id FROM banned_users
);

-- Can be very slow with large subquery results
-- Worse: if banned_users.user_id can be NULL, NOT IN returns no rows at all

Solution:

-- Use LEFT JOIN with NULL check
SELECT u.*
FROM users u
LEFT JOIN banned_users bu ON u.id = bu.user_id
WHERE bu.user_id IS NULL;

-- Or use NOT EXISTS (often faster):
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM banned_users bu
  WHERE bu.user_id = u.id
);

Index Optimization

When to Add Indexes

Add indexes for:

  • Primary keys (automatic in most databases)
  • Foreign keys (critical for JOINs)
  • Columns in WHERE clauses
  • Columns in ORDER BY clauses
  • Columns in GROUP BY clauses
  • Columns in JOIN conditions
  • Columns with high cardinality (many unique values)

Index Types

B-Tree Index (Default):

-- Best for: equality (=) and range (<, >, BETWEEN) queries
CREATE INDEX idx_users_created_at ON users(created_at);

-- Good for:
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';

Composite Index:

-- Index on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Used for queries filtering both columns
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Also used for queries on first column only
SELECT * FROM orders WHERE user_id = 123;

-- NOT used for queries on second column only
SELECT * FROM orders WHERE status = 'pending'; -- Won't use this index

-- Column order matters: equality-filtered columns first; "most selective first" is a common rule of thumb

Partial Index (PostgreSQL):

-- Index only subset of rows
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users WHERE email = 'john@example.com' AND active = true;

GIN Index (PostgreSQL - for arrays, JSONB, full-text):

-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Query JSONB data
SELECT * FROM products
WHERE metadata @> '{"brand": "Apple"}';

-- For array columns
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- Query arrays
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];

Full-Text Search Index:

-- PostgreSQL
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));

-- Full-text search query
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
  @@ to_tsquery('english', 'laptop & gaming');
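
On PostgreSQL 12+, a stored generated tsvector column keeps the search expression and the index in sync automatically (a sketch on the same products table; COALESCE guards against a NULL description):

-- Generated tsvector column + GIN index (PostgreSQL 12+)
ALTER TABLE products ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', name || ' ' || COALESCE(description, ''))
  ) STORED;

CREATE INDEX idx_products_search_vector ON products USING GIN(search_vector);

-- Query without repeating the expression
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'laptop & gaming');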

Covering Index

Concept:

-- Covering index includes all columns needed by query
CREATE INDEX idx_users_email_username ON users(email, username);

-- This query can be answered entirely from index (no table access)
SELECT username FROM users WHERE email = 'john@example.com';

-- PostgreSQL: Index-Only Scan
-- MySQL: Using index

With INCLUDE (PostgreSQL 11+):

-- Include non-indexed columns in index leaf nodes
CREATE INDEX idx_users_email ON users(email)
INCLUDE (username, created_at);

-- Query can use index without table access
SELECT username, created_at
FROM users
WHERE email = 'john@example.com';

Index Maintenance

Find Unused Indexes (PostgreSQL):

SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes to save space and improve write performance
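
On a busy table, the concurrent variants avoid holding an exclusive lock for the whole operation (PostgreSQL; idx_unused_example is a hypothetical name standing in for one found above):

-- Non-blocking drop/create (cannot run inside a transaction block)
DROP INDEX CONCURRENTLY IF EXISTS idx_unused_example;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);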

Find Duplicate Indexes:

-- PostgreSQL query to find duplicate indexes
SELECT
  indrelid::regclass AS table_name,
  array_agg(indexrelid::regclass) AS indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;

Rebuild Fragmented Indexes:

-- PostgreSQL
REINDEX INDEX idx_users_email;
REINDEX TABLE users;

-- MySQL
OPTIMIZE TABLE users;
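
PostgreSQL 12+ can also rebuild an index without blocking writes:

-- Non-blocking rebuild (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;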

Query Rewriting Examples

Example 1: Aggregation Optimization

Before:

SELECT
  u.id,
  u.username,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count,
  (SELECT SUM(total_amount) FROM orders WHERE user_id = u.id) as total_spent
FROM users u
WHERE u.active = true;

-- Correlated subqueries re-execute once per user row (an N+1 inside one statement)

After:

SELECT
  u.id,
  u.username,
  COUNT(o.id) as order_count,
  COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- Single query with JOIN
-- Much faster!
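
When the per-user aggregates get more involved (different filters per aggregate, top-N), PostgreSQL's LATERAL join keeps everything in one statement (a sketch using the same tables):

-- One statement, one correlated scan per user, no N+1 from the application
SELECT u.id, u.username, o.order_count, o.total_spent
FROM users u
LEFT JOIN LATERAL (
  SELECT COUNT(*) AS order_count,
         COALESCE(SUM(total_amount), 0) AS total_spent
  FROM orders
  WHERE user_id = u.id
) o ON true
WHERE u.active = true;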

Example 2: EXISTS vs IN

Before:

SELECT * FROM products
WHERE id IN (
  SELECT product_id FROM order_items
  WHERE created_at >= '2024-01-01'
);

-- Subquery returns all product_ids (potentially large result set)

After:

SELECT p.* FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi
  WHERE oi.product_id = p.id
    AND oi.created_at >= '2024-01-01'
);

-- EXISTS can stop at the first match; modern planners often rewrite IN and
-- EXISTS to the same semi-join plan, so verify with EXPLAIN

Example 3: Avoid Cartesian Products

Before:

-- Accidental cartesian product
SELECT *
FROM users u, orders o
WHERE u.active = true
  AND o.status = 'completed';

-- Returns every user combined with every completed order!
-- Missing JOIN condition

After:

SELECT u.*, o.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.active = true
  AND o.status = 'completed';

-- Proper JOIN condition

Example 4: Optimize DISTINCT

Before:

SELECT DISTINCT user_id
FROM orders
WHERE status = 'completed';

-- DISTINCT requires sorting/deduplication

After:

SELECT user_id
FROM orders
WHERE status = 'completed'
GROUP BY user_id;

-- GROUP BY is sometimes faster than DISTINCT (many planners treat them identically)
-- PostgreSQL only: DISTINCT ON keeps one row per user_id (here, the latest):
SELECT DISTINCT ON (user_id) user_id, created_at
FROM orders
WHERE status = 'completed'
ORDER BY user_id, created_at DESC;

Monitoring Slow Queries

PostgreSQL - pg_stat_statements

-- Enable extension (requires shared_preload_libraries = 'pg_stat_statements'
-- in postgresql.conf and a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries
SELECT
  substring(query, 1, 50) AS short_query,
  round(total_exec_time::numeric, 2) AS total_time,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find queries with most calls
SELECT
  substring(query, 1, 50) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

MySQL - Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze slow query log
-- Use mysqldumpslow tool:
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log

Performance Schema (MySQL)

-- performance_schema is read-only at runtime; enable it at startup in my.cnf
-- ([mysqld] section: performance_schema = ON), then restart the server

-- Find slowest statements
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS executions,
  ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
  ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
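
MySQL's sys schema (5.7.7+) wraps the same digest data in friendlier views (a sketch; the view is pre-sorted by total latency, with human-readable latency columns):

-- Pre-digested view over performance_schema
SELECT query, exec_count, total_latency, avg_latency
FROM sys.statement_analysis
LIMIT 10;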

Best Practices

DO ✓

  • Use EXPLAIN before and after optimization
  • Add indexes on foreign keys - Critical for JOINs
  • Use covering indexes when possible
  • Paginate large result sets - Avoid loading all data
  • Monitor query performance - Use pg_stat_statements or slow query log
  • Test on production-like data - Performance differs with data volume
  • Use connection pooling - Reduce connection overhead
  • Cache frequently accessed data - Redis, Memcached
  • Archive old data - Keep active tables smaller
  • Regular VACUUM/ANALYZE (PostgreSQL) - Update statistics (see the sketch after this list)
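
A minimal maintenance pass might look like this (PostgreSQL; autovacuum usually covers it, but manual runs help after bulk loads):

-- Refresh planner statistics and reclaim dead tuples
ANALYZE users;
VACUUM (ANALYZE) orders;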

DON'T ✗

  • Don't use SELECT * - Fetch only needed columns
  • Don't over-index - Each index slows down writes
  • Don't ignore EXPLAIN warnings - They indicate problems
  • Don't use functions on indexed columns - Prevents index usage
  • Don't fetch more data than needed - Use LIMIT
  • Don't use OFFSET for deep pagination - Use cursor-based instead
  • Don't ignore database logs - Monitor for errors
  • Don't optimize prematurely - Profile first, optimize bottlenecks
  • Don't forget about write performance - Indexes slow down INSERTs
  • Don't skip testing - Verify optimizations actually help

Query Optimization Checklist


- [ ] Run EXPLAIN/EXPLAIN ANALYZE on query
- [ ] Check if query uses indexes (no Seq Scan on large tables)
- [ ] Verify indexes exist on:
  - [ ] Foreign key columns
  - [ ] WHERE clause columns
  - [ ] JOIN condition columns
  - [ ] ORDER BY columns
- [ ] SELECT only needed columns (avoid SELECT *)
- [ ] Use appropriate JOIN type (INNER vs LEFT)
- [ ] Avoid N+1 queries (use JOINs or eager loading)
- [ ] Use pagination for large result sets
- [ ] Check for unused indexes (slow down writes)
- [ ] Consider query caching for frequent queries
- [ ] Test with production-like data volumes
- [ ] Monitor query performance over time

Notes

  • Always measure before and after optimization
  • Index creation can take time on large tables
  • Too many indexes slow down INSERT/UPDATE/DELETE
  • Keep database statistics up to date (ANALYZE)
  • Consider read replicas for read-heavy workloads
  • Use database-specific features when beneficial
  • Document optimization decisions for team
  • Regular performance audits prevent degradation

Quick Install

/plugin add https://github.com/CuriousLearner/devkit/tree/main/query-optimizer

Copy and paste this command into Claude Code to install this skill.

GitHub Repository

CuriousLearner/devkit
Path: skills/query-optimizer

Related Skills

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.


algorithmic-art

Meta

This Claude Skill creates original algorithmic art using p5.js with seeded randomness and interactive parameters. It generates .md files for algorithmic philosophies, plus .html and .js files for interactive generative art implementations. Use it when developers need to create flow fields, particle systems, or other computational art while avoiding copyright issues.


executing-plans

Design

Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.


cost-optimization

Other

This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.
