Mastering SQL Techniques: NULL Handling, Deduplication, Pagination, and Special Keywords
by Eugene Koshy
📌 Table of Contents
- Introduction
- 1. Handling NULL Values
- 2. DISTINCT Keyword
- 3. LIMIT and OFFSET for Pagination
- 4. Special SQL Keywords by Database
- Conclusion and Best Practices
🛠 Introduction
SQL is the backbone of relational databases, but mastering its nuances is critical for performance and accuracy. This guide covers four essential areas—NULL handling, deduplication, pagination, and vendor-specific keywords—with real-world scenarios and best practices.
1️⃣ Handling NULL Values
NULL values represent missing or undefined data, which can lead to incorrect calculations, skewed reports, or even application crashes.
🔹 Why NULL Handling Matters
- Data Integrity: arithmetic involving NULL yields NULL, and aggregates such as SUM and AVG silently skip NULL rows.
- Query Accuracy: WHERE column = NULL does not work (use IS NULL).
- Performance: indexes may exclude NULLs, affecting query plans.
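A minimal sketch of the propagation rule in action (the employees table and its bonus column are assumed for illustration; the bare SELECT works in PostgreSQL and MySQL, while Oracle requires FROM dual):

```sql
-- Any arithmetic involving NULL yields NULL:
SELECT 100 + NULL;  -- result: NULL, not 100

-- AVG divides by the count of non-NULL values only,
-- so missing bonuses silently inflate the average:
SELECT AVG(bonus) FROM employees;

-- Coalescing to 0 first treats a missing bonus as zero instead:
SELECT AVG(COALESCE(bonus, 0)) FROM employees;
```

Which of the two averages is "correct" depends on whether NULL means "no bonus" or "bonus unknown" in your data model.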
🔹 Core Functions and Operators
COALESCE() – Returns the first non-NULL value
SELECT COALESCE(employee_name, 'Unknown') AS name FROM employees;
IS NULL vs. = NULL – Correct way to check for NULL
SELECT * FROM orders WHERE discount IS NULL; -- ✅ Correct way
-- SELECT * FROM orders WHERE discount = NULL; -- ❌ NULL = NULL evaluates to UNKNOWN, so no rows match
Database-Specific NULL Functions
- MySQL: IFNULL() replaces NULL with a fallback value.
  SELECT IFNULL(salary, 0) FROM employees;
- Oracle: NVL() does the same.
  SELECT NVL(address, 'Not Provided') FROM customers;
- PostgreSQL (and standard SQL): NULLIF() turns a sentinel value into NULL, here so zero-revenue rows don't drag down the average.
  SELECT AVG(NULLIF(revenue, 0)) FROM sales;
🔹 Handling NULLs in Aggregations
SELECT COUNT(*), COUNT(salary) FROM employees;
- COUNT(*) counts all rows, including those with a NULL salary.
- COUNT(salary) counts only rows where salary is not NULL.
2️⃣ DISTINCT Keyword
The DISTINCT keyword removes duplicate rows but can be inefficient for large datasets.
🔹 DISTINCT vs. GROUP BY
SELECT city FROM users GROUP BY city; -- Often faster with indexes
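One reason to prefer GROUP BY: it extends naturally to aggregation, so the same query shape can also surface the duplicates that DISTINCT would silently hide (table and column names follow the example above):

```sql
-- Cities that appear more than once in users:
SELECT city, COUNT(*) AS occurrences
FROM users
GROUP BY city
HAVING COUNT(*) > 1;
```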
🔹 PostgreSQL’s DISTINCT ON
SELECT DISTINCT ON (customer_id) customer_id, order_date FROM orders ORDER BY customer_id, order_date DESC;
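DISTINCT ON is PostgreSQL-only. On other engines, the same "latest order per customer" result can be sketched with a standard window function (ROW_NUMBER is part of SQL:2003 and supported by PostgreSQL, Oracle, SQL Server, and MySQL 8+):

```sql
-- Keep only the most recent order per customer:
SELECT customer_id, order_date
FROM (
  SELECT customer_id,
         order_date,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY order_date DESC) AS rn
  FROM orders
) ranked
WHERE rn = 1;
```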
3️⃣ LIMIT and OFFSET for Pagination
🔹 The Problem with OFFSET
Using OFFSET 10000 forces the database to scan and discard 10,000 rows, slowing queries.
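For reference, the pattern to avoid looks like this; the database must still produce and throw away all skipped rows on every page request:

```sql
-- Page 1001 at 10 rows per page:
-- 10,000 rows are scanned and discarded before the 10 you want.
SELECT * FROM employees
ORDER BY id
LIMIT 10 OFFSET 10000;
```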
🔹 Keyset Pagination (Seek Method) – Faster Alternative
SELECT * FROM employees WHERE id > 100 ORDER BY id LIMIT 10;
| Metric | LIMIT/OFFSET | Keyset |
|---|---|---|
| Speed | Slower | Faster |
| Scalability | Poor | Excellent |
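When the sort key is not unique (for example, paginating by date), keyset pagination needs a tie-breaker column so rows are neither skipped nor repeated at page boundaries. A sketch assuming a created_at column exists; the row-value comparison syntax works in PostgreSQL and MySQL, while other engines need an expanded OR form:

```sql
-- Resume after the last row of the previous page,
-- identified by its (created_at, id) pair:
SELECT * FROM employees
WHERE (created_at, id) > ('2024-01-15', 100)
ORDER BY created_at, id
LIMIT 10;
```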
4️⃣ Special SQL Keywords by Database
🔹 PostgreSQL: RETURNING
INSERT INTO logs (message, severity) VALUES ('Login failed', 'HIGH') RETURNING log_id, created_at;
🔹 Oracle: MERGE
MERGE INTO inventory_target t USING inventory_source s ON (t.product_id = s.product_id)
WHEN MATCHED THEN UPDATE SET t.stock = t.stock + s.restock
WHEN NOT MATCHED THEN INSERT (product_id, stock) VALUES (s.product_id, s.restock);
🔹 MySQL: ON DUPLICATE KEY UPDATE
INSERT INTO users (email, last_login) VALUES ('alice@example.com', NOW())
ON DUPLICATE KEY UPDATE last_login = NOW();
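For comparison, PostgreSQL's equivalent upsert uses ON CONFLICT (available since 9.5), which also composes with the RETURNING clause shown earlier:

```sql
INSERT INTO users (email, last_login)
VALUES ('alice@example.com', NOW())
ON CONFLICT (email) DO UPDATE
SET last_login = EXCLUDED.last_login;
```

This requires a unique constraint or index on email, just as ON DUPLICATE KEY UPDATE does in MySQL.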
🔹 Common Table Expressions (CTEs)
WITH RECURSIVE org_chart AS (
SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;
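A common refinement is tracking each employee's depth in the hierarchy; a sketch extending the query above:

```sql
WITH RECURSIVE org_chart AS (
  -- Anchor: top-level employees have depth 0.
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  -- Recursive step: each report is one level deeper than its manager.
  SELECT e.id, e.name, e.manager_id, oc.depth + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY depth;
```

Note that Oracle accepts this syntax without the RECURSIVE keyword (WITH org_chart AS ...).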
✅ Conclusion and Best Practices
NULL Handling
✔ Use COALESCE for default values and NULLIF to exclude sentinel values from aggregates.
❌ Avoid WHERE column = NULL; use IS NULL instead.
Deduplication
✔ Prefer GROUP BY over DISTINCT for large datasets.
❌ Don’t overuse DISTINCT without query optimization.
Pagination
✔ Use Keyset Pagination for scalable results.
❌ Avoid OFFSET for tables with >10k rows.
Vendor-Specific Optimizations
✔ PostgreSQL’s RETURNING for auditing.
✔ Oracle’s MERGE for upserts.
✔ MySQL’s ON DUPLICATE KEY UPDATE for simplicity.
#SQL #DatabaseOptimization #DataEngineering #TechTips #PostgreSQL #DataAnalysis #SoftwareEngineering