Mastering SQL Techniques: NULL Handling, Deduplication, Pagination, and Special Keywords
by Eugene Koshy
📌 Table of Contents
- Introduction
- 1. Handling NULL Values
- 2. DISTINCT Keyword
- 3. LIMIT and OFFSET for Pagination
- 4. Special SQL Keywords by Database
- Conclusion and Best Practices
🛠 Introduction
SQL is the backbone of relational databases, but mastering its nuances is critical for performance and accuracy. This guide covers four essential areas—NULL handling, deduplication, pagination, and vendor-specific keywords—with real-world scenarios and best practices.
1️⃣ Handling NULL Values
NULL values represent missing or undefined data, which can lead to incorrect calculations, skewed reports, or even application crashes.
🔹 Why NULL Handling Matters
- Data Integrity: NULLs propagate through arithmetic expressions, and aggregates like `SUM` and `AVG` silently skip them.
- Query Accuracy: `WHERE column = NULL` does not work (use `IS NULL`).
- Performance: Indexes may exclude NULLs, affecting query plans.
🔹 Core Functions and Operators

`COALESCE()` – returns the first non-NULL value:

```sql
SELECT COALESCE(employee_name, 'Unknown') AS name FROM employees;
```

`IS NULL` vs. `= NULL` – the correct way to check for NULL:

```sql
SELECT * FROM orders WHERE discount IS NULL; -- ✅ Correct way
```
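The `= NULL` pitfall is easy to demonstrate with Python's built-in sqlite3 module (a minimal sketch; the `orders` table and its rows are invented for the demo, and SQLite implements `COALESCE` and `IS NULL` just like the databases above):

```python
import sqlite3

# In-memory demo table; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, discount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 0.1), (2, None), (3, None)])

# '= NULL' evaluates to UNKNOWN for every row, so nothing matches.
eq_null = conn.execute("SELECT id FROM orders WHERE discount = NULL").fetchall()
print(eq_null)  # []

# 'IS NULL' is the correct predicate.
is_null = conn.execute("SELECT id FROM orders WHERE discount IS NULL").fetchall()
print(is_null)  # [(2,), (3,)]

# COALESCE substitutes a default for NULL.
rows = conn.execute(
    "SELECT id, COALESCE(discount, 0) FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, 0.1), (2, 0), (3, 0)]
```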
Database-Specific NULL Functions

MySQL: `IFNULL()`

```sql
SELECT IFNULL(salary, 0) FROM employees;
```

Oracle: `NVL()`

```sql
SELECT NVL(address, 'Not Provided') FROM customers;
```

PostgreSQL: `NULLIF()` (standard SQL, also available in MySQL and Oracle)

```sql
SELECT AVG(NULLIF(revenue, 0)) FROM sales;
```
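Both patterns can be exercised via sqlite3, which happens to support `IFNULL` and `NULLIF` as well (a hedged sketch; the `sales` table and its values are invented):

```python
import sqlite3

# Hypothetical revenue figures, including a zero and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100.0,), (0.0,), (None,), (200.0,)])

# IFNULL substitutes a default (MySQL/SQLite; Oracle spells this NVL).
total = conn.execute("SELECT SUM(IFNULL(revenue, 0)) FROM sales").fetchone()[0]
print(total)  # 300.0

# NULLIF(revenue, 0) turns 0 into NULL, so AVG skips zeros as well as NULLs.
avg_nonzero = conn.execute("SELECT AVG(NULLIF(revenue, 0)) FROM sales").fetchone()[0]
print(avg_nonzero)  # 150.0
```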
🔹 Handling NULLs in Aggregations

```sql
SELECT COUNT(*), COUNT(salary) FROM employees;
```

- `COUNT(*)` counts all rows, including those with NULLs.
- `COUNT(salary)` ignores rows where `salary` is NULL.
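The difference is observable directly (a small sqlite3 sketch; the `employees` rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", 50000), ("Bob", None), ("Cid", 60000)])

# COUNT(*) counts rows; COUNT(salary) counts only non-NULL salaries.
total, with_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary) FROM employees").fetchone()
print(total, with_salary)  # 3 2
```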
2️⃣ DISTINCT Keyword
The `DISTINCT` keyword removes duplicate rows but can be inefficient for large datasets.
🔹 DISTINCT vs. GROUP BY
```sql
SELECT DISTINCT city FROM users;
SELECT city FROM users GROUP BY city; -- Often faster with indexes
```
🔹 PostgreSQL’s DISTINCT ON
```sql
SELECT DISTINCT ON (customer_id) customer_id, order_date
FROM orders ORDER BY customer_id, order_date DESC;
```
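`DISTINCT ON` is PostgreSQL-specific; a portable equivalent of "latest order per customer" uses `ROW_NUMBER()`, which most engines (including SQLite 3.25+) support. A hedged sketch with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2024-01-05"), (1, "2024-03-10"), (2, "2024-02-01"),
])

# Rank each customer's orders newest-first, then keep rank 1 —
# the same result DISTINCT ON would give in PostgreSQL.
rows = conn.execute("""
    SELECT customer_id, order_date FROM (
        SELECT customer_id, order_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM orders
    ) WHERE rn = 1 ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, '2024-03-10'), (2, '2024-02-01')]
```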
3️⃣ LIMIT and OFFSET for Pagination
🔹 The Problem with OFFSET
Using `OFFSET 10000` forces the database to scan and discard 10,000 rows before returning anything, slowing queries.
🔹 Keyset Pagination (Seek Method) – Faster Alternative
```sql
SELECT * FROM employees WHERE id > 100 ORDER BY id LIMIT 10;
```
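The seek pattern is easy to wrap in a helper: each page query carries the last `id` seen instead of an offset (a minimal sqlite3 sketch; `page_after` and the 30-row table are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in range(1, 31)])

def page_after(last_id, size=10):
    # Keyset pagination: seek past the last id seen, no OFFSET scan.
    return [r[0] for r in conn.execute(
        "SELECT id FROM employees WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size))]

first = page_after(0)
second = page_after(first[-1])  # resume from the last id of page one
print(second)  # [11, 12, ..., 20]
```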
| Metric | LIMIT/OFFSET | Keyset |
|---|---|---|
| Speed | Slower | Faster |
| Scalability | Poor | Excellent |
4️⃣ Special SQL Keywords by Database
🔹 PostgreSQL: RETURNING
```sql
INSERT INTO logs (message, severity) VALUES ('Login failed', 'HIGH')
RETURNING log_id, created_at;
```
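When `RETURNING` isn't available (SQLite gained it only in 3.35, and MySQL lacks it for `INSERT`), driver APIs expose the generated key another way — in Python's sqlite3, via `cursor.lastrowid`. A hedged fallback sketch with an invented `logs` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (log_id INTEGER PRIMARY KEY, message TEXT)")

# lastrowid reports the auto-generated primary key of the last INSERT,
# a portable stand-in for RETURNING log_id.
cur = conn.execute("INSERT INTO logs (message) VALUES (?)", ("Login failed",))
print(cur.lastrowid)  # 1
```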
🔹 Oracle: MERGE
```sql
MERGE INTO inventory_target t USING inventory_source s ON (t.product_id = s.product_id)
WHEN MATCHED THEN UPDATE SET t.stock = t.stock + s.restock
WHEN NOT MATCHED THEN INSERT (product_id, stock) VALUES (s.product_id, s.restock);
```
🔹 MySQL: ON DUPLICATE KEY UPDATE
```sql
INSERT INTO users (email, last_login) VALUES ('alice@example.com', NOW())
ON DUPLICATE KEY UPDATE last_login = NOW();
```
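SQLite and PostgreSQL spell the same upsert as `INSERT ... ON CONFLICT DO UPDATE`, which makes the behavior easy to verify locally (a hedged sketch; the `users` table and the timestamps are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, last_login TEXT)")

# 'excluded' refers to the row that would have been inserted.
sql = """INSERT INTO users (email, last_login) VALUES (?, ?)
         ON CONFLICT(email) DO UPDATE SET last_login = excluded.last_login"""
conn.execute(sql, ("alice@example.com", "2024-01-01"))
conn.execute(sql, ("alice@example.com", "2024-06-01"))  # key exists: updates

rows = conn.execute("SELECT email, last_login FROM users").fetchall()
print(rows)  # [('alice@example.com', '2024-06-01')]
```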
🔹 Common Table Expressions (CTEs)
```sql
WITH RECURSIVE org_chart AS (
    SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;
```
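The recursive walk above runs unchanged in SQLite, so it can be tried with sqlite3 (a sketch; the three-person org is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "CEO", None), (2, "VP", 1), (3, "Eng", 2),
])

# Anchor: the manager-less root; recursive step: everyone reporting
# to someone already in org_chart.
rows = conn.execute("""
    WITH RECURSIVE org_chart AS (
        SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
    )
    SELECT id, name FROM org_chart ORDER BY id
""").fetchall()
print(rows)  # [(1, 'CEO'), (2, 'VP'), (3, 'Eng')]
```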
✅ Conclusion and Best Practices
NULL Handling
✔ Use `COALESCE` for defaults, `NULLIF` to exclude values.
❌ Avoid `WHERE column = NULL`; use `IS NULL` instead.

Deduplication
✔ Prefer `GROUP BY` over `DISTINCT` for large datasets.
❌ Don't overuse `DISTINCT` without query optimization.

Pagination
✔ Use keyset pagination for scalable results.
❌ Avoid `OFFSET` for tables with >10k rows.

Vendor-Specific Optimizations
✔ PostgreSQL's `RETURNING` for auditing.
✔ Oracle's `MERGE` for upserts.
✔ MySQL's `ON DUPLICATE KEY UPDATE` for simplicity.
#SQL #DatabaseOptimization #DataEngineering #TechTips #PostgreSQL #DataAnalysis #SoftwareEngineering