Mastering SQL Techniques: NULL Handling, Deduplication, Pagination, and Special Keywords
by Eugene Koshy
📌 Table of Contents
- Introduction
- 1. Handling NULL Values
- 2. DISTINCT Keyword
- 3. LIMIT and OFFSET for Pagination
- 4. Special SQL Keywords by Database
- Conclusion and Best Practices
🛠 Introduction
SQL is the backbone of relational databases, but mastering its nuances is critical for performance and accuracy. This guide covers four essential areas—NULL handling, deduplication, pagination, and vendor-specific keywords—with real-world scenarios and best practices.
1️⃣ Handling NULL Values
NULL values represent missing or undefined data, which can lead to incorrect calculations, skewed reports, or even application crashes.
🔹 Why NULL Handling Matters
- Data Integrity: NULLs propagate through arithmetic expressions, and aggregates like `SUM` and `AVG` silently skip them.
- Query Accuracy: `WHERE column = NULL` does not work (use `IS NULL`).
- Performance: Indexes may exclude NULLs, affecting query plans.
🔹 Core Functions and Operators

`COALESCE()` – returns the first non-NULL value:

```sql
SELECT COALESCE(employee_name, 'Unknown') AS name FROM employees;
```

`IS NULL` vs. `= NULL` – the correct way to check for NULL:

```sql
SELECT * FROM orders WHERE discount IS NULL; -- ✅ Correct way
```
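The `= NULL` pitfall is easy to demonstrate with Python's built-in sqlite3 module (a minimal sketch; the `orders` table and its rows are invented for the demo, and SQLite implements `COALESCE` and `IS NULL` just like the databases above):

```python
import sqlite3

# In-memory demo table; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, discount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 0.1), (2, None), (3, None)])

# '= NULL' evaluates to UNKNOWN for every row, so nothing matches.
eq_null = conn.execute("SELECT id FROM orders WHERE discount = NULL").fetchall()
print(eq_null)  # []

# 'IS NULL' is the correct predicate.
is_null = conn.execute("SELECT id FROM orders WHERE discount IS NULL").fetchall()
print(is_null)  # [(2,), (3,)]

# COALESCE substitutes a default for NULL.
rows = conn.execute(
    "SELECT id, COALESCE(discount, 0) FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, 0.1), (2, 0), (3, 0)]
```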
Database-Specific NULL Functions

MySQL: `IFNULL()`

```sql
SELECT IFNULL(salary, 0) FROM employees;
```

Oracle: `NVL()`

```sql
SELECT NVL(address, 'Not Provided') FROM customers;
```

PostgreSQL: `NULLIF()` (standard SQL, also available in MySQL and Oracle)

```sql
SELECT AVG(NULLIF(revenue, 0)) FROM sales;
```
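Both patterns can be exercised via sqlite3, which happens to support `IFNULL` and `NULLIF` as well (a hedged sketch; the `sales` table and its values are invented):

```python
import sqlite3

# Hypothetical revenue figures, including a zero and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100.0,), (0.0,), (None,), (200.0,)])

# IFNULL substitutes a default (MySQL/SQLite; Oracle spells this NVL).
total = conn.execute("SELECT SUM(IFNULL(revenue, 0)) FROM sales").fetchone()[0]
print(total)  # 300.0

# NULLIF(revenue, 0) turns 0 into NULL, so AVG skips zeros as well as NULLs.
avg_nonzero = conn.execute("SELECT AVG(NULLIF(revenue, 0)) FROM sales").fetchone()[0]
print(avg_nonzero)  # 150.0
```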
🔹 Handling NULLs in Aggregations

```sql
SELECT COUNT(*), COUNT(salary) FROM employees;
```

- `COUNT(*)` counts all rows, including those with NULLs.
- `COUNT(salary)` ignores rows where `salary` is NULL.
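The difference is observable directly (a small sqlite3 sketch; the `employees` rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", 50000), ("Bob", None), ("Cid", 60000)])

# COUNT(*) counts rows; COUNT(salary) counts only non-NULL salaries.
total, with_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary) FROM employees").fetchone()
print(total, with_salary)  # 3 2
```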
2️⃣ DISTINCT Keyword
The `DISTINCT` keyword removes duplicate rows but can be inefficient for large datasets.
🔹 DISTINCT vs. GROUP BY
```sql
SELECT DISTINCT city FROM users;
SELECT city FROM users GROUP BY city; -- Often faster with indexes
```
🔹 PostgreSQL’s DISTINCT ON
```sql
SELECT DISTINCT ON (customer_id) customer_id, order_date
FROM orders ORDER BY customer_id, order_date DESC;
```
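`DISTINCT ON` is PostgreSQL-specific; a portable equivalent of "latest order per customer" uses `ROW_NUMBER()`, which most engines (including SQLite 3.25+) support. A hedged sketch with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2024-01-05"), (1, "2024-03-10"), (2, "2024-02-01"),
])

# Rank each customer's orders newest-first, then keep rank 1 —
# the same result DISTINCT ON would give in PostgreSQL.
rows = conn.execute("""
    SELECT customer_id, order_date FROM (
        SELECT customer_id, order_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM orders
    ) WHERE rn = 1 ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, '2024-03-10'), (2, '2024-02-01')]
```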
3️⃣ LIMIT and OFFSET for Pagination
🔹 The Problem with OFFSET
Using `OFFSET 10000` forces the database to scan and discard 10,000 rows before returning anything, slowing queries.
🔹 Keyset Pagination (Seek Method) – Faster Alternative
```sql
SELECT * FROM employees WHERE id > 100 ORDER BY id LIMIT 10;
```
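The seek pattern is easy to wrap in a helper: each page query carries the last `id` seen instead of an offset (a minimal sqlite3 sketch; `page_after` and the 30-row table are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in range(1, 31)])

def page_after(last_id, size=10):
    # Keyset pagination: seek past the last id seen, no OFFSET scan.
    return [r[0] for r in conn.execute(
        "SELECT id FROM employees WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size))]

first = page_after(0)
second = page_after(first[-1])  # resume from the last id of page one
print(second)  # [11, 12, ..., 20]
```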
| Metric | LIMIT/OFFSET | Keyset |
|---|---|---|
| Speed | Slower | Faster |
| Scalability | Poor | Excellent |
4️⃣ Special SQL Keywords by Database
🔹 PostgreSQL: RETURNING
```sql
INSERT INTO logs (message, severity) VALUES ('Login failed', 'HIGH')
RETURNING log_id, created_at;
```
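When `RETURNING` isn't available (SQLite gained it only in 3.35, and MySQL lacks it for `INSERT`), driver APIs expose the generated key another way — in Python's sqlite3, via `cursor.lastrowid`. A hedged fallback sketch with an invented `logs` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (log_id INTEGER PRIMARY KEY, message TEXT)")

# lastrowid reports the auto-generated primary key of the last INSERT,
# a portable stand-in for RETURNING log_id.
cur = conn.execute("INSERT INTO logs (message) VALUES (?)", ("Login failed",))
print(cur.lastrowid)  # 1
```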
🔹 Oracle: MERGE
```sql
MERGE INTO inventory_target t USING inventory_source s ON (t.product_id = s.product_id)
WHEN MATCHED THEN UPDATE SET t.stock = t.stock + s.restock
WHEN NOT MATCHED THEN INSERT (product_id, stock) VALUES (s.product_id, s.restock);
```
🔹 MySQL: ON DUPLICATE KEY UPDATE
```sql
INSERT INTO users (email, last_login) VALUES ('alice@example.com', NOW())
ON DUPLICATE KEY UPDATE last_login = NOW();
```
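SQLite and PostgreSQL spell the same upsert as `INSERT ... ON CONFLICT DO UPDATE`, which makes the behavior easy to verify locally (a hedged sketch; the `users` table and the timestamps are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, last_login TEXT)")

# 'excluded' refers to the row that would have been inserted.
sql = """INSERT INTO users (email, last_login) VALUES (?, ?)
         ON CONFLICT(email) DO UPDATE SET last_login = excluded.last_login"""
conn.execute(sql, ("alice@example.com", "2024-01-01"))
conn.execute(sql, ("alice@example.com", "2024-06-01"))  # key exists: updates

rows = conn.execute("SELECT email, last_login FROM users").fetchall()
print(rows)  # [('alice@example.com', '2024-06-01')]
```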
🔹 Common Table Expressions (CTEs)
```sql
WITH RECURSIVE org_chart AS (
    SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;
```
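The recursive walk above runs unchanged in SQLite, so it can be tried with sqlite3 (a sketch; the three-person org is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "CEO", None), (2, "VP", 1), (3, "Eng", 2),
])

# Anchor: the manager-less root; recursive step: everyone reporting
# to someone already in org_chart.
rows = conn.execute("""
    WITH RECURSIVE org_chart AS (
        SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
    )
    SELECT id, name FROM org_chart ORDER BY id
""").fetchall()
print(rows)  # [(1, 'CEO'), (2, 'VP'), (3, 'Eng')]
```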
✅ Conclusion and Best Practices
NULL Handling
✔ Use `COALESCE` for defaults, `NULLIF` to exclude values.
❌ Avoid `WHERE column = NULL`; use `IS NULL` instead.

Deduplication
✔ Prefer `GROUP BY` over `DISTINCT` for large datasets.
❌ Don't overuse `DISTINCT` without query optimization.

Pagination
✔ Use keyset pagination for scalable results.
❌ Avoid `OFFSET` for tables with >10k rows.

Vendor-Specific Optimizations
✔ PostgreSQL's `RETURNING` for auditing.
✔ Oracle's `MERGE` for upserts.
✔ MySQL's `ON DUPLICATE KEY UPDATE` for simplicity.
#SQL #DatabaseOptimization #DataEngineering #TechTips #PostgreSQL #DataAnalysis #SoftwareEngineering