Linkedin_Articles

View on GitHub

Mastering SQL Techniques: NULL Handling, Deduplication, Pagination, and Special Keywords

by Eugene Koshy

Mastering SQL Techniques: NULL Handling, Deduplication, Pagination, and Special Keywords

📌 Table of Contents


🛠 Introduction

SQL is the backbone of relational databases, but mastering its nuances is critical for performance and accuracy. This guide covers four essential areas—NULL handling, deduplication, pagination, and vendor-specific keywords—with real-world scenarios and best practices.


1️⃣ Handling NULL Values

NULL values represent missing or undefined data, which can lead to incorrect calculations, skewed reports, or even application crashes.

🔹 Why NULL Handling Matters

🔹 Core Functions and Operators

COALESCE() – Returns the first non-NULL value

SELECT COALESCE(employee_name, 'Unknown') AS name FROM employees;

IS NULL vs. = NULL – Correct way to check for NULL

SELECT * FROM orders WHERE discount IS NULL;  -- ✅ Correct way

Database-Specific NULL Functions

🔹 Handling NULLs in Aggregations

SELECT COUNT(*), COUNT(salary) FROM employees;

2️⃣ DISTINCT Keyword

The DISTINCT keyword removes duplicate rows but can be inefficient for large datasets.

🔹 DISTINCT vs. GROUP BY

SELECT city FROM users GROUP BY city;  -- Often faster with indexes

🔹 PostgreSQL’s DISTINCT ON

SELECT DISTINCT ON (customer_id) customer_id, order_date FROM orders ORDER BY customer_id, order_date DESC;

3️⃣ LIMIT and OFFSET for Pagination

🔹 The Problem with OFFSET

Using OFFSET 10000 forces the database to scan and discard 10,000 rows, slowing queries.

🔹 Keyset Pagination (Seek Method) – Faster Alternative

SELECT * FROM employees WHERE id > 100 ORDER BY id LIMIT 10;
Metric LIMIT/OFFSET Keyset
Speed Slower Faster
Scalability Poor Excellent

4️⃣ Special SQL Keywords by Database

🔹 PostgreSQL: RETURNING

INSERT INTO logs (message, severity) VALUES ('Login failed', 'HIGH') RETURNING log_id, created_at;

🔹 Oracle: MERGE

MERGE INTO inventory_target t USING inventory_source s ON (t.product_id = s.product_id)
WHEN MATCHED THEN UPDATE SET t.stock = t.stock + s.restock
WHEN NOT MATCHED THEN INSERT (product_id, stock) VALUES (s.product_id, s.restock);

🔹 MySQL: ON DUPLICATE KEY UPDATE

INSERT INTO users (email, last_login) VALUES ('alice@example.com', NOW())
ON DUPLICATE KEY UPDATE last_login = NOW();

🔹 Common Table Expressions (CTEs)

WITH RECURSIVE org_chart AS (
  SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;

✅ Conclusion and Best Practices

NULL Handling

✔ Use COALESCE for defaults, NULLIF to exclude values. ❌ Avoid WHERE column = NULL; use IS NULL instead.

Deduplication

✔ Prefer GROUP BY over DISTINCT for large datasets. ❌ Don’t overuse DISTINCT without query optimization.

Pagination

✔ Use Keyset Pagination for scalable results. ❌ Avoid OFFSET for tables with >10k rows.

Vendor-Specific Optimizations

✔ PostgreSQL’s RETURNING for auditing. ✔ Oracle’s MERGE for upserts. ✔ MySQL’s ON DUPLICATE KEY UPDATE for simplicity.


#SQL #DatabaseOptimization #DataEngineering #TechTips #PostgreSQL #DataAnalysis #SoftwareEngineering

tags: SQL - DatabaseOptimization - DataEngineering - Performance