Linkedin_Articles

View on GitHub

SQL Series: Mastering the Basics of SELECT

Your first step towards SQL mastery and landing your dream job in top tech companies.

Introduction

Welcome to the first installment of our Progressive SQL Learning Series. Over the coming weeks, we’ll delve into SQL concepts, starting from the basics and advancing to complex topics. This series is designed to equip you with the skills needed to excel in interviews at top tech companies.

In this article, we’ll explore:

What is the SELECT Statement?

The SELECT statement allows you to specify what data you want to retrieve. It’s like choosing specific pages from a book instead of reading it all. πŸ“–βž‘οΈπŸ”

Basic Syntax

SELECT column1, column2, ...  
FROM table_name 
[WHERE condition] 
[ORDER BY column_name ASC|DESC];

Note: Replace column1, column2, ... with the actual column names you wish to retrieve, and table_name with the name of the table.

Key Components

Projection: Specifying the Columns You Want to Fetch

SELECT name, age FROM employees;

Retrieves the name and age columns from the employees table. Avoid using SELECT * in production to reduce I/O overhead. πŸš«πŸ“Š

Filtering with WHERE: Extract Rows That Meet Specific Conditions

SELECT name, age  
FROM employees  
WHERE age > 30;

Fetches the name and age of employees who are older than 30. The WHERE clause filters records based on the specified condition. πŸ”πŸ“ˆ

Sorting with ORDER BY: Organize Data

SELECT name, age  
FROM employees  
ORDER BY age DESC;

Retrieves the name and age of employees, sorted by age in descending order. The ORDER BY clause sorts the result set based on one or more columns. πŸ”’πŸ”½

Aliasing for Readability

SELECT name AS employee_name, age AS employee_age  
FROM employees;

Renames the name column to employee_name and the age column to employee_age in the result set for better readability. The AS keyword assigns an alias to a column or table. πŸ·οΈπŸ” 

Performance Considerations

Practice Questions for Beginners

Question 1: Fetch the names and departments of employees aged above 25.

SELECT name, department  
FROM employees  
WHERE age > 25;

Retrieves the name and department of employees who are older than 25. The WHERE clause filters records based on the specified condition. πŸ”πŸ“ˆ

Question 2: Retrieve the unique job titles from the jobs table.

SELECT DISTINCT job_title  
FROM jobs;

Fetches all unique job titles from the jobs table. The DISTINCT keyword ensures that duplicate job titles are not included in the result set. πŸ”„πŸ’Ό

Question 3: Display employee names and their salaries, sorted by salary in ascending order.

SELECT name, salary  
FROM employees  
ORDER BY salary ASC;

Retrieves the name and salary of employees, sorted by salary in ascending order. The ORDER BY clause sorts the result set based on the specified column. πŸ”’πŸ”Ό

Key Takeaways


Mastering SQL: GROUP BY and Aggregate Functions

Structured Query Language (SQL), developed in the early 1970s as part of IBM’s System R project, has evolved to become the cornerstone of data manipulation. It empowers users to extract, organize, and analyze data effectively, revolutionizing database management across industries. Among its numerous features, the GROUP BY clause and aggregate functions play a pivotal role in summarizing data. In this article, we’ll explore these tools in detail with simple examples, discuss performance considerations, and highlight limitations. Stick around until the end for some beginner-friendly practice questions!

Understanding GROUP BY

The GROUP BY clause is used to arrange identical data into groups. It interacts closely with the HAVING clause, which allows you to filter groups based on aggregate values. Unlike WHERE, which filters rows before grouping, HAVING filters groups after aggregation.

Syntax:

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;
SELECT department, COUNT(employee_id) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 5;

This query will return only those departments where the employee count exceeds 5. It’s typically combined with aggregate functions to perform calculations on each group, making it indispensable for reporting and analytics.

Common Aggregate Functions

Here are some common aggregate functions used in SQL:

Examples

Example 1: Counting Employees by Department

Imagine a table employees with columns: department, employee_id, and salary. To count the number of employees in each department:

SELECT department, COUNT(employee_id) AS employee_count
FROM employees
GROUP BY department;

Example 2: Total Sales per Product

Consider a table sales with columns: product_name, quantity_sold, and price. To calculate the total sales for each product:

SELECT product_name, SUM(quantity_sold * price) AS total_sales
FROM sales
GROUP BY product_name;

Example 3: Average Age by City

If we have a table people with columns: city, age, and name, and we want to calculate the average age for each city:

SELECT city, AVG(age) AS average_age
FROM people
GROUP BY city;

Performance Considerations

Indexes

GROUP BY operations can see significant performance improvement when the columns involved are indexed. For example, consider a table orders with millions of rows:

Without Index

SELECT customer_id, SUM(order_amount) AS total_spent
FROM orders
GROUP BY customer_id;

Running this query on an unindexed customer_id column may result in a full table scan, which is computationally expensive. By adding an index:

With Index

CREATE INDEX idx_customer_id ON orders(customer_id);

The query planner can leverage this index to speed up grouping and aggregation, particularly for queries with selective filtering conditions (e.g., WHERE clauses).

Key Optimization Strategies

SELECT region, city, store_id, COUNT(*)
FROM sales
GROUP BY region, city, store_id;

While sometimes necessary, grouping by multiple columns should be done judiciously to avoid overloading memory.

Limitations

WITH grouped_sales AS (
    SELECT product_id, SUM(quantity) AS total_quantity
    FROM sales
    GROUP BY product_id
)
SELECT g.product_id, p.product_name, g.total_quantity
FROM grouped_sales g
JOIN products p ON g.product_id = p.product_id;
SELECT COALESCE(region, 'Unknown') AS region, COUNT(*)
FROM customers
GROUP BY COALESCE(region, 'Unknown');

Practice Questions

Question 1: Employee Count by Role

Table: staff
Columns: role, employee_id
Task: Count the number of employees in each role.

SELECT role, COUNT(employee_id) AS employee_count
FROM staff
GROUP BY role;

Question 2: Total Revenue by Category

Table: products
Columns: category, price, units_sold
Task: Calculate the total revenue (price * units_sold) for each category.

SELECT category, SUM(price * units_sold) AS total_revenue
FROM products
GROUP BY category;

Question 3: Average Test Score by Subject

Table: test_scores
Columns: subject, student_id, score
Task: Find the average test score for each subject.

SELECT subject, AVG(score) AS average_score
FROM test_scores
GROUP BY subject;

The GROUP BY clause and aggregate functions are essential tools for analyzing data. These simple examples and practice questions should help you build a solid foundation.