Encyclopedia

/

SQL Queries

What are SQL Queries?

A SQL query is a statement asking for data from a database in a database-specific language. SQL stands for Structured Query Language. Databases have very specific mechanisms by which they sort and organize data to be easily accessible. A SQL query is the best interface for pulling that data efficiently.

Why do SQL Queries Exist?

SQL was invented in 1970 at IBM by Donald D. Chamberlain and Raymond F. Boyce, originally designed to pull data from IBM’s database system. IBM began to explore the commercial potential of the technology, with Oracle coming in shortly after and also pushing on the commercial angle of the technology. By 1986, there was already an adopted standard for SQL.

How do you put together SQL Queries?

Let’s go through a simple example of SQL to best understand its use case and usefulness. First, we need to establish what a SQL query is actually doing: it pulls data from a database. A database is just a large excel-type file, with named columns and rows.

For this example, we are looking at a video game table within a database, where an online player has an ID, a username, a date their account was created, and an amount of virtual currency.

Here is an example of that setup:

id | user_name | created_date | virtual_currency
1  | Explo          | Jan 1, 2024   | 100
2  | SQL            | Jan 2, 2024  | 50

Suppose we wanted to pull all users from this list that have virtual currency greater than 50 and accounts created between Jan 1,2024 to Jan 7, 2024. How would we do that?

With the following SQL Select query:

Select *
from Video_Game_Table
Where virtual_currency > 50
and
created_date between  ‘2024-01-01’ and  ‘2024-01-07’

Let’s quickly break down this example.

First, we Select columns to show in our result. In a SQL query, the star notation means we want to show all columns for a record. So, in this case, that would be id, user_name, created_date, virtual_currency. Next, we have to tell the SQL query where we want to pull the data from, which is the Video Game table. Oftentimes within the same SQL environment there can be multiple tables of interest, so you need to say which one you want. Next, we have the where clause. This is the place in the query in which you specify the constraints. The SQL execution will look at each record and decide if the record matches each statement in the SQL query where clause to determine if the record should be returned. The first statement here is virtual_currency > 50, meaning we will only select records that have currency greater than 50. This eliminates the player with username SQL. We then have an AND statement, meaning that the next statement must also be satisfied to return the results. In this case, the only remaining potential record is username Explo, which does satisfy the statement created_date between  ‘2024-01-01’ and  ‘2024-01-07’, since Jan 2, 2024 is between these two dates.

Now that we’ve written our first SQL statement, we might have the question: how do we ensure efficient execution of our SQL query?

What is an Index in a SQL Query?

Before diving into SQL optimization, we need to first understand indexing, a powerful tool for making sure we can efficiently access data in a SQL query.

An index tells a SQL engine to store data for efficient retrieval. Let’s go back to our previous example to best understand how this is done.

The best example of a use case for an index would be the virtual_currency. If we are always asking questions in our database for players with virtual currencies greater than X amount, it would make a lot of sense to make sure we can access that data efficiently.

All SQL engines operate a bit differently, but most use some form of a B-tree to efficiently store data for retrieval. Let’s look at a simplified version of a B-tree to best understand. Suppose we had virtual_currency entries of:10, 20, 30, 50, 70, 80, 90

In order to return the virtual_currency amounts > 50 without an index, we would need to check all 7 entries. Now, pretend those 7 entries are actually 7 million entries – that’s a lot to check! With a B-tree, the data is stored more efficiently. Something like this:

B-Tree Representation

If you can imagine it, this sort of looks like a tree! At the root of the tree is the number 50. Each number to the right of the 50 is greater than 50, and each number to the left is less than 50. The same is true for each layer below as well, with 10 being less than 20 and 30 being greater than 20. In our example, to find the values greater than 50 in this B-tree, we would need to check the root node (50), then we’d know all values to the right are greater than it. We would then need to just return the 3 values to the right. In this case, we checked the root and the 3 values to the right of it, only 4 checks in total. Imagine that we had 7 million entries. With the B-tree we would have saved over 3 million checks just by efficient storage!

The same thing applies for pretty much anything you want to index on in SQL: the SQL engine will store it in some tree structure to reduce the number of entries that need to be checked for efficient access.

You may now be wondering: are there other ways to speed up my SQL query?

How do you Optimize SQL Queries?

Yes, there are a lot of ways to speed up a SQL query!

First, make sure you have indexes wherever possible. The places to look for indexes are generally items in the where clause (like our virtual_currency example) and with joins, which is how you connect two database tables together.

Second, make sure to only select the columns that you need to return. For example, if you don’t care about the id in our use case, don’t return that data.

Third, reduce expensive operations that require you to touch every row in the database. This is really a catch-all for a lot of inefficient ways to write SQL. Let’s go through a few examples:

String Comparisons

Select * from table
Where
user_name LIKE ‘%Explo%’

This requires us to look at every single row and see if user_name contains Explo. It is much more efficient to do a direct string comparison when possible like so:

Select * from table
Where
user_name = ‘Explo’

Case Statements

Select * from table
Where
(case When virtual_currency > 30 then 1 Else 0 end) > 0

This is a really inefficient way to ask if a currency is greater than 30, since you are doing a calculation per row in the table. This is much better as:

Select * from table
Where
virtual_currency > 30

The fewer the case statements in where clauses, generally the better.

Fourth, think about SQL execution order and refactor subqueries and common table expressions (CTEs).

Sometimes, CTEs and subqueries can cause your indexes and other elements to be used less efficiently. When possible, it is best to refactor these, but it is hard to determine ahead of time if this will yield improvements. In general, avoid redundant operations and do joins as opposed to subqueries where possible.

Fifth, improve the database setup. This means caching data, pre-aggregating data like a total_virtual_currency_in_circulation if that’s a commonly accessed metric, and upgrading the DB performance metrics.

Sixth, check the query plan. Your database has a query execution plan for how the SQL query is executed. It is best to check those for more specifics.

How do you use SQL Queries for Data Analytics?

SQL queries are often the backbone for modern data analytics, particularly when it comes to an internal business intelligence or external embedded analytics use case. There are many tools out that that create a frontend interface for aggregating and viewing your data from a SQL query into a production-grade application.

Can I use AI to write SQL Queries?

Yes, you can. AI has become increasingly popular for SQL query writing, editing, and optimizing. With natural language input and an approximate schema, AI can write SQL queries, AI can optimize SQL queries, and even catch syntax and logic errors in existing SQL queries. It is always recommended to check an AI’s output for a given query, but many have found in the business intelligence industry that SQL query writing can be much faster with AI’s help.

Related terms:

No items found.