New to SQL, or want to get a colleague ramped up on SQL? Learn some of these best practices for SQL before getting started!
On this page:
SQL is the most common language used in the data world, and it's used by everyone from data analysts and data scientists to data engineers, and data architects. More often than not, it's also used by large teams that work on the same project.
For this reason, it is necessary to write good SQL code. So, although writing SQL basic queries isn't that hard to do, writing good SQL queries does require some SQL best practices.
These best practices all aim to make code easier to read and understand, while, at the same time, making it easier for larger teams to work on the same code base.
Before even considering writing a single line of code it's important to study the data. This is to ensure a column actually does contain the expected or right data.
The simple reason for this is, if the column contains the expected data, any queries will get the expected results. In contrast, if it doesn't, the results will be skewed or inaccurate.
The general rule is to prioritize accuracy and readability before thinking about optimizing performance. In other words, does the query produce the expected results and can others easily read, understand, and change the code? Performance optimization should only be considered when the answer to both these questions is positive.
Even then, prioritization should only be considered when a query is run frequently or if it traverses a large number of rows.
The logical order in which a computer processes a SQL query is:
In terms of this logical order, SQL first checks the data table that will be worked with. Then it checks the filters. After this is done, it groups the data. Finally, SQL gets the data and if necessary sorts the data and prints only the specified number of rows.
Now this execution order differs from the writing order. In respect of writing order, it's vital to have the SQL Keywords in the query in the right order. The right writing order for SQL queries are:
For proper queries, this order is crucial. In simple terms, if this order isn't used, the query won't work. Although writing order is more important, it's also important to keep the logical order of operations in mind. This is simply because, when it comes to more complex SQL queries, it's useful to know the execution order to avoid logical errors in the code.
Consider the following two queries:
These are all the same SQL query. It shows that spaces and line breaks do not affect the outcome of an SQL query. The only thing that affects the query is the position of the semicolon which indicates the end of the query line.
Therefore, apart from the semicolon, it doesn't matter how many spaces and line breaks there are between the different SQL keywords.
Now, the question as to which style is best depends mostly on personal preference. And when it comes to personal preference, most SQL users use version 2 of the query. This is simply because it’s more readable.
Sure, it might not make much of a difference in a short query like this, but for very complex queries it's always better to keep SQL queries organized by using necessary spaces and line breaks.
This not only makes it easier to read, but also makes it easier to debug the code when necessary.
There are also some best practices when it comes to uppercase versus lowercase usage. Here, all reserved SQL keywords should be written in uppercase.
In contrast, all column names and table names should be written in lowercase. Finally, everything that comes between the apostrophes and are commonly used for field values, should be written exactly like they appear in the data table.
When using ORDER BY and GROUP BY, it’s possible to use both column names and column numbers. Here, it’s vital to always use these keywords with column names.
So, in the example:
It’s best to use the column name handicap rather than the column number.
The simple reason for this is that the column number may change in future if new data is added. If the query is then run on the new data, it will order the data by the wrong column number.
With any programming language, it's generally recommended to comment code and when writing SQL queries, it's no different.
Not only does commenting help the user understand their code in future, but it's also helpful when they work as part of a team. Here, it helps the team to understand the code.
For example, if another person on the team would need to work on the same code in future, they would understand what the code does just by looking at the comments.
To add a comment in SQL – – is used. Everything following the – – will not be executed.
This means in the query:
Nothing after the – – will be executed and the query will run just like it did before.
It's crucial to maintain code, especially where large teams are working on the same codebase.
For this purpose, many teams use software that re-organizes code and colors different keywords consistently. In this way, the code always looks the same, no matter who writes it. A tool like MySQL Workbench is also able to do this, and it's even possible to do this manually, although it does take more time.
Whether you're an experienced developer, or have never written a line of code in your life, SQL is an easy-to-pick-up language that can have applications in any company or role. Hopefully, this post was helpful to illustrate some of SQL best practices which makes code easier to read and understand when writing SQL queries.
If you're interested in learning SQL or considering data science as a career path, feel free to visit Career Karma for more help and resources to get started.