Seven SQL tips and tricks for developers
Written by Mirza Bečić, Database Developer at Softray Solutions
As a DB Developer, I often get asked for help on specific projects using relational databases. Optimization, debugging, writing new well-optimized queries, and creating new objects are the most common things that developers need help with. While many developers understand most of the time what is happening in the query, they often require a lot of time to do so (usually, the time they don’t have). Here are some tips and tricks that I hope will help developers and others that are interested in SQL, to read code faster, debug it easier, understand what is happening behind the query and who knows, maybe one day be able to easily optimise existing code and write new efficient code.
I won’t be going into too many details about each tip because there can be an article about it (probably there already is). The idea is to have a basic understanding of essential stuff and why it is so. The things I will discuss will be related to Microsoft SQL, but most apply to other RDBMS. Before diving into some most important tips and tricks, assuming we already know the basic syntax of SQL, it would be nice to know what exactly happens when we execute an SQL command, SELECT statement, for example.
A vital component that takes care of our queries is called Relational Engine, also known as Query Processor.
It consists of a couple of steps, and those are:
Parsing
This is the first step in checking the syntax of the query. If the question has a syntax error, that error is returned through the protocol layer back to the client. On the other hand, if the query is good, the result of this step is the parse tree.
Binding
The parse tree is then passed to the Algebrizer, which does the semantic check to determine if tables and columns exist, loads the metadata for information for the tables and columns, and does simple syntax-based optimization. The output of this step is a query tree.
Optimization
Optimization is the most complex part. In short, the query optimizer takes the query tree and builds an execution plan. Naturally, it will make the cheapest execution plan, not the best one. Often the cheapest is the best, but not always.
Execution
The Executor gets the data based on of execution plan. And after results are published to the Protocol layer, end data goes back to the end-user.
Now that we know what is happening when we execute the query, we can move to tip №1.
1. SQL Order of execution.
This is the most important tip for anyone who wants to delve into SQL. SQL is a declarative language, which means we declare what we expect from the result, but not HOW to obtain it. When we execute a query, there is a specific order by which the database executes query components:
I can’t emphasize how important it is to learn this order because it will help us write a better, better-optimized query once we start thinking this way. In addition, it will help us with debugging/optimizing the existing query.
2. * (All) in a query
The first query that almost everyone writes goes something like this: SELECT * FROM dbo.TABLE1.
In the beginning, while we are learning, there is nothing wrong with it as long as some local table has a small number of records and is being used to display the data. But the problem occurs when we are using something like this in environments where we have a lot of records.
You should avoid using * in your queries: Unnecessary IO, Conflicts in JOIN Queries, using way more application memory than needed, and increased network traffic. In short, it can cause unneeded performance issues that can be avoided by selecting the columns we want to display.
3. Readable code
Your code should always be well formatted. It all comes down to how fast and efficiently you can read what You or someone else wrote. So be nice and write your code clearly and well formatted. It will help with debugging, understanding the logic, and going through the code much more efficiently, plus it looks nice.
Here are some tips on how to make a code look better:
a) Use upper casing keywords
For SQL statements, you should always use upper casing. Example: CREATE, SELECT, FROM, WHERE etc…
b) Have a naming scheme
When we are naming objects, it should be meaningful and point us in the direction we are looking for. Calling a table dbo.foo means nothing to us, but naming it dbo. Customers have a meaning.
c) Lining up parameters, types, values
This looks much better:
Then this:
a) Prefixing name with data types
There is no need to name your variable as: @strNameOfVariable.
It will be enough to name it @NameOfVariable since what will happen if we have to change the data type is that we also have to change the variable’s name on every place where we have used it.
4. Parameter Sniffing
I had one issue that took me a while to figure out. I had a procedure that used to run fast, but after a while, it started running slow. After an investigation, I figured out the issue was called parameter sniffing.
When the optimizer creates an execution plan, it sniffs the parameter values. This is not the issue, it needs to build the best method, but the problem arises when the query uses a previously generated plan optimized for different data distribution.
There are multiple ways to solve this, but the friendliest for developers is to use dummy variables inside the procedure:
For every input parameter, you should create a dummy variable, which will avoid a problem with parameter sniffing.
5. Use GROUP BY over DISTINCT
This one might be controversial, but in my experience, we should avoid using distinct user groups in almost all cases. From a performance standpoint, there isn’t much difference on a simple query, but once more complex queries come into play, I often had a much better performance using GROUP BY.
Here is a link that dives into the subject much more for those interested in it:
https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct
6. Using more than needed
When we are declaring a variable size, for example: DECLARE @name VARCHAR(MAX), if we know that @name size cannot be larger than, let’s say 50, there is no reason to use VARCHAR(MAX); instead, we should use VARCHAR(50). It will significantly help with the performance. This also applies to other data types that can have a size specified. Do not use more space than it is needed.
7. Choose a wrong data type
This one is quite simple, but I’ve seen this issue on many projects, so I need to mention it.
If You know that column is intended for date/time, use that data type, don’t use string/numeric.
The same goes for date/time when date/smalldatetime will do.
Time in place of interval.
NVARCHAR for postal code.
MAX types for URL & e-mail address.
These are just some of the tricks and tips that could help you make your work easier and successfully complete the project using relational databases. For those who are curious, I have also selected additional references where you can find more information on these topics.
If you enjoyed reading this, click the clap button so others can find this post.
References:
https://www.sqlservergeeks.com/sql-server-architecture-part-2-the-relational-engine/
https://www.sisense.com/blog/sql-query-order-of-operations/
https://www.java67.com/2018/02/why-you-should-not-use-select-in-sql.html
https://sqlblog.org/bad-habits
https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct