Database Optimisation : Why the basics matter

This article takes us back to the basics on database relationships to see how it could actually speed up systems.

Key points

Database relationships matter
Sometimes doing strange things can help
How does all of this work and effect processing in practice?
Indexing and Optimisation

Database relationships matter

As the traditinal regergitated response goes "What you learn when you study vs how it is in the field is different". This article doesn't harp on that but does highlight a few of those expentions. Database 101 before even touching the keyboard teaches you to design your database to the 3 normal forms. This is great in theory but in practice can lead to some performance issues.

Say you have a table with customer names and their country.

Below is a visual representation of the same data using a mermaid ER diagram (view this page in a browser so the mermaid runtime can render it):

Customer Table

CustomerID	CustomerName	Country
1	John Doe	UK
2	Jane Smith	ZA
3	Alice Johnson	UK
4	Bob Brown	RU

erDiagram CUSTOMER { int CustomerID string CustomerName string CountryCode }

If we consider each character as a number (instead of going into the complexity of encoding etc) we can see that the Country field is taking up a lot of space. In this case we have 2 characters for UK, 2 for ZA and 2 for RU. So in total we have 8 characters used for country data.

Test increment scene

So whats the simple solution? Normalisation.

Customer Table (normalised)

erDiagram COUNTRY ||--o{ CUSTOMER : has COUNTRY { int CountryID string CountryCode } CUSTOMER { int CustomerID string CustomerName int CountryID }

Customer Table

CustomerID	CustomerName	CountryID
1	John Doe	1
2	Jane Smith	2
3	Alice Johnson	1
4	Bob Brown	3

Country Table

CountryID	CpuntryCode
1	UK
2	ZA
3	RU

BUT WHY?

Well this is where we going into the theory in practice now and look at the edge cases later in the article.

Simple answer: Less is more. We are grouping data to reduce redundancy. In this case the Country field is now only stored once per country in the Country table. So instead of storing 8 characters for country data we are now storing 2 characters for each country plus an integer for the CountryID in the Customer table. This reduces the overall storage space used and can improve query performance when searching or filtering by country.

The longer answer on how this acts in practice

This little ditty is nerd level 10 and will distract from the overall article but for those that want to see how this works in practice here we go.

Jump to reference the real meat

How SQL handles SELECT queries

Okay, lets start our deep dive from the server.

1. Single table without normilization and CountryCode in the CUSTOMER table

SELECT * FROM CUSTOMER;

1.1. The database server reads the CUSTOMER table from disk into memory. This involves locating the data pages that contain the CUSTOMER table and loading them into the buffer pool (memory).
1.2. The server then processes the query by scanning through the rows of the CUSTOMER table in memory. For each row, it retrieves the values of the CustomerID, CustomerName, and CountryCode columns.
1.3. The server constructs the result set by collecting the retrieved rows and preparing them for transmission back to the client.
1.4. Finally, the server sends the result set back to the client application that issued

The "Client"

Ok be fore " The client we still need to look at "

The network

Theory in practice

FUTURE USE

Normal Forms

The 3 normal forms are as follows:

First Normal Form (1NF): Ensures that each column contains atomic values and that there are no repeating groups or arrays.
Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key.
Third Normal Form (3NF): Builds on 2NF by ensuring that there are no transitive dependencies between non-key attributes. These forms are designed to reduce data redundancy and improve data integrity. However, in practice, strictly adhering to these forms can lead to complex queries and joins that may impact performance.

And well ... real life, people are going to use your system the way they want not how it's easiest for you to build it.

Sometimes doing strange things can help

Take for an example (and this is still dangours and should not be used in feild) but young intern days when needing to put a solution together and you have seniors that are happy to break your stuff