Don't use SELECT *, not even on single-column tables

Stay away from SELECT *, even on single-column tables. Remember that, even if you don't agree with it yet. I hope to have you thinking about it by the end of this post.

A tale from 2012
I ran into this situation with a customer's backend application more than 12 years ago, around 2012–2013.

It was a robust backend API with single-digit millisecond execution times. Then one day, users started complaining about a sluggish, slow experience.

We went through the recent commits and nothing stood out; most of the changes were harmless. We reverted them all anyway, just in case (some of you can probably relate; blaming things that don't make sense is a sign of desperation).

The app was still sluggish, though.

Examining the diagnostics, we saw that the API response time occasionally ranged from 500 ms up to 2 seconds, where it used to be single-digit milliseconds.

We started digging into the database queries, even though we were certain nothing had changed in the backend that could have caused the slowdown.

Then we found it: the backend application was issuing a SELECT * against a table with three blob fields, each holding extremely large documents.

As it turned out, the API only ever needed the two integer columns that originally made up this table, and it used SELECT * to fetch them. Later, however, an administrator added three blob fields that a different application fills in and uses.

Even though those blob fields were never returned to the client, the backend API was now fetching them, paying database, network, and protocol-serialisation costs for data filled in by other applications that it never needed.

How database reads work
In a row-store database engine, rows are stored in units called pages. Every page has a fixed header followed by rows, and each row in turn has a record header followed by its column values. PostgreSQL's heap pages are a good example of this layout.
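If you want to peek at this layout yourself, PostgreSQL's pageinspect extension can dump the raw contents of a heap page. This is only a sketch: it assumes a hypothetical students table, and pageinspect requires superuser privileges.

```sql
-- Requires the pageinspect extension; "students" is a hypothetical table
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Page-level header of the first heap page
SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('students', 0));

-- Item pointers and tuple headers for every row stored on that page
SELECT lp, lp_off, lp_len, t_ctid, t_hoff
FROM heap_page_items(get_raw_page('students', 0));
```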

When the database reads a page into the shared buffer pool, every row and column on that page becomes accessible in memory. So the question becomes: if all the columns are already sitting in memory, why would SELECT * be expensive and slow? Is it really as slow as people say? And if so, what makes it slow? We will discuss these questions and more in this post.

Kiss Index-Only Scans Goodbye
The database optimiser can never choose an index-only scan when SELECT * is used. For instance, say you have an index on the grades column that includes the student ID as a non-key column, and you need the IDs of students who scored above 90; that index is ideal for the query.

But because you asked for all the fields, the database has to visit the heap pages to fetch the remaining columns, which means far more random reads and I/O. Had you not used SELECT *, the database could have scanned only the grades index and returned the IDs.
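Here is a minimal sketch of the idea, using a hypothetical students table and index name. With the covering index in place, the first query can be satisfied entirely from the index, while the SELECT * variant cannot.

```sql
-- Hypothetical schema; INCLUDE stores id in the index as a non-key column
CREATE TABLE students (
    id    bigint PRIMARY KEY,
    name  text,
    bio   text,
    grade int
);
CREATE INDEX students_grade_idx ON students (grade) INCLUDE (id);

-- Everything this query needs lives in the index: eligible for an Index Only Scan
EXPLAIN SELECT id FROM students WHERE grade > 90;

-- name and bio live only on the heap pages, so every match forces a heap fetch
EXPLAIN SELECT * FROM students WHERE grade > 90;
```

Whether the planner actually picks the index-only scan also depends on table statistics and the visibility map, so treat the EXPLAIN output as illustrative rather than guaranteed.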

Cost of Deserialisation
Deserialisation, or decoding, is the process of turning raw bytes back into typed data. A sequence of bytes, usually read from a file, a network message, or another source, is converted back into a structured form such as programming-language variables or objects.

When you run a SELECT * query, the database has to deserialise every column, including the ones your use case never touches. That adds computational overhead and can slow the query down. Selecting only the columns you actually need lowers the deserialisation cost and makes your queries more efficient.
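A rough way to feel this cost is to time the same scan with and without the wide columns in psql, reusing the hypothetical students table from above. Client-side timing also folds in the serialisation and network costs discussed later, so treat the gap as indicative rather than a pure decoding measurement.

```sql
-- In psql, with client-side timing enabled
\timing on

SELECT id, grade FROM students;  -- only two small integer columns per row
SELECT * FROM students;          -- every column, including wide text, per row
```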

Not All Columns Are Inline
A major problem with SELECT * queries is that not all columns are stored inline within the page. Large columns, such as text or blobs, may live in external tables (Postgres TOAST tables, for example) and are only fetched on demand. These columns are also frequently compressed, so a SELECT * on a table with many text fields, geometric data, or blobs adds to the database's workload: it has to fetch the values from the external tables, decompress them, and send the results back to the client.
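One way to see this out-of-line storage in PostgreSQL, assuming a hypothetical documents table with two integer columns and one large blob column:

```sql
-- Hypothetical table: two integers plus one large, TOAST-able blob column
CREATE TABLE documents (
    id      bigint PRIMARY KEY,
    version int,
    payload bytea
);

-- Which TOAST table backs it?
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'documents';

-- Stored (possibly compressed / out-of-line) size versus fully detoasted size
SELECT id,
       pg_column_size(payload) AS stored_bytes,
       octet_length(payload)   AS detoasted_bytes
FROM documents
LIMIT 5;
```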

Cost of the Network
Before the result is handed to the client, it has to be serialised into the database's wire protocol. The more data there is to serialise, the more CPU it costs. After serialisation, the bytes are transmitted over TCP/IP, and the more segments you have to send, the higher the transmission cost and, ultimately, the network latency.

Returning all columns means large values the client will never use, such as long strings or blobs, still have to be serialised and pushed across the network.
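You can get a rough sense of how much extra payload SELECT * drags across the wire by comparing row sizes. pg_column_size reports stored (possibly compressed) sizes rather than exact wire bytes, so this is only an estimate; the documents table and its columns are the same hypothetical ones as above.

```sql
-- Approximate bytes behind SELECT * versus just the two integer columns
SELECT sum(pg_column_size(d.*)) AS all_columns_bytes,
       sum(pg_column_size(d.id) + pg_column_size(d.version)) AS needed_columns_bytes
FROM documents d;
```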

Client-Side Deserialisation
Once the raw bytes arrive, the client application has to deserialise them into the language's own types, which adds to the overall processing time. The more data comes down the pipe, the slower this step becomes.

Unpredictability
SELECT * also makes your application's performance unpredictable, even on a tiny table. Consider this scenario: your application runs a SELECT * against a table with two integer fields, and it is lightning fast.

Later on, though, an administrator decides to add blob, JSON, XML, and other fields that are filled in and used by other applications. Even though nothing in your code changed, it suddenly slows down, because it is now pulling in all the extra fields your program never needed in the first place.
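A sketch of how this plays out, with hypothetical table and column names:

```sql
-- Day one: a narrow table, SELECT * is effectively the same as naming the columns
CREATE TABLE app_config (
    id   int PRIMARY KEY,
    flag int
);

-- Months later, another team adds wide columns for their own application
ALTER TABLE app_config ADD COLUMN payload  bytea;
ALTER TABLE app_config ADD COLUMN metadata jsonb;

-- The untouched application query now silently fetches the new columns too
SELECT * FROM app_config WHERE id = 42;

-- An explicit column list keeps the query's cost independent of schema growth
SELECT id, flag FROM app_config WHERE id = 42;
```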

Grep Code
Another benefit of explicit SELECTs: when you want to change or remove a column, you can grep the codebase for the places where it is actually used. That makes DDL changes to the database schema much easier to manage.

In brief
To sum up, a SELECT * query goes through a lot of machinery, so it is best to select only the fields you actually use and cut out the needless overhead. Keep in mind that if your table has only a few columns with simple data types, the overhead of SELECT * may be negligible. Still, being deliberate about which columns you retrieve is generally a smart habit.
Hi there, I'm Yahya, and I enjoy sharing knowledge and experiences.