In this blog we will be exploring the lovely world of encoding. That’s right, the way the underlying engine of Power BI stores values efficiently to reduce the overall memory footprint. We will cover three core encoding types.
What is the Underlying Engine of Power BI: VertiPaq Engine
Before we dive in, it's important to call this out. Above, I mention the "underlying engine". This is where all the magic happens and where a lot of the amazing performance comes from, so it's only right to offer more of an explanation. Power BI works through VertiPaq, a columnar storage engine, meaning it compresses data column by column and is optimised for working with integers. It's also worth noting that the VertiPaq engine is available when using Analysis Services Tabular and Power Pivot too.
Also, I don’t want you to think that you must know every detail to use Power BI. Not at all! You drive the car, you do not need to know how the engine and all its moving parts work. This is simply what occurs under the Power BI hood. However, if you want to dive deeper into Power BI and understand why it behaves the way it does, keep reading. When you need to troubleshoot performance or investigate a slow solution, understanding the different encoding types is incredibly useful.
The Three Core Encoding Types
Let's now explore the three types of encoding that Power BI uses, which are:
- Value Encoding
- Dictionary (HASH) Encoding
- Run Length Encoding (RLE)
Run Length Encoding (RLE) is applied after Value or Dictionary Encoding to compress the data further. One last note: all my knowledge on this topic and my understanding of how the underlying engine works come from the book The Definitive Guide to DAX by Marco Russo and Alberto Ferrari. I highly recommend you get this book, as it's one you will always refer back to. The SQLBI team also have some incredible articles and videos on this topic, deep diving into areas such as the formula engine vs the storage engine, how to improve compression, identifying the queries being sent to the engine and much more.
Value Encoding
We will start with the first type of encoding the VertiPaq engine applies, Value Encoding. This only works on numerical columns, and you will see why below. Value Encoding shrinks numeric columns (and therefore reduces memory) by picking out the smallest number and then storing every other value as an offset from that minimum. In other words, it deducts the minimum value from each of the original values.
For instance, if the smallest value in a column is 10, then 15 becomes 5 (15 − 10), 30 becomes 20 (30 − 10) and 60 becomes 50 (60 − 10). By storing these deducted values, the VertiPaq engine cuts down the number of bits needed. When you run a query, it simply adds the minimum back to each offset to recover the true value. It should now be clear why Value Encoding only works for numerical columns.

Dictionary Encoding
Let's now continue with Dictionary Encoding, also known as HASH encoding. In fact, if you use the VertiPaq Analyzer to identify which type of encoding VertiPaq is using, you will see the word "HASH". But don't worry, they are both the same thing. Further below it will also make sense why it's called dictionary encoding.
This type of encoding is applied to text columns. When VertiPaq encounters a text column, it builds an internal dictionary to shrink the data in memory. This dictionary is not visible to the end users of the solution and it has two parts:
- A unique integer ID for each distinct text value
- The actual string itself
Once the dictionary is built, think of it as the VertiPaq engine swapping every text value in the column for its small integer ID, cutting down on memory usage. As you can see in the below example, Menswear becomes 1, Womenswear becomes 2, Accessories becomes 3, etc. Also, if you use the VertiPaq Analyzer in DAX Studio you will see a column called "Dictionary". This shows the size of the column's dictionary in bytes. Keep in mind that the more unique entries you have (higher cardinality), the larger that dictionary becomes and the more memory it consumes. This is why it's best to reduce high-cardinality columns where possible.

It should now be clear why we should aim to use integers over text values. If we promote the use of numerical columns, we promote Value Encoding, which does not need to build a dictionary and therefore consumes less space.
Also, to finish up, take a look at the below screenshot from the VertiPaq Analyzer in DAX Studio. Notice the column "Dictionary", which as explained above shows the size of each column's dictionary in bytes. Notice too that the dictionary size is always high when the encoding is "HASH". When the encoding is "VALUE", the dictionary size is minimal, because no dictionary is created; the column is compressed through Value Encoding, the first option explained in this blog.

But Wait! Why Is VertiPaq HASH (Dictionary) Encoding Numerical Columns?!
Did you spot it? In the above screenshot, we have columns such as Transaction ID and Qty which are numerical columns (Int64), yet the VertiPaq engine has decided to use HASH Encoding. This breaks the rule explained earlier in the blog. I remember some years back spending hours trying to find an answer in the Microsoft docs. The short version is that the VertiPaq engine simply picks the approach that offers the most efficient outcome. For example, if an integer column has thousands of unique values or a huge range between its minimum and maximum, the offsets you'd store with Value Encoding can end up just as large as the original numbers. Building a compact dictionary of small integer IDs and swapping those in can then take up less space. For this reason, you will find HASH (Dictionary) Encoding on numerical columns when cardinality or range makes it the more efficient choice. Also, you cannot simply force the encoding type - it is assigned automatically by the VertiPaq engine (Analysis Services tabular models do expose an EncodingHint property on columns, but as the name suggests it is only a hint, not a guarantee). Sure, you can influence it through the data type, but as we just saw, sometimes the engine goes against Value Encoding for integers.
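A rough back-of-the-envelope comparison shows why HASH can win on integers. This is not an engine-accurate model, just an illustration of bits-per-row when a column has a huge range but very low cardinality:

```python
# Illustrative (not engine-accurate) comparison: with a huge min-max range
# but few distinct values, per-row dictionary IDs need far fewer bits than
# per-row Value Encoding offsets.

def bits_needed(n):
    """Bits required to represent the non-negative integer n."""
    return max(n.bit_length(), 1)

# Tiny cardinality (2 distinct values), enormous range:
column = [1, 1_000_000_000, 1, 1_000_000_000]

offset_bits = bits_needed(max(column) - min(column))  # Value Encoding offsets per row
id_bits = bits_needed(len(set(column)))               # Dictionary IDs per row

print(offset_bits, id_bits)  # the ID column is far narrower per row
```

The dictionary does also have to store the distinct values themselves, but with low cardinality that overhead is tiny compared with the per-row saving, which is broadly why the engine flips to HASH in these cases.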
Run Length Encoding (RLE)
We now arrive at the final type of encoding we will be exploring, Run Length Encoding. As mentioned above, this type of encoding is applied after Value or Dictionary Encoding to compress the data further. So, the VertiPaq engine first decides between Value Encoding and Dictionary (HASH) Encoding, and then applies Run Length Encoding on top.
So how does this work? Well, the VertiPaq engine looks for the same value appearing consecutively in a column and records those rows as a single entry with a count. To make more sense of this, let's look at the below example, where RLE is being applied to the column "Order ID" and some values appear multiple times. The Order ID "1000921" appears 3 times, so RLE stores that value once, along with its starting row and the number of times it repeats, instead of writing it out three times. Looking at this small example you may wonder how much of an improvement this can really make. However, it can make a huge difference when working with large volumes of data, in columns that contain long stretches of the same value.

How Import, Direct Lake and DirectQuery Feed into Encoding
Okay, just some additional information I thought would be helpful. We should now have a good understanding of the different types of encoding and how they are applied. Let's briefly explore how Import, DirectQuery and even Direct Lake feed into this.
Import mode is one of the most commonly used storage modes in Power BI. When using this mode, the data is loaded into memory, meaning the VertiPaq engine steps in to handle queries. That’s where all the encoding options we’ve talked about come into play, helping reduce memory usage.
DirectQuery is a different story. It doesn't load the data into memory and instead sends queries straight to the source, commonly SQL-enabled data sources. That means the encoding options we explored above do not apply here, as nothing is stored in memory.
Direct Lake is the newer mode with Microsoft Fabric. It also uses the VertiPaq engine to process queries, so just like Import mode, some encoding options are active here too. However, it's worth knowing that Direct Lake performs a little differently under the hood. Concepts like Reframing and Paging play a role here.
If you want to get more familiar with the differences between Import, DirectQuery and Direct Lake, check out my blog here: Power BI Storage Modes Demystified.
Summary
So, you should now feel more comfortable with the different types of encoding available. As I said earlier on, you don't need to know this to simply use Power BI. For anyone new to Power BI or not getting into the heavy stuff, don't worry, you won't need to dive into DAX Studio to review the VertiPaq Analyzer, check query plans and server timings just to create a report or data model (dataset) 😆 However, for readers who have used Power BI for some time, hopefully you can see how understanding the above plays a crucial role in performance, memory efficiency and how fast your reports respond.