magento

EAV in Magento: why it exists, when it hurts you, and what I built to fix it

After spending enough time inside Magento's catalog model, I finally started understanding why EAV exists, where it helps, and where the tradeoff starts getting expensive.

For a long time, my opinion about Magento’s EAV model was very simple:

This is weird, and it makes everything harder than it should be.

After working on Magento projects at Contus for apptha.com long enough, that opinion changed. Not because EAV became pleasant, and not because the queries suddenly became fast. It changed because I finally understood what problem Magento was trying to solve.

That did not make me love EAV, but it did make me stop dismissing it as “bad design”.

Magento is not modeling a simple product catalog where every item has the same fields. It is modeling a large, flexible commerce system where attributes can be added, removed, configured, and scoped without changing the schema every time the business wants one more product field. Once I started seeing the problem that way, EAV made a lot more sense.

At the same time, the pain never disappeared. If anything, it became easier to explain. EAV gives flexibility, but it pushes complexity into the queries, into indexing, into joins, and into the mental overhead of understanding where your data actually lives.

This post is not about saying Magento was right or wrong. It is about understanding the tradeoff properly.

What EAV actually is (not just “Magento is weird”)

The easiest way to misunderstand EAV is to think it is just Magento being unnecessarily abstract.

It is more specific than that.

In a normal table design, I might store products like this:

CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  sku VARCHAR(50),
  name VARCHAR(255),
  color VARCHAR(50),
  price DECIMAL(12,2),
  description TEXT
);

That is easy to query and easy to understand. Every product row has the same columns.

But this only works well if the product model is stable. The moment I want different attribute sets, custom fields, store-specific values, or fields that exist for some product types and not others, this starts getting awkward. I either keep adding nullable columns forever, or I start building side tables anyway.

EAV changes the shape completely.

Instead of saying “a product has a color column”, it says:

  • the product is an entity
  • color is an attribute definition
  • the actual value lives in a type-specific value table

In Magento that starts with tables like:

  • eav_attribute
  • catalog_product_entity
  • catalog_product_entity_varchar
  • catalog_product_entity_int
  • catalog_product_entity_decimal
  • catalog_product_entity_text
  • catalog_product_entity_datetime

That is the part that confuses people when they first inspect the database. The product row is not the full product. It is only the base entity. Most of the useful data is distributed across value tables depending on attribute type.

If I want a product name, I am not reading catalog_product_entity.name. I am joining from the product entity to eav_attribute to find the name attribute, then into catalog_product_entity_varchar to get the actual value.

Conceptually, it looks more like this:

eav_attribute
+----+-------------+---------------+
| id | code        | backend_type  |
+----+-------------+---------------+
| 73 | name        | varchar       |
| 81 | price       | decimal       |
| 91 | description | text          |
+----+-------------+---------------+
catalog_product_entity_varchar
+----+-------------+--------------+----------+----------------+
| id | entity_id   | attribute_id | store_id | value          |
+----+-------------+--------------+----------+----------------+
|  1 | 1001        | 73           | 0        | Travel Bag     |
+----+-------------+--------------+----------+----------------+

Once I started thinking of EAV as “schema flexibility moved into data”, the model stopped looking random. It was a deliberate trade.

Why Magento chose EAV: flexibility over performance

The reason Magento chose EAV became much clearer once I spent enough time working with real catalogs and marketplace-style requirements.

In apptha.com-style commerce work, the catalog is rarely fixed for long. New product attributes appear all the time:

  • brand
  • size
  • material
  • delivery type
  • compatibility
  • seller-specific fields

And they do not always apply evenly to every product type.

If the schema had to change every time the business wanted a new product attribute, Magento would become painful in a different way. It would be less dynamic, less admin-friendly, and much harder to extend safely.

With EAV, Magento can define attributes in data instead of in schema migrations. That opens up useful things:

  • admin-driven attribute creation
  • different attribute sets for different product types
  • type-aware storage
  • store-specific values through store_id

That last point matters more than I appreciated at first. A value like product name or description can vary per store view. Once I understood that, some of the extra indirection made more sense. Magento is not just storing product fields. It is storing product fields with context.

That is the strength of EAV. It is highly adaptable.

The weakness is that the cost is paid later.

You do not pay for EAV when defining the model. You pay when reading it.

Every simple field access becomes a join or a subquery. Every filter across multiple attributes gets heavier. Every performance issue becomes more annoying to reason about because the data is fragmented by type and sometimes by store scope as well.

So for me, the most honest summary is this:

EAV is a good answer if the main problem is attribute flexibility.

It is a bad answer if the main problem is fast, simple querying of a stable schema.

Magento clearly picked the first priority.

The queries EAV produces and why they’re slow

This is where most Magento developers feel the pain before they understand the design.

A normal query for product data in a flat schema is easy:

SELECT id, name, price
FROM products
WHERE id = 1001;

With Magento EAV, the same idea becomes much heavier.

If I want just name and price for a product, I may need something closer to this:

SELECT p.entity_id,
       name_attr.value AS name,
       price_attr.value AS price
FROM catalog_product_entity p
LEFT JOIN eav_attribute a_name
  ON a_name.attribute_code = 'name'
LEFT JOIN catalog_product_entity_varchar name_attr
  ON name_attr.entity_id = p.entity_id
 AND name_attr.attribute_id = a_name.attribute_id
 AND name_attr.store_id = 0
LEFT JOIN eav_attribute a_price
  ON a_price.attribute_code = 'price'
LEFT JOIN catalog_product_entity_decimal price_attr
  ON price_attr.entity_id = p.entity_id
 AND price_attr.attribute_id = a_price.attribute_id
 AND price_attr.store_id = 0
WHERE p.entity_id = 1001;

That is already more work for the database and more work for the developer.

Now imagine filtering a collection by multiple attributes, maybe sorting by one of them, and maybe joining store-specific values with fallback logic. The query shape gets large very quickly.

The slowdown is not because MySQL cannot handle joins. The slowdown comes from the fact that EAV turns simple row access into attribute reconstruction. Every field access is a join. More attributes means more joins. Filtering by multiple attributes means joining across multiple value tables and coordinating conditions across them. Ordering by a non-entity field means the sort column lives in a value table, not the entity table.

On top of that, backend type splits data across multiple tables. A product collection doesn’t query one value table, it needs varchar for name, decimal for price, int for status, text for description. The more flexible the product model becomes, the more fragmented the reads become.

That means the more flexible the product model becomes, the more fragmented the reads become.

This is also why some Magento developers react so positively to flat catalog strategies. Even if they are not perfect, they are an understandable response to EAV read complexity.

One of the harder parts for me was debugging why a query was slow when the SQL being generated was already so noisy. In a simple schema, I can inspect a query and reason about it quickly. In Magento, by the time a collection has loaded a few attributes, applied filters, and respected store scope, I am often reading SQL that is technically valid but no longer pleasant to think about.

That does not mean EAV is broken. It means the read path is carrying the price of the flexibility Magento chose at the model layer.

When EAV makes sense and when it doesn’t

Once I stopped reacting to EAV emotionally, I found it easier to ask a better question: when is this actually the right tool?

For Magento’s product catalog, the answer is more often than many developers want to admit. The business needs dynamic attributes, admin-managed field definitions, varying attribute sets by product type, and multi-store scoped values. EAV is a reasonable answer to that problem. It is not a crazy choice, it is solving a real problem that would be genuinely messy with a flat schema.

But the moment those requirements are not central, EAV becomes hard to justify. If I am modeling something with a stable shape, a known set of columns, heavy read traffic, and frequent filtering, I would much rather use a conventional relational model. EAV is powerful exactly in the places where flexibility matters more than read simplicity. It is expensive exactly in the places where query simplicity matters more than flexibility.

The thing I wish more Magento developers talked about is that distinction. Too often the conversation becomes “EAV is terrible” or “EAV is powerful” and both of those are incomplete. It is a tradeoff. Magento picked flexibility. The price shows up at read time. Neither the choice nor the price is irrational.

What changed for me was stopping the question “why would anyone design it like this?” and asking instead “what problem does this solve, and is that problem important enough to justify the cost?” That question works for EAV specifically and it works more broadly for any design decision that looks strange from the outside.

Early thinking about a cleaner EAV in Laravel

One thing Magento did was make me interested in patterns even when I did not want the whole framework around them.

By the end of 2014 I understood why Magento uses EAV, but I also felt its friction every time I worked with generated queries or tried to explain the data model to someone new. That has made me think about whether the core idea could be implemented more cleanly in a smaller PHP application, something not tied to Magento’s specific scope.

I have been sketching what that might look like in Laravel. Not a copy of Magento. Something smaller: entity model stays understandable, attribute definitions are explicit, value storage is still type-aware, querying common attribute sets is less painful than what Magento generates.

The reason I keep thinking about Laravel here is not that it removes the tradeoff. It doesn’t. Flexible schemas still make reads more complicated. But Laravel gives me more control over the developer experience. If I were building an EAV layer there, I would want two things: the model readable enough that a developer can understand where values live without a map of six tables, and a query story intentional enough that the common path doesn’t feel more mysterious than it needs to.

That idea is still early. I don’t want to pretend I have solved it when I have only started sketching. But Magento made me more careful about it. Once you understand why EAV exists, you also understand how easy it is to build something flexible and unpleasant at the same time. The flexibility is not free, it shows up as complexity somewhere. Knowing that before you start is more useful than discovering it after the schema is in production.