Database design for a social network: decisions I made on tripzar.com

When I started working on tripzar.com, I knew I was not building a normal content site.

It was a social network for travelers, which sounds straightforward until you sit down and decide what needs to be stored. A user can follow another user, post content, attach that content to a place, maybe attach it to a trip, and then expect some kind of feed to show the right updates in the right order. That is already more than basic CRUD.

This was one of the first projects where I had to think seriously about database design instead of just creating tables feature by feature and hoping they would fit together later. I still created some things too early and some things too simply, but tripzar.com is where I started understanding that the schema is not just storage. It defines what kinds of product decisions stay easy and what kinds become painful.

What follows is not “the correct way” to design a social network database. It is the set of decisions I made, why I made them, and the places where I could already see the tradeoffs while the system was still small.

The first important decision was whether user relationships should be one-way or two-way.

At first I kept bouncing between three ideas:

friends
followers
connections

Friends sounded natural because social networks had made that model familiar. But it also creates more friction. Both sides need to approve, and the data model becomes about a mutual relationship.

Followers were simpler. One user follows another user. That is a directed relationship. It matches publishing behavior better, and it makes feed generation easier to reason about.

Connections sounded flexible, but usually “flexible” in schema design means I have not made the real decision yet.

For tripzar.com, I leaned toward followers because the product behavior was closer to people publishing travel-related activity than to private mutual friendships. That let me use a very simple relationship table:

CREATE TABLE user_follows (
  id INT AUTO_INCREMENT PRIMARY KEY,
  follower_user_id INT NOT NULL,
  followed_user_id INT NOT NULL,
  created_at DATETIME NOT NULL,
  UNIQUE KEY uniq_follow (follower_user_id, followed_user_id),
  KEY idx_followed_user (followed_user_id)
);

I liked this because it answered the two important questions directly:

who am I following?
who follows this user?

In other words, an adjacency list was enough.

I did spend time reading and thinking about more complex relationship models, especially closure-table style approaches. Those are useful when the relationships themselves have depth or hierarchy that needs fast traversal. But for a simple follow model, I did not need to know the relationship path between three or four users. I just needed direct edges.

That made the tradeoff simple for me:

adjacency list for direct user-to-user follow relationships
avoid closure-table complexity until the product really demands it

I still think that was the right call for the size we were at. Closure tables would have solved a problem we did not actually have.

The bigger lesson was that naming matters. “Friends”, “followers”, and “connections” are not interchangeable words. They imply different product behavior, and the schema should reflect the one the product actually means.

Activity feeds: the fan-out problem at small scale

Once the follow model was clear, the next question was the feed.

This was where the project stopped feeling like a set of forms and started feeling like a real system.

The simplest version of a feed is easy to describe:

Show me posts from the users I follow, newest first.

In SQL, that starts out looking manageable:

SELECT p.*
FROM posts p
INNER JOIN user_follows f
  ON f.followed_user_id = p.user_id
WHERE f.follower_user_id = 15
ORDER BY p.created_at DESC
LIMIT 20;

For a small network, this is fine. It is readable, and it matches the product behavior.

The problem is not that this query is impossible. The problem is that feeds are rarely only posts. Very quickly, “activity” starts meaning more things:

a user posted a travel update
a user uploaded a photo
a user marked a place as visited
a user started a trip

Now I have to choose whether the feed is:

assembled from multiple activity tables at read time, or
written into a single activity table when actions happen

At the scale tripzar.com was operating, I preferred a single activity table because it gave me one place to query from and one ordering field to rely on.

Something like this:

CREATE TABLE activities (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  activity_type VARCHAR(50) NOT NULL,
  object_type VARCHAR(50) NOT NULL,
  object_id INT NOT NULL,
  created_at DATETIME NOT NULL,
  KEY idx_user_created (user_id, created_at)
);

Then the feed query became more uniform:

SELECT a.*
FROM activities a
INNER JOIN user_follows f
  ON f.followed_user_id = a.user_id
WHERE f.follower_user_id = 15
ORDER BY a.created_at DESC
LIMIT 20;

That still does not solve everything. I still have to load the referenced objects, and I still have to think about whether every event deserves to become feed activity. But it gave me a better foundation than trying to join five different tables every time the feed loaded.

This is the first place I started understanding the fan-out problem, even if I was not using that name every day. There are two broad choices:

fan out on read: build the feed when the user opens it
fan out on write: precompute or materialize feed entries when activity happens

At small scale, fan out on read is usually good enough and much simpler to maintain. That is what made sense for us. Precomputing feeds per user would have added more moving parts than the product justified at that stage.

Still, even at small scale, I could already see what would get expensive first: joins across follows, activities, and then object lookups for the feed cards themselves.

That was a useful warning sign.

Travel-specific data: linking content to places and trips

This is where tripzar.com became more interesting than a generic social app.

A normal social schema might stop at users, posts, comments, and likes. But a travel product wants content to have context. A photo is not just a photo. It might belong to a place. A post might belong to a trip. A trip might span multiple places.

The simplest mistake here would have been to keep travel context as plain text inside the post itself:

location_name
city_name
country_name

That looks easy in the beginning, but it becomes messy very quickly. Spelling variations appear, filtering becomes inconsistent, and anything location-based turns into string matching.

I wanted a cleaner link, so the better direction was:

places
trips
posts
join or foreign-key relationships between them

For example, a post could belong to a place and optionally to a trip:

CREATE TABLE posts (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  trip_id INT NULL,
  place_id INT NULL,
  body TEXT NOT NULL,
  created_at DATETIME NOT NULL,
  KEY idx_user_created (user_id, created_at),
  KEY idx_place_created (place_id, created_at),
  KEY idx_trip_created (trip_id, created_at)
);

I liked this because it gave the product room to grow.

Now I can ask:

show posts from a specific trip
show content linked to Barcelona
show a user’s travel history grouped by trip

This also made the data model more honest. A traveler does not just publish into the void. Their content is often anchored to where they were or what journey they were on.

The hard part was deciding how strict to be.

Should every post require a place? Should every place belong to a trip? Should a trip be a first-class object or just a label over a date range?

I did not want to over-model too early, so I kept some of those relationships optional. That was probably the right choice. Travel data has enough variation that forcing every piece of content into one rigid structure can make publishing harder than it needs to be.

But I was glad I did not collapse everything into free text. Once content needs filtering, grouping, or discovery, structured travel context becomes worth the extra schema work.

The queries that got slow first

The first slow queries were not surprising once I looked at them properly.

They were the queries that combined:

social relationships
reverse chronological ordering
optional travel context

In other words, exactly the things the product cared about most.

The feed query slowed first because it had to find followed users, then sort their activity, then often fetch the related object data needed for rendering. On paper, each step was reasonable. Together, they started getting heavier.

The second kind of query that became uncomfortable was anything that mixed location filtering with social or content ordering:

SELECT p.*
FROM posts p
INNER JOIN user_follows f
  ON f.followed_user_id = p.user_id
WHERE f.follower_user_id = 15
  AND p.place_id = 33
ORDER BY p.created_at DESC
LIMIT 20;

This is the kind of query that looks fine while the tables are small and then starts reminding you that indexes only help when the access pattern is clear and stable.

I also ran into the classic problem of loading one page of feed results and then issuing too many follow-up queries for the related users, places, and trips. Even when the main query was acceptable, the overall request could still become slow because of all the extra lookups around it. I did not need every related object in full detail for every feed item. Sometimes I only needed a small projection, user name, place name, trip title, created time. That realization helped more than any clever schema change.

The other thing I noticed: ordering by created_at everywhere is convenient, but if every major read path depends on it, indexes around time and relationship keys become some of the most important ones in the whole database. I had added indexes in places that felt important without really thinking through which queries would actually run most often. Once I started looking at what the feed page actually queried, the indexing decisions became much clearer.

What changed after this project

If I were starting tripzar.com again today, I would still keep the relationship model simple. Followers over mutual friendships, direct adjacency list for the follow table, those decisions I would keep. They matched what the product needed and they did not add complexity that wasn’t justified.

The thing I would change is starting from the read paths. Not a big design document, just: how does the home feed load, how does a place page load, how does a trip page load. If I had answered those questions before finalizing the schema, some of the indexing and table shape decisions would have been much easier. I was designing storage first and then discovering that the storage didn’t quite match how the pages needed to query it.

I would also make the activity model explicit from the beginning. I arrived at the single activities table after the fact, once I realized the feed was really about activity rather than just posts. Starting there would have saved some restructuring.

And I would be stricter about treating places as structured data from day one. The travel context is what makes tripzar.com different from any other social network. Free-text location fields were the easiest choice at the start, but they became awkward the moment the product wanted to filter or group content by actual places.

The bigger thing I took from this project: database design is part of product thinking, not just storage setup. The schema doesn’t just support the features, it decides which features stay easy and which become expensive. That sounds like something you would read in a book and acknowledge without really feeling. Working through a social product where the schema directly made certain queries fast or slow made it concrete for me in a way that stuck.

The social graph: followers vs friends vs connections

Activity feeds: the fan-out problem at small scale

Travel-specific data: linking content to places and trips

The queries that got slow first

What changed after this project