Self-hosting first: why we replaced Pusher, Sentry, Algolia, and S3

Why I increasingly prefer self-hosting when the protocol is open, the service is understandable, and ownership buys real control without forcing client rewrites.

The infrastructure at PickYourTrail has a pattern to it. Sockudo where we used to use Pusher. GlitchTip where we used to use Sentry. Typesense where we used to use a managed search service. MinIO for object storage. vLLM on RunPod for LLM inference.

Each of those replacements happened for a different immediate reason (cost, control, a specific limitation we’d hit), but looking back, the underlying logic is consistent: if the service speaks an open protocol, the operational burden is manageable, and the team is willing to own it, self-hosting is almost always the better long-term decision.

This is not a “managed services are bad” argument. We still use plenty of managed services. The principle is narrower than that.

The real criterion: protocol compatibility

The reason self-hosting is often avoided is migration cost. Swapping out a service means rewriting the application code that calls it, updating the client libraries, testing everything again. That’s a meaningful burden and it’s why teams stay with managed services longer than they should.

Open protocol compatibility changes that calculus entirely. When a self-hosted alternative speaks the same protocol as the managed service (same API shape, same SDK, same wire format), the migration is a config change, not a rewrite. The infrastructure changes. The application doesn’t.

That’s the thing I look for before committing to a self-hosted path. If I can change one environment variable and everything still works, the operational overhead of self-hosting becomes the only remaining question. And for most of the services I care about, that overhead is manageable.

Sockudo: the Pusher replacement that required one config change

Sockudo is a WebSocket server that implements the Pusher protocol. The Laravel application still uses the pusher broadcasting driver. The JavaScript frontend still uses the pusher-js client. The only thing that changed was the endpoint URL in the config.

// config/broadcasting.php, before
'pusher' => [
    'driver' => 'pusher',
    'key'    => env('PUSHER_APP_KEY'),
    'secret' => env('PUSHER_APP_SECRET'),
    'app_id' => env('PUSHER_APP_ID'),
    'options' => [
        'host'   => 'api.pusherapp.com',
        'port'   => 443,
        'scheme' => 'https',
    ],
],

// config/broadcasting.php, after (Sockudo)
'pusher' => [
    'driver' => 'pusher',
    'key'    => env('PUSHER_APP_KEY'),
    'secret' => env('PUSHER_APP_SECRET'),
    'app_id' => env('PUSHER_APP_ID'),
    'options' => [
        'host'   => env('SOCKUDO_HOST', 'ws.internal'),
        'port'   => 6001,
        'scheme' => 'http',
        'useTLS' => false,
    ],
],

The application code that publishes events didn’t change. The frontend subscription code didn’t change. The developer experience didn’t change. What changed is that we no longer pay per message and we own the process.
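Part of why the swap is invisible to clients is that the Pusher protocol’s private-channel handshake depends only on the app key and secret, not on who runs the server. Here is a minimal sketch of that signature in Python, assuming the scheme described in the Pusher Channels documentation (HMAC-SHA256 over socket_id:channel_name). The key, secret, and channel name in the usage note are made up for illustration.

```python
import hashlib
import hmac


def pusher_channel_auth(key: str, secret: str, socket_id: str, channel: str) -> dict:
    """Build the auth payload a Pusher-protocol server expects for a private channel."""
    # The string to sign is the client's socket id and the channel name, colon-separated.
    string_to_sign = f"{socket_id}:{channel}"
    signature = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # The server recomputes the same HMAC with its copy of the secret and compares.
    return {"auth": f"{key}:{signature}"}
```

Calling pusher_channel_auth("app-key", "app-secret", "1234.5678", "private-deals") gives the same result whether the socket terminates at Pusher or at Sockudo, as long as both are configured with the same key and secret. That is why the existing PUSHER_APP_KEY and PUSHER_APP_SECRET variables carry over unchanged.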

For a product like Plato, a CRM with real-time features across a 20-person sales team, the per-message cost of Pusher at our volume wasn’t catastrophic, but it was a recurring line item for something we could own. The more interesting gain is operational: when something behaves unexpectedly, I can look at the actual WebSocket server logs. That’s not always possible with managed services.

GlitchTip: Sentry-compatible error tracking

The Sentry SDK sends error reports to a DSN, a URL that tells the SDK where to send data. GlitchTip is a Sentry-compatible error tracking server. The migration was a single environment variable.

# Before
SENTRY_DSN=https://abc123@o12345.ingest.sentry.io/9876543

# After
SENTRY_DSN=https://abc123@errors.internal/9876543

The SDK initialisation, the error grouping, the alert rules, the integration with issue tracking: all of it still works because GlitchTip implements the Sentry ingest protocol. We get error tracking we can run on our own infrastructure, without the Sentry pricing model scaling against our error volume.
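To see why one variable is enough, it helps to look at what a DSN encodes. The sketch below splits a DSN into its parts and derives the ingest URL an SDK would post to. It assumes the common DSN layout (public key as the URL username, project id as the last path segment, no extra path prefix) and the /api/{project_id}/envelope/ endpoint described in Sentry’s SDK development docs. parse_dsn is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn: str) -> dict:
    """Split a Sentry-style DSN into the pieces an SDK needs to deliver events."""
    parts = urlsplit(dsn)
    # The project id is the final path segment, e.g. "/9876543" -> "9876543".
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Keep an explicit port if the DSN has one.
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return {
        "public_key": parts.username,
        "host": host,
        "project_id": project_id,
        "envelope_url": f"{parts.scheme}://{host}/api/{project_id}/envelope/",
    }
```

Running it on the two DSNs above changes only the host in the result. The key, the project id, and the endpoint path stay the same, which is all the SDK needs.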

The part that doesn’t transfer perfectly: Sentry’s performance monitoring features are deeper. If you rely heavily on distributed tracing or session replay, GlitchTip won’t cover everything. For our use case (error capture, issue grouping, alerting), it covers everything that matters.

Typesense: search that’s actually simple to operate

Search is a category where teams often end up paying for managed Elasticsearch or Algolia because the alternatives seem operationally daunting. Running Elasticsearch yourself requires JVM tuning, cluster management, index template configuration, and careful capacity planning. Algolia is expensive at scale.

Typesense is a single statically-linked binary. No JVM, no cluster management for basic deployments, no separate configuration server. You run it, define a collection schema, and it works.

# Start Typesense
docker run -p 8108:8108 \
  -v /data/typesense:/data \
  typesense/typesense:27.0 \
  --data-dir /data \
  --api-key="$TYPESENSE_API_KEY" \
  --enable-cors

The query API is different from Elasticsearch. It doesn’t speak the Elasticsearch wire protocol, so this isn’t a zero-change migration if you’ve built directly on Elasticsearch queries. But Typesense has first-class client libraries for PHP, JavaScript, Python, and Go, and the query model is simpler. If you’re building search from scratch or doing a greenfield feature, Typesense’s operational simplicity and performance on commodity hardware are hard to argue against.
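For a sense of how small that query model is, here is a sketch of the two HTTP requests involved: creating a collection and searching it. It uses only the standard library and builds the requests without sending them. The collection name and fields are hypothetical, not Plato’s actual schema. The endpoint paths and the X-TYPESENSE-API-KEY header follow the Typesense API reference as I understand it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical schema: field names and types are illustrative only.
CUSTOMERS_SCHEMA = {
    "name": "customers",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "trip_count", "type": "int32"},
    ],
    "default_sorting_field": "trip_count",
}


def create_collection_request(base_url: str, api_key: str, schema: dict) -> Request:
    """Build (but do not send) the POST that creates a collection."""
    return Request(
        f"{base_url}/collections",
        data=json.dumps(schema).encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def search_request(base_url: str, api_key: str, collection: str,
                   query: str, query_by: list) -> Request:
    """Build (but do not send) a search: a query string plus the fields to match on."""
    params = urlencode({"q": query, "query_by": ",".join(query_by)})
    return Request(
        f"{base_url}/collections/{collection}/documents/search?{params}",
        headers={"X-TYPESENSE-API-KEY": api_key},
        method="GET",
    )
```

Passing either request to urllib.request.urlopen against a running instance would execute it. In application code the official client libraries wrap the same calls.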

For Plato, Typesense handles search across the CRM: customers, trips, bookings, sellers. The performance on our dataset is fast enough that we’ve never had a conversation about upgrading the Typesense instance. It just runs.

MinIO: S3 without the AWS bill

MinIO implements the S3 API. The AWS SDK just needs a different endpoint and credentials.

// Laravel filesystem config
's3' => [
    'driver'   => 's3',
    'key'      => env('MINIO_ACCESS_KEY'),
    'secret'   => env('MINIO_SECRET_KEY'),
    'region'   => 'us-east-1',
    'bucket'   => env('MINIO_BUCKET'),
    'url'      => env('MINIO_URL'),
    'endpoint' => env('MINIO_ENDPOINT', 'http://minio:9000'),
    'use_path_style_endpoint' => true,
],

Every piece of application code that calls Storage::put(), Storage::get(), or generates presigned URLs works without modification. The S3 API is the interface; the backing implementation is MinIO on our own disk.
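The one setting in that config that tends to trip people up is use_path_style_endpoint. AWS normally addresses a bucket as a subdomain of the endpoint. A single-host MinIO usually has no wildcard DNS for that, so the bucket goes in the path instead. A small sketch of the two addressing styles follows. object_url is a hypothetical helper, and the bucket and key in the usage note are invented.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Return the URL an S3 client would address for bucket/key under each style."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key)  # keeps "/" so nested keys stay readable
    if path_style:
        # Path style: http://minio:9000/<bucket>/<key>
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    # Virtual-hosted style: https://<bucket>.<endpoint-host>/<key>
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"
```

With path_style=True, object_url("http://minio:9000", "vouchers", "2024/trip-42.pdf", True) resolves against the minio host directly. With False, the client would try to reach vouchers.minio:9000, which only works if that hostname resolves.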

The use case at PickYourTrail is document storage: PDF vouchers, trip itineraries, booking confirmations. These are files that need to be durable, quickly accessible via URL, and not going anywhere. S3 would work, but at the volume we’re operating, self-hosted object storage on a machine we already run costs nearly nothing incrementally.

MinIO is also how the local development environment handles object storage: it’s one of the services Bloat manages as a native process. Developers get the same s3:// interface locally as in production, with no mocking or local shims needed.

vLLM: the OpenAI-compatible inference layer

This pattern extends into AI infrastructure. The OpenAI client library, one of the most widely used LLM clients, has a base_url parameter for pointing at a different endpoint, and vLLM exposes an OpenAI-compatible server.

import os

from openai import OpenAI

# Standard OpenAI
client = OpenAI(api_key="sk-...")

# vLLM on RunPod (same client, different endpoint)
client = OpenAI(
    api_key=os.getenv("VLLM_API_KEY"),
    base_url=os.getenv("VLLM_BASE_URL"),  # https://api.runpod.ai/v2/{pod_id}/openai/v1
)

# Placeholder; in the application the prompt comes from the calling code
prompt = "..."

# Usage is identical
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": prompt}],
)

All the application code that calls the OpenAI client (Sherpa’s chat analysis, the AI trip planning agents, the extraction pipeline in chat_dataset) can point at vLLM on RunPod without any code changes. The economics are different: instead of paying OpenAI’s per-token pricing on high-volume inference, we pay for a serverless GPU that’s only running when there’s work.

The tradeoff is real: vLLM on a 7B model is not GPT-4o. We use it for workloads where volume is high and the task is constrained enough that a smaller model, properly guided (see: DSPy), does the job. For tasks requiring strong reasoning or creative generation, managed APIs are still the right call.
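One way to express that split in code is a small routing function that picks a backend per task type. This is a sketch under assumptions, not how our services are actually wired: the task names, the SELF_HOSTED_TASKS set, and the Backend type are hypothetical, and only the environment variable names and the Qwen model id come from the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Backend:
    base_url: Optional[str]  # None means "use the client's default endpoint"
    api_key_env: str         # name of the env var holding the API key
    model: str


# Hypothetical: constrained, high-volume tasks that a smaller model handles well.
SELF_HOSTED_TASKS = {"extraction", "classification"}


def pick_backend(task: str, env: Mapping[str, str] = os.environ) -> Backend:
    """Route constrained tasks to vLLM and everything else to the managed API."""
    if task in SELF_HOSTED_TASKS:
        return Backend(
            base_url=env.get("VLLM_BASE_URL"),
            api_key_env="VLLM_API_KEY",
            model="Qwen/Qwen2.5-7B-Instruct",
        )
    return Backend(base_url=None, api_key_env="OPENAI_API_KEY", model="gpt-4o")
```

The calling code then builds its client from the returned values, for example OpenAI(api_key=os.getenv(b.api_key_env), base_url=b.base_url), so the choice of backend stays a routing decision and not a code path.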

What we don’t self-host

The payment infrastructure runs on managed services and won’t change. The reasoning isn’t complexity. It’s that the cost of owning payment processing goes well beyond the server burden. PCI compliance, fraud systems, dispute handling: these are problems I don’t want to own, regardless of what the protocol compatibility looks like.

Analytics and BI tooling is another area where managed services still win for us. The operational overhead of maintaining a data warehouse, query engine, and dashboard layer is significant, and the strategic value of owning that layer is lower than the other infrastructure we’ve self-hosted.

Email delivery (transactional email specifically) sits on the boundary. We handle our own email templates and composition (plato-emails, a system that generates personalized trip documents), but the actual SMTP delivery layer is managed. Deliverability is hard to get right at scale and the failure mode (email in spam) is high-visibility.

The pattern for what we don’t self-host: high regulatory burden, hard-to-replicate network effects (deliverability reputation, fraud data), or low strategic value relative to operational cost.

The actual decision framework

The question isn’t “managed or self-hosted?” The question is: what is the realistic operational cost, and does the control and cost benefit justify it?

For services that speak open protocols, the migration cost is low, often a single config change. That removes the biggest objection. What remains is the operational burden: does the team have the capacity to run this, debug it when it misbehaves, and upgrade it over time?

For most of the services above, that burden is genuinely low. Sockudo, MinIO, and Typesense are stable, well-maintained, and don’t require significant ongoing attention. vLLM on RunPod is serverless: it scales to zero when not in use and we don’t manage the underlying GPU hardware. GlitchTip needs a database and occasional maintenance, but it’s not a high-complexity service.

If the answer to “what’s the realistic operational burden?” is “a few hours a year once it’s set up,” and the protocol compatibility means no client code changes, the decision is usually clear.
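Written down as code, the framework is short. The sketch below is only an illustration of the reasoning in this post: the field names are hypothetical, and the 20-hour threshold is an arbitrary placeholder, not a number we measured.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    open_protocol: bool        # can clients switch with a config change?
    ops_hours_per_year: float  # realistic estimate of ongoing upkeep
    regulated: bool            # e.g. payments and PCI compliance
    network_effects: bool      # e.g. deliverability reputation, fraud data


def recommend(service: Service, max_ops_hours: float = 20.0) -> str:
    """Return 'managed', 'self-host', or 'evaluate' for a candidate service."""
    # Regulatory burden and network effects are reasons to stay managed
    # no matter how compatible the protocol is.
    if service.regulated or service.network_effects:
        return "managed"
    # Low migration cost plus low upkeep is the clear self-hosting case.
    if service.open_protocol and service.ops_hours_per_year <= max_ops_hours:
        return "self-host"
    # Everything else needs a case-by-case look at cost and control.
    return "evaluate"
```

The order of the checks is the point: protocol compatibility only matters once the risk questions have been answered.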

Own what’s worth owning. Use managed services where the complexity or the risk doesn’t belong to you. Know which category each service falls into before you decide.