The Silent Pain of Deploying Applications

M. Zakyuddin Munziri

@zakiego

I keep seeing the same deployment story play out across teams and stacks. It looks small at first. Nothing fails during deploy. Health checks are green. CI is happy. The server is live.

Then later, something breaks in production and the investigation becomes a slow, ugly hunt. The root cause is often the same. Missing or wrong environment variables. The app ran for hours or days before the missing value finally mattered. By then the failure looks mysterious. By then people are stressed.

This pattern is so common it should be treated as a design flaw, not developer carelessness. Environment variables are not optional metadata. They are critical configuration for the runtime. Treat them like that.


The familiar story

One dev asks another to help ship an app. CI passes. The deploy goes through. When tested, things look fine. Then a feature behaves unexpectedly. Or an endpoint silently returns the wrong payload.

We dig through logs. Nothing obvious. No stack trace points straight at the problem. Hours pass. Eventually someone finds it: a required environment variable never made it to production. Locally it worked because someone added it on their machine and forgot to update the central config.

This is not an edge case. It is one of the most exhausting failure modes in modern apps. It is cheap to avoid, and expensive to fix under fire.


The real problem

This is not about blaming people. It is about systems that allow latent configuration errors to live too long.

If a vehicle were missing its brakes, you would not let it move until they were installed. Yet many applications start and run even when essential configuration is missing. The failure only becomes visible when a specific code path is triggered.

We need to treat configuration as first class. Fail early. Fail clearly. That is responsible engineering.


How it should behave

At minimum, two rules should be enforced in every service:

  1. The application refuses to start if required environment variables are missing. Do not wait for runtime to reveal the issue. Crash at startup and make the problem immediate and obvious.

  2. Errors must be explicit and human readable. If API_KEY is missing, the runtime error should say exactly that. No guessing. No necromancy in logs.

These two rules remove most of the emotional overhead in incident response. They stop the slow hunt and redirect effort where it matters: fixing the configuration contract between deploy and runtime.
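As a dependency-free illustration of both rules, here is a minimal sketch in TypeScript. The names `EnvSource` and `requireEnv` are hypothetical and chosen only for this example. The helper collects every missing variable and throws a single error that names them all, so one failed start reports the whole problem.

```typescript
// Hypothetical helper for illustration; `EnvSource` and `requireEnv` are assumed names.
type EnvSource = Record<string, string | undefined>;

function requireEnv<K extends string>(
  source: EnvSource,
  names: readonly K[],
): Record<K, string> {
  // Treat undefined and blank values the same way: both count as missing.
  const missing = names.filter((name) => {
    const value = source[name];
    return value === undefined || value.trim() === "";
  });

  // Rule 1: refuse to continue. Rule 2: say exactly which names are missing.
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`,
    );
  }

  // Every name is present at this point, so the cast below is safe.
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = source[name] as string;
  }
  return result;
}

// Possible usage at the top of a Node.js entry point, before anything else runs:
//   const ENV = requireEnv(process.env, ["DATABASE_URL", "API_KEY"] as const);
```

Because the error is thrown before the server starts listening, an uncaught failure here stops the process immediately instead of surfacing hours later on some rarely used code path.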


Validating environment variables

Treat environment variables as inputs. Validate them. In JavaScript land this is easy and low risk.

I usually recommend schema validation with Zod or using a small abstraction like @t3-oss/env-nextjs. The pattern is simple:

  • define the contract
  • validate at startup or build time
  • fail loud if something is wrong

That gives you type safety and removes the “works on my machine” trap.

Example with Zod

env.ts

import { z } from "zod";

const envSchema = z.object({
  NODE_ENV: z.enum(["development", "test", "production"]),
  DATABASE_URL: z.string().url(),
  API_KEY: z.string().min(1),
  PORT: z.coerce.number().default(3000),
});

export const ENV = envSchema.parse(process.env);

If a required value is missing or malformed, the process crashes at startup with a clear error. That is exactly the behaviour you want.
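If you would rather print one line per variable than rely on the thrown error's default formatting, Zod's `safeParse` returns the list of issues instead of throwing. The sketch below assumes `zod` is installed; `describeEnvProblems` is a hypothetical helper name, and the schema is a trimmed-down copy of the one above.

```typescript
import { z } from "zod";

// Trimmed-down version of the schema above, kept small for the example.
const envSchema = z.object({
  DATABASE_URL: z.string().url(),
  API_KEY: z.string().min(1),
});

// Hypothetical helper: returns one readable line per invalid variable,
// or an empty array when the environment satisfies the schema.
function describeEnvProblems(
  source: Record<string, string | undefined>,
): string[] {
  const result = envSchema.safeParse(source);
  if (result.success) {
    return [];
  }
  // Each issue carries the path to the offending key and a message.
  return result.error.issues.map(
    (issue) => `${issue.path.join(".")}: ${issue.message}`,
  );
}

// Possible usage at startup (Node.js):
//   const problems = describeEnvProblems(process.env);
//   if (problems.length > 0) {
//     console.error("Invalid environment variables:\n" + problems.join("\n"));
//     process.exit(1);
//   }
```

The exact wording of each message comes from Zod and may differ between versions, so the sketch only relies on the variable name appearing at the start of each line.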

Why I export ENV as uppercase

I export the validated values as an uppercase constant:

export const ENV = ...

Uppercase signals global, deployment-level configuration. When you see ENV.DATABASE_URL, you know this is not just some in-memory variable. It is part of the app contract with deployment and CI. That small convention reduces mental overhead in big codebases.


Next level: Next.js and t3 env helpers

For Next.js, @t3-oss/env-nextjs is nice because it separates server and client config and supports build time validation.

src/env.ts

import { createEnv } from "@t3-oss/env-nextjs";
import { z } from "zod";

export const ENV = createEnv({
  server: {
    NODE_ENV: z.enum(["development", "test", "production"]),
    DATABASE_URL: z.string().url(),
    OPEN_AI_API_KEY: z.string().min(1),
  },
  client: {
    NEXT_PUBLIC_PUBLISHABLE_KEY: z.string().min(1),
  },
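  runtimeEnv: {
    // Every key declared in `server` and `client` must be listed here,
    // otherwise it is validated as undefined.
    NODE_ENV: process.env.NODE_ENV,
    DATABASE_URL: process.env.DATABASE_URL,
    OPEN_AI_API_KEY: process.env.OPEN_AI_API_KEY,
    NEXT_PUBLIC_PUBLISHABLE_KEY: process.env.NEXT_PUBLIC_PUBLISHABLE_KEY,
  },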
});

Importing the env file in next.config.ts runs the validation whenever the config is loaded, so a misconfiguration fails the build before anything is deployed, which is even better.

import "./src/env";

const nextConfig = {};

export default nextConfig;

.env.example is a contract

Commit a .env.example. This file is not just documentation. It is a contract between developers, CI, and production infrastructure. It shows what keys are required and what shape they should have.

NODE_ENV=production
DATABASE_URL=
OPEN_AI_API_KEY=
NEXT_PUBLIC_PUBLISHABLE_KEY=
PORT=3000

If the code reads a variable that is missing from this file, treat that as a signal: the variable is not yet part of the contract and should be added here.
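One way to keep that contract honest is to check an environment against the file automatically. The following is a minimal sketch, not a full dotenv parser: it assumes simple `KEY=value` lines with `#` comments, and the names `parseExampleKeys` and `findMissingKeys` are hypothetical.

```typescript
// Hypothetical helpers for illustration; both names are assumptions.

// Extract variable names from the text of a .env.example file.
// Assumes plain KEY=value lines; blank lines and # comments are skipped.
function parseExampleKeys(exampleText: string): string[] {
  return exampleText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const eq = line.indexOf("=");
      return (eq === -1 ? line : line.slice(0, eq)).trim();
    })
    .filter((key) => key !== "");
}

// Return every key listed in the example file that the given environment lacks.
function findMissingKeys(
  exampleText: string,
  source: Record<string, string | undefined>,
): string[] {
  return parseExampleKeys(exampleText).filter(
    (key) => source[key] === undefined,
  );
}

// Possible usage in a CI script (Node.js), with the file read via fs:
//   const text = fs.readFileSync(".env.example", "utf8");
//   const missing = findMissingKeys(text, process.env);
//   if (missing.length > 0) {
//     console.error("Missing from environment: " + missing.join(", "));
//     process.exit(1);
//   }
```

This sketch treats every key in the file as required. Variables that have a default in the schema, such as PORT, would need to be excluded or handled separately.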


The payoff

Unchecked environment variables cause invisible fragility.

When you validate early:

  • DevOps engineers waste less time on false leads
  • Engineers lose less sleep over mysterious production bugs
  • Deployments become less stressful
  • “Works on my machine” stops being an acceptable outcome

Validating env is not optional. It is the seat belt you put on before starting the car.


Practical checklist

You do not need a complex system to get this right. Start with small steps:

  1. Add a schema and validate at startup.
  2. Export validated config as a global constant like ENV.
  3. Commit .env.example and treat it as the canonical contract.
  4. If possible, import env validation into your build step to fail fast during CI.

These moves are low cost and high ROI.


Final thought

If your app still depends on unchecked process.env values, you are not asking if production will break. You are asking when it will break.

Fail early. Fail clearly. Make configuration impossible to ignore.
