INFRASTRUCTURE · DEVOPS · AI

How I Self-Hosted My Entire Portfolio Stack for Free

Vercel · Cloudflare · Groq · Proxmox — production infrastructure at $0/month

The Setup

Most portfolios sit on Vercel or Netlify and call it a day. This one runs on a physical machine in my home — a mini PC running Proxmox, hosting two Ubuntu VMs, with everything from the AI chat service to the database wired together by hand.

The goal was never to avoid managed services for the sake of it. It was to understand what those services actually do — and to build something that deploys itself, monitors itself, and recovers itself without a cloud bill.

Proxmox as the Foundation

A bare-metal Proxmox hypervisor runs two Ubuntu 24.04 VMs: one for the Node.js/Express API and PostgreSQL, one for the Python/Flask AI service. Proxmox gives snapshot-based recovery, isolated networking, and resource limits per VM — the same primitives cloud providers sell, just on hardware I own outright.
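Those per-VM primitives are all scriptable from the Proxmox CLI. A rough sketch — the VM ID and values are placeholders, not my actual configuration:

```shell
# Illustrative qm commands (VM ID 101 is a placeholder)
qm set 101 --cores 2 --memory 4096    # cap the API VM's CPU and RAM
qm snapshot 101 pre-deploy            # point-in-time snapshot before a risky change
qm rollback 101 pre-deploy            # roll the VM back to that snapshot
```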

Public traffic reaches the home network through a Cloudflare Tunnel, which handles TLS termination and DDoS protection without opening any inbound firewall ports. The tunnel is a systemd service on the VM — it starts on boot, reconnects automatically, and proxies traffic straight to the application.
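The tunnel's behavior lives in a single config file. A minimal sketch, assuming the API listens on port 4000 — the hostname and tunnel ID are placeholders:

```yaml
# /etc/cloudflared/config.yml (illustrative)
tunnel: <tunnel-uuid>
credentials-file: /etc/cloudflared/<tunnel-uuid>.json

ingress:
  - hostname: api.example.com
    service: http://localhost:4000   # proxy straight to the Express API
  - service: http_status:404         # catch-all for unmatched hostnames
```

Running `cloudflared service install` registers it as the systemd unit that starts on boot.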

How the stack deploys itself

Pushing to main triggers separate GitHub Actions workflows for the Express API and Flask service. Each workflow SSHs into the corresponding VM, pulls the latest code, installs dependencies, and performs a zero-downtime restart — PM2 for the Node.js process, systemd for Flask.
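A deploy workflow along these lines — sketched here with the community `appleboy/ssh-action` runner; the real workflow, secret names, and paths may differ:

```yaml
# .github/workflows/deploy-api.yml (illustrative sketch)
name: Deploy API
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.VM_HOST }}
          username: ${{ secrets.VM_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd ~/app/api
            git pull origin main
            npm ci
            pm2 reload api --update-env   # zero-downtime restart
```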

The frontend deploys to Vercel on the same push. Credentials live only in GitHub Actions' encrypted secrets; the home server is never exposed to the public internet directly.

How a request moves through the stack

GitHub push → Actions
  ├─ Vercel (Next.js frontend)
  └─ SSH deploy to home server
       ├─ VM 1: Express + PostgreSQL
       │    └─ PM2 (hot reload)
       └─ VM 2: Flask AI service
            └─ systemd (auto-restart)

Browser request
  → Cloudflare Tunnel (TLS)
    → Proxmox VM (Ubuntu 24.04)
      → Express API :4000
        └─ Flask AI  :5000
             └─ Groq API
                  └─ SSE stream
                        → browser

Zero-downtime on commodity hardware

With no load balancer or auto-scaling group, resilience comes from process supervision. PM2 restarts the API on crash. systemd restarts Flask. Cloudflare Tunnel masks any brief connectivity gap during a deploy.
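The Flask side of that supervision is a plain systemd unit. A minimal sketch, assuming a gunicorn entry point — the paths, user, and service name are placeholders:

```ini
# /etc/systemd/system/flask-ai.service (illustrative)
[Unit]
Description=Flask AI service
After=network-online.target

[Service]
User=deploy
WorkingDirectory=/home/deploy/ai-service
ExecStart=/home/deploy/ai-service/.venv/bin/gunicorn -b 127.0.0.1:5000 app:app
Restart=always        # respawn on any crash
RestartSec=3          # brief backoff between restarts

[Install]
WantedBy=multi-user.target
```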

This is the same reliability model that large-scale systems use — just stripped down to its essential parts.

The AI Layer

The chat widget in the corner of this site talks to a real AI service — running on the same home server, built from scratch.

Flask AI Service

A lightweight Python/Flask API receives chat messages from the Express backend and forwards them to the Groq API with a carefully engineered system prompt. The service has no database — all context management happens in the frontend conversation thread. One process, one responsibility.
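Since Groq exposes an OpenAI-compatible chat endpoint, the relay's core job reduces to assembling a messages array. A sketch of that assembly — the prompt text and function name are illustrative, not the service's actual code:

```python
# Sketch of the request body the relay might send to Groq's
# OpenAI-compatible chat endpoint. SYSTEM_PROMPT is a stand-in.
SYSTEM_PROMPT = "You are the portfolio's assistant. Be concise and factual."

def build_groq_payload(history, user_message, model="llama-3.1-8b-instant"):
    """Prepend the system prompt; the frontend supplies all prior context."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += history  # list of {"role": ..., "content": ...} dicts
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "stream": True}

payload = build_groq_payload([], "What runs on VM 2?")
```

Because the frontend owns the history, the service stays stateless: any payload can be rebuilt from the request alone.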

Streaming with SSE

Responses stream token-by-token using Server-Sent Events — the same protocol OpenAI and Anthropic use in their chat UIs. The Flask service proxies Groq's streaming response directly to the client, so the first token appears in under a second even on a long reply.
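The proxying itself can be sketched as a generator that reshapes provider chunks into SSE frames — the names here are illustrative, not the service's actual code:

```python
import json

def to_sse(chunk_iter):
    """Wrap each token chunk in the text/event-stream wire format."""
    for token in chunk_iter:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"  # sentinel so the browser knows the reply ended

# In Flask this generator would be returned as
# Response(to_sse(stream), mimetype="text/event-stream"),
# where `stream` iterates over Groq's streamed deltas.
frames = list(to_sse(["Hel", "lo"]))
```

Each `data:` frame reaches the browser as soon as it is yielded, which is what makes the first token visible long before the full reply finishes.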

Why Groq Instead of Local Ollama

Originally the stack ran Ollama with gemma3 locally. CPU-only inference on a home server meant 2+ minute response times — unusable for a chat widget. Groq's free tier (1,000 req/day, llama-3.1-8b-instant) delivers sub-2-second responses. The Flask service is model-agnostic — swapping providers is a one-line env change.
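That provider-agnosticism can amount to nothing more than environment-driven configuration — the variable names below are assumptions for illustration:

```python
import os

# Swapping providers = changing env vars; no code change in the service.
LLM_BASE_URL = os.environ.get("LLM_BASE_URL", "https://api.groq.com/openai/v1")
LLM_MODEL = os.environ.get("LLM_MODEL", "llama-3.1-8b-instant")
```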

Every Layer of the Stack

Frontend
Next.js 15 · React 19 · Tailwind CSS · Framer Motion
Backend
Node.js · Express · TypeScript · Prisma · Socket.io
AI Layer
Python · Flask · Groq API · llama-3.1-8b-instant · SSE Streaming
Infrastructure
Proxmox · Ubuntu 24.04 · systemd · GitHub Actions · Cloudflare Tunnel · Vercel
Auth & Data
JWT · PostgreSQL · Google OAuth · Azure AD

By the Numbers

$0/month

Full production infrastructure cost

< 2s

AI response time via Groq

2 VMs

Self-hosted on Proxmox home server

What I Learned

Networking without training wheels

Configuring Cloudflare Tunnels, internal VM networking, and firewall rules teaches you what managed cloud services abstract away. You stop thinking in terms of features and start thinking in terms of packets.

Prompt engineering is real engineering

The difference between a generic AI response and a useful one is entirely in the system prompt. Tone, structure, factual grounding, and navigation behavior all require deliberate design — no different from writing a well-scoped function.

Self-hosting forces good architecture

When you can't rely on auto-scaling or managed restarts, you design for resilience from the start — systemd services, PM2 process managers, and health-check-driven deployments. Constraints are a feature.

Serving from Austin, Texas