sojuboy

An IRC bouncer companion service for soju that:

  • Watches your bouncer-connected channels continuously
  • Notifies you on mentions via Pushover (default)
  • Stores messages in SQLite for summaries and on-demand inspection
  • Generates AI digests (OpenAI by default) on schedule or on demand
  • Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests

Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.

Why

If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., username/network@client) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.

High-level architecture

  • Language: Go (single static binary, low memory footprint)
  • Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
  • Message storage: SQLite via modernc.org/sqlite
  • Scheduling: github.com/robfig/cron/v3
  • Notifications: github.com/gregdel/pushover
  • Summarization (LLM): github.com/sashabaranov/go-openai
  • HTTP API: Go stdlib net/http

Runtime modules:

  • internal/soju: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
  • internal/store: SQLite schema and queries
  • internal/notifier: Pushover notifier (pluggable interface)
  • internal/summarizer: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
  • internal/scheduler: cron-based digest scheduling and daily retention job
  • internal/httpapi: /healthz, /ready, /tail, /trigger, /metrics
  • internal/config: env config loader and helpers

Features

  • Mention/keyword detection: punctuation-tolerant (letters, digits, _ and - are word chars)
  • Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
  • AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context; image links sent to GPT-5 as vision inputs
  • Configurable schedules (cron), quiet hours, and summary parameters
  • Local persistence with retention pruning (daily at 03:00)
  • HTTP endpoints: health, tail, metrics, on-demand digests
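
The punctuation-tolerant matching described above can be illustrated with a small sketch. This is not the project's actual matcher: the function names are hypothetical, and case-insensitive comparison is an assumption. It only shows the stated rule that letters, digits, _ and - count as word characters, so a keyword matches when it is surrounded by anything else.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordChar reports whether r is part of a word: letters, digits, '_' and '-'.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-'
}

// containsKeyword reports whether keyword appears in text as a whole word.
// Matching is case-insensitive, which is an assumption of this sketch.
func containsKeyword(text, keyword string) bool {
	t := []rune(strings.ToLower(text))
	k := []rune(strings.ToLower(keyword))
	if len(k) == 0 {
		return false
	}
	for i := 0; i+len(k) <= len(t); i++ {
		if string(t[i:i+len(k)]) != string(k) {
			continue
		}
		// The match only counts if it is not glued to other word characters.
		leftOK := i == 0 || !isWordChar(t[i-1])
		rightOK := i+len(k) == len(t) || !isWordChar(t[i+len(k)])
		if leftOK && rightOK {
			return true
		}
	}
	return false
}

func main() {
	for _, msg := range []string{
		"yourNick: are you around?",
		"thanks (yourNick)!",
		"yourNick-bot just joined",
	} {
		fmt.Printf("%-28q mention=%v\n", msg, containsKeyword(msg, "yourNick"))
	}
}
```

Because - is treated as a word character, a nick such as yourNick-bot does not trigger a mention for the keyword yourNick, while trailing punctuation like : or ) does not prevent a match.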

How it works

  1. The service connects to soju and negotiates IRCv3 capabilities:

    • Requests: server-time, message-tags, batch, cap-notify, echo-message, draft/event-playback; optional fallback draft/chathistory when needed
    • Joins happen after numeric 001 (welcome)
  2. Authentication:

    • PASS then irssi-style USER <username/network@client> <same> <host> :<realname>
    • Soju's per-client identity preserves distinct history
  3. Playback and backfill:

    • If draft/event-playback is enabled, soju replays missed messages automatically
    • Optional fallback: CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit> using the last stored timestamp per channel (disabled by default)
  4. Messages and mentions:

    • Each PRIVMSG is stored with server-time when available
    • Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
    • Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)
  5. Summarization:

    • /trigger or the scheduler loads a window and calls OpenAI
    • GPT-5 context: ~272k input tokens + up to 128k output tokens (400k total)
    • Summaries are concise/natural and integrate multi-line posts, article text (readability-extracted), and image links (vision)
  6. HTTP API:

    • /healthz → 200 ok
    • /ready → 200 only when connected to soju
    • /tail?channel=#chan&limit=N → plaintext tail (chronological)
    • /trigger?channel=#chan&window=6h → returns digest and sends via notifier
    • /metrics → Prometheus text metrics
    • Protect /tail and /trigger with HTTP_TOKEN via Bearer, token query, X-Auth-Token, or basic auth (token:<HTTP_TOKEN>)
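
To make the wire format from steps 1 to 3 concrete, here is a minimal sketch that only builds the raw IRC lines. The helper names are hypothetical, and the order in which lines are printed is illustrative rather than a statement about what internal/soju sends; NICK, CAP END, and reading server replies are left out.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// capReq builds a single CAP REQ line for the given capability names.
func capReq(caps []string) string {
	return "CAP REQ :" + strings.Join(caps, " ")
}

// authLines builds PASS followed by the irssi-style USER line, where the
// soju username (user/network@client) fills the first two USER parameters.
func authLines(password, username, host, realname string) []string {
	return []string{
		"PASS " + password,
		fmt.Sprintf("USER %s %s %s :%s", username, username, host, realname),
	}
}

// chathistoryLatest builds the fallback backfill command using the last
// stored timestamp for a channel, formatted as RFC3339Nano in UTC.
func chathistoryLatest(channel string, last time.Time, limit int) string {
	return fmt.Sprintf("CHATHISTORY LATEST %s timestamp=%s %d",
		channel, last.UTC().Format(time.RFC3339Nano), limit)
}

func main() {
	caps := []string{
		"server-time", "message-tags", "batch",
		"cap-notify", "echo-message", "draft/event-playback",
	}
	lines := []string{capReq(caps)}
	lines = append(lines, authLines(
		"yourSojuClientPassword",
		"yourUser/your-network@sojuboy",
		"bnc.example.org",
		"Your Real Name",
	)...)
	// The timestamp and limit below are made-up example values.
	last := time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC)
	lines = append(lines, chathistoryLatest("#animaniacs", last, 200))

	for _, l := range lines {
		fmt.Println(l)
	}
}
```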

Health and readiness

  • /healthz always returns 200
  • /ready returns 200 only when connected to soju
  • Binary supports --health to perform a local readiness check and exit 0/1. Example Docker healthcheck:
healthcheck:
  test: ["CMD", "/sojuboy", "--health"]
  interval: 30s
  timeout: 3s
  retries: 3
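
As a rough illustration of how a readiness endpoint and a --health style probe fit together, the sketch below serves a /ready-like handler from an in-process test server and maps the probe result to an exit code. All names are hypothetical, and returning 503 while disconnected is an assumption; the text above only states that /ready returns 200 when connected to soju.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// readyHandler answers 200 only while connected is true.
// The 503 status for the disconnected case is an assumption of this sketch.
func readyHandler(connected *atomic.Bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !connected.Load() {
			http.Error(w, "not connected", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ready")
	}
}

// probe performs a GET with a short timeout and returns the status code.
func probe(url string, timeout time.Duration) (int, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// exitCodeFor maps a probe result to a process exit code: 0 healthy, 1 otherwise.
func exitCodeFor(status int, err error) int {
	if err != nil || status != http.StatusOK {
		return 1
	}
	return 0
}

func main() {
	var connected atomic.Bool
	srv := httptest.NewServer(readyHandler(&connected))
	defer srv.Close()

	for _, up := range []bool{false, true} {
		connected.Store(up)
		status, err := probe(srv.URL, 2*time.Second)
		fmt.Printf("connected=%v status=%d exit=%d\n", up, status, exitCodeFor(status, err))
	}
}
```

A real health flag would end with os.Exit(exitCodeFor(...)) so that Docker sees the 0/1 result; the sketch prints the code instead so it can run standalone.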

Installation

Prerequisites

  • Docker (or Synology Container Manager)
  • A soju bouncer you can connect to
  • Pushover account and app token (for push)
  • OpenAI API key (for AI summaries)

Build and run (Docker Compose)

  1. Create a .env file in the repo root (see the example below)

  2. Start:

docker-compose up -d --build
  3. Health check:
curl -s http://localhost:8080/healthz
  4. Tail last messages (remember to URL-encode # as %23):
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
  -H "Authorization: Bearer $HTTP_TOKEN"
  5. Trigger a digest for the last 6 hours:
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
  -H "Authorization: Bearer $HTTP_TOKEN"
  6. Metrics:
curl -s http://localhost:8080/metrics
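
If you call the API from Go rather than curl, letting net/url build the query string avoids the # versus %23 pitfall. This is a minimal client-side sketch; the helper names and the example token are hypothetical, and only the endpoint paths and parameters come from the commands above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// tailURL builds the /tail URL; url.Values escapes '#' as %23 automatically.
func tailURL(base, channel string, limit int) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("limit", strconv.Itoa(limit))
	return base + "/tail?" + q.Encode()
}

// triggerURL builds the /trigger URL for a channel and a window such as "6h".
func triggerURL(base, channel, window string) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("window", window)
	return base + "/trigger?" + q.Encode()
}

// newAuthedGET creates a GET request carrying the token as a Bearer header.
func newAuthedGET(rawURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	base := "http://localhost:8080"
	fmt.Println(tailURL(base, "#animaniacs", 50))
	fmt.Println(triggerURL(base, "#animaniacs", "6h"))

	req, err := newAuthedGET(tailURL(base, "#animaniacs", 50), "example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The request is only constructed here; pass it to http.DefaultClient.Do to send it.
	fmt.Println(req.Method, req.URL.Query().Get("channel"))
}
```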

Quick start (Docker Compose)

docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq

Compose includes a healthcheck calling the binary's --health flag, which returns 0 only when /ready is 200.

Configuration options

You can configure sojuboy via a .env file or an inline environment: block in your compose YAML. Both approaches are shown below. Defaults for all variables are listed in the table after the examples.

The example below uses maximum or large but reasonable values. Defaults are noted where they coincide with the maximum or are otherwise relevant.

# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network

# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword

# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany

# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw

# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token

# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
# Max completion (output) tokens for GPT-5 is ~128k (model limit). Default 128000.
OPENAI_MAX_TOKENS=128000
# Summarizer tuning
SUMM_FOLLOW_LINKS=true            # default true
SUMM_LINK_TIMEOUT=20s             # default 20s
SUMM_LINK_MAX_BYTES=1048576       # default 1048576 (1 MiB/article)
SUMM_GROUP_WINDOW=120s            # default 120s
SUMM_MAX_LINKS=20                 # default 20
SUMM_MAX_GROUPS=20000             # default 0 (no cap); example large
SUMM_TIMEOUT=10m                  # request timeout; default 10m

# Digests
DIGEST_CRON=0 */6 * * *           # every 6 hours
DIGEST_WINDOW=24h                 # default 24h
QUIET_HOURS=                      # e.g., 22:00-07:00

# Mentions/alerts
NOTIFY_BACKFILL=false             # default false
MENTION_MIN_INTERVAL=30s          # no hard max; rate-limit between alerts
MENTIONS_ONLY_CHANNELS=           # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS=           # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority   # bypass quiet hours

# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here

# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=365          # default 365

# Logging
LOG_LEVEL=info
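
How QUIET_HOURS, URGENT_KEYWORDS, and MENTION_MIN_INTERVAL might interact can be sketched as follows. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the quiet window is treated as start-inclusive and end-exclusive, and urgent mentions are assumed to bypass quiet hours but not the rate limit.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseClock converts "HH:MM" into minutes since midnight.
func parseClock(s string) (int, error) {
	t, err := time.Parse("15:04", strings.TrimSpace(s))
	if err != nil {
		return 0, err
	}
	return t.Hour()*60 + t.Minute(), nil
}

// inQuietHours reports whether minuteOfDay falls inside a "HH:MM-HH:MM"
// window. The window may wrap past midnight; an empty spec disables it.
func inQuietHours(spec string, minuteOfDay int) (bool, error) {
	if strings.TrimSpace(spec) == "" {
		return false, nil
	}
	from, to, ok := strings.Cut(spec, "-")
	if !ok {
		return false, fmt.Errorf("quiet hours %q: want HH:MM-HH:MM", spec)
	}
	start, err := parseClock(from)
	if err != nil {
		return false, err
	}
	end, err := parseClock(to)
	if err != nil {
		return false, err
	}
	if start <= end {
		return minuteOfDay >= start && minuteOfDay < end, nil
	}
	// Wrapping window such as 22:00-07:00.
	return minuteOfDay >= start || minuteOfDay < end, nil
}

// shouldNotify applies quiet hours (skipped for urgent mentions) and then
// the minimum interval between alerts. It returns a reason for logging.
func shouldNotify(quiet, urgent bool, sinceLast, minInterval time.Duration) (bool, string) {
	if quiet && !urgent {
		return false, "quiet hours"
	}
	if sinceLast < minInterval {
		return false, "rate limit"
	}
	return true, "delivered"
}

func main() {
	spec := "22:00-07:00"
	for _, m := range []int{23 * 60, 6*60 + 59, 7 * 60, 12 * 60} {
		quiet, err := inQuietHours(spec, m)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		ok, reason := shouldNotify(quiet, false, time.Minute, 30*time.Second)
		fmt.Printf("%02d:%02d quiet=%v notify=%v (%s)\n", m/60, m%60, quiet, ok, reason)
	}
}
```

In a running service the minute of day would come from the local clock, for example now.Hour()*60 + now.Minute().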

Option A: Compose with a .env file (localhost bind suitable for Synology reverse proxy):

services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
    restart: unless-stopped
    env_file: .env
    ports:
      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3

Option B: Inline environment in compose (no .env)

services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    environment:
      SOJU_HOST: "bnc.example.org"           # default 127.0.0.1
      SOJU_PORT: "6697"                      # default 6697
      SOJU_TLS: "true"                       # default true
      SOJU_NETWORK: "your-network"           # default ""
      IRC_NICK: "yourNick"                   # default sojuboy
      IRC_USERNAME: "yourUser/your-network@sojuboy"  # default IRC_NICK
      IRC_REALNAME: "Your Real Name"         # default sojuboy
      IRC_PASSWORD: "yourSojuClientPassword" # default ""
      CHANNELS: "#animaniacs,#general"       # default "" (none)
      KEYWORDS: "yourNick,YourCompany"       # default IRC_NICK
      SOJU_AUTH: "raw"                        # default sasl (hint only)
      NOTIFIER: "pushover"                   # default pushover
      PUSHOVER_USER_KEY: "..."               # default ""
      PUSHOVER_API_TOKEN: "..."              # default ""
      LLM_PROVIDER: "openai"                 # default openai
      OPENAI_API_KEY: "sk-..."               # default ""
      OPENAI_BASE_URL: "https://api.openai.com/v1"  # default ""
      OPENAI_MODEL: "gpt-5"                  # default gpt-5
      OPENAI_MAX_TOKENS: "128000"            # default 128000
      SUMM_FOLLOW_LINKS: "true"              # default true
      SUMM_LINK_TIMEOUT: "20s"               # default 20s
      SUMM_LINK_MAX_BYTES: "1048576"         # default 1048576
      SUMM_GROUP_WINDOW: "120s"              # default 120s
      SUMM_MAX_LINKS: "20"                   # default 20
      SUMM_MAX_GROUPS: "20000"               # default 0 (no cap)
      SUMM_TIMEOUT: "10m"                    # default 10m
      DIGEST_CRON: "0 */6 * * *"             # default 0 */6 * * *
      DIGEST_WINDOW: "24h"                    # default 24h
      QUIET_HOURS: ""                         # default ""
      NOTIFY_BACKFILL: "false"               # default false
      MENTION_MIN_INTERVAL: "30s"            # default 30s
      MENTIONS_ONLY_CHANNELS: ""             # default ""
      MENTIONS_DENY_CHANNELS: ""             # default ""
      URGENT_KEYWORDS: "urgent,priority"     # default ""
      HTTP_LISTEN: ":8080"                   # default :8080
      HTTP_TOKEN: "<long-random-token>"      # default ""
      STORE_PATH: "/data/app.db"             # default /data/app.db
      STORE_RETENTION_DAYS: "365"            # default 365
      LOG_LEVEL: "info"                      # default info
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3

Defaults reference

Variable                  Default
SOJU_HOST                 127.0.0.1
SOJU_PORT                 6697
SOJU_TLS                  true
IRC_NICK                  sojuboy
IRC_USERNAME              IRC_NICK
IRC_REALNAME              sojuboy
IRC_PASSWORD              (empty)
SOJU_NETWORK              (empty)
CHANNELS                  (empty)
KEYWORDS                  IRC_NICK
SOJU_AUTH                 sasl
NOTIFIER                  pushover
PUSHOVER_USER_KEY         (empty)
PUSHOVER_API_TOKEN        (empty)
LLM_PROVIDER              openai
OPENAI_API_KEY            (empty)
OPENAI_BASE_URL           (empty)
OPENAI_MODEL              gpt-5
OPENAI_MAX_TOKENS         700
SUMM_FOLLOW_LINKS         true
SUMM_LINK_TIMEOUT         6s
SUMM_LINK_MAX_BYTES       262144
SUMM_GROUP_WINDOW         90s
SUMM_MAX_LINKS            5
SUMM_MAX_GROUPS           0
SUMM_TIMEOUT              5m
DIGEST_CRON               0 */6 * * *
DIGEST_WINDOW             6h
QUIET_HOURS               (empty)
NOTIFY_BACKFILL           false
MENTION_MIN_INTERVAL      30s
MENTIONS_ONLY_CHANNELS    (empty)
MENTIONS_DENY_CHANNELS    (empty)
URGENT_KEYWORDS           (empty)
HTTP_LISTEN               :8080
HTTP_TOKEN                (empty)
STORE_PATH                /data/app.db
STORE_RETENTION_DAYS      7
LOG_LEVEL                 info
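
An environment loader that falls back to defaults like these could look roughly like the sketch below. It is not the code in internal/config: the Config struct, the lookupFunc type, and the choice to treat an empty value as unset are all assumptions, and only a handful of variables are covered.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config holds a small, illustrative subset of the settings.
type Config struct {
	SojuHost           string
	SojuPort           int
	Channels           []string
	MentionMinInterval time.Duration
	HTTPListen         string
}

// lookupFunc matches os.LookupEnv so tests can inject a fake environment.
type lookupFunc func(key string) (string, bool)

// str returns the trimmed value for key, or def when unset or empty.
func (l lookupFunc) str(key, def string) string {
	if v, ok := l(key); ok && strings.TrimSpace(v) != "" {
		return strings.TrimSpace(v)
	}
	return def
}

// splitCSV splits a comma-separated list and drops empty entries.
func splitCSV(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// load reads the environment through lookup and applies defaults.
func load(lookup lookupFunc) (Config, error) {
	var err error
	cfg := Config{
		SojuHost:   lookup.str("SOJU_HOST", "127.0.0.1"),
		Channels:   splitCSV(lookup.str("CHANNELS", "")),
		HTTPListen: lookup.str("HTTP_LISTEN", ":8080"),
	}
	if cfg.SojuPort, err = strconv.Atoi(lookup.str("SOJU_PORT", "6697")); err != nil {
		return Config{}, fmt.Errorf("SOJU_PORT: %w", err)
	}
	if cfg.MentionMinInterval, err = time.ParseDuration(lookup.str("MENTION_MIN_INTERVAL", "30s")); err != nil {
		return Config{}, fmt.Errorf("MENTION_MIN_INTERVAL: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := load(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the loader testable without mutating the process environment.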

Pushover setup

  1. Install the Pushover iOS app and log in
  2. Get your User Key (in the app or on the website)
  3. Create an application at pushover.net/apps/build to get an API token
  4. Put them in .env as PUSHOVER_USER_KEY and PUSHOVER_API_TOKEN

OpenAI setup

  • Set OPENAI_API_KEY
  • Set OPENAI_BASE_URL to exactly https://api.openai.com/v1
  • If gpt-5 isn't available on your account, use a supported model like gpt-4o-mini
  • GPT-5 limits: ~272k input + 128k output tokens (400k context)

HTTP API

  • GET /healthz → 200 ok
  • GET /tail?channel=%23chan&limit=50
    • Returns plaintext messages (chronological)
    • Auth: provide HTTP_TOKEN as a Bearer token (or query param token=)
  • GET /trigger?channel=%23chan&window=6h
    • Returns plaintext digest
    • Also sends via notifier when configured
    • Auth as above
  • GET /metrics
    • Prometheus metrics: sojuboy_messages_ingested_total, sojuboy_notifications_sent_total, sojuboy_messages_pruned_total, sojuboy_connected
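
The token options for /tail and /trigger (Bearer header, token query parameter, X-Auth-Token header, or basic auth with the user name token) can be checked by a small middleware. The sketch below is an assumption-laden illustration, not the handler in internal/httpapi: the function names and the order in which sources are tried are hypothetical, and it does not model what happens when HTTP_TOKEN is unset.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// tokenFromRequest extracts a token from any of the supported locations.
func tokenFromRequest(r *http.Request) string {
	if h := r.Header.Get("Authorization"); strings.HasPrefix(h, "Bearer ") {
		return strings.TrimPrefix(h, "Bearer ")
	}
	if v := r.Header.Get("X-Auth-Token"); v != "" {
		return v
	}
	if v := r.URL.Query().Get("token"); v != "" {
		return v
	}
	if user, pass, ok := r.BasicAuth(); ok && user == "token" {
		return pass
	}
	return ""
}

// authorized compares the presented token with the expected one in constant time.
func authorized(r *http.Request, want string) bool {
	got := tokenFromRequest(r)
	return got != "" && subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireToken wraps next and answers 401 when the token is missing or wrong.
func requireToken(want string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r, want) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newGET builds an in-memory GET request with optional headers (demo helper).
func newGET(target string, headers map[string]string) *http.Request {
	r := httptest.NewRequest(http.MethodGet, target, nil)
	for k, v := range headers {
		r.Header.Set(k, v)
	}
	return r
}

func main() {
	const token = "s3cret" // stands in for HTTP_TOKEN
	handler := requireToken(token, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))

	basic := newGET("/tail?channel=%23chan", nil)
	basic.SetBasicAuth("token", token)

	reqs := []*http.Request{
		newGET("/tail?channel=%23chan", nil), // no credentials
		newGET("/tail?channel=%23chan&token="+token, nil),
		newGET("/tail?channel=%23chan", map[string]string{"Authorization": "Bearer " + token}),
		newGET("/tail?channel=%23chan", map[string]string{"X-Auth-Token": token}),
		basic,
	}
	for _, r := range reqs {
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, r)
		fmt.Println(rec.Code)
	}
}
```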

Troubleshooting

  • Empty tail while there's activity

    • Ensure the service logs join requested: followed by joined for your channels
    • Confirm .env CHANNELS contains your channels
    • Check /metrics and the logs for recent message ingestion
  • 401 Unauthorized from /tail or /trigger

    • Provide Authorization: Bearer $HTTP_TOKEN or ?token=$HTTP_TOKEN
  • OpenAI 502/URL errors

    • Ensure OPENAI_BASE_URL=https://api.openai.com/v1
    • Try OPENAI_MODEL=gpt-4o-mini if gpt-5 isn't enabled for your account

Roadmap

  • Additional notifiers (ntfy, Telegram)
  • Long-form HTML digest rendering
  • Admin endpoints (e.g., /join?channel=#chan)

Development notes

Project layout (selected):

  • cmd/sojuboy/main.go: entrypoint, wiring config/services
  • internal/soju: soju connector and ingestion
  • internal/store: SQLite schema and queries
  • internal/notifier: Pushover notifier
  • internal/summarizer: OpenAI client and prompts
  • internal/httpapi: health, tail, trigger, metrics endpoints
  • internal/scheduler: cron jobs

Go toolchain: see go.mod (Go 1.23). The Dockerfile builds a static binary for a distroless image.

License

MIT for code dependencies; this repository's license will follow your preference (add a LICENSE if needed).