feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose
2025-08-15 18:06:28 -05:00 · commit 2954e85e7a
19 changed files with 1983 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,274 @@
+# sojuboy
+
+An IRC bouncer companion service for soju that:
+
+- Watches your bouncer-connected channels continuously
+- Notifies you on mentions via Pushover (default)
+- Stores messages in SQLite for summaries and on-demand inspection
+- Generates AI digests (OpenAI by default) on schedule or on demand
+- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests
+
+Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
+
+## Why
+
+If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
+
+## High-level architecture
+
+- Language: Go (single static binary, low memory footprint)
+- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
+- Message storage: SQLite via modernc.org/sqlite
+- Scheduling: github.com/robfig/cron/v3
+- Notifications: github.com/gregdel/pushover
+- Summarization (LLM): github.com/sashabaranov/go-openai
+- HTTP API: Go stdlib `net/http`
+
+Runtime modules:
+
+- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
+- `internal/store`: SQLite schema and queries
+- `internal/notifier`: Pushover notifier (pluggable interface)
+- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
+- `internal/scheduler`: cron-based digest scheduling and daily retention job
+- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`
+- `internal/config`: env config loader and helpers
+
+## Features
+
+- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
+- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
+- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context
+- Configurable schedules (cron), quiet hours, and summary parameters
+- Local persistence with retention pruning (daily at 03:00)
+- HTTP endpoints: health, readiness, tail, metrics, on-demand digests
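
The punctuation-tolerant matching in the first feature bullet can be illustrated with a minimal Go sketch. It treats letters, digits, `_` and `-` as word characters, as the README states, and accepts a keyword only when the characters on both sides are not word characters. The function names and the case-insensitive comparison are assumptions of this sketch, not taken from the sojuboy source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordChar reports whether r is part of a word: letters, digits, '_' and '-'.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-'
}

// containsKeyword reports whether keyword occurs in text as a whole word.
// Case-insensitive matching is an assumption of this sketch.
func containsKeyword(text, keyword string) bool {
	if keyword == "" {
		return false
	}
	t := []rune(strings.ToLower(text))
	k := []rune(strings.ToLower(keyword))
	for i := 0; i+len(k) <= len(t); i++ {
		if string(t[i:i+len(k)]) != string(k) {
			continue
		}
		// Both neighbours must be absent or non-word characters.
		leftOK := i == 0 || !isWordChar(t[i-1])
		rightOK := i+len(k) == len(t) || !isWordChar(t[i+len(k)])
		if leftOK && rightOK {
			return true
		}
	}
	return false
}

func main() {
	msgs := []string{"yourNick: ping?", "hey (yournick)!", "yourNick-away is idle", "notyourNick"}
	for _, m := range msgs {
		fmt.Printf("%q -> %v\n", m, containsKeyword(m, "yourNick"))
	}
}
```

Because `-` counts as a word character, `yourNick-away` is treated as a different word and does not match, while trailing punctuation such as `:` or `)` does not block a match.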
+
+## How it works
+
+1) The service connects to soju and negotiates IRCv3 capabilities:
+   - Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
+   - Joins happen after numeric 001 (welcome)
+
+2) Authentication:
+   - PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
+   - Soju’s per-client identity preserves distinct history
+
+3) Playback and backfill:
+   - If `draft/event-playback` is enabled, soju replays missed messages automatically
+   - Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
+
+4) Messages and mentions:
+   - Each `PRIVMSG` is stored with server-time when available
+   - Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
+   - Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)
+
+5) Summarization:
+   - `/trigger` or the scheduler loads a window and calls OpenAI (with a 60s timeout)
+   - Defaults to `OPENAI_MODEL=gpt-5` with `MaxCompletionTokens`; temperature omitted for reasoning-like models
+   - Tunables let you follow link targets and group multi-line posts (see env below)
+
+6) HTTP API:
+   - `/healthz` → `200 ok`
+   - `/ready` → `200` only when connected to soju
+   - `/tail?channel=%23chan&limit=N` → plaintext tail (chronological)
+   - `/trigger?channel=%23chan&window=6h` → returns digest and sends via notifier
+   - `/metrics` → Prometheus text metrics
+   - Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
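
The optional `CHATHISTORY` fallback from step 3 amounts to formatting one command per channel from the last stored timestamp. The sketch below shows one way to build that line in Go. The function name, the `at` helper, the UTC conversion, and the use of `*` when nothing has been stored yet (a form the IRCv3 chathistory draft allows for `LATEST`) are assumptions of this sketch rather than confirmed sojuboy behaviour.

```go
package main

import (
	"fmt"
	"time"
)

// noHistory is the zero time, used here to mean "no stored message yet".
var noHistory time.Time

// at is a hypothetical helper that builds a UTC timestamp for the demo.
func at(sec, nsec int64) time.Time { return time.Unix(sec, nsec).UTC() }

// chathistoryLatest formats the fallback request for one channel.
// A zero since falls back to "*", which asks for the newest messages.
func chathistoryLatest(channel string, since time.Time, limit int) string {
	ref := "*"
	if !since.IsZero() {
		ref = "timestamp=" + since.UTC().Format(time.RFC3339Nano)
	}
	return fmt.Sprintf("CHATHISTORY LATEST %s %s %d", channel, ref, limit)
}

func main() {
	fmt.Println(chathistoryLatest("#animaniacs", at(1700000000, 123000000), 200))
	fmt.Println(chathistoryLatest("#general", noHistory, 50))
}
```

The resulting string would still need a trailing CRLF before being written to the connection, and the server must have advertised `draft/chathistory` for the command to be accepted.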
+
+## Health and readiness
+
+- `/healthz` always returns 200
+- `/ready` returns 200 only when connected to soju
+- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
+
+```yaml
+healthcheck:
+  test: ["/sojuboy", "--health"]
+  interval: 30s
+  timeout: 3s
+  retries: 3
+```
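
As a rough illustration of what a `--health` style check does, the Go sketch below requests a readiness URL and maps the result to an exit code of 0 or 1. It is not the sojuboy implementation: `healthExitCode` and `probeStub` are hypothetical names, the stub server stands in for the real service, and answering 503 when disconnected is an assumption (the README only says `/ready` returns 200 when connected).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthExitCode returns 0 when GET url answers 200 within the timeout, else 1.
func healthExitCode(url string, timeout time.Duration) int {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return 1
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 1
	}
	return 0
}

// probeStub starts a throwaway server whose /ready reflects connected
// and returns the exit code the check would produce against it.
func probeStub(connected bool) int {
	mux := http.NewServeMux()
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !connected {
			// Assumption: any non-200 status signals "not ready".
			http.Error(w, "not connected", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return healthExitCode(srv.URL+"/ready", 2*time.Second)
}

func main() {
	fmt.Println("connected:", probeStub(true))
	fmt.Println("disconnected:", probeStub(false))
}
```

In a real binary the returned code would be passed to `os.Exit`, which is what lets a Docker healthcheck interpret the result without a shell or `curl` in a distroless image.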
+
+## Installation
+
+### Prerequisites
+
+- Docker (or Synology Container Manager)
+- A soju bouncer you can connect to
+- Pushover account and app token (for push)
+- OpenAI API key (for AI summaries)
+
+### Build and run (Docker Compose)
+
+1) Create `.env` in repo root (see example below)
+
+2) Start:
+
+```bash
+docker-compose up -d --build
+```
+
+3) Health check:
+
+```bash
+curl -s http://localhost:8080/healthz
+```
+
+4) Tail last messages (remember to URL-encode `#` as `%23`):
+
+```bash
+curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
+  -H "Authorization: Bearer $HTTP_TOKEN"
+```
+
+5) Trigger a digest for the last 6 hours:
+
+```bash
+curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
+  -H "Authorization: Bearer $HTTP_TOKEN"
+```
+
+6) Metrics:
+
+```bash
+curl -s http://localhost:8080/metrics
+```
+
+## Quick start (Docker Compose)
+
+```bash
+docker-compose up -d --build
+# wait for healthy
+docker inspect --format='{{json .State.Health}}' sojuboy | jq
+```
+
+Compose includes a healthcheck that calls the binary’s `--health` flag, which exits 0 only when `/ready` returns 200.
+
+## Configuration (.env example)
+
+```env
+# soju / IRC
+SOJU_HOST=bnc.example.org
+SOJU_PORT=6697
+SOJU_TLS=true
+SOJU_NETWORK=your-network
+
+# Client identity: include client suffix for per-client history in soju
+IRC_NICK=yourNick
+IRC_USERNAME=yourUser/your-network@sojuboy
+IRC_REALNAME=Your Real Name
+IRC_PASSWORD=yourSojuClientPassword
+
+# Channels to auto-join (comma-separated)
+CHANNELS=#animaniacs,#general
+
+# Keywords that count as mentions (comma-separated)
+KEYWORDS=yourNick,YourCompany
+
+# Auth method hint (raw is used; value is ignored but kept for compatibility)
+SOJU_AUTH=raw
+
+# Notifier (Pushover)
+NOTIFIER=pushover
+PUSHOVER_USER_KEY=your-pushover-user-key
+PUSHOVER_API_TOKEN=your-pushover-app-token
+
+# Summarizer (OpenAI)
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...
+OPENAI_BASE_URL=https://api.openai.com/v1
+OPENAI_MODEL=gpt-5
+OPENAI_MAX_TOKENS=700
+# Summarizer tuning
+SUMM_FOLLOW_LINKS=true           # fetch small snippets from referenced links
+SUMM_LINK_TIMEOUT=6s             # HTTP timeout per link
+SUMM_LINK_MAX_BYTES=262144       # max bytes fetched per link
+SUMM_GROUP_WINDOW=90s            # group multi-line posts within this window
+SUMM_MAX_LINKS=5                 # limit links fetched per summary
+
+# Digests
+DIGEST_CRON=0 */6 * * *
+DIGEST_WINDOW=6h
+QUIET_HOURS=
+
+# Mentions/alerts
+NOTIFY_BACKFILL=false             # if true, notify even for replayed (older) messages
+MENTION_MIN_INTERVAL=30s          # min interval between alerts per channel/keyword
+MENTIONS_ONLY_CHANNELS=           # optional allow-list (CSV)
+MENTIONS_DENY_CHANNELS=           # optional deny-list (CSV)
+URGENT_KEYWORDS=urgent,priority   # bypass quiet hours
+
+# HTTP API
+HTTP_LISTEN=:8080
+HTTP_TOKEN=put-a-long-random-token-here
+
+# Storage
+STORE_PATH=/data/app.db
+STORE_RETENTION_DAYS=7
+
+# Logging
+LOG_LEVEL=info
+```
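
To make the `MENTION_MIN_INTERVAL` setting concrete, here is a minimal Go sketch of a per-key rate limiter. It parses the same duration syntax as the `.env` value (`30s`) and suppresses alerts for a key until the interval has elapsed. The type and function names, and the `channel|keyword` key format, are assumptions of this sketch and may differ from how sojuboy tracks alerts.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limiter remembers when each key (for example "#chan|keyword") last alerted.
type limiter struct {
	mu     sync.Mutex
	minGap time.Duration
	last   map[string]time.Time
}

// newLimiter parses a duration string such as "30s".
func newLimiter(interval string) (*limiter, error) {
	d, err := time.ParseDuration(interval)
	if err != nil {
		return nil, fmt.Errorf("parse interval %q: %w", interval, err)
	}
	return &limiter{minGap: d, last: make(map[string]time.Time)}, nil
}

// allow reports whether an alert for key may be sent at time now,
// and records now as the last alert time when it may.
func (l *limiter) allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if prev, ok := l.last[key]; ok && now.Sub(prev) < l.minGap {
		return false
	}
	l.last[key] = now
	return true
}

// at is a hypothetical helper that turns seconds into a timestamp for the demo.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	l, err := newLimiter("30s")
	if err != nil {
		panic(err)
	}
	fmt.Println(l.allow("#general|yourNick", at(0)))  // first alert for this key
	fmt.Println(l.allow("#general|yourNick", at(10))) // still inside the interval
	fmt.Println(l.allow("#general|yourNick", at(40))) // interval has elapsed
}
```

Passing the clock in as an argument keeps the limiter easy to test. Suppressed alerts do not reset the timer here, so a steady stream of mentions still produces one alert per interval.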
+
+## Pushover setup
+
+1) Install the Pushover iOS app and log in
+2) Get your User Key (in the app or on the website)
+3) Create an application at `pushover.net/apps/build` to get an API token
+4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
+
+## OpenAI setup
+
+- Set `OPENAI_API_KEY`
+- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
+- If `gpt-5` isn’t available on your account, use a supported model like `gpt-4o-mini`
+- GPT-5 beta limitations: temperature fixed; use `MaxCompletionTokens`
+
+## HTTP API
+
+- `GET /healthz` → `200 ok`
+- `GET /ready` → `200` only when connected to soju
+- `GET /tail?channel=%23chan&limit=50`
+  - Returns plaintext messages (chronological)
+  - Auth: provide `HTTP_TOKEN` as a Bearer token, an `X-Auth-Token` header, a `token=` query param, or basic auth (`token:<HTTP_TOKEN>`)
+- `GET /trigger?channel=%23chan&window=6h`
+  - Returns plaintext digest
+  - Also sends via notifier when configured
+  - Auth as above
+- `GET /metrics`
+  - Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
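
The token protection on `/tail` and `/trigger` could be implemented as a small `net/http` middleware. The Go sketch below accepts a Bearer token, an `X-Auth-Token` header, basic auth with the user name `token`, or a `token` query parameter. All names are hypothetical. The constant-time comparison and the choice to reject every request when no token is configured are design assumptions of this sketch, not documented sojuboy behaviour.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// presentedToken extracts a token from any of the four accepted places.
func presentedToken(r *http.Request) string {
	if h := r.Header.Get("Authorization"); strings.HasPrefix(h, "Bearer ") {
		return strings.TrimPrefix(h, "Bearer ")
	}
	if t := r.Header.Get("X-Auth-Token"); t != "" {
		return t
	}
	if user, pass, ok := r.BasicAuth(); ok && user == "token" {
		return pass
	}
	return r.URL.Query().Get("token")
}

// requireToken wraps next and answers 401 unless the request carries want.
func requireToken(want string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := presentedToken(r)
		// Assumption: an empty configured token rejects everything.
		if want == "" || subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler protects a stub endpoint with the token "s3cret".
func demoHandler() http.Handler {
	stub := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "tail output")
	})
	return requireToken("s3cret", stub)
}

// statusFor runs one GET through h in memory, optionally setting one header.
func statusFor(h http.Handler, target, header, value string) int {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	if header != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// statusBasic runs one GET through h using HTTP basic auth.
func statusBasic(h http.Handler, target, user, pass string) int {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	req.SetBasicAuth(user, pass)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := demoHandler()
	fmt.Println(statusFor(h, "/tail?channel=%23chan", "Authorization", "Bearer s3cret"))
	fmt.Println(statusFor(h, "/tail?channel=%23chan&token=s3cret", "", ""))
	fmt.Println(statusBasic(h, "/tail?channel=%23chan", "token", "s3cret"))
	fmt.Println(statusFor(h, "/tail?channel=%23chan", "X-Auth-Token", "wrong"))
}
```

Header-based forms are generally preferable to the query parameter, because URLs tend to end up in access logs and shell history.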
+
+## Troubleshooting
+
+- Empty tail while there’s activity
+  - Ensure the service logs `join requested:` followed by `joined` for your channels
+  - Confirm `.env` `CHANNELS` contains your channels
+  - Check `/metrics` and the logs for recent message ingestion
+
+- 401 Unauthorized from `/tail` or `/trigger`
+  - Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`
+
+- OpenAI 502/URL errors
+  - Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
+  - Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn’t enabled for your account
+
+## Roadmap
+
+- Additional notifiers (ntfy, Telegram)
+- Long-form HTML digest rendering
+- Admin endpoints (e.g., `/join?channel=%23chan`)
+
+## Development notes
+
+Project layout (selected):
+
+- `cmd/sojuboy/main.go` – entrypoint, wiring config/services
+- `internal/soju` – soju connector and ingestion
+- `internal/store` – SQLite schema and queries
+- `internal/notifier` – Pushover notifier
+- `internal/summarizer` – OpenAI client and prompts
+- `internal/httpapi` – health, tail, trigger, metrics endpoints
+- `internal/scheduler` – cron jobs
+
+Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.
+
+## License
+
+No `LICENSE` file is included yet; add one with the license of your choice. Third-party dependencies remain under their own licenses (check each module's repository).