sojuboy/README.md
Thomas Cravey 2954e85e7a feat: initial Beta 1 release
- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose
2025-08-15 18:06:28 -05:00


# sojuboy
An IRC bouncer companion service for soju that:
- Watches your bouncer-connected channels continuously
- Notifies you on mentions via Pushover (default)
- Stores messages in SQLite for summaries and on-demand inspection
- Generates AI digests (OpenAI by default) on schedule or on demand
- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests
Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
## Why
If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
## High-level architecture
- Language: Go (single static binary, low memory footprint)
- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
- Message storage: SQLite via modernc.org/sqlite
- Scheduling: github.com/robfig/cron/v3
- Notifications: github.com/gregdel/pushover
- Summarization (LLM): github.com/sashabaranov/go-openai
- HTTP API: Go stdlib `net/http`
Runtime modules:
- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier (pluggable interface)
- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
- `internal/scheduler`: cron-based digest scheduling and daily retention job
- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`
- `internal/config`: env config loader and helpers
## Features
- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context
- Configurable schedules (cron), quiet hours, and summary parameters
- Local persistence with retention pruning (daily at 03:00)
- HTTP endpoints: health, tail, metrics, on-demand digests
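
The punctuation-tolerant matching described above can be sketched as a whole-word search in which letters, digits, `_` and `-` count as word characters. This is a minimal illustration, not the code in `internal/`; the names `isWordRune` and `containsMention` are hypothetical, and case-insensitive matching is an assumption.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordRune reports whether r counts as part of a word:
// letters, digits, '_' and '-'.
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-'
}

// containsMention reports whether keyword occurs in text as a whole word,
// ignoring case. Hypothetical helper, not taken from the sojuboy source.
func containsMention(text, keyword string) bool {
	t, k := strings.ToLower(text), strings.ToLower(keyword)
	if k == "" {
		return false
	}
	for from := 0; from < len(t); {
		i := strings.Index(t[from:], k)
		if i < 0 {
			return false
		}
		start, end := from+i, from+i+len(k)
		before, _ := utf8.DecodeLastRuneInString(t[:start])
		after, _ := utf8.DecodeRuneInString(t[end:])
		// A match counts only if it is not glued to other word characters.
		leftOK := start == 0 || !isWordRune(before)
		rightOK := end == len(t) || !isWordRune(after)
		if leftOK && rightOK {
			return true
		}
		from = start + 1 // keep scanning for a later occurrence
	}
	return false
}
```

Under these rules `yourNick,` and `yourNick:` match the keyword `yourNick`, while `yourNick-bot` and `notyourNick` do not.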
## How it works
1) The service connects to soju and negotiates IRCv3 capabilities:
- Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
- Joins happen after numeric 001 (welcome)
2) Authentication:
- PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
- soju's per-client identity preserves distinct history
3) Playback and backfill:
- If `draft/event-playback` is enabled, soju replays missed messages automatically
- Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
4) Messages and mentions:
- Each `PRIVMSG` is stored with server-time when available
- Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
- Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)
5) Summarization:
- `/trigger` or the scheduler loads a window and calls OpenAI (with a 60s timeout)
- Defaults to `OPENAI_MODEL=gpt-5` with `MaxCompletionTokens`; temperature omitted for reasoning-like models
- Tunables let you follow link targets and group multi-line posts (see env below)
6) HTTP API:
- `/healthz` → `200 ok`
- `/ready` → `200` only when connected to soju
- `/tail?channel=#chan&limit=N` → plaintext tail (chronological)
- `/trigger?channel=#chan&window=6h` → returns digest and sends via notifier
- `/metrics` → Prometheus text metrics
- Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
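
Steps 1 and 2 can be sketched as the raw lines a client would write to the socket during registration. This is a simplified illustration under stated assumptions: the function names are hypothetical, the `NICK` line is assumed from `IRC_NICK`, and the lines are built up front, whereas a real client would wait for the server's `CAP ACK`/`NAK` before sending `CAP END`.

```go
package main

import (
	"fmt"
	"strings"
)

// requestedCaps mirrors the capability list given in this README.
var requestedCaps = []string{
	"server-time", "message-tags", "batch",
	"cap-notify", "echo-message", "draft/event-playback",
}

// registrationLines builds the registration lines: capability request,
// PASS, NICK, then the irssi-style USER with the username repeated.
// Hypothetical helper; the real connector may order or split these differently.
func registrationLines(password, nick, username, host, realname string) []string {
	return []string{
		"CAP REQ :" + strings.Join(requestedCaps, " "),
		"PASS " + password,
		"NICK " + nick,
		fmt.Sprintf("USER %s %s %s :%s", username, username, host, realname),
		"CAP END",
	}
}

// isWelcome reports whether line is numeric 001, after which joins are sent.
// Leading message tags ("@...") and the source prefix (":...") are skipped.
func isWelcome(line string) bool {
	fields := strings.Fields(line)
	for len(fields) > 0 &&
		(strings.HasPrefix(fields[0], "@") || strings.HasPrefix(fields[0], ":")) {
		fields = fields[1:]
	}
	return len(fields) > 0 && fields[0] == "001"
}

// joinLine builds a single JOIN for all configured channels.
func joinLine(channels []string) string {
	return "JOIN " + strings.Join(channels, ",")
}
```

The `username` argument carries the full `username/network@client` identity, which is what gives this client its own history in soju.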
## Health and readiness
- `/healthz` always returns 200
- `/ready` returns 200 only when connected to soju
- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
```yaml
healthcheck:
  test: ["/sojuboy", "--health"]
  interval: 30s
  timeout: 3s
  retries: 3
```
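
A `--health` style check like the one described above can be approximated by a function that requests `/ready` and maps the outcome to an exit code. This is a sketch only: `readyExitCode` is a hypothetical name, and the 2 second timeout is an assumption chosen to stay under the 3s healthcheck timeout.

```go
package main

import (
	"net/http"
	"time"
)

// readyExitCode returns 0 when a GET to url answers 200 and 1 otherwise
// (connection errors, timeouts, and non-200 statuses all count as not ready).
// Hypothetical helper; not the actual implementation behind --health.
func readyExitCode(url string) int {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 1
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusOK {
		return 0
	}
	return 1
}
```

A `main` function would typically call `os.Exit(readyExitCode("http://127.0.0.1:8080/ready"))` when the flag is set; the address here assumes the default `HTTP_LISTEN=:8080`.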
## Installation
### Prerequisites
- Docker (or Synology Container Manager)
- A soju bouncer you can connect to
- Pushover account and app token (for push)
- OpenAI API key (for AI summaries)
### Build and run (Docker Compose)
1) Create `.env` in repo root (see example below)
2) Start:
```bash
docker-compose up -d --build
```
3) Health check:
```bash
curl -s http://localhost:8080/healthz
```
4) Tail last messages (remember to URL-encode `#` as `%23`):
```bash
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
5) Trigger a digest for the last 6 hours:
```bash
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
6) Metrics:
```bash
curl -s http://localhost:8080/metrics
```
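
When calling the API from code rather than curl, the `#` to `%23` encoding is easiest to leave to a URL library. A minimal sketch using Go's `net/url`; `tailURL` is a hypothetical helper and the base address is an assumption:

```go
package main

import (
	"net/url"
	"strconv"
)

// tailURL builds a /tail request URL with the channel name query-escaped,
// so "#animaniacs" becomes "%23animaniacs".
func tailURL(base, channel string, limit int) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("limit", strconv.Itoa(limit))
	return base + "/tail?" + q.Encode()
}
```

`tailURL("http://localhost:8080", "#animaniacs", 50)` yields `http://localhost:8080/tail?channel=%23animaniacs&limit=50`, the same URL used in the curl example.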
## Quick start (Docker Compose)
```bash
docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq
```
Compose includes a healthcheck calling the binary's `--health` flag, which returns 0 only when `/ready` is 200.
## Configuration (.env example)
```env
# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network
# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword
# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany
# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw
# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token
# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
OPENAI_MAX_TOKENS=700
# Summarizer tuning
SUMM_FOLLOW_LINKS=true # fetch small snippets from referenced links
SUMM_LINK_TIMEOUT=6s # HTTP timeout per link
SUMM_LINK_MAX_BYTES=262144 # max bytes fetched per link
SUMM_GROUP_WINDOW=90s # group multi-line posts within this window
SUMM_MAX_LINKS=5 # limit links fetched per summary
# Digests
DIGEST_CRON=0 */6 * * *
DIGEST_WINDOW=6h
QUIET_HOURS=
# Mentions/alerts
NOTIFY_BACKFILL=false # if true, notify even for replayed (older) messages
MENTION_MIN_INTERVAL=30s # min interval between alerts per channel/keyword
MENTIONS_ONLY_CHANNELS= # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS= # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority # bypass quiet hours
# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here
# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=7
# Logging
LOG_LEVEL=info
```
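
As an illustration of how values like these could be read, the sketch below shows an environment lookup with a default and a parser for comma-separated lists such as `CHANNELS` and `KEYWORDS`. The helper names `envOr` and `splitCSV` are hypothetical and not taken from `internal/config`; treating an empty value as unset is an assumption.

```go
package main

import (
	"os"
	"strings"
)

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// splitCSV splits a comma-separated value, trimming whitespace and
// dropping empty entries, so "a, b," yields ["a", "b"].
func splitCSV(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}
```

For example, `splitCSV(envOr("CHANNELS", ""))` would give `["#animaniacs", "#general"]` for the `.env` above.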
## Pushover setup
1) Install the Pushover iOS app and log in
2) Get your User Key (in the app or on the website)
3) Create an application at `pushover.net/apps/build` to get an API token
4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
## OpenAI setup
- Set `OPENAI_API_KEY`
- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
- If `gpt-5` isn't available on your account, use a supported model like `gpt-4o-mini`
- GPT-5 beta limitations: temperature fixed; use `MaxCompletionTokens`
## HTTP API
- `GET /healthz` → `200 ok`
- `GET /tail?channel=%23chan&limit=50`
- Returns plaintext messages (chronological)
- Auth: provide `HTTP_TOKEN` as a Bearer token (or query param `token=`)
- `GET /trigger?channel=%23chan&window=6h`
- Returns plaintext digest
- Also sends via notifier when configured
- Auth as above
- `GET /metrics`
- Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
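
For orientation, the Prometheus text format behind `/metrics` looks roughly like the output of the sketch below. It renders the four metric names listed above by hand; the real service may use a Prometheus client library instead, the counter/gauge types are assumptions based on the names, and `renderMetrics` is a hypothetical function.

```go
package main

import (
	"fmt"
	"strings"
)

// renderMetrics writes the four sojuboy metrics in Prometheus text
// exposition format. Counter/gauge types are assumed from the metric names.
func renderMetrics(ingested, sent, pruned uint64, connected bool) string {
	var b strings.Builder
	counter := func(name string, v uint64) {
		fmt.Fprintf(&b, "# TYPE %s counter\n%s %d\n", name, name, v)
	}
	counter("sojuboy_messages_ingested_total", ingested)
	counter("sojuboy_notifications_sent_total", sent)
	counter("sojuboy_messages_pruned_total", pruned)

	// sojuboy_connected is a gauge: 1 while attached to soju, 0 otherwise.
	up := 0
	if connected {
		up = 1
	}
	fmt.Fprintf(&b, "# TYPE sojuboy_connected gauge\nsojuboy_connected %d\n", up)
	return b.String()
}
```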
## Troubleshooting
- Empty tail while there's activity
- Ensure the service logs `join requested:` followed by `joined` for your channels
- Confirm `.env` `CHANNELS` contains your channels
- Check `/metrics` and the logs for recent message ingestion
- 401 Unauthorized from `/tail` or `/trigger`
- Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`
- OpenAI 502/URL errors
- Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
- Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn't enabled for your account
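
When debugging a 401, it can help to picture how a token check over the four accepted forms (Bearer header, `X-Auth-Token`, `token` query parameter, basic auth as `token:<HTTP_TOKEN>`) might look. The sketch below is an assumption-laden illustration, not the code in `internal/httpapi`: the function names and lookup order are hypothetical, and it rejects every request when no token is configured, which may differ from the real behaviour.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/url"
	"strings"
)

// presentedToken extracts a candidate token from the request headers or
// query string. The lookup order is an assumption.
func presentedToken(h http.Header, q url.Values) string {
	if auth := h.Get("Authorization"); strings.HasPrefix(auth, "Bearer ") {
		return strings.TrimPrefix(auth, "Bearer ")
	}
	if t := h.Get("X-Auth-Token"); t != "" {
		return t
	}
	if t := q.Get("token"); t != "" {
		return t
	}
	// Basic auth: user name "token", password is the HTTP token.
	if user, pass, ok := (&http.Request{Header: h}).BasicAuth(); ok && user == "token" {
		return pass
	}
	return ""
}

// authorized compares the presented token with the configured one in
// constant time. An empty configured token rejects everything (assumption).
func authorized(h http.Header, q url.Values, want string) bool {
	got := presentedToken(h, q)
	return want != "" && subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}
```

In a handler this would be called as `authorized(r.Header, r.URL.Query(), token)`, responding with `http.StatusUnauthorized` when it returns false.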
## Roadmap
- Additional notifiers (ntfy, Telegram)
- Long-form HTML digest rendering
- Admin endpoints (e.g., `/join?channel=#chan`)
## Development notes
Project layout (selected):
- `cmd/sojuboy/main.go`: entrypoint, wiring config/services
- `internal/soju`: soju connector and ingestion
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier
- `internal/summarizer`: OpenAI client and prompts
- `internal/httpapi`: health, tail, trigger, metrics endpoints
- `internal/scheduler`: cron jobs
Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.
## License
MIT for code dependencies; this repository's license will follow your preference (add a LICENSE if needed).