sojuboy/README.md

275 lines
9.4 KiB
Markdown
Raw Normal View History

# sojuboy
An IRC bouncer companion service for soju that:
- Watches your bouncer-connected channels continuously
- Notifies you on mentions via Pushover (default)
- Stores messages in SQLite for summaries and on-demand inspection
- Generates AI digests (OpenAI by default) on schedule or on demand
- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests
Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
## Why
If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
## High-level architecture
- Language: Go (single static binary, low memory footprint)
- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
- Message storage: SQLite via modernc.org/sqlite
- Scheduling: github.com/robfig/cron/v3
- Notifications: github.com/gregdel/pushover
- Summarization (LLM): github.com/sashabaranov/go-openai
- HTTP API: Go stdlib `net/http`
Runtime modules:
- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier (pluggable interface)
- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
- `internal/scheduler`: cron-based digest scheduling and daily retention job
- `internal/httpapi`: `/healthz`, `/tail`, `/trigger`, `/metrics`
- `internal/config`: env config loader and helpers
## Features
- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context
- Configurable schedules (cron), quiet hours, and summary parameters
- Local persistence with retention pruning (daily at 03:00)
- HTTP endpoints: health, tail, metrics, on-demand digests
## How it works
1) The service connects to soju and negotiates IRCv3 capabilities:
- Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
- Joins happen after numeric 001 (welcome)
2) Authentication:
- PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
- Sojus per-client identity preserves distinct history
3) Playback and backfill:
- If `draft/event-playback` is enabled, soju replays missed messages automatically
- Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
4) Messages and mentions:
- Each `PRIVMSG` is stored with server-time when available
- Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
- Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)
5) Summarization:
- `/trigger` or the scheduler loads a window and calls OpenAI (with a 60s timeout)
- Defaults to `OPENAI_MODEL=gpt-5` with `MaxCompletionTokens`; temperature omitted for reasoning-like models
- Tunables let you follow link targets and group multi-line posts (see env below)
6) HTTP API:
- `/healthz``200 ok`
- `/ready``200` only when connected to soju
- `/tail?channel=#chan&limit=N` → plaintext tail (chronological)
- `/trigger?channel=#chan&window=6h` → returns digest and sends via notifier
- `/metrics` → Prometheus text metrics
- Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
## Health and readiness
- `/healthz` always returns 200
- `/ready` returns 200 only when connected to soju
- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
```yaml
healthcheck:
test: ["/sojuboy", "--health"]
interval: 30s
timeout: 3s
retries: 3
```
## Installation
### Prerequisites
- Docker (or Synology Container Manager)
- A soju bouncer you can connect to
- Pushover account and app token (for push)
- OpenAI API key (for AI summaries)
### Build and run (Docker Compose)
1) Create `.env` in repo root (see example below)
2) Start:
```bash
docker-compose up -d --build
```
3) Health check:
```bash
curl -s http://localhost:8080/healthz
```
4) Tail last messages (remember to URL-encode `#` as `%23`):
```bash
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
5) Trigger a digest for the last 6 hours:
```bash
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
6) Metrics:
```bash
curl -s http://localhost:8080/metrics
```
## Quick start (Docker Compose)
```bash
docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq
```
Compose includes a healthcheck calling the binarys `--health` flag, which returns 0 only when `/ready` is 200.
## Configuration (.env example)
```env
# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network
# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword
# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany
# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw
# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token
# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
OPENAI_MAX_TOKENS=700
# Summarizer tuning
SUMM_FOLLOW_LINKS=true # fetch small snippets from referenced links
SUMM_LINK_TIMEOUT=6s # HTTP timeout per link
SUMM_LINK_MAX_BYTES=262144 # max bytes fetched per link
SUMM_GROUP_WINDOW=90s # group multi-line posts within this window
SUMM_MAX_LINKS=5 # limit links fetched per summary
# Digests
DIGEST_CRON=0 */6 * * *
DIGEST_WINDOW=6h
QUIET_HOURS=
# Mentions/alerts
NOTIFY_BACKFILL=false # if true, notify even for replayed (older) messages
MENTION_MIN_INTERVAL=30s # min interval between alerts per channel/keyword
MENTIONS_ONLY_CHANNELS= # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS= # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority # bypass quiet hours
# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here
# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=7
# Logging
LOG_LEVEL=info
```
## Pushover setup
1) Install Pushover iOS app and log in
2) Get your User Key (in the app or on the website)
3) Create an application at `pushover.net/apps/build` to get an API token
4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
## OpenAI setup
- Set `OPENAI_API_KEY`
- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
- If `gpt-5` isnt available on your account, use a supported model like `gpt-4o-mini`
- GPT-5 beta limitations: temperature fixed; use `MaxCompletionTokens`
## HTTP API
- `GET /healthz``200 ok`
- `GET /tail?channel=%23chan&limit=50`
- Returns plaintext messages (chronological)
- Auth: provide `HTTP_TOKEN` as a Bearer token (or query param `token=`)
- `GET /trigger?channel=%23chan&window=6h`
- Returns plaintext digest
- Also sends via notifier when configured
- Auth as above
- `GET /metrics`
- Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
## Troubleshooting
- Empty tail while theres activity
- Ensure the service logs `join requested:` followed by `joined` for your channels
- Confirm `.env` `CHANNELS` contains your channels
- Check for `/metrics` and logs for recent message ingestion
- 401 Unauthorized from `/tail` or `/trigger`
- Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`
- OpenAI 502/URL errors
- Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
- Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isnt enabled for your account
## Roadmap
- Additional notifiers (ntfy, Telegram)
- Long-form HTML digest rendering
- Admin endpoints (e.g., `/join?channel=#chan`)
## Development notes
Project layout (selected):
- `cmd/sojuboy/main.go` entrypoint, wiring config/services
- `internal/soju` soju connector and ingestion
- `internal/store` SQLite schema and queries
- `internal/notifier` Pushover notifier
- `internal/summarizer` OpenAI client and prompts
- `internal/httpapi` health, tail, trigger, metrics endpoints
- `internal/scheduler` cron jobs
Go toolchain: see `go.mod` (Go 1.23), Dockerfile builds static binary for a distroless image.
## License
MIT for code dependencies; this repositorys license will follow your preference (add a LICENSE if needed).