# sojuboy
An IRC bouncer companion service for soju that:
- Watches your bouncer-connected channels continuously
- Notifies you on mentions via Pushover (default)
- Stores messages in SQLite for summaries and on-demand inspection
- Generates AI digests (OpenAI by default) on schedule or on demand
- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests
Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
## Why
If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
## High-level architecture
- Language: Go (single static binary, low memory footprint)
- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
- Message storage: SQLite via modernc.org/sqlite
- Scheduling: github.com/robfig/cron/v3
- Notifications: github.com/gregdel/pushover
- Summarization (LLM): github.com/sashabaranov/go-openai
- HTTP API: Go stdlib `net/http`
Runtime modules:
- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier behind a pluggable interface (see the sketch after this list)
- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
- `internal/scheduler`: cron-based digest scheduling and daily retention job
- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`
- `internal/config`: env config loader and helpers
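A rough sketch of what that pluggable notifier interface might look like (hypothetical shape; the actual definition lives in `internal/notifier`):

```go
package notifier

import "context"

// Notifier is the pluggable delivery interface hinted at above (hypothetical shape).
// The Pushover implementation satisfies it; additional backends from the roadmap
// (ntfy, Telegram) would plug in the same way.
type Notifier interface {
	// Notify delivers a short alert such as a mention or a digest.
	Notify(ctx context.Context, title, message string) error
}
```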
## Features
- Mention/keyword detection: punctuation-tolerant (letters, digits, `_`, and `-` count as word characters); see the sketch after this list
- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context; image links are sent to GPT-5 as vision inputs
- Configurable schedules (cron), quiet hours, and summary parameters
- Local persistence with retention pruning (daily at 03:00)
- HTTP endpoints: health, tail, metrics, on-demand digests
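A minimal sketch of the punctuation-tolerant matching described above (hypothetical helper, not the actual implementation): letters, digits, `_`, and `-` count as word characters, and everything else acts as a boundary around each keyword.

```go
package mention

import (
	"regexp"
	"strings"
)

// matchesKeyword applies the word-character rule above: letters, digits, '_' and '-'
// are word characters, so any other rune (punctuation, whitespace) is a boundary.
// A real implementation would cache the compiled patterns.
func matchesKeyword(text, keyword string) bool {
	boundary := `[^\p{L}\p{N}_-]`
	pattern := `(?i)(^|` + boundary + `)` + regexp.QuoteMeta(keyword) + `(` + boundary + `|$)`
	return regexp.MustCompile(pattern).MatchString(text)
}

// Mentioned reports whether any configured keyword (KEYWORDS in .env) appears in the message.
func Mentioned(text string, keywords []string) bool {
	for _, kw := range keywords {
		if kw = strings.TrimSpace(kw); kw != "" && matchesKeyword(text, kw) {
			return true
		}
	}
	return false
}
```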
## How it works
1) The service connects to soju and negotiates IRCv3 capabilities:
- Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
- Joins happen after numeric 001 (welcome)
2) Authentication:
- PASS, then irssi-style `USER <username/network@client> <same> <host> :<realname>` (see the illustrative exchange at the end of this section)
- soju's per-client identity preserves distinct history
3) Playback and backfill:
- If `draft/event-playback` is enabled, soju replays missed messages automatically
- Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
4) Messages and mentions:
- Each `PRIVMSG` is stored with server-time when available
- Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
- Debug logs record whether a mention was delivered or why it was suppressed (backfill, quiet hours, rate limit)
5) Summarization:
- `/trigger` or the scheduler loads a window and calls OpenAI
- GPT-5 context: ~272k input tokens + up to 128k output tokens (400k total)
- Summaries are concise/natural and integrate multi-line posts, article text (readability-extracted), and image links (vision)
6) HTTP API:
- `/healthz` → `200 ok`
- `/ready` → `200` only when connected to soju
- `/tail?channel=#chan&limit=N` → plaintext tail (chronological)
- `/trigger?channel=#chan&window=6h` → returns digest and sends via notifier
- `/metrics` → Prometheus text metrics
- Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
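As a rough illustration of steps 1–3, the raw exchange with soju might look like this (simplified; nicks, passwords, hosts, and the CHATHISTORY limit are placeholder values taken from the configuration example below):

```
C: CAP LS 302
C: PASS yourSojuClientPassword
C: NICK yourNick
C: USER yourUser/your-network@sojuboy yourUser/your-network@sojuboy bnc.example.org :Your Real Name
C: CAP REQ :server-time message-tags batch cap-notify echo-message draft/event-playback
C: CAP END
S: :bnc.example.org 001 yourNick :Welcome to the network
C: JOIN #animaniacs,#general
C: CHATHISTORY LATEST #animaniacs timestamp=2024-01-01T00:00:00.000Z 200
```

The final `CHATHISTORY` line is only sent when the optional fallback is enabled and a stored timestamp exists for the channel.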
## Health and readiness
- `/healthz` always returns 200
- `/ready` returns 200 only when connected to soju
- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
```yaml
healthcheck:
test: ["CMD", "/sojuboy", "--health"]
interval: 30s
timeout: 3s
retries: 3
```
## Installation
### Prerequisites
- Docker (or Synology Container Manager)
- A soju bouncer you can connect to
- Pushover account and app token (for push)
- OpenAI API key (for AI summaries)
### Build and run (Docker Compose)
1) Create `.env` in repo root (see example below)
2) Start:
```bash
docker-compose up -d --build
```
3) Health check:
```bash
curl -s http://localhost:8080/healthz
```
4) Tail last messages (remember to URL-encode `#` as `%23`):
```bash
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
5) Trigger a digest for the last 6 hours:
```bash
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
-H "Authorization: Bearer $HTTP_TOKEN"
```
6) Metrics:
```bash
curl -s http://localhost:8080/metrics
```
## Quick start (Docker Compose)
```bash
docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq
```
Compose includes a healthcheck calling the binary's `--health` flag, which returns 0 only when `/ready` is 200.
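A minimal sketch of what such a `--health` probe does (illustrative only; the real flag handling lives in `cmd/sojuboy/main.go`): call the local `/ready` endpoint and exit 0 only on HTTP 200.

```go
package main

import (
	"net/http"
	"os"
	"time"
)

func main() {
	// HTTP_LISTEN defaults to :8080 in the example configuration below.
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://127.0.0.1:8080/ready")
	if err != nil || resp.StatusCode != http.StatusOK {
		os.Exit(1) // not ready (or not reachable)
	}
	resp.Body.Close()
	os.Exit(0) // connected to soju and ready
}
```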
## Configuration (.env example)
The example below uses maximum or deliberately large values; defaults are noted where they are also the maximum or otherwise relevant.
```env
# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network
# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword
# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany
# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw
# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token
# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
# Max completion (output) tokens for GPT-5 is ~128k (model limit). Default is 700.
OPENAI_MAX_TOKENS=128000
# Summarizer tuning
SUMM_FOLLOW_LINKS=true # default true
SUMM_LINK_TIMEOUT=20s # no hard max; example large
SUMM_LINK_MAX_BYTES=1048576 # no hard max; example large (1 MiB/article)
SUMM_GROUP_WINDOW=120s # no hard max; example large grouping window
SUMM_MAX_LINKS=20 # no strict max; example large
SUMM_MAX_GROUPS=20000 # 0=no cap; example large
SUMM_TIMEOUT=10m # request timeout; default 5m
# Digests
DIGEST_CRON=0 */6 * * * # every 6 hours
DIGEST_WINDOW=24h # no hard max; example large window
QUIET_HOURS= # e.g., 22:00-07:00
# Mentions/alerts
NOTIFY_BACKFILL=false # default false
MENTION_MIN_INTERVAL=30s # no hard max; rate-limit between alerts
MENTIONS_ONLY_CHANNELS= # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS= # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority # bypass quiet hours
# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here
# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=365 # example large retention
# Logging
LOG_LEVEL=info
```
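`DIGEST_CRON` uses standard five-field cron syntax (as parsed by robfig/cron). A few illustrative variants, with a quiet-hours example in the format noted above:

```env
DIGEST_CRON=0 8 * * *       # once a day at 08:00
DIGEST_CRON=30 */2 * * *    # every 2 hours at minute 30
QUIET_HOURS=22:00-07:00     # suppress non-urgent mention alerts overnight
```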
## Pushover setup
1) Install Pushover iOS app and log in
2) Get your User Key (in the app or on the website)
3) Create an application at `pushover.net/apps/build` to get an API token
4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
## OpenAI setup
- Set `OPENAI_API_KEY`
- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
- If `gpt-5` isn't available on your account, use a supported model like `gpt-4o-mini` (see the example below)
- GPT-5 limits: ~272k input + 128k output tokens (400k context)
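For example, a fallback setup when `gpt-5` is unavailable might look like this (values are illustrative):

```env
OPENAI_MODEL=gpt-4o-mini
OPENAI_MAX_TOKENS=700   # the documented default output cap
```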
## HTTP API
- `GET /healthz` → `200 ok`
- `GET /tail?channel=%23chan&limit=50`
- Returns plaintext messages (chronological)
- Auth: provide `HTTP_TOKEN` as a Bearer token, `token` query param, `X-Auth-Token` header, or basic auth (`token:<HTTP_TOKEN>`); see the examples after this list
- `GET /trigger?channel=%23chan&window=6h`
- Returns plaintext digest
- Also sends via notifier when configured
- Auth as above
- `GET /metrics`
- Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
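For example, any of these forms authenticates a request to the protected endpoints (channel and limit values are illustrative):

```bash
# Bearer token
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=20" \
  -H "Authorization: Bearer $HTTP_TOKEN"

# Query parameter
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=20&token=$HTTP_TOKEN"

# Custom header
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=20" \
  -H "X-Auth-Token: $HTTP_TOKEN"

# Basic auth (username "token", password is the HTTP token)
curl -s -u "token:$HTTP_TOKEN" "http://localhost:8080/tail?channel=%23animaniacs&limit=20"
```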
## Troubleshooting
- Empty tail while there's activity
- Ensure the service logs `join requested:` followed by `joined` for your channels
- Confirm `.env` `CHANNELS` contains your channels
- Check for `/metrics` and logs for recent message ingestion
- 401 Unauthorized from `/tail` or `/trigger`
- Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`
- OpenAI 502/URL errors
- Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
- Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn't enabled for your account
## Roadmap
- Additional notifiers (ntfy, Telegram)
- Long-form HTML digest rendering
- Admin endpoints (e.g., `/join?channel=#chan`)
## Development notes
Project layout (selected):
- `cmd/sojuboy/main.go` entrypoint, wiring config/services
- `internal/soju` soju connector and ingestion
- `internal/store` SQLite schema and queries
- `internal/notifier` Pushover notifier
- `internal/summarizer` OpenAI client and prompts
- `internal/httpapi` health, tail, trigger, metrics endpoints
- `internal/scheduler` cron jobs
Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.
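For a local build outside Docker, something like the following should work (flags are illustrative; `CGO_ENABLED=0` fits the pure-Go SQLite driver and the static-binary goal):

```bash
# Static local build of the entrypoint
CGO_ENABLED=0 go build -trimpath -o sojuboy ./cmd/sojuboy

# Container image, as used by docker-compose
docker build -t sojuboy .
```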
## License
MIT for code dependencies; this repository's license will follow your preference (add a LICENSE if needed).