# sojuboy
An IRC bouncer companion service for soju that:
- Watches your bouncer-connected channels continuously
- Notifies you on mentions via Pushover (default)
- Stores messages in SQLite for summaries and on-demand inspection
- Generates AI digests (OpenAI by default) on schedule or on demand
- Exposes a small HTTP API and a minimal Web UI (Pico.css) for status, tail, history, link cards, and on-demand summaries

Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
## Why
If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
## High-level architecture
- Language: Go (single static binary, low memory footprint)
- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
- Message storage: SQLite via modernc.org/sqlite (WAL enabled)
- Scheduling: github.com/robfig/cron/v3
- Notifications: github.com/gregdel/pushover
- Summarization (LLM): github.com/sashabaranov/go-openai
- HTTP API + Web UI: Go stdlib `net/http` + `html/template` + embedded static assets
Runtime modules:
- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier (pluggable interface)
- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback; separate link-summarization prompt
- `internal/scheduler`: cron-based digest scheduling and daily retention job
- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`, Web UI and JSON APIs
- `internal/config`: env config loader and helpers
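
As a rough illustration of what `internal/soju` does during connection setup, the sketch below builds the raw registration lines and the optional `CHATHISTORY` request. It is not the project's actual code: the function names are hypothetical, and the `NICK` and `CAP END` lines are assumptions added to make the sequence a complete IRC registration. The `USER` layout and the `timestamp=` format follow the descriptions in this README.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// registrationLines returns the raw lines a client might send to register
// with soju. The USER line repeats the full "user/network@client" identity,
// as described in this README. NICK and CAP END are assumptions.
func registrationLines(nick, username, host, realname, password string, caps []string) []string {
	return []string{
		"CAP REQ :" + strings.Join(caps, " "),
		"PASS " + password,
		"NICK " + nick,
		fmt.Sprintf("USER %s %s %s :%s", username, username, host, realname),
		"CAP END",
	}
}

// chathistoryLatest formats the optional backfill request from the last
// stored timestamp of a channel, in the RFC3339Nano form this README shows.
func chathistoryLatest(channel string, last time.Time, limit int) string {
	return fmt.Sprintf("CHATHISTORY LATEST %s timestamp=%s %d",
		channel, last.UTC().Format(time.RFC3339Nano), limit)
}
```

Each line would be terminated with `\r\n` when written to the socket. A real client would normally wait for the server's `CAP ACK` before sending `CAP END`, and would send `JOIN` only after numeric 001.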
## Features
- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context; image links sent to GPT-5 as vision inputs
- Configurable schedules (cron), quiet hours, and summary parameters
- Local persistence with retention pruning (daily at 03:00)
- Web UI with:
  - Realtime chat tail via SSE; auto-scroll to bottom; preload older history with infinite scroll-up
  - Link cards with OG/Twitter metadata (X posts via oEmbed), YouTube oEmbed embeds, direct image previews
  - Inline on-demand link summarization with caching (24h), and a single summarize toggle (🌚/🌝)
  - Channel selector in the menubar, login interstitial using `HTTP_TOKEN`
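
The punctuation-tolerant matching in the first feature can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical and case-insensitive comparison is an assumption. The text is split on every character that is not a letter, digit, `_` or `-`, and each resulting word is compared with the keyword.

```go
package main

import (
	"strings"
	"unicode"
)

// isWordChar reports whether r is part of a word. Letters, digits, '_' and
// '-' are word characters; spaces and other punctuation act as boundaries.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-'
}

// mentions reports whether text contains keyword as a whole word.
// Comparison is case-insensitive (an assumption of this sketch).
func mentions(text, keyword string) bool {
	if keyword == "" {
		return false
	}
	kw := strings.ToLower(keyword)
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !isWordChar(r)
	})
	for _, w := range words {
		if w == kw {
			return true
		}
	}
	return false
}
```

With these rules `hey Alice: ping` and `(alice)!` both match the keyword `alice`, while `alice-bot` and `malice` do not, because `-` and letters extend the word. Keywords that themselves contain spaces or punctuation would need a different approach.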
## How it works
1) The service connects to soju and negotiates IRCv3 capabilities:
   - Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
   - Joins happen after numeric 001 (welcome)
2) Authentication:
   - PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
   - soju's per-client identity preserves distinct history
3) Playback and backfill:
   - If `draft/event-playback` is enabled, soju replays missed messages automatically
   - Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
4) Messages and mentions:
   - Each `PRIVMSG` is stored with server-time when available
   - Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
5) Summarization:
   - Digests: `/trigger` or the scheduler loads a window and calls OpenAI with a conversation-focused prompt
   - Link summaries: dedicated prompt that ignores chat context; fetches page content with readability; includes oEmbed hints for YouTube and X; passes images to vision models
6) HTTP + JSON API:
   - `/healthz` → `200 ok`
   - `/ready` → `200` only when connected to soju
   - `/tail?channel=#chan&limit=N` → JSON tail for UI
   - `/history?channel=#chan&before=<RFC3339>&limit=N` → JSON older messages (infinite scroll)
   - `/trigger?channel=#chan&window=6h` → returns digest JSON and (optionally) pushes via notifier
   - `/linkcard?url=...` → card JSON (title/desc/image or embed HTML)
   - `/linksummary?url=...` → brief AI summary of a single URL (cached 24h)
   - `/metrics` → Prometheus text metrics
   - Protect UI + JSON with `HTTP_TOKEN` cookie; APIs also allow Bearer/query token
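
The token check in the last item could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `internal/httpapi`: the cookie name `sojuboy_token`, the lookup order (header, then query, then cookie), and the choice to reject every request when no token is configured are all hypothetical.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// cookieName is hypothetical; the README only says the token lives in a cookie.
const cookieName = "sojuboy_token"

// pickToken returns the first non-empty credential among the Authorization
// header (Bearer scheme), the token= query parameter, and the cookie value.
func pickToken(authHeader, queryToken, cookieValue string) string {
	if strings.HasPrefix(authHeader, "Bearer ") {
		if t := strings.TrimPrefix(authHeader, "Bearer "); t != "" {
			return t
		}
	}
	if queryToken != "" {
		return queryToken
	}
	return cookieValue
}

// tokenMatches compares in constant time so response timing does not leak
// the token. An empty configured token never matches (assumption).
func tokenMatches(got, want string) bool {
	return want != "" && subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// authorized extracts the credential from a request and checks it.
func authorized(r *http.Request, want string) bool {
	cookieValue := ""
	if c, err := r.Cookie(cookieName); err == nil {
		cookieValue = c.Value
	}
	got := pickToken(r.Header.Get("Authorization"), r.URL.Query().Get("token"), cookieValue)
	return tokenMatches(got, want)
}
```

A handler wrapper would call `authorized(r, token)` and answer `401 Unauthorized` (or redirect the UI to `/login`) when it returns false.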
## Health and readiness
- `/healthz` always returns 200
- `/ready` returns 200 only when connected to soju
- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
```yaml
healthcheck:
  test: ["CMD", "/sojuboy", "--health"]
  interval: 30s
  timeout: 3s
  retries: 3
```
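
A `--health` style check can be as small as one HTTP request mapped to an exit code. The sketch below shows that idea; the function name, the 2-second client timeout, and the URL used in the usage note are assumptions, and the real binary may perform its check differently.

```go
package main

import (
	"net/http"
	"time"
)

// healthExitCode performs one GET against readyURL and maps the result to a
// process exit code: 0 when the endpoint answers 200, 1 for any other status
// or any error (connection refused, timeout, malformed URL).
func healthExitCode(readyURL string) int {
	// Keep the client timeout below the 3s Docker healthcheck timeout.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(readyURL)
	if err != nil {
		return 1
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 1
	}
	return 0
}
```

In `main`, a flag handler would then call something like `os.Exit(healthExitCode("http://127.0.0.1:8080/ready"))` before starting any long-lived services.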
## Installation
### Prerequisites
- Docker (or Synology Container Manager)
- A soju bouncer you can connect to
- Pushover account and app token (for push)
- OpenAI API key (for AI summaries)
### Build and run (Docker Compose)
1) Create `.env` in repo root (see example below)
2) Start:
```bash
docker-compose up -d --build
```
3) Health check:
```bash
curl -s http://localhost:8080/healthz
```
4) Tail last messages (remember to URL-encode `#` as `%23`):
```bash
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
  -H "Authorization: Bearer $HTTP_TOKEN"
```
5) Trigger a digest for the last 6 hours:
```bash
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
  -H "Authorization: Bearer $HTTP_TOKEN"
```
6) Metrics:
```bash
curl -s http://localhost:8080/metrics
```
## Quick start (Docker Compose)
```bash
docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq
```
Compose includes a healthcheck calling the binary's `--health` flag, which returns 0 only when `/ready` is 200.
## Configuration options
You can configure via a `.env` file or inline `environment:` in your compose YAML. Both approaches are shown below. Defaults for all variables are listed in the table after the examples.
### Option A: .env file (recommended for development)
The example below uses maximum or otherwise large but reasonable values. Defaults are noted where they differ from the example or are otherwise relevant.
```env
# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network
# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword
# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany
# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw
# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token
# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
# Max completion (output) tokens for GPT-5: ~128k (model limit). Default 128000.
OPENAI_MAX_TOKENS=128000
# Summarizer tuning
SUMM_FOLLOW_LINKS=true # default true
SUMM_LINK_TIMEOUT=20s # default 20s
SUMM_LINK_MAX_BYTES=1048576 # default 1048576 (1 MiB/article)
SUMM_GROUP_WINDOW=120s # default 120s
SUMM_MAX_LINKS=20 # default 20
SUMM_MAX_GROUPS=20000 # default 0 (no cap); example large
SUMM_TIMEOUT=10m # request timeout; default 10m
# Digests
DIGEST_CRON=0 */6 * * * # every 6 hours
DIGEST_WINDOW=24h # default 24h
QUIET_HOURS= # e.g., 22:00-07:00
# Mentions/alerts
NOTIFY_BACKFILL=false # default false
MENTION_MIN_INTERVAL=30s # no hard max; rate-limit between alerts
MENTIONS_ONLY_CHANNELS= # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS= # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority # bypass quiet hours
# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here
# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=365 # default 365
# Logging
LOG_LEVEL=info
```
Compose (with localhost bind suitable for Synology reverse proxy):
```yaml
services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.2.0-beta2
    restart: unless-stopped
    env_file: .env
    ports:
      - "127.0.0.1:8080:8080" # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3
```
### Option B: Inline environment in compose (no .env)
```yaml
services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.2.0-beta2
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080" # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    environment:
      SOJU_HOST: "bnc.example.org" # default 127.0.0.1
      SOJU_PORT: "6697" # default 6697
      SOJU_TLS: "true" # default true
      SOJU_NETWORK: "your-network" # default ""
      IRC_NICK: "yourNick" # default sojuboy
      IRC_USERNAME: "yourUser/your-network@sojuboy" # default IRC_NICK
      IRC_REALNAME: "Your Real Name" # default sojuboy
      IRC_PASSWORD: "yourSojuClientPassword" # default ""
      CHANNELS: "#animaniacs,#general" # default "" (none)
      KEYWORDS: "yourNick,YourCompany" # default IRC_NICK
      SOJU_AUTH: "raw" # default sasl (hint only)
      NOTIFIER: "pushover" # default pushover
      PUSHOVER_USER_KEY: "..." # default ""
      PUSHOVER_API_TOKEN: "..." # default ""
      LLM_PROVIDER: "openai" # default openai
      OPENAI_API_KEY: "sk-..." # default ""
      OPENAI_BASE_URL: "https://api.openai.com/v1" # default ""
      OPENAI_MODEL: "gpt-5" # default gpt-5
      OPENAI_MAX_TOKENS: "128000" # default 128000
      SUMM_FOLLOW_LINKS: "true" # default true
      SUMM_LINK_TIMEOUT: "20s" # default 20s
      SUMM_LINK_MAX_BYTES: "1048576" # default 1048576
      SUMM_GROUP_WINDOW: "120s" # default 120s
      SUMM_MAX_LINKS: "20" # default 20
      SUMM_MAX_GROUPS: "20000" # default 0 (no cap)
      SUMM_TIMEOUT: "10m" # default 10m
      DIGEST_CRON: "0 */6 * * *" # default 0 */6 * * *
      DIGEST_WINDOW: "24h" # default 24h
      QUIET_HOURS: "" # default ""
      NOTIFY_BACKFILL: "false" # default false
      MENTION_MIN_INTERVAL: "30s" # default 30s
      MENTIONS_ONLY_CHANNELS: "" # default ""
      MENTIONS_DENY_CHANNELS: "" # default ""
      URGENT_KEYWORDS: "urgent,priority" # default ""
      HTTP_LISTEN: ":8080" # default :8080
      HTTP_TOKEN: "<long-random-token>" # default ""
      STORE_PATH: "/data/app.db" # default /data/app.db
      STORE_RETENTION_DAYS: "365" # default 365
      LOG_LEVEL: "info" # default info
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3
```
### Defaults reference
| Variable | Default |
|---|---|
| SOJU_HOST | 127.0.0.1 |
| SOJU_PORT | 6697 |
| SOJU_TLS | true |
| IRC_NICK | sojuboy |
| IRC_USERNAME | IRC_NICK |
| IRC_REALNAME | sojuboy |
| IRC_PASSWORD | (empty) |
| SOJU_NETWORK | (empty) |
| CHANNELS | (empty) |
| KEYWORDS | IRC_NICK |
| SOJU_AUTH | sasl |
| NOTIFIER | pushover |
| PUSHOVER_USER_KEY | (empty) |
| PUSHOVER_API_TOKEN | (empty) |
| LLM_PROVIDER | openai |
| OPENAI_API_KEY | (empty) |
| OPENAI_BASE_URL | (empty) |
| OPENAI_MODEL | gpt-5 |
| OPENAI_MAX_TOKENS | 128000 |
| SUMM_FOLLOW_LINKS | true |
| SUMM_LINK_TIMEOUT | 20s |
| SUMM_LINK_MAX_BYTES | 1048576 |
| SUMM_GROUP_WINDOW | 120s |
| SUMM_MAX_LINKS | 20 |
| SUMM_MAX_GROUPS | 0 |
| SUMM_TIMEOUT | 10m |
| DIGEST_CRON | 0 */6 * * * |
| DIGEST_WINDOW | 24h |
| QUIET_HOURS | (empty) |
| NOTIFY_BACKFILL | false |
| MENTION_MIN_INTERVAL | 30s |
| MENTIONS_ONLY_CHANNELS | (empty) |
| MENTIONS_DENY_CHANNELS | (empty) |
| URGENT_KEYWORDS | (empty) |
| HTTP_LISTEN | :8080 |
| HTTP_TOKEN | (empty) |
| STORE_PATH | /data/app.db |
| STORE_RETENTION_DAYS | 365 |
| LOG_LEVEL | info |
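
`QUIET_HOURS` takes a range such as `22:00-07:00`, which wraps past midnight, and `URGENT_KEYWORDS` bypass it. The sketch below shows one way to parse and evaluate such a range; the type and function names are hypothetical, and the end of the window is treated as exclusive by assumption. Times are passed as minutes since midnight so the caller decides which time zone applies.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// quietWindow holds start and end as minutes since midnight.
type quietWindow struct{ start, end int }

// parseQuietHours parses "HH:MM-HH:MM". An empty string means quiet hours
// are disabled and yields ok == false with no error.
func parseQuietHours(s string) (w quietWindow, ok bool, err error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return w, false, nil
	}
	from, to, found := strings.Cut(s, "-")
	if !found {
		return w, false, fmt.Errorf("quiet hours %q: want HH:MM-HH:MM", s)
	}
	a, err := time.Parse("15:04", strings.TrimSpace(from))
	if err != nil {
		return w, false, err
	}
	b, err := time.Parse("15:04", strings.TrimSpace(to))
	if err != nil {
		return w, false, err
	}
	return quietWindow{a.Hour()*60 + a.Minute(), b.Hour()*60 + b.Minute()}, true, nil
}

// contains reports whether minute-of-day m falls inside the window,
// including windows that wrap past midnight such as 22:00-07:00.
func (w quietWindow) contains(m int) bool {
	if w.start <= w.end {
		return m >= w.start && m < w.end
	}
	return m >= w.start || m < w.end
}

// suppressed reports whether a mention at minute-of-day m should be held
// back: quiet hours apply unless the message matched an urgent keyword.
func suppressed(w quietWindow, active bool, m int, urgent bool) bool {
	return active && !urgent && w.contains(m)
}
```

A caller would compute `m` as `now.Hour()*60 + now.Minute()` from the local time and skip the push notification when `suppressed` returns true.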
## Pushover setup
1) Install the Pushover iOS app and log in
2) Get your User Key (in the app or on the website)
3) Create an application at `pushover.net/apps/build` to get an API token
4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
## OpenAI setup
- Set `OPENAI_API_KEY`
- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
- If `gpt-5` isn't available on your account, use a supported model like `gpt-4o-mini`
- GPT-5 limits: ~272k input + 128k output tokens (400k context)
## HTTP API
- `GET /healthz` → `200 ok`
- `GET /tail?channel=%23chan&limit=50` (JSON)
- `GET /history?channel=%23chan&before=<RFC3339>&limit=50` (JSON)
- `GET /trigger?channel=%23chan&window=6h` (JSON)
- `GET /linkcard?url=…` (JSON)
- `GET /linksummary?url=…` (JSON)
- `GET /metrics`
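
For callers written in Go rather than `curl`, the sketch below builds a `/tail` request with the channel's `#` encoded as `%23` and the token sent as a Bearer header. The function name and its parameters are illustrative, not part of this project; only the endpoint, query parameters, and header scheme come from this README.

```go
package main

import (
	"net/http"
	"net/url"
	"strconv"
)

// tailRequest builds a GET request for /tail. url.Values.Encode escapes the
// leading '#' of the channel as %23, so callers pass the plain channel name.
func tailRequest(baseURL, channel string, limit int, token string) (*http.Request, error) {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("limit", strconv.Itoa(limit))
	req, err := http.NewRequest(http.MethodGet, baseURL+"/tail?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}
```

The returned request can be passed to `http.DefaultClient.Do`; the same pattern applies to `/history` and `/trigger` with their respective parameters.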
## Troubleshooting
- Empty tail while there's activity
  - Ensure the service logs readiness and joins for your channels
  - Confirm `.env` `CHANNELS` contains your channels
  - Check `/metrics` and logs for recent message ingestion
- 401 Unauthorized from UI/API
  - Log in at `/login` with `HTTP_TOKEN`, or pass it via Bearer/`token=`
- OpenAI 502/URL errors
  - Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
  - Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn't enabled for your account
## Roadmap
- Additional notifiers (ntfy, Telegram)
- Long-form HTML digest rendering
- Admin endpoints (e.g., `/join?channel=#chan`)
## Development notes
Project layout (selected):
- `cmd/sojuboy/main.go`: entrypoint, wiring config/services
- `internal/soju`: soju connector and ingestion
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier
- `internal/summarizer`: OpenAI client and prompts
- `internal/httpapi`: UI and endpoints
- `internal/scheduler`: cron jobs
Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.
## License
MIT for code dependencies; this repository's license will follow your preference (add a LICENSE if needed).