# sojuboy

An IRC bouncer companion service for soju that:

- Watches your bouncer-connected channels continuously
- Notifies you on mentions via Pushover (default)
- Stores messages in SQLite for summaries and on-demand inspection
- Generates AI digests (OpenAI by default) on schedule or on demand
- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests

Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.

## Why

If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.

## High-level architecture

- Language: Go (single static binary, low memory footprint)
- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
- Message storage: SQLite via modernc.org/sqlite
- Scheduling: github.com/robfig/cron/v3
- Notifications: github.com/gregdel/pushover
- Summarization (LLM): github.com/sashabaranov/go-openai
- HTTP API: Go stdlib `net/http`

Runtime modules:

- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
- `internal/store`: SQLite schema and queries
- `internal/notifier`: Pushover notifier (pluggable interface)
- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
- `internal/scheduler`: cron-based digest scheduling and daily retention job
- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`
- `internal/config`: env config loader and helpers

## Features

- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context; image links sent to GPT‑5 as vision inputs
- Configurable schedules (cron), quiet hours, and summary parameters
- Local persistence with retention pruning (daily at 03:00)
- HTTP endpoints: health, tail, metrics, on-demand digests
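The punctuation-tolerant matching described above can be illustrated with a minimal sketch. It assumes keywords are single words compared case-insensitively; the function names `isWordChar` and `containsKeyword` are hypothetical and are not taken from the sojuboy source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordChar reports whether r is part of a word: letters, digits, '_' and '-'.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-'
}

// containsKeyword reports whether keyword appears in text as a whole word.
// The comparison is case-insensitive (an assumption of this sketch).
func containsKeyword(text, keyword string) bool {
	if keyword == "" {
		return false
	}
	kw := strings.ToLower(keyword)
	// Split on every rune that is not a word character, so surrounding
	// punctuation such as ':' or ')' never blocks a match.
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !isWordChar(r)
	})
	for _, w := range words {
		if w == kw {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsKeyword("yourNick: ping", "yourNick"))
	fmt.Println(containsKeyword("yourNick-2 joined", "yourNick"))
}
```

Because `-` counts as a word character, `yourNick-2` is treated as a different word and does not trigger a mention for `yourNick`.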

## How it works

1) The service connects to soju and negotiates IRCv3 capabilities:
   - Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
   - Joins happen after numeric 001 (welcome)

2) Authentication:
   - PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
   - Soju’s per-client identity preserves distinct history

3) Playback and backfill:
   - If `draft/event-playback` is enabled, soju replays missed messages automatically
   - Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)

4) Messages and mentions:
   - Each `PRIVMSG` is stored with server-time when available
   - Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
   - Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)

5) Summarization:
   - `/trigger` or the scheduler loads a window and calls OpenAI
   - GPT‑5 context: ~272k input tokens + up to 128k output tokens (400k total)
   - Summaries are concise/natural and integrate multi-line posts, article text (readability-extracted), and image links (vision)

6) HTTP API:
   - `/healthz` → `200 ok`
   - `/ready` → `200` only when connected to soju
   - `/tail?channel=%23chan&limit=N` → plaintext tail (chronological); `#` must be URL-encoded as `%23`
   - `/trigger?channel=%23chan&window=6h` → returns digest and sends via notifier
   - `/metrics` → Prometheus text metrics
   - Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
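The authentication step (step 2) can be sketched as a function that builds the registration lines in order. This is an illustration only: `handshakeLines` is a hypothetical name, and sending `NICK` between `PASS` and `USER` is an assumption based on ordinary IRC registration, not something this README states.

```go
package main

import "fmt"

// handshakeLines returns the raw IRC lines for an irssi-style login:
// PASS first, then NICK (assumed), then USER with the soju identity
// repeated in the second field.
func handshakeLines(password, username, host, nick, realname string) []string {
	var lines []string
	if password != "" {
		lines = append(lines, "PASS "+password)
	}
	lines = append(lines,
		"NICK "+nick,
		fmt.Sprintf("USER %s %s %s :%s", username, username, host, realname),
	)
	return lines
}

func main() {
	for _, l := range handshakeLines(
		"yourSojuClientPassword",
		"yourUser/your-network@sojuboy",
		"bnc.example.org",
		"yourNick",
		"Your Real Name",
	) {
		fmt.Println(l)
	}
}
```

Each line would be written to the connection terminated by `\r\n`; capability negotiation is omitted here.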

## Health and readiness

- `/healthz` always returns 200
- `/ready` returns 200 only when connected to soju
- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:

```yaml
healthcheck:
  test: ["CMD", "/sojuboy", "--health"]
  interval: 30s
  timeout: 3s
  retries: 3
```
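A `--health` style check could look roughly like the sketch below. The probe address `http://127.0.0.1:8080/ready` and the helper names `probeReady` and `exitCodeFor` are assumptions made for illustration; the real binary may derive the address from `HTTP_LISTEN` or check readiness differently.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"os"
	"time"
)

// exitCodeFor maps a probe result to a process exit code:
// 0 only for HTTP 200, 1 for any error or other status.
func exitCodeFor(status int, err error) int {
	if err != nil || status != http.StatusOK {
		return 1
	}
	return 0
}

// probeReady requests url and returns the HTTP status code.
func probeReady(url string, timeout time.Duration) (int, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	health := flag.Bool("health", false, "probe /ready and exit 0 or 1")
	flag.Parse()
	if *health {
		// Assumed local address for this sketch.
		status, err := probeReady("http://127.0.0.1:8080/ready", 2*time.Second)
		os.Exit(exitCodeFor(status, err))
	}
	fmt.Println("run with --health to probe readiness")
}
```

Keeping the probe timeout below the Docker healthcheck `timeout` (3s in the example above) lets the probe fail on its own terms before Docker kills it.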

## Installation

### Prerequisites

- Docker (or Synology Container Manager)
- A soju bouncer you can connect to
- Pushover account and app token (for push)
- OpenAI API key (for AI summaries)

### Build and run (Docker Compose)

1) Create `.env` in repo root (see example below)

2) Start:

```bash
docker-compose up -d --build
```

3) Health check:

```bash
curl -s http://localhost:8080/healthz
```

4) Tail last messages (remember to URL-encode `#` as `%23`):

```bash
curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
  -H "Authorization: Bearer $HTTP_TOKEN"
```

5) Trigger a digest for the last 6 hours:

```bash
curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
  -H "Authorization: Bearer $HTTP_TOKEN"
```

6) Metrics:

```bash
curl -s http://localhost:8080/metrics
```

## Quick start (Docker Compose)

```bash
docker-compose up -d --build
# wait for healthy
docker inspect --format='{{json .State.Health}}' sojuboy | jq
```

Compose includes a healthcheck calling the binary’s `--health` flag, which returns 0 only when `/ready` is 200.

## Configuration options

You can configure via a `.env` file or inline `environment:` in your compose YAML. Both approaches are shown below. Defaults for all variables are listed in the table after the examples.

### Option A: .env file (recommended for development)

The example below uses maximum or large but reasonable values. Defaults are noted in the comments where relevant.

```env
# soju / IRC
SOJU_HOST=bnc.example.org
SOJU_PORT=6697
SOJU_TLS=true
SOJU_NETWORK=your-network

# Client identity: include client suffix for per-client history in soju
IRC_NICK=yourNick
IRC_USERNAME=yourUser/your-network@sojuboy
IRC_REALNAME=Your Real Name
IRC_PASSWORD=yourSojuClientPassword

# Channels to auto-join (comma-separated)
CHANNELS=#animaniacs,#general
KEYWORDS=yourNick,YourCompany

# Auth method hint (raw is used; value is ignored but kept for compatibility)
SOJU_AUTH=raw

# Notifier (Pushover)
NOTIFIER=pushover
PUSHOVER_USER_KEY=your-pushover-user-key
PUSHOVER_API_TOKEN=your-pushover-app-token

# Summarizer (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-5
# Max completion (output) tokens for GPT‑5 is ~128k (model limit). Default 128000.
OPENAI_MAX_TOKENS=128000
# Summarizer tuning
SUMM_FOLLOW_LINKS=true            # default true
SUMM_LINK_TIMEOUT=20s             # default 20s
SUMM_LINK_MAX_BYTES=1048576       # default 1048576 (1 MiB/article)
SUMM_GROUP_WINDOW=120s            # default 120s
SUMM_MAX_LINKS=20                 # default 20
SUMM_MAX_GROUPS=20000             # default 0 (no cap); example large
SUMM_TIMEOUT=10m                  # request timeout; default 10m

# Digests
DIGEST_CRON=0 */6 * * *           # every 6 hours
DIGEST_WINDOW=24h                 # default 24h
QUIET_HOURS=                      # e.g., 22:00-07:00

# Mentions/alerts
NOTIFY_BACKFILL=false             # default false
MENTION_MIN_INTERVAL=30s          # no hard max; rate-limit between alerts
MENTIONS_ONLY_CHANNELS=           # optional allow-list (CSV)
MENTIONS_DENY_CHANNELS=           # optional deny-list (CSV)
URGENT_KEYWORDS=urgent,priority   # bypass quiet hours

# HTTP API
HTTP_LISTEN=:8080
HTTP_TOKEN=put-a-long-random-token-here

# Storage
STORE_PATH=/data/app.db
STORE_RETENTION_DAYS=365          # default 365

# Logging
LOG_LEVEL=info
```

Compose (with localhost bind suitable for Synology reverse proxy):

```yaml
services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
    restart: unless-stopped
    env_file: .env
    ports:
      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3
```

### Option B: Inline environment in compose (no .env)

```yaml
services:
  sojuboy:
    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
    volumes:
      - /volume1/docker/sojuboy/data:/data
    environment:
      SOJU_HOST: "bnc.example.org"           # default 127.0.0.1
      SOJU_PORT: "6697"                      # default 6697
      SOJU_TLS: "true"                       # default true
      SOJU_NETWORK: "your-network"           # default ""
      IRC_NICK: "yourNick"                   # default sojuboy
      IRC_USERNAME: "yourUser/your-network@sojuboy"  # default IRC_NICK
      IRC_REALNAME: "Your Real Name"         # default sojuboy
      IRC_PASSWORD: "yourSojuClientPassword" # default ""
      CHANNELS: "#animaniacs,#general"       # default "" (none)
      KEYWORDS: "yourNick,YourCompany"       # default IRC_NICK
      SOJU_AUTH: "raw"                        # default sasl (hint only)
      NOTIFIER: "pushover"                   # default pushover
      PUSHOVER_USER_KEY: "..."               # default ""
      PUSHOVER_API_TOKEN: "..."              # default ""
      LLM_PROVIDER: "openai"                 # default openai
      OPENAI_API_KEY: "sk-..."               # default ""
      OPENAI_BASE_URL: "https://api.openai.com/v1"  # default ""
      OPENAI_MODEL: "gpt-5"                  # default gpt-5
      OPENAI_MAX_TOKENS: "128000"            # default 128000
      SUMM_FOLLOW_LINKS: "true"              # default true
      SUMM_LINK_TIMEOUT: "20s"               # default 20s
      SUMM_LINK_MAX_BYTES: "1048576"         # default 1048576
      SUMM_GROUP_WINDOW: "120s"              # default 120s
      SUMM_MAX_LINKS: "20"                   # default 20
      SUMM_MAX_GROUPS: "20000"               # default 0 (no cap)
      SUMM_TIMEOUT: "10m"                    # default 10m
      DIGEST_CRON: "0 */6 * * *"             # default 0 */6 * * *
      DIGEST_WINDOW: "24h"                    # default 24h
      QUIET_HOURS: ""                         # default ""
      NOTIFY_BACKFILL: "false"               # default false
      MENTION_MIN_INTERVAL: "30s"            # default 30s
      MENTIONS_ONLY_CHANNELS: ""             # default ""
      MENTIONS_DENY_CHANNELS: ""             # default ""
      URGENT_KEYWORDS: "urgent,priority"     # default ""
      HTTP_LISTEN: ":8080"                   # default :8080
      HTTP_TOKEN: "<long-random-token>"      # default ""
      STORE_PATH: "/data/app.db"             # default /data/app.db
      STORE_RETENTION_DAYS: "365"            # default 365
      LOG_LEVEL: "info"                      # default info
    healthcheck:
      test: ["CMD", "/sojuboy", "--health"]
      interval: 30s
      timeout: 3s
      retries: 3
```

### Defaults reference

| Variable | Default |
|---|---|
| SOJU_HOST | 127.0.0.1 |
| SOJU_PORT | 6697 |
| SOJU_TLS | true |
| IRC_NICK | sojuboy |
| IRC_USERNAME | IRC_NICK |
| IRC_REALNAME | sojuboy |
| IRC_PASSWORD | (empty) |
| SOJU_NETWORK | (empty) |
| CHANNELS | (empty) |
| KEYWORDS | IRC_NICK |
| SOJU_AUTH | sasl |
| NOTIFIER | pushover |
| PUSHOVER_USER_KEY | (empty) |
| PUSHOVER_API_TOKEN | (empty) |
| LLM_PROVIDER | openai |
| OPENAI_API_KEY | (empty) |
| OPENAI_BASE_URL | (empty) |
| OPENAI_MODEL | gpt-5 |
| OPENAI_MAX_TOKENS | 128000 |
| SUMM_FOLLOW_LINKS | true |
| SUMM_LINK_TIMEOUT | 20s |
| SUMM_LINK_MAX_BYTES | 1048576 |
| SUMM_GROUP_WINDOW | 120s |
| SUMM_MAX_LINKS | 20 |
| SUMM_MAX_GROUPS | 0 |
| SUMM_TIMEOUT | 10m |
| DIGEST_CRON | 0 */6 * * * |
| DIGEST_WINDOW | 24h |
| QUIET_HOURS | (empty) |
| NOTIFY_BACKFILL | false |
| MENTION_MIN_INTERVAL | 30s |
| MENTIONS_ONLY_CHANNELS | (empty) |
| MENTIONS_DENY_CHANNELS | (empty) |
| URGENT_KEYWORDS | (empty) |
| HTTP_LISTEN | :8080 |
| HTTP_TOKEN | (empty) |
| STORE_PATH | /data/app.db |
| STORE_RETENTION_DAYS | 365 |
| LOG_LEVEL | info |
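An env loader that falls back to defaults like those in the table can be sketched as follows. The helper names `envString`, `envInt`, and `envDuration` are hypothetical and do not describe the actual `internal/config` API; treating an empty variable as unset is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// envString returns the value of key, or def when it is unset or empty.
func envString(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// envInt parses key as an integer, falling back to def on absence or error.
func envInt(key string, def int) int {
	n, err := strconv.Atoi(envString(key, ""))
	if err != nil {
		return def
	}
	return n
}

// envDuration parses key as a Go duration (e.g. "30s", "10m"),
// falling back to def on absence or error.
func envDuration(key string, def time.Duration) time.Duration {
	d, err := time.ParseDuration(envString(key, ""))
	if err != nil {
		return def
	}
	return d
}

func main() {
	fmt.Println("HTTP_LISTEN:", envString("HTTP_LISTEN", ":8080"))
	fmt.Println("SOJU_PORT:", envInt("SOJU_PORT", 6697))
	fmt.Println("MENTION_MIN_INTERVAL:", envDuration("MENTION_MIN_INTERVAL", 30*time.Second))
}
```

Silently falling back on a parse error keeps the sketch short; a stricter loader might log or fail on malformed values instead.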

## Pushover setup

1) Install the Pushover iOS app and log in
2) Get your User Key (in the app or on the website)
3) Create an application at `pushover.net/apps/build` to get an API token
4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`

## OpenAI setup

- Set `OPENAI_API_KEY`
- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
- If `gpt-5` isn’t available on your account, use a supported model like `gpt-4o-mini`
- GPT‑5 limits: ~272k input + 128k output tokens (400k context)

## HTTP API

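- `GET /healthz` → `200 ok`
- `GET /ready` → `200` only when connected to soju
- `GET /tail?channel=%23chan&limit=50`
  - Returns plaintext messages (chronological)
  - Auth: provide `HTTP_TOKEN` as a Bearer token, `token` query parameter, `X-Auth-Token` header, or basic auth (`token:<HTTP_TOKEN>`)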
- `GET /trigger?channel=%23chan&window=6h`
  - Returns plaintext digest
  - Also sends via notifier when configured
  - Auth as above
- `GET /metrics`
  - Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
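When calling the API from Go rather than `curl`, the channel's `#` has to be percent-encoded. The sketch below builds the `/tail` and `/trigger` URLs with `net/url`; the helper names `tailURL` and `triggerURL` are hypothetical and only illustrate the encoding.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// tailURL builds a /tail request URL; url.Values encodes '#' as %23.
func tailURL(base, channel string, limit int) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("limit", strconv.Itoa(limit))
	return base + "/tail?" + q.Encode()
}

// triggerURL builds a /trigger request URL for the given window (e.g. "6h").
func triggerURL(base, channel, window string) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("window", window)
	return base + "/trigger?" + q.Encode()
}

func main() {
	fmt.Println(tailURL("http://localhost:8080", "#animaniacs", 50))
	fmt.Println(triggerURL("http://localhost:8080", "#animaniacs", "6h"))
}
```

A request to either URL would still need `HTTP_TOKEN`, for example as an `Authorization: Bearer` header set on an `http.Request`.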

## Troubleshooting

- Empty tail while there’s activity
  - Ensure the service logs `join requested:` followed by `joined` for your channels
  - Confirm `.env` `CHANNELS` contains your channels
  - Check `/metrics` and the logs for recent message ingestion

- 401 Unauthorized from `/tail` or `/trigger`
  - Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`

- OpenAI 502/URL errors
  - Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
  - Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn’t enabled for your account

## Roadmap

- Additional notifiers (ntfy, Telegram)
- Long-form HTML digest rendering
- Admin endpoints (e.g., `/join?channel=%23chan`)

## Development notes

Project layout (selected):

- `cmd/sojuboy/main.go` – entrypoint, wiring config/services
- `internal/soju` – soju connector and ingestion
- `internal/store` – SQLite schema and queries
- `internal/notifier` – Pushover notifier
- `internal/summarizer` – OpenAI client and prompts
- `internal/httpapi` – health, tail, trigger, metrics endpoints
- `internal/scheduler` – cron jobs

Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.

## License

MIT for code dependencies; this repository’s license will follow your preference (add a LICENSE if needed).
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								# sojuboy
 								An IRC bouncer companion service for soju that:
 								- Watches your bouncer-connected channels continuously
 								- Notifies you on mentions via Pushover (default)
 								- Stores messages in SQLite for summaries and on-demand inspection
 								- Generates AI digests (OpenAI by default) on schedule or on demand
 								- Exposes a small HTTP API for health, tailing messages, metrics, and triggering digests
 								Note: this is not a bot and never replies in IRC. It passively attaches as a soju multi-client on your main account.
 								## Why
 								If you use soju as a bouncer, you may want per-client alerts and AI summaries without running a heavy IRC client all the time. This service connects to soju as a distinct client identity (e.g., `username/network@client`) and handles notifications and summaries for you, containerized and easy to run on a Synology or any Docker host.
 								## High-level architecture
 								- Language: Go (single static binary, low memory footprint)
 								- Long-lived IRC client: raw IRC using a lightweight parser (sorcix/irc) with an irssi-style handshake tailored for soju
 								- Message storage: SQLite via modernc.org/sqlite
 								- Scheduling: github.com/robfig/cron/v3
 								- Notifications: github.com/gregdel/pushover
 								- Summarization (LLM): github.com/sashabaranov/go-openai
 								- HTTP API: Go stdlib `net/http`
 								Runtime modules:
 								- `internal/soju`: soju connection, capability negotiation, irssi-style PASS/USER auth, joins, message ingestion, event playback, CHATHISTORY fallback
 								- `internal/store`: SQLite schema and queries
 								- `internal/notifier`: Pushover notifier (pluggable interface)
 								- `internal/summarizer`: OpenAI client with GPT-5 defaults, GPT-4o-mini fallback
 								- `internal/scheduler`: cron-based digest scheduling and daily retention job
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								- `internal/httpapi`: `/healthz`, `/ready`, `/tail`, `/trigger`, `/metrics`
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								- `internal/config`: env config loader and helpers
 								## Features
 								- Mention/keyword detection: punctuation-tolerant (letters, digits, `_` and `-` are word chars)
 								- Mention tuning: allow/deny channels, urgent keywords bypass quiet hours, rate limiting
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								- AI digest generation: concise natural summaries (no rigid sections); integrates pasted multi-line posts and referenced link context; image links sent to GPT‑5 as vision inputs
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								- Configurable schedules (cron), quiet hours, and summary parameters
 								- Local persistence with retention pruning (daily at 03:00)
 								- HTTP endpoints: health, tail, metrics, on-demand digests
 								## How it works
 ) The service connects to soju and negotiates IRCv3 capabilities:
 								   - Requests: `server-time`, `message-tags`, `batch`, `cap-notify`, `echo-message`, `draft/event-playback`; optional fallback `draft/chathistory` when needed
 								   - Joins happen after numeric 001 (welcome)
 ) Authentication:
 								   - PASS then irssi-style `USER <username/network@client> <same> <host> :<realname>`
 								   - Soju’s per-client identity preserves distinct history
 ) Playback and backfill:
 								   - If `draft/event-playback` is enabled, soju replays missed messages automatically
 								   - Optional fallback: `CHATHISTORY LATEST <channel> timestamp=<RFC3339Nano> <limit>` using the last stored timestamp per channel (disabled by default)
 ) Messages and mentions:
 								   - Each `PRIVMSG` is stored with server-time when available
 								   - Mentions trigger Pushover notifications subject to quiet hours, urgency, and rate limits
 								   - Debug logs include: mention delivered or suppression reason (backfill, quiet hours, rate limit)
 ) Summarization:
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								   - `/trigger` or the scheduler loads a window and calls OpenAI
 								   - GPT‑5 context: ~272k input tokens + up to 128k output tokens (400k total)
 								   - Summaries are concise/natural and integrate multi-line posts, article text (readability-extracted), and image links (vision)
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
 ) HTTP API:
 								   - `/healthz` → `200 ok`
 								   - `/ready` → `200` only when connected to soju
 								   - `/tail?channel=#chan&limit=N` → plaintext tail (chronological)
 								   - `/trigger?channel=#chan&window=6h` → returns digest and sends via notifier
 								   - `/metrics` → Prometheus text metrics
 								   - Protect `/tail` and `/trigger` with `HTTP_TOKEN` via Bearer, `token` query, `X-Auth-Token`, or basic auth (`token:<HTTP_TOKEN>`)
 								## Health and readiness
 								- `/healthz` always returns 200
 								- `/ready` returns 200 only when connected to soju
 								- Binary supports `--health` to perform a local readiness check and exit 0/1. Example Docker healthcheck:
 								```yaml
 								healthcheck:
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								  test: ["CMD", "/sojuboy", "--health"]
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								  interval: 30s
 								  timeout: 3s
 								  retries: 3
 								```
 								## Installation
 								### Prerequisites
 								- Docker (or Synology Container Manager)
 								- A soju bouncer you can connect to
 								- Pushover account and app token (for push)
 								- OpenAI API key (for AI summaries)
 								### Build and run (Docker Compose)
 ) Create `.env` in repo root (see example below)
 ) Start:
 								```bash
 								docker-compose up -d --build
 								```
 ) Health check:
 								```bash
 								curl -s http://localhost:8080/healthz
 								```
 ) Tail last messages (remember to URL-encode `#` as `%23`):
 								```bash
 								curl -s "http://localhost:8080/tail?channel=%23animaniacs&limit=50" \
 								  -H "Authorization: Bearer $HTTP_TOKEN"
 								```
 ) Trigger a digest for the last 6 hours:
 								```bash
 								curl -s "http://localhost:8080/trigger?channel=%23animaniacs&window=6h" \
 								  -H "Authorization: Bearer $HTTP_TOKEN"
 								```
 ) Metrics:
 								```bash
 								curl -s http://localhost:8080/metrics
 								```
 								## Quick start (Docker Compose)
 								```bash
 								docker-compose up -d --build
 								# wait for healthy
 								docker inspect --format='{{json .State.Health}}' sojuboy | jq
 								```
 								Compose includes a healthcheck calling the binary’s `--health` flag, which returns 0 only when `/ready` is 200.
-												defaults: raise max/defaults (OPENAI_MAX_TOKENS=128000, larger summarizer timeouts/limits, DIGEST_WINDOW=24h, RETENTION=365); docs: add inline env option, defaults table; compose: bind 127.0.0.1:8080

											
										
										
											2025-08-16 12:29:58 -05:00
+								## Configuration options
 								You can configure via a `.env` file or inline `environment:` in your compose YAML. Both approaches are shown below. Defaults for all variables are listed in the table after the examples.
 								### Option A: .env file (recommended for development)
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								Below shows maximum or large/reasonable values. Defaults are noted where they are also the maximum or when relevant.
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								```env
 								# soju / IRC
 								SOJU_HOST=bnc.example.org
 								SOJU_PORT=6697
 								SOJU_TLS=true
 								SOJU_NETWORK=your-network
 								# Client identity: include client suffix for per-client history in soju
 								IRC_NICK=yourNick
 								IRC_USERNAME=yourUser/your-network@sojuboy
 								IRC_REALNAME=Your Real Name
 								IRC_PASSWORD=yourSojuClientPassword
 								# Channels to auto-join (comma-separated)
 								CHANNELS=#animaniacs,#general
 								KEYWORDS=yourNick,YourCompany
 								# Auth method hint (raw is used; value is ignored but kept for compatibility)
 								SOJU_AUTH=raw
 								# Notifier (Pushover)
 								NOTIFIER=pushover
 								PUSHOVER_USER_KEY=your-pushover-user-key
 								PUSHOVER_API_TOKEN=your-pushover-app-token
 								# Summarizer (OpenAI)
 								LLM_PROVIDER=openai
 								OPENAI_API_KEY=sk-...
 								OPENAI_BASE_URL=https://api.openai.com/v1
 								OPENAI_MODEL=gpt-5
-												defaults: raise max/defaults (OPENAI_MAX_TOKENS=128000, larger summarizer timeouts/limits, DIGEST_WINDOW=24h, RETENTION=365); docs: add inline env option, defaults table; compose: bind 127.0.0.1:8080

											
										
										
											2025-08-16 12:29:58 -05:00
+								# Max completion (output) tokens for GPT‑5 is ~128k (model limit). Default 128000.
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								OPENAI_MAX_TOKENS=128000
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
+								# Summarizer tuning
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								SUMM_FOLLOW_LINKS=true            # default true
-												defaults: raise max/defaults (OPENAI_MAX_TOKENS=128000, larger summarizer timeouts/limits, DIGEST_WINDOW=24h, RETENTION=365); docs: add inline env option, defaults table; compose: bind 127.0.0.1:8080

											
										
										
											2025-08-16 12:29:58 -05:00
+								SUMM_LINK_TIMEOUT=20s             # default 20s
 								SUMM_LINK_MAX_BYTES=1048576       # default 1048576 (1 MiB/article)
 								SUMM_GROUP_WINDOW=120s            # default 120s
 								SUMM_MAX_LINKS=20                 # default 20
 								SUMM_MAX_GROUPS=20000             # default 0 (no cap); example large
 								SUMM_TIMEOUT=10m                  # request timeout; default 10m
-												feat: initial Beta 1 release

- soju raw connector with event playback and CHATHISTORY fallback
- SQLite store with msgid de-dup and retention job
- Mentions + Pushover + tuning; structured JSON logs
- Summaries: concise, link-following, multi-line grouping
- HTTP: /healthz, /ready, /tail, /trigger, /metrics
- Docker: distroless, healthcheck, version metadata
- Docs: README, CHANGELOG, compose

											
										
										
											2025-08-15 18:06:28 -05:00
 								# Digests
-												docs: expand .env example to show max/large values; add SUMM_TIMEOUT and summarizer tunables\n\nfeat: summarizer improvements\n- readability extraction for articles\n- image links passed to model as vision inputs\n- configurable max groups/links/bytes and timeout\n- higher default ceilings; resilient fallback summary

											
										
										
											2025-08-15 20:41:31 -05:00
+								DIGEST_CRON=0 */6 * * *           # every 6 hours
-												defaults: raise max/defaults (OPENAI_MAX_TOKENS=128000, larger summarizer timeouts/limits, DIGEST_WINDOW=24h, RETENTION=365); docs: add inline env option, defaults table; compose: bind 127.0.0.1:8080

											
										
										
											2025-08-16 12:29:58 -05:00
+								DIGEST_WINDOW=24h                 # default 24h
 								QUIET_HOURS=                      # e.g., 22:00-07:00
 								# Mentions/alerts
 								NOTIFY_BACKFILL=false             # default false
 								MENTION_MIN_INTERVAL=30s          # no hard max; rate-limit between alerts
 								MENTIONS_ONLY_CHANNELS=           # optional allow-list (CSV)
 								MENTIONS_DENY_CHANNELS=           # optional deny-list (CSV)
 								URGENT_KEYWORDS=urgent,priority   # bypass quiet hours
 								# HTTP API
 								HTTP_LISTEN=:8080
 								HTTP_TOKEN=put-a-long-random-token-here
 								# Storage
 								STORE_PATH=/data/app.db
 								STORE_RETENTION_DAYS=365          # default 365
 								# Logging
 								LOG_LEVEL=info
 								```
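
The `QUIET_HOURS` example above (`22:00-07:00`) describes a window that wraps past midnight. The following is a minimal sketch of how such a window could be parsed and checked. It is not taken from sojuboy's source: `quietWindow`, `parseQuietHours`, `minutesOfDay`, `contains`, and `containsMinute` are hypothetical names, and the real service may treat edge cases (empty value, equal start and end, time zones) differently.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// quietWindow holds a quiet-hours range as minutes since midnight.
// Hypothetical type for illustration; not part of sojuboy's API.
type quietWindow struct {
	start, end int
}

// parseQuietHours parses "HH:MM-HH:MM". An empty string means "no quiet
// hours" and yields a nil window.
func parseQuietHours(s string) (*quietWindow, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	from, to, ok := strings.Cut(s, "-")
	if !ok {
		return nil, fmt.Errorf("quiet hours %q: want HH:MM-HH:MM", s)
	}
	start, err := minutesOfDay(from)
	if err != nil {
		return nil, err
	}
	end, err := minutesOfDay(to)
	if err != nil {
		return nil, err
	}
	return &quietWindow{start: start, end: end}, nil
}

// minutesOfDay converts "HH:MM" into minutes since midnight.
func minutesOfDay(s string) (int, error) {
	t, err := time.Parse("15:04", strings.TrimSpace(s))
	if err != nil {
		return 0, err
	}
	return t.Hour()*60 + t.Minute(), nil
}

// containsMinute reports whether minute-of-day m is inside the window.
// When start > end the window wraps past midnight (e.g. 22:00-07:00).
func (w *quietWindow) containsMinute(m int) bool {
	if w == nil || w.start == w.end {
		return false
	}
	if w.start < w.end {
		return m >= w.start && m < w.end
	}
	return m >= w.start || m < w.end
}

// contains applies containsMinute to the wall-clock time of t.
func (w *quietWindow) contains(t time.Time) bool {
	return w.containsMinute(t.Hour()*60 + t.Minute())
}

func main() {
	w, err := parseQuietHours("22:00-07:00")
	if err != nil {
		panic(err)
	}
	night := time.Date(2025, 8, 16, 23, 30, 0, 0, time.UTC)
	noon := time.Date(2025, 8, 16, 12, 0, 0, 0, time.UTC)
	fmt.Println(w.contains(night), w.contains(noon)) // true false
}
```

In this sketch an alert would be suppressed when `contains` returns true, unless the message matches one of the `URGENT_KEYWORDS`, which the comments above describe as bypassing quiet hours.
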
 								Compose (with localhost bind suitable for Synology reverse proxy):
 								```yaml
 								services:
 								  sojuboy:
 								    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
 								    restart: unless-stopped
 								    env_file: .env
 								    ports:
 								      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
 								    volumes:
 								      - /volume1/docker/sojuboy/data:/data
 								    healthcheck:
 								      test: ["CMD", "/sojuboy", "--health"]
 								      interval: 30s
 								      timeout: 3s
 								      retries: 3
 								```
 								### Option B: Inline environment in compose (no .env)
 								```yaml
 								services:
 								  sojuboy:
 								    image: code.cravey.net/your-user/sojuboy:v0.1.0-beta1
 								    restart: unless-stopped
 								    ports:
 								      - "127.0.0.1:8080:8080"  # bind only to localhost; fronted by DSM Reverse Proxy
 								    volumes:
 								      - /volume1/docker/sojuboy/data:/data
 								    environment:
 								      SOJU_HOST: "bnc.example.org"           # default 127.0.0.1
 								      SOJU_PORT: "6697"                      # default 6697
 								      SOJU_TLS: "true"                       # default true
 								      SOJU_NETWORK: "your-network"           # default ""
 								      IRC_NICK: "yourNick"                   # default sojuboy
 								      IRC_USERNAME: "yourUser/your-network@sojuboy"  # default IRC_NICK
 								      IRC_REALNAME: "Your Real Name"         # default sojuboy
 								      IRC_PASSWORD: "yourSojuClientPassword" # default ""
 								      CHANNELS: "#animaniacs,#general"       # default "" (none)
 								      KEYWORDS: "yourNick,YourCompany"       # default IRC_NICK
 								      SOJU_AUTH: "raw"                        # default sasl (hint only)
 								      NOTIFIER: "pushover"                   # default pushover
 								      PUSHOVER_USER_KEY: "..."               # default ""
 								      PUSHOVER_API_TOKEN: "..."              # default ""
 								      LLM_PROVIDER: "openai"                 # default openai
 								      OPENAI_API_KEY: "sk-..."               # default ""
 								      OPENAI_BASE_URL: "https://api.openai.com/v1"  # default ""
 								      OPENAI_MODEL: "gpt-5"                  # default gpt-5
 								      OPENAI_MAX_TOKENS: "128000"            # default 128000
 								      SUMM_FOLLOW_LINKS: "true"              # default true
 								      SUMM_LINK_TIMEOUT: "20s"               # default 20s
 								      SUMM_LINK_MAX_BYTES: "1048576"         # default 1048576
 								      SUMM_GROUP_WINDOW: "120s"              # default 120s
 								      SUMM_MAX_LINKS: "20"                   # default 20
 								      SUMM_MAX_GROUPS: "20000"               # default 0 (no cap)
 								      SUMM_TIMEOUT: "10m"                    # default 10m
 								      DIGEST_CRON: "0 */6 * * *"             # default 0 */6 * * *
 								      DIGEST_WINDOW: "24h"                    # default 24h
 								      QUIET_HOURS: ""                         # default ""
 								      NOTIFY_BACKFILL: "false"               # default false
 								      MENTION_MIN_INTERVAL: "30s"            # default 30s
 								      MENTIONS_ONLY_CHANNELS: ""             # default ""
 								      MENTIONS_DENY_CHANNELS: ""             # default ""
 								      URGENT_KEYWORDS: "urgent,priority"     # default ""
 								      HTTP_LISTEN: ":8080"                   # default :8080
 								      HTTP_TOKEN: "<long-random-token>"      # default ""
 								      STORE_PATH: "/data/app.db"             # default /data/app.db
 								      STORE_RETENTION_DAYS: "365"            # default 365
 								      LOG_LEVEL: "info"                      # default info
 								    healthcheck:
 								      test: ["CMD", "/sojuboy", "--health"]
 								      interval: 30s
 								      timeout: 3s
 								      retries: 3
 								```
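
With `DIGEST_CRON` set to `0 */6 * * *` and `DIGEST_WINDOW` set to `24h`, each scheduled digest looks back 24 hours while runs are only 6 hours apart, so consecutive digests share 18 hours of history. The sketch below works through that arithmetic using only the standard library. It hard-codes this one schedule rather than parsing cron expressions (sojuboy itself uses `github.com/robfig/cron/v3`), it assumes UTC, and `nextSixHourly`, `overlap`, and `mustParseUTC` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// nextSixHourly returns the first instant strictly after t that matches
// "0 */6 * * *", i.e. minute 0 of hours 0, 6, 12 and 18.
func nextSixHourly(t time.Time) time.Time {
	topOfHour := time.Date(t.Year(), t.Month(), t.Day(), t.Hour(), 0, 0, 0, t.Location())
	next := topOfHour.Add(time.Hour)
	for next.Hour()%6 != 0 {
		next = next.Add(time.Hour)
	}
	return next
}

// overlap reports how much history two consecutive digests share when runs
// are `interval` apart and each run summarizes the last `window`.
func overlap(interval, window time.Duration) time.Duration {
	if window <= interval {
		return 0
	}
	return window - interval
}

// mustParseUTC parses an RFC 3339 timestamp and panics on bad input.
// Demo helper only.
func mustParseUTC(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t.UTC()
}

func main() {
	now := mustParseUTC("2025-08-16T12:29:58Z")
	first := nextSixHourly(now)
	second := nextSixHourly(first)
	fmt.Println(first.Format(time.RFC3339))               // 2025-08-16T18:00:00Z
	fmt.Println(second.Format(time.RFC3339))              // 2025-08-17T00:00:00Z
	fmt.Println(overlap(second.Sub(first), 24*time.Hour)) // 18h0m0s
}
```

If overlapping digests are not wanted, setting `DIGEST_WINDOW` equal to the cron interval (here `6h`) makes `overlap` return zero.
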
 								### Defaults reference
 								| Variable | Default |
 								|---|---|
 								| SOJU_HOST | 127.0.0.1 |
 								| SOJU_PORT | 6697 |
 								| SOJU_TLS | true |
 								| IRC_NICK | sojuboy |
 								| IRC_USERNAME | IRC_NICK |
 								| IRC_REALNAME | sojuboy |
 								| IRC_PASSWORD | (empty) |
 								| SOJU_NETWORK | (empty) |
 								| CHANNELS | (empty) |
 								| KEYWORDS | IRC_NICK |
 								| SOJU_AUTH | sasl |
 								| NOTIFIER | pushover |
 								| PUSHOVER_USER_KEY | (empty) |
 								| PUSHOVER_API_TOKEN | (empty) |
 								| LLM_PROVIDER | openai |
 								| OPENAI_API_KEY | (empty) |
 								| OPENAI_BASE_URL | (empty) |
 								| OPENAI_MODEL | gpt-5 |
 								| OPENAI_MAX_TOKENS | 128000 |
 								| SUMM_FOLLOW_LINKS | true |
 								| SUMM_LINK_TIMEOUT | 20s |
 								| SUMM_LINK_MAX_BYTES | 1048576 |
 								| SUMM_GROUP_WINDOW | 120s |
 								| SUMM_MAX_LINKS | 20 |
 								| SUMM_MAX_GROUPS | 0 |
 								| SUMM_TIMEOUT | 10m |
 								| DIGEST_CRON | 0 */6 * * * |
 								| DIGEST_WINDOW | 24h |
 								| QUIET_HOURS | (empty) |
 								| NOTIFY_BACKFILL | false |
 								| MENTION_MIN_INTERVAL | 30s |
 								| MENTIONS_ONLY_CHANNELS | (empty) |
 								| MENTIONS_DENY_CHANNELS | (empty) |
 								| URGENT_KEYWORDS | (empty) |
 								| HTTP_LISTEN | :8080 |
 								| HTTP_TOKEN | (empty) |
 								| STORE_PATH | /data/app.db |
 								| STORE_RETENTION_DAYS | 365 |
 								| LOG_LEVEL | info |
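
A table like the one above is typically backed by small "environment variable or default" helpers. The following is a rough sketch of that pattern, not the actual code in `internal/config`: `stringOr`, `intOr`, `durationOr`, and the `env*` wrappers are hypothetical names, and silently falling back on a malformed value is an assumption made to keep the example short (a real loader may prefer to log or fail).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// stringOr returns v unless it is empty, in which case it returns def.
func stringOr(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// intOr parses v as a base-10 integer, falling back to def when v is empty
// or malformed.
func intOr(v string, def int) int {
	n, err := strconv.Atoi(v)
	if err != nil {
		return def
	}
	return n
}

// durationOr parses v with time.ParseDuration ("30s", "10m", "24h"),
// falling back to def when v is empty or malformed. Note that
// time.ParseDuration has no day unit, so "7d" would fall back.
func durationOr(v string, def time.Duration) time.Duration {
	d, err := time.ParseDuration(v)
	if err != nil {
		return def
	}
	return d
}

// The env* wrappers read the process environment and apply the fallback.
func envString(key, def string) string { return stringOr(os.Getenv(key), def) }
func envInt(key string, def int) int   { return intOr(os.Getenv(key), def) }
func envDuration(key string, def time.Duration) time.Duration {
	return durationOr(os.Getenv(key), def)
}

func main() {
	// Defaults below are the ones listed in the table for these variables.
	fmt.Println("HTTP_LISTEN          =", envString("HTTP_LISTEN", ":8080"))
	fmt.Println("IRC_NICK             =", envString("IRC_NICK", "sojuboy"))
	fmt.Println("SUMM_MAX_GROUPS      =", envInt("SUMM_MAX_GROUPS", 0))
	fmt.Println("MENTION_MIN_INTERVAL =", envDuration("MENTION_MIN_INTERVAL", 30*time.Second))
}
```

Splitting the parsing (`intOr`, `durationOr`) from the environment lookup keeps the fallback logic easy to unit-test without touching process state.
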
 								## Pushover setup
 								1) Install the Pushover iOS app and log in
 								2) Get your User Key (in the app or on the website)
 								3) Create an application at `pushover.net/apps/build` to get an API token
 								4) Put them in `.env` as `PUSHOVER_USER_KEY` and `PUSHOVER_API_TOKEN`
 								## OpenAI setup
 								- Set `OPENAI_API_KEY`
 								- Set `OPENAI_BASE_URL` to exactly `https://api.openai.com/v1`
 								- If `gpt-5` isn’t available on your account, use a supported model like `gpt-4o-mini`
 								- GPT‑5 limits: ~272k input + 128k output tokens (400k context)
 								## HTTP API
 								- `GET /healthz` → `200 ok`
 								- `GET /tail?channel=%23chan&limit=50`
 								  - Returns plaintext messages (chronological)
 								  - Auth: provide `HTTP_TOKEN` as a Bearer token (or query param `token=`)
 								- `GET /trigger?channel=%23chan&window=6h`
 								  - Returns plaintext digest
 								  - Also sends via notifier when configured
 								  - Auth as above
 								- `GET /metrics`
 								  - Prometheus metrics: `sojuboy_messages_ingested_total`, `sojuboy_notifications_sent_total`, `sojuboy_messages_pruned_total`, `sojuboy_connected`
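
The endpoints above can be called from any HTTP client. As a minimal sketch, the helper below builds an authenticated `GET /tail` request in Go; note that `#` in a channel name must be sent as `%23`, which `url.Values.Encode` handles. The name `newTailRequest` and the base URL and token values are illustrative assumptions, not part of sojuboy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// newTailRequest builds a GET /tail request for the given channel and limit,
// passing the token as a Bearer credential. Hypothetical helper.
func newTailRequest(baseURL, token, channel string, limit int) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	// Keep any path prefix on the base URL (e.g. behind a reverse proxy).
	u.Path = strings.TrimRight(u.Path, "/") + "/tail"

	q := url.Values{}
	q.Set("channel", channel) // "#general" is encoded as "%23general"
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Placeholder values; substitute your own listen address and HTTP_TOKEN.
	req, err := newTailRequest("http://127.0.0.1:8080", "put-a-long-random-token-here", "#general", 50)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
	// and read the plaintext body from resp.Body.
}
```

The same request shape should work for `/trigger` by swapping the path and replacing `limit` with `window` (for example `6h`).
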
 								## Troubleshooting
 								- Empty tail while there’s activity
 								  - Ensure the service logs `join requested:` followed by `joined` for your channels
 								  - Confirm `.env` `CHANNELS` contains your channels
 								  - Check `/metrics` and the logs for recent message ingestion
 								- 401 Unauthorized from `/tail` or `/trigger`
 								  - Provide `Authorization: Bearer $HTTP_TOKEN` or `?token=$HTTP_TOKEN`
 								- OpenAI 502/URL errors
 								  - Ensure `OPENAI_BASE_URL=https://api.openai.com/v1`
 								  - Try `OPENAI_MODEL=gpt-4o-mini` if `gpt-5` isn’t enabled for your account
 								## Roadmap
 								- Additional notifiers (ntfy, Telegram)
 								- Long-form HTML digest rendering
 								- Admin endpoints (e.g., `/join?channel=#chan`)
 								## Development notes
 								Project layout (selected):
 								- `cmd/sojuboy/main.go` – entrypoint, wiring config/services
 								- `internal/soju` – soju connector and ingestion
 								- `internal/store` – SQLite schema and queries
 								- `internal/notifier` – Pushover notifier
 								- `internal/summarizer` – OpenAI client and prompts
 								- `internal/httpapi` – health, tail, trigger, metrics endpoints
 								- `internal/scheduler` – cron jobs
 								Go toolchain: see `go.mod` (Go 1.23). The Dockerfile builds a static binary for a distroless image.
 								## License
 								MIT for code dependencies; this repository’s license will follow your preference (add a LICENSE if needed).