fix: resolve multiple bugs from code review #116

Fixes four issues identified in the community code review:

- Session persistence broken on Windows: session keys like
  "telegram:123456" contain ':', which is illegal in Windows
  filenames. filepath.Base() strips drive-letter prefixes on Windows,
  causing Save() to silently fail. Added sanitizeFilename() to
  replace invalid chars in the filename while keeping the original
  key in the JSON payload.

- HTTP client with no timeout: HTTPProvider used Timeout: 0 (infinite
  wait), which can hang the entire agent if an API endpoint becomes
  unresponsive. Set a 120s safety timeout.

- Slack AllowFrom type mismatch: SlackConfig used plain []string
  while every other channel uses FlexibleStringSlice, so numeric
  user IDs in Slack config would fail to parse.

- Token estimation wrong for CJK: estimateTokens() divided byte
  length by 4, but CJK characters are 3 bytes each, causing ~3x
  overestimation and premature summarization. Switched to
  utf8.RuneCountInString() / 3 for better cross-language accuracy.

Also added unit tests for the session filename sanitization.

Ref #116
This commit is contained in:
xiaoen
2026-02-15 00:28:36 +08:00
parent 1cff7d4e37
commit 0a88ff0817
5 changed files with 103 additions and 10 deletions

View File

@@ -16,6 +16,7 @@ import (
"sync"
"sync/atomic"
"time"
"unicode/utf8"
"github.com/sipeed/picoclaw/pkg/bus"
"github.com/sipeed/picoclaw/pkg/config"
@@ -768,10 +769,13 @@ func (al *AgentLoop) summarizeBatch(ctx context.Context, batch []providers.Messa
}
// estimateTokens estimates the number of tokens in a message list.
// Uses rune count instead of byte length so that CJK and other multi-byte
// characters are not over-counted (a Chinese character is 3 bytes but roughly
// one token).
func (al *AgentLoop) estimateTokens(messages []providers.Message) int {
total := 0
for _, m := range messages {
total += len(m.Content) / 4 // Simple heuristic: 4 chars per token
total += utf8.RuneCountInString(m.Content) / 3
}
return total
}