food-market/tests/load/monitor-soak.sh
nns e30861fb57
feat(s27): cross-feature integration + soak + crash recovery (8/8 ✓)
Each of the previous 26 sprints was developed in isolation; this sprint checks
how they interact and whether all the features are actually compatible.

1. tests/integration/03-loyalty-signalr-i18n: PointsAccrual program →
   card → 100₸ sale → 10 points accrued; SignalR via
   /hubs/notifications + WS receives SalePosted; ru-RU and en-US both return 200.
2. tests/integration/01-permissions-bulk-audit: manager without
   ProductsDelete/Edit → DELETE and bulk-archive both return 403 (atomically);
   orgB does not see orgA's userId in the audit log; orgB does not see orgA's products.
3. tests/integration/04-2fa-sso-permissions: providers endpoint OK;
   Google challenge without config → 503 with a hint; 2FA enroll+verify+
   disable work with otplib TOTP; the manager's permissions
   are checked after 2FA enable.
4. tests/integration/02-ofd-mock-reports: PUT /api/organization/fiscal
   {provider:1} → Mock; 50 sales have fiscalNumber.startsWith("MOCK-");
   sales report ≥50 transactions; ABC classifies as A with share>0.5.
5. tests/integration/05-real-business-day: open→supply 100×2→50 sales→
   customer return→inventory→transfer→loss→demand→3 reports + stock
   invariant validated. Run time 24.7s.
6. tests/load/soak-4h.js + monitor-soak.sh: k6 constant-arrival-rate
   50 RPS. Soak-lite 16m34s @ 20 RPS: 19863 iterations, 0 failures,
   p95 me=16.9ms / products=29.5ms / stats=stable, mem 320-344 MiB
   with no linear growth, PG conn 18, disk unchanged. No leaks.
7. tests/integration/06-edge-cases: 100 concurrent SignalR connections
   = 100/100 successful WS handshakes; 90 parallel requests = 100%
   200, <8s, 0 5xx. Hangfire workers=2 does not block the API.
8. Crash recovery test: host-side SIGKILL of the dotnet process → unless-stopped
   policy → recovery 11.7s ≤ 30s SLA. Finding: docker kill (via the CLI)
   counts as an explicit stop under Docker's restart policy and does not
   trigger auto-restart; a real host-side crash recovers correctly.
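
Item 8's recovery figure can be reproduced with a small polling loop. The sketch below is not the test used in this sprint: `wait_for_recovery` is a hypothetical helper that runs a caller-supplied check command until it succeeds and prints the elapsed whole seconds, or returns non-zero once the timeout has passed. It assumes the one-second resolution of `date +%s` is acceptable; a sub-second figure such as 11.7s would need a finer clock.

```shell
# Hypothetical helper (not part of the repository).
# Usage: wait_for_recovery <timeout_s> <poll_interval_s> <check command...>
wait_for_recovery() {
  local timeout="$1" interval="$2"
  shift 2
  local start now
  start=$(date +%s)
  while true; do
    # The check command decides what "recovered" means (exit code 0).
    if "$@" >/dev/null 2>&1; then
      now=$(date +%s)
      echo $((now - start))
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1   # still down when the deadline passed
    fi
    sleep "$interval"
  done
}
```

For example, right after killing the dotnet process on the host: `wait_for_recovery 30 1 curl -fsS --max-time 2 https://test.admin.food-market.kz/metrics`. Using /metrics as the probe is an assumption; any endpoint that returns 2xx only when the API is up would serve.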

Cert run: all 7 integration specs green in 1.2 min.
0 production bugs found.
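
The "no linear growth" observation in item 6 can be turned into a number by fitting a least-squares line to the api_mem_mb column of the CSV that monitor-soak.sh writes. This is a minimal sketch, assuming rows are evenly spaced at the sampling interval and that a value of 0 marks a failed sample; `mem_slope_mib_per_hour` is a hypothetical name, not something in the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: mem_slope_mib_per_hour <csv> [interval_seconds]
mem_slope_mib_per_hour() {
  local csv="$1" interval="${2:-300}"
  awk -F, -v interval="$interval" '
    NR == 1 { next }                 # skip the CSV header
    $2 + 0 > 0 {                     # skip failed samples (recorded as 0)
      x = NR - 2                     # sample index: 0, 1, 2, ...
      y = $2 + 0                     # api_mem_mb
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      d = n * sxx - sx * sx
      if (n < 2 || d == 0) { print "0.0"; exit }
      slope_per_sample = (n * sxy - sx * sy) / d
      # Convert MiB per sample into MiB per hour.
      printf "%.1f\n", slope_per_sample * 3600 / interval
    }
  ' "$csv"
}
```

Called as `mem_slope_mib_per_hour /tmp/soak-metrics.csv`, it prints the fitted slope in MiB per hour. A value near zero over a multi-hour run is consistent with the no-leak reading; a steady positive slope would warrant a closer look.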

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 03:09:17 +05:00


#!/usr/bin/env bash
# Sprint 27: snapshot of the stage environment every 5 minutes during the soak run.
#
# Usage:
#   tests/load/monitor-soak.sh /tmp/soak-metrics.csv > /dev/null 2>&1 &
#   (then start k6 run soak-4h.js; both run in parallel)
#
# Writes a CSV with the columns:
#   ts,api_mem_mb,api_cpu_pct,pg_connections,disk_free_gb,me_p95_ms,products_p95_ms
#
# Sources:
#   - docker stats food-market-stage-api-1 (on 192.168.1.190)
#   - psql pg_stat_activity (via docker exec into postgres)
#   - df -BG (on 192.168.1.190)
#   - /metrics → p95 of http_request_duration_seconds for GetMe and Products/List
set -uo pipefail
OUT="${1:-/tmp/soak-metrics.csv}"
INTERVAL="${INTERVAL:-300}"   # default: 5 minutes
DURATION="${DURATION:-14400}" # default: 4 hours
STAGE_HOST="${STAGE_HOST:-192.168.1.190}"
# Header
if [ ! -f "$OUT" ]; then
  echo "ts,api_mem_mb,api_cpu_pct,pg_connections,disk_free_gb,me_p95_ms,products_p95_ms" > "$OUT"
fi
end=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$end" ]; do
  TS=$(date -Iseconds)
  # API container stats (memory in MiB, CPU %)
  STATS=$(ssh -o ConnectTimeout=5 "nns@$STAGE_HOST" \
    "docker stats --no-stream --format '{{.MemUsage}} {{.CPUPerc}}' food-market-stage-api-1" 2>/dev/null || echo "0MiB / 0MiB 0%")
  MEM=$(echo "$STATS" | awk '{print $1}' | sed 's/MiB//;s/GiB//')
  # If the value is reported in GiB, convert it to MiB (×1024)
  if echo "$STATS" | awk '{print $1}' | grep -q GiB; then
    MEM=$(python3 -c "print(int(float('$MEM')*1024))")
  fi
  CPU=$(echo "$STATS" | awk '{print $4}' | sed 's/%//')
  # PG connections
  PG_CONN=$(ssh -o ConnectTimeout=5 "nns@$STAGE_HOST" \
    "docker exec food-market-stage-postgres-1 psql -U food_market -d food_market -tA -c 'SELECT count(*) FROM pg_stat_activity'" 2>/dev/null || echo "0")
  # Free disk space on / (GiB)
  DISK=$(ssh -o ConnectTimeout=5 "nns@$STAGE_HOST" "df -BG --output=avail / | tail -1 | tr -d 'G '" 2>/dev/null || echo "0")
  # P95 latency (rough: upper bound of the matching /metrics histogram bucket)
  METRICS=$(curl -fsS --max-time 5 https://test.admin.food-market.kz/metrics 2>/dev/null || echo "")
  # Parsed with Python (histogram_quantile is awkward in shell)
  P95_ME=$(echo "$METRICS" | python3 -c "
import sys
from collections import defaultdict
# le -> cumulative count, summed across label sets (e.g. status codes)
counts = defaultdict(float)
for line in sys.stdin:
    if 'http_request_duration_seconds_bucket' in line and 'action=\"GetMe\"' in line:
        try:
            le = float(line.split('le=\"')[1].split('\"')[0])
            counts[le] += float(line.rsplit(' ', 1)[1])
        except (IndexError, ValueError):
            pass
buckets = sorted(counts.items())
if buckets:
    p95_target = 0.95 * buckets[-1][1]
    for le, v in buckets:
        if v >= p95_target:
            # Report 0 when only the +Inf bucket reaches the target
            print(int(le * 1000) if le != float('inf') else 0)
            break
    else:
        print(0)
else:
    print(0)
" 2>/dev/null || echo 0)
  P95_PRODUCTS=$(echo "$METRICS" | python3 -c "
import sys
from collections import defaultdict
# le -> cumulative count, summed across label sets (e.g. status codes)
counts = defaultdict(float)
for line in sys.stdin:
    if 'http_request_duration_seconds_bucket' in line and 'action=\"List\"' in line and 'controller=\"Products\"' in line:
        try:
            le = float(line.split('le=\"')[1].split('\"')[0])
            counts[le] += float(line.rsplit(' ', 1)[1])
        except (IndexError, ValueError):
            pass
buckets = sorted(counts.items())
if buckets:
    p95_target = 0.95 * buckets[-1][1]
    for le, v in buckets:
        if v >= p95_target:
            # Report 0 when only the +Inf bucket reaches the target
            print(int(le * 1000) if le != float('inf') else 0)
            break
    else:
        print(0)
else:
    print(0)
" 2>/dev/null || echo 0)
  echo "$TS,$MEM,$CPU,$PG_CONN,$DISK,$P95_ME,$P95_PRODUCTS" >> "$OUT"
  echo "[$TS] mem=${MEM}MiB cpu=${CPU}% pg_conn=$PG_CONN disk=${DISK}G p95_me=${P95_ME}ms p95_prod=${P95_PRODUCTS}ms"
  sleep "$INTERVAL"
done
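
The loop above reports the upper bound of the first bucket that reaches 95% of the total count, so its p95 values are quantized to bucket edges. For comparison, the sketch below interpolates linearly inside that bucket, which is the approach Prometheus's `histogram_quantile` takes. `bucket_quantile_ms` is a hypothetical helper, not part of this script; it assumes its input has already been reduced to `le cumulative_count` pairs, one per line, for a single endpoint.

```shell
# Hypothetical helper (not part of monitor-soak.sh).
# Usage: <"le cumulative_count" lines> | bucket_quantile_ms <q>
bucket_quantile_ms() {
  local q="$1"
  # sort -n orders the finite bounds; the +Inf line is handled separately in awk.
  LC_ALL=C sort -n | awk -v q="$q" '
    $1 == "+Inf" { if ($2 + 0 > total) total = $2 + 0; next }
    { n++; le[n] = $1 + 0; cnt[n] = $2 + 0; if (cnt[n] > total) total = cnt[n] }
    END {
      if (total == 0) { printf "0.0\n"; exit }
      target = q * total
      prev_le = 0; prev_cnt = 0      # lowest bucket is assumed to start at 0
      for (i = 1; i <= n; i++) {
        if (cnt[i] >= target) {
          in_bucket = cnt[i] - prev_cnt
          frac = (in_bucket > 0) ? (target - prev_cnt) / in_bucket : 0
          printf "%.1f\n", (prev_le + (le[i] - prev_le) * frac) * 1000
          exit
        }
        prev_le = le[i]; prev_cnt = cnt[i]
      }
      # Target falls in the +Inf bucket: report the highest finite bound.
      printf "%.1f\n", prev_le * 1000
    }
  '
}
```

With buckets `0.01 50`, `0.025 90`, `0.05 100`, `+Inf 100` and `q=0.95`, the bucket-edge method reports 50 ms, while the interpolated value is 37.5 ms.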