After 24 sprints the regression suite has grown, and its instability undermines trust in it. This sprint catches flaky tests, adds observability (Grafana + Prometheus alerts + RUNBOOK), and certifies the suite with a 10× cert run.

1. tests/regression/find-flaky.sh: 10× run + JSON aggregator → docs/flaky-tests.md (per-test pass/fail sequence + reproduce steps).
2. OrgFactory.signupWithRetry now honors the Retry-After header (api-client.ts:ApiError.retryAfterSec). Stage rate limits raised: RATE_SIGNUP_HOUR=5000, RATE_PER_IP_MIN=5000 (~/food-market-stage/deploy/.env).
3. fullyParallel=true with workers=4 means tests run in a nondeterministic order; isolation holds (OrgFactory per test).
4. workers=4 gives a **2.4× speedup** (66.6s → 27.7s). The worker-scoped fixture lib/worker-org.ts is added as opt-in.
5. deploy/grafana/dashboards/quality-watchdog.json (10 panels: smoke success ratio 7d, incidents, multi-tenant violations, current emoji, p95 by endpoint, step failures, RPS, DB p95, docs posted, disk free) + dashboards/README.md. quality-watchdog.sh writes a Prometheus textfile export to ~/.fm-watchdog/textfile/quality_watchdog.prom for node_exporter.
6. deploy/prometheus/alerts.yml: 10 rules in 4 groups (uptime, errors, database, quality-watchdog). MultiTenantViolation = P0. deploy/prometheus/prometheus.yml: reference config.
7. docs/RUNBOOK.md +178 lines: one action per alert (api-down, rps-drop, http-errors-spike/growing, doc-posting-errors, db-p95-high, disk-free-low, watchdog-red, multi-tenant-violation, watchdog-incident). Junior-friendly, with concrete commands.

**Cert run (10× workers=4):** 420/420 passed, 0 flaky, avg 30.1s/run, total 300.6s against a 5 min (300s) budget.

Changes outside the repo:
- ~/food-market-stage/deploy/.env: RATE_* limits bumped.
- ~/quality-watchdog.sh: added the .prom textfile export.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
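The Retry-After behaviour from item 2 can be sketched roughly as follows. This is a Python illustration only: the real code lives in TypeScript (api-client.ts), and the `ApiError` class, the `call_with_retry` helper, the 429 status check, and the exponential fallback delay below are all assumptions made for the example, not the project's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical Python stand-in for the ApiError in api-client.ts."""

    def __init__(self, status: int, retry_after_sec: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        # Parsed value of the Retry-After response header, if the server sent one.
        self.retry_after_sec = retry_after_sec


def call_with_retry(
    op: Callable[[], T],
    max_attempts: int = 5,
    fallback_delay_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `op` on HTTP 429, waiting as long as the server asked for."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ApiError as err:
            # Assumption: only rate-limit responses (429) are worth retrying.
            if err.status != 429 or attempt == max_attempts:
                raise
            if err.retry_after_sec is not None:
                delay = err.retry_after_sec  # honor the server's hint
            else:
                delay = fallback_delay_sec * 2 ** (attempt - 1)  # assumed fallback
            sleep(delay)
    raise AssertionError("unreachable: the last attempt returns or raises")


# Demo with a fake signup that is rate-limited twice, then succeeds.
attempts = {"count": 0}


def fake_signup() -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ApiError(429, retry_after_sec=2.0)
    if attempts["count"] == 2:
        raise ApiError(429)  # no Retry-After header this time
    return "org-created"


slept: list = []
result = call_with_retry(fake_signup, sleep=slept.append)  # record, do not sleep
print(result, slept)
```

Injecting `sleep` keeps the demo instant and makes the chosen delays observable, which is also a convenient way to unit-test this kind of helper.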
170 lines
6.7 KiB
Bash
Executable file
#!/usr/bin/env bash
# Sprint 26: flaky-test detection.
#
# Runs the regression flows N times in a row. A test whose status changed
# at least once (passed → failed or vice versa) is FLAKY.
#
# Usage:
#   ./find-flaky.sh                                  # 10 runs, default
#   RUNS=5 ./find-flaky.sh                           # custom
#   WORKERS=2 ./find-flaky.sh                        # override parallelism
#   SUITE=flows/03-catalog.spec.ts ./find-flaky.sh   # subset
#
# Artifacts:
#   reports/flaky-runs/run-N.json   per-run Playwright JSON output
#   docs/flaky-tests.md             markdown report (flaky and always-red tests)
# The per-test passed/failed summary is printed to stdout.
#
# Sprint 26, 2026-06-08.

set -uo pipefail

RUNS="${RUNS:-10}"
WORKERS="${WORKERS:-4}"
SUITE="${SUITE:-flows/}"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
RUN_DIR="$SCRIPT_DIR/reports/flaky-runs"

mkdir -p "$RUN_DIR"
rm -f "$RUN_DIR"/run-*.json

cd "$SCRIPT_DIR" || exit 1
echo "=== flaky-detect: $RUNS runs suite=$SUITE workers=$WORKERS ==="
echo "stage URL: ${E2E_ADMIN_URL:-https://test.admin.food-market.kz}"

for i in $(seq 1 "$RUNS"); do
  echo
  echo "--- run $i/$RUNS ($(date '+%H:%M:%S'))"
  # Use the reporter from the Playwright config (json → reports/results.json).
  # A CLI override resets the reporters array, so we do NOT pass --reporter.
  # Retries are disabled for the run so that an automatic retry cannot mask
  # a flaky result.
  # Remove the previous run's output first, so a run that produced no JSON
  # is not recorded with stale results.
  rm -f reports/results.json
  WORKERS="$WORKERS" PLAYWRIGHT_JSON_OUTPUT_NAME="reports/results.json" \
    pnpm exec playwright test "$SUITE" --retries=0 2>&1 \
    | tee "/tmp/flaky-run-$i.log" \
    | grep -E "(passed|failed|flaky)" | tail -3 || true
  cp -f reports/results.json "$RUN_DIR/run-$i.json" 2>/dev/null || echo "WARN: results.json missing"
done

echo
echo "=== analysing results ==="
python3 - "$RUN_DIR" "$REPO_DIR/docs/flaky-tests.md" <<'PYEOF'
import glob
import json
import os
import re
import sys
from collections import defaultdict

run_dir = sys.argv[1]
docs_path = sys.argv[2]


def run_index(path):
    """Numeric run number, so run-10.json sorts after run-9.json."""
    m = re.search(r"run-(\d+)\.json$", path)
    return int(m.group(1)) if m else 0


# {test_full_title: [status_run_1, status_run_2, ...]}
results = defaultdict(list)
runs_present = sorted(glob.glob(os.path.join(run_dir, "run-*.json")), key=run_index)


def walk_suite(suite, path):
    """Recursively collect every result status under `suite` into `results`."""
    for spec in suite.get("specs", []):
        title = path + " › " + spec.get("title", "")
        # Each spec.tests[*].results[*].status is one outcome.
        for t in spec.get("tests", []):
            for r in t.get("results", []):
                results[title].append(r.get("status", "?"))
    for sub in suite.get("suites", []):
        sub_path = path + " › " + sub.get("title", "") if path else sub.get("title", "")
        walk_suite(sub, sub_path)


for path in runs_present:
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except Exception as e:
        print(f"WARN: skipping {path}: {e}")
        continue
    for suite in data.get("suites", []):
        walk_suite(suite, suite.get("title", ""))

# Group the results per test: pass count, fail count, skip count.
summary = []
flaky = []
for title, statuses in results.items():
    passed = sum(1 for s in statuses if s == "passed")
    failed = sum(1 for s in statuses if s in ("failed", "timedOut", "interrupted"))
    skipped = sum(1 for s in statuses if s == "skipped")
    summary.append((title, passed, failed, skipped, len(statuses)))
    # Flaky = the test both passed and failed across the runs.
    if passed > 0 and failed > 0:
        flaky.append((title, passed, failed, statuses))

summary.sort()
flaky.sort()

print(f"\nUnique tests: {len(summary)}  flaky: {len(flaky)}")
for t, p, f, s, n in summary:
    icon = "🔴" if f and not p else ("🟡" if p and f else "🟢")
    print(f"  {icon} pass={p} fail={f} skip={s} of {n} - {t}")

# Markdown report. Two trailing spaces force a markdown line break.
lines = []
lines.append("# Flaky tests report")
lines.append("")
lines.append(f"_Generated by `tests/regression/find-flaky.sh` from {len(runs_present)} suite runs._")
lines.append("")
lines.append(f"**Unique tests:** {len(summary)}  ")
lines.append(f"**Flaky:** {len(flaky)} ({100*len(flaky)//max(1,len(summary))}%)  ")
lines.append(f"**Always green:** {sum(1 for _,p,f,_,_ in summary if f==0 and p>0)}  ")
lines.append(f"**Always red:** {sum(1 for _,p,f,_,_ in summary if p==0 and f>0)}  ")
lines.append("")
if not flaky:
    lines.append("## 🟢 No flaky tests")
    lines.append("")
    lines.append("The suite is stable.")
else:
    lines.append("## 🟡 Flaky tests")
    lines.append("")
    lines.append("| Test | pass/fail | Sequence |")
    lines.append("|---|---|---|")
    for t, p, f, statuses in flaky:
        # Skipped runs get their own marker instead of being shown as failures.
        seq = "".join(
            "🟢" if s == "passed" else "⚪" if s == "skipped" else "🔴"
            for s in statuses
        )
        lines.append(f"| `{t}` | {p}/{f} | {seq} |")
    lines.append("")
    lines.append("## Reproduce instructions")
    lines.append("")
    for t, p, f, statuses in flaky:
        # Extract the spec file fragment from the full title.
        spec_match = ""
        for tok in t.split("›"):
            tok = tok.strip()
            if tok.endswith(".ts"):
                spec_match = tok
                break
        # The last title segment is the test name; --grep treats it as a regex.
        grep_match = t.split("›")[-1].strip()
        lines.append(f"### `{grep_match}`")
        lines.append("")
        lines.append(f"Runs: pass {p} / fail {f} of {p+f}.  ")
        lines.append("Reproduce:")
        lines.append("```bash")
        lines.append("# 5 repeats in isolation:")
        lines.append("cd tests/regression")
        if spec_match:
            lines.append(f"for i in 1 2 3 4 5; do pnpm exec playwright test {spec_match} --grep \"{grep_match[:40]}\" --reporter=line; done")
        else:
            lines.append(f"for i in 1 2 3 4 5; do pnpm exec playwright test --grep \"{grep_match[:40]}\" --reporter=line; done")
        lines.append("```")
        lines.append("")

# Always-red tests are reported too: they are not flaky, but baseline-broken.
always_red = [(t, f, n) for t, p, f, s, n in summary if p == 0 and f > 0]
if always_red:
    lines.append("## 🔴 Always red (not flaky, but broken)")
    lines.append("")
    for t, f, n in always_red:
        lines.append(f"- `{t}` ({f}/{n})")
    lines.append("")

with open(docs_path, "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
print(f"\nReport: {docs_path}")
PYEOF
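To make the classification rule concrete, here is a small worked example that mirrors the aggregator's logic on hand-made data. It is only a sketch: the report dictionaries imitate the `suites → specs → tests → results → status` shape that the script reads, and the file name, test titles, and the helper names `collect_statuses`, `classify`, and `make_report` are invented for the illustration.

```python
from collections import defaultdict

FAILED = {"failed", "timedOut", "interrupted"}


def collect_statuses(suite, path, out):
    """Append every result status found under `suite` to out[full_title]."""
    for spec in suite.get("specs", []):
        title = f"{path} › {spec.get('title', '')}"
        for test in spec.get("tests", []):
            for result in test.get("results", []):
                out[title].append(result.get("status", "?"))
    for sub in suite.get("suites", []):
        sub_title = sub.get("title", "")
        collect_statuses(sub, f"{path} › {sub_title}" if path else sub_title, out)


def classify(statuses):
    """flaky = passed and failed at least once each; always-red = only failures."""
    passed = sum(s == "passed" for s in statuses)
    failed = sum(s in FAILED for s in statuses)
    if passed and failed:
        return "flaky"
    if failed:
        return "always-red"
    return "green"  # includes tests that were only ever skipped


def make_report(login_status, order_status):
    """Build a minimal fake report for one run (made-up suite and tests)."""
    return {"suites": [{
        "title": "flows/01-auth.spec.ts",
        "specs": [
            {"title": "login", "tests": [{"results": [{"status": login_status}]}]},
            {"title": "create order", "tests": [{"results": [{"status": order_status}]}]},
        ],
    }]}


# Three simulated runs: "login" flips once, "create order" never passes.
reports = [
    make_report("passed", "failed"),
    make_report("timedOut", "failed"),
    make_report("passed", "failed"),
]

statuses = defaultdict(list)
for report in reports:
    for suite in report.get("suites", []):
        collect_statuses(suite, suite.get("title", ""), statuses)

verdicts = {title: classify(seq) for title, seq in statuses.items()}
for title, verdict in sorted(verdicts.items()):
    print(f"{verdict:10} {title}  {statuses[title]}")
```

A timeout counts as a failure here, as in the script, so a test that usually passes but occasionally times out is reported as flaky rather than green.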