fix(lite): non-blocking, non-hanging sync (Finding B)

The backend `sync` command is a blocking, uninterruptible full chain scan (do_sync(true);
does not honor the shutdown flag), and balance/list block until synced. Previously
startSync() ran on the main thread (would freeze wallet creation) and the worker could
block, making the destructor join() hang at shutdown.

Redesign:
- bridge is now std::shared_ptr<LiteClientBridge>, shared with a detached sync thread so
  detaching is safe and litelib_shutdown isn't called while a running sync still holds the
  bridge; the controller's own ref prevents premature shutdown during normal operation.
- startSync() launches the blocking `sync` on a detached thread (non-blocking; never joined).
- refreshModel() gates on syncDone_: while syncing it publishes syncstatus progress only;
  once synced it does the full balance/addresses/list refresh (now fast).
- destructor joins only the fast poll worker and detaches the sync thread -> no hang.
- syncComplete() accessor added.

Tests (deterministic, via a blocking-sync fake; counters made atomic for the detached
thread): testLiteWalletControllerShutdownDoesNotHangDuringSync (destructor returns <1.5s
with sync blocked); refresh/worker tests wait for syncComplete()/a balance-bearing model.
Stable across repeated runs; lite+backend and full-node apps build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-05 06:35:26 -05:00
parent 59c55e33f8
commit 3119440cd9
5 changed files with 112 additions and 30 deletions

View File

@@ -98,7 +98,7 @@ Each milestone is independently demoable and gated by a fake-backend test. Order
> - ✅ **M2b-3 — threaded App hook (done + tested).** `LiteWalletController` owns a background worker (`std::thread`) that, once a wallet is ready, refreshes every ~4s and publishes a copyable `LiteWalletAppRefreshModel` under a mutex; `App::update()` calls `takeRefreshedModel()` and applies it into `state_` on the main thread (WalletState is non-copyable, so the model crosses the thread boundary, not the state). Worker auto-starts on lifecycle-ready and is stopped+joined in the controller destructor. `status_` is written only on the main thread to avoid races; `walletOpen_`/`syncStarted_` are atomic. `testLiteWalletControllerWorkerProducesModel()` opens a wallet and asserts the worker publishes a populated model (stable across repeated runs). Builds clean in all configs.
> **Real-backend refresh smoke (2026-06-04): ran `lite_smoke --create --refresh` against the live backend — found two real bugs** the fake/fixture couldn't (smoke now links `lite_result_parsers` and runs each command's real output through the parser):
> 1. **FIXED — `syncstatus` parser mismatch.** `parseLiteSyncStatusResponse` hard-required `synced_blocks`/`total_blocks`, but the real backend (per `commands.rs:83-87`) returns **idle = `{"syncing":"false"}`** (string!) and only **in-progress = `{"syncing":"true","synced_blocks":N,"total_blocks":M}`**. The parser now reads `syncing` as a string and treats the block fields as in-progress-only (idle → complete, synced/total 0). Covered by `testLiteSyncStatusParserRealShapes()` and **verified against the live backend** (`syncstatus parse_ok=1`). (info/balance/addresses parsers also verified OK against real output.)
> 2. **OPEN — first data query blocks on a full chain sync.** `execute("balance"/"list")` on a fresh wallet triggers a synchronous multi-million-block sync (observed "Syncing 1.76M/2.99M…"). On the M2b-3 worker thread that means the controller's destructor `join()` would hang at app shutdown. Needs: a cancel/timeout path for in-flight refresh (e.g., don't block shutdown on the worker), and likely gating data fetches until sync has progressed. **This is the main blocker for a usable real lite wallet** and should lead M2 polish / M3.
> 2. **ADDRESSED — blocking, uninterruptible sync.** The backend `sync` command runs `do_sync(true)`, a blocking full scan that does **not** honor the shutdown flag (`lightclient.rs`), and `balance`/`list` block until synced. Redesign: the controller runs `sync` on a **detached** thread (never joined), the bridge is a `std::shared_ptr` shared with that thread (so detaching is safe and the bridge isn't `litelib_shutdown`'d while a sync still holds it), and `startSync()` is now non-blocking (was called on the main thread → would have frozen wallet creation). The joinable **poll worker** only issues fast `syncstatus` calls while syncing (publishing progress) and fetches balance/addresses/list **once `syncDone_` is set**. Shutdown joins only the fast poll worker and detaches the sync thread → no hang. Verified deterministically by `testLiteWalletControllerShutdownDoesNotHangDuringSync()` (blocking-sync fake; destructor returns <1.5s) and the worker/refresh tests (stable across repeated runs).
>
> - ⏳ **Remaining for M2 polish:** fix the syncstatus parser (above), address the blocking-sync/worker-shutdown issue (above), per-address balances (notes-correlation; currently aggregate-only), and harden the gateway's abort-on-first-failure (skip-and-continue per command).
- Implement `LiteSyncService::startSync` (replace the "not implemented" stub) + a background worker polling `syncstatus`, mirroring `NetworkRefreshService`/`RefreshScheduler` (enqueue → worker → apply on main thread).