From 3119440cd900614e6f32cea68c8fdcf524fd6d4b Mon Sep 17 00:00:00 2001
From: DanS
Date: Fri, 5 Jun 2026 06:35:26 -0500
Subject: [PATCH] fix(lite): non-blocking, non-hanging sync (Finding B)

The backend `sync` command is a blocking, uninterruptible full chain scan (do_sync(true); does not honor the shutdown flag), and balance/list block until synced. Previously startSync() ran on the main thread (would freeze wallet creation) and the worker could block, making the destructor join() hang at shutdown.

Redesign:
- bridge is now std::shared_ptr, shared with a detached sync thread so detaching is safe and litelib_shutdown isn't called while a running sync still holds the bridge; the controller's own ref prevents premature shutdown during normal operation.
- startSync() launches the blocking `sync` on a detached thread (non-blocking; never joined).
- refreshModel() gates on syncDone_: while syncing it publishes syncstatus progress only; once synced it does the full balance/addresses/list refresh (now fast).
- destructor joins only the fast poll worker and detaches the sync thread -> no hang.
- syncComplete() accessor added.

Tests (deterministic, via a blocking-sync fake; counters made atomic for the detached thread): testLiteWalletControllerShutdownDoesNotHangDuringSync (destructor returns <1.5s with sync blocked); refresh/worker tests wait for syncComplete()/a balance-bearing model. Stable across repeated runs; lite+backend and full-node apps build clean.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...allet-implementation-plan-v2-2026-06-04.md |  2 +-
 src/wallet/lite_wallet_controller.cpp         | 51 ++++++++++++++-----
 src/wallet/lite_wallet_controller.h           | 22 +++++---
 tests/fake_lite_backend.h                     | 17 +++++--
 tests/test_phase4.cpp                         | 50 +++++++++++++++---
 5 files changed, 112 insertions(+), 30 deletions(-)

diff --git a/docs/lite-wallet-implementation-plan-v2-2026-06-04.md b/docs/lite-wallet-implementation-plan-v2-2026-06-04.md
index 5892f5e..6b740ed 100644
--- a/docs/lite-wallet-implementation-plan-v2-2026-06-04.md
+++ b/docs/lite-wallet-implementation-plan-v2-2026-06-04.md
@@ -98,7 +98,7 @@ Each milestone is independently demoable and gated by a fake-backend test. Order
 > - ✅ **M2b-3 — threaded App hook (done + tested).** `LiteWalletController` owns a background worker (`std::thread`) that, once a wallet is ready, refreshes every ~4s and publishes a copyable `LiteWalletAppRefreshModel` under a mutex; `App::update()` calls `takeRefreshedModel()` and applies it into `state_` on the main thread (WalletState is non-copyable, so the model crosses the thread boundary, not the state). Worker auto-starts on lifecycle-ready and is stopped+joined in the controller destructor. `status_` is written only on the main thread to avoid races; `walletOpen_`/`syncStarted_` are atomic. `testLiteWalletControllerWorkerProducesModel()` opens a wallet and asserts the worker publishes a populated model (stable across repeated runs). Builds clean in all configs.
 > **Real-backend refresh smoke (2026-06-04): ran `lite_smoke --create --refresh` against the live backend — found two real bugs** the fake/fixture couldn't (smoke now links `lite_result_parsers` and runs each command's real output through the parser):
 > 1. **FIXED — `syncstatus` parser mismatch.** `parseLiteSyncStatusResponse` hard-required `synced_blocks`/`total_blocks`, but the real backend (per `commands.rs:83-87`) returns **idle = `{"syncing":"false"}`** (string!) and only **in-progress = `{"syncing":"true","synced_blocks":N,"total_blocks":M}`**. The parser now reads `syncing` as a string and treats the block fields as in-progress-only (idle → complete, synced/total 0). Covered by `testLiteSyncStatusParserRealShapes()` and **verified against the live backend** (`syncstatus parse_ok=1`). (info/balance/addresses parsers also verified OK against real output.)
-> 2. **OPEN — first data query blocks on a full chain sync.** `execute("balance"/"list")` on a fresh wallet triggers a synchronous multi-million-block sync (observed "Syncing 1.76M/2.99M…"). On the M2b-3 worker thread that means the controller's destructor `join()` would hang at app shutdown. Needs: a cancel/timeout path for in-flight refresh (e.g., don't block shutdown on the worker), and likely gating data fetches until sync has progressed. **This is the main blocker for a usable real lite wallet** and should lead M2 polish / M3.
+> 2. **ADDRESSED — blocking, uninterruptible sync.** The backend `sync` command runs `do_sync(true)`, a blocking full scan that does **not** honor the shutdown flag (`lightclient.rs`), and `balance`/`list` block until synced. Redesign: the controller runs `sync` on a **detached** thread (never joined), the bridge is a `std::shared_ptr` shared with that thread (so detaching is safe and the bridge isn't `litelib_shutdown`'d while a sync still holds it), and `startSync()` is now non-blocking (was called on the main thread → would have frozen wallet creation). The joinable **poll worker** only issues fast `syncstatus` calls while syncing (publishing progress) and fetches balance/addresses/list **once `syncDone_` is set**. Shutdown joins only the fast poll worker and detaches the sync thread → no hang. Verified deterministically by `testLiteWalletControllerShutdownDoesNotHangDuringSync()` (blocking-sync fake; destructor returns <1.5s) and the worker/refresh tests (stable across repeated runs).
 >
 > - ⏳ **Remaining for M2 polish:** fix the syncstatus parser (above), address the blocking-sync/worker-shutdown issue (above), per-address balances (notes-correlation; currently aggregate-only), and harden the gateway's abort-on-first-failure (skip-and-continue per command).
 - Implement `LiteSyncService::startSync` (replace the "not implemented" stub) + a background worker polling `syncstatus`, mirroring `NetworkRefreshService`/`RefreshScheduler` (enqueue → worker → apply on main thread).
diff --git a/src/wallet/lite_wallet_controller.cpp b/src/wallet/lite_wallet_controller.cpp
index 154129b..777f149 100644
--- a/src/wallet/lite_wallet_controller.cpp
+++ b/src/wallet/lite_wallet_controller.cpp
@@ -95,12 +95,12 @@ LiteWalletController::LiteWalletController(WalletCapabilities capabilities,
                                            LiteConnectionSettings connectionSettings,
                                            LiteClientBridge bridge,
                                            LiteWalletControllerOptions options)
-    : bridge_(std::move(bridge)),
-      lifecycle_(capabilities, connectionSettings, &bridge_,
+    : bridge_(std::make_shared<LiteClientBridge>(std::move(bridge))),
+      lifecycle_(capabilities, connectionSettings, bridge_.get(),
                  LiteWalletLifecycleOptions{options.allowBridgeCalls}),
-      gateway_(capabilities, connectionSettings, &bridge_,
+      gateway_(capabilities, connectionSettings, bridge_.get(),
                LiteWalletGatewayOptions{options.allowBridgeCalls}),
-      sync_(capabilities, connectionSettings, &bridge_,
+      sync_(capabilities, connectionSettings, bridge_.get(),
             LiteSyncServiceOptions{options.allowBridgeCalls})
 {
     status_ = lifecycle_.status();
@@ -108,7 +108,11 @@ LiteWalletController::LiteWalletController(WalletCapabilities capabilities,
 
 LiteWalletController::~LiteWalletController()
 {
-    stopWorker();
+    stopWorker();  // joins the fast poll worker (short iterations)
+    // The sync thread may be blocked in an uninterruptible full scan; detach it. It holds
+    // shared refs (bridge_ + syncDone_), so it stays safe and the bridge survives until it
+    // finishes — the process is exiting, so a late litelib_shutdown is harmless.
+    if (syncThread_.joinable()) syncThread_.detach();
 }
 
 std::unique_ptr LiteWalletController::createLinked(
@@ -133,30 +137,51 @@ void LiteWalletController::onLifecycleResult(const LiteWalletLifecycleResult& re
     }
 }
 
-LiteSyncStartResult LiteWalletController::startSync()
+void LiteWalletController::startSync()
 {
-    auto result = sync_.startSync(LiteSyncStartRequest{});
-    if (result.syncStarted) syncStarted_ = true;
-    return result;
+    if (syncLaunched_) return;
+    syncLaunched_ = true;
+    syncStarted_ = true;
+    // The backend `sync` command is a blocking, uninterruptible full chain scan, so run it on
+    // a detached thread. Capture shared refs (not the controller) so it is safe to outlive us.
+    auto bridge = bridge_;
+    auto done = syncDone_;
+    syncThread_ = std::thread([bridge, done] {
+        if (bridge) bridge->execute("sync", "");  // blocks until synced (or errors out)
+        done->store(true);
+    });
 }
 
 std::optional<LiteWalletAppRefreshModel> LiteWalletController::refreshModel()
 {
     if (!walletOpen_.load()) return std::nullopt;
 
-    // Poll sync status first so the refresh bundle (and the mapped sync model) carries it.
-    LiteWalletRefreshRequest request;
+    // syncstatus is fast (reads shared state the sync thread updates). Poll it every time.
     const auto syncResult = sync_.pollSyncStatus(LiteSyncStatusRequest{});
+
+    if (!syncDone_->load()) {
+        // Sync still running: publish progress only. Data queries (balance/list) would block
+        // until the chain is synced, so don't issue them yet.
+        if (!syncResult.ok) return std::nullopt;
+        LiteWalletAppRefreshModel model;
+        model.hasSyncStatus = true;
+        model.sync.walletHeight = syncResult.syncStatus.syncedBlocks;
+        model.sync.chainHeight = syncResult.syncStatus.totalBlocks;
+        model.sync.progress = syncResult.syncStatus.progress;
+        model.sync.complete = syncResult.syncStatus.complete;
+        return model;
+    }
+
+    // Synced: full refresh (balance/addresses/transactions are fast now).
+    LiteWalletRefreshRequest request;
     if (syncResult.ok) {
         request.haveSyncStatus = true;
         request.syncStatus = syncResult.syncStatus;
     }
-
     const auto refreshResult = gateway_.refresh(request);
     if (refreshResult.bundle.successfulCommandCount == 0 && !request.haveSyncStatus) {
         return std::nullopt;
     }
-
     const auto mapped = mapLiteWalletRefreshResult(refreshResult);
     if (!mapped.ok) return std::nullopt;
     return mapped.model;
diff --git a/src/wallet/lite_wallet_controller.h b/src/wallet/lite_wallet_controller.h
index beafe44..4dd911b 100644
--- a/src/wallet/lite_wallet_controller.h
+++ b/src/wallet/lite_wallet_controller.h
@@ -85,10 +85,12 @@ public:
     LiteWalletLifecycleResult restoreWallet(LiteWalletRestoreRequest request);
 
     bool syncStarted() const { return syncStarted_; }
+    bool syncComplete() const { return syncDone_ && syncDone_->load(); }
 
-    // Begin background sync on the backend (idempotent enough to call once a wallet is ready;
-    // also invoked automatically when a lifecycle op produces a ready wallet).
-    LiteSyncStartResult startSync();
+    // Launch the backend sync on a detached background thread (NON-blocking; the backend's
+    // `sync` command runs a full, uninterruptible chain scan). Auto-invoked when a lifecycle
+    // op produces a ready wallet; safe to call once.
+    void startSync();
 
     // Poll sync status + fetch balance/addresses/transactions, and apply the result into the
     // app's WalletState. Returns true if state was updated. Safe no-op when no wallet is open.
@@ -110,7 +112,10 @@ private:
     void stopWorker();
     void workerLoop();
 
-    LiteClientBridge bridge_;  // the single owned bridge; services below borrow &bridge_
+    // The bridge is shared (not just owned) so the detached, uninterruptible sync thread can
+    // safely outlive the controller: it holds a ref, so the underlying bridge is destroyed
+    // (and litelib_shutdown called) only once BOTH the controller and a running sync release it.
+    std::shared_ptr<LiteClientBridge> bridge_;
     LiteWalletLifecycleService lifecycle_;
     LiteWalletGateway gateway_;
     LiteSyncService sync_;
@@ -119,14 +124,19 @@ private:
     std::atomic<bool> syncStarted_{false};
     WalletBackendStatus status_;  // written only on the main thread (lifecycle ops)
 
-    // Background refresh worker.
+    // Detached background sync (backend `sync` is a blocking, uninterruptible full scan).
+    std::thread syncThread_;
+    bool syncLaunched_ = false;
+    std::shared_ptr<std::atomic<bool>> syncDone_ = std::make_shared<std::atomic<bool>>(false);
+
+    // Joinable background refresh worker (fast iterations: syncstatus, plus data once synced).
     std::thread worker_;
     std::atomic<bool> running_{false};
     std::mutex wakeMutex_;
     std::condition_variable wakeCv_;
     std::mutex modelMutex_;
     std::optional<LiteWalletAppRefreshModel> pendingModel_;  // guarded by modelMutex_
-    static constexpr int kRefreshIntervalMs = 4000;
+    static constexpr int kRefreshIntervalMs = 2000;
 };
 
 }  // namespace wallet
diff --git a/tests/fake_lite_backend.h b/tests/fake_lite_backend.h
index 6990006..17896de 100644
--- a/tests/fake_lite_backend.h
+++ b/tests/fake_lite_backend.h
@@ -17,18 +17,22 @@
 
 #include "wallet/lite_client_bridge.h"
 
+#include
+#include
 #include
 #include
+#include
 
 namespace dragonx {
 namespace test {
 
-// Owned-string accounting (C++17 inline vars: single definition across TUs).
-inline long g_liteFakeAlloc = 0;  // owned strings handed to the bridge
-inline long g_liteFakeFreed = 0;  // owned strings released via freeString
+// Owned-string accounting (atomic: a detached sync thread may touch these concurrently).
+inline std::atomic<long> g_liteFakeAlloc{0};  // owned strings handed to the bridge
+inline std::atomic<long> g_liteFakeFreed{0};  // owned strings released via freeString
 inline bool g_liteFakeWalletExists = true;
 inline bool g_liteFakeServerOnline = true;
 inline bool g_liteFakeShutdownCalled = false;
+inline std::atomic<bool> g_liteFakeSyncBlock{false};  // when true, the "sync" command blocks
 
 inline void resetLiteFakeCounters()
 {
@@ -64,7 +68,12 @@ inline char* liteFakeExecute(const char* command, const char*)
     // tests/fixtures/lite/result_parsers.json), so the gateway/sync refresh path parses.
     if (command) {
         const char* c = command;
-        if (std::strcmp(c, "sync") == 0) return liteFakeDup("{\"result\":\"success\"}");
+        if (std::strcmp(c, "sync") == 0) {
+            // Simulate the real backend's blocking full sync when requested, so tests can
+            // verify shutdown doesn't hang on an in-flight sync.
+            while (g_liteFakeSyncBlock.load()) std::this_thread::sleep_for(std::chrono::milliseconds(5));
+            return liteFakeDup("{\"result\":\"success\"}");
+        }
         if (std::strcmp(c, "syncstatus") == 0)  // real backend shape: "syncing" is a string
             return liteFakeDup("{\"syncing\":\"true\",\"synced_blocks\":1000,\"total_blocks\":1000}");
         if (std::strcmp(c, "balance") == 0)
diff --git a/tests/test_phase4.cpp b/tests/test_phase4.cpp
index 390717e..b0067e0 100644
--- a/tests/test_phase4.cpp
+++ b/tests/test_phase4.cpp
@@ -4634,6 +4634,12 @@ void testLiteWalletControllerRefreshPopulatesState()
     EXPECT_TRUE(controller.walletOpen());
     EXPECT_TRUE(controller.syncStarted());  // auto-started when the wallet became ready
 
+    // Sync runs on a detached thread; the full refresh (balance/addresses) only runs once it
+    // completes. Wait for it (instant with the fake) so the refresh is deterministic.
+    for (int i = 0; i < 500 && !controller.syncComplete(); ++i)
+        std::this_thread::sleep_for(std::chrono::milliseconds(5));
+    EXPECT_TRUE(controller.syncComplete());
+
     dragonx::WalletState state;
     EXPECT_TRUE(controller.refreshWalletState(state));
     EXPECT_NEAR(state.privateBalance, 2.0, 1e-9);
@@ -4692,14 +4698,20 @@ void testLiteWalletControllerWorkerProducesModel()
     LiteWalletController controller(caps, conn, LiteClientBridge::fromApi(dragonx::test::makeFakeLiteApi()));
     EXPECT_TRUE(controller.createWallet(LiteWalletCreateRequest{}).walletReady);  // auto-starts the worker
 
-    // The worker refreshes immediately on start; poll briefly (<=2s) for the produced model.
+    // The worker publishes progress-only models while syncing, then full models once synced.
+    // Poll until a full (balance-bearing) model arrives (sync is instant with the fake).
     LiteWalletAppRefreshModel model;
-    bool got = false;
-    for (int i = 0; i < 200 && !got; ++i) {
-        got = controller.takeRefreshedModel(model);
-        if (!got) std::this_thread::sleep_for(std::chrono::milliseconds(10));
+    bool gotFull = false;
+    for (int i = 0; i < 500 && !gotFull; ++i) {
+        LiteWalletAppRefreshModel m;
+        if (controller.takeRefreshedModel(m) && m.hasBalance) {
+            model = m;
+            gotFull = true;
+            break;
+        }
+        std::this_thread::sleep_for(std::chrono::milliseconds(10));
     }
-    EXPECT_TRUE(got);
+    EXPECT_TRUE(gotFull);
     EXPECT_TRUE(model.hasBalance);
     EXPECT_TRUE(model.hasAddresses);
 
@@ -4713,6 +4725,31 @@ void testLiteWalletControllerWorkerProducesModel()
     EXPECT_FALSE(idle.takeRefreshedModel(none));
 }
 
+// M2b-3 hardening: the backend `sync` is a blocking, uninterruptible full scan. Destroying the
+// controller while a sync is in flight must NOT hang (the sync thread is detached, not joined).
+void testLiteWalletControllerShutdownDoesNotHangDuringSync()
+{
+    using namespace dragonx::wallet;
+    const auto caps = makeWalletCapabilities(WalletBuildKind::Lite, /*embeddedDaemon*/ false, /*liteBackendLinked*/ true);
+    const auto conn = defaultLiteConnectionSettings();
+
+    dragonx::test::g_liteFakeSyncBlock.store(true);  // make the backend "sync" block indefinitely
+    const auto start = std::chrono::steady_clock::now();
+    {
+        LiteWalletController controller(caps, conn, LiteClientBridge::fromApi(dragonx::test::makeFakeLiteApi()));
+        controller.createWallet(LiteWalletCreateRequest{});  // launches the (now-blocked) sync thread
+        EXPECT_TRUE(controller.syncStarted());
+        EXPECT_FALSE(controller.syncComplete());
+        // controller destructs here with the sync thread still blocked -> must return promptly.
+    }
+    const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
+                               std::chrono::steady_clock::now() - start).count();
+    EXPECT_TRUE(elapsedMs < 1500);  // did not wait for the (blocked) sync to finish
+
+    dragonx::test::g_liteFakeSyncBlock.store(false);  // release the detached sync thread
+    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // let it unwind cleanly
+}
+
 }  // namespace
 
 int main()
@@ -4752,6 +4789,7 @@ int main()
     testLiteSyncStatusParserRealShapes();
     testLiteWalletControllerRefreshPopulatesState();
     testLiteWalletControllerWorkerProducesModel();
+    testLiteWalletControllerShutdownDoesNotHangDuringSync();
     testLiteBridgeRuntimeShutdownIsIdempotent();
     testLiteBridgeRuntimeDestructorCallsShutdownOnce();
     testLiteBridgeRuntimeShutdownWaitsForOwnedStringRelease();
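
Reviewer note: the shutdown design in this patch (run the blocking call on a thread that captures only shared state, then detach rather than join in the destructor) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `BlockingBackend`, `Controller`, `doneFlag()`, and `destroyDuringBlockedSyncMs()` are hypothetical names standing in for the bridge, `LiteWalletController`, and the new shutdown test.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the bridge: execute() blocks until `release` is set,
// imitating an uninterruptible backend sync.
struct BlockingBackend {
    std::atomic<bool> release{false};
    void execute() {
        while (!release.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
};

// Hypothetical controller: the sync thread captures shared_ptr copies (never `this`),
// so it can safely outlive the controller once detached.
class Controller {
public:
    explicit Controller(std::shared_ptr<BlockingBackend> backend)
        : backend_(std::move(backend)) {}

    ~Controller() {
        // Detach instead of join: the thread may be stuck in the blocking call.
        if (syncThread_.joinable()) syncThread_.detach();
    }

    void startSync() {
        if (launched_) return;    // only the first call launches a thread
        launched_ = true;
        auto backend = backend_;  // copies keep both objects alive for the thread
        auto done = done_;
        syncThread_ = std::thread([backend, done] {
            backend->execute();   // blocks until the backend is released
            done->store(true);
        });
    }

    bool syncComplete() const { return done_->load(); }
    std::shared_ptr<std::atomic<bool>> doneFlag() const { return done_; }

private:
    std::shared_ptr<BlockingBackend> backend_;
    std::shared_ptr<std::atomic<bool>> done_ = std::make_shared<std::atomic<bool>>(false);
    std::thread syncThread_;
    bool launched_ = false;
};

// Destroys a controller while its sync is still blocked and returns how long that took
// (in ms); afterwards releases the backend and waits for the detached thread to finish.
long long destroyDuringBlockedSyncMs() {
    auto backend = std::make_shared<BlockingBackend>();
    std::shared_ptr<std::atomic<bool>> done;
    const auto start = std::chrono::steady_clock::now();
    {
        Controller controller(backend);
        controller.startSync();
        done = controller.doneFlag();
        assert(!controller.syncComplete());
    }  // destructor runs here with the sync thread still blocked
    const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                               std::chrono::steady_clock::now() - start).count();
    backend->release.store(true);
    while (!done->load()) std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return elapsedMs;
}
```

Calling `destroyDuringBlockedSyncMs()` from a test and checking the result against a bound such as 1500 ms mirrors the assertion in `testLiteWalletControllerShutdownDoesNotHangDuringSync()`. The trade-off of detaching is that the thread's work cannot be cancelled; the shared ownership only guarantees that whatever it touches stays alive until it finishes.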