fix(lite): non-blocking, non-hanging sync (Finding B)
The backend `sync` command is a blocking, uninterruptible full chain scan (do_sync(true); it does not honor the shutdown flag), and balance/list block until synced. Previously startSync() ran on the main thread (which would freeze wallet creation) and the worker could block, making the destructor's join() hang at shutdown.

Redesign:
- bridge is now std::shared_ptr<LiteClientBridge>, shared with a detached sync thread so detaching is safe and litelib_shutdown isn't called while a running sync still holds the bridge; the controller's own ref prevents premature shutdown during normal operation.
- startSync() launches the blocking `sync` on a detached thread (non-blocking; never joined).
- refreshModel() gates on syncDone_: while syncing it publishes syncstatus progress only; once synced it does the full balance/addresses/list refresh (now fast).
- destructor joins only the fast poll worker and detaches the sync thread -> no hang.
- syncComplete() accessor added.

Tests (deterministic, via a blocking-sync fake; counters made atomic for the detached thread): testLiteWalletControllerShutdownDoesNotHangDuringSync (destructor returns <1.5s with sync blocked); refresh/worker tests wait for syncComplete()/a balance-bearing model.

Stable across repeated runs; lite+backend and full-node apps build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
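The detach-plus-shared-state pattern described in the commit message can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the controller's actual code: `FakeBridge`, `SyncOwner`, `blockingSync()` and `doneFlag()` are hypothetical names invented for the sketch, and the spin loop only imitates an uninterruptible scan.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the bridge: blockingSync() does not return until
// `release` is set, imitating a scan that cannot be cancelled.
struct FakeBridge {
    std::atomic<bool> release{false};
    void blockingSync() {
        while (!release.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
};

// Hypothetical owner: launches the blocking call on a thread that is detached,
// never joined, so destruction cannot wait on the scan.
class SyncOwner {
public:
    explicit SyncOwner(std::shared_ptr<FakeBridge> bridge)
        : bridge_(std::move(bridge)) { assert(bridge_ != nullptr); }

    ~SyncOwner() {
        // The thread may still be inside blockingSync(); let it go.
        if (syncThread_.joinable()) syncThread_.detach();
    }

    void startSync() {
        if (launched_) return;  // launch at most once
        launched_ = true;
        // Capture shared refs by value, never `this`: the thread may outlive us.
        auto bridge = bridge_;
        auto done = done_;
        syncThread_ = std::thread([bridge, done] {
            bridge->blockingSync();
            done->store(true);
        });
    }

    bool syncComplete() const { return done_->load(); }
    std::shared_ptr<std::atomic<bool>> doneFlag() const { return done_; }

private:
    std::shared_ptr<FakeBridge> bridge_;
    std::shared_ptr<std::atomic<bool>> done_ =
        std::make_shared<std::atomic<bool>>(false);
    std::thread syncThread_;
    bool launched_ = false;
};
```

Because the lambda holds its own references to the bridge and the completion flag, both stay alive until the thread returns, even if the owner has already been destroyed. The trade-off is that the owner gives up any way to wait for or cancel the scan.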
@@ -98,7 +98,7 @@ Each milestone is independently demoable and gated by a fake-backend test. Order
> - ✅ **M2b-3 — threaded App hook (done + tested).** `LiteWalletController` owns a background worker (`std::thread`) that, once a wallet is ready, refreshes every ~4s and publishes a copyable `LiteWalletAppRefreshModel` under a mutex; `App::update()` calls `takeRefreshedModel()` and applies it into `state_` on the main thread (WalletState is non-copyable, so the model crosses the thread boundary, not the state). Worker auto-starts on lifecycle-ready and is stopped+joined in the controller destructor. `status_` is written only on the main thread to avoid races; `walletOpen_`/`syncStarted_` are atomic. `testLiteWalletControllerWorkerProducesModel()` opens a wallet and asserts the worker publishes a populated model (stable across repeated runs). Builds clean in all configs.
> **Real-backend refresh smoke (2026-06-04): ran `lite_smoke --create --refresh` against the live backend — found two real bugs** the fake/fixture couldn't (smoke now links `lite_result_parsers` and runs each command's real output through the parser):
> 1. **FIXED — `syncstatus` parser mismatch.** `parseLiteSyncStatusResponse` hard-required `synced_blocks`/`total_blocks`, but the real backend (per `commands.rs:83-87`) returns **idle = `{"syncing":"false"}`** (string!) and only **in-progress = `{"syncing":"true","synced_blocks":N,"total_blocks":M}`**. The parser now reads `syncing` as a string and treats the block fields as in-progress-only (idle → complete, synced/total 0). Covered by `testLiteSyncStatusParserRealShapes()` and **verified against the live backend** (`syncstatus parse_ok=1`). (info/balance/addresses parsers also verified OK against real output.)
> 2. **OPEN — first data query blocks on a full chain sync.** `execute("balance"/"list")` on a fresh wallet triggers a synchronous multi-million-block sync (observed "Syncing 1.76M/2.99M…"). On the M2b-3 worker thread that means the controller's destructor `join()` would hang at app shutdown. Needs: a cancel/timeout path for in-flight refresh (e.g., don't block shutdown on the worker), and likely gating data fetches until sync has progressed. **This is the main blocker for a usable real lite wallet** and should lead M2 polish / M3.
> 2. **ADDRESSED — blocking, uninterruptible sync.** The backend `sync` command runs `do_sync(true)`, a blocking full scan that does **not** honor the shutdown flag (`lightclient.rs`), and `balance`/`list` block until synced. Redesign: the controller runs `sync` on a **detached** thread (never joined), the bridge is a `std::shared_ptr` shared with that thread (so detaching is safe and the bridge isn't `litelib_shutdown`'d while a sync still holds it), and `startSync()` is now non-blocking (was called on the main thread → would have frozen wallet creation). The joinable **poll worker** only issues fast `syncstatus` calls while syncing (publishing progress) and fetches balance/addresses/list **once `syncDone_` is set**. Shutdown joins only the fast poll worker and detaches the sync thread → no hang. Verified deterministically by `testLiteWalletControllerShutdownDoesNotHangDuringSync()` (blocking-sync fake; destructor returns <1.5s) and the worker/refresh tests (stable across repeated runs).
>
> - ⏳ **Remaining for M2 polish:** fix the syncstatus parser (above), address the blocking-sync/worker-shutdown issue (above), per-address balances (notes-correlation; currently aggregate-only), and harden the gateway's abort-on-first-failure (skip-and-continue per command).
- Implement `LiteSyncService::startSync` (replace the "not implemented" stub) + a background worker polling `syncstatus`, mirroring `NetworkRefreshService`/`RefreshScheduler` (enqueue → worker → apply on main thread).
@@ -95,12 +95,12 @@ LiteWalletController::LiteWalletController(WalletCapabilities capabilities,
LiteConnectionSettings connectionSettings,
LiteClientBridge bridge,
LiteWalletControllerOptions options)
: bridge_(std::move(bridge)),
lifecycle_(capabilities, connectionSettings, &bridge_,
: bridge_(std::make_shared<LiteClientBridge>(std::move(bridge))),
lifecycle_(capabilities, connectionSettings, bridge_.get(),
LiteWalletLifecycleOptions{options.allowBridgeCalls}),
gateway_(capabilities, connectionSettings, &bridge_,
gateway_(capabilities, connectionSettings, bridge_.get(),
LiteWalletGatewayOptions{options.allowBridgeCalls}),
sync_(capabilities, connectionSettings, &bridge_,
sync_(capabilities, connectionSettings, bridge_.get(),
LiteSyncServiceOptions{options.allowBridgeCalls})
{
status_ = lifecycle_.status();
@@ -108,7 +108,11 @@ LiteWalletController::LiteWalletController(WalletCapabilities capabilities,

LiteWalletController::~LiteWalletController()
{
stopWorker();
stopWorker(); // joins the fast poll worker (short iterations)
// The sync thread may be blocked in an uninterruptible full scan; detach it. It holds
// shared refs (bridge_ + syncDone_), so it stays safe and the bridge survives until it
// finishes — the process is exiting, so a late litelib_shutdown is harmless.
if (syncThread_.joinable()) syncThread_.detach();
}

std::unique_ptr<LiteWalletController> LiteWalletController::createLinked(
@@ -133,30 +137,51 @@ void LiteWalletController::onLifecycleResult(const LiteWalletLifecycleResult& re
}
}

LiteSyncStartResult LiteWalletController::startSync()
void LiteWalletController::startSync()
{
auto result = sync_.startSync(LiteSyncStartRequest{});
if (result.syncStarted) syncStarted_ = true;
return result;
if (syncLaunched_) return;
syncLaunched_ = true;
syncStarted_ = true;
// The backend `sync` command is a blocking, uninterruptible full chain scan, so run it on
// a detached thread. Capture shared refs (not the controller) so it is safe to outlive us.
auto bridge = bridge_;
auto done = syncDone_;
syncThread_ = std::thread([bridge, done] {
if (bridge) bridge->execute("sync", ""); // blocks until synced (or errors out)
done->store(true);
});
}

std::optional<LiteWalletAppRefreshModel> LiteWalletController::refreshModel()
{
if (!walletOpen_.load()) return std::nullopt;

// Poll sync status first so the refresh bundle (and the mapped sync model) carries it.
LiteWalletRefreshRequest request;
// syncstatus is fast (reads shared state the sync thread updates). Poll it every time.
const auto syncResult = sync_.pollSyncStatus(LiteSyncStatusRequest{});

if (!syncDone_->load()) {
// Sync still running: publish progress only. Data queries (balance/list) would block
// until the chain is synced, so don't issue them yet.
if (!syncResult.ok) return std::nullopt;
LiteWalletAppRefreshModel model;
model.hasSyncStatus = true;
model.sync.walletHeight = syncResult.syncStatus.syncedBlocks;
model.sync.chainHeight = syncResult.syncStatus.totalBlocks;
model.sync.progress = syncResult.syncStatus.progress;
model.sync.complete = syncResult.syncStatus.complete;
return model;
}

// Synced: full refresh (balance/addresses/transactions are fast now).
LiteWalletRefreshRequest request;
if (syncResult.ok) {
request.haveSyncStatus = true;
request.syncStatus = syncResult.syncStatus;
}

const auto refreshResult = gateway_.refresh(request);
if (refreshResult.bundle.successfulCommandCount == 0 && !request.haveSyncStatus) {
return std::nullopt;
}

const auto mapped = mapLiteWalletRefreshResult(refreshResult);
if (!mapped.ok) return std::nullopt;
return mapped.model;
@@ -85,10 +85,12 @@ public:
LiteWalletLifecycleResult restoreWallet(LiteWalletRestoreRequest request);

bool syncStarted() const { return syncStarted_; }
bool syncComplete() const { return syncDone_ && syncDone_->load(); }

// Begin background sync on the backend (idempotent enough to call once a wallet is ready;
// also invoked automatically when a lifecycle op produces a ready wallet).
LiteSyncStartResult startSync();
// Launch the backend sync on a detached background thread (NON-blocking; the backend's
// `sync` command runs a full, uninterruptible chain scan). Auto-invoked when a lifecycle
// op produces a ready wallet; safe to call once.
void startSync();

// Poll sync status + fetch balance/addresses/transactions, and apply the result into the
// app's WalletState. Returns true if state was updated. Safe no-op when no wallet is open.
@@ -110,7 +112,10 @@ private:
void stopWorker();
void workerLoop();

LiteClientBridge bridge_; // the single owned bridge; services below borrow &bridge_
// The bridge is shared (not just owned) so the detached, uninterruptible sync thread can
// safely outlive the controller: it holds a ref, so the underlying bridge is destroyed
// (and litelib_shutdown called) only once BOTH the controller and a running sync release it.
std::shared_ptr<LiteClientBridge> bridge_;
LiteWalletLifecycleService lifecycle_;
LiteWalletGateway gateway_;
LiteSyncService sync_;
@@ -119,14 +124,19 @@ private:
std::atomic<bool> syncStarted_{false};
WalletBackendStatus status_; // written only on the main thread (lifecycle ops)

// Background refresh worker.
// Detached background sync (backend `sync` is a blocking, uninterruptible full scan).
std::thread syncThread_;
bool syncLaunched_ = false;
std::shared_ptr<std::atomic<bool>> syncDone_ = std::make_shared<std::atomic<bool>>(false);

// Joinable background refresh worker (fast iterations: syncstatus, plus data once synced).
std::thread worker_;
std::atomic<bool> running_{false};
std::mutex wakeMutex_;
std::condition_variable wakeCv_;
std::mutex modelMutex_;
std::optional<LiteWalletAppRefreshModel> pendingModel_; // guarded by modelMutex_
static constexpr int kRefreshIntervalMs = 4000;
static constexpr int kRefreshIntervalMs = 2000;
};

} // namespace wallet
@@ -17,18 +17,22 @@

#include "wallet/lite_client_bridge.h"

#include <atomic>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <thread>

namespace dragonx {
namespace test {

// Owned-string accounting (C++17 inline vars: single definition across TUs).
inline long g_liteFakeAlloc = 0; // owned strings handed to the bridge
inline long g_liteFakeFreed = 0; // owned strings released via freeString
// Owned-string accounting (atomic: a detached sync thread may touch these concurrently).
inline std::atomic<long> g_liteFakeAlloc{0}; // owned strings handed to the bridge
inline std::atomic<long> g_liteFakeFreed{0}; // owned strings released via freeString
inline bool g_liteFakeWalletExists = true;
inline bool g_liteFakeServerOnline = true;
inline bool g_liteFakeShutdownCalled = false;
inline std::atomic<bool> g_liteFakeSyncBlock{false}; // when true, the "sync" command blocks

inline void resetLiteFakeCounters()
{
@@ -64,7 +68,12 @@ inline char* liteFakeExecute(const char* command, const char*)
// tests/fixtures/lite/result_parsers.json), so the gateway/sync refresh path parses.
if (command) {
const char* c = command;
if (std::strcmp(c, "sync") == 0) return liteFakeDup("{\"result\":\"success\"}");
if (std::strcmp(c, "sync") == 0) {
// Simulate the real backend's blocking full sync when requested, so tests can
// verify shutdown doesn't hang on an in-flight sync.
while (g_liteFakeSyncBlock.load()) std::this_thread::sleep_for(std::chrono::milliseconds(5));
return liteFakeDup("{\"result\":\"success\"}");
}
if (std::strcmp(c, "syncstatus") == 0) // real backend shape: "syncing" is a string
return liteFakeDup("{\"syncing\":\"true\",\"synced_blocks\":1000,\"total_blocks\":1000}");
if (std::strcmp(c, "balance") == 0)
@@ -4634,6 +4634,12 @@ void testLiteWalletControllerRefreshPopulatesState()
EXPECT_TRUE(controller.walletOpen());
EXPECT_TRUE(controller.syncStarted()); // auto-started when the wallet became ready

// Sync runs on a detached thread; the full refresh (balance/addresses) only runs once it
// completes. Wait for it (instant with the fake) so the refresh is deterministic.
for (int i = 0; i < 500 && !controller.syncComplete(); ++i)
std::this_thread::sleep_for(std::chrono::milliseconds(5));
EXPECT_TRUE(controller.syncComplete());

dragonx::WalletState state;
EXPECT_TRUE(controller.refreshWalletState(state));
EXPECT_NEAR(state.privateBalance, 2.0, 1e-9);
@@ -4692,14 +4698,20 @@ void testLiteWalletControllerWorkerProducesModel()
LiteWalletController controller(caps, conn, LiteClientBridge::fromApi(dragonx::test::makeFakeLiteApi()));
EXPECT_TRUE(controller.createWallet(LiteWalletCreateRequest{}).walletReady); // auto-starts the worker

// The worker refreshes immediately on start; poll briefly (<=2s) for the produced model.
// The worker publishes progress-only models while syncing, then full models once synced.
// Poll until a full (balance-bearing) model arrives (sync is instant with the fake).
LiteWalletAppRefreshModel model;
bool got = false;
for (int i = 0; i < 200 && !got; ++i) {
got = controller.takeRefreshedModel(model);
if (!got) std::this_thread::sleep_for(std::chrono::milliseconds(10));
bool gotFull = false;
for (int i = 0; i < 500 && !gotFull; ++i) {
LiteWalletAppRefreshModel m;
if (controller.takeRefreshedModel(m) && m.hasBalance) {
model = m;
gotFull = true;
break;
}
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
EXPECT_TRUE(got);
EXPECT_TRUE(gotFull);
EXPECT_TRUE(model.hasBalance);
EXPECT_TRUE(model.hasAddresses);

@@ -4713,6 +4725,31 @@ void testLiteWalletControllerWorkerProducesModel()
EXPECT_FALSE(idle.takeRefreshedModel(none));
}

// M2b-3 hardening: the backend `sync` is a blocking, uninterruptible full scan. Destroying the
// controller while a sync is in flight must NOT hang (the sync thread is detached, not joined).
void testLiteWalletControllerShutdownDoesNotHangDuringSync()
{
using namespace dragonx::wallet;
const auto caps = makeWalletCapabilities(WalletBuildKind::Lite, /*embeddedDaemon*/ false, /*liteBackendLinked*/ true);
const auto conn = defaultLiteConnectionSettings();

dragonx::test::g_liteFakeSyncBlock.store(true); // make the backend "sync" block indefinitely
const auto start = std::chrono::steady_clock::now();
{
LiteWalletController controller(caps, conn, LiteClientBridge::fromApi(dragonx::test::makeFakeLiteApi()));
controller.createWallet(LiteWalletCreateRequest{}); // launches the (now-blocked) sync thread
EXPECT_TRUE(controller.syncStarted());
EXPECT_FALSE(controller.syncComplete());
// controller destructs here with the sync thread still blocked -> must return promptly.
}
const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::steady_clock::now() - start).count();
EXPECT_TRUE(elapsedMs < 1500); // did not wait for the (blocked) sync to finish

dragonx::test::g_liteFakeSyncBlock.store(false); // release the detached sync thread
std::this_thread::sleep_for(std::chrono::milliseconds(50)); // let it unwind cleanly
}

} // namespace

int main()
@@ -4752,6 +4789,7 @@ int main()
testLiteSyncStatusParserRealShapes();
testLiteWalletControllerRefreshPopulatesState();
testLiteWalletControllerWorkerProducesModel();
testLiteWalletControllerShutdownDoesNotHangDuringSync();
testLiteBridgeRuntimeShutdownIsIdempotent();
testLiteBridgeRuntimeDestructorCallsShutdownOnce();
testLiteBridgeRuntimeShutdownWaitsForOwnedStringRelease();