Portability rules for distributed implementations
Many SDK interfaces (IJobScheduler, IJobStore, IModuleQueryBus, INotificationChannel, IShareTokenStore, etc.) could plausibly be implemented by a distributed task framework — Akka.NET, Orleans, Hangfire, Redis-backed, etc. Doing so without breaking consumers requires that the shape of the contract itself stays portable.
The six rules below define that portability discipline. Any interface that might be implemented distributed-side must satisfy all six. The conformance test packs (IJobSchedulerContract, IModuleQueryBusContract, IShareTokenStoreContract, IDataSourceContract, IEntityStoreContract, etc.) are the executable enforcement bar — every implementation passes the same tests.
Rule 1 — Identity by value
Returns and parameters use string, Guid, or domain records — never live framework handles (IActorRef, IGrainReference, Task<T> you can't resume from a different process, etc.).
// Good
abstract GetJob: jobId: string -> Async<JobDefinition option>
// Bad — Akka actor ref doesn't cross process boundaries
abstract GetJob: jobId: string -> Async<IActorRef>
Why: identity by value means any caller can resolve the same entity from any node in a cluster. Live handles bind to a process; if that process is gone, so is the handle.
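A minimal sketch of the idea, assuming a reduced IJobStore and JobDefinition (both simplified stand-ins here, not the real SDK definitions) and a shared dictionary playing the role of cluster-wide storage. Two independently constructed store instances resolve the same job from the same string id:

```fsharp
open System.Collections.Concurrent

// Hypothetical, simplified stand-ins for the SDK types.
type JobDefinition = { JobId: string; Name: string }

type IJobStore =
    abstract GetJob: jobId: string -> Async<JobDefinition option>

// Shared backing data, standing in for a database or cluster-wide cache.
let backing = ConcurrentDictionary<string, JobDefinition>()

// Each "node" builds its own store instance over the same backing data.
let makeStore () =
    { new IJobStore with
        member _.GetJob jobId =
            async {
                match backing.TryGetValue jobId with
                | true, job -> return Some job
                | false, _ -> return None
            } }

backing.["nightly-report"] <- { JobId = "nightly-report"; Name = "Nightly report" }

let nodeA = makeStore ()
let nodeB = makeStore ()

// The id is plain data, so either node can resolve it.
let fromA = nodeA.GetJob "nightly-report" |> Async.RunSynchronously
let fromB = nodeB.GetJob "nightly-report" |> Async.RunSynchronously
```

Because the return value is a record rather than a live reference, the two results compare equal by structure regardless of which instance produced them.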
Rule 2 — Async at every boundary
Every interface method returns Async<T> or Task<T>. Synchronous methods (... -> T) and fire-and-forget Tell-style signatures (... -> unit) are violations.
// Good
abstract Publish: scopeId: string -> Notification -> Async<unit>
// Bad — caller can't await the actual publish completion
abstract Publish: scopeId: string -> Notification -> unit
Why: Async/Task lets a distributed impl do real network I/O, batch, retry, and propagate failures. Sync signatures force a buffer-or-block decision the impl can't make safely.
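The failure-propagation point can be illustrated with a small sketch. The Notification record and the flakyChannel implementation are hypothetical; the only thing taken from the rule is the Async<unit> return type, which lets the caller observe whether the publish actually completed:

```fsharp
// Hypothetical, simplified notification payload.
type Notification = { Title: string; Body: string }

type INotificationChannel =
    abstract Publish: scopeId: string -> Notification -> Async<unit>

// Hypothetical channel whose "network send" can fail.
let flakyChannel =
    { new INotificationChannel with
        member _.Publish scopeId notification =
            async {
                do! Async.Sleep 10 // stands in for real network I/O
                if scopeId = "" then
                    failwith "scopeId must not be empty"
            } }

// The caller awaits the publish and turns a failure into a value.
let publishAndReport scopeId =
    async {
        let! outcome =
            flakyChannel.Publish scopeId { Title = "Build"; Body = "Finished" }
            |> Async.Catch
        match outcome with
        | Choice1Of2 () -> return Ok ()
        | Choice2Of2 ex -> return Error ex.Message
    }

let okResult = publishAndReport "team-1" |> Async.RunSynchronously
let failedResult = publishAndReport "" |> Async.RunSynchronously
```

With a unit-returning Publish, the second call would have no way to report the error back to the caller.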
Documented exception: IMetricsSink
The metrics interface is sync-by-design — hot path, write-only, no return to await. The trade-off is documented; metrics sinks that need to do I/O buffer internally and flush on a background timer.
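One way the buffer-and-flush pattern might look, as a sketch only. The Record member shown here is an assumed shape (the real IMetricsSink members are not listed in this document), and BufferedMetricsSink is a hypothetical name. A real sink would call FlushNow from a background timer; the example calls it directly to stay deterministic:

```fsharp
open System.Collections.Concurrent

// Assumed shape; the real IMetricsSink members are not shown in this document.
type IMetricsSink =
    abstract Record: name: string -> value: float -> unit

// Hypothetical sink: the hot path only enqueues, I/O happens in the flush function.
type BufferedMetricsSink(flush: (string * float) list -> unit) =
    let queue = ConcurrentQueue<string * float>()

    // Drains everything buffered so far and hands it to the flush function.
    member _.FlushNow() =
        let rec drain acc =
            match queue.TryDequeue() with
            | true, item -> drain (item :: acc)
            | false, _ -> List.rev acc
        match drain [] with
        | [] -> ()
        | batch -> flush batch

    interface IMetricsSink with
        // Sync and write-only: nothing to await, nothing returned.
        member _.Record name value = queue.Enqueue((name, value))

// Collects flushed batches in memory instead of sending them anywhere.
let flushed = ResizeArray<(string * float) list>()
let sink = BufferedMetricsSink(fun batch -> flushed.Add batch)
let asSink = sink :> IMetricsSink

asSink.Record "requests" 1.0
asSink.Record "latency_ms" 12.5
sink.FlushNow()
```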
Documented exception: compose-time-only methods
Methods invoked exclusively at compose time MAY return unit synchronously rather than Async<unit> when (a) the call site is known to be the composition root or its narrow delegates (typically ServerModule.Server.fs once per module), and (b) the interface header documents the compose-time-only contract. The in-process default for IJobScheduler.RegisterHandler follows this carve-out: its header explicitly notes "Modules typically call this once at compose time … the SDK does NOT call it per request", and a sync unit return reflects the pure-registration semantics (no I/O, no failure to propagate). A future distributed companion (Akka cluster, Orleans) that registers handlers at runtime over the network would need to add a RegisterHandlerAsync : name -> IJobHandler -> Async<Result<unit, _>> overload alongside the sync method; the sync overload stays for compose-time consumers and the async overload covers the runtime-registration path. This carve-out parallels the IMetricsSink sync exception above: both prefer a sync signature when the interface's documented usage pattern means an async boundary would add ceremony without enabling any I/O or failure propagation.
Rule 3 — Retry and supervision as data
Retry, backoff, and dead-letter behaviour are expressed as records (e.g. RetryPolicy). Callback parameters like OnFailure: exn -> unit or supervision-strategy objects leak framework semantics.
// Good
type JobRetryPolicy = {
    MaxAttempts: int
    BackoffSeconds: int list
    DeadLetterAfter: TimeSpan option
}
// Bad — exposes Akka-specific supervision semantics
abstract RegisterJob: jobId: string -> handler: ... -> supervisor: SupervisorStrategy -> Async<unit>
Why: data-shaped retry/supervision serialises across the wire, persists across restarts, and translates cleanly between frameworks. Callback supervision is framework-bound and untransportable.
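Because the policy is plain data, any implementation can interpret it with a pure function. The sketch below reuses the JobRetryPolicy record from above; the nextDelay helper and its semantics (1-based attempt numbers, last backoff entry repeating once the list is exhausted) are assumptions made for illustration, not behaviour defined by the SDK:

```fsharp
open System

type JobRetryPolicy =
    { MaxAttempts: int
      BackoffSeconds: int list
      DeadLetterAfter: TimeSpan option }

// Assumed semantics: `failedAttempt` is 1-based, and the last backoff
// entry repeats once the list is exhausted. None means "stop retrying".
let nextDelay (policy: JobRetryPolicy) (failedAttempt: int) : TimeSpan option =
    if failedAttempt >= policy.MaxAttempts then
        None
    else
        match policy.BackoffSeconds with
        | [] -> Some TimeSpan.Zero
        | delays ->
            let index = min (failedAttempt - 1) (delays.Length - 1) |> max 0
            Some (TimeSpan.FromSeconds(float delays.[index]))

let policy =
    { MaxAttempts = 4
      BackoffSeconds = [ 1; 5; 30 ]
      DeadLetterAfter = Some (TimeSpan.FromHours(1.0)) }

// Delay to wait after each of attempts 1 through 4 fails.
let delays = [ 1 .. 4 ] |> List.map (nextDelay policy)
```

The same record could be stored with the job and read back by a different scheduler implementation, which is not possible with a callback or a supervision-strategy object.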
Rule 4 — Stateless handlers between invocations
Handler interfaces (IJobHandler.Execute, IQueryHandler.Handle, notification subscribers) receive all state via parameters (JobContext, payload, AccessContext). Implementations must NOT assume in-memory state between calls.
type IJobHandler =
    // Good: every input is on the parameter list
    abstract Execute: ctx: JobContext -> payload: byte[] -> Async<JobResult>
Why: Orleans can deactivate grains between calls; Akka.Persistence can restart actors after a crash. Anything cached in handler instance fields evaporates. Inputs flow on the parameter list; if a handler needs durable state, it persists through IBlobStorage / IEntityStore / etc.
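A sketch of what this looks like in practice, under stated assumptions: JobContext and JobResult are reduced stand-ins, and ICounterStore is a hypothetical durable store playing the role that IBlobStorage or IEntityStore would play in the SDK. A fresh handler instance is created for every call, mimicking deactivation between invocations, and the running total still survives because it lives in the store:

```fsharp
open System.Collections.Concurrent

// Hypothetical, simplified stand-ins; the real SDK types carry more fields.
type JobContext = { JobId: string; Attempt: int }
type JobResult = Succeeded | Failed of reason: string

type IJobHandler =
    abstract Execute: ctx: JobContext -> payload: byte[] -> Async<JobResult>

// Hypothetical durable store; not an SDK interface.
type ICounterStore =
    abstract Increment: key: string -> amount: int -> Async<int>

let durable = ConcurrentDictionary<string, int>()

let store =
    { new ICounterStore with
        member _.Increment key amount =
            async { return durable.AddOrUpdate(key, amount, fun _ current -> current + amount) } }

// The handler keeps no fields of its own; everything arrives via parameters.
let makeHandler (counters: ICounterStore) =
    { new IJobHandler with
        member _.Execute ctx payload =
            async {
                let! total = counters.Increment ctx.JobId payload.Length
                return if total > 0 then Succeeded else Failed "no bytes seen"
            } }

let ctx = { JobId = "import"; Attempt = 1 }

// A new handler per invocation: nothing is carried over in memory.
let run (bytes: byte[]) =
    (makeHandler store).Execute ctx bytes |> Async.RunSynchronously

let first = run [| 1uy; 2uy |]
let second = run [| 3uy |]
let totalSeen = durable.["import"]
```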
Rule 5 — No cross-shard ordering promises
Documentation and tests make clear that ordering is guaranteed only within a ShardKey. A method whose correctness depends on cross-shard ordering is a violation.
For IJobScheduler: jobs with the same JobId execute in order. Across different JobIds no ordering promise exists — Job A's tick-at-09:00 may complete before or after Job B's tick-at-09:00.
For IModuleQueryBus: queries on the same shard key are ordered. Across shards no ordering.
Why: distributed implementations partition by shard key for horizontal scale; cross-shard ordering would force a single coordinator and serialise the whole system.
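A contract test for this rule should assert order only inside a shard. The following sketch assumes a hypothetical Completion record (shard key plus a per-shard sequence number) and checks a completion log without saying anything about how shards interleave:

```fsharp
// Hypothetical completion record: the shard a unit of work belonged to
// and its sequence number within that shard.
type Completion = { ShardKey: string; Sequence: int }

// True when, within every shard, completions appear in ascending sequence
// order. Interleaving between shards is deliberately ignored.
let orderedWithinShards (log: Completion list) =
    log
    |> List.groupBy (fun c -> c.ShardKey)
    |> List.forall (fun (_, items) ->
        let seqs = items |> List.map (fun c -> c.Sequence)
        seqs = List.sort seqs)

// Shards A and B interleave freely, but each is internally ordered.
let interleaved =
    [ { ShardKey = "A"; Sequence = 1 }
      { ShardKey = "B"; Sequence = 1 }
      { ShardKey = "B"; Sequence = 2 }
      { ShardKey = "A"; Sequence = 2 } ]

// Shard A completes out of order, which would be a real violation.
let reordered =
    [ { ShardKey = "A"; Sequence = 2 }
      { ShardKey = "A"; Sequence = 1 }
      { ShardKey = "B"; Sequence = 1 } ]

let interleavedOk = orderedWithinShards interleaved
let reorderedOk = orderedWithinShards reordered
```

A test written this way passes for a single-node implementation and for a partitioned one, since neither is asked to promise more than per-shard order.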
Rule 6 — Precision at the lower bound
Scheduling and timing primitives declare their precision contract (JobPrecision: Second | Minute). An interface that implicitly promises sub-second precision where some implementations can't honour it (Orleans Reminders fire at minute granularity, for example) is a violation.
type JobPrecision = Second | Minute
type JobDefinition = {
    // …
    Precision: JobPrecision // explicit
}
The default InProcessJobScheduler rejects Second-precision job registration with ScheduleError.PrecisionNotSupported rather than silently slipping to minute granularity. The decision is the caller's: ship a lower-precision job or use a different scheduler.
Why: precision drift is silent and hard to debug, because a job that fires late raises no error. An explicit precision declaration forces callers to acknowledge the floor, and tests can verify that implementation and caller agree.
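The reject-rather-than-degrade behaviour can be sketched as a small validation function. The checkPrecision and coarseness helpers are hypothetical, and ScheduleError is reduced to the one case named above (the real type may carry more cases or data):

```fsharp
type JobPrecision = Second | Minute

// Reduced, assumed shape: only the case named in the text is modelled.
type ScheduleError = PrecisionNotSupported

// Hypothetical helper: larger numbers mean coarser precision.
let coarseness precision =
    match precision with
    | Second -> 1
    | Minute -> 60

// Rejects any request finer than the scheduler can honour instead of
// silently rounding it to the scheduler's floor.
let checkPrecision (schedulerFloor: JobPrecision) (requested: JobPrecision) =
    if coarseness requested < coarseness schedulerFloor then
        Error PrecisionNotSupported
    else
        Ok requested

// A minute-floor scheduler refuses Second but accepts Minute.
let tooFine = checkPrecision Minute Second
let atFloor = checkPrecision Minute Minute
// A second-floor scheduler accepts a coarser Minute request.
let coarser = checkPrecision Second Minute
```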
Other portability constraints (corollaries)
The six rules are the load-bearing core. These follow from them:
- No framework-specific serialisation attributes ([<Serializable>], [<ProtoContract>], Akka IWithUnboundedStash, etc.) on any type in a shared <Compile> file. Wire format is JSON via Fable.Remoting.Json.FableJsonConverter for SDK SSE/persistence paths, vendor-specific for companion sinks.
- No open Akka.* / open Orleans.* in any file under ToolUp.Platform.*. Distributed implementations live in companion packages (src/JobScheduler/Akka/, etc.); the SDK interface never references a companion's types.
- Companion packages exist only at the SDK boundary. The SDK interface never references a companion's types. Consumers pull both the SDK and the companion; the companion implements an SDK interface.
Conformance bar
Every distributed-friendly interface has a contract test pack in ToolUp.Platform.Tests. External implementations consume the test pack:
<PackageReference Include="ToolUp.Platform.Tests" />
And run the pack against their impl in their own test suite. The pack tests behaviour, not implementation — the same N tests pass for InProcessJobScheduler, a future Akka.NET impl, and a future Orleans impl.
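To make "tests behaviour, not implementation" concrete, here is a rough sketch of how such a pack could be structured. It is not the layout of the actual ToolUp.Platform.Tests packs: the reduced IJobStore (with an assumed Save member), the jobStoreContract function, and the (name, passed) result shape are all hypothetical. The key idea is that the checks take a factory, so any implementation can be plugged in:

```fsharp
open System.Collections.Concurrent

type JobDefinition = { JobId: string; Name: string }

// Hypothetical, reduced interface; the real IJobStore surface is not shown here.
type IJobStore =
    abstract Save: job: JobDefinition -> Async<unit>
    abstract GetJob: jobId: string -> Async<JobDefinition option>

// The "pack": each check gets a fresh instance from the factory and
// only talks to it through the interface.
let jobStoreContract (create: unit -> IJobStore) : (string * bool) list =
    let check name (body: IJobStore -> Async<bool>) =
        name, (create () |> body |> Async.RunSynchronously)
    [ check "missing job returns None" (fun store ->
          async {
              let! found = store.GetJob "absent"
              return Option.isNone found
          })
      check "saved job is returned by id" (fun store ->
          async {
              let job = { JobId = "j1"; Name = "First" }
              do! store.Save job
              let! found = store.GetJob "j1"
              return found = Some job
          }) ]

// An in-memory implementation run against the pack; a distributed
// implementation would pass its own factory to the same function.
let inMemoryStore () =
    let jobs = ConcurrentDictionary<string, JobDefinition>()
    { new IJobStore with
        member _.Save job = async { jobs.[job.JobId] <- job }
        member _.GetJob jobId =
            async {
                match jobs.TryGetValue jobId with
                | true, job -> return Some job
                | false, _ -> return None
            } }

let results = jobStoreContract inMemoryStore
```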
Current packs:
- IJobSchedulerContract (15 tests)
- IModuleQueryBusContract (9 tests)
- IShareTokenStoreContract (11 tests)
- IDataSourceContract (7 tests)
- IEntityStoreContract + IEntityQueryContract (22 tests combined)
A "distributed companion" not bundled with its conformance run is the wrong shape — without the test pack as evidence, the portability claim is unverified.
How this interacts with companion authoring
If you're writing a new SDK interface destined to have distributed impls in the future:
- Sketch the interface with all six rules in mind from day one. Retrofitting rule 2 (async at every boundary) is destructive.
- Author the contract test pack alongside the interface, not after. The first in-process impl validates the contract; future impls pin to it.
- Document the precision floor explicitly (rule 6) — what's the smallest tick you guarantee? Both impl and caller agree on the floor; tests verify.
- If you find yourself wanting to expose a live handle (rule 1) or a callback (rule 3), find another shape. Live handles are a portability trap; callbacks are an Erlang-shaped design that doesn't survive the .NET runtime model.
If you're writing a companion impl of an existing distributed-friendly interface:
- Run the conformance test pack against your impl in your test suite. If a test fails, your impl is wrong, not the test.
- Document the precision floor your impl actually achieves. It may differ from the SDK floor: Orleans Reminders are minute-precision, while an Akka impl might honour sub-second.
- Distributed-ready impls must be stateless between handler calls (rule 4). In-process impls may hold state, but must document it clearly so consumers know not to rely on it in a distributed deployment.