Ranjith Hegde
mayaflux.org · github.com/MayaFlux/MayaFlux
Real-time multimedia systems do not run in a single loop. They run several incompatible ones at once.
These contexts cannot be collapsed into a single abstraction without erasing the constraints that make them correct.
But these contexts need to cooperate.
When independent execution contexts must share data and coordinate timing without violating each other's safety constraints, the architectural problem is not scheduling.
It is coexistence.
How do independent real-time execution contexts coexist without collapsing into a single abstraction?
Independent execution loops operating concurrently, each with its own thread and clock.
Lock-free coordination, because some contexts cannot block or yield.
Temporal intent that survives movement across threads and rate domains.
Compile-time data unification, so computation does not fragment into domain-specific representations at every boundary.
This is not an isolated problem to any single domain. Game engines coordinate rendering, physics, and input at mismatched rates. Robotics systems coordinate sensors, actuators, and planners. Any real-time C++ system with heterogeneous execution models faces the same architectural tension.
Not from a conventional technical background. No CS degree.
Interdisciplinary performance practice across sound, movement, image, space. Not "providing music" but building real-time conversations across domains where negotiation of tension, form, and power is the work.
Research thesis (Institute of Sonology, The Hague): gesture tracking and tendency systems.
Audio developer, Metro: Awakening (VR, Unreal Engine 5) shipped production audio, hit MetaSounds' walls from the inside.
Faculty at Srishti: creative computing, shader programming, sociopolitics of sound, systems theory.
A university-level course I designed and taught at Srishti.
A large component was about logical fallacies. The fallacy that synchronous behavior is the default. The fallacy that determinism equals correctness. The fallacy that if you cannot predict the output, the system is broken.
This is exactly the framing for multi-domain real-time systems.
Independent execution contexts are asynchronous by nature. Forcing them into a single linear model is not engineering. It is a logical fallacy dressed as architecture.
The interdisciplinary method is the coexistence problem in physical form.
I make a sound. A dancer responds. I respond to their movement (with sound or movement).
We negotiate space, tension, form, power. Not through an API. Through inheritance and overriding.
This is where the architectural intuition comes from. Independent agents, independent clocks, independent vocabularies, cooperating without surrendering autonomy. The rest of the talk is what that looks like in C++20.
Analog synthesis: an evolution from acoustic constraints.
A voltage has a range. An op-amp has limits. A resistor has tolerances.
Negotiating those constraints leads to interesting results, the same way negotiating wood does.
Commercial mimics of acoustic domains (electric keyboards and guitars are ubiquitous and impressive examples).
But the real innovation: new possibilities of the medium.
New sounds, new techniques, new imaginations unrelated to the acoustic domain.
Then computing. And here the journey broke.
Not an evolution from analog constraints the way analog evolved from acoustic constraints.
Instead: an expensive and painstaking recreation of analog constraints.
Virtual knobs. Virtual cables. Virtual eurorack.
Virtual LFOs (artificial separation that made sense in analog, but pointless in digital)
Not exploration of the new wild digital paradigm. Recreation of the old familiar analog one.
Negotiating with the materiality of wood allows the player to work around the limits of physicality.
This yields endless variation because the constraints are real, universal, physical.
The same is true of analog: the op-amp's limits, the resistor's tolerances,
the noise floor, negotiating those constraints is where expression lives.
A generation of GUI patchers stripped that negotiation away entirely. You are not negotiating the materiality of the circuit or the logic. You are not working around real constraints. You are adjusting parameter numbers on someone else's abstraction.
The agency that made acoustic and analog constraints generative is absent. There is nothing to negotiate with.
A violin is made of wood. Wood is real. It has grain, density,
resonance, humidity response.
For centuries, luthiers have worked to ensure their own constraints do not cascade into the player's constraints.
The glue they use, the veneer they need: these should not limit the dimensions of play. The luthier's job is to absorb their own material limitations so the player never encounters them.
This is exactly what computing does not do.
The large majority of limitations in computing are not universal truths. They are decisions one of us made yesterday afternoon.
The draw() loop. The audio callback. The update() tick. Not computational necessities. One person's decision about how to expose hardware timing to the user. Every framework that adopts them passes that decision forward as though it were a law of physics.
The hardware consumes data at a fixed rate. But it is a fundamental misconception to say the user has to provide instructions at that rate. That is cascading the domain limitation into user space.
The compiler does not care.
A number is a number. Whether it becomes a sample in a DAC buffer, a pixel in a framebuffer, or a push constant in a compute shader is a routing decision made at the last possible moment, not an architectural category imposed at the first.
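As a sketch of that routing decision (hypothetical helper names, not MayaFlux API): the same double becomes a DAC sample or a framebuffer byte only at the final conversion. Nothing upstream needs to know.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: the same numeric value, routed at the last moment.
// Nothing about the number itself marks it as "audio" or "video".

// Route to a DAC-style buffer: clamp to [-1, 1].
double to_dac_sample(double v) { return std::clamp(v, -1.0, 1.0); }

// Route to a framebuffer-style byte: map [-1, 1] to [0, 255].
uint8_t to_pixel(double v) {
    double clamped = std::clamp(v, -1.0, 1.0);
    return static_cast<uint8_t>(std::lround((clamped + 1.0) * 0.5 * 255.0));
}
```

The computation that produced `v` is identical in both cases; only the sink differs.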
The camera and the microphone having different clocks is a 1930s synchronization problem. Solved with 1930s hardware constraints.
We are not in 1930. We have processors. The computer knows nothing of biological or mechanical limits; those are imposed by our interfaces, not by the machine.
Separating audio and video pipelines at the architecture level is not physics. It is inertia. Separating them for hardware capture is real. Separating them as fundamental computational categories is not.
The unification is not a feature. It is the correction of a category error.
Pure Data / Max (early): the scheduler problem stated clearly, forty years ago. The hardware it ran on is a museum piece. The problem it solved is not.
Processing / OpenGL: the draw-update loop as organizing principle. Opened computing to artists. Genuinely. The loop is the ceiling.
Max/MSP (modern): gen~ is brilliant. gen~ is also fighting its own environment. The patcher feels infinite. The ontological ceiling is real and invisible.
TouchDesigner:
Unprecedented interop between processing models.
Also:
Access as agency.
Complexity sanitized.
Parameterize pre-made decisions.
Carefully bounded environment.
Architecture remains a black box.
Flexibility means adjusting parameters, not reshaping the system.
MetaSounds / UE5:
AAA budget.
A real attempt at integrated (sans middleware) development.
Also:
Node graph = commercial eurorack, 1:1.
No new computational possibilities.
Digital mimicry of old hardware.
Not a DSP system; a consumer hardware metaphor with a C++ wrapper.
Imagine if Unreal’s camera or lighting systems:
Enforced vintage film camera constraints.
Baked commercial hardware limits into their classes.
Instead of exposing the full potential of computation.
The pattern across all of them: the analog metaphor, dressed differently each decade. At no point was this inevitable. These were choices.
So I started writing.
MayaFlux is fifteen years of the same question, finally with the right language to build the answer.
C++20: coroutines, atomic_ref, concepts, structured bindings, ranges. For the first time, the language was expressive enough to match the idea without fighting it.
Not an audio library with visual injection.
Infrastructure-level creative computing where audio, visual, and control data are unified numerical streams.
The existence proof that domain separation is a choice, not a constraint.
Audio: RtAudio callback, hardware-driven, ~21μs per sample at 48kHz.
Graphics: Vulkan 1.3 dynamic rendering, manual frame submission.
Input: backend-dependent: GLFW polling, HID interrupt, MIDI callback.
User code: C++20 coroutines, schedulable from any of the above.
Each has its own thread. Each has its own clock. They share data but not control flow.
// Called by RtAudio hardware interrupt
m_callback_active.fetch_add(1, std::memory_order_acquire);
m_handle->tasks.process_buffer_cycle();
for (uint32_t ch = 0; ch < num_channels; ch++) {
    m_handle->buffers.process_channel(ch, num_frames);
    network_out[ch] =
        m_handle->nodes.process_audio_networks(num_frames, ch);
}
for (size_t i = 0; i < num_frames; ++i) {
    m_handle->tasks.process(1);
    for (size_t j = 0; j < num_channels; ++j) {
        double s = m_handle->nodes.process_sample(j)
                 + buffer_sample;
        output_span[i * num_channels + j] =
            std::clamp(s, -1., 1.);
    }
}
m_callback_active.fetch_sub(1, std::memory_order_release);
// Called from own thread, paced by FrameClock
for (auto& [name, hook] : m_handle->pre_process_hooks)
    hook(1);
m_handle->tasks.process(1);
m_handle->nodes.process(1);
m_handle->buffers.process(1);
render_all_windows();
for (auto& [name, hook] : m_handle->post_process_hooks)
    hook(1);
Same handle. Same interface. Different thread. Different clock. Different processing order. Neither knows the other exists.
// Its own thread. No tick. No frame. Event-driven.
while (true) {
    while (auto value = m_queue.pop()) {
        dispatch_to_nodes(*value);
        m_events_processed.fetch_add(1);
    }
    if (m_stop_requested.load()) break;
    m_queue_notify.wait(false);
    m_queue_notify.store(false);
}
// macOS: hazard pointers (no std::atomic<shared_ptr>)
#ifdef MAYAFLUX_PLATFORM_MACOS
size_t slot = m_hazard_counter.fetch_add(1) % MAX_READERS;
const RegistrationList* current_regs;
do {
    current_regs = m_registrations.load();
    m_hazard_ptrs[slot].store(current_regs);
} while (current_regs != m_registrations.load());
#else
auto current_regs = m_registrations.load();
#endif
Three subsystems. Three threads. Three entirely different timing models. The handle is the shared vocabulary. The scheduling is not.
The conventional answer: mutexes cause priority inversion, unbounded latency, potential deadlock.
The deeper reason: a mutex is a scheduling decision. When thread A holds a lock and thread B blocks on it, B has surrendered its scheduling authority to A.
Lock-free is not an optimization. It is the only coordination model consistent with the premise.
void RootNode::register_node(const std::shared_ptr<Node>& node)
{
    for (auto& pending_op : m_pending_ops) {
        bool expected = false;
        if (pending_op.active.compare_exchange_strong(
                expected, true,
                std::memory_order_acquire,
                std::memory_order_relaxed)) {
            pending_op.node = node;
            pending_op.is_addition = true;
            m_pending_count.fetch_add(1, std::memory_order_relaxed);
            return;
        }
    }
    while (m_is_processing.load(std::memory_order_acquire))
        m_is_processing.wait(true, std::memory_order_acquire);
    m_Nodes.push_back(node);
}
Fixed-size PendingOp array, each slot guarded by an atomic bool. CAS to claim. Drain between cycles. No allocation. No lock. Same pattern in RootBuffer.
bool RootNode::preprocess()
{
    bool expected = false;
    if (!m_is_processing.compare_exchange_strong(expected, true,
            std::memory_order_acquire, std::memory_order_relaxed))
        return false;
    if (m_pending_count.load(std::memory_order_relaxed) > 0)
        process_pending_operations();
    return true;
}

void RootNode::postprocess()
{
    for (auto& node : m_Nodes)
        node->request_reset_from_channel(m_channel);
    if (m_pending_count.load(std::memory_order_relaxed) > 0)
        process_pending_operations();
    m_is_processing.store(false, std::memory_order_release);
    m_is_processing.notify_all();
}
acquire on entry, release on exit. Pending ops drain at both boundaries. Registration latency is bounded to one cycle.
acquire on load guards processing entry: it must see all prior writes from other threads.
release on store signals completion: it must publish all writes made during processing.
relaxed on counters: m_pending_count needs atomicity, not ordering.
compare_exchange_strong: acquire on success (must see slot state), relaxed on failure (don't care, try the next slot).
On ARM (Apple Silicon, phones) acquire and release produce different instructions. x86's strong memory model gives you acquire/release semantics for free. Lock-free bugs that only manifest on ARM are real.
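A minimal, self-contained demonstration of the pairing, independent of MayaFlux: a release store publishes a plain write, and an acquire load on another thread is guaranteed to observe that write once it sees the flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal sketch of the release/acquire pairing described above.
int payload = 0;
std::atomic<bool> ready{false};

void writer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publish
}

int reader() {
    while (!ready.load(std::memory_order_acquire)) // guard
        ;                                          // spin (fine for a demo)
    return payload;                                // guaranteed to see 42
}

int run_demo() {
    std::thread t(writer);
    int result = reader();
    t.join();
    return result;
}
```

With relaxed ordering on both sides, the assertion below could legally fail on ARM; with acquire/release it cannot.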
The audio hardware needs samples at 48kHz. The user should not have to think in units of 48,000 ticks per second.
co_await Kriya::SampleDelay{scheduler.seconds_to_samples(0.5)};
struct SampleDelay {
    uint64_t samples_to_wait;
    bool await_ready() const { return samples_to_wait == 0; }
    void await_resume() { }
    void await_suspend(std::coroutine_handle<promise_type> h) {
        h.promise().next_sample += samples_to_wait;
    }
};
One integer write into the promise. No thread. No callback. The user said "wait half a second." The infrastructure translated it to 24,000 ticks. The hardware constraint did not cascade.
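The translation itself can be sketched in two lines, assuming a fixed 48 kHz clock (the real scheduler presumably reads the rate from the audio backend; the signature here is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: user-facing seconds in, infrastructure-facing
// sample ticks out, at an assumed 48 kHz clock.
constexpr double kSampleRate = 48000.0;

uint64_t seconds_to_samples(double seconds) {
    return static_cast<uint64_t>(std::llround(seconds * kSampleRate));
}
```

"Wait half a second" becomes 24,000 ticks exactly once, at the boundary, and never in user code.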
scheduler->process_token(ProcessingToken::SAMPLE_ACCURATE, 1024);
scheduler->process_token(ProcessingToken::FRAME_ACCURATE, 1);
Each ProcessingToken has its own clock. AudioSubsystem advances SAMPLE_ACCURATE. GraphicsSubsystem advances FRAME_ACCURATE.
The coroutine does not know which domain advances it. It expressed intent in time units. The domain that owns the clock fulfills it. Same structure, different scheduling context. The code is identical.
This is what "the compiler does not care" looks like in practice.
auto routine = [](TaskScheduler& scheduler) -> SoundRoutine {
    while (true) {
        co_await Kriya::Gate{scheduler, callback, logic_node, true};
        float wait = calculate_timing();
        co_await Kriya::SampleDelay{scheduler.seconds_to_samples(wait)};
        co_await Kriya::Trigger{scheduler, true, sync_cb, sync_node};
    }
};
Gate suspends until a condition. SampleDelay for a duration. Trigger fires and synchronizes. Each a standalone awaiter. They compose because co_await is an expression, not a control structure.
The interdisciplinary method, in code. Each awaiter is an independent voice. The coroutine is the conversation.
template <typename T>
concept ProcessableData = ArithmeticData<T> || ComplexData<T> || GlmData<T>;
A Polynomial node shaping an audio envelope and a Polynomial node warping a shader parameter are the same node. Resolved at compile time.
NodeBindingsProcessor binds audio-rate outputs to GPU push constants. The node outputs a double. The shader consumes a float. The Polynomial does not know it feeds a shader. The shader does not know the number came from audio.
template <typename T> concept IntegerData = std::is_integral_v<T>;
template <typename T> concept DecimalData = std::is_floating_point_v<T>;
template <typename T> concept ArithmeticData = IntegerData<T> || DecimalData<T>;
template <typename T> concept GlmVectorType =
    std::is_same_v<T, glm::vec2> || std::is_same_v<T, glm::vec3>
    || std::is_same_v<T, glm::vec4> || std::is_same_v<T, glm::dvec2> /* ... */;
template <typename T> concept GlmMatrixType =
    std::is_same_v<T, glm::mat2> || std::is_same_v<T, glm::mat3>
    || std::is_same_v<T, glm::mat4> /* ... */;
template <typename T> concept ProcessableData =
    ArithmeticData<T> || ComplexData<T> || GlmData<T>;
template <typename T> concept ContiguousContainer = requires(T t) {
    { t.data() } -> std::convertible_to<typename T::value_type*>;
    { t.size() } -> std::convertible_to<std::size_t>;
};
All in the precompiled header. Available everywhere. No include chains. float, double, int, complex, vec2, vec3, vec4, mat4: all constrained at compile time through the same hierarchy.
template <typename From, typename To>
    requires GlmType<From> && GlmType<To>
          && (glm_component_count<From>() == glm_component_count<To>())
struct is_convertible_data<From, To> : std::true_type { };

template <typename From, typename To>
    requires ArithmeticData<From> && ArithmeticData<To>
          && (!GlmType<From>) && (!GlmType<To>)
struct is_convertible_data<From, To> : std::true_type { };

template <typename From, typename To>
    requires GlmType<From> && ArithmeticData<To> && (!GlmType<To>)
struct is_convertible_data<From, To> : std::true_type { };
Constrained partial specializations. vec3→vec3: allowed (same component count). vec3→float: allowed (extraction). vec3→vec2: rejected at compile time (component mismatch). No runtime type checks. No dynamic_cast. The compiler enforces data compatibility before the program runs.
using DataVariant = std::variant<
std::vector<double>, // High precision audio
std::vector<float>, // Standard precision
std::vector<uint8_t>, // Image data
std::vector<uint16_t>, // CD audio, 16-bit image
std::vector<std::complex<float>>, // Spectral FFT
std::vector<std::complex<double>>,// High precision spectral
std::vector<glm::vec2>, // UV coordinates
std::vector<glm::vec3>, // Vertex positions, normals
std::vector<glm::vec4>, // RGBA colors
std::vector<glm::mat4> // Transform matrices
>;
One variant type holds everything: audio samples, image pixels, vertex positions, spectral data, transformation matrices. The variant does not erase the type; it preserves it. std::visit dispatches at runtime only when you cross a storage boundary. Within a domain, the type is known statically.
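A reduced sketch of that boundary dispatch (glm alternatives omitted so the example stays self-contained): one `std::visit` at the storage boundary, static element types inside each alternative.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Reduced stand-in for the storage variant; the real DataVariant also
// carries glm vector and matrix alternatives.
using DataVariant = std::variant<
    std::vector<double>,
    std::vector<float>,
    std::vector<uint8_t>,
    std::vector<std::complex<float>>>;

// One generic visitor at the storage boundary. Inside the lambda,
// the element type of 'vec' is known statically.
std::size_t element_count(const DataVariant& v) {
    return std::visit([](const auto& vec) { return vec.size(); }, v);
}
```

The same `element_count` serves an audio buffer and a pixel buffer; neither caller writes a type switch.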
enum class DataModality : uint8_t {
AUDIO_1D, AUDIO_MULTICHANNEL,
IMAGE_2D, IMAGE_COLOR,
VIDEO_GRAYSCALE, VIDEO_COLOR,
SPECTRAL_2D, VOLUMETRIC_3D, TENSOR_ND,
VERTEX_POSITIONS_3D, VERTEX_NORMALS_3D,
VERTEX_COLORS_RGB, TEXTURE_COORDS_2D,
TRANSFORMATION_MATRIX,
UNKNOWN
};
// Modality detected from dimensional structure, not declared by user
std::vector<DataDimension> dims = {
DataDimension::time(48000),
DataDimension::channel(2)
};
DataModality m = detect_data_modality(dims); // AUDIO_MULTICHANNEL
The data describes itself. Add a spatial dimension and the same data becomes IMAGE_2D. Add a time dimension on top and it becomes VIDEO_GRAYSCALE. The modality is structural, not declared.
// Factory methods: dimensions are semantic, not just sizes
auto t = DataDimension::time(48000);
auto ch = DataDimension::channel(2, 1);
auto sp = DataDimension::spatial_2d(1920, 1080);
auto vp = DataDimension::vertex_positions(10000); // vec3, 3 components
auto uv = DataDimension::texture_coords(10000); // vec2, 2 components
// Create typed storage for any modality
auto [variants, dims] =
DataDimension::create_for_modality<float>(
DataModality::IMAGE_COLOR,
{1080, 1920, 4} // height, width, RGBA
);
Same factory, same variant, same dimension system, whether you're storing 48,000 audio samples, a 1920×1080 RGBA image, or 10,000 vertex positions. The N in NDData is not aspirational; it is the only mode of operation.
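A hypothetical miniature of the structural detection (not the real `detect_data_modality` signature): the modality falls out of counting dimension kinds, never out of a user declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: modality read off the dimensional structure.
enum class DimKind { Time, Channel, Spatial };
enum class Modality { AUDIO_1D, AUDIO_MULTICHANNEL, IMAGE_2D, UNKNOWN };

struct Dim { DimKind kind; std::size_t size; };

Modality detect(const std::vector<Dim>& dims) {
    std::size_t time = 0, channel = 0, spatial = 0;
    for (const auto& d : dims) {
        if (d.kind == DimKind::Time) ++time;
        else if (d.kind == DimKind::Channel) ++channel;
        else ++spatial;
    }
    if (time == 1 && channel == 0 && spatial == 0) return Modality::AUDIO_1D;
    if (time == 1 && channel == 1 && spatial == 0) return Modality::AUDIO_MULTICHANNEL;
    if (spatial == 2 && time == 0) return Modality::IMAGE_2D;
    return Modality::UNKNOWN;
}
```

Append a spatial dimension and the same helper reclassifies the data; nothing else in the pipeline changes.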
struct audio_promise : public routine_promise<SoundRoutine> {
    SoundRoutine get_return_object();
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void return_void() { }
    void unhandled_exception() { std::terminate(); }

    const ProcessingToken processing_token { ProcessingToken::ON_DEMAND };
    bool auto_resume = true;
    bool should_terminate = false;
    const bool sync_to_clock = false;
    uint64_t delay_amount = 0;
    uint64_t next_sample = 0;
    uint64_t next_buffer_cycle = 0;
    DelayContext active_delay_context = DelayContext::AWAIT;
    std::unordered_map<std::string, std::any> state;

    template <typename T>
    void set_state(const std::string& key, T value);
    template <typename T>
    T* get_state(const std::string& key);
};
Everything lives in the coroutine frame. next_sample for audio timing. next_buffer_cycle for buffer-rate timing. State dictionary for arbitrary typed storage between suspensions. External code reads and writes state through the promise while the coroutine is suspended.
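A self-contained stand-in for that state dictionary (the real accessors live on the promise type; this is an illustrative sketch): `std::any` keyed by string, with type-checked retrieval that returns null on a mismatch.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the promise's typed state dictionary.
struct StateStore {
    std::unordered_map<std::string, std::any> state;

    template <typename T>
    void set_state(const std::string& key, T value) {
        state[key] = std::move(value);
    }

    // nullptr on missing key or type mismatch: the caller always checks.
    template <typename T>
    T* get_state(const std::string& key) {
        auto it = state.find(key);
        return it == state.end() ? nullptr : std::any_cast<T>(&it->second);
    }
};

// Round trip: write a double, read it back typed, reject a wrong type.
bool state_round_trip() {
    StateStore s;
    s.set_state("gain", 0.5);
    return s.get_state<double>("gain") != nullptr
        && *s.get_state<double>("gain") == 0.5
        && s.get_state<int>("gain") == nullptr;
}
```

Because the store lives in the coroutine frame, external code can mutate it safely between suspensions.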
// Audio domain — sample-accurate
struct SampleDelay {
    void await_suspend(std::coroutine_handle<audio_promise> h) {
        h.promise().next_sample += samples_to_wait;
        h.promise().active_delay_context = DelayContext::SAMPLE_BASED;
    }
};

// Buffer domain — per-cycle granularity
struct BufferDelay {
    void await_suspend(std::coroutine_handle<audio_promise> h) {
        h.promise().next_buffer_cycle += num_cycles;
        h.promise().active_delay_context = DelayContext::BUFFER_BASED;
    }
};

// Graphics domain — frame-accurate
struct FrameDelay {
    void await_suspend(std::coroutine_handle<graphics_promise> h) noexcept;
};

// Cross-domain — simultaneous constraints
struct MultiRateDelay {
    uint64_t samples_to_wait;
    uint32_t frames_to_wait;
    void await_suspend(std::coroutine_handle<complex_promise> h) noexcept;
};
Each awaiter writes to a different field in the promise. Each promise type matches a domain. MultiRateDelay bridges both: it suspends until both the sample count and frame count are reached.
bool SoundRoutine::try_resume_with_context(
    uint64_t current_value, DelayContext context)
{
    auto& p = m_handle.promise();
    if (p.should_terminate || !p.auto_resume)
        return false;

    bool should_resume = false;
    switch (context) {
    case DelayContext::SAMPLE_BASED:
        should_resume = (current_value >= p.next_sample);
        if (should_resume)
            p.next_sample = current_value + p.delay_amount;
        break;
    case DelayContext::BUFFER_BASED:
        should_resume = (current_value >= p.next_buffer_cycle);
        if (should_resume)
            p.next_buffer_cycle = current_value + p.delay_amount;
        break;
    }

    if (should_resume)
        m_handle.resume();
    return should_resume;
}
The scheduler calls this for every active routine each cycle. The context parameter determines which clock to compare against. The promise's delay_amount auto-reschedules repeating patterns. One comparison, one resume. No vtable. No callback indirection.
void NodeBindingsProcessor::execute_shader(
    const std::shared_ptr<VKBuffer>& buffer)
{
    update_push_constants_from_nodes();
    auto& staging = buffer->get_pipeline_context().push_constant_staging;
    for (const auto& [name, binding] : m_bindings) {
        if (staging.size() < binding.push_constant_offset + binding.size)
            staging.resize(binding.push_constant_offset + binding.size);
        std::memcpy(
            staging.data() + binding.push_constant_offset,
            m_push_constant_data.data() + binding.push_constant_offset,
            binding.size);
    }
}

// User code -- bind once, runs every frame
auto proc = create_processor<NodeBindingsProcessor>(tex, config);
proc->set_push_constant_size<Params>();
proc->bind_node("radial", envelope_node,
    offsetof(Params, radial_scale), sizeof(float));
void DescriptorBindingsProcessor::update_descriptor_from_node(DescriptorBinding& binding)
{
    switch (binding.source_type) {
    case SourceType::NODE: {
        float value {};
        if (binding.processing_mode.load(std::memory_order_acquire) == ProcessingMode::INTERNAL)
            value = static_cast<float>(Buffers::extract_single_sample(binding.node));
        else
            value = static_cast<float>(binding.node->get_last_output());
        Nodes::NodeContext& ctx = binding.node->get_last_context();
        switch (binding.binding_type) {
        case BindingType::SCALAR:
            upload_to_gpu(&value, sizeof(float), binding.gpu_buffer, nullptr);
            break;
        case BindingType::VECTOR: {
            auto data = dynamic_cast<Nodes::GpuVectorData*>(&ctx)->gpu_data();
            upload_to_gpu(data.data(), data.size_bytes(), binding.gpu_buffer, nullptr);
            break;
        }
        case BindingType::MATRIX: {
            auto data = dynamic_cast<Nodes::GpuMatrixData*>(&ctx)->gpu_data();
            upload_to_gpu(data.data(), data.size_bytes(), binding.gpu_buffer, nullptr);
            break;
        }
        case BindingType::STRUCTURED: {
            auto data = dynamic_cast<Nodes::GpuStructuredData*>(&ctx)->gpu_data();
            upload_to_gpu(data.data(), data.size_bytes(), binding.gpu_buffer, nullptr);
            break;
        }
        }
        break;
    }
    case SourceType::AUDIO_BUFFER: {
        const auto& samples = audio->get_data();
        thread_local std::vector<float> conv;
        conv.resize(samples.size()); // ensure capacity before writing through begin()
        std::ranges::transform(samples, conv.begin(),
            [](double d) { return static_cast<float>(d); });
        upload_to_gpu(conv.data(), conv.size() * sizeof(float), binding.gpu_buffer, nullptr);
        break;
    }
    case SourceType::HOST_VK_BUFFER: /* raw bytes from VKBuffer */
    case SourceType::NETWORK_AUDIO:  /* double→float conversion from network stream */
    case SourceType::NETWORK_GPU:    /* vertex data from network peer */
        // Same pattern: extract → ensure capacity → upload_to_gpu
        break;
    }
}
Six source types. Scalars, vectors, matrices, structured data, entire audio buffers, network streams. All converge to the same upload_to_gpu call. The GPU does not know where the data came from. The source does not know it feeds a descriptor.
#version 450
layout(push_constant) uniform Params {
    float radial_scale;
    float angular_velocity;
    float chroma_split;
};

void main() {
    vec2 center = fragTexCoord - 0.5;
    float dist = length(center);
    float angle = atan(center.y, center.x);
    float r = dist + radial_scale * 0.1
            * sin(dist * 20.0 + angular_velocity);
    // Chromatic aberration from audio envelope
    vec2 uv_r = vec2(0.5 + r * cos(angle - chroma_split),
                     0.5 + r * sin(angle - chroma_split));
    vec2 uv_b = vec2(0.5 + r * cos(angle + chroma_split),
                     0.5 + r * sin(angle + chroma_split));
    outColor = vec4(
        texture(texSampler, uv_r).r,
        texture(texSampler, fragTexCoord).g,
        texture(texSampler, uv_b).b,
        1.0);
}
radial_scale, angular_velocity, chroma_split. Each driven by a Polynomial node processing an audio signal. The shader has no idea. It just reads floats from push constants. The unification is invisible at the consumption site because there was never a separation to bridge.
// Nodes::ProcessingToken: what rate
enum class ProcessingToken {
    AUDIO_RATE,
    VISUAL_RATE,
    EVENT_RATE,
    CUSTOM_RATE
};
// Vruta::ProcessingToken: when
enum class ProcessingToken {
    SAMPLE_ACCURATE,
    FRAME_ACCURATE,
    EVENT_DRIVEN,
    MULTI_RATE,
    ON_DEMAND,
    CUSTOM
};
// Buffers::ProcessingToken: where and how
enum ProcessingToken : uint32_t {
    // Rate
    SAMPLE_RATE = 0x0,
    FRAME_RATE  = 0x2,
    EVENT_RATE  = 0x40,
    // Device
    CPU_PROCESS = 0x4,
    GPU_PROCESS = 0x8,
    // Concurrency
    SEQUENTIAL = 0x10,
    PARALLEL   = 0x20,
    // Composed backends
    AUDIO_BACKEND    = SAMPLE_RATE | CPU_PROCESS | SEQUENTIAL,
    GRAPHICS_BACKEND = FRAME_RATE | GPU_PROCESS | PARALLEL,
    AUDIO_PARALLEL   = SAMPLE_RATE | GPU_PROCESS | PARALLEL,
    INPUT_BACKEND    = EVENT_RATE | CPU_PROCESS | SEQUENTIAL,
};
Nodes define what rate. Buffers define where and how. Coroutines define when. Each subsystem speaks its own token language. None inherits from the others.
enum Domain : uint64_t {
    // Three subsystem tokens packed into one integer:
    // [Nodes::ProcessingToken << 32 | Buffers::ProcessingToken << 16 | Vruta::ProcessingToken]
    AUDIO             = (Nodes::AUDIO_RATE << 32)  | (Buffers::AUDIO_BACKEND << 16)    | Vruta::SAMPLE_ACCURATE,
    GRAPHICS          = (Nodes::VISUAL_RATE << 32) | (Buffers::GRAPHICS_BACKEND << 16) | Vruta::FRAME_ACCURATE,
    AUDIO_VISUAL_SYNC = (Nodes::AUDIO_RATE << 32)  | (Buffers::SAMPLE_RATE << 16)      | Vruta::FRAME_ACCURATE,
    AUDIO_GPU         = (Nodes::AUDIO_RATE << 32)  | (Buffers::GPU_PROCESS << 16)      | Vruta::MULTI_RATE,
    INPUT_EVENTS      = (Nodes::CUSTOM_RATE << 32) | (Buffers::WINDOW_EVENTS << 16)    | Vruta::EVENT_DRIVEN,
};

// Compose arbitrary domains at runtime
inline Domain compose_domain(Nodes::ProcessingToken n, Buffers::ProcessingToken b,
                             Vruta::ProcessingToken t)
{
    return static_cast<Domain>((static_cast<uint64_t>(n) << 32)
                             | (static_cast<uint64_t>(b) << 16)
                             | static_cast<uint64_t>(t));
}

// Decompose back into constituents
inline Nodes::ProcessingToken get_node_token(Domain d)
{
    return static_cast<Nodes::ProcessingToken>((d >> 32) & 0xFFFF);
}
inline Buffers::ProcessingToken get_buffer_token(Domain d)
{
    return static_cast<Buffers::ProcessingToken>((d >> 16) & 0xFFFF);
}
inline Vruta::ProcessingToken get_task_token(Domain d)
{
    return static_cast<Vruta::ProcessingToken>(d & 0xFFFF);
}
AUDIO_VISUAL_SYNC is not a special case someone built a bridge for. It is a different combination of the same three tokens. Any domain the user needs is one compose_domain call away.
"Audio" is a perceptual category. "Sample-accurate" is a scheduling constraint. MayaFlux encodes the latter. Domains are arithmetic, not taxonomy.
Live camera feed from /dev/video0. Three physical modeling networks: a ResonatorNetwork (vowel/formant voice modeling), a Waveguide string, and a Modal inharmonic body, all running at audio rate and mutually exciting each other.
All three drive the fragment shader directly through GPU storage buffers and push constants. The camera texture is distorted in real time by audio-rate physical models with zero bridging code. Graphics-rate exciters and keyboard/mouse input switch between exciters instantly.
Multiple domains. Multiple clocks. Multiple threads. One architecture.
The luthier’s constraints do not cascade into the player’s hands.