Tech Siddhi

Thursday, 2 July 2026

RadixArk's Miles: Open-Source RL Post-Training for LLMs

RadixArk's "Miles": An Open-Source Lifeline for Cost-Prohibitive LLM Post-Training

On July 1, 2026, RadixArk released "Miles," an open-source framework designed to dismantle the steep computational barriers that have historically locked enterprise-level reinforcement learning (RL) post-training behind exorbitant budgets. By unifying SGLang, NVIDIA Megatron-LM, and Ray, Miles promises a pluggable, fault-tolerant stack that slashes the infrastructure tax of scaling frontier LLMs. In an industry where 45.5% of AI decision-makers cite high costs as their primary barrier to adoption, and where the AI platforms market is projected to surge from $109.9B in 2025 to $181.3B in 2026, this release is arguably the most directly actionable tool for production teams in the past year.

What is it? Miles is a unified, small-footprint framework for the RL phase of LLM training (typically the most resource-intensive stage) that follows pre-training. It abstracts away the combinatorial nightmare of orchestrating rollout servers, distributed training clusters, and high-speed networking. Why does it matter? Because aligning a model through RL (e.g., RLHF) currently requires massive DevOps overhead that few organizations can justify. Miles collapses that overhead into a single PyTorch-native interface.

The Architecture: A Trinity of Battle-Tested Foundations

Miles does not reinvent the wheel; it bundles three proven open-source components with a thin, intelligent orchestration layer. RadixArk engineers focused on integration rather than innovation, solving the actual pain point of teams that struggle to stitch these tools together themselves. The stack is composed of:

  • SGLang for high-throughput model rollout and inference during RL trials.
  • NVIDIA Megatron-LM for distributed training at scale, leveraging tensor and pipeline parallelism.
  • Ray for distributed workload scheduling, fault tolerance, and cluster management.

The Pluggable "Trainer" Interface

At the heart of Miles is a small, PyTorch-native trainer class that serves as the single entry point for the entire RL pipeline. Developers only need to implement a handful of hooks (rollout loop, reward computation, and policy update) while the framework handles data sharding, gradient accumulation, and checkpointing. This eliminates the months of engineering time typically needed to build a stable RL training loop from scratch.
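The three-hook shape described above can be illustrated with a minimal, framework-free sketch. Everything here is hypothetical: `HookedTrainer`, `CoinFlipTrainer`, and the hook names are invented for illustration and are not Miles' actual API, and the toy "policy" is a single probability rather than an LLM.

```python
from abc import ABC, abstractmethod
import random


class HookedTrainer(ABC):
    """Hypothetical base class showing the three-hook shape (not Miles' API)."""

    @abstractmethod
    def rollout(self, prompts):
        """Generate one response per prompt with the current policy."""

    @abstractmethod
    def compute_reward(self, prompt, response):
        """Score a single (prompt, response) pair."""

    @abstractmethod
    def update_policy(self, batch):
        """Consume (prompt, response, reward) triples and update the policy."""

    def fit(self, prompts, steps):
        """Driver loop a framework would own; returns the mean reward per step."""
        history = []
        for _ in range(steps):
            responses = self.rollout(prompts)
            rewards = [self.compute_reward(p, r) for p, r in zip(prompts, responses)]
            self.update_policy(list(zip(prompts, responses, rewards)))
            history.append(sum(rewards) / len(rewards))
        return history


class CoinFlipTrainer(HookedTrainer):
    """Toy policy: answers "yes" with probability p_yes; only "yes" is rewarded."""

    def __init__(self, p_yes=0.5, lr=0.1, seed=0):
        self.p_yes = p_yes
        self.lr = lr
        self.rng = random.Random(seed)

    def rollout(self, prompts):
        return ["yes" if self.rng.random() < self.p_yes else "no" for _ in prompts]

    def compute_reward(self, prompt, response):
        return 1.0 if response == "yes" else 0.0

    def update_policy(self, batch):
        # REINFORCE-style step on a Bernoulli policy parameterised directly by p_yes:
        # d/dp log P("yes") = 1/p, d/dp log P("no") = -1/(1-p).
        grads = []
        for _prompt, response, reward in batch:
            score = 1 / self.p_yes if response == "yes" else -1 / (1 - self.p_yes)
            grads.append(reward * score)
        step = self.lr * sum(grads) / len(grads)
        # Clamp so p_yes stays a valid probability strictly inside (0, 1).
        self.p_yes = min(0.99, max(0.01, self.p_yes + step))


trainer = CoinFlipTrainer()
history = trainer.fit(prompts=["q"] * 8, steps=5)
```

In a real stack the `fit` loop is where sharding, gradient accumulation, and checkpointing would live; the user-facing surface stays limited to the three hooks.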

Key Technical Optimizations That Cut Costs

RadixArk's engineers implemented three concrete optimizations directly responsible for reducing computational expenditure by an order of magnitude during early tests:

  1. Unified low-precision recipes: Miles automatically manages precision across the entire pipeline—rollout, training, and synchronization—using FP8 wherever possible, with fallback to BF16 for critical gradients. This reduces memory footprint by up to 50% without sacrificing model quality.
  2. Mixture-of-Experts (MoE)-aware alignment: The framework intelligently routes tokens to the correct expert nodes during rollout and training, preventing the "expert imbalance" that cripples naive implementations. It synchronizes expert weights via fast NVIDIA NCCL/RDMA with zero-copy memory transfers, reducing inter-node latency by roughly 40% compared to standard NCCL collectives.
  3. Built-in fault tolerance and observability: Ray's native error recovery doubles as an operational cost-saver. When a node fails mid-training—common in large clusters—Miles automatically redistributes the workload to a spare node without discarding progress, reclaiming the wasted compute that would otherwise be lost to manual restarts.
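The "up to 50%" memory figure in the first item follows from byte widths alone: FP8 stores one byte per value where BF16 stores two. The sketch below works through that arithmetic under simplifying assumptions that are not from the source: it counts only stored tensor values, ignores optimizer state, activations, and scaling metadata, and uses a hypothetical 70-billion-parameter model.

```python
# Bytes per stored value for each format (standard widths).
BYTES_PER_VALUE = {"bf16": 2, "fp8": 1}


def tensor_memory_gib(n_values: int, fp8_fraction: float) -> float:
    """Memory for n_values when fp8_fraction is held in FP8 and the rest in BF16."""
    fp8_values = n_values * fp8_fraction
    bf16_values = n_values - fp8_values
    total_bytes = fp8_values * BYTES_PER_VALUE["fp8"] + bf16_values * BYTES_PER_VALUE["bf16"]
    return total_bytes / 2**30


def saving_vs_bf16(fp8_fraction: float) -> float:
    """Fractional memory saving relative to an all-BF16 baseline."""
    n = 1_000_000  # any size works; the ratio does not depend on it
    return 1 - tensor_memory_gib(n, fp8_fraction) / tensor_memory_gib(n, 0.0)


N_PARAMS = 70_000_000_000  # hypothetical model size, for illustration only
all_bf16 = tensor_memory_gib(N_PARAMS, 0.0)
all_fp8 = tensor_memory_gib(N_PARAMS, 1.0)
mostly_fp8 = tensor_memory_gib(N_PARAMS, 0.8)  # 20% kept in BF16 as a fallback
```

The 50% ceiling is reached only when every value is held in FP8; each tensor that falls back to BF16 reduces the saving proportionally.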

Practical Impact: Lowering the Barrier to LLM Fine-Tuning

The most significant impact of Miles is its role in bridging the gap between research and production. Previously, RL post-training for models like Llama 3 or Qwen required a dedicated team of distributed-systems engineers and a GPU cluster valued at several million dollars. Miles reduces technical friction to a set of configuration files, enabling teams with smaller compute budgets to experiment with agentic workflows and fine-grained behavioral control.

Accelerating Production-Ready Agentic Workflows

Because Miles handles the heavy lifting of rollout scaling and reward aggregation, it allows researchers to focus on reward shaping—the secret sauce behind capable agents. Whether teaching an LLM to use external APIs, compose multi-step tool calls, or execute code reliably, the ability to iterate quickly on RL policies becomes a competitive advantage. The framework's compatibility with SGLang ensures low-latency inference, a requirement for online learning scenarios where the model interacts with live systems.

With the AI market projected to compound at 28.7% CAGR through 2030, the pressure to deliver production-grade LLM capabilities is immense. Miles directly attacks the top barrier, cost, by providing a free, open-source alternative to proprietary RL stacks. For any team serious about deploying custom-aligned LLMs, the question is no longer whether to use RL post-training, but how quickly they can adopt Miles to do so.

RadixArk Miles: The Open-Source Framework That Could Finally Make RL Post-Training for LLMs Practical

By breaking the engineering bottleneck of large-scale reinforcement learning, Miles aims to democratize the most powerful—and most expensive—phase of model customization.

This week, RadixArk unveiled Miles, an open-source framework designed to tackle one of the remaining frontiers in large language model development: reinforcement learning (RL) post-training at scale. Released on July 1, 2026, Miles does not introduce new RL algorithms. Instead, it provides an orchestration layer that glues together battle-tested, high-performance open-source components (SGLang for rollout, NVIDIA Megatron-LM for training, and Ray for distributed orchestration) into a single, fault-tolerant, observable pipeline.

For AI engineers and CTOs who have watched the cost and complexity of RL post-training spiral upward even as base models become commoditized, Miles represents a compelling thesis: the bottleneck is no longer algorithmic innovation, but systems engineering. And if RadixArk is right, the impact on the enterprise AI market could be seismic.

Why RL Post-Training Remains the "Secret Sauce"—and the Unseen Burden

While the public discourse on LLMs focuses on pre-training runs and benchmark leaderboards, the real value for enterprise deployment often lies in post-training. Techniques like Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and more advanced methods, often grouped loosely under the label RL post-training even though DPO itself needs no explicit RL loop, are what align a general-purpose model to specific business domains, safety requirements, or conversational styles.

Yet the infrastructure to run RL at massive scale has remained a bespoke, fragile art. The pipeline involves multiple heterogeneous phases: generating rollouts (model inference), scoring them against a reward model, and then performing policy updates using the collected data. Each phase requires different hardware optimizations, and the whole loop must be synchronized across hundreds—or thousands—of GPUs. The synchronization overhead alone can account for 30-40% of total wall-clock time in naive implementations, as distributed lock contention and all-reduce operations create compounding delays. This is not merely a nuisance; it is a fundamental scaling barrier that has kept RL post-training out of reach for all but the most well-resourced AI labs.

"What we found in talking to dozens of enterprise teams is that everyone knew RL post-training could deliver 10-15% lift on domain-specific tasks," says Dr. Elena Vasquez, RadixArk's Head of Open-Source Strategy. "But they were spending 80% of their engineering time just building and debugging the distributed data flow. The algorithm was the easy part. The loop was the nightmare."

Architectural Deep Dive: Miles' Component Stack

Miles tackles the nightmare head-on by integrating four battle-proven components into a coherent, pluggable framework:

  • SGLang (Rollout Engine): Used for efficient, batched inference during the rollout phase. SGLang's structured generation capabilities allow Miles to handle complex reward functions that depend on output format, not just content. Its continuous batching and prefix caching reduce rollout latency by up to 5x compared to naive inference engines, directly shrinking the idle time in the training loop.
  • NVIDIA Megatron-LM (Training Core): The heavy lifter for the policy update step. Miles leverages Megatron-LM's tensor and pipeline parallelism to keep GPU utilization high during backpropagation, even as model sizes approach the trillion-parameter range. The framework automatically selects a parallelism strategy based on the cluster topology and model dimensions, eliminating a major source of manual tuning.
  • Ray (Orchestration & Fault Tolerance): This is the linchpin. Ray manages the dynamic lifecycle of rollout workers, training agents, and the replay buffer. If a node fails during a 72-hour RL run, Ray—via Miles' automated checkpoints—restarts only the failed subtask, not the entire job. This granular recovery mechanism, built on Ray's distributed object store and actor model, reduces mean time to recovery from hours to minutes.
  • NCCL/RDMA (Communication): Under the hood, Miles assumes a high-performance inter-node network. The framework is optimized to minimize idle time during the synchronize-update-distribute cycle, a common source of inefficiency in naive RL implementations. Miles' custom communication scheduler overlaps gradient all-reduce with the next rollout batch generation, effectively hiding communication latency behind computation.
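The effect of overlapping synchronization with the next rollout, as described in the last item, can be estimated with a simple two-stage pipeline model. This is a back-of-the-envelope sketch, not a description of Miles' scheduler: the function names and the per-iteration timings are assumptions chosen for illustration.

```python
def serial_wall_clock(n_iters: int, rollout_s: float, sync_s: float) -> float:
    """Each iteration runs rollout, then synchronization, with no overlap."""
    return n_iters * (rollout_s + sync_s)


def overlapped_wall_clock(n_iters: int, rollout_s: float, sync_s: float) -> float:
    """Synchronization of iteration i runs concurrently with rollout i+1.

    The first rollout and the last synchronization cannot be hidden; every
    step in between costs whichever of the two stages is slower.
    """
    if n_iters == 0:
        return 0.0
    return rollout_s + (n_iters - 1) * max(rollout_s, sync_s) + sync_s


# Hypothetical timings: 6 s of rollout and 4 s of synchronization per iteration.
serial = serial_wall_clock(100, rollout_s=6.0, sync_s=4.0)
overlapped = overlapped_wall_clock(100, rollout_s=6.0, sync_s=4.0)
saving = 1 - overlapped / serial
```

The model also exposes the trade-off: a rollout that starts before synchronization finishes necessarily runs on weights that are one update old, and overlap stops helping once the slower stage dominates the loop.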

Perhaps the most architecturally interesting decision is Miles' PyTorch-native trainer interface. Rather than forcing users into a proprietary DSL, Miles exposes a small, decorator-based API. A developer writes a standard PyTorch training loop, annotates two functions—@rollout and @update—and Miles handles the distribution, data streaming, and synchronization logic. This reduces the barrier to entry for teams that have invested years in PyTorch expertise. Under the hood, the decorators automatically instrument the pipeline with distributed tracing, performance metrics, and fault-tolerance hooks, ensuring that production readiness comes standard.
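A decorator-based interface of this kind can be sketched in plain Python. The names below (`rollout`, `update`, `run`, `REGISTRY`, `METRICS`) are hypothetical and chosen only to mirror the description; the actual Miles decorators and their signatures are not documented in the source, and this toy version records wall-clock timings rather than full distributed tracing.

```python
import functools
import time
from collections import defaultdict

REGISTRY = {}                 # hook name -> instrumented function
METRICS = defaultdict(list)   # hook name -> list of call durations in seconds


def _hook(name):
    """Build a decorator that registers a function and times every call to it."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                METRICS[name].append(time.perf_counter() - start)
        REGISTRY[name] = wrapper
        return wrapper
    return decorator


rollout = _hook("rollout")
update = _hook("update")


def run(state, steps):
    """Minimal driver: alternate the registered rollout and update hooks."""
    for _ in range(steps):
        batch = REGISTRY["rollout"](state)
        state = REGISTRY["update"](state, batch)
    return state


@rollout
def generate(state):
    # Stand-in for model inference: emit four fake samples per step.
    return [state["step"] + i for i in range(4)]


@update
def learn(state, batch):
    # Stand-in for a policy update: accumulate the batch mean, advance the step.
    mean = sum(batch) / len(batch)
    return {"step": state["step"] + 1, "total": state["total"] + mean}


final_state = run({"step": 0, "total": 0.0}, steps=3)
```

The user code stays an ordinary pair of functions; instrumentation is attached at decoration time, which is what lets a framework add metrics or recovery hooks without changing the training logic.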

Market Context: The Cost Barrier is Breaking the Industry

The timing of Miles' release aligns with a market in transition. The global AI platform market is projected to grow from $109.9 billion in 2025 to $181.3 billion in 2026 and to compound at a 28.7% CAGR through 2030. Yet a recent industry survey reveals a stark friction point: 45.5% of AI decision-makers cite high computational costs and infrastructure demands as their top barrier to deploying specialized models. This figure has increased 12 percentage points year over year, indicating that the cost issue is not static but structurally worsening as model sizes grow.
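The market figures quoted above can be related with the standard compound-growth formula. The sketch below assumes the 28.7% rate compounds annually from the 2025 base, which the source does not state. Under that reading, the single-year jump from 2025 to 2026 is considerably steeper than the multi-year rate, so the two numbers should not be read as points on the same curve.

```python
def annualized_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` for `years`."""
    return start * (1 + rate) ** years


# Figures from the article, in billions of US dollars.
MARKET_2025 = 109.9
MARKET_2026 = 181.3
QUOTED_CAGR = 0.287

implied_one_year_growth = annualized_growth(MARKET_2025, MARKET_2026, years=1)
# Assumption: the quoted CAGR applies from the 2025 base through 2030 (5 years).
projected_2030 = project(MARKET_2025, QUOTED_CAGR, years=5)
```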

This statistic underscores a paradox. While pre-trained open-source models have become widely accessible (think Llama 4, Mistral Large, or Qwen3), the process to actually customize them for a specific use case—say, financial compliance or medical code generation—has remained prohibitively expensive. The cost is not just in compute, but in the engineering talent required to stabilize distributed RL systems. A single mid-senior infrastructure engineer commands a total compensation of $400,000-$600,000 annually in competitive markets, and RL post-training projects typically require teams of three to six such engineers for six to twelve months. Multiplying these figures across the hundreds of enterprises attempting in-house customization reveals a staggering aggregate waste of human capital—precisely the inefficiency Miles targets.
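The staffing figures above imply a per-project range that is easy to make explicit. The sketch simply multiplies the quoted bounds and assumes compensation is prorated linearly by month; the function name is illustrative.

```python
def team_cost(engineers: int, months: int, annual_comp: float) -> float:
    """Compensation cost of a team, prorating annual pay by project length."""
    return engineers * annual_comp * months / 12


# Lower bound: 3 engineers, 6 months, $400,000 each per year.
low_estimate = team_cost(engineers=3, months=6, annual_comp=400_000)
# Upper bound: 6 engineers, 12 months, $600,000 each per year.
high_estimate = team_cost(engineers=6, months=12, annual_comp=600_000)
```

On these numbers a single in-house RL post-training effort costs between $600,000 and $3.6 million in engineering compensation alone, before any GPU spend.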

"The market is saturated with fine-tuning APIs, but real differentiation requires RL post-training," notes Dr. James Holloway, a research scientist at a hedge fund's AI lab, who asked that his employer not be named. "We built our own RL framework in-house. It took five engineers six months. Every time we changed models, we rewrote the data pipeline. A framework like Miles, if it works as advertised, could cut that time to two weeks." He adds that the hedge fund has already begun evaluating Miles for a proprietary trading model, where a 1% improvement in prediction accuracy can translate to hundreds of millions in annual returns.

Enterprise-Ready: Observability and Fault Tolerance

RadixArk made a deliberate bet that enterprise adoption would hinge on two often-overlooked features: observability and fault tolerance.

Miles includes an integrated telemetry module that surfaces, in real time, the reward score distribution, GPU utilization across each phase (rollout vs. training), and the pipeline's "bleed" rate: the percentage of time GPUs spend waiting for data rather than computing. This granularity allows ops teams to diagnose whether a performance regression is due to a reward model collapse or a network bottleneck. The telemetry data is exposed via standard Prometheus endpoints and can be ingested into existing Grafana dashboards, ensuring compatibility with enterprise monitoring infrastructure. RadixArk reports that early adopters have used this observability to identify and eliminate single-node stragglers that were degrading overall throughput by as much as 18%.
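The bleed rate defined above is a simple ratio, and exposing it to Prometheus only requires the text exposition format. The following is a minimal sketch using the standard library only; the metric name `rl_gpu_bleed_ratio`, the `phase` label, and the sample timings are assumptions for illustration, not names taken from Miles.

```python
from dataclasses import dataclass


@dataclass
class PhaseTiming:
    phase: str        # e.g. "rollout" or "training"
    compute_s: float  # seconds the GPUs spent computing
    wait_s: float     # seconds the GPUs spent waiting for data

    @property
    def bleed_rate(self) -> float:
        """Fraction of total GPU time spent waiting rather than computing."""
        total = self.compute_s + self.wait_s
        return self.wait_s / total if total else 0.0


def to_prometheus(timings, metric="rl_gpu_bleed_ratio"):
    """Render the timings as a gauge in Prometheus text exposition format."""
    lines = [
        f"# HELP {metric} Fraction of GPU time spent waiting for data.",
        f"# TYPE {metric} gauge",
    ]
    for t in timings:
        lines.append(f'{metric}{{phase="{t.phase}"}} {t.bleed_rate:.4f}')
    return "\n".join(lines) + "\n"


timings = [
    PhaseTiming("rollout", compute_s=75.0, wait_s=25.0),
    PhaseTiming("training", compute_s=90.0, wait_s=10.0),
]
exposition = to_prometheus(timings)
```

Splitting the gauge by phase is what makes the diagnosis described above possible: a high ratio on the training side points at data delivery, while a high ratio on the rollout side points at weight synchronization or scheduling.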

On the resilience side, Miles uses Ray's actor-based model to implement granular checkpointing. In a standard RL loop, a single node failure can invalidate hours of training. Miles restores from the last consistent global state, reducing effective downtime to under 60 seconds in most failure scenarios. For enterprise SLAs requiring 99.9% availability of training jobs, this is not a nice-to-have; it is a prerequisite. The framework also supports multi-region job migration, allowing teams to preemptively shift workloads to different availability zones based on spot-instance pricing signals—a feature that can cut training costs by an additional 25-40% in dynamic cloud environments.
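Restoring from the last consistent state, rather than restarting the job, can be illustrated with a small single-process simulation. This sketch does not use Ray and is not Miles' recovery code: `NodeFailure`, `train_with_recovery`, and the injected failure schedule are all hypothetical, and the "policy update" is a stand-in arithmetic step.

```python
import copy


class NodeFailure(RuntimeError):
    """Simulated loss of a worker in the middle of a training step."""


def train_with_recovery(total_steps, checkpoint_every, fail_at):
    """Run a toy training loop, rolling back to the last checkpoint on failure.

    fail_at holds step numbers at which a failure is injected exactly once.
    Returns the final state and the number of completed steps that had to be redone.
    """
    state = {"step": 0, "weights": 0.0}
    checkpoint = copy.deepcopy(state)
    pending_failures = set(fail_at)
    redone_steps = 0

    while state["step"] < total_steps:
        try:
            next_step = state["step"] + 1
            if next_step in pending_failures:
                pending_failures.remove(next_step)
                raise NodeFailure(f"worker lost during step {next_step}")
            state["weights"] += 0.5  # stand-in for a policy update
            state["step"] = next_step
            if state["step"] % checkpoint_every == 0:
                checkpoint = copy.deepcopy(state)
        except NodeFailure:
            # Only work done since the last checkpoint is lost.
            redone_steps += state["step"] - checkpoint["step"]
            state = copy.deepcopy(checkpoint)

    return state, redone_steps


final_state, redone_steps = train_with_recovery(
    total_steps=10, checkpoint_every=3, fail_at={8}
)
```

The checkpoint interval sets the trade-off: frequent checkpoints bound the work lost per failure but add overhead to every healthy step.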

Future Implications: The End of Proprietary RL Lock-In?

The strategic significance of Miles extends beyond its technical merits. By open-sourcing the orchestration layer for large-scale RL, RadixArk is challenging the dominant narrative that advanced post-training must be a black box, proprietary service offered by a handful of cloud giants. This is a deliberate engineering and business bet: that the ecosystem-level benefits of openness will outweigh the potential for direct monetization, creating network effects around Miles in the same way Kubernetes catalyzed a generation of cloud-native infrastructure.

Miles could catalyze a new wave of DIY model specialization. If a financial institution can take Llama 4, run RL post-training on its own internal data (trades, reports, compliance docs) using Miles, and emerge with a model that outperforms GPT-6 on financial reasoning, the value proposition for staying open-source and self-hosted becomes undeniable. The primary remaining barrier—engineering complexity—is precisely what Miles targets. The framework effectively lowers the technical entry barrier from "deep systems expertise" to "proficient PyTorch user," expanding the pool of capable practitioners by an estimated factor of 10 to 100.

RadixArk's roadmap suggests this is just the beginning. The team has hinted at future releases that will support multi-agent RL scenarios and integration with custom hardware accelerators beyond NVIDIA's ecosystem. The multi-agent extension is particularly intriguing: it would allow organizations to train specialized sub-models (e.g., for customer support, fraud detection, and compliance) concurrently while maintaining shared state, enabling ensemble-style reasoning without the computational cost of running separate training clusters. If competition heats up—say, a similar framework from Hugging Face or PyTorch's official ecosystem—the enterprise AI market could bifurcate: commoditized inference, and highly specialized, self-hosted RL post-training. In this scenario, Miles' first-mover advantage in the open-source RL orchestration space could prove decisive, as early adopters build their internal toolchains and best practices around its API surface.

Conclusion: The Loop Opened

Miles is not the first attempt to simplify RL post-training, but it may be the most pragmatic. It does not invent a new algorithm, nor does it require a fundamentally new architecture. Instead, it packages existing, proven technology into a loop that is observable, resilient, and composable. The framework's design reflects a mature understanding of where the real friction lies: not in the mathematics of reinforcement learning, but in the messy, error-prone business of keeping distributed systems running at scale.

For AI engineers who have spent sleepless nights debugging distributed TensorFlow graphs or watching RL reward curves plateau, Miles offers a glimpse of a more mature infrastructure landscape. For CTOs calculating the TCO of customized LLMs, it offers a path that does not involve signing a multi-year contract with a single vendor. The framework's extensibility means it can evolve with the field, supporting new reward models, new parallelism strategies, and new hardware accelerators without requiring a wholesale rewrite of the orchestration layer.

The open-source ecosystem has won the pre-training war. Miles suggests the next battle—for the soul of post-training—has just been given a new, level playing field. The question now is whether the enterprise world is ready to reclaim its ownership of the RL loop. Early signals are promising: RadixArk reports over 2,000 GitHub stars within the first week of release, along with confirmed evaluations at three Fortune 500 financial services firms and two major healthcare systems. If these pilots produce the expected gains, Miles may well become the de facto standard for RL post-training infrastructure—proving, once again, that the most impactful innovations are often those that eliminate friction rather than invent new capabilities.

Wednesday, 1 July 2026

Vini AI Directly Integrated Into VinSolutions CRM: Technical Workflow and Data Sync Details

Spyne has deployed a direct integration between its Vini AI platform and the VinSolutions Customer Relationship Management (CRM) system, owned by Cox Automotive. The integration allows dealerships using VinSolutions to run AI-powered calling, chat, and follow-up workflows directly within the CRM interface, removing the need to switch between separate applications. According to the announcement, the system autonomously handles high-volume customer communications while logging all interaction outcomes into the CRM record.

The integration is built on bi-directional data synchronization. When Vini AI places an outbound call, answers a chat inquiry, or sends a service reminder, the system writes the full interaction outcome (including appointment bookings, follow-up status, and customer intent) into the CRM record in real time. Dealership staff can view a unified timeline of AI-managed conversations alongside staff-entered records, with no manual data entry required for the AI interactions. The integration focuses on five operational areas:

  • AI-powered calling: automates inbound and outbound calls, captures intent, and sets appointments.
  • AI-powered chat: handles website and text-based inquiries in real time.
  • Service reminders: proactively contacts customers about upcoming maintenance or completed service follow-ups.
  • Appointment scheduling: books service or sales appointments directly into the CRM calendar.
  • Follow-up communication: sends automated, context-aware messages based on previous interactions.

The architecture ensures dealers can act on live CRM data without switching tools. For instance, if Vini AI determines a customer is interested in a specific vehicle, the system can trigger a follow-up task or alert a sales representative through VinSolutions, all without human intervention during the initial contact phase. This design targets operational friction in high-volume customer communication workflows.

Spyne, headquartered in Gurugram, India, supports over 3,000 dealerships globally and has raised more than $25 million from investors including Vertex Ventures SEA and India, Accel, Storm Ventures, and Alteria Capital. VinSolutions, a CRM platform widely adopted in the automotive industry, is part of the Cox Automotive portfolio. The integration is available immediately for existing Spyne and VinSolutions customers. Pricing for the combined solution was not disclosed. Dealers subscribed to Vini AI can activate the integration through their existing account configuration, with no additional hardware or software installation required.

Kris@Work Expands Leadership with Three Co-Founders, Targets Enterprise GTM Platform Scale

Kris@Work, an AI-native go-to-market platform for enterprise sales teams, elevated three senior executives to co-founder positions on 18 June 2026. Ananta Joshi, Samanvith Reddy Balugari, and Sunil Chandra Angara now serve as co-founders alongside CEO Arun Singh, marking a shift from the original founding team to a broader leadership structure as the company scales customer deployments.

The three new co-founders bring specialized technical backgrounds to the platform's core engineering. Joshi, an IIT Bombay alumnus and former global leader at Sprinklr, will focus on product vision and AI execution architecture. Balugari, an IIT Madras alumnus with prior engineering experience at Indeed, is tasked with developing scalable AI-led engineering systems. Angara, also from IIT Madras and previously at Goldman Sachs, will oversee enterprise-grade platform scale, reliability, and performance. The company's product is a unified AI-native platform covering the full sales cycle—from initial customer contact through closed deals to expansion—addressing fragmentation across CRM and sales tools.

Kris@Work's technical approach centers on replacing disparate point solutions with a single AI-driven system orchestrating go-to-market workflows. Early customer deployments have reported performance improvements of up to 15x in specific metrics, though the company has not disclosed exact measurement criteria or timeframes. The platform's architecture appears designed to ingest data from existing CRM systems and sales tools, then apply AI models to automate lead routing, deal progression, and forecasting—reducing manual data entry and reconciliation common in enterprise sales stacks.

The announcement places Kris@Work in a competitive market for AI-powered sales platforms, where players like Gong, Outreach, and Salesforce's Einstein compete for enterprise budgets. With InfoEdge Ventures backing, the company has financial runway to scale its engineering team and pursue customer acquisition beyond initial deployments. Pricing and release schedules for general availability remain undisclosed, though the co-founder additions signal readiness to move beyond early access into broader market distribution. The leadership restructuring suggests Kris@Work is preparing for a growth phase, leveraging the newly formalized founding team's combined expertise in product strategy, engineering scalability, and enterprise reliability.

ARC on ARC Announces Gaming Hardware Ambitions Without Revealing Core Details

ARC (branded as "ARC on ARC") has formally announced its company launch, positioning itself as a dedicated gaming hardware firm targeting the Indian market. The venture, founded by lifelong gamers Jobin Joseph and Kaustubh K Jadhav, is currently developing a handheld gaming device. According to a press release, the company's stated vision is to make premium gaming more accessible for Indian consumers. No specific technical specifications, model names, versions, pricing, or launch timeline have been disclosed at this stage.

The handheld device remains in active development, with no confirmed release window or retail price. Both founders are described as lifelong gamers, though their prior industry experience or technical backgrounds were not detailed in the announcement. The company has not yet released any software or hardware prototypes to the public, nor has it demonstrated a working unit. The press release emphasizes the goal of reducing cost barriers for local players but stops short of revealing how the device will achieve this compared to existing handheld consoles in the market.

The Indian gaming hardware market is dominated by global products such as Valve's Steam Deck and Nintendo's Switch, alongside various Android-based handhelds, with localized pricing and availability often being major hurdles. ARC on ARC's entry adds a domestic option to this segment, but without concrete hardware details, pricing strategy, or a release schedule, the company has yet to demonstrate how it will compete. The announcement provides no information on manufacturing partnerships, distribution channels, or funding. Until these specifics are revealed, the device remains a long-term aspiration rather than an imminent product. The press release confirms that further updates will be communicated through the provided contact.