Deep dive
How Roblox doubled AI code acceptance by teaching models to think like their engineers
March 23, 2026
Half of Roblox's engineering team uses AI coding assistants. But until recently, only about 20% of AI-generated suggestions survived human review. The models could write code, but they couldn't write Roblox code - the kind that follows two decades of internal conventions, avoids known pitfalls, and passes the review standards of senior engineers.
Roblox published a detailed blog post explaining how they fixed this. The results: AI pull request acceptance jumped from roughly 30% to over 60% across a dataset of 10,000 PRs. One internal code cleanup agent went from 46% accuracy to over 90%. On their golden evaluation set, pass rates hit 100%.
The approach has lessons for anyone using AI to write Roblox games, not just Roblox's own engineers.
The core idea: context over capability
Generic AI models are trained on the open internet. They know Python, JavaScript, and general programming patterns well. But they haven't seen 700,000 Roblox pull requests, 1.7 million internal code review comments, or 20 years of commits and design documents.
Roblox's insight was that the bottleneck wasn't model intelligence. It was domain knowledge. A model that doesn't know Roblox conventions will write code that technically works but fails review because it uses deprecated patterns, misses performance implications, or ignores established architectural decisions.
Instead of fine-tuning a model from scratch, Roblox built a system that feeds the right context to existing models at the right time.
Three techniques that made it work
1. Exemplar alignment
Roblox engineers curate “exemplars” - gold-standard code examples paired with explanations of why they are correct. When an AI touches similar code, the system retrieves the relevant exemplar and includes it in the prompt.
A concrete example from the blog: the system flags blocking FetchData calls inside high-frequency loops. It doesn't just say “don't do this.” It explains the latency risk and links to the internal async best practices documentation.
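Roblox's post does not describe the retrieval machinery in detail, but the mechanism can be sketched as a nearest-neighbor lookup over exemplar embeddings: find the stored exemplar closest to the change being made and prepend it, with its reasoning, to the prompt. Everything below is illustrative - the exemplar records are invented and the bag-of-words "embedding" is a toy stand-in for a real embedding model.

```python
import math
from collections import Counter

# Hypothetical exemplar store: gold-standard snippets paired with the
# reasoning a reviewer would give. In production these are curated by
# engineers, not hardcoded.
EXEMPLARS = [
    {"topic": "async data fetch in loops",
     "code": "-- wrap FetchData in task.spawn; batch requests outside the loop",
     "why": "Blocking calls inside high-frequency loops add per-frame latency."},
    {"topic": "yielding with task.wait",
     "code": "task.wait(1)  -- preferred over deprecated wait()",
     "why": "task.wait has more predictable scheduling than legacy wait."},
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(change_description: str) -> str:
    """Retrieve the closest exemplar and prepend it to the model prompt."""
    query = embed(change_description)
    best = max(EXEMPLARS,
               key=lambda e: cosine(query, embed(e["topic"] + " " + e["why"])))
    return (f"Follow this gold-standard example:\n{best['code']}\n"
            f"Reasoning: {best['why']}\n\nTask: {change_description}")

prompt = build_prompt("add a FetchData call inside the render loop")
```

The key design point is that the exemplar arrives with its "why" attached, so the model sees the reasoning, not just the pattern.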
The system also generates exemplars automatically by embedding historical review comments into vector space, clustering them by theme, and refining the clusters into general rules using LLM-assisted analysis.
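The pipeline above can be sketched end to end, with heavy simplification: embed each comment, group similar ones, and hand each group to a summarization step. Here the comments are invented, the embedding is a toy word-count vector, greedy threshold clustering stands in for whatever Roblox actually uses, and the LLM refinement step is stubbed out.

```python
import math
from collections import Counter

# Illustrative review comments; the real input is 1.7 million of them.
COMMENTS = [
    "Don't call FetchData inside the render loop, batch it instead",
    "FetchData in a loop will block the frame, move it out",
    "Please use task.wait instead of wait here",
    "wait() is deprecated, switch to task.wait",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().replace(",", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(comments, threshold=0.15):
    """Greedy single-pass clustering: attach each comment to the first
    cluster whose seed is similar enough, else start a new cluster."""
    clusters = []
    for c in comments:
        vec = embed(c)
        for cl in clusters:
            if cosine(vec, embed(cl[0])) >= threshold:
                cl.append(c)
                break
        else:
            clusters.append([c])
    return clusters

def refine_to_rule(cluster_comments):
    """Stand-in for the LLM-assisted step that distills a cluster of
    comments into one general rule."""
    return f"Rule derived from {len(cluster_comments)} similar review comments"

rules = [refine_to_rule(cl) for cl in cluster(COMMENTS)]
```

With these four comments, the FetchData-in-a-loop feedback and the task.wait feedback fall into separate clusters, each of which would become one general rule.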
2. Negative signal learning
Every rejected suggestion, failed refactor, and post-merge regression becomes training data. When the system generates new code, it searches this history of past mistakes and reviewer feedback to avoid repeating them.
This is the part most individual developers miss. If you are using AI for Roblox development, the model does not remember that it suggested something wrong last time. Roblox built an explicit memory layer for failures. You can approximate this by keeping a prompt file of patterns to avoid.
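A minimal version of such a memory layer is easy to approximate yourself. The sketch below records rejected patterns with their reasons and surfaces matching warnings before generating new code; the naive keyword match and the example failure entries are invented stand-ins (Roblox's system searches history with embeddings, not keywords).

```python
# Hypothetical failure memory approximating negative signal learning:
# every rejected suggestion is recorded with the reviewer's reason, and
# future prompts for similar changes include the relevant warnings.
failure_log: list[dict] = []

def record_failure(pattern: str, reason: str) -> None:
    failure_log.append({"pattern": pattern, "reason": reason})

def warnings_for(change: str) -> list[str]:
    """Naive keyword overlap; a real system would use embedding search."""
    change_words = set(change.lower().split())
    return [f["reason"] for f in failure_log
            if set(f["pattern"].lower().split()) & change_words]

# Example entries (illustrative, not from Roblox's dataset):
record_failure("Instance.new with parent argument",
               "Setting Parent before properties causes extra replication work")
record_failure("wait in loop", "wait() is deprecated; use task.wait")

notes = warnings_for("refactor the spawn loop that calls wait each frame")
```

Prepending `notes` to your prompt is the manual version of what Roblox automated.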
3. Hybrid symbolic-vector representation
Roblox unified their version control, build graphs, and runtime telemetry into a single knowledge structure. This lets the system understand relationships between code components that go beyond text similarity - things like which modules depend on each other, what code runs on hot paths, and where changes tend to cause regressions.
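The hybrid idea can be illustrated with a small sketch: a symbolic step walks the dependency graph to find everything a change could touch, then a vector-style step ranks and filters those candidates by relevance so the prompt stays small. The module names, graph edges, and similarity scores below are all invented for illustration.

```python
# Symbolic side: an explicit dependency graph (module -> dependencies).
DEPENDS_ON = {
    "MatchmakingService": ["NetworkQueue", "PlayerProfile"],
    "NetworkQueue": ["Telemetry"],
    "PlayerProfile": [],
    "Telemetry": [],
}

# Vector side: stand-in embedding similarity between the proposed change
# and each module (a real system would compute these, not hardcode them).
TEXT_SIMILARITY = {
    "MatchmakingService": 0.9,
    "NetworkQueue": 0.2,
    "PlayerProfile": 0.4,
    "Telemetry": 0.1,
}

def affected_modules(root: str) -> set[str]:
    """Symbolic step: transitive dependencies of the edited module."""
    seen, stack = set(), [root]
    while stack:
        mod = stack.pop()
        if mod not in seen:
            seen.add(mod)
            stack.extend(DEPENDS_ON.get(mod, []))
    return seen

def context_modules(root: str, min_similarity: float = 0.15) -> list[str]:
    """Hybrid step: keep real dependencies, but rank and filter them by
    text similarity so only the most relevant context reaches the model."""
    deps = affected_modules(root)
    return sorted((m for m in deps if TEXT_SIMILARITY[m] >= min_similarity),
                  key=lambda m: -TEXT_SIMILARITY[m])

mods = context_modules("MatchmakingService")
```

Text similarity alone would miss NetworkQueue entirely (it scores low), while the graph alone would drag in everything; combining both keeps the relevant dependency and drops the noise.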
What this means for Roblox developers
You do not have Roblox's infrastructure, but you can apply the same principles when using AI tools like BloxBot, Claude Code, or Cursor for your Roblox projects:
- Give the model your conventions. Keep a project rules file (like CLAUDE.md or a system prompt) that lists your coding standards. Use task.wait instead of wait. Require type annotations. Specify your client-server communication patterns.
- Include examples, not just rules. Roblox found that showing a model a correct implementation with reasoning beats a list of do's and don'ts. Paste a well-written script into your prompt as a reference.
- Track what goes wrong. When AI-generated code breaks or gets rejected during playtesting, write down the pattern and add it to your prompt context. This is the manual version of Roblox's negative signal learning.
- Use Roblox-specific benchmarks. Roblox open-sourced OpenGameEval, which now includes 30 debug-focused evaluations. If you are comparing models, this dataset tells you how well each one handles actual Roblox development tasks.
The bigger picture
Roblox's results confirm something the AI-assisted development community has been learning: raw model intelligence matters less than domain-specific context. A smaller model with the right Roblox conventions in its prompt will outperform a larger model working blind.
This is also why tools that connect directly to Roblox Studio through the MCP Server are valuable. They give AI agents access to your actual project state - the instance tree, scripts, properties - instead of asking the model to guess from a description.
The full technical breakdown is in Roblox's blog post. It is worth reading if you are building anything on the platform with AI assistance.