Predict A/B test outcomes before you launch.

Replica runs simulated user sessions across your control and treatment website variants, calibrated against your past A/B tests. Get predicted impact, confidence intervals, replays, and diagnostics in minutes.

Part of Bessemer Beam

Simulating · 4,237 / 50,000 sessions · Control vs. Treatment

Predicted lift · 95% CI: +0.8% to +6.0%

Headline metrics: average simulation time per experiment, simulated user sessions per experiment, and experiments runnable in parallel, all without risk or interference.

Features

Forecast the metric. See the behavior behind it.

Replica simulates user sessions across your control and treatment website variants, predicting lift and confidence intervals while showing the replays, reasoning, themes, and diagnostics behind the result.

01

Metric forecasts

See how each metric is predicted to move, with lift estimates, 95% confidence intervals, and a clear ship-or-skip recommendation. Segment results by user attributes for deeper analysis.
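As a rough illustration of how a lift estimate, its 95% confidence interval, and a ship-or-skip call fit together, the sketch below computes relative lift from conversion counts in control and treatment, using a normal approximation on the log of the rate ratio. The function names, the session counts, and the decision rule are assumptions made for this example; they are not Replica's actual method.

```python
import math

def relative_lift_ci(conv_c, n_c, conv_t, n_t, z=1.96):
    """Relative lift of treatment over control with an approximate 95% CI.

    Uses a normal approximation on log(p_t / p_c) (delta method).
    """
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    log_ratio = math.log(p_t / p_c)
    # Standard error of log(p_t / p_c) for two independent binomial samples.
    se = math.sqrt((1 - p_t) / conv_t + (1 - p_c) / conv_c)
    low = math.exp(log_ratio - z * se) - 1
    high = math.exp(log_ratio + z * se) - 1
    return p_t / p_c - 1, low, high

def recommend(low, high):
    # Hypothetical rule: ship only when the whole interval is above zero.
    if low > 0:
        return "ship"
    if high < 0:
        return "skip"
    return "inconclusive"

# Made-up counts: 25,000 simulated sessions per variant.
lift, low, high = relative_lift_ci(conv_c=2500, n_c=25000, conv_t=2650, n_t=25000)
print(f"lift={lift:+.1%}, 95% CI=({low:+.1%}, {high:+.1%}), {recommend(low, high)}")
```

An interval that straddles zero returns "inconclusive" under this rule, which is one simple way to separate a clear recommendation from a noisy one.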

02

Session replays

Watch each simulated user session from start to finish, with every click, scroll, pause, and input paired with what the user was thinking in a searchable transcript.

Transcript · actions + thoughts

00:01 › Clicked “View menu” · “Let me jump into the menu.”

00:04 › Scrolled through menu · “Wow, lots of options this week.”

00:06 › Clicked search filter · “Too many to pick from, let me filter.”

00:07 › Typed “vegetarian salads” · “Trying to eat lighter this week.”

00:10 › Clicked “Search” · “Alright, let me see what matches.”

00:12 › Clicked first meal · “Yeah, this looks good for Tuesday.”

00:14 › Clicked second meal · “And this one for Friday, kids will eat it.”

00:16 › Clicked “Checkout” · “OK, ready to check out.”

00:18 › Typed “FRESH10” · “Wait, I have that promo code from email.”

00:20 › Clicked “Place order” · “Done. Looking forward to dinner.”

03

Simulated user interviews

Interview individual simulated users to understand specific moments of conversion, hesitation, or drop-off in their sessions — or ask across all sessions to uncover broader patterns behind the forecast.

04

Auto-clustered themes

Replica analyzes session transcripts across control and treatment to identify recurring behavioral patterns, then ranks themes by frequency, relevance, and impact to show what mattered most across the simulation.

05

Searchable transcripts

Search across all sessions to quickly find moments of conversion, hesitation, confusion, or drop-off, then click any line to jump directly to that moment in the replay.
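To make the idea concrete, here is a minimal sketch of transcript search that returns the session and replay offset for each matching line. The `TranscriptLine` structure, the session IDs, and the substring matching are assumptions for illustration only; the sample lines reuse wording from the replay transcript above.

```python
from dataclasses import dataclass

@dataclass
class TranscriptLine:
    session_id: str  # hypothetical identifier for a simulated session
    seconds: int     # offset into the replay
    action: str
    thought: str

def search(lines, query):
    """Case-insensitive substring match over actions and thoughts."""
    q = query.lower()
    return [l for l in lines if q in l.action.lower() or q in l.thought.lower()]

lines = [
    TranscriptLine("s-001", 7, 'Typed "vegetarian salads"', "Trying to eat lighter this week."),
    TranscriptLine("s-001", 18, 'Typed "FRESH10"', "Wait, I have that promo code from email."),
    TranscriptLine("s-002", 16, 'Clicked "Checkout"', "OK, ready to check out."),
]

for hit in search(lines, "promo"):
    # The offset is what a replay player would seek to.
    print(f"{hit.session_id} @ {hit.seconds // 60:02d}:{hit.seconds % 60:02d}  {hit.action}")
```

Each hit carries the replay offset, so jumping to that moment only needs the session ID and the number of seconds.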

Accuracy

Calibrated against your real A/B test history

Replica backtests against past experiments where outcomes are already known, then tunes the models and simulated users until forecasted lift closely tracks actual lift. Once calibrated, Replica can forecast future website tests before launch.
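One way to picture a backtest is as a comparison of forecasted and actual lift over experiments whose outcomes are already known. The sketch below scores a set of past tests on mean absolute error and on directional accuracy (did the forecast pick the same winner). The experiment names, the lift values, and the choice of these two metrics are all assumptions for the example, not a description of Replica's calibration procedure.

```python
from dataclasses import dataclass

@dataclass
class PastExperiment:
    name: str
    forecast_lift: float  # lift predicted by the simulation
    actual_lift: float    # lift observed in the live A/B test

def backtest(experiments):
    """Return (mean absolute error, directional accuracy) over past tests."""
    n = len(experiments)
    mae = sum(abs(e.forecast_lift - e.actual_lift) for e in experiments) / n
    # A forecast is directionally correct when it picks the same winner.
    hits = sum((e.forecast_lift > 0) == (e.actual_lift > 0) for e in experiments)
    return mae, hits / n

# Made-up history, for illustration only.
history = [
    PastExperiment("checkout-cta", 0.034, 0.029),
    PastExperiment("pricing-layout", -0.012, -0.020),
    PastExperiment("signup-copy", 0.008, -0.003),
    PastExperiment("hero-image", 0.021, 0.025),
]

mae, accuracy = backtest(history)
print(f"MAE={mae:.4f}, directional accuracy={accuracy:.0%}")
```

Directional accuracy is the figure to compare against a coin flip, while the error metric shows how closely forecasted lift tracks actual lift.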

Random guess (coin-flip baseline)

A coin flip on every A/B test: no information, just luck.

+ Data integrations

Read-only connectors into your product analytics, experimentation, session replay, and warehouse tools so Replica can model your real users, traffic mix, and behavior patterns.

+ Finetuned models

Foundation models finetuned on real session recordings and transcripts so simulated users behave more like your actual users.

+ Backtest calibration

Tuned against past A/B test outcomes until forecasted lift closely tracks actual lift.

Case studies & research

Validated on real experiments and real user behavior

Replica’s case studies show forecasts matching live A/B test outcomes. Our research shows that simulations become more accurate when grounded in real user attributes, behavior data, and finetuning — the core methodology behind Replica.

Case study · Explainpaper

Primary funnel metrics predicted correctly

In under an hour, Replica predicted the same outcome that Explainpaper’s three-week live A/B test later showed across both primary signup-funnel metrics.

Read the case study →

Research · User data

Lower prediction error from realistic user modeling

Using Statsig user data, Replica modeled more realistic simulated users and reduced prediction error — showing that real user context improves forecast accuracy.

Read the paper →

Research · Finetuning

Higher action prediction accuracy

Finetuning on 5,000 real user session recordings helped Replica predict clicks, scrolls, inputs, and drop-offs more accurately than base GPT-4o.

Read the paper →

“The fact that Replica can accurately predict whether A or B is better means we can test extremely rapidly.”

CEO of B2C company, 400k+ users

How it works

Use your existing data stack to simulate real users

Replica uses your existing product and session data to create simulated users, finetune their behavior, and run thousands of browser sessions across your control and treatment variants. In minutes, you get predicted lift, confidence intervals, session replays, transcripts, and behavioral themes before launching the test.

Statsig · Amplitude · Optimizely · +50 more → Replica

01

Connect

Replica connects to your analytics, experimentation, session replay, and warehouse tools to create simulated users matched to your real audience. We use user attributes and traffic patterns to define each simulated user, then finetune their behavior on session recordings and action transcripts.

02

Simulate

Replica uses these simulated users to run thousands of web sessions across your control and treatment variants in minutes. Each simulated user views, thinks, scrolls, clicks, and types like a real user.

03

Decide

Predicted lift and 95% confidence intervals show what changed. Session replays, transcripts, and clustered behavioral themes show why. Ship or skip with quantitative signal and qualitative evidence.

Integrations

Statsig, Amplitude, Optimizely, Mixpanel, PostHog, Google Analytics, Hotjar

Snowflake, Databricks, PostgreSQL, MongoDB, MySQL, Redis, SQLite, Google Cloud

Where Replica fits

Prioritize the right tests before production

Replica runs before live experimentation. Use it to forecast impact, inspect behavioral evidence, and prioritize which website changes deserve real A/B test traffic. It helps teams test more ideas, filter out weak candidates, and make every live experiment count.

| Dimension | Replica simulated A/B test | Live A/B test | User interviews |
|---|---|---|---|
| Time to result | Minutes | 2–4 weeks | 1–2 weeks |
| Sample size per experiment | Unlimited | Capped by traffic | 5–15 participants |
| Production traffic consumed | None | Full allocation | None |
| Quantitative lift estimate | With 95% CI | With 95% CI | No |
| Qualitative reasoning | Replays + Q&A + themes | None | Direct quotes |
| Parallel experiments | Unlimited | Limited by traffic | Limited by ops |
| Confirms behavior in production | No | Yes | No |

Behind Replica

Built by experimentation veterans, supported by the best

Part of Bessemer Beam

See how Replica performs on your past A/B tests

Share past website A/B tests where you already know the outcomes. Replica calibrates its simulations against your experiment history and compares predicted lift to actual lift, so it is ready for production use on future tests.

Run a backtest with Replica