Shopify A/B Testing Agency
CONVERTIBLES runs structured, multi-variation A/B testing programs for Shopify Plus brands doing $2M+ in revenue. 2 to 4 experiments per month on the elements that actually move revenue. Built, shipped, and measured by one team, so test winners don't sit in a backlog.
Last reviewed May 2026 by Julian Samarjiev, Co-founder of Convertibles. Methodology validated across 1,000+ live A/B tests on $2M+ DTC Shopify brands including Jones Road Beauty, Performance Golf, and Gymreapers. Tools of record: Intelligems (official partner) and TestBuddy.
What is Shopify A/B testing?
Shopify A/B testing is the process of running controlled experiments on a Shopify store to measure which version of a page, offer, or element generates more revenue. A/B testing compares a control (the current experience) against one or more variants, splits traffic between them, and declares a winner once the data reaches statistical significance.
Real A/B testing is not "try a new button color and see what happens." It is a structured program: pick the right elements to test based on revenue impact, run enough traffic to hit significance, ship winners, and compound the gains month over month.
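To make "statistical significance" concrete, here is a minimal sketch of one common way to compare a control against a single variant: a two-sided two-proportion z-test on conversion rate. It is an illustration under stated assumptions, not a description of how Intelligems or any other platform computes results. The function name and the example counts are hypothetical, and conversion rate stands in for revenue here because revenue per visitor needs a different test.

```python
import math


def two_proportion_z_test(control_orders, control_visitors,
                          variant_orders, variant_visitors):
    """Two-sided z-test for a difference in conversion rate.

    Returns (z, p_value). Hypothetical helper for illustration only.
    """
    rate_control = control_orders / control_visitors
    rate_variant = variant_orders / variant_visitors

    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (control_orders + variant_orders) / (control_visitors + variant_visitors)
    std_error = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )

    z = (rate_variant - rate_control) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 3.0% vs 3.6% conversion on 10,000 visitors per arm.
z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 5%: {p_value < 0.05}")
```

A p-value below the chosen threshold (5% in this sketch) is what "reaching significance" usually means. The threshold should be fixed before the test launches, not adjusted after looking at the data.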
At a Glance
- Test velocity: 2 to 4 experiments designed, built, and shipped per month
- Test type: Multi-variation (2 to 4 variants per test, not just A vs B)
- Cycle length: ~2 weeks to statistical significance for most tests
- Focus: Needle-moving components (offers, PDPs, collections, cart, checkout), not button colors
- Platform: Intelligems (official partner) + TestBuddy (our proprietary program management tool)
- Typical impact: Winning tests generate $10,000 to $100,000+ in additional monthly revenue
- Measured by: Year-over-year revenue growth, not test win rate
Why Most A/B Testing Programs Don't Move Revenue
Three problems kill most testing programs.
First, the wrong tests. Teams test what's easy, not what matters. Button colors, headline tweaks, padding changes. Those tests run, sometimes they win, but the revenue impact is flat. Meanwhile the real blockers (a broken PDP layout, a weak offer, a slow collection page) go untouched.
Second, the handoff problem. A typical setup has a CRO agency running tests and a dev shop implementing winners. The test wins, the spec gets written, the dev team is booked for six weeks, and by the time the winner ships the insight is stale. We fix this by owning both sides. Winners ship in days, not months.
Third, the wrong metric. Agencies optimize for test win rate ("we won 60% of our tests") because it sounds good in a report. We optimize for year-over-year revenue growth. A losing test that teaches us something about the customer is worth more than a winning test that lifts a low-traffic page by 2%.
What We Test
Offers and Promotions
The single highest-impact test category. We test offer structure (percentage off vs. dollar off vs. bundle vs. gift with purchase), thresholds, and messaging across segments. Offers beat copy tweaks ten times out of ten.
Product Detail Pages
Layout, hero image treatment, social proof placement, trust signals, upsell positioning, variant selectors, and CTA copy. PDPs convert or they don't. This is where the money is.
Collection Pages
Sort order, filtering, product card design, merchandising, and navigation. Collection pages are often the most-trafficked pages on the store after the homepage, yet they get the least attention.
Cart and Checkout
Cart upsells, free shipping thresholds, progress indicators, trust badges, checkout flow friction. Every percentage point here is pure revenue. For a worked example, see the $50K/month cart drawer free gift progress case study, or our dedicated Shopify checkout optimization service for the full cart-to-confirmation program.
Hero and Homepage
Hero imagery, headline, offer framing, and primary CTA. Homepage tests get high traffic volume, so results read out quickly. For a subscription example, see the homepage sticky CTA test that lifted conversion rate by 20.4% by keeping the join action visible on a long-scroll page.
Popups and Email Capture
Timing, offer, copy, and segment targeting. We test popups as revenue drivers, not just list builders.
How We Design Tests
- Audit and prioritize. Review analytics, heatmaps, and session recordings to identify where revenue is leaking. Rank opportunities by impact and confidence, not by what's easy.
- Write the hypothesis. Every test starts with a hypothesis that ties back to a specific user behavior or revenue metric. No "let's see what happens" tests.
- Build multi-variation tests. 2 to 4 variants per test, not just A vs B. Multi-variation tests move faster and teach more, as long as the page has enough traffic to support them.
- Ship and monitor. Winners roll to 100% of traffic across segments. Losers get documented and feed into the next test.
- Feed insights across the stack. On-site winners inform ad creative. Ad learnings inform on-site tests. Two sides, one feedback loop.
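The "rank by impact and confidence" step above can be sketched as a simple scoring pass over a test backlog. This is only an illustration: multiplying estimated revenue at stake by a confidence score is an assumed way to combine the two, and the `Opportunity` class, its field names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    monthly_revenue_at_stake: float  # hypothetical estimate, in dollars
    confidence: float                # 0.0 to 1.0, strength of supporting evidence

    @property
    def score(self) -> float:
        # Assumed scoring rule: expected impact weighted by confidence.
        return self.monthly_revenue_at_stake * self.confidence


def rank(opportunities):
    """Return opportunities sorted from highest to lowest score."""
    return sorted(opportunities, key=lambda o: o.score, reverse=True)


# Made-up backlog for illustration.
backlog = [
    Opportunity("PDP layout", 40_000, 0.6),
    Opportunity("Button color", 1_000, 0.9),
    Opportunity("Offer structure", 80_000, 0.5),
]

for item in rank(backlog):
    print(f"{item.name}: {item.score:,.0f}")
```

Note that ease of implementation is deliberately left out of the score, so an easy but low-impact idea such as the button color change sinks to the bottom of the list.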
Our Tools
Intelligems (Official Partner)
Intelligems is our primary testing platform. Unlike legacy tools that only split traffic and measure conversion, Intelligems also supports price testing, personalization, and profit optimization on Shopify. We are an official Intelligems partner and were among the early agencies to deploy the platform at scale across client programs.
TestBuddy (Proprietary)
TestBuddy is a visual personalization and test management tool we built in-house because no off-the-shelf tool did what we needed. Clients get real-time visibility into what's live, what's being tested, and how every experience is performing. No spreadsheets, no status meetings, no "where are we on that test?" questions.
Proof
Jones Road Beauty ($160M+)
CEO Cody Plofker publicly endorsed CONVERTIBLES on X/Twitter: "I'm a very happy client. They have a unique testing methodology." See the post.
Performance Golf (9-figure)
We rebuilt Performance Golf's websites into a performant frontend experience and ran the testing program that drives revenue on top of it. Donnie French, Copy Chief at Performance Golf, shared a video testimonial about the work. Watch it here.
Gymreapers (athletic apparel)
Roc Pilon, Founder & CEO of Gymreapers, on the work: "CONVERTIBLES are highly versed e-commerce experts. Their ability to understand demand generation and demand capture and be VERY tactical within the ecommerce landscape makes them highly valued team members to any company looking for growth."
Results at Scale
Individual tests have produced $60K to $110K/month in additional revenue for brands at scale. Most clients see measurable lifts within the first 2 to 3 months of the program. For 36 published examples across PDPs, collections, cart, homepages, and popups, see our Shopify A/B test case studies ($2.3M+/month in aggregate measured lift).
A/B Testing Is Part of the Full CRO Program
A/B testing on its own is half the job. Tests only compound when winners ship fast, speed issues get fixed, and ad traffic lands on pages built for conversion. That's why our A/B testing service is part of our full Shopify CRO program, which bundles testing, landing pages, speed, dev, and Google & YouTube Ads under one team.
Who This Is For
- Shopify Plus brands doing $2M+/year in revenue with enough traffic to hit significance
- Brands running ads but unsure if the on-site experience is converting well
- Teams that have tried A/B testing in-house or with a tool-only vendor and stalled
- Brands that want a structured testing program, not ad-hoc "experiments"
- Marketing leaders who want revenue lift, not test reports
Before scoping a testing program, founders usually want to see what the gap looks like at their traffic and AOV. Our conversion rate calculator returns a monthly revenue gap by vertical.
Who This Is Not For
- Brands under $2M/year or with low traffic volume (A/B testing needs statistical power)
- Brands looking for a self-serve testing tool (we are a full-service program, not software)
- Brands that want to optimize for test win rate instead of revenue growth
Frequently Asked Questions
How many A/B tests do you run per month?
2 to 4 experiments per month is our standard cadence. We prioritize impact over volume. Running 10 low-quality tests is worse than running 3 well-designed ones that move revenue.
How long does a typical A/B test take?
Most tests reach statistical significance within 2 weeks, depending on traffic volume and effect size. Tests on lower-traffic pages or smaller effects take longer. We plan test duration before launch based on expected sample size.
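As a rough illustration of planning duration from expected sample size, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function names, the 5% significance level, the 80% power, and the example inputs are all assumptions made for illustration, not figures from our program.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in each arm to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_days(baseline_rate, relative_lift, daily_visitors, arms=2):
    """Days to fill every arm, assuming traffic is split evenly across arms."""
    per_arm = visitors_per_variant(baseline_rate, relative_lift)
    return math.ceil(per_arm * arms / daily_visitors)


# Made-up inputs: 3% baseline conversion, 20% relative lift, 2,000 visitors/day.
print(visitors_per_variant(0.03, 0.20))
print(estimated_days(0.03, 0.20, daily_visitors=2_000, arms=2))
```

The formula shows why smaller effects and lower-traffic pages take longer: halving the expected lift roughly quadruples the visitors required, and each extra arm adds another full sample to collect.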
Do you run A/B tests or multi-variation tests?
Both. We default to multi-variation tests (2 to 4 variants) when traffic supports it because they test more ideas per cycle, so the program learns faster. Classic A vs B tests work better on lower-traffic pages: with traffic split across only two arms, each arm collects visitors faster and the test reaches statistical significance sooner.
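One statistical wrinkle with multi-variation tests is that every extra variant is another chance for a false positive. A common and conservative guard is the Bonferroni correction, which divides the significance threshold by the number of variant-vs-control comparisons. The sketch below is an assumption about how this could be handled, not a statement of what any particular testing platform does. The function name and p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which variants beat the Bonferroni-adjusted threshold.

    p_values maps a variant name to its p-value against the control.
    """
    if not p_values:
        raise ValueError("at least one variant p-value is required")
    threshold = alpha / len(p_values)  # stricter bar as comparisons increase
    return {name: p <= threshold for name, p in p_values.items()}


# Made-up p-values for three variants tested against one control.
results = bonferroni_significant({"B": 0.004, "C": 0.03, "D": 0.2})
print(results)
```

In this example variant C has a p-value below 0.05 but does not clear the adjusted threshold of 0.05 / 3, which is the kind of borderline result the correction is meant to filter out.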
What testing tool do you use?
Intelligems is our primary platform. We are an official Intelligems partner and have deep expertise in their personalization, price testing, and profit optimization tools. For program management and visual tracking of live tests, we use TestBuddy, our proprietary tool.
What kinds of things do you test?
Offers, PDPs, collection pages, cart and checkout, hero sections, and popups. We focus on elements that move revenue, not button colors or cosmetic changes. If a test can't plausibly move revenue by at least a few percentage points, we don't run it.
Can you ship winning tests, or do we need our own dev team?
We ship winners ourselves. That's a core reason clients hire us. Most CRO agencies hand off winners to a dev team and wait weeks. We own both sides, so winners go live in days.
How do you measure success?
Year-over-year revenue growth, not test win rate. A high win rate is easy to game by running small tests on low-risk pages. What matters is whether the business is making more money this year than last year, and whether the testing program is a meaningful contributor to that.
Book a Strategy Call
If you run a Shopify Plus brand doing $2M+ in revenue and want a structured A/B testing program that actually moves the number, let's talk.