How r2decide Outperforms Doofinder in Live A/B Testing

2025-03-03

In our previous post, we demonstrated how r2decide excelled in standard search performance metrics such as F1-Score, NDCG, and MRR, surpassing other search solutions. While these offline evaluations are crucial for benchmarking search relevance, they don’t fully capture real-world user behavior and its impact on business outcomes.

To bridge this gap, we ran a live A/B test of r2decide in an operational e-commerce environment. This post details our findings from that experiment on Elli, a web store renowned for its stylish jewelry collection. The goal was to measure how r2decide performs under real business conditions with authentic user interactions.


Recap of Offline Performance

In case you missed our last post, we evaluated r2decide using well-established search metrics:

  • F1-Score: The harmonic mean of precision and recall, balancing how many returned results are relevant against how many relevant results are returned.
  • NDCG (Normalized Discounted Cumulative Gain): Captures ranking quality by rewarding relevant results more the higher they appear in the list.
  • MRR (Mean Reciprocal Rank): Averages the reciprocal of the rank of the first relevant result, indicating how early a correct result appears.
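
As a rough illustration of the three metrics above, here is a minimal Python sketch using their standard textbook definitions. It is not the evaluation code from our previous post; the function names, the linear-gain form of NDCG, and the input shapes are assumptions chosen for brevity.

```python
import math

def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def ndcg(gains, k=None):
    """NDCG for one ranked list of graded relevance gains (linear gain assumed)."""
    k = len(gains) if k is None else k

    def dcg(values):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def mrr(first_relevant_ranks):
    """Mean of 1/rank of the first relevant result per query.

    A rank of None means the query returned no relevant result.
    """
    scores = [1.0 / r if r else 0.0 for r in first_relevant_ranks]
    return sum(scores) / len(scores)
```

Each function scores one query (or, for `mrr`, a list of queries); a benchmark would average the per-query scores over a labeled query set.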

The results showed that r2decide consistently outperformed alternative search solutions, delivering superior relevance and ranking. However, offline benchmarks must be validated with real-world performance, leading us to our live A/B test.


A/B Test Design

To test r2decide in a live environment, we partnered with Elli, a Shopify-based jewelry store, and compared r2decide against Doofinder using a structured A/B test.

Experimental Setup

  1. Theme-Based Comparison

    • Two Shopify themes were deployed:
      • Theme A: Integrated with r2decide
      • Theme B: Integrated with Doofinder
    • This setup ensured that each visitor interacted with only one search experience, isolating the impact of the search engine on user behavior.
  2. Randomized Traffic Distribution

    • We used Intelligems, a dedicated A/B testing tool, to split the traffic evenly (50/50) between the two themes.
    • Each visitor was randomly assigned to either r2decide (Theme A) or Doofinder (Theme B), ensuring an unbiased comparison.
  3. Duration

    • The test ran for 15 days (Feb 11 - Feb 25), long enough to cover two full weeks and so to account for variations in user behavior across weekdays and weekends.
  4. Data Collection & Metrics

    • Google Analytics and Shopify’s predefined events were used to track user interactions.
    • We measured key business-impacting metrics:
      • Search-to-purchase conversion rate: The percentage of searchers who completed a purchase.
      • Checkout initiation rate: The percentage of users who proceeded to checkout after a search.
      • Revenue per visitor (RPV): A direct measure of revenue impact per site visitor.
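
To make the three metric definitions concrete, the sketch below computes them from per-visitor records. The `Visitor` record and its field names are hypothetical; in the actual test these values came from Google Analytics and Shopify events, whose export format is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Visitor:
    # Hypothetical per-visitor record; field names are illustrative only.
    variant: str            # "r2decide" or "doofinder"
    searched: bool          # used on-site search at least once
    started_checkout: bool  # reached checkout
    purchased: bool         # completed an order
    revenue: float          # total order value for this visitor

def summarize(visitors, variant):
    """Compute the three business metrics for one test variant."""
    group = [v for v in visitors if v.variant == variant]
    searchers = [v for v in group if v.searched]
    n_search = len(searchers)
    return {
        # Share of searchers who completed a purchase.
        "search_to_purchase_rate":
            sum(v.purchased for v in searchers) / n_search if n_search else 0.0,
        # Share of searchers who proceeded to checkout.
        "checkout_initiation_rate":
            sum(v.started_checkout for v in searchers) / n_search if n_search else 0.0,
        # Revenue divided by all visitors in the variant, searchers or not.
        "revenue_per_visitor":
            sum(v.revenue for v in group) / len(group) if group else 0.0,
    }
```

Note that the two rates use searchers as the denominator, while revenue per visitor uses every visitor assigned to the variant.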

Why This Approach?

  • Theme-based testing keeps the rest of the storefront the same across both groups, so differences in behavior can be attributed to the search experience.
  • Random assignment with an equal traffic split keeps the two groups comparable and guards against selection bias.
  • A 15-day testing period is long enough to collect a meaningful sample while capturing diverse shopping behaviors.
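
For readers curious how a stable 50/50 split can work in principle, here is a generic sketch of hash-based assignment. This is not Intelligems' implementation, which we did not write and do not describe here; the function name, the salt, and the theme labels are assumptions for illustration.

```python
import hashlib

def assign_theme(visitor_id: str, salt: str = "search-ab-test") -> str:
    """Deterministically map a visitor to one of two themes with equal probability.

    Hashing the visitor ID (rather than drawing a fresh random number on each
    page view) keeps a returning visitor in the same group for the whole test.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # integer in 0..99
    return "theme_a_r2decide" if bucket < 50 else "theme_b_doofinder"
```

Changing the salt reshuffles the assignment, which is useful when a new experiment should not reuse the previous grouping.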

Results: r2decide vs. Doofinder

Across the 50K+ users who took part in the A/B test, the results decisively favored r2decide:

[Figures: search-to-purchase conversion rate, checkout initiation rate, and revenue per visitor for r2decide vs. Doofinder]

  1. 8.6× Higher Search-to-Purchase Conversion

    • r2decide: 3.35% of searchers completed a purchase.
    • Doofinder: Only 0.39% of searchers made a purchase.
  2. 3.3× More Shoppers Initiate Checkout

    • r2decide: 46% of search-driven users started checkout.
    • Doofinder: Only 14% of search-driven users initiated checkout.
  3. 4% Higher Revenue per Visitor (RPV)

    • Visitors served r2decide search generated 4% more revenue per visitor than those served Doofinder-powered search.

These results indicate that, in a real-world e-commerce environment, r2decide substantially improved search-driven conversion and checkout initiation and delivered a modest gain in revenue per visitor.
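
The multipliers quoted above are simple ratios of the two rates, which the sketch below reproduces. It also includes a standard two-proportion z-test for readers who want to check significance on their own experiments. The function names are ours, and the per-variant searcher counts needed to run the test on this experiment are not published in this post, so the test function is shown without real inputs.

```python
import math

def relative_lift(rate_a: float, rate_b: float) -> float:
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Uses the pooled standard error, the usual
    large-sample approximation.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Ratios from the percentages reported above.
print(round(relative_lift(3.35, 0.39), 1))  # 8.6
print(round(relative_lift(46, 14), 1))      # 3.3
```

With small denominators the normal approximation becomes unreliable, in which case an exact test is the safer choice.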


Conclusion

Our A/B test on Elli’s Shopify store reinforces the insights from our offline evaluations: r2decide delivers a superior, higher-converting search experience compared to Doofinder. By optimizing search relevance and ranking, r2decide not only improves user satisfaction but also drives measurable business impact.

Interested in seeing how r2decide can boost your Shopify store’s search experience? Check out r2decide search on Shopify today!