Issues when using Async + VCR #381

Open
@izaguirrejoe

Description

I'm running into an issue when using Async, RubyLLM, and VCR together. I'm using Async to generate multiple embeddings concurrently, then replaying the requests with VCR in tests. On the first run (when no cassette is available), the embeddings are generated and saved correctly. On subsequent runs, which replay the cassette, the embeddings are shuffled: each term ends up with another term's vector. It seems like the fibers are getting crossed. I saw a (potentially) related issue here, but I'm not using async-http.

I notice that RubyLLM uses Faraday under the hood. There seems to be some strange interplay between Async, VCR, WebMock, and Faraday.

Here's a minimal reproducible script that shows the issue:

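require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'async'
  gem 'vcr'
  gem 'webmock'
  gem 'ruby_llm'
  # test-unit is a bundled gem, so it must be listed to be loadable under Bundler.
  gem 'test-unit'
end

# Require the libraries only after the inline Gemfile has installed and activated them.
require "async"
require "async/barrier"
require "vcr"
require 'test/unit'
require 'webmock/test_unit'
require "ruby_llm"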

RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV["OPENAI_API_KEY"] }
  config.hook_into :webmock
end

class AsyncVCRTest < Test::Unit::TestCase
  def test_async_faraday
    Sync do
      VCR.use_cassette("async") do
        result = {}
        barrier = Async::Barrier.new

        %w[dog cat turtle elephant flower].each do |term|
          barrier.async do
            embedding = RubyLLM.embed term
            # Let's grab the first component of the vector to compare later.
            result[term] = embedding.vectors.first
          end
        end
        barrier.wait

        pp result
        # This is an example second-run, after the cassette has been saved.
        # Notice that values have been swapped!
        # {"dog" => 0.046830196, "cat" => 0.028151099, "turtle" => 0.016412826, "elephant" => 0.05113775, "flower" => 0.02552942}

        # These are the correct values
        # Obtained from not using VCR, or the first-run when using VCR
        {
          "dog" => 0.051114134,
          "cat" => 0.02552942,
          "turtle" => 0.028160911,
          "elephant" => 0.046814237,
          "flower" => 0.016412826
        }.each do |key, expected|
          actual = result[key]
          # This will pass on the VCR first-run
          # and fail on subsequent runs.
          # There's some variability when creating embeddings, so
          # testing within a certain delta
          assert_in_delta(expected, actual, 0.01)
        end
      end
    end
  end
end
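
One explanation that would be consistent with the swapped values above, though it is not confirmed in this report: by default VCR matches requests on HTTP method and URI only, and all five embedding calls share both. If responses were recorded in the order they completed, a replay that hands out same-URI responses in recorded order would pair them with requests issued in a different order. The sketch below is a toy model of that behaviour, not VCR's implementation; `Interaction`, `ToyCassette`, `replay_all`, the endpoint URL, and the recording order are all hypothetical and chosen for illustration.

```ruby
# Toy model of cassette replay. This is NOT VCR's code; every name here is hypothetical.
Interaction = Struct.new(:http_method, :uri, :body, :response)

class ToyCassette
  def initialize(interactions, match_on:)
    @unused = interactions.dup
    @match_on = match_on
  end

  # Return the response of the first unused recording whose selected
  # attributes equal the request's, and mark that recording as used.
  def replay(request)
    index = @unused.index do |recorded|
      @match_on.all? { |attr| recorded[attr] == request[attr] }
    end
    raise "no matching interaction" if index.nil?

    @unused.delete_at(index).response
  end
end

# Placeholder endpoint (assumption); every request targets the same URI.
ENDPOINT = "https://api.example.test/embeddings"

# Assumed recording order: the order in which concurrent responses completed.
RECORDED = %w[elephant turtle flower dog cat].map do |term|
  Interaction.new(:post, ENDPOINT, term, "vector-for-#{term}")
end

# Replay requests in the order the test issues them.
def replay_all(match_on)
  cassette = ToyCassette.new(RECORDED, match_on: match_on)
  %w[dog cat turtle elephant flower].to_h do |term|
    request = Interaction.new(:post, ENDPOINT, term, nil)
    [term, cassette.replay(request)]
  end
end

loose  = replay_all(%i[http_method uri])       # method + URI only
strict = replay_all(%i[http_method uri body])  # also compare the request body

pp loose
pp strict
```

In this model, matching on method and URI alone pairs "dog" with the response recorded for "elephant", while also comparing the body pairs every term with its own response. If the same mechanism is at work here, passing `match_requests_on: [:method, :uri, :body]` to `VCR.use_cassette` may be worth trying.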
