How To Build A Web App, part 12 of ?: VCR cassettes, service architecting

Mikey Clarke
Feb 14, 2020

This is the twelfth in a series of articles taking you through all the actual steps in building a web app. If you’re an aspiring developer, if mucking around with teensy beginner tutorials frustrates you, if you’d love to build a properly substantial app that does fab things, these articles are for you.

Last time, we wrote our first test. It verified that Ticketmaster’s raw API response is structured the way we’d like.

Today, we’ll use the VCR gem to record and cache Ticketmaster’s response data, so that we can run that test over and over and over and never hit Ticketmaster more than once per day. Then we’ll write and test a new function for our Ticketmaster service: it’ll massage its API’s raw response data into something containing only the data points we’re after, and nothing else.

Onward!

Wrap our API test with VCR

So far so good! Next: set up VCR so’s that we don’t clobber Ticketmaster every single time we run our tests. Which is often. That’s the whole point of tests.

First: Gemfile. VCR needs an HTTP-stubbing library to hook into, and WebMock is a common choice. WebMock is a library for “stubbing” HTTP requests (that means replacing their expensive responses with something faster/cheaper/spicier).

(By the way, for those who don’t know — “expensive” is just techie jargon for any operation that requires lots of time and/or processing power. It doesn’t mean money.)

Add webmock to the base file, and vcr to the :test group:

# Gemfile
...
gem 'webmock'
...
group :test do
  ...
  gem 'vcr'
end
...

VCR requires a bit of setup too. Create a file in spec/support called vcr.rb and add this:

VCR.configure do |config|
  config.cassette_library_dir = "spec/vcr"
  config.hook_into :typhoeus
end

What’s this? What it says on the tin. VCR’s cassettes go into the spec/vcr directory, and hook_into names the HTTP library whose requests VCR intercepts and records: here, the Typhoeus gem.
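One caveat worth flagging: hook_into decides which library's requests get intercepted, and HTTParty sends its requests through Ruby's built-in Net::HTTP, not through Typhoeus. If your cassette file never shows up, here is a minimal alternative sketch, assuming the webmock gem from the Gemfile step is loadable in the test environment. It is not the configuration used in the rest of this series, just something to try:

```ruby
# spec/support/vcr.rb (alternative sketch, an assumption rather than this series' setup)
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'spec/vcr'
  # WebMock intercepts Net::HTTP, which is what HTTParty uses under the hood.
  config.hook_into :webmock
end
```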

Quick tangent: gem management

You may be bitching thusly:

…Waaaitasec … I thought we already had an HTTP library installed. HTTParty, yeah? Do HTTParty and Typhoeus not do the same thing? Why not use just one or just the other? Why both?

Good question! Answer: they don’t quite do exactly the same thing … but they do have a ton of overlap. They’re both HTTP libraries. They both make HTTP requests.

Why don’t I use just one or just the other? Answer: here are the options for VCR’s hook_into. To set up VCR properly, we must use one of them. And HTTParty isn’t on there! Hmph and double hmph.

But I rather like HTTParty! It’s fab and funky and has a nice vibe. I’d be incredibly reluctant to abandon it. Yeah, running two HTTP libraries side by side is a fair amount of duplication. If we were in a strict, professional, minimalist setting, I’d likely favour refactoring my existing Ticketmaster Service code away from HTTParty and embracing, say, Net::HTTP or something.

Thing is, in any good-sized Rails project, it’s quite incredibly easy to get carried away with gem usage! There’s a term, “franken-code”, for a project that uses a million billion external libraries or plug-ins or extensions. That is bad. The bigger your third-party dependency list, the longer it takes to install them, the greater the likelihood of encountering a bug from forcing two gems never designed to work together to be neighbours … all kinds of problems. It’s excellent practice to be extremely cautious about adding to your Gemfile without a damn good reason.

…But gosh-darn-it! It’s my casual project. I’d rather enjoy myself. So nyah.

Tangent over

Okay, now VCR’s actual usage. Back we go to spec/services/ticketmaster_spec.rb. Let us make these changes:

# spec/services/ticketmaster_spec.rb
...
it "returns the schema we're expecting" do
  query = {
    classificationName: 'Comedian',
    size: 1,
    apikey: Gigs::Application.credentials.ticketmaster_key
  }
  path = 'https://app.ticketmaster.com/discovery/v2/events.json'
  VCR.use_cassette 'services/ticketmaster' do
    response = HTTParty.get(path, query: query).parsed_response
    errors = JSON::Validator.fully_validate(schema, response)
    expect(errors).to eq []
  end
end

Note the main change. The expect meat-and-potatoes is now inside a block, which we pass to VCR.use_cassette.

And now we re-run our VCR-spruced-up test:

$ rspec spec/services/ticketmaster_spec.rb
.
Finished in 0.17794 seconds (files took 4.68 seconds to load)
1 example, 0 failures

Success!

If you peek into spec/vcr/services, you’ll see a brand new file, ticketmaster.yml, chock-full of our cached response. Each time we re-run the test, it’ll return this, instead of chewing up our Ticketmaster API quota.
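If the record-then-replay idea feels a bit magical, here is a toy sketch of it in plain Ruby. This is not how VCR is implemented; with_cassette and fake_request are made-up names, and the "request" is just a lambda that counts how often it gets called:

```ruby
require 'yaml'
require 'tmpdir'

# Replay a saved response if the cassette file exists; otherwise run the
# block (the "real" request), save its result as YAML, and return it.
def with_cassette(path)
  return YAML.load_file(path) if File.exist?(path)

  response = yield
  File.write(path, response.to_yaml)
  response
end

# Hypothetical stand-in for an expensive API call.
calls = 0
fake_request = lambda do
  calls += 1
  { 'status' => 200, 'body' => 'pretend Ticketmaster JSON' }
end

cassette = File.join(Dir.mktmpdir, 'ticketmaster.yml')
first  = with_cassette(cassette) { fake_request.call } # records to disk
second = with_cassette(cassette) { fake_request.call } # replays from disk
```

The second call never touches fake_request, which is exactly the favour VCR does for our API quota.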

But we don’t want to cache this single recorded result until the heat death of the universe. Ticketmaster may one day change its response schema. The whole point of this specific test is to detect that! We’d very much like to know about it.

Soooo…. let’s re-record a fresh API copy, say, every day. We tell VCR that by passing it a parameter called re_record_interval:

# spec/services/ticketmaster_spec.rb
...
VCR.use_cassette 'services/ticketmaster', re_record_interval: 1.day do
  response = HTTParty.get(path, query: query).parsed_response
  errors = JSON::Validator.fully_validate(schema, response)
  expect(errors).to eq []
end
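Conceptually, a re-record interval boils down to a staleness check on the recording's timestamp. The sketch below is an illustration only, not VCR's internals; stale_recording? is a hypothetical helper, and ONE_DAY stands in for Rails' 1.day so the snippet runs without ActiveSupport:

```ruby
ONE_DAY = 24 * 60 * 60 # seconds, the plain-Ruby equivalent of Rails' 1.day

# Hypothetical helper: true when a recording made at `recorded_at` is older
# than `interval` seconds as of `now`.
def stale_recording?(recorded_at, interval, now: Time.now)
  recorded_at + interval < now
end

now = Time.utc(2020, 2, 14, 12, 0, 0)
two_days_old = stale_recording?(now - 2 * ONE_DAY, ONE_DAY, now: now) # => true
one_hour_old = stale_recording?(now - 3600, ONE_DAY, now: now)        # => false
```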

Awesome! Let’s commit all this, usual story, you know the drill by now, git add [your files], then git commit -m "[message]". Here’s my version.

Write API data to local database

Great. We’ve confirmed a raw data source, we’ve tested its structure, sweet.

Now we’ll filter and massage and simplify that raw data source and its structure, to have it contain only standup comedy data relevant to our interests, to make it much much easier to insert only the right bits into our database.

Recap: we want to iterate over every single Gig, Venue and Act in the entire Ticketmaster database pertaining to standup comedy, then save to our own local database any gig/venue/act data not already present there.

Optional: we may also want to update certain attributes and relationships on existing Gigs, Venues and Acts, depending on what they are. Cross that bridge when we come to it, maybe in a later article. For now, just stick with adding new ones. MVP yo.

We talked in article 11 about something called Services. Now we’re going to write one. Recall that a Service is basically just a Rails class/module/library/whatever that makes third-party API calls. Like to Ticketmaster. Standard practice is to plonk them in app/services/[name]_service.rb.

And … you know what, I also mentioned in article 11 test-driven development. It’s not a bad idea when you already have a really specific and well-formed idea of the exact functions you want to test … but sometimes, you really are just feeling your way forward. Like here. So what the hell. I shall write my functions first and my tests second. Onward.

I’ve created a file called ticketmaster_service.rb. After a bit of head-scratching and manually messing about with Ticketmaster, I’ve written this:

# app/services/ticketmaster_service.rb
module TicketmasterService
  EVENTS_URL = 'https://app.ticketmaster.com/discovery/v2/events.json'

  # Hit the Ticketmaster event discovery API,
  # iterate over every page of every single
  # standup comedy event, grab the IDs, names
  # and other relevant attributes we may decide
  # on later.
  #
  def self.get_all_gigs
    # We're going to fill this little bugger up thusly:
    # gigs = [{ t_id, name, venue: { t_id, name }, act: { t_id, name }}, ...]
    gigs = []
    page = 0

    while true do
      response = HTTParty.get(EVENTS_URL, query: {
        classificationName: 'Comedian',
        page: page,
        size: 20,
        apikey: Gigs::Application.credentials.ticketmaster_key
      })

      response.code == 400 ? break : page += 1
      response.parsed_response['_embedded']['events'].each do |event|
        gigs << {
          ticketmaster_id: event['id'],
          name: event['name'],
          venue: {
            ticketmaster_id: event['_embedded']['venues'][0]['id'],
            name: event['_embedded']['venues'][0]['name'],
          },
          act: {
            ticketmaster_id: event['_embedded']['attractions'][0]['id'],
            name: event['_embedded']['attractions'][0]['name'],
          }
        }
      end
    end
    gigs
  end
end

Not bad, eh? Here’s a quick once-over.

The purpose of get_all_gigs is preprocessing. It’s to transform each gig’s data-structure from this enormous nonsense, https://gist.github.com/tyrant/25b2d24c1c283baa16ab77a25ea48335 to this:

[{
ticketmaster_id: xxx,
name: yyy,
venue: {
ticketmaster_id: zzz,
name: aaa
},
act: {
ticketmaster_id: bbb,
name: ccc
}
}, {
...
}]

So much easier, right?
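To see the transformation on something small, here is a sketch that slims down a single event. The sample_event hash is invented and heavily trimmed, and slim_gig is a hypothetical helper, not part of the service above. It uses Hash#dig, which returns nil when a nested key is missing instead of raising; whether Ticketmaster ever omits those keys is an assumption on my part:

```ruby
# Hypothetical, heavily trimmed event; real Ticketmaster events carry far more.
sample_event = {
  'id' => 'evt_1',
  'name' => 'An Evening of Jokes',
  '_embedded' => {
    'venues' => [{ 'id' => 'ven_1', 'name' => 'The Chuckle Hut' }],
    'attractions' => [{ 'id' => 'act_1', 'name' => 'A. Comedian' }]
  }
}

# Reduce one raw event to the slimmed-down gig hash described above.
def slim_gig(event)
  venue = event.dig('_embedded', 'venues', 0) || {}
  act   = event.dig('_embedded', 'attractions', 0) || {}
  {
    ticketmaster_id: event['id'],
    name: event['name'],
    venue: { ticketmaster_id: venue['id'], name: venue['name'] },
    act: { ticketmaster_id: act['id'], name: act['name'] }
  }
end

gig  = slim_gig(sample_event)
bare = slim_gig({ 'id' => 'evt_2', 'name' => 'No extras' }) # no '_embedded' key at all
```

With the second call, the venue and act fields simply come back as nil rather than blowing up with a NoMethodError.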

The main feature is the while-loop. It loops forever, executing HTTParty.get over and over again, bumping up the page-count each time, until we hit a 400 Bad Request error response, and exit the loop.

I did a bit of experimenting. Turns out the Ticketmaster API refuses to return more than 1000 entries. Combined with the maximum allowed page size being 20, this means we can hit 50 pages max. Bah. So be it.
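The arithmetic and the stop-on-400 logic can be checked without touching the network. In this sketch fake_fetch is a made-up stand-in for the API that mimics the behaviour I observed (a 400 once we run past 1000 entries), and collect_all is a hypothetical helper that pages through it the same way get_all_gigs does:

```ruby
PAGE_SIZE = 20
MAX_ENTRIES = 1000

# Stand-in for the API: pages past the 1000-entry cap answer with a 400.
fake_fetch = lambda do |page|
  first = page * PAGE_SIZE
  if first >= MAX_ENTRIES
    { code: 400, events: [] }
  else
    { code: 200, events: (first...(first + PAGE_SIZE)).to_a }
  end
end

# Keep requesting pages until the fetcher reports a 400, collecting events.
def collect_all(fetch)
  events = []
  page = 0
  loop do
    response = fetch.call(page)
    break if response[:code] == 400

    events.concat(response[:events])
    page += 1
  end
  [events, page]
end

events, pages = collect_all(fake_fetch)
# events.size is 1000 and pages is 50: pages 0 through 49 succeed, page 50 gets the 400.
```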

What do we test? We wish to verify that yoinking Ticketmaster’s data returns an array of our yoinked gigs in the format we’d like. It’s another JSON schema test, but much much simpler.

I was also going to tweak the HTTParty.get call a bit, this time to fetch just a single page, so we don’t spray out 50 calls and wait ages and ages. But aha, VCR comes to our rescue. Accessing 50 local pages will take us no time at all.

Okay … here’s my first attempt at testing TicketmasterService.get_all_gigs:

# spec/services/ticketmaster_spec.rb
...
describe '#get_all_gigs' do
  let!(:schema) {
    {
      type: 'array',
      items: {
        type: 'object',
        required: ['name', 'ticketmaster_id', 'venue', 'act'],
        properties: {
          name: { type: 'string' },
          ticketmaster_id: { type: 'string' },
          venue: {
            type: 'object',
            required: ['name', 'ticketmaster_id'],
            properties: {
              name: { type: 'string' },
              ticketmaster_id: { type: 'string' },
            }
          },
          act: {
            type: 'object',
            required: ['name', 'ticketmaster_id'],
            properties: {
              name: { type: 'string' },
              ticketmaster_id: { type: 'string' },
            }
          }
        }
      }
    }
  }

  it "returns the schema we're expecting" do
    VCR.use_cassette 'services/ticketmaster_service_get_all_gigs', re_record_interval: 1.day do
      gigs = TicketmasterService.get_all_gigs
      errors = JSON::Validator.fully_validate(schema, gigs)

      expect(errors).to eq []
    end
  end
end

Okay, let’s do a cheeky rspec spec/services/ticketmaster_spec.rb:98, and let’s see what we get.

Why yes it takes 27 years, as it’s stockpiling all 50 API requests inside its VCR cassette…

…And:

$ rspec spec/services/ticketmaster_spec.rb:98
Run options: include {:locations=>{"./spec/services/ticketmaster_spec.rb"=>[98]}}
F
Failures:
1) Ticketmaster service #get_all_gigs returns the schema we're expecting
Failure/Error: expect(errors).to eq []
expected: []
got: ["The property '#/359/venue/name' of type null did not match the following type: string in schema 3e0... type null did not match the following type: string in schema 3e07369a-8dd3-5494-b33c-b3d0bde1da0f"]
(compared using ==)
# ./spec/services/ticketmaster_spec.rb:103:in `block (4 levels) in <top (required)>'
# ./spec/services/ticketmaster_spec.rb:99:in `block (3 levels) in <top (required)>'
Finished in 1.78 seconds (files took 3.15 seconds to load)
1 examples, 1 failure
Failed examples:
rspec ./spec/services/ticketmaster_spec.rb:98 # Ticketmaster service #get_all_gigs returns the schema we're expecting

It failed! What happened?

Take a close look at that exact schema error. "The property '#/359/venue/name'...". That’s the venue name of the entry at index 359 in our response array. From an array 1000 entries long. Geez. Deep!

And take note of the ellipses in the centre of that error array. It’s truncated. So I hopped into Byebug and examined the entire thing. Too big to fully reproduce non-messily, but long story short, we’re getting null venue name errors for array entries 359, 360, 376, 403, 462, 464, 529, 655, 659, 664, 670, 671, 687, 696, 703, 837, 886, 888, 897, 905, 927, 932, 933, 941, 946, 947, 949, and 954. 28 of these 1000 venues have no names. That ain’t trivial.
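If you’d rather not squint at a debugger, a couple of lines of Ruby will list the offending indices for you. The gigs array here is an invented three-entry miniature, not real Ticketmaster data:

```ruby
# Hypothetical miniature of the gigs array; the second venue has no name.
gigs = [
  { name: 'Gig A', venue: { ticketmaster_id: 'v1', name: 'Venue One' } },
  { name: 'Gig B', venue: { ticketmaster_id: 'v2', name: nil } },
  { name: 'Gig C', venue: { ticketmaster_id: 'v3', name: 'Venue Three' } }
]

# Indices of every gig whose venue name is nil, plus their share of the total.
nameless = gigs.each_index.select { |i| gigs[i][:venue][:name].nil? }
share = (100.0 * nameless.size / gigs.size).round(1)
```

Run against the real 1000-entry array, the same two lines would give the list of indices above and the 2.8% figure.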

Well, well. Turns out Ticketmaster is just fine with 2.8% of their venues having no names. Okay then! Looks like we’ll have to be fine with that too. Either that or have our own code swap in a generated placeholder name whenever the API data is null, something like venue.name="name_#{venue.id}". Either works.
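For the placeholder route, a minimal sketch might look like this. with_placeholder_name is a hypothetical helper operating on the slimmed-down venue hashes, and it isn't wired into the service in this article:

```ruby
# Return the venue hash unchanged if it has a name; otherwise return a copy
# with a generated placeholder built from its Ticketmaster ID.
def with_placeholder_name(venue)
  return venue unless venue[:name].nil?

  venue.merge(name: "name_#{venue[:ticketmaster_id]}")
end

named    = with_placeholder_name({ ticketmaster_id: 'v1', name: 'Venue One' })
nameless = with_placeholder_name({ ticketmaster_id: 'v2', name: nil })
# nameless[:name] is now "name_v2"; named is left untouched.
```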

But! Consolations flourish. You can see that re-running the same test took 1.78 seconds, not 27 years. Sweet. VCR is indeed doing its job. Though if you examine spec/vcr/services/ticketmaster_service_get_all_gigs.yml, you’ll see it’s gigantic. 10711898 bytes! Though that’s to be expected. Fifty pages of gig JSON.

I’ve just spent many minutes staring at https://jsonapi.org/format/ until the streets ran red with Burgundy’s blood, and I can’t see a thing about how to specify an attribute can have either a certain datatype or null … so I’ve just simply commented that line out, plus an actual comment denoting this.
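For what it’s worth, these schema hashes follow JSON Schema (json-schema.org), which is a different spec from JSON:API, and JSON Schema lets type be an array of type names. So an alternative to commenting the line out, assuming the validator gem in use honours type arrays, would be a venue sub-schema along these lines. NULLABLE_VENUE_SCHEMA is just an illustrative name:

```ruby
# Venue sub-schema where name may be either a string or null.
# `required` still insists the key is present; only its value may be null.
NULLABLE_VENUE_SCHEMA = {
  type: 'object',
  required: ['name', 'ticketmaster_id'],
  properties: {
    name: { type: ['string', 'null'] },
    ticketmaster_id: { type: 'string' }
  }
}.freeze
```

Dropped into the venue slot of the schema above, this keeps the check on everything else about the venue while tolerating Ticketmaster’s nameless ones.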

Okay, running again …

$ rspec spec/services/ticketmaster_spec.rb:98
Run options: include {:locations=>{"./spec/services/ticketmaster_spec.rb"=>[98]}}
.
Finished in 2.34 seconds (files took 4.22 seconds to load)
1 example, 0 failures

Success!

Let’s make a commit. I’d also done a bit of housekeeping, and renamed the cassette spec/vcr/services/ticketmaster.yml to spec/vcr/services/ticketmaster_service_api.yml, so we can add that change too.

You can see my version of that commit here.

That’ll do for today! Nice work. We’ve written a service that obtains for us every standup-comedy gig in the Ticketmaster database, and written a test that ensures the API response schema is indeed in the structure we desire. Awesome. Give yourself a pat on the back.

Next time: we’ll write another method for the same service that copies the right bits of that response into our database, plus tests.
