Airline availability and solving the travelling salesman problem

The first part of this viewpoint explored how Aerospike could help hospitality distribution. Here we turn to air travel, where there are similar issues, compounded by a version of the "travelling salesman problem".

No, not an actual salesman…

NB: This is the second part of a two-part viewpoint from Max Rayner, a partner at Hudson Crossing. Here he explores some of the challenges for the airline distribution market as well as some of the cost implications for both sectors.

In air travel we’re talking about the classic computer science problem of searching a graph (all the possible routes, with connections, between origin and destination).

Where hotels have rate plans, airlines have a multiplicity of fare bases. The fare basis dictates conditions for cancellations and changes, the ability to upgrade, etc… and this means that what might look like one plane with two cabins is really a complex web of multiple "products".

Explaining "legal" connections, interlining and selfie-do connections without actual interlining is a whole other matter that may deserve its own write-up.

But if you get the general idea, consider now the need to understand price and availability for every "legitimate" path in the graph even as airlines are changing availability and dynamically revenue managing costs on each segment, cabin, seat type and fare basis.
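
To make the graph idea concrete, here is a minimal sketch in Python. The airports, the fares and the `enumerate_paths` helper are all hypothetical, and it assumes a path's price is simply the sum of its segment fares, which real fare construction does not guarantee. It is an illustration of the shape of the problem, not of how any pricing engine actually works.

```python
from typing import Dict, List, Tuple

# Hypothetical one-way fares in USD per segment: origin -> {destination: fare}.
SEGMENT_FARES: Dict[str, Dict[str, int]] = {
    "SFO": {"JFK": 420, "ORD": 180, "DEN": 120},
    "DEN": {"ORD": 110, "JFK": 270},
    "ORD": {"JFK": 150},
    "JFK": {},
}


def enumerate_paths(
    origin: str, destination: str, max_segments: int = 3
) -> List[Tuple[List[str], int]]:
    """Return every loop-free path with its total fare, cheapest first."""
    results: List[Tuple[List[str], int]] = []

    def walk(airport: str, path: List[str], total: int) -> None:
        if airport == destination:
            results.append((path, total))
            return
        if len(path) - 1 >= max_segments:
            return  # segment budget used up without reaching the destination
        for nxt, fare in SEGMENT_FARES.get(airport, {}).items():
            if nxt not in path:  # never revisit an airport
                walk(nxt, path + [nxt], total + fare)

    walk(origin, [origin], 0)
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    for path, total in enumerate_paths("SFO", "JFK"):
        print(" -> ".join(path), total)
```

Even this toy network yields four priced paths for one city pair. Every fare in the table is something an airline could change at any moment, which is why the answers go stale so quickly.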

Having a good (if expensive) answer to this problem is why ITA Software was worth $700 million to Google, and why its leading challenger, Vayant Travel Technologies, just received a significant strategic investment from Deutsche Lufthansa AG.

It is also why major distributors of travel are hit with extraordinary computational challenges, whether they’re facing the internet as B2C players (leading metas like Kayak, OTAs like Expedia) or as B2B players (wholesalers like HotelBeds, GTA and Tourico, switches like Pegasus, and of course the GDSs).

So far, not necessarily scary, until you consider the interconnected nature of travel shopping

As users search more and increasingly casually (say on a mobile just to see what prices look like, or first on a mobile, then at a work computer and then on a home tablet), distributors face millions of people shopping for prices and availability.

Distributors in turn ask wholesalers and direct providers, which in turn monitor one another, creating a gigantic echo chamber.

This would not be so bad if the need were just to cache prices for a reasonably static product at Amazon. But the need in travel is to maintain a heterogeneous system with intense read/write loads so that you can simultaneously:

Check your own data for price and availability.
Request data from switches, wholesalers and primary providers such as hotels or airlines.
Explore CRM angles if you can identify the shopper as a prior customer.
Explore personalization, retargeting, and cohort optimization whether you can identify the shopper or not.
Revenue optimize sort order and other aspects of how you respond to a search.
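
The "simultaneously" in that list is the hard part. As a rough sketch only, the first two items can be pictured as a concurrent fan-out in Python. The source names, the prices and the `fetch_offer` and `handle_search` functions are hypothetical stand-ins, with `asyncio.sleep(0)` in place of real network or database round trips.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical prices that each source would return for the same query.
HYPOTHETICAL_PRICES = {"own_cache": 210, "wholesaler": 189, "hotel_direct": 205}


async def fetch_offer(source: str, query: str) -> Dict[str, Any]:
    """Pretend to ask one source for price and availability."""
    await asyncio.sleep(0)  # stand-in for a network or database round trip
    return {"source": source, "query": query, "price": HYPOTHETICAL_PRICES[source]}


async def handle_search(query: str) -> List[Dict[str, Any]]:
    """Query every source concurrently and return offers, cheapest first."""
    offers = await asyncio.gather(
        *(fetch_offer(source, query) for source in HYPOTHETICAL_PRICES)
    )
    return sorted(offers, key=lambda offer: offer["price"])


if __name__ == "__main__":
    for offer in asyncio.run(handle_search("2 nights, city centre")):
        print(offer["source"], offer["price"])
```

Each shopper search fans out into several reads, and every answer that comes back is also a write to your own store. That is the read/write mix described above.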

Hospitality details: How does this translate into costs?

In some cases of lesser efficiency with traditional technologies, we find the direct cost of a search can be as high as nine thousandths of a cent ($0.00009).

In this definition "direct" means all costs for an additional quantum of searches – hardware, software, bandwidth and people but excluding SG&A and other relatively static costs.

While nine thousandths of a cent per search may sound absurdly excessive, basic math with pro-rated costs can get you there: add the physical layer costs, network/bandwidth, software license costs, variable human input costs, and third party costs that may come into play for some subset of the searches.

Now let’s check how many searches have to happen before a traveller books…

You may recall it’s not unusual for a large wholesaler to see 150,000:1 look-to-book ratios.

That would mean 150,000 * $0.00009, or $13.50 per booking.

On a booking for a single night at $80, with a margin of let’s say 15%, that would mean that the margin of $12 would NOT cover the direct search costs.

This might be a corner case, so let’s consider hotel industry typical numbers of something like 2.4 nights per stay and $150 average daily rate. Assuming a 15% margin that means $54 per booking to play with.

Reversing direction now, we can ask: what look-to-book ratio would cause a channel to be unprofitable? Dividing the $54 margin by $0.00009 per search gives 600,000, so any channel using our distributor that exceeds 600,000 searches per booking would be operating at a loss.
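
For readers who want to rerun the hotel arithmetic with their own numbers, here is a small Python sketch. The function names are made up for illustration, and the inputs are the figures quoted above. `Decimal` is used so that tiny per-search costs do not pick up floating-point noise.

```python
from decimal import Decimal

# Direct cost of one search, as quoted above: nine thousandths of a cent.
COST_PER_SEARCH = Decimal("0.00009")


def search_cost_per_booking(look_to_book: int) -> Decimal:
    """Direct search cost carried by each booking at a given look-to-book ratio."""
    return COST_PER_SEARCH * look_to_book


def margin_per_booking(nights: str, nightly_rate: str, margin_rate: str) -> Decimal:
    """Distributor margin on one booking; arguments are decimal strings."""
    return Decimal(nights) * Decimal(nightly_rate) * Decimal(margin_rate)


def break_even_look_to_book(margin: Decimal) -> Decimal:
    """Searches per booking at which search cost consumes the whole margin."""
    return margin / COST_PER_SEARCH


if __name__ == "__main__":
    print(search_cost_per_booking(150_000))         # the wholesaler case
    print(margin_per_booking("1", "80", "0.15"))    # the single-night corner case
    typical = margin_per_booking("2.4", "150", "0.15")
    print(typical, break_even_look_to_book(typical))
```

Changing the margin rate or the cost per search shows how quickly the break-even ratio moves.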

Airline use cases are also hairy

Airfare price and availability searches are also a challenge. Imagine that you want to handle both sudden peaks in demand and broad queries over many alternative dates.

As mentioned above, the computational challenge here is to solve a "travelling salesman" style problem, with costs associated with each graph segment and in fact each combination of segments.

There are a number of services that can handle a fairly simple search. Global distribution systems have been doing it for years, and often serve not only in their legacy capacity of answering travel agent queries, but also as API providers. When acting as API providers, costs per search vary, but let’s say they might hover around $0.03 per search for your average small client.

The main alternative to GDS APIs has been ITA Software, which provides more sophisticated and arguably more cost effective searches now that it is under Google… let’s say something around $0.02 per search depending on volumes.

Emerging challengers include startups such as Vayant, which use modern technology to offer data more cost effectively.

But now let’s see what an innocent price calendar for airfare might imply: say you want to show shoppers a 90-day calendar, where they can see the best price for departing on any given day and returning on any of the subsequent days up to 15 days later.

So there are ~90 possible departure dates and each one could be paired up with 15 different return dates (basically covering an overnight trip all the way to a two week vacation). So that’s 1,350 searches to build a calendar of prices.
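
The count is easy to check by generating the date pairs. This Python sketch uses a hypothetical `calendar_queries` helper and an arbitrary start date. It only enumerates the (departure, return) pairs that would each need a priced search, and does not call any fare service.

```python
from datetime import date, timedelta
from typing import List, Tuple


def calendar_queries(
    first_departure: date, departure_days: int = 90, max_stay_nights: int = 15
) -> List[Tuple[date, date]]:
    """List every (departure, return) pair the price calendar must cover."""
    queries: List[Tuple[date, date]] = []
    for offset in range(departure_days):
        depart = first_departure + timedelta(days=offset)
        # Stays run from one night up to max_stay_nights nights.
        for nights in range(1, max_stay_nights + 1):
            queries.append((depart, depart + timedelta(days=nights)))
    return queries


if __name__ == "__main__":
    pairs = calendar_queries(date(2024, 1, 1))
    print(len(pairs))  # 90 departures x 15 stay lengths
```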

So far so good, except that look-to-book ratios for air distributors can be somewhere between 500:1 and 1,000:1. These can be even higher for B2B API providers such as GDSs, ITA Software and Vayant.

Suppose then this interesting challenge: build 90-day custom calendars with look-to-book ratios of 500:1 and 1,350 queries each, and suppose that every time one person books, the price or availability or both may change.

Now suppose that determining "best" really means exploring all relevant and acceptable costs of getting from A to B with any possible layover in between. No surprise then that costs can be on the order of $0.02 per individual visitor search, with look-to-book ratios of 500:1 and four searches per visit.

In this extreme case (500 visits × 4 searches × $0.02) this would net out to effective costs of as much as $40 per booking. Now most US domestic air tickets don’t have $40 of margin to spare.
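
The same back-of-envelope sum in Python, for anyone who wants to vary the inputs. The function names are illustrative. The calendar figure is an extrapolation rather than a number from any provider: it assumes each of the 1,350 calendar queries is billed at the same $0.02 with nothing served from cache.

```python
from decimal import Decimal

COST_PER_SEARCH = Decimal("0.02")  # per-search figure used in the text


def search_cost_per_booking(
    look_to_book: int, searches_per_visit: int, cost_per_search: Decimal = COST_PER_SEARCH
) -> Decimal:
    """Search spend needed, on average, to produce one booking."""
    return cost_per_search * look_to_book * searches_per_visit


def uncached_calendar_cost(
    queries: int = 1350, cost_per_search: Decimal = COST_PER_SEARCH
) -> Decimal:
    """Cost of one price calendar if every cell were a separately billed search."""
    return cost_per_search * queries


if __name__ == "__main__":
    print(search_cost_per_booking(500, 4))   # 500 visits x 4 searches x $0.02
    print(uncached_calendar_cost())          # assumption: no caching at all
```

On those assumptions a single uncached calendar would cost more than half of the $40 per-booking figure. That is the reason distributors cache aggressively and then struggle to keep the cache fresh.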

It’s here that something like Aerospike can really make a difference.

Typical arrangements with legacy technologies would scale at a rate of about one additional server for each peak of about 500 searches/second.

At first this might seem ridiculously low throughput, but we have observed this in practice, often when traditional RDBMSs are involved so that record locks and other latency inducing issues gate high performance for simultaneous reads and writes.

On the other hand, with Aerospike:

An individual server easily reaches and sustains about one million reads/sec in RAM, with 99% of all reads returning in under a millisecond. The only other in-RAM solution worth mentioning is Couchbase, which for purely in-memory workloads can actually keep up with Aerospike. But since independent tests show it can let itself down on data sets too large to hold completely in RAM, it’s rather limited and limiting.

For balanced read/write scenarios (needed in travel, where price and availability are subject to change all the time) Aerospike has been tested at well over 50,000 transactions per second per server using SSDs. Versus a legacy solution of similar cost, the price/performance advantage is on the order of 100 times.

Versus MongoDB and Cassandra, Aerospike has been independently tested to a 10X performance advantage together with a hardware cost advantage ranging from 4X to 14X using SSDs.
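
To see what those per-server figures imply, here is a rough sizing sketch in Python. The peak load of 100,000 searches/second is a made-up example. The sketch treats one search as one transaction, takes 50,000 as a floor for the "well over 50,000" SSD figure, and ignores replication, headroom and failover. Read it as an order-of-magnitude comparison, not a capacity plan.

```python
# Per-server throughput figures quoted in the text (operations per second).
LEGACY_PER_SERVER = 500
AEROSPIKE_SSD_PER_SERVER = 50_000


def servers_needed(peak_per_second: int, per_server_per_second: int) -> int:
    """Smallest number of servers whose combined throughput covers the peak."""
    # Integer ceiling division, so a partial server rounds up to a whole one.
    return -(-peak_per_second // per_server_per_second)


if __name__ == "__main__":
    peak = 100_000  # hypothetical peak searches/second
    print(servers_needed(peak, LEGACY_PER_SERVER))
    print(servers_needed(peak, AEROSPIKE_SSD_PER_SERVER))
```

For that hypothetical peak the sketch gives 200 legacy servers against 2, in line with the "order of 100 times" claim above.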

For those whose ambitions are to perform travel-related searches at a leisurely speed for an unremarkable number of users, we say, cool, shard MySQL and use Couchbase as a cache and go forward. You’ll do just fine.

If your data set is small enough now and for all foreseeable growth in the future that it can all fit in RAM, you can use either Aerospike or Couchbase… both are quite good with in-memory data sets.

But if you’re shooting for remarkable speeds with internet-scale transactions and hot analytics over large data sets, there’s no substitute.

Aerospike is the first product to totally transform what you can expect from an affordable, and now open source, technology. It gives you speeds you thought could only come from RAM at flash SSD economics.

It scales in speed, data size and use cases: it can operate in-memory, on directly addressable flash, as a hybrid and on disk as well.

NB2: The author has a minor equity stake in Aerospike.

NB3: Managing data image via Shutterstock.