NB: This is a guest article by Humphrey Sheil, chief technology officer for Comtec Group.
Data is big, and getting bigger. The more we track and log, the more storage is needed to warehouse it, and the more CPU horsepower is needed to mine it to answer questions posed by the business.
As an aside, everyone is facing this issue and it's sink or swim, with the swimmers sure to get a competitive advantage over the sinkers.
In this article, I'll examine the main data feeds that matter in leisure travel, and propose an architecture to collect, manage and mine them for business benefit. The end goal is to propose a vision, explaining why and how to collect data to better inform and drive business decisions that improve ecommerce performance.
But why now - hasn't this always been an issue? Yes, but now more than ever, leisure travel is poised on the cusp of another big game-changer.
Companies like Google and Microsoft are clearly already focusing more on travel as a segment, and their data gathering and mining capabilities are considerable. But tour operators and online travel agencies (OTAs) have a significant competitive advantage over pure-play technology companies, as we'll see a little later.
Important data sources in leisure travel ecommerce
First, let's examine the primary data sources that affect leisure travel ecommerce. There are some obvious entries in the table that follows, and some less so.
[table id=1514 /]
Two important characteristics of data are whether you control it or not (and hence can change it if you need to) and whether it is sourced from an internal system or an external system (and thus how trustworthy/accurate the data is and whether it is unique to you or if other business entities can see it too). We have added these two characteristics to the table above for clarity.
What should be obvious to the reader is that a holistic picture of ecommerce performance requires multiple data sources, some of which traditionally would not be seen as impacting the effectiveness of a leisure travel ecommerce system.
Gone are the days of simply looking at the web logs to see how effective (or leaky) the conversion funnel is! In fact, there are probably some sources that I've inadvertently omitted, and indeed as new systems come on stream, new sources will be added to this table / taxonomy.
Finally, it's interesting from a barrier to entry perspective to note that only the well-placed tour operator or OTA actually has the wherewithal and access to collate data from all of the sources noted in the table.
Other new entrants simply do not have access to many of the sources listed. The data itself is now a valuable commodity (and is increasing in value), and an asset that leisure travel businesses would do well to guard jealously.
What we need - systems and data working together
At present, I contend that the average tour operator/OTA is collecting some, but not all, of the data sources identified, and that no tour operator or OTA has yet constructed a system that feeds a holistic, joined-up view of the data back to the business function to inform decision-making.
Why not? Because it's not easy to do!
The IT estate behind these data sources is fragmented (core res system, yielding system, multiple content management systems, external systems, separate booking repositories/agency management systems, Google Analytics, Google AdWords, Excel spreadsheets), is often owned by different companies, and wasn't designed to provide the kind of view that is now needed.
Ominously, new entrants into the space do not have a lot of the legacy baggage that incumbents do, meaning their velocity of implementation and ongoing change creates a hard-to-ignore imperative for all sellers of leisure travel to innovate quickly and learn from their data, or be left behind.
The technical challenge is four-fold:
1. Collection and storage
Gather and store as much data as possible for each data source in the table, with that data being as clean and structured as possible (although in the real world, every data set will have some noise in it).
2. Build a holistic, joined-up data set
Identify ways to link the data sources together - version number, unique keys, foreign keys, link backs, tagging etc. The more your data sources are joined up, the more holistic a view of the business you are building (and can provide back to the business).
Conversely, disconnected data sets (data islands) are of much less value to the business, and introduce the risk that an incomplete or inaccurate view of what's really happening now is used to influence what happens next.
3. Answering the questions
Provide a mechanism to answer questions over this corpus of data in near real-time, allowing the business to modify its behaviour and focus to maximise profits, yield and margin.
4. Suggesting the questions
Once the above three points have been implemented to a mature and repeatable level, the final logical step is for the data function to actually suggest areas of improvement and further exploration based on emergent patterns in the data, using techniques such as artificial neural networks and self-organising maps (SOMs).
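To make steps 2 and 3 a little more tangible, here is a minimal sketch in Python of linking two data sources on a shared key and then asking a simple business question of the joined data. Everything in it is an assumption made for illustration: the record layouts, the field names (`session_id`, `campaign`, `margin`) and the helper functions `link_sources` and `margin_by_campaign` are hypothetical and do not come from any particular reservation or analytics system.

```python
from collections import defaultdict

# Hypothetical sample records; real feeds would come from web analytics
# and a booking repository, with whatever keys those systems expose.
web_sessions = [
    {"session_id": "s1", "campaign": "adwords-summer", "pages_viewed": 12},
    {"session_id": "s2", "campaign": "organic", "pages_viewed": 3},
    {"session_id": "s3", "campaign": "adwords-summer", "pages_viewed": 7},
]
bookings = [
    {"booking_ref": "B100", "session_id": "s1", "margin": 140.0},
    {"booking_ref": "B101", "session_id": "s3", "margin": 95.5},
    # A booking with no web key, e.g. taken over the phone: a "data island".
    {"booking_ref": "B102", "session_id": None, "margin": 60.0},
]


def link_sources(sessions, bookings, key="session_id"):
    """Join bookings to sessions on a shared key.

    Returns (linked, islands): joined rows, plus bookings that could not
    be tied back to any session.
    """
    sessions_by_key = {s[key]: s for s in sessions}
    linked, islands = [], []
    for booking in bookings:
        session = sessions_by_key.get(booking.get(key))
        if session is None:
            islands.append(booking)
        else:
            linked.append({**session, **booking})
    return linked, islands


def margin_by_campaign(linked_rows):
    """Answer one question over the joined data: total margin per campaign."""
    totals = defaultdict(float)
    for row in linked_rows:
        totals[row["campaign"]] += row["margin"]
    return dict(totals)


linked, islands = link_sources(web_sessions, bookings)
print(margin_by_campaign(linked))
print("unlinked bookings:", [b["booking_ref"] for b in islands])
```

The point of returning the unlinked bookings separately is that the size of that list is itself a useful measure: the more records that cannot be joined, the less complete the picture being fed back to the business.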
Putting it all together - a suggested framework
There are many ways to construct a view over the data sources identified in the previous section. And in fact, multiple views are encouraged depending on the goal of the business.
Here, however, a hybrid of time and business function is chosen as a reasonable framework to hold the data. This framework is depicted in the following diagram:
A concrete implementation of the framework
The question naturally arises - how would this system be constructed, not just initially but also maintained and extended going forward?
Some natural candidates already exist, chief among them Cassandra and Hadoop. A hybrid architecture of Cassandra's data storage and innate simplicity and high availability, coupled with the MapReduce framework from Hadoop offers the best blend of performance, scalability, availability/resilience, querying and extensibility.
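For readers unfamiliar with the MapReduce model, the following is a rough sketch of the idea in plain Python. It deliberately does not use the Hadoop or Cassandra APIs; it only mimics the map, shuffle and reduce phases in memory. The event records and the function names (`map_event`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event log; in the architecture described above, rows like
# these would be read from the data store rather than held in a list.
events = [
    {"type": "search", "destination": "Majorca"},
    {"type": "search", "destination": "Majorca"},
    {"type": "booking", "destination": "Majorca"},
    {"type": "search", "destination": "Crete"},
]


def map_event(event):
    """Map phase: emit a ((destination, event type), 1) pair per event."""
    yield (event["destination"], event["type"]), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(event_log):
    """Run map, shuffle and reduce in sequence over an in-memory log."""
    mapped = chain.from_iterable(map_event(e) for e in event_log)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


counts = run_job(events)
print(counts)
```

Because each map call looks at a single event and each reduce call looks at a single key, both phases can in principle be spread across many machines, which is what makes the model attractive for large, growing data sets.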
A separate follow-on instalment to this article is warranted to provide a detailed technical treatise on the underpinnings of the system outlined here.
Conclusion
This article has identified, named and classified the dominant data sources that impact the effectiveness of a leisure travel ecommerce strategy.
Developing this classification further, it has used a model to create a framework to house the data sources, and suggested a concrete implementation.