Tnooz has decamped to Boston for our latest THack, which is addressing an exciting and continually challenging area of the travel business: big travel data.
With access to several large databases filled with historical searches, flight information, trip duration, and more, hackers have been let loose to create data-driven travel solutions.
This sort of event is vital to the continued understanding of big data and its implications for the travel space. As storage costs drop, more and more data is being collected, so the narrative must now turn to effective solutions that address real business problems using actual data sets.
Participants range from those currently employed in travel to undergraduates with zero knowledge of the travel industry, which allows for a compelling mix of perspectives coming together to forge data-mined solutions.
THackers have been provided with the following data sets from which to cull and pinpoint patterns:
- 1.2 billion global flight records created in 2011, broken down by month, from ARC
- 6 months of historical search data from Amadeus
- comprehensive flight records (2011) from FlightStats
- recent search data from early September via Travelport
Teams formed spontaneously last night and this morning, and each team was given a server and all the available data sets.
Big, hairy audacious Data
The initial reaction from attendees has been one of mild shock. One noted that the software she was using couldn't even begin to handle that much data; another described just how challenging it was to normalize data across entities in order to make the appropriate processing calls.
Chandra Jacobs, from TripChi, had this to say about the volume of data:
"There's enough correlations between the different types of data that you can neck down specific cross-cuts and specific problem sets so you can answer and correlate across the different steps of the travel process.
"The hardest thing was finding a business problem - we have this volume of data, now how do we pose a relevant question that the people or business want an answer to?"
Other attendees didn't have any experience whatsoever with travel data - or the travel industry. Harvard undergraduate William Chen is taking it all in stride:
"It's kind of a splash of water in your face. Loading in everything properly was a big challenge to us, as we've been learning new systems. It's a remarkably slow process for us, but I think we'll have something to present. It's a lot more work than a predictive modeling exercise since there's all the data extraction and structuring."
The beauty of opening up a large data set in THack format is that it brings new perspectives to big travel data and offers a fresh take, away from typically entrenched industry viewpoints.
Abdoul Sylla, Senior Director of Product, Big Data, at Travelport, explains:
"Here, because we have people who are novices and people who may have limited knowledge of the industry, they are approaching the problems from a different perspective. Is there going to be a killer application? Maybe not. But tons of half-good ideas can be put together to create one good idea."
Effectively, it's all about openness to new takes on the same data, as Sylla succinctly suggests:
"Data is only valuable if you can find that needle in the haystack, so that's what we're looking for."
Adrienne Cochrane, Executive Director of our event host Hack/Reduce, agrees:
"It's great to get people together with similar interests and different skillsets. It's amazing what can come out of that. It's also important to bring together the data scientists, the developers, with the domain experts. There are people here today that have experience with travel data, and those that have never looked at travel data."
Tadhg Pearson, Software Developer at Amadeus, thinks that insights from data analysts outside the industry are especially vital.
"The interesting thing about these events is that you get a whole load of people that aren't in the travel space. I'm sure partnerships will grow into data sharing within travel, but it's most interesting when you get data analysts from other industries."
From new perspectives towards more open and collaborative travel
A significant motivation behind the first event focused on big travel data was not only to see what happens when a random group of people gets together and crunches data, but also to create a collaborative forum where openness and transparency could temporarily replace the competitive opacity that thrives in travel.
But can this shared data suggest future partnerships? Could travel companies work together via a co-opetition model to turn billions of travel data points into better products for clients and consumers?
The tone among industry attendees was hopeful.
Travelport's Sylla:
"The challenge that we always face is that data is very valuable for the industry. And we always face an uphill challenge to give up control of the data.
My belief is that, with time and more events like this, we will be in a position where we see that there is a lot of value in opening up the data and letting people come look into the data. When you start collecting more data than you can look at yourself, you have to have other people look at it!"
ARC's Chuck Thackston, Managing Director of Data Research and Analytics, is also optimistic about increased cooperation when it comes to sharing data.
"One of the things that we're going to learn here today is that given the opportunity to work with data sets across these areas, we're going to see things we couldn't see before.
I do hope there are more hacks like this as they continue to evolve as we get more data providers and provide more people looking at the data from outside the industry to give us new perspectives."
TripChi's Jacobs, for her part, argues that data interoperability and standardization must also be a part of this process. Despite privacy concerns, some demographic data must be tied to these data flows so that the resulting algorithms can more fully deliver on the promise of big travel data:
"I would have liked data that ties to an individual persona, for example in the search queries. That allows us to see the evolution of search over time to link it to consumer behavior. I do think this data set is robust enough to tie that together at the persona level."
Presentations are imminent, with only an hour left for the teams to wrap up their hacks. Regardless of what emerges, the wheels have been turning and the collaboration around large data sets has been fierce. More on the results soon.