Researchers from the University of California - Santa Barbara have come up with a novel approach to predictive analysis - by parsing past behavior on social media, they've found patterns that predict future behavior.
The research sorts these behaviors into so-called "social media genotypes," which place past behavior into specific categories.
The paper defines the genotype model as follows:
Here we define our genotype model that captures the topic-specific behavior of a single user (node) within a social media network. Our main premise is that, based on observed network behavior, we can derive a unique signature of a user.
The user signatures are derived from two giant data sets: one consisting of 467 million tweets sent by 42 million users in 2009 (or 20% of tweets sent over a 6-month period), and another consisting of 14.5 million tweets sent by nine thousand users in 2012.
While the data sets are a few years old, the interests they capture appear to be generally fixed human behaviors rather than ones that shift with time.
By grouping hashtags into subsets, the researchers were able to identify core interests of users. These categories include: Business, Sports, Politics, Science/Technology and Celebrities.
For tweets without specific hashtags, they looked to shared URLs within tweets to place the communication in context.
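As a rough illustration of the grouping idea, the sketch below turns a user's tweets into a share-per-topic profile. It is a simplified stand-in, not the paper's genotype metrics, which this article does not spell out. The topic names come from the list above; the `HASHTAG_TOPICS` lookup, the hashtags in it, and the `topic_profile` function are all hypothetical, and the URL fallback is left out.

```python
from collections import Counter

# Hypothetical hashtag-to-topic lookup. The topic names are the ones listed
# in the article; the hashtags assigned to them are made up for illustration.
HASHTAG_TOPICS = {
    "#nba": "Sports",
    "#worldcup": "Sports",
    "#election": "Politics",
    "#startup": "Business",
    "#nasa": "Science/Technology",
    "#oscars": "Celebrities",
}


def topic_profile(tweets):
    """Return the share of a user's recognized hashtags that falls in each topic."""
    counts = Counter()
    for tweet in tweets:
        for token in tweet.lower().split():
            # Strip trailing punctuation so "#worldcup!" still matches.
            topic = HASHTAG_TOPICS.get(token.rstrip(".,!?:;"))
            if topic is not None:
                counts[topic] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no recognized hashtags, so no profile
    return {topic: n / total for topic, n in counts.items()}


profile = topic_profile([
    "Great game tonight #NBA",
    "Counting down to the #WorldCup!",
    "Launch day #NASA",
    "No hashtags here",
])
print(profile)
```

Here two of the three recognized hashtags are sports-related, so the profile leans toward Sports. A real pipeline would need a far larger hashtag vocabulary and some way to place untagged tweets, such as the shared URLs mentioned above.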
The hypothesis is that users exhibit stable characteristics, which make it possible to predict future behavior.
Our hypothesis is that individual users exhibit consistent behavior of adopting and using hashtags (stable genotype) within a known topic. If we are able to capture such invariant user characteristics in our genotype metrics then we can turn to employing the genotypes for applications.
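One simple way to picture a "stable genotype" is to compare a user's topic profile from two different time windows. The sketch below uses cosine similarity for that comparison; this is an assumption chosen for illustration, not the stability measure used in the paper, and the profiles (`early`, `late`, `other`) are made-up numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two topic profiles (dicts of topic -> share)."""
    topics = set(a) | set(b)
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in topics)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical profiles: the same user in two periods, plus an unrelated user.
early = {"Sports": 0.7, "Politics": 0.3}
late = {"Sports": 0.6, "Politics": 0.3, "Business": 0.1}
other = {"Celebrities": 1.0}

print(cosine_similarity(early, late))   # close to 1: behavior looks stable
print(cosine_similarity(early, other))  # 0.0: no topics in common
```

Under this toy measure, a user whose score stays near 1 across periods would count as having an invariant signature, which is the property the hypothesis depends on.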
Behind the dense statistical analyses that drive the paper, the researchers show that these genotypes do indeed predict future behavior on social media. By understanding genotypes, marketers and other social media users are able to more accurately predict a user's behavior.
This information should fall on eager ears in the Twitter offices, as this study shows how a deep understanding of the available data allows for more effective targeting. If Twitter knows a user's genotype, it can ensure that the relevant tweets most likely to see engagement are placed near the top of the stream.
As the paper points out, this "network latency minimization" is an extremely valuable tool.
Another important problem that can be addressed given knowledge of topic-specific user behavior is that of improving the speed of information dissemination.
Fast information dissemination is critical for social media-aided disaster relief, large social movement coordination (such as the Arab Spring of 2010), as well as time-critical health information distribution in developing regions.
In such scenarios, genotypes and the influence structure among users are critical for improving the overall “latency” of the social media network.
Such placement would clearly apply to advertisements as well. Twitter could conceivably provide genotype information similar to Facebook's demographic targeting, derived simply from the content a user has shared on the network.
The statistical analysis also shows that the genotype model predicts influence much more accurately than simply using a user's follow/follower counts.
This means that looking at topic dissemination provides a better way to discover topic-specific influencers and adopters - and was seen to increase "influence predictive power" by 20%.
A Klout score may measure shares, engagement and other data points, but the genotype approach is all about analyzing what people actually demonstrate interest in via their own postings.
As the conclusion states, this improvement in predicting influence is a compelling twist on simply looking at follower count or perceived topic interest. The genotype breaks it down into a repeatable measure that creates a new demographic frame when looking at social media.
Features captured by the user genotypes define the actual topic-specific user behavior in the network, while the traditionally analyzed follower network defines only what is possible in the information dissemination process. Within our genotype model, each network user becomes an individual with a unique and invariant behavioral signature within the topic-specific content dissemination. In addition, we demonstrated that users are embedded in topic-specific influence.
As marketers continue to see influence as a key measure for targeting different users, this topic-specific lens could provide a useful addition to that measure - especially in a content-heavy industry such as travel.
The paper, entitled "The Social Media Genome: Modeling Individual Topic-Specific Behavior in Social Media," is highly statistical and can be downloaded here.