We are just getting started with our big data project

My colleagues and I have recently obtained the UAE's entire Twitter data for 2015. This is more than 18 million tweets saved in a database that we can query in a multitude of ways. If a person reads every tweet you ever wrote, that is known as stalking. When psychologists do it, however, it's called research. This is a new frontier in social research, and it has been enabled by the birth of big data.

The rapid proliferation of digital devices, ever greater storage capacity and increasingly sophisticated data processing techniques have all coalesced to make big data very relevant.

The implications of big data are widespread, and over the coming decades they will increasingly shape the way we live our lives, operate our businesses and govern our societies. Just as our ancestors pondered the celestial bodies trying to predict portentous events, so too our own awed contemplation of big data is slowly morphing into innovative applications aimed at predicting and explaining human behaviour.

Social psychologists argue that predicting and explaining other people’s behaviour is just something we humans naturally and routinely do. Like amateurish scientists or detectives, we look at the observable data in other people’s actions and start to make all kinds of inferences.

Many of us will eye other people’s purchases at the supermarket checkout, and our thoughts run to speculation about what their lives might be like: celebrity magazine, beep, three tins of cat food, beep, a multi-pack of Red Bull, beep, low-fat vegetarian lasagne, beep.

Supermarkets have systematically used electronic point-of-sale data for decades to understand and predict consumer behaviour.

With the growth of social networking sites, we now have even richer data sets to explore. One of our own research projects will look at the times of the day, week, month and year that people appear to be using positive or negative self-referential language. In other words, when are people most likely to be saying things such as “I loved it lol :) so happy I went” and, conversely, when are they saying things such as “I’m not happy about that, I hate that sort of thing :(”?

Furthermore, we can look at differences between tweets in English and Arabic, or between different emirates. The relative frequency of positive and negative language gives us an indication of people’s moods.
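
As a rough illustration of the kind of count described above, here is a minimal Python sketch. It is not the project's actual method; the tiny word lists, the function names classify and mood_by_hour, and the sample tweets are all assumptions invented for the example. It keeps only tweets that contain a first-person word, labels each as positive or negative, and reports the share of positive tweets for each hour of the day.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
# Real research would rely on a validated sentiment lexicon.
FIRST_PERSON = {"i", "i'm", "im", "me", "my"}
POSITIVE = {"love", "loved", "happy", ":)"}
NEGATIVE = {"hate", "sad", ":("}
NEGATORS = {"not", "never", "no"}

# Matches simple emoticons such as :) and :( or runs of letters and apostrophes.
TOKEN = re.compile(r"[:;][()]|[a-z']+")


def classify(text):
    """Return 'positive', 'negative', or None for a single tweet."""
    tokens = TOKEN.findall(text.lower().replace("’", "'"))
    # Skip tweets that do not refer to the writer (e.g. lyrics or quotes).
    if not FIRST_PERSON.intersection(tokens):
        return None
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        # Flip the polarity when the previous word is a negator ("not happy").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def mood_by_hour(tweets):
    """Map each hour of the day to the share of classified tweets that are positive."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for timestamp, text in tweets:
        label = classify(text)
        if label is not None:
            counts[timestamp.hour][label] += 1
    return {
        hour: c["positive"] / (c["positive"] + c["negative"])
        for hour, c in sorted(counts.items())
    }


if __name__ == "__main__":
    # Made-up sample tweets as (timestamp, text) pairs.
    sample = [
        (datetime(2015, 1, 1, 9, 0), "I loved it lol :) so happy I went"),
        (datetime(2015, 1, 1, 9, 30), "I’m not happy about that, I hate that sort of thing :("),
        (datetime(2015, 1, 1, 21, 0), "I love my cat"),
    ]
    print(mood_by_hour(sample))
```

The same grouping could be done by weekday, month, language or emirate by changing the key used in mood_by_hour. A word-list approach like this is crude, and it would need a separate lexicon for Arabic tweets.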

The good thing about big data is that it is so big. Even if some of the tweets are just noise, for example, people tweeting song lyrics or inspirational quotes, the sheer volume of the data will still drown out such interference and ultimately reveal clear patterns. As the years go by, this data is going to grow exponentially, and the certainty we have about the observed patterns is likely to grow with it.

We are just getting started with this type of research, and the tools and skills needed to get the most out of big data are still emerging.

One major development is the advance of techniques for analysing unstructured data. Such techniques now make it possible to trawl through material such as text messages, emails and even images.

This is probably one of the reasons Google’s chief economist, Hal Varian, suggests: “The sexy job in the next 10 years will be statisticians … The ability to take data – to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it.”

As part of our project we have developed a computerised interface to the Twitter data called Bulbul, derived from the Arabic word for nightingale.

We hope that Bulbul will also be used as a teaching tool, allowing students to explore their own research ideas and get a taste for big data.

Dr Justin Thomas is an associate professor at Zayed University

On Twitter: @DrJustinThomas

The millions of tweets sent each year in the UAE are proving to be a treasure trove for sociologists, writes Justin Thomas.