Wednesday, May 23, 2018

Google as "digital truth serum"

Seth Stephens-Davidowitz is forthright about the point of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are:

... social science is a real science. And this new, real, science is poised to improve our lives.

Count me as mildly skeptical, for reasons I'll outline below. Nonetheless, this book is fun, easy to read, and full of suggestions for more exploration.

He's a good explainer: he introduces the concept of data by pointing out that his grandmother's lifetime of watching family relationships has likely given her the most sophisticated view, among all his relatives, of what he should be looking for in a potential wife. Good catch, that.

But the data that Stephens-Davidowitz wants us to appreciate, as he does, is the tracks left by our digital explorations, Google searches, and choices on the site PornHub. We are (usually) individually anonymous as we move about the net, but the aggregate of our web behavior tells an awful lot about us as a society.

The power in Google data is that people tell the giant search engine things they might not tell anyone else.

In 2015, when Rizwan Farook and Tashfeen Malik shot up Farook's office party, killing fourteen people in San Bernardino, then-President Obama went on the air urging us all to reject painting any community with a broad brush.

That evening, literally minutes after the media first reported one of the shooters' Muslim-sounding name, a disturbing number of Californians had decided what they wanted to do with Muslims: kill them. The top Google search with the word "Muslims" in it at that time was "kill Muslims." ... While hate searches were approximately 20 percent of all searches about Muslims before the attack, more than half of all search volume about Muslims became hateful in the hours that followed it.

...Obama asked Americans to "not forget that freedom is more powerful than fear." Yet searches for "kill Muslims" tripled during his speech.

The PornHub searches Stephens-Davidowitz examines actually seem somewhat comforting in comparison to the hate searches. Sure, people look for some pretty weird sex stuff. But overall

... there's something out there for everyone. Women, not surprisingly, often search for "tall" guys, "dark" guys, and "handsome" guys. But they also sometimes search for "short" guys, "pale" guys, and "ugly" guys. ...Men frequently search for "thin" women, women with "big tits," and women with "blonde" hair. But they also sometimes search for "fat" women, women with "tiny tits," and women with "green" hair.

And yes, he uses search data to conclude that about 5 percent of men are gay, though in most of the country, half of those are still in the closet. He admits to being unable to use any of the varieties of web data to figure out how many lesbians are out there.

And so the book goes on, disgorging fascinating data-derived observations, some of which seem more plausible than others, but all of which seem at least suggestive of potential for future study.

Yet I did not come away convinced that I was being introduced to a new triumph of social science. I've lived at the intersection of data and purposeful activity for years. That is, I have a lot of experience with some of the largest data sets anyone worked with before they had access to Google: election participation statistics and results. When working on a campaign, I've often found myself trying to calm someone waving a new poll: "Hold on! We already know where that district leans because we have the much larger polls that were the past elections." Sometimes results can change, but the underlying data set from which to work has been compiled over the years by election authorities.

(By the way, the flap over Cambridge Analytica was an example of confusion over the utility of data. That kind of data-based profiling of voters is always very tempting to some, but apparently, as is usually the case, the election pros who got the stuff from Cambridge Analytica found it useless. Voters chose Trump; the election was not manipulated by a sneaky data company.)

What we can do with that big data comes down, in large part, to how imaginatively we can query and reinterpret what we already know. I don't think what we do with search data is any different. What we learn from it will be largely determined by the rigor and creativity with which we choose to question it. And that's not science, at least as science is usually understood, with the natural sciences as the frame of reference. Social science remains more a mix of art and science -- modern cosmologists might agree.

For all my skepticism, Stephens-Davidowitz's little book is great fun for anyone who cares about data's possibilities.
