An intern at the data mining and analysis firm Cambridge Analytica left what appear to be programming instructions for voter targeting tools the company used around the time of the election publicly accessible online for nearly a year, raising questions about who could have accessed the tools and to what end.
Social media analyst and data scientist Jonathan Albright discovered the election data processing scripts – or programming instructions – on what he said was the intern’s personal GitHub account. GitHub, a “Facebook for programmers,” is an internet hosting service mostly used for code.
A LinkedIn account that appears to belong to the intern identified by Albright lists him as a “Data Science Intern” for Cambridge Analytica between March and June of 2016.
A spokesman for Cambridge Analytica, which mined and analyzed voter data for the Trump campaign last year, told Business Insider after this article’s publication that “interns often carry out small learning exercises as part of their internship.”
“This code was never used by CA,” the spokesman said. “The API secret doesn’t belong to the company.”
But the scripts contained references to converting Twitter data and user IDs into a format “to put into the neural network,” and the intern used words like “we,” “we’re,” and “our keywords” to describe the processes. The account was also scrubbed less than an hour after Albright published his findings on Medium. The scripts had already been archived.
The tools the intern appears to have extracted facilitated geolocation targeting, to be used in enriching voter files with GPS coordinates, and Twitter sentiment analysis – essentially, the process of determining someone’s position on an issue by analyzing tweets and pulling data from users discussing certain topics.
The tool was used to find and group people on Twitter who talked about, or responded to, specific keywords in retweets.
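The archived scripts are not reproduced here, but the general idea of grouping users by the keywords they mention can be shown with a minimal sketch. It assumes tweets have already been collected into simple records with hypothetical `user_id` and `text` fields; the function name and sample data are illustrative and are not taken from the intern's code.

```python
import re
from collections import defaultdict


def group_users_by_keyword(tweets, keywords):
    """Map each keyword to the set of user IDs whose tweets mention it."""
    # Whole-word, case-insensitive patterns, so "guns" does not match "gunsmith".
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    groups = defaultdict(set)
    for tweet in tweets:
        for kw, pattern in patterns.items():
            if pattern.search(tweet["text"]):
                groups[kw].add(tweet["user_id"])
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample_tweets = [
    {"user_id": 1, "text": "New rules on guns announced today"},
    {"user_id": 2, "text": "RT @someone: Debate over citizenship continues"},
    {"user_id": 3, "text": "Guns and citizenship in one speech"},
]
groups = group_users_by_keyword(sample_tweets, ["guns", "citizenship"])
```

A real pipeline would pull tweets from Twitter's API and would likely treat retweets and replies separately; this sketch only shows the grouping step.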
Albright, who heads Columbia’s Tow Center for Digital Journalism and recently published extensive research on Russia’s use of Facebook during the election, said Cambridge Analytica’s real-time social media mining tool was not necessarily complex or novel in and of itself.
What is more interesting, he said, is how the tool appeared to retrieve people’s recent tweets and favorites to “expand” Cambridge Analytica’s body of keywords “around specific objects of election ‘outrage’ sentiment” – like abortion, citizenship, naturalization, guns, and Planned Parenthood.
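One simple way such keyword "expansion" could work is to count which other words appear in tweets that already match a seed keyword, then keep the frequent ones. The sketch below is an assumption about the general technique, not a reconstruction of Cambridge Analytica's method; the function name, stopword list, and threshold are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be far longer.
STOPWORDS = {"and", "the", "on", "is", "a", "of", "to", "in"}


def expand_keywords(seed_keywords, texts, min_count=2):
    """Return words that co-occur with any seed keyword in at least min_count texts."""
    seeds = {kw.lower() for kw in seed_keywords}
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & seeds:  # the text mentions at least one seed keyword
            counts.update(words - seeds - STOPWORDS)
    return sorted(word for word, n in counts.items() if n >= min_count)


# Hypothetical sample data, for illustration only.
sample_texts = [
    "Guns and the second amendment",
    "Second amendment rally on guns",
    "Weather is nice today",
]
expanded = expand_keywords(["guns"], sample_texts)
```

Here the words that co-occur with "guns" in at least two texts become candidate keywords, while the unrelated weather tweet contributes nothing.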
Recent reporting has revealed that Russia harnessed and harvested “outrage” sentiment in an attempt to galvanize and sway voters during the campaign. Accounts linked to Russia bought $100,000 worth of Facebook ads between 2015 and 2016, many of which promoted outsider candidates and exploited racial tensions. Similar methods were deployed on Twitter, Google, Instagram, Pinterest – and even Pokemon Go, as CNN reported earlier this week.
Additionally, the intern appeared to have left Cambridge Analytica’s Twitter API secret and key online when he uploaded the scripts. The secret and key, which were removed in February, amount to the account username and password that companies and developers use to search and pull tweets and user profile information from Twitter, Albright explained.
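Developers commonly avoid this kind of exposure by keeping credentials out of source files and reading them from the environment at run time. The following is a minimal sketch of that practice; the variable names `TWITTER_API_KEY` and `TWITTER_API_SECRET` are hypothetical and do not come from the scripts Albright found.

```python
import os


def load_twitter_credentials(env=os.environ):
    """Read the API key and secret from the environment instead of hardcoding them."""
    key = env.get("TWITTER_API_KEY")        # hypothetical variable name
    secret = env.get("TWITTER_API_SECRET")  # hypothetical variable name
    if not key or not secret:
        raise RuntimeError(
            "Set TWITTER_API_KEY and TWITTER_API_SECRET before running the script."
        )
    return key, secret


# Illustration with placeholder values; real credentials never appear in the code.
key, secret = load_twitter_credentials(
    {"TWITTER_API_KEY": "example-key", "TWITTER_API_SECRET": "example-secret"}
)
```

With this pattern, a script pushed to a public repository contains only the names of the variables, not the values that grant access.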
Albright said the code for the tools was “sitting right on Github for almost a year: from March 2016 to February 2017 – the last 8 months of the US election.”
“That’s a security issue, in my opinion,” Albright added. “Could Russia find this and use it? Absolutely.”
Only Twitter would be able to definitively reveal whether the accidentally copied-and-pasted API key belonged to Cambridge Analytica, according to Albright.
But because of the social media company’s terms of service and privacy protections for developers, the information could likely only be obtained via a subpoena. The House Intelligence Committee is scrutinizing Cambridge Analytica as part of its investigation into whether any collusion occurred between the campaign and Russia, The Daily Beast reported last week.
Still, Albright said, “showing the actual code in two of their scripts is one of the few pieces of evidence that can break through the noise and puffery around Cambridge Analytica. While code is not a person, it’s the ultimate journalistic source for a CA-related election story.”
Albright argued in a post on Medium that the question of Cambridge Analytica’s ownership – “a foreign business previously registered in the United States as a foreign corporation” – is now more relevant than ever.
“Foreign influence – sound familiar?” he wrote.
The company was founded in 2013 as an offshoot of its British parent company, SCL Group, and is partially owned by Robert Mercer – a hedge fund billionaire, Trump supporter, and top investor in Breitbart.
Twitter last week gave the Senate Intelligence Committee the profile names, or “handles,” of 201 accounts it believes were operating out of Russia during the election. But Politico reported Friday that much of the data that could be useful in examining the extent of Russia’s Twitter operation was deleted by the company.