- A Swedish database startup called Neo4j was integral to the Paradise Papers leak, which showed how the world’s wealthy elite hide their millions in offshore bank accounts.
- Neo4j’s software enabled investigative journalists to explore the hidden relationships between powerful people and offshore bank accounts.
- The startup said its link to investigations such as last year’s Panama Papers has resulted in more paying enterprise clients signing up for its graph database software.
- Neo4j has raised $80 million (£61 million) to date and almost doubled its headcount over the last 12 months.
Emil Eifrem was driving home from his goddaughter’s fifth birthday party in Gothenburg, Sweden, when his phone started buzzing. A stream of notifications alerted him to the Paradise Papers, a massive leak which showed how the world’s richest people use offshore havens to shield their wealth.
“I switched seats with my wife,” he said. “We turned on the radio, and as I’m sitting in the car I’m pulling up my laptop, trying to hotspot. I knew what my Monday would be like.”
Eifrem knew that over the next 24 hours he’d be fielding a stream of interview requests about the leaks.
He is the founder and CEO of Neo4j, a graph database company whose bread and butter is convincing blue-chip firms to use its tech to store and structure their data. It counts most of the big US retailers among its customers, including Walmart and eBay. But it has now also played a crucial role in three major journalistic investigations that involved querying huge amounts of data: the Swiss Leaks in 2015, last year’s Panama Papers, and this year’s Paradise Papers.
Journalists used Neo4j to uncover hidden relationships between powerful individuals and offshore bank accounts
Unlike a normal tabular database, which structures information a bit like an Excel spreadsheet, Neo4j can analyse relationships between different types of data. Facebook’s Graph search engine worked on similar principles.
“The structure of rows and columns is awesome if you want a payroll system that shows you first name, last name, salary, and title,” Eifrem told Business Insider in an interview at the Web Summit conference in Lisbon. “That’s what data looked like in the 1970s.”
Eifrem argued that data is more useful when you can detect relationships, which isn’t possible in a tabular structure.
That proved true for the International Consortium of Investigative Journalists (ICIJ), which co-ordinated coverage of the three leaks. Faced with millions of documents outlining complex offshore deals, the ICIJ needed to find a fast way to uncover hidden networks.
An example from the Panama Papers is former Iceland prime minister Sigmundur Davíð Gunnlaugsson, who was forced to step down after revelations his family tried to hide millions in offshore accounts.
Gunnlaugsson’s connection to the offshore accounts was entirely indirect: it ran several steps removed, and through his wife.
Discovering these connections would have been impossible if the information had been structured into a tabular format, according to Eifrem. The ICIJ used Neo4j and other tools to detect connections that were otherwise invisible – like the shared address between Gunnlaugsson and his wife, Anna Sigurlaug Pálsdóttir.
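To make the idea concrete, here is a minimal sketch of how a shared address can link a person to an offshore account once records are stored as a graph. It is plain Python, not Neo4j or the ICIJ's actual tooling, and every name in it ("Person A", "Offshore Co X", the address, the helper functions) is invented for illustration.

```python
from collections import deque

# Hypothetical records, invented purely for illustration.
# Each tuple is (source, relationship, target).
EDGES = [
    ("Person A", "REGISTERED_AT", "1 Example Street"),
    ("Person B", "REGISTERED_AT", "1 Example Street"),
    ("Person B", "SHAREHOLDER_OF", "Offshore Co X"),
    ("Offshore Co X", "HOLDS", "Account 001"),
    ("Person C", "DIRECTOR_OF", "Offshore Co Y"),
]


def build_graph(edges):
    """Build an undirected adjacency list from relationship tuples."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(EDGES)
link = shortest_path(graph, "Person A", "Account 001")
print(" -> ".join(link))
```

In this toy data, no single record mentions both Person A and the account. The link only appears by following the chain through the shared address and Person B, which is the kind of multi-step connection the article describes.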
Pierre Romera, chief technology officer of the ICIJ, told Business Insider: “Most of the leaks we get are not structured since they are raw documents.
“With the Paradise Papers, those documents represented 1.4 TB of data and were gathered from different sources. Putting them in a single database was a challenge for us. With Neo4j and [visualisation tool] Linkurious, and after a few weeks of research, we were able to propose to our 382 journalists a way to explore the data and also to share visualisations from stories they were working on. It’s surprising how intuitive a graph database can be for non-tech savvy people. Thanks to this approach, we could both investigate and prepare the future releases.”
Now Neo4j wants to help other investigative journalists
Neo4j still makes the bulk of its money from enterprise clients, though Eifrem isn’t giving out numbers. The startup has raised $80 million (£61 million) to date, and has almost doubled its staff to just under 200 people. Eifrem himself moved from Sweden to Silicon Valley so the company could keep growing. With its latest tranche of funding, the firm has evolved slightly from a graph database company to something more like a platform, with new visualisation tools out of the box, and integration with other database systems.
Eifrem told reporters last year that the firm would be cash-flow positive by 2017, but he confirmed to Business Insider that this hasn’t happened. He explained the firm decided to take on a new funding round and invest that cash in growth. But he remains optimistic, since bigger competitors such as Oracle and IBM have become interested in graph databases and launched their own products.
“What’s the saying?” he said. “‘First they ignore you, then they laugh at you, then you win.’ We’re somewhere between them laughing at us and winning.”
Neo4j’s involvement with the ICIJ has also impacted its bottom line, in a good way. Banks, horrified by journalists airing their dirty laundry in public, have increasingly signed up to the firm’s software for fraud detection. “It’s been a massive upswing,” said Eifrem, who added that half of the top 25 banks globally are customers.
Eifrem is careful to state that Neo4j is not a journalistic organisation. While he knew “something” was coming a few hours before the Paradise Papers story broke, no one from Neo4j had seen any of the data or stories in advance. But he does want its open-source software to continue helping investigative journalists.
Neo4j now offers an accelerator programme to qualified journalists who want to learn how to query documents using its software – potentially meaning many more Paradise Papers-style investigations. It has also launched a connected data fellowship with the ICIJ, a six-month programme that enables a developer or journalist to work in a team to find stories in data.