The Pandora Papers’ 11.9 million records arrived from 14 different offshore services firms in a jumble of files and formats – even ink-on-paper – presenting a massive data-management challenge.
A 2.94 terabyte data trove exposes the offshore secrets of wealthy elites from more than 200 countries and territories. These are people who use tax and secrecy havens to buy property and hide assets; many avoid taxes and worse. They include more than 330 politicians and 130 Forbes billionaires, as well as celebrities, fraudsters, drug dealers, royal family members and leaders of religious groups around the world.
The International Consortium of Investigative Journalists spent more than a year structuring, researching and analyzing the more than 11.9 million records in the Pandora Papers leak. The task involved three main elements: journalists, technology and time.
What is the Pandora Papers?
The Pandora Papers investigation is the world’s largest-ever journalistic collaboration, involving more than 600 journalists from 150 media outlets in 117 countries.
The investigation is based on a leak of confidential records of 14 offshore service providers that give professional services to wealthy individuals and corporations seeking to incorporate shell companies, trusts, foundations and other entities in low- or no-tax jurisdictions. The entities enable owners to conceal their identities from the public and sometimes from regulators. Often, the providers help them open bank accounts in countries with light financial regulation.
The 2.94 terabytes of data, leaked to ICIJ and shared with media partners around the world, arrived in various formats: as documents, images, emails, spreadsheets, and more.
The records include an unprecedented amount of information on so-called beneficial owners of entities registered in the British Virgin Islands, Seychelles, Hong Kong, Belize, Panama, South Dakota and other secrecy jurisdictions. They also contain information on the shareholders, directors and officers. In addition to the rich, the famous and the infamous, those exposed by the leak include people who don’t represent a public interest and who don’t appear in our reporting, such as small business owners, doctors and other, usually affluent, individuals away from the public spotlight.
While some of the files date to the 1970s, most of those reviewed by ICIJ were created between 1996 and 2020. They cover a wide range of matters: the creation of shell companies, foundations and trusts; the use of such entities to purchase real estate, yachts, jets and life insurance; their use to make investments and to move money between bank accounts; estate planning and other inheritance issues; and the avoidance of taxes through complex financial schemes. Some documents are tied to financial crimes, including money laundering.
What’s in the Pandora Papers?
The more than 330 politicians exposed by the leak were from more than 90 countries and territories. They used entities in secrecy jurisdictions to buy real estate, hold money in trust, own other companies and other assets, sometimes anonymously.
The Pandora Papers investigation also reveals how banks and law firms work closely with offshore service providers to design complex corporate structures. The files show that providers don’t always know their customers, despite their legal obligation to take care not to do business with people who engage in questionable dealings.
The investigation also reports on how U.S. trust providers have taken advantage of some states’ laws that promote secrecy and help wealthy overseas clients hide wealth to avoid taxes in their home countries.
What form did the data come in?
The 11.9 million-plus records were largely unstructured. More than half of the files (6.4 million) were text documents, including more than 4 million PDFs, some of which ran to more than 10,000 pages. The documents included passports, bank statements, tax declarations, company incorporation records, real estate contracts and due diligence questionnaires. There were also more than 4.1 million images and emails in the leak.
Spreadsheets made up 4% of the documents, or more than 467,000. The records also included slide shows and audio and video files.
What’s different about this leak from others we’ve heard about?
The Pandora Papers information – the 2.94 terabytes in more than 11.9 million records – comes from 14 providers that offer services in at least 38 jurisdictions. The 2016 Panama Papers investigation was based on 2.6 terabytes of data in 11.5 million documents from a single provider, the now-defunct Mossack Fonseca law firm. The 2017 Paradise Papers investigation was based on a leak of 1.4 terabytes in more than 13.4 million files from one offshore law firm, Appleby, as well as Asiaciti Trust, a Singapore-based provider, and government corporate registries in 19 secrecy jurisdictions.
The Pandora Papers presented a new challenge because the 14 providers had different ways of presenting and organizing information. Some organized documents by client, some by various offices, and others had no apparent system at all. A single document sometimes contained years’ worth of emails and attachments. Some providers digitized their records and structured them in spreadsheets; others kept paper files that were scanned. Some PDFs contained spreadsheet data that had to be reconstructed into usable spreadsheets. The documents arrived in English, Spanish, Russian, French, Arabic, Korean and other languages, requiring extensive coordination among ICIJ partners.
The Pandora Papers gathered information on more than 27,000 companies and 29,000 so-called ultimate beneficial owners from 11 of the providers, or more than twice the number of beneficial owners identified in the Panama Papers.
The Pandora Papers connected offshore activity to more than twice as many politicians and public officials as did the Panama Papers. The Pandora Papers’ more than 330 politicians and public officials, from more than 90 countries and territories, included 35 current and former country leaders.
The new leak also includes information on jurisdictions not explored in previous ICIJ projects or for which there was little data, such as Belize, Cyprus and South Dakota.
The legal entities in the files of six providers – the companies, foundations and trusts – were all registered between 1971 and 2018. The records show providers and clients shifting their business from one jurisdiction to another after investigations and resulting rule changes.
How did you explore the files?
Only 4% of the files were structured, with data organized in tables (spreadsheets, csv files and a few “dbf files”).
To explore and analyze the information in the Pandora Papers, ICIJ identified files that contained beneficial ownership information by company and jurisdiction and structured it accordingly. Each provider’s data required a different process.
In cases where information came in spreadsheet form, ICIJ removed duplicates and combined it into a master spreadsheet. For PDF or document files, ICIJ used programming languages such as Python to automate data extraction and structuring as much as possible.
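The spreadsheet step described above can be illustrated with a minimal sketch. It is not ICIJ's actual code: the column names (`owner`, `company`, `jurisdiction`), the provider labels and the sample rows are all hypothetical, and real data would need far more careful normalization.

```python
import csv
import io

def normalize(value):
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join((value or "").split()).casefold()

def merge_spreadsheets(sources, key_fields=("owner", "company", "jurisdiction")):
    """Combine rows from several CSV texts, keeping the first copy of each duplicate."""
    seen = set()
    master = []
    for provider, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(normalize(row.get(field)) for field in key_fields)
            if key in seen:
                continue  # same owner/company/jurisdiction already recorded
            seen.add(key)
            master.append({**row, "provider": provider})
    return master

# Hypothetical sample data: two providers, three spellings of the same record.
sources = {
    "provider_a": (
        "owner,company,jurisdiction\n"
        "Jane Doe,Acme Holdings Ltd,BVI\n"
        "Jane  Doe,ACME Holdings Ltd,BVI\n"
    ),
    "provider_b": (
        "owner,company,jurisdiction\n"
        "jane doe,Acme Holdings Ltd,bvi\n"
        "John Roe,Blue Sky Inc,Panama\n"
    ),
}
master = merge_spreadsheets(sources)
for row in master:
    print(row["provider"], row["owner"], row["company"], row["jurisdiction"])
```

Keeping the provider label on each surviving row preserves where a record was first seen, which matters when the same entity appears in more than one provider's files.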
In more complex cases, ICIJ used machine learning and other tools, including the Fonduer and scikit-learn software packages, to identify and separate specific forms from longer documents.
Some provider forms were handwritten, requiring ICIJ to extract information manually.
Once information was extracted and structured, ICIJ generated lists that linked beneficial owners to the companies they owned in specific jurisdictions. In some cases, information about where or when a company was registered wasn’t available. In others, information was missing about when a person or an entity had become the owner of the company, among other details.
After structuring the data, ICIJ used graph platforms (Neo4j and Linkurious) to generate visualizations and make them searchable. This allowed reporters to explore connections between people and companies across providers.
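To show why a graph representation helps reporters trace connections, here is a small plain-Python sketch that stands in for a graph database. It does not use Neo4j or Linkurious, and the people, companies and provider labels are invented for illustration.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Build an undirected owner<->company graph from (owner, company, provider) rows."""
    graph = defaultdict(set)
    for owner, company, _provider in links:
        graph[("person", owner)].add(("company", company))
        graph[("company", company)].add(("person", owner))
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical links drawn from two different providers' structured data.
links = [
    ("Jane Doe", "Acme Holdings Ltd", "provider_a"),
    ("John Roe", "Acme Holdings Ltd", "provider_b"),
    ("John Roe", "Blue Sky Inc", "provider_b"),
]
graph = build_graph(links)
path = shortest_path(graph, ("person", "Jane Doe"), ("company", "Blue Sky Inc"))
print(path)
```

In this toy example the two people are connected only because records from both providers point at the same company, which is the kind of cross-provider link a per-provider spreadsheet would not surface.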
To identify potential story subjects in the data, ICIJ matched information in the leak against other data sets: sanctions lists, previous leaks, public corporate records, media lists of billionaires and public lists of political leaders.
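Matching names across data sets is sensitive to spelling, accents and word order. The sketch below shows one simple way to approach it with Python's standard library. It is an assumption about how such matching could work, not a description of ICIJ's pipeline; the names and the 0.9 threshold are illustrative, and any hit would still need to be verified by hand.

```python
import difflib
import unicodedata

def normalize_name(name):
    """Strip accents, punctuation and case, and sort tokens so word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in ascii_only)
    return " ".join(sorted(cleaned.casefold().split()))

def match_names(leak_names, reference_names, threshold=0.9):
    """Return (leak_name, reference_name, score) for pairs at or above the threshold."""
    reference = {normalize_name(n): n for n in reference_names}
    hits = []
    for name in leak_names:
        key = normalize_name(name)
        candidates = difflib.get_close_matches(key, list(reference), n=3, cutoff=threshold)
        for candidate in candidates:
            score = difflib.SequenceMatcher(None, key, candidate).ratio()
            hits.append((name, reference[candidate], round(score, 2)))
    return hits

# Hypothetical names: one list from the leak, one from a public reference list.
leak_names = ["DOE, Jane", "José Pérez", "Alan Smithee"]
reference_names = ["Jane Doe", "Jose Perez", "Maria Silva"]
hits = match_names(leak_names, reference_names)
for leak_name, ref_name, score in hits:
    print(leak_name, "->", ref_name, score)
```

A match on a name alone is only a lead. Common names produce false positives, so a second attribute such as a date of birth or passport detail is needed before treating two records as the same person.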
ICIJ’s partner in Sweden, SVT, generated spreadsheets containing data extracted from passports found in the Pandora Papers.
ICIJ shared records with media partners using Datashare, a secure research and analytical tool developed by ICIJ’s technical team. Datashare’s batch-search function helped reporters match some public figures with the data.
The leak contains routine documents that service providers gather for due diligence – news articles, Wikipedia entries, information from financial data provider World-Check – that don’t necessarily confirm whether a person is hiding wealth in a secrecy jurisdiction. ICIJ used machine learning to tag such files in Datashare, enabling reporters to exclude them from their searches.
Our 150 media partners shared tips, leads and other information of interest using ICIJ’s global I-Hub, a secure social media and messaging platform. Throughout the project, ICIJ held extensive training sessions for partners on the use of ICIJ technology to explore, mine and better understand the files.
What did you research and how did you organize it?
Having identified documents that contained information on the owners of offshore entities and structured the information by provider, ICIJ unified the data in a centralized database.
This provided ICIJ and its media partners with a unique data set of beneficial owners of companies in secrecy jurisdictions.
ICIJ eliminated duplications in the data and identified key elements, such as nationality of the owner, country of residence and place of birth. This enabled us to find, for instance, nearly 3,700 companies with more than 4,400 beneficiaries who were Russian nationals – the most among all nationalities in the data. The figure includes 46 Russian oligarchs.
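Counts like these depend on counting each company and each owner only once per nationality. A minimal sketch of that aggregation follows, assuming hypothetical field names (`owner`, `company`, `nationality`) and invented records rather than anything from the leak itself.

```python
from collections import defaultdict

def summarize_by_nationality(records):
    """Count distinct companies and distinct owners per owner nationality."""
    companies = defaultdict(set)
    owners = defaultdict(set)
    for record in records:
        nationality = record["nationality"]
        # Sets ignore repeated rows, so duplicates do not inflate the counts.
        companies[nationality].add(record["company"])
        owners[nationality].add(record["owner"])
    summary = {
        nat: {"companies": len(companies[nat]), "owners": len(owners[nat])}
        for nat in companies
    }
    # Largest owner count first, ties broken alphabetically.
    return dict(sorted(summary.items(), key=lambda kv: (-kv[1]["owners"], kv[0])))

# Hypothetical records, including one exact duplicate row.
records = [
    {"owner": "Owner A", "company": "Company X", "nationality": "RU"},
    {"owner": "Owner A", "company": "Company Y", "nationality": "RU"},
    {"owner": "Owner B", "company": "Company Y", "nationality": "RU"},
    {"owner": "Owner C", "company": "Company Z", "nationality": "AR"},
    {"owner": "Owner A", "company": "Company X", "nationality": "RU"},
]
summary = summarize_by_nationality(records)
print(summary)
```

Because one person can own several companies and one company can have several owners, the two counts are tracked separately rather than derived from the number of rows.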
ICIJ also researched and analyzed the use of U.S. trusts, using keyword searches and matches with public data, among other methods.
As a result, ICIJ identified more than 200 trusts settled, or created, in the U.S. from 2000 to 2019, with the largest number registered in South Dakota. The trusts were connected with people from 40 countries (not including the U.S.). ICIJ identified assets in single trusts worth between $67,000 and $165 million held between 2000 and 2019. The data shows that U.S. trusts held assets worth a total of more than $1 billion. Those included U.S. real estate and bank accounts in Panama, Switzerland, Luxembourg, Puerto Rico, the Bahamas and elsewhere.
To perform the analysis of U.S.-based trusts, ICIJ manually gathered information on the creators, known as settlors; the beneficiaries; and the assets held by the trusts. ICIJ was able to identify and gather data on trusts from 15 U.S. states and the District of Columbia.
ICIJ and its media partners used keyword searches to identify politicians in the data, using passport information to help with the identification.
ICIJ used public records to verify details related to the companies and to be sure the people named in the data were actually the political leaders identified with those names. We found some false positives and discarded them. Among sources used in the research were the Dow Jones Risk and Compliance database, Sayari, Nexis, OpenCorporates, property records in the U.S. and U.K., and public corporate records. More than 330 politicians and high-level public officials, including 35 country leaders, were confirmed.
ICIJ structured the information in a spreadsheet and put it through two rounds of fact-checking. Data gathered on politicians was also visualized in the profiles in our Power Players feature.
ICIJ matched Forbes’s billionaires lists against the Pandora Papers to find more than 130 who had entities in secrecy jurisdictions. More than 100 of them had a combined fortune valued at more than $600 billion in 2021.
ICIJ analyzed 109 so-called suspicious activity reports to financial authorities filed by the Panamanian law firm Alemán, Cordero, Galindo & Lee, or Alcogal, and learned that 87 of the anti-money-laundering forms were written only after authorities or journalists had publicly identified the firm’s clients as involved in alleged wrongdoing.
ICIJ also read through several thousand publicly available employees’ profiles and found out that more than 220 lawyers associated with the giant law firm Baker McKenzie in 35 countries had previously held government posts in agencies including justice departments, tax offices, the EU Commission, and offices of heads of state.
ICIJ also did research and analysis to explore the role offshore finance plays in hiding looted art and ancient relics that authorities and communities seek to reclaim.
Finally, the Pandora Papers investigation identified more than 500 BVI companies that had been clients of Mossack Fonseca, the law firm at the center of the Panama Papers scandal, and that moved their business in the aftermath to other BVI providers found in the data.
ICIJ also matched Panamanian companies from the Panama Papers data against data available for the Panama corporate registry on OpenCorporates, and found out that at least 113 companies had changed registered agents and simply moved to Alcogal between April 3, 2016 and 2020. Together with The Miami Herald data team, ICIJ also counted 759 BVI companies that specifically considered moving to Trident Trust as part of the provider’s so-called “Mossfon Project”.
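The registry comparison described above amounts to joining two snapshots of the same companies and filtering on the agent and the date of change. The sketch below illustrates the idea under stated assumptions: the field names (`company_id`, `agent`, `agent_since`), the agent labels and the records are hypothetical, and the end of the window is taken as Dec. 31, 2020, because the text gives only the year.

```python
from datetime import date

def moved_to_agent(leak_records, registry_records, new_agent, start, end):
    """List companies whose registry agent differs from the leak and changed in the window."""
    registry = {r["company_id"]: r for r in registry_records}
    moved = []
    for old in leak_records:
        current = registry.get(old["company_id"])
        if current is None:
            continue  # company not found in the registry snapshot
        changed = current["agent"] != old["agent"]
        in_window = start <= current["agent_since"] <= end
        if changed and current["agent"] == new_agent and in_window:
            moved.append(old["company_id"])
    return moved

# Hypothetical snapshots: the earlier leak and a later public registry extract.
leak_records = [
    {"company_id": "P-001", "agent": "Old Agent"},
    {"company_id": "P-002", "agent": "Old Agent"},
    {"company_id": "P-003", "agent": "Old Agent"},
    {"company_id": "P-004", "agent": "Old Agent"},
]
registry_records = [
    {"company_id": "P-001", "agent": "New Agent", "agent_since": date(2016, 6, 1)},
    {"company_id": "P-002", "agent": "Old Agent", "agent_since": date(2010, 1, 15)},
    {"company_id": "P-003", "agent": "Other Agent", "agent_since": date(2017, 3, 9)},
]
moved = moved_to_agent(
    leak_records,
    registry_records,
    new_agent="New Agent",
    start=date(2016, 4, 3),
    end=date(2020, 12, 31),  # assumed end of window
)
print(moved)
```

Companies missing from the registry extract are skipped rather than counted, which is one reason a figure produced this way is best reported as "at least".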
How big a slice of all offshore provider data in the world does the Pandora Papers leak represent?
The Pandora Papers probe offers a broad look at secrecy jurisdictions and offshore service providers, but the data is incomplete.
The quality of the data varied by provider. In some cases, the data tied to companies didn’t offer information about the jurisdiction where they were registered, the period during which an individual was linked to an entity, or about intermediaries. The data still offered important information about owners and, in some cases, transactions and other financial details.
The 14 providers, which offered services in at least 38 jurisdictions, are part of a larger industry of offshore services operating around the world. It’s hard to say how much of the universe of provider data we have; it is probably a small fraction.
For example, six of the providers found in the Pandora Papers, or nearly half, have acted as registered agents in the BVI. They are among at least 101 firms acting in that capacity, according to the BVI Financial Services Commission. In March 2021, there were more than 370,000 active companies in the BVI, about a dozen for each of the tiny island nation’s inhabitants.
Why so many more ‘ultimate beneficial owners’ – UBOs – here than in previous leaks?
A significant proportion of the beneficial ownership information in the Pandora Papers comes from reports generated by providers for the BVI’s Beneficial Ownership Secure Search System, or BOSS, established in the wake of the 2016 publication of the Panama Papers. This information is not available to the public.
A 2017 BVI law requires providers to report to BVI authorities the names of the real owners of the companies registered there. The leak identified many documents containing such information.
Why so many world leaders and politicians in the data?
Alcogal and Trident Trust were the providers where we found a large number of current and former politicians and public officials as clients. Most of their companies were registered in the BVI and Panama. Alcogal clients include nearly half of the politicians and public officials identified in the Pandora Papers. In the beneficial ownership data that ICIJ was able to structure, nearly half of the companies were linked to Alcogal. Alcogal, headquartered in Panama, has among its founders several politicians, one of whom served as Panama’s ambassador to the United States.
Why so many beneficial owners from Russia and Latin America?
Some of the providers, based on their location and the jurisdictions where they do business, such as Cyprus, have a large proportion of Russian clients, the largest group by nationality in the Pandora Papers data.
In the Pandora Papers, more than 30% of the companies that received services from Demetrios A. Demetriades LLC, or DadLaw, a provider headquartered in Cyprus, had one or more Russians as beneficial owners. Similarly, more than 40% of the companies that received services from Seychelles-based Alpha Consulting Group also had one or more Russians as beneficial owners. Alcogal and Fidelity Corporate Services Limited were also among the providers with the largest number of Russian clients.
A large proportion of beneficial owners appearing in the data are from Latin America. More than 90 of the more than 330 politicians and public officials in the data are from Latin America. Argentina, Brazil and Venezuela are among the countries with the largest representation of beneficial owners. In the leaked data, Alcogal, headquartered in Panama, has the largest group of Latin American clients.
Where are the U.S. citizens and multinational corporations?
When it comes to creating offshore companies, foundations and trusts, parties from different parts of the world and with different needs select different providers and jurisdictions for their shell companies.
Pandora Papers documents cover a large number of providers, but obviously not all, or even most, of them, and many jurisdictions are not represented in the data.
In previous ICIJ investigations, including 2017’s Paradise Papers, the leak came from a prestigious law firm with a larger corporate practice, Appleby. As a result, the data included more documents about multinationals. Bermuda and the Cayman Islands, which are popular havens for corporations, were among the jurisdictions with a large presence in that leak.
As for U.S. nationals, ICIJ identified more than 700 companies with beneficial owners connected to the U.S. in the Pandora Papers; Americans were also among the top 20 nationalities represented in the data. In the Pandora Papers, Russia, the United Kingdom, Argentina, China and Brazil are among the countries with the largest representation of beneficial owners.
In the Paradise Papers, U.S. citizens had a larger relative presence.