Author: Ben Kelly, SupTech Product Manager
Several origin myths from around the globe claim that the entire human race once spoke a single, common language.
In one version, it is said that the single, unified people affronted their God by attempting to build a tower that reached the heavens. Divine intervention was quick, bringing into existence multiple languages, introducing mutual unintelligibility within society, and then scattering all peoples to every corner of the Earth.
Whether or not early humans in fact spoke a common language, a look at today’s datasphere shows that a lack of common interpretation is a severe roadblock to the timely and efficient use of data. A digital lingua franca is sorely needed. Data volumes are increasing exponentially in every sector, and the efficiency gained through easy interpretation of that information becomes correspondingly more critical. The surge in the global datasphere from 2010, projected out to 2025, is illustrated below, broken down by region.
IMAGE: Figure 12 - Size and Growth of the Global Datasphere by Region. Source: The Digitalization of the World from Edge to Core.
Financial regulators are facing the same challenge, and a failure to adapt harms all stakeholders: the regulator, the regulated and the general public. A lack of common definitions and interpretation between the regulated and the regulator leads to increased cost and reduced efficiency for both parties.
The increase in data volumes in financial regulation is evident. Huw van Steenis, chairing a review of the United Kingdom’s financial system in 2019, noted that an explosion of data meant supervisory teams were “receiving twice the entire works of Shakespeare in reading each week”.
Van Steenis further observed that this backlog of reading material is not going to shrink: “…the amount of data HSBC stores on its servers doubles every two to three years. It is up to 240 petabytes.” For context, a single petabyte equals one quadrillion bytes!
The costs incurred due to opaque data requests are also clear. From the perspective of the regulated, a Financial Times poll found that “institutions spend up to 10 per cent of their annual revenue dealing with a patchwork of divergent regulations” across countries, a figure it described as ‘conservative’. It also found that even where common standards exist, interpretations differ across borders. This burden disproportionately affects smaller institutions.
What does this all have to do with granular data? Granular data is one part of a solution that underpins a common regulatory language and prevents the obfuscation of meaning, enabling a digital lingua franca (common language) for financial regulation. While there are many tools that address this challenge, the focus of this article is granular data and how it supports this digital lingua franca.
Let us take a simple example before giving the definition. Imagine that you are in charge of determining the national average temperature for the month of December. You ask regional bodies to provide their monthly average temperatures at the end of the month. Essentially, you have requested a single aggregated value from each body. But this can lead to issues. Perhaps each body interpreted the requested aggregation differently: one provided the mean, the next the median and another the mode. Another issue is that, since making the request, you have become interested in the variation of temperature throughout the month, so you must now issue a new request for data!
Perhaps you should have requested disaggregated data instead: each individual daily temperature reading, resulting in 31 values reported at the end of the month by each body. This would have avoided misinterpretation by the reporting bodies of how to properly aggregate the values, given you the flexibility to use the data in different ways, and avoided repetitive data requests.
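To make the ambiguity concrete, here is a minimal Python sketch (the readings are made up) showing how one request for “the monthly average” can produce three different answers, while the granular daily series also supports questions that were never part of the original request:

```python
import statistics

# Hypothetical daily temperature readings for December from one regional body.
daily_readings = [2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.5, 3.5,
                  4.0, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.0, 6.5, 7.0,
                  7.0, 7.5, 8.0, 8.0, 8.5, 9.0, 9.5, 10.0, 11.0, 12.0, 13.0]

# An ambiguous request for "the monthly average" yields three different answers:
print(statistics.mean(daily_readings))    # one body reports the mean (~5.9)
print(statistics.median(daily_readings))  # another reports the median (5.5)
print(statistics.mode(daily_readings))    # a third reports the mode (2.0)

# With the granular series in hand, the requestor can answer a question that
# was never in the original request (the variation through the month)
# without going back to the reporting bodies:
print(statistics.stdev(daily_readings))
```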
The Bank of England acknowledges this pain point in financial regulation, writing “the Bank often requires data to be aggregated in ways that makes reports hard to repurpose. This leads to more requests for new reports or breakdowns of existing reports than would otherwise be the case. It also leads to redundancy in the reporting process, as firms need to re-assemble the same underlying building blocks in different ways for different reports.”
We can now start to see the benefits for the stakeholders involved.
Granular data means for the reporter:
- no ambiguity over how values should be aggregated, so no risk of misinterpreting the request;
- fewer new or repeated data requests to respond to;
- less redundancy, as the same underlying building blocks do not need to be re-assembled in different ways for different reports.
Granular data means for the requestor:
- a common, unambiguous interpretation of what has been reported;
- the flexibility to aggregate, repurpose and analyse the data in new ways without issuing further requests.
So, what is the definition of granular data? The broadly accepted definition is that granular data is the disaggregation of data to its finest grain. However, it is truer to say that it is disaggregation to the point that is practicable in both implementation and utility (while also complying with data privacy legislation!), all while providing the benefits enumerated above. Consider that in our national temperature example, it may be more granular to report the temperature for every minute or second, but it may not be practical for the reporter to implement this requirement, and it may provide little to no foreseeable utility to the national body.
The need for more data, including granular data, was accelerated by the 2008-09 financial crisis. Huw van Steenis noted that “the BCBS alone published twice as many regulatory standards between 2009-17 than in the 20 years prior”.
Two granular data requirements were promptly developed on either side of the Atlantic in the wake of the crisis. We also look at the experience of a non-G20 regulator on the Atlantic that is leading with this approach.
The FR Y-14M report collects detailed monthly data from bank holding companies (BHCs) and intermediate holding companies with $50 billion or more in total consolidated assets. The report comprises three loan- and portfolio-level collections and one detailed address-matching collection.
In-scope BHCs also have differing reporting requirements based on their size and relative activity in certain risk areas. The loan-level data table consists of over 100 data fields.
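To give a sense of what ‘loan-level’ means in practice, here is a toy record sketch; the field names are illustrative stand-ins and not the actual FR Y-14M schema, which defines well over 100 fields per loan:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LoanLevelRecord:
    """One illustrative loan-level row; real FR Y-14M collections
    carry 100+ fields per record."""
    loan_id: str
    origination_date: date
    original_balance: float
    current_balance: float
    interest_rate: float
    days_past_due: int
    property_zip: str  # the kind of field that feeds address matching

record = LoanLevelRecord(
    loan_id="L-000001",
    origination_date=date(2018, 5, 14),
    original_balance=350_000.0,
    current_balance=312_450.0,
    interest_rate=0.0425,
    days_past_due=0,
    property_zip="10001",
)
print(record)
```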
The data provides the Federal Reserve with a plethora of information to:
Probably the most widely known credit-risk-focussed collection, AnaCredit (Analytical Credit Datasets) comprises the collection of granular credit data based on harmonised ECB statistical reporting requirements, submitted by all credit institutions (including their foreign branches) in the euro area. The data represents highly detailed information on individual bank loans (approximately 100 data points) within the euro area. Currently, the collection is restricted to loans to corporations (and other legal entities) larger than €25,000.
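As a rough illustration of that scope rule (field names here are hypothetical, not actual AnaCredit attributes), a reporting institution effectively filters its loan book as follows:

```python
# Hypothetical loan records; real AnaCredit instruments carry ~100 attributes each.
loans = [
    {"loan_id": "L1", "debtor_type": "legal_entity", "outstanding_eur": 180_000},
    {"loan_id": "L2", "debtor_type": "natural_person", "outstanding_eur": 90_000},
    {"loan_id": "L3", "debtor_type": "legal_entity", "outstanding_eur": 12_000},
]

# AnaCredit scope: loans to corporations and other legal entities above EUR 25,000.
in_scope = [
    loan for loan in loans
    if loan["debtor_type"] == "legal_entity" and loan["outstanding_eur"] > 25_000
]
print(in_scope)  # only L1 qualifies: L2 is a natural person, L3 is under threshold
```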
The benefits for the ECB and the euro area National Competent Authorities are:
The Bank of England estimates that 15% of their collection templates currently involve granular data. These collections can be ‘hybrid’ in nature, collecting both aggregate and granular data. The Solvency II collection asks insurers to provide 30 data points on each asset they hold, including its nature, issuer, economic sector, value and acquisition price. Adding to the granularity, multiple rows can be provided per asset depending on the asset’s position.
The Bank collects this data using Vizor Software, deployed as the Bank of England Electronic Data Submission (BEEDS) Portal, executing automated plausibility checks in addition to data quality rules at the point of collection.
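By way of illustration only (these are not the Bank’s actual BEEDS rules), a point-of-collection plausibility check over asset-level rows might look like this:

```python
# Hypothetical asset-level rows; Solvency II asks for ~30 data points per asset.
assets = [
    {"asset_id": "A1", "sector": "utilities", "value": 1_200_000, "acquisition_price": 1_000_000},
    {"asset_id": "A2", "sector": "financials", "value": -5_000, "acquisition_price": 10_000},
]

def plausibility_errors(row):
    """Return plausibility failures for one reported asset row."""
    errors = []
    if row["value"] < 0:
        errors.append("asset value must be non-negative")
    if row["acquisition_price"] <= 0:
        errors.append("acquisition price must be positive")
    # A large divergence is flagged for review rather than rejected outright:
    # plausibility checking, as opposed to hard validation.
    if row["acquisition_price"] > 0 and row["value"] / row["acquisition_price"] > 10:
        errors.append("value implausibly high relative to acquisition price")
    return errors

for asset in assets:
    for err in plausibility_errors(asset):
        print(f"{asset['asset_id']}: {err}")
```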
The Bank noted several benefits:
Regulators of G20 economies and global financial centres are not the only ones making strides in this area. The Monthly All Financial Institutions (MAFI) Return is a Ghanaian monthly reporting requirement covering transactional details on all loans, deposits, borrowings and investments in a single submission. Approximately 140 data points are reported. The Bank of Ghana uses a Vizor Software solution, called ORASS (Online Regulatory and Analytical Surveillance Software), to support both direct upload to a web portal and machine-to-machine reporting via a RESTful API.
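A machine-to-machine submission along those lines might look like the sketch below; the endpoint, payload shape and authentication are hypothetical stand-ins, not the actual ORASS API contract:

```python
import requests

# Hypothetical endpoint and token - purely to illustrate RESTful submission.
API_URL = "https://reporting.example.gov.gh/api/v1/returns/mafi"
TOKEN = "example-token"

# A single granular transaction record (MAFI collects ~140 data points).
payload = {
    "reporting_period": "2021-06",
    "records": [
        {"record_type": "loan", "amount": 250_000.0, "currency": "GHS",
         "counterparty_sector": "agriculture", "origination_date": "2021-06-14"},
    ],
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()  # collection-point validation failures surface here
print(response.json())
```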
The data is stored in the same repository as all other data collected from the banks, allowing Bank of Ghana staff to conduct holistic and fully integrated risk assessments on the Vizor platform based on credit and liquidity risk indicators derived from MAFI data. A mixture of granular and aggregated data is also acquired in this collection pertaining to market, operational, strategic, and earnings risk. For example, banks are also required to report each individual cybersecurity incident by type and impact, which informs the calculation of operational key risk indicators.
All data from this collection is also available (without requiring the complex transformation that non-granular data would typically entail) in a data warehouse, enabling the regulator to dynamically aggregate and visualise the data as desired.
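Because the data sits in the warehouse at transaction level, re-aggregation becomes a query rather than a new request to the industry. A minimal pandas sketch, with hypothetical columns and figures:

```python
import pandas as pd

# Hypothetical granular loan records as they might land in the warehouse.
loans = pd.DataFrame({
    "bank": ["Bank A", "Bank A", "Bank B", "Bank B"],
    "sector": ["agriculture", "trade", "agriculture", "trade"],
    "outstanding": [250_000, 400_000, 150_000, 600_000],
    "days_past_due": [0, 120, 95, 10],
})

# Aggregate on demand (by sector today, by bank tomorrow) without asking
# the industry to re-report anything.
by_sector = loans.groupby("sector")["outstanding"].sum()

# A credit risk indicator derived from the same granular rows: share of
# outstanding balances more than 90 days past due, per bank.
npl = loans.loc[loans["days_past_due"] > 90].groupby("bank")["outstanding"].sum()
total = loans.groupby("bank")["outstanding"].sum()
npl_ratio = (npl / total).fillna(0)

print(by_sector)
print(npl_ratio)
```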
In summary, this provides the following benefits to the Bank of Ghana:
There are many more examples of regulators with a vision for granular data. Let’s move away from the Atlantic and look at what proactive regulators in the APAC region are doing.
The Australian Prudential Regulation Authority have set out an approach in which they will progressively move from form-based returns to concept-dimension models, thereby collecting data at a more granular level. Their draft ARS 220.0 aims to collect provisions allocated on a portfolio basis at a detailed level from authorised deposit-taking institutions, with go-live planned for March 2022. The degree to which the collection can be disaggregated will be influenced by the Privacy Act 1988 and will require consultation with the industry. The Vizor Software platform, named APRA Connect, will be used to collect the data and ensure its quality and completeness.
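For readers unfamiliar with the term, a concept-dimension model identifies each reported value by a concept plus a set of qualifying dimensions, rather than by its position in a form. A rough sketch of the contrast, with hypothetical names and values:

```python
# Form-based reporting: the meaning of the number is locked into a template cell.
form_cell = {"return": "ARS 220.0", "row": 12, "column": "B", "value": 1_250_000}

# Concept-dimension reporting: the meaning travels with the data point itself,
# so the same fact can be re-aggregated or repurposed without a new template.
data_point = {
    "concept": "provision_amount",        # what is being measured
    "dimensions": {                       # the context that qualifies it
        "portfolio": "residential_mortgage",
        "basis": "collective",
        "currency": "AUD",
        "reporting_period": "2022-03",
    },
    "value": 1_250_000,
}

print(form_cell["value"] == data_point["value"])  # same number, richer meaning
```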
Nearby, the Reserve Bank of New Zealand has identified rapid house price rises since the Global Financial Crisis, and the associated build-up of mortgage debt, as the key risk to their economy. Their ongoing and future approach to collection is governed by the mantra “collect once, use multiple times”. They have made recent strides in launching new collections, designed in collaboration with the industry, to better monitor this risk area. They are also proactively seeking to become even more granular via the acquisition of anonymised customer-level transactional data, giving them “more detail, content and flexibility to enable analysis on data that may not currently be collected”.
The Hong Kong Monetary Authority (HKMA) completed a successful pilot of their Granular Data Repository in 2019. The pilot, conducted in collaboration with 19 participating banks, involved a monthly collection of transactions pertaining to corporate loans and residential mortgages.
Approximately 250 fields are reported, covering loans, counterparties and repayment schedules. In-scope banks are required to report on behalf of their Hong Kong offices as well as their branches and subsidiaries in mainland China.
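That collection is naturally relational: loans link to counterparties and to repayment schedules. A minimal sketch of the structure using Python’s built-in sqlite3 (table and column names are hypothetical, not the HKMA’s actual data model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE counterparty (cp_id TEXT PRIMARY KEY, name TEXT, jurisdiction TEXT);
CREATE TABLE loan (
    loan_id TEXT PRIMARY KEY,
    cp_id   TEXT REFERENCES counterparty(cp_id),
    amount  REAL,
    booking_office TEXT  -- Hong Kong office, or a mainland branch/subsidiary
);
CREATE TABLE repayment_schedule (
    loan_id  TEXT REFERENCES loan(loan_id),
    due_date TEXT,
    amount   REAL
);
""")
conn.execute("INSERT INTO counterparty VALUES ('C1', 'Acme Ltd', 'HK')")
conn.execute("INSERT INTO loan VALUES ('L1', 'C1', 5000000, 'Hong Kong')")
conn.execute("INSERT INTO repayment_schedule VALUES ('L1', '2020-01-31', 250000)")

# One joined view across the three granular tables:
for row in conn.execute("""
    SELECT l.loan_id, c.name, r.due_date, r.amount
    FROM loan l
    JOIN counterparty c ON l.cp_id = c.cp_id
    JOIN repayment_schedule r ON r.loan_id = l.loan_id
"""):
    print(row)
```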
The HKMA has a long-term vision of replacing form-based reporting with granular data acquisition. This is part of a broader digital transformation, with the regulator establishing a dedicated Digitalisation Office. “We hope that, in the long run, the use of new technology will replace the current requirement for banks to submit template-based regulatory reports, thereby lessening their reporting burden. This will be a win-win outcome for the HKMA and the banking industry.”
Granular data is not the only solution enabling a regulator to get closer to achieving our ‘lingua franca’, and it should not be considered in isolation. In truth, a regulator needs to adopt a range of strategies, which may include not only technology but also better processes around the utilisation of new and existing tools. Other enablers of an effective lingua franca in financial regulation include:
Returning to the growth in the datasphere illustrated at the outset of this article, it is important to make a brief reference to other solutions addressing this challenge.
For many data collections, moving towards granular data is a necessary component of achieving a world of finance in which all stakeholders are on the same page, because everyone is speaking the same language. Regulators all over the globe are having the same conversation. Obfuscated terminology will be reduced or eliminated, leading to the timelier and more cost-effective acquisition of data and greater flexibility for its consumers.
However, this does not imply an immediate and total need to transition all current data collection to granular data. As the use cases above show, the primary utilisation of granular data has been in the area of lending and deposits. This data is statistical in nature and involves less complexity than other types of reporting, which require data to be mapped to regulatory definitions or accounting principles.
And while granular data acquisition leads to reduced cost over time, a ‘big bang’ approach to acquiring granular data may incur significant initial costs for financial institutions. These institutions may be working with a web of legacy systems which were not originally procured with granular data acquisition in mind – for example, the introduction of AnaCredit saw financial institutions attempting to source data from dozens of internal data stores.
It follows that investment in the processing power and storage capabilities of the regulator’s infrastructure also needs to be considered. Relational databases have evolved significantly in recent years and, given the correct setup, are now capable of handling the acquisition, processing and analysis of granular data.
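As a small illustration of what ‘the correct setup’ can mean, indexing granular tables on the keys supervisors actually slice by keeps aggregation queries fast as volumes grow. A sqlite3 sketch over a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE loan_facts (
    reporting_period TEXT,
    bank_id          TEXT,
    sector           TEXT,
    outstanding      REAL
)
""")
# Index the columns used for filtering and grouping, so period/bank/sector
# aggregations avoid full-table scans as the granular history accumulates.
conn.execute("CREATE INDEX ix_loan_facts ON loan_facts (reporting_period, bank_id, sector)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT sector, SUM(outstanding)
    FROM loan_facts
    WHERE reporting_period = '2021-06'
    GROUP BY sector
""").fetchall()
print(plan)  # the plan shows ix_loan_facts being used rather than a full scan
```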
There is therefore no harm in taking a phased approach to introducing granular data: start with the most impactful and cost-effective areas of supervision to promote industry buy-in, while meeting the most pressing supervisory needs in a prioritised manner.
A digital lingua franca in SupTech must be achieved, but each regulator needs to plan and find their own roadmap to get there.