NextGen SupTech: Granular Data - The Building Blocks of a Digital Lingua Franca

Author: Ben Kelly, SupTech Product Manager

Several origin myths from around the globe claim that the entire human race once spoke a single, common language.

In one version, it is said that the single, unified people affronted their God by attempting to build a tower that reached the heavens. Divine intervention was quick, bringing into existence multiple languages, introducing mutual unintelligibility within society, and then scattering all peoples to every corner of the Earth.

Whether or not early humans in fact spoke a common language, when we look at the datasphere today, a lack of common interpretation is a severe roadblock to the timely and efficient use of data. A digital lingua franca is sorely needed. Data volumes are increasing exponentially in every sector, and the efficiency gained through ease of interpretation of that information becomes ever more critical. This surge in the global datasphere from 2010, projected through to 2025, is illustrated below, broken down by region.

IMAGE: Figure 12 - Size and Growth of the Global Datasphere by Region. Source: The Digitalization of the World from Edge to Core.

Financial regulators face the same challenge, and a failure to adapt harms all stakeholders: the regulator, the regulated and the general public. A lack of common definitions and interpretation between the regulated and the regulator leads to increased cost and reduced efficiency for both parties.

The increase in data volumes in financial regulation is evident. Huw van Steenis, chairing a review of the United Kingdom’s financial system in 2019, noted that an explosion of data meant supervisory teams were “receiving twice the entire works of Shakespeare in reading each week”.

Van Steenis further observed that the potential backlog of reading material to interpret is not going to shrink: “…the amount of data HSBC stores on its servers doubles every two to three years. It is up to 240 petabytes.” As a reminder, a single petabyte equals one quadrillion bytes!

The costs incurred due to opaque data requests are also clear. From the perspective of the regulated, the Financial Times found in a poll that “institutions spend up to 10 per cent of their annual revenue dealing with a patchwork of divergent regulations” across countries. The figure was also described as ‘conservative’. The poll found that even where common standards exist, interpretations differ across borders. This burden disproportionately affects smaller institutions.

What does this all have to do with granular data? Granular data is one part of a solution that underpins a common regulatory language and prevents obfuscation of meaning, enabling a digital lingua franca (common language) for financial regulation. While there are many tools that address this challenge, the focus of this article is granular data and how it supports this digital lingua franca.

What is Granular Data?

Let us take a simple example before giving the definition. Imagine that you are in charge of determining the national average temperature for the month of December. You ask regional bodies to provide their monthly average temperature at the end of the month. Essentially, you have requested a single aggregated value from each body. But this can lead to issues. Perhaps each body interpreted the requested aggregation differently: one provided the mean, the next the median and another the mode. Another issue is that, since making the request, you have become interested in the variation of temperature throughout the month, so you must now make a different request for data!

Perhaps you should have requested disaggregated data instead: each individual daily temperature reading, resulting in 31 values reported at the end of the month by each body. This would have avoided misinterpretation by the reporting bodies of how to aggregate the values, given you the flexibility to use the data in different ways, and avoided repetitive data requests.
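
To make this concrete, here is a minimal sketch (with made-up readings, purely illustrative) of how granular daily submissions let the requestor choose, and later change, the aggregation without any new data request:

```python
from statistics import mean, median, pstdev

# Hypothetical granular submission from one regional body:
# one temperature reading per day for December (31 values).
daily_readings = [4.1, 3.8, 5.0, 2.2, 1.9, 3.3, 4.4, 5.1, 2.8, 3.0,
                  1.5, 0.9, 2.4, 3.6, 4.8, 5.5, 4.9, 3.2, 2.1, 1.8,
                  2.6, 3.9, 4.0, 3.1, 2.9, 1.2, 0.4, 1.1, 2.3, 3.5, 4.2]

# The requestor, not the reporter, decides how to aggregate...
print("mean:  ", round(mean(daily_readings), 2))
print("median:", median(daily_readings))

# ...and can answer new questions (e.g. variation) from the same submission.
print("spread:", round(pstdev(daily_readings), 2))
```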

The Bank of England acknowledges this pain point in financial regulation, writing “the Bank often requires data to be aggregated in ways that makes reports hard to repurpose. This leads to more requests for new reports or breakdowns of existing reports than would otherwise be the case. It also leads to redundancy in the reporting process, as firms need to re-assemble the same underlying building blocks in different ways for different reports.”

We can now start to see the benefits for the stakeholders involved.

Granular Data means for the reporter:

  • It is easier for the reporter to understand the nature of the request, since language is broken down to its simplest components
  • Less burdensome for the reporter, as scope for interpretation and potential for further ad hoc requests are minimised
  • More accurate reporting and reduced need to make resubmissions

Granular Data means for the requestor:

  • More timely acquisition of data as reporting burden is reduced
  • More accurate insights as the reporter will not misinterpret the data request
  • More flexibility to re-use acquired data to produce different insights

So, what is the definition of granular data? The broadly accepted definition is that granular data is the disaggregation of data to its finest grain. However, it is truer to say that it is disaggregation to the point that is practicable in both implementation and utility (while also complying with data privacy legislation!), all while providing the benefits enumerated above. Consider that in our national temperature example, it may be more granular to report the temperature every minute or second, but it may not be practical for the reporter to implement this requirement. It may also provide little to no foreseeable utility to the national body.

Examples of Granular Data Sets in Financial Regulation

The need for more data, including granular data, was accelerated by the global financial crisis. Huw van Steenis noted that “the BCBS alone published twice as many regulatory standards between 2009-17 than in the 20 years prior”.

Granular data requirements were promptly developed on both sides of the Atlantic in the wake of the crisis. We also look at the experience of a non-G20 regulator on the Atlantic coast that is leading with this approach.

FR Y-14M (The Federal Reserve of the United States)

The FR Y-14M report collects detailed monthly data from bank holding companies (BHCs) and intermediate holding companies with $50 billion or more in total consolidated assets. The report comprises three loan- and portfolio-level collections and one detailed address-matching collection.

In-scope BHCs also have differing reporting requirements based on their size and relative activity in certain risk areas. The loan-level data table consists of over 100 data fields.

The data provides the Federal Reserve with a wealth of information to:

  • Assess capital adequacy based on projections of profit and loss
  • Strengthen continuous risk monitoring and stress test models
  • Inform operational decision making in support of consumer protection

AnaCredit (European Central Bank, National Competent Authorities within the euro zone)

Probably the most widely known credit-risk-focused collection, AnaCredit (Analytical Credit Datasets) comprises granular credit data, based on harmonised ECB statistical reporting requirements, submitted by all credit institutions (including their foreign branches) in the euro area. The data represents highly detailed information on individual bank loans (approximately 100 data points) within the euro area. Currently, the collection is restricted to data on loans to corporations (and other legal entities) where the loan is larger than €25,000.
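
AnaCredit’s precise data model is defined in the ECB’s reporting requirements; purely to illustrate what a granular, loan-level record looks like (the attribute names below are hypothetical, not the official AnaCredit fields), such a record might be modelled along these lines:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LoanRecord:
    """Hypothetical loan-level record, for illustration only."""
    reporting_agent: str        # credit institution submitting the record
    counterparty_id: str        # identifier of the debtor (e.g. a legal entity identifier)
    instrument_id: str          # unique identifier of the individual loan
    currency: str
    outstanding_amount: float   # amount outstanding at the reference date
    inception_date: date
    maturity_date: date
    interest_rate: float
    performing_status: str      # e.g. "performing" or "non-performing"

loan = LoanRecord("BANK_XYZ", "LEI-123456", "LOAN-0001", "EUR",
                  250_000.0, date(2020, 3, 1), date(2027, 3, 1), 0.021, "performing")
```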

The benefits for the ECB and the National Competent Authorities of the euro zone are:

  • Enables bank supervisors to accurately assess credit risk in supervised financial institutions
  • Informs support of financial transactions by assisting credit institutions in the evaluation of risk
  • Enables macroprudential and economic analysis, using harmonised concepts and definitions across different jurisdictions within the euro area

Solvency II (European Insurance & Occupational Pensions Authority, Bank of England, National Competent Authorities within the European Union)

The Bank of England estimates that currently 15% of its collection templates involve granular data. These collections can be ‘hybrid’ in nature, collecting both aggregate and granular data. The Solvency II collection asks insurers to provide 30 data points on each asset they hold, including its nature, issuer, economic sector, value and acquisition price. Adding to the granularity, multiple rows can be provided per asset depending on the asset’s position.

The Bank uses a Vizor Software solution, the Bank of England Electronic Data Submission (BEEDS) Portal, to collect this data, executing automated plausibility checks in addition to data quality rules at the point of collection.

The Bank noted several benefits:

  • Supports the Bank’s risk reviews of insurers
  • Allows analysis of common counterparty exposure risk and country specific concentration risk
  • Mitigates ad hoc data requests by the Bank when conducting thematic reviews of asset class risk

Monthly All Financial Institutions Return (Bank of Ghana)

Regulators of G20 economies and global financial centres are not the only ones making strides in this area. The Monthly All Financial Institutions (MAFI) Return is a monthly Ghanaian reporting requirement covering transactional details on all loans, deposits, borrowings and investments in a single submission. Approximately 140 data points are reported. The regulator uses a Vizor Software solution, called ORASS (Online Regulatory and Analytical Surveillance Software) by the Bank of Ghana, to support both direct upload to a web portal and machine-to-machine reporting via a RESTful API.

The data is stored in the same repository as all other data collected from the banks, allowing Bank of Ghana staff to conduct holistic and fully integrated risk assessments on the Vizor platform, based on credit and liquidity risk indicators derived from MAFI data. A mixture of granular and aggregated data pertaining to market, operational, strategic and earnings risk is also acquired in this collection. For example, banks are also required to report each individual cybersecurity incident by type and impact, which informs the calculation of operational key risk indicators.

All data from this collection is also available in a data warehouse (without requiring the complex transformation that non-granular data would typically entail), enabling the regulator to dynamically aggregate and visualise the data as desired.
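
As a rough sketch of what dynamic aggregation of granular data means in practice (the field names and figures below are hypothetical, not the actual MAFI schema), loan-level records held in a warehouse can be sliced however the supervisor chooses at query time:

```python
import pandas as pd

# Hypothetical granular loan records - not the actual MAFI fields.
loans = pd.DataFrame([
    {"bank": "Bank A", "sector": "Agriculture", "outstanding": 1_200_000, "days_past_due": 0},
    {"bank": "Bank A", "sector": "Retail",      "outstanding":   450_000, "days_past_due": 95},
    {"bank": "Bank B", "sector": "Agriculture", "outstanding":   800_000, "days_past_due": 30},
    {"bank": "Bank B", "sector": "Energy",      "outstanding": 2_100_000, "days_past_due": 0},
])

# One aggregation today: total exposure by economic sector...
exposure_by_sector = loans.groupby("sector")["outstanding"].sum()

# ...and a different one tomorrow (share of loans more than 90 days past due, per bank),
# with no new data request to the reporting banks.
loans["npl"] = loans["days_past_due"] > 90
npl_ratio_by_bank = (
    loans.groupby("bank")
         .apply(lambda g: g.loc[g["npl"], "outstanding"].sum() / g["outstanding"].sum())
)

print(exposure_by_sector, npl_ratio_by_bank, sep="\n\n")
```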

In summary, this provides the following benefits to the Bank of Ghana:

  • Enables immediate, high-quality credit risk assessment, fully integrated into a complete risk model
  • The granular MAFI data can be dynamically aggregated, calculated and visualised as per the supervisor’s wishes
  • The MAFI data can be integrated with other data sets as all collections exist in the same platform

A Global Vision Toward More Granular Data

There are many more examples of regulators having a vision for granular data. Let’s move away from the Atlantic and look at what proactive regulators in the APAC region are working on.

The Australian Prudential Regulation Authority has an approach in which it will progressively move from form-based returns to concept-dimension models, thereby collecting data at a more granular level. Its draft ARS 220.0 aims to collect provisions allocated on a portfolio basis, at a detailed level, from authorised deposit-taking institutions, and is planned to go live in March 2022. The degree to which the collection can be disaggregated will be influenced by the Privacy Act 1988 and will require consultation with the industry. The Vizor Software platform, to be named APRA Connect, will be used to collect and ensure the quality and completeness of the data acquired.

Nearby, the Reserve Bank of New Zealand has identified rapid house price rises since the Global Financial Crisis, and the associated build-up of mortgage debt, as the key risk to its economy. Its ongoing and future approach to collection is governed by the mantra “collect once, use multiple times”. It has made recent strides in launching new collections, designed in collaboration with the industry, to better monitor this risk area. Yet it proactively seeks to become even more granular via the acquisition of anonymised customer-level transactional data, giving it “more detail, content and flexibility to enable analysis on data that may not currently be collected”.

The Hong Kong Monetary Authority (HKMA) completed a successful pilot of its Granular Data Repository in 2019. The pilot, run in collaboration with 19 participating banks, involved a monthly collection of transactions pertaining to corporate loans and residential mortgages.

Approximately 250 fields are reported, covering loans, counterparties and repayment schedules. In-scope banks are required to report on behalf of their Hong Kong offices as well as their branches and subsidiaries in mainland China.

The HKMA has a long-term vision of replacing form-based reporting with granular data acquisition. This is part of a broader digital transformation, with the regulator establishing a dedicated Digitalisation Office: “We hope that, in the long run, the use of new technology will replace the current requirement for banks to submit template-based regulatory reports, thereby lessening their reporting burden. This will be a win-win outcome for the HKMA and the banking industry.”

Other Tools and Processes to Enable a Digital Lingua Franca

Granular data is not the only solution enabling a regulator to get closer to achieving our ‘lingua franca’, and it should not be considered in isolation. In truth, a regulator needs to adopt a range of strategies. These may include not only technology but also better processes around the utilisation of new and existing tools. Other enablers of an effective lingua franca in financial regulation include:

  • Early and continued collaboration with industry stakeholders – the regulated and, where possible, other regulators.
  • A well maintained, descriptive and machine-readable data dictionary, acting as a point of reference for all collections (a minimal illustration follows this list). The ECB’s Banks’ Integrated Reporting Dictionary (BIRD) is an example of this.
  • Use of clearly defined, machine-readable data collection specifications, which make explicit reference to the data dictionary. AnaCredit is an example of a collection which is covered by the BIRD.
  • Publication of these specifications in draft and finalised formats, manually or automatically via API, so that they are available and consumable by both humans and machines.
  • Format-agnostic specifications allowing consumers choice regarding their preferred method of consumption. Publication artefacts for a single collection might include machine-readable Excel files, taxonomies and/or XSDs.
  • New or improved approaches to drafting reporting instructions from regulation – examples include standardised natural language, annotated instructions, instructions as code. The Financial Conduct Authority and the Bank of England explored this as part of their Digital Regulatory Reporting pilot.
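
To illustrate the data dictionary and collection specification points above (the structure and field names below are a hypothetical sketch, not the actual BIRD or AnaCredit artefacts), a machine-readable dictionary entry and a collection that references it might look like this:

```python
import json

# Hypothetical machine-readable data dictionary - a sketch of the idea, not the BIRD model.
dictionary = {
    "outstanding_nominal_amount": {
        "definition": "Principal outstanding at the reference date, excluding accrued interest.",
        "data_type": "decimal",
        "unit": "reporting currency",
    },
    "performing_status": {
        "definition": "Classification of the instrument at the reference date.",
        "data_type": "code",
        "allowed_values": ["performing", "non-performing"],
    },
}

# A collection specification points at dictionary terms instead of redefining them.
collection_spec = {
    "collection": "hypothetical_loan_return",
    "frequency": "monthly",
    "fields": ["outstanding_nominal_amount", "performing_status"],
}

# Every field in the collection must resolve to a single, shared definition.
assert all(field in dictionary for field in collection_spec["fields"])
print(json.dumps(collection_spec, indent=2))
```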

Other Solutions for Handling Increased Data Volumes

Returning to the growth in the datasphere illustrated at the outset of this article, it is important to make a brief reference to other solutions addressing this challenge.

  • The use of APIs to enable automated machine-to-machine reporting (a minimal sketch follows this list). New architectural approaches to acquiring data in large volumes are necessary to reduce human effort and error and to improve timeliness.
  • Machine learning can facilitate the automated generation of insights, leading to prompt decision-making by regulators who might otherwise be swamped by the sheer volume of data.
  • Interpretation of big data. This should not be confused with granular data: big data is a broader concept associated with voluminous amounts of unstructured as well as structured data. NoSQL databases can be considered as an option for the storage and analysis of big data. They do not replace SQL databases, which excel at the analysis of structured data; rather, they remain another potential tool in a regulator’s arsenal.
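
As a minimal sketch of automated machine-to-machine reporting (the endpoint, token and payload below are entirely hypothetical, not any particular regulator’s or vendor’s API), a reporting institution might push granular records as follows:

```python
import requests

# Hypothetical endpoint and credentials - not a real collection interface.
ENDPOINT = "https://collect.example-regulator.org/api/v1/returns/loans"
API_TOKEN = "replace-with-issued-token"

records = [
    {"instrument_id": "LOAN-0001", "counterparty_id": "LEI-123", "outstanding": 250000.0},
    {"instrument_id": "LOAN-0002", "counterparty_id": "LEI-456", "outstanding": 90000.0},
]

response = requests.post(
    ENDPOINT,
    json={"reference_date": "2021-12-31", "records": records},
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()  # validation failures surface immediately, not weeks later
print("Submission accepted:", response.json())
```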

 

Conclusion

For many data collections, moving towards granular data is a mandatory component in achieving a world of finance in which all stakeholders are on the same page, because everyone is speaking the same language. We can see that regulators all over the globe are having the same conversation. Obfuscated terminology will be reduced or eliminated, leading to timelier and more cost-effective acquisition of data that gives its consumers flexibility.

However, this does not imply an immediate and total need to transition all current data collection to granular data. We have seen in the use cases above that the primary utilisation of granular data has been in the area of lending and deposits. This data is statistical in nature, requiring less complexity than other types of reporting which require mapping to regulatory definitions or accounting principles.

And while granular data acquisition leads to reduced cost over time, a ‘big bang’ approach to acquiring granular data may incur significant initial costs for financial institutions. These institutions may be working with a web of legacy systems which were not originally procured with granular data acquisition in mind; for example, the introduction of AnaCredit saw financial institutions attempting to source data from dozens of internal data stores.

It also follows that investment in the processing power and storage capabilities of the regulator’s infrastructure needs to be considered. Relational databases have evolved significantly in recent years and are now capable of handling the acquisition, processing and analysis of granular data given the correct setup.

Therefore, there is no harm in taking a phased approach to introducing granular data, starting with the most impactful and cost-effective areas of supervision in order to promote industry buy-in while meeting the most pressing supervisory needs in a prioritised manner.

A digital lingua franca in SupTech needs to be achieved, but each regulator needs to plan its own roadmap to get there.
