In Propylon’s early days, our work with standards developers primarily involved financial standards bodies such as FASB, AICPA, ICAEW, and IFRS. Unlike traditional standards development organizations (SDOs) in regulated sectors such as life sciences, aviation, and oil and gas, these entities did not use the term ‘SDO’. Our partnership with these standards bodies coincided with the dawn of the electronic publishing revolution, progressing from CD-ROMs to DVDs, the World Wide Web, and XML feeds through the late nineties and early 2000s.
Our work with financial standards bodies led to our involvement with the content of key regulators such as the SEC in the USA. Since the SEC’s establishment in 1934, public companies have been required to file reports such as profit and loss statements and balance sheets. The SEC’s EDGAR system, launched as a pilot in 1984 and fully operational in 1995, mandated electronic versions of these filings. In 2009, the SEC transitioned EDGAR to a more machine-readable, XML-based electronic format called XBRL, and by 2022 it had fully transitioned to electronic submissions, eliminating the vast majority of paper filings.
Propylon’s co-founder, Paul McKeon, and I attended a meeting at the AICPA’s offices in New York in the late nineties, when discussions began around the development of XBRL. Spearheaded by CPA Charlie Hoffman, XBRL standardized the digitization of financial reports in a machine-readable form that uses taxonomies such as US Generally Accepted Accounting Principles (GAAP) from FASB.
The XBRL format is a data model that facilitates highly precise financial information exchange between financial regulators and regulated financial entities. It has become pervasive in the financial domain not only in the USA but also in the UK, EU, and Africa where it has been mandated.
Auditing and accounting ahead of the machine-readable game
Looking back at it now, it is clear that the auditing and accounting domain has led the way in recognizing the advantages of machine-readable data models for information exchange between rule-makers and rule-takers in regulated industries.
Many other regulated industries are now looking to develop their own data models and XBRL will no doubt be seen as an example to learn from.
In oil and gas, for example, the International Association of Oil and Gas Producers (IOGP) is developing a similar framework known as DISC. The Digital Standards Alliance (DSA) within SAE ITC, associated with aerospace giants such as Boeing and Lockheed Martin, aims to create machine-readable digital standards for the aviation industry. This initiative seeks to transform traditional paper or PDF-based standards into dynamic, digital formats that support the full realization of digital threads and digital twins, thereby advancing the digital transformation of the industry. In construction, Building Information Models (BIM) represent a similar initiative towards machine readability and digital twinning. In pharmaceuticals, standards bodies such as CDISC and ICH are busily developing analogous models for various aspects of the regulated information flows in drug development.
These sample initiatives reflect a broader trend within regulated industries of creating standardized, machine-readable data models that facilitate and streamline information exchange between rule-makers and rule-takers.
Knowledge graphs: the next step in data-first approaches
This broader shift to a data-first approach to maker/taker information exchange is happening in parallel with advances in artificial intelligence (AI), particularly generative AI, large language models (LLMs), retrieval-augmented generation (RAG), and knowledge graphs. With the benefit of hindsight, it is now clear that XBRL and all the other machine-readable data models mentioned above are examples of knowledge graphs.
In technical terms, from an AI perspective, organizational knowledge graphs contextualize queries in a way that greatly improves the quality of results from LLMs such as ChatGPT. In business terms, knowledge graphs help rule-makers and rule-takers signal which pieces of information matter in their industries, so that the AIs can act more like subject matter experts (SMEs) who understand the enterprise and the domains in which it operates.
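For readers who like to see the idea in miniature, here is a minimal sketch of how a knowledge graph can contextualize a query before it reaches an LLM. Everything in it is an assumption made for illustration: the `Triple` type, the handful of hand-written facts, and the `facts_for` and `build_prompt` helpers are hypothetical, the matching is deliberately naive substring matching, and no real LLM is called.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# Hypothetical, hand-written facts for illustration only.
GRAPH = [
    Triple("XBRL", "is_a", "machine-readable reporting format"),
    Triple("XBRL", "uses_taxonomy", "US GAAP"),
    Triple("US GAAP", "maintained_by", "FASB"),
    Triple("EDGAR", "operated_by", "SEC"),
]


def facts_for(question: str, graph: list[Triple]) -> list[Triple]:
    """Return triples whose subject or object is named in the question.

    Naive case-insensitive substring matching; a real system would use
    proper entity recognition and graph traversal.
    """
    q = question.lower()
    return [t for t in graph if t.subject.lower() in q or t.obj.lower() in q]


def build_prompt(question: str, graph: list[Triple]) -> str:
    """Prefix the question with the graph facts that appear relevant."""
    lines = [
        f"- {t.subject} {t.predicate.replace('_', ' ')} {t.obj}"
        for t in facts_for(question, graph)
    ]
    context = "\n".join(lines)
    return f"Known facts:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Which taxonomy does XBRL use?", GRAPH))
```

The point of the sketch is only that the graph, not the SME, supplies the domain context that accompanies the question.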
The days of rule-makers and rule-takers employing large teams to read through PDFs and extract pieces of information by copy and paste are fading, because the technical direction toward a better alternative is clearly established.
Prioritizing the Subject Matter Experts (SMEs)
XBRL’s machine readability comes, in part, from its use of very clever techniques beloved by software engineers known as markup, schemas, and parsers. Although vital to the success of XBRL, these are, from an SME’s perspective, technological ‘plumbing’ that should exist behind the scenes. The SME doesn’t care or need to care about the XML innards any more than an SME needs to worry about APIs, Unicode, or RFC 822 email formats.
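For the engineers who do want a glimpse of the plumbing, the following is a minimal sketch of what markup plus a parser buys you. It is a heavily simplified, XBRL-like fragment, not a valid XBRL instance: the `demo` namespace, the element names, the figures, and the `extract_facts` helper are all assumptions made for illustration. Only the attribute names `contextRef`, `unitRef`, and `decimals` are borrowed from real XBRL facts, and a real filing would also define the contexts and units they point to and be validated against a taxonomy schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, simplified fragment; not a valid XBRL instance document.
DOC = """<report xmlns:demo="http://example.com/demo-taxonomy">
  <demo:Revenue contextRef="FY2023" unitRef="USD" decimals="0">1500000</demo:Revenue>
  <demo:NetIncome contextRef="FY2023" unitRef="USD" decimals="0">240000</demo:NetIncome>
</report>"""

# ElementTree exposes namespaced tags as "{namespace-uri}LocalName".
NS = "{http://example.com/demo-taxonomy}"


def extract_facts(xml_text: str) -> dict[tuple[str, str], tuple[Decimal, str]]:
    """Map (concept, context) to (value, unit) for each tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for element in root:
        if element.tag.startswith(NS):
            concept = element.tag[len(NS):]
            key = (concept, element.get("contextRef"))
            facts[key] = (Decimal(element.text.strip()), element.get("unitRef"))
    return facts


if __name__ == "__main__":
    for (concept, context), (value, unit) in extract_facts(DOC).items():
        print(f"{concept} [{context}]: {value} {unit}")
```

Because every number carries its concept, reporting period, and unit in the markup, no one has to infer them from a page layout. That is exactly the kind of detail that belongs behind the scenes.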
Once XBRL was established, there was a surge of activity to develop plugins for tools such as Microsoft Excel so that XBRL could remain behind the scenes, allowing financial content to be produced and consumed in XBRL format without SMEs ever needing to concern themselves with the XML layer.
The development of XML-based models is fundamentally a good thing, and XML will clearly play an important role in the development of XBRL-like data models and knowledge graphs in many domains, along with formats such as JSON. However, the lesson from XBRL is clear: XML, JSON, and similar formats should be in the plumbing and not exposed to the SME.
When it comes to narrative content, Microsoft Word is the tool of choice for makers and takers alike; this is not going to change any time soon. The good news is that today, it is entirely possible to keep XML behind the scenes in Word just as it is possible to keep XBRL behind the scenes of Excel.
XML and the digitization of rulemaking
Propylon’s experience spans decades, from witnessing the genesis of XBRL in auditing and accounting to the expansion of similar models into other domains. Throughout, we have been helping rule-makers and rule-takers build data models and knowledge graphs.
XML is here to stay. In markets such as auditing and accounting where XML-based models have matured, they have become so embedded in the digital infrastructure that we rarely see them and may even forget they exist. This is how it should be. The rightful place for XML and JSON alike is buried in the plumbing, unless you happen to be a software engineer or programmer interested in low-level details.
Generative AI is also here to stay. The fusion of the two in the exploding world of RAG and knowledge graphs will form the basis for knowledge management and regulatory compliance for decades to come. In time XBRL will, in my opinion, be seen as having been ahead of its era.