In the newest article in a recent swathe of research into XBRL and AI, UBPartner (long-time XBRL software vendor and member) has explored the potential for natural language processing (NLP) to automate the identification and tagging of key information in annual reports.
Research by XII’s own Revathy Ramanan showed that LLMs, when fed with structured xBRL-JSON files from the SEC’s 10-K iXBRL reports, performed well in answering financial queries. This underlines the importance of using the semantic tags provided by companies for enhanced data analysis. However, the reverse process, using AI to tag financial reports with XBRL, presents a different set of challenges.
UBPartner’s Kapil Verma and Martin DeVille have been testing the potential of using NLP AI to simplify the XBRL tagging process. They set out to see whether NLP can identify the relevant elements – numbers and text – in the filings that need to be tagged, and whether it can select the right XBRL tag for each element.
Their initial results show inconsistent quality, generally around 70-80% but with some outliers. The NLP approach correctly tagged standard elements of financial statements but struggled with HTML analysis of complex table structures, and the model needs a lot more development before it is useful. The use of LLMs like GPT-4 could enhance the process, leveraging deep language understanding to better digest textual elements.
Fundamentally, however, Verma and DeVille pinpoint a move towards ‘digital first’ reporting – which starts with HTML publishing, keeping the underlying structure of the report more readable – as the biggest change that will enable AI analysis in the coming years.
The ability to “co-pilot” XBRL tagging with AI could improve digital financial reporting, reducing manual effort and improving consistency, while leaving judgement calls where they belong: in the hands of management.