Happy Authors and Structured Documents. XML Q&A with Sean McGrath

In this interview, Propylon CIO and co-founder Sean McGrath takes a trip down memory lane to tell us how he became interested in XML in the 1980s. Early enthusiasm for the power of structured documents later led to a book on the topic, and it was this experience that unlocked a moment of realization: authoring is anything but structured. Indeed, in the process of creating his book, Sean found that the XML tools he was so enamored with were a hindrance rather than a help. Finally, Sean describes two emerging trends, the changing nature of work and the changing way we consume content, and what they mean for the future.

1. Sean, tell us a bit about your background. How did you get interested in XML?

In the early eighties, I studied Computer Science at Trinity College, Dublin. One of my professors – Dr. David Abrahamson – used SGML, the predecessor of XML, in one of his programming courses. It was 1985, I think, the year before SGML became the international standard now known as ISO 8879:1986.

All the key ideas of XML and indeed HTML are in SGML, in particular the ‘generalized markup’ capability, which allows documents to become structured, machine-readable data. This idea really appealed to me. By 1992, I had set up a software business and one of the first projects that came my way was a good fit for structured documents. It involved creating a searchable electronic library of third-level college courses in Ireland for distribution to schools on a 3.5-inch floppy disk. The content was a mix of highly structured and semi-structured text with extensive hyperlinks. A perfect fit for SGML/XML.

Back in those days, Windows 3.1 was the main personal computer operating system and there was no such thing as a browser or the World Wide Web. However, the Microsoft Windows built-in online help system (Winhelp) was a viable, basic ‘thin client’ for this kind of electronic publishing.

Obviously, a lot has changed in publishing technology since then. Email at that time was via dial-up modems and an email service called CompuServe. This was also before the arrival of CD-ROMs. I remember having to post money and a self-addressed envelope to Switzerland in order to get an SGML tool called ArcSGML posted back to me on a floppy disk!

So, all told my interest in XML dates back 37 years and counting. I was an invited expert on the W3C committee that created the XML standard as a result of my background in SGML.

A thick client application is installed on a local computer and requires local software to run.

A thin client application is accessed over the internet through a browser and relies on a network connection to a central server.

2. How did you first encounter the world of XML authoring tools? What value did you see in them?

I first encountered XML authoring tools back in the SGML days in the late eighties/early nineties. Some of today’s XML editors are evolutions of older SGML editors. I have used pretty much all the SGML/XML editing tools over the decades, including the modern ones – both thick and thin client.

I still use some from time to time but not too often, to be honest. I was an enthusiastic experimenter with XML/SGML editors in the early days, but that enthusiasm has waned over the decades. This is because my thinking about where they fit and do not fit has evolved considerably since then.

3. How so?

Well, I was so enamored with the power of structured documents from the eighties onwards, I just assumed that all authors would see the value of this new document paradigm and be willing to learn to use structured document editing tools in order to unlock all the downstream benefits they provide in publishing workflows.

However, I realized that this was a big mistake somewhere around 1995 when I found myself becoming an author. What happened was that I signed up with Prentice Hall to write a book called SGML For Software Engineers in the Dr. Charles F. Goldfarb Series on Open Information Management. Charles Goldfarb is a lawyer who worked at IBM and was the creator of the ISO SGML 8879 standard.

Now, as soon as I put my author hat on – my Subject Matter Expert hat, if you like – I had an epiphany. I found that as I was crafting the content of the chapters of my book, I was constantly moving stuff around, re-arranging the hierarchy, splitting and merging chunks of content as I worked to find the right flow for the material. During this creative phase of creating each chapter, the content itself was the exact opposite of ‘structured’. Frankly, it was quite a mess made up of thoughts and bulleted paragraphs and diagrams and tables and To-Dos and Note-To-Self material, etc., which slowly became cleaned up and ‘structured’ as each chapter approached a good first draft.

Then it hit me. Although I was most definitely producing structured content as the end product, for the vast majority of the time what I was working on was a fluid, largely unstructured body of text. I found that this arranging/re-arranging/editing work was actively hindered by structured editing tools. The tools would beep at me for breaking the “rules”, rules I did not care about because they all related to how the finished content would be structured – not the work-in-progress, the messy, fluid content I had in front of me 99 percent of the time.

Consequently, I found I was much more productive working with a word processor. I used WordPerfect to create that book. I found that the free-flowing nature of word processors was a much better fit for my authoring while the content was in that work-in-progress stage. Once I had a good draft, it was then trivial to generate SGML/XML from the word processor files. I used to do that in order to validate that the content met the authoring guidelines I was working with at the time. I found that I could use styles to automate the vast majority of the formatting in the word processor and easily generate very clean, very structured SGML/XML.
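The style-driven conversion described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process actually used for the book: the style names, the element names, and the `to_structured_xml` helper are all hypothetical. It assumes the word processor content has already been read into (style, text) pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical house-style mapping: paragraph style name -> XML element name.
STYLE_TO_ELEMENT = {
    "Heading1": "title",
    "BodyText": "para",
    "ListBullet": "item",
}


def to_structured_xml(paragraphs):
    """Build a <chapter> element from (style, text) pairs.

    Paragraphs whose style is not in the mapping are returned separately
    so they can be reported to the author instead of being silently lost.
    """
    chapter = ET.Element("chapter")
    unmapped = []
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        if tag is None:
            unmapped.append((style, text))
            continue
        ET.SubElement(chapter, tag).text = text
    return chapter, unmapped


if __name__ == "__main__":
    draft = [
        ("Heading1", "Introduction"),
        ("BodyText", "Structured documents are useful."),
        ("Scribble", "TODO: move this section?"),
    ]
    chapter, unmapped = to_structured_xml(draft)
    print(ET.tostring(chapter, encoding="unicode"))
    print("Needs attention:", unmapped)
```

The design point is that the author never sees the mapping. Notes-to-self and other work-in-progress material can sit in the draft under any style and only surface as "needs attention" when the structured output is generated.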

It was this experience as an author that led me to see the error of my ways in advocating structured authoring tools to authors. Firstly, it taught me that just because the content might appear to be beautifully structured when you look at the finished product, it does not follow that it was beautifully structured all the way through its creation/update lifecycle. Quite the opposite is often true.

Secondly, it is entirely possible to get all the back-end value of structured documents without having to move authors away from the content editing tools they are comfortable with and that fit their workflows on the front end. These are typically word processors, for the reasons mentioned earlier, and in today’s business world that word processor is mostly Microsoft Word.

Thirdly, no amount of the glorious back-end automation of publishing processes that structured documents enable is worth a hill of beans if you cannot get your authors comfortable producing the content in the first place. Simply put, the key to most publishing systems is to have both happy authors and structured documents. Without happy authors, you have nothing.

4. In your experience developing software solutions for governments, legislatures and firms operating in heavily regulated industries, why do organizations look to XML editors as a solution to their challenges?

I think it is because it is such an intuitively appealing idea: use a dedicated structured authoring tool to create structured documents. Simple! I certainly believed that too in the early days of my career, but it was only when I became an author myself that I realized the mismatch. This mismatch has been confirmed to me many, many times over nearly thirty years now as I have worked with lawyers, accountants, and engineers to put in place structured document authoring and publishing solutions.

5. Why is there skepticism around the ability of Microsoft Word to help firms achieve their content goals?

There is a mix of factors, I think. Firstly, the mainstream XML message remains that XML editors are the way to go. I spend a lot of my time at Propylon working with authors who are moving away from them, and I know others in the industry who do similar projects where they seek to keep the structured document back-end but find a way to better serve the authors/editors on the front end. Oftentimes, end-to-end structured document systems are put in place with much optimism, and it takes a period of time before the problems with the author/edit side manifest themselves.

Secondly, there is a belief that because Word allows authors to mix, say, portrait and landscape page layouts or embed images in footnotes, etc., you necessarily get a mess of unstructured, hand-formatted content arriving at the back-end ‘structured’ systems. This is simply not true in practice. Word’s native file format these days (the .docx file format) is itself XML-based, and with proper design, proper ‘house styles’, and drafting guidelines, it is entirely possible to get structured content from Word.
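To make the point about .docx concrete, here is a minimal Python sketch using only the standard library. It relies on two facts about the format: a .docx file is a ZIP archive, and the main document body conventionally lives in the `word/document.xml` part, where each paragraph can carry a style reference. The function name `paragraph_styles` is hypothetical, and the sketch deliberately ignores tables, headers, footnotes, and style inheritance.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_styles(docx_file):
    """Return a list of (style_id, text) pairs, one per paragraph.

    `docx_file` may be a path or a binary file-like object.
    style_id is None when the paragraph has no explicit style reference.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # A paragraph's text is split across runs; join the text nodes.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        result.append((style_id, text))
    return result
```

If authors apply house styles consistently, the style identifiers returned here are exactly the hooks a back-end system needs to turn a Word draft into structured content. Note that the value stored in the file is the style's identifier, which can differ from the name shown in Word's user interface.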

6. You said earlier, “Without happy authors, you have nothing”. Can you expand on that a little?

The standard author pattern I have seen over the years is this:
  • An XML system is put in place because of the many downstream benefits that can accrue from structured documents.
  • The authors are told they will need to use a structured XML authoring tool. There is some grumbling, but the authors say they will give it a go.
  • The early days are rocky on the author/edit side, but the value propositions of structured documents work out, so the authors stick with it. The authors try to get the tools customized to the authoring experience they need. Somewhere along the line, they realize what they are really looking for has been on their desktops all along. Typically, it is Microsoft Word.
  • Slowly but surely, the authors abandon the structured authoring tool and do all their work in Word instead, only putting the content into the structured authoring tool when they are finished working on it.
I have seen this pattern so many times, I could write a book about it. Maybe I will someday.

7. Making a technology choice that doesn’t fit the needs of the business can incur hidden or unanticipated costs. How does this typically play out in your experience?

Firstly, I have seen numerous situations where the organization finds itself spending more money than ever on the authoring side of its workflow once it introduces end-to-end structured document solutions. This is because the organization may need to hire people to take the Word files from the SMEs and copy/paste them into the XML editor. I have seen organizations where the authors get blamed for this. The problem does not lie with the authors. It comes from the fundamental mismatch I mentioned earlier.

Secondly, I have seen organizations spend significant sums of money trying to turn their structured XML editors into something that behaves like Word. Why spend money trying to re-invent Word? If the authors want to work in Word, let them work in Word. Word is extremely programmable – especially now that it has an XML-based file format and has the new generation Office 365 APIs. Better to start with something the authors are happy with and customize on top of it, rather than try to re-invent Word itself. That is a very expensive proposition and does not generate any new value for content businesses.

Thirdly, document validation. If I had a dollar for every time somebody told me that they were sold on the idea that XML magically validates documents, I would be rich indeed. The reality is that the types of validation that XML provides out-of-the-box are not the validations authors need help with. Authors in my world – the world of laws, standards, and regulations – worry about things such as validating effective dates, ensuring that all measurement units are of the same order of magnitude, ensuring that citations point to valid external resources, using gender-neutral language, and confirming that quoted material matches the source of the quote.

These are all examples of validation checks found in typical QA/QC workflow steps, drafting manuals, style guides, etc. XML validation does not give you any of these things. Of course, you can custom program any of these, but that all comes at an extra cost. If you are looking to add these things into a structured XML editor, then the costs are going to be higher than they would be if your starting point was a mainstream programming environment such as Office 365.
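The gap between out-of-the-box XML validation and the checks authors care about can be illustrated with a short Python sketch. Everything about the document shape here is an assumption made up for the example: the `section` and `cite` elements, the `effective`, `expires`, and `ref` attributes, and the `check_business_rules` function are hypothetical, and sections are assumed not to be nested.

```python
from datetime import date
import xml.etree.ElementTree as ET


def check_business_rules(xml_text, known_citations):
    """Return a list of human-readable problems found in the document.

    Parsing succeeds for any well-formed input; the checks below are the
    custom rules that well-formedness or schema validation does not cover.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for section in root.iter("section"):  # assumes flat, non-nested sections
        sec_id = section.get("id", "?")

        # Rule 1: a section must not expire before it takes effect.
        effective = section.get("effective")
        expires = section.get("expires")
        if effective and expires:
            if date.fromisoformat(expires) < date.fromisoformat(effective):
                problems.append(f"{sec_id}: expires before it takes effect")

        # Rule 2: every citation must point at a known resource.
        for cite in section.iter("cite"):
            target = cite.get("ref")
            if target not in known_citations:
                problems.append(f"{sec_id}: citation {target!r} not found")
    return problems


if __name__ == "__main__":
    sample = (
        '<act>'
        '<section id="s1" effective="2024-01-01" expires="2023-01-01">'
        '<cite ref="A-1"/></section>'
        '<section id="s2" effective="2024-01-01"><cite ref="B-9"/></section>'
        '</act>'
    )
    for problem in check_business_rules(sample, {"A-1"}):
        print(problem)
```

The sample document is perfectly well-formed XML, yet it contains both an impossible date range and a dangling citation. Catching those requires custom code of this kind regardless of which authoring tool produced the content.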

8. How have your learnings over the years fed into how Propylon develops its TimeArc® platform?

Well, what we call ‘experience’ is really making mistakes and then learning from them. With the benefit of hindsight, I now know I was wrong back in my early efforts to move authors away from the tools that suited them, onto tools that suited me, and my desire to have structured XML documents to process on the back end.

I like to think that TimeArc today is the culmination of all the years of experience and strikes the right balance by paying much more attention to the needs of the authors, in particular by meeting the authors where they are at – Word – and working from there.

9. Any final comments?

There is an ever-increasing emphasis on Software-as-a-Service (SaaS) and thin client computing. Related to that, thanks to the COVID-19 pandemic, remote working has supercharged the development of collaboration tools to allow remote teams to work together effectively.

As a result, the old SGML/XML model of monolithic documents is changing rapidly and being replaced with a component-centric model of managing content. In this paradigm, it is the back-end component content management system (CCMS) that does the heavy lifting in terms of managing components. Thus, a lot of the ‘big document support’ features of XML authoring tools – and indeed DTP packages – are becoming unnecessary. Many organizations I work with that are moving away from monolithic document management systems (DMS) to a CCMS are using this transition as their opportunity to move away from traditional XML-based structured authoring tools as well.

TimeArc is both an author/edit tool and a CCMS, so we believe we are well-positioned as these dual trends progress.
