Why GOV.UK content should be published in HTML and not PDF

16 Jul 2018 03:24 PM

Posted: 16 July 2018 – Categories: Accessibility

A laptop showing the GOV.UK guide on the accessibility of PDFs

GOV.UK exists to make government services and information as easy as possible to find and use.

For that reason, we're not huge fans of PDFs on GOV.UK.

Compared with HTML content, information published in a PDF is harder to find, use and maintain. More importantly, unless created with sufficient care, PDFs can often be bad for accessibility and rarely comply with open standards.

Many departments are doing great work to move away from PDFs. For example, the Driver and Vehicle Standards Agency (DVSA) blogged about how it created and published its strategy in HTML, and Public Health England has written about its work to move away from PDFs.

Content managed by the GOV.UK team in GDS is entirely in HTML and the training, guidance and tools we provide for publishers encourage HTML by default. However, we still have around 200,000 PDFs on GOV.UK and we’re publishing tens of thousands of new ones each month. We’ve heard from GOV.UK publishers and we know there are pressures that can make it difficult to avoid using PDFs. 

The default should be to create all content in HTML. If you can’t avoid publishing a PDF, ideally it should be published in addition to an HTML version, and the PDF must meet accessibility and archiving standards. We hope this post will help publishers explain the problems with PDFs to their colleagues and support moving towards an HTML-first culture.

Problems with PDFs

They do not change size to fit the browser

On a responsive website like GOV.UK, content and page elements shift around to suit the size of the user’s device and browser. However, PDFs are not designed to be flexible in their layout. They generally require a lot of zooming in and out, and scrolling both vertically and horizontally. This is especially troublesome with long documents and on small devices like mobile phones.

They’re not designed for reading on screens

People read differently on the web, so it’s really important to create content that is clear, concise, structured appropriately and focused on meeting the user need. A PDF document that was created for offline use will not suit the context of the web and is likely to result in a poor user experience.

It’s harder to track their use

We cannot get as much information from analytics about how people are using PDFs. We can get data on how many times a PDF has been downloaded from GOV.UK, but we cannot measure views of the file offline.

In addition, we cannot get data about how users have interacted with a PDF – for example how long they’ve viewed it for or what links they’ve followed. This makes it harder to identify issues or find ways to make improvements.

They cause difficulties for navigation and orientation

Depending on the user’s device and browser, PDFs might open in a new browser window, new tab or a separate app. Sometimes they automatically download to the user's device. Whatever happens, the user is taken away from the website when they open a PDF. This means they lose the context of the website and its navigation, making it harder for them to go back if they need to.

This is even more of an issue if the user goes directly to the PDF from a search engine. Without the context of the site the PDF is hosted on, they can’t easily browse to related content or search the website.

It’s also worth remembering that although many devices and browsers have PDF viewers built in, and viewers are freely available to download, there are still users who do not have them or cannot download them.

A mobile phone showing a zoomed in part of a PDF

PDFs generally require a lot of zooming in and out and scrolling to read the content on a mobile phone

They can be hard for some users to access

The accessibility of a PDF depends on how it was created. For example, it needs to have a logical structure based on tags and headings, meaningful document properties, readable body text, good colour contrast and text alternatives for images. It takes time to do this properly.

Even if this work is done according to best practice, there’s still no guarantee that PDF content will meet the accessibility needs of users and their technology. Operating systems, browsers and devices all work slightly differently and so do the wide variety of assistive technologies such as screen readers, magnifiers and literacy software.

Some users need to change browser settings such as colours and text size to make web content easier to read. It’s difficult to do this for content in PDFs. You can magnify the file, but the words might not wrap and the font might pixelate, making for a poor user experience. Locking content into a PDF limits people’s ability to make these kinds of accessibility customisations.

It’s our responsibility to ensure that our users can access the information we publish. Plus, publishing content in HTML will also reduce the need to supply alternative formats on demand to users who can’t access a PDF.
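One practical advantage of HTML here is that some of the basics listed above, such as a logical heading structure and text alternatives for images, can be checked automatically with very little code. The sketch below is only an illustration using Python's standard library. The class and function names are hypothetical, it is not a tool used by GOV.UK, and it is no substitute for testing with real users and assistive technologies.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collects two basic accessibility problems from a piece of HTML.

    Hypothetical helper for illustration only: it checks heading order
    and image alt attributes, nothing else.
    """

    def __init__(self):
        super().__init__()
        self.problems = []
        self.last_heading_level = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)

        # Headings h1 to h6: flag a jump of more than one level (for example h2 to h4).
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if self.last_heading_level and level > self.last_heading_level + 1:
                self.problems.append(
                    f"heading level jumps from h{self.last_heading_level} to h{level}"
                )
            self.last_heading_level = level

        # Images: flag a missing alt attribute.
        # An empty alt="" is allowed, because it marks a decorative image.
        if tag == "img" and "alt" not in attributes:
            self.problems.append(
                f"image without alt text: {attributes.get('src', '(no src)')}"
            )


def check_html(html):
    """Return a list of problem descriptions found in the HTML string."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = '<h1>Strategy</h1><h3>Aims</h3><img src="chart.png">'
    for problem in check_html(sample):
        print(problem)
```

Run against the sample, this reports the skipped heading level and the image with no alt attribute. Doing the equivalent for a PDF generally means inspecting its tag structure with specialist software.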

They’re less likely to be kept up to date

Compared with HTML, it’s harder to update a PDF once it’s been created and published. PDFs are also less likely to be actively maintained, which can lead to broken links and users getting the wrong information. This can be especially problematic if a document has been published in multiple formats. Any changes need to be made to all the versions, meaning more work and more opportunities for error.

In addition, users are more likely to download a PDF and continue to refer to it and share it offline. They may not expect the content in the PDF to change and might not check the website to get the latest information. HTML documents encourage people to refer to the website for the latest version.

They’re hard to reuse

It can be very difficult to reuse content from a PDF by copying and pasting it. The design and layout of the PDF can produce unexpected results, particularly if it has multiple columns, hasn’t been structured correctly, or uses incompatible fonts.

We’re also working on tools to extend the use of our web content - such as a new content API and ways to measure the quality of content. These tools will not work with PDFs. Publishing content in HTML means it will work with new developments like these - and for whatever platforms we might use in the future.
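To give a rough sense of why HTML is easier for tools to work with, the sketch below pulls the heading outline out of a page using only Python's standard library. It is a hypothetical example, not the content API or the quality tools mentioned above, and the names `OutlineExtractor` and `extract_outline` are made up for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineExtractor(HTMLParser):
    """Builds a list of (level, text) pairs from the headings in an HTML string."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently being read, if any
        self._buffer = []    # text fragments collected inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading.
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            # Join fragments (headings may contain inline tags) and tidy whitespace.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._level, text))
            self._level = None


def extract_outline(html):
    """Return the heading outline of an HTML string."""
    parser = OutlineExtractor()
    parser.feed(html)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    page = "<h1>Why HTML</h1><p>Body text.</p><h2>Problems with <em>PDFs</em></h2>"
    for level, text in extract_outline(page):
        print("  " * (level - 1) + text)
```

Because the structure is explicit in the markup, the same approach extends to links, tables or body text. Content in a PDF has to be reverse-engineered from its layout before a tool can do anything similar.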

Similarly, users cannot use browser extensions and add-ons such as Google Translate on PDF content.

Why do people use PDFs?

Despite all this, there are understandable reasons why PDFs remain popular in government. Below are some of the common reasons for creating PDFs and the counter-arguments GOV.UK publishers may find helpful as they help their colleagues make the shift to HTML.

They’re quick and easy to create

PDFs may seem to be the fastest option because they can be easily created from popular applications that people are already using to author and share documents.

Converting content into HTML takes a bit of time. However, as explained earlier, creating a fully usable and accessible PDF from a source document requires specialist knowledge and can actually take longer than creating the content in HTML.  

Control over the design

Authors and publishers have more control over the layout, design and branding of a PDF. This can be especially important when there is a need to include complex tables and charts, which are sometimes tricky to create in HTML. However, the downside is that there will be people who do not or cannot access the content. Plus, the content will not benefit from the simple and consistent design of GOV.UK that’s been tested and optimised for users and is trusted as a credible source of information.

DVSA's strategy in HTML being read on a mobile phone

The DVSA designed and published its strategy in HTML

They’re easy for people to download and print

While this is certainly true, you can print HTML web pages just as readily, and modern operating systems and browsers also make it easy to download or save web content. As mentioned earlier, it’s not ideal for users to download documents because they can quickly become out of date.

They have the feel of a stand-alone product

We know from GOV.UK publishers that they’re often sent content for publishing that is already in PDF format. This might happen because authors want control over the final content and design - and PDFs are easy for them to create.

It can also be because the document was primarily created for offline use - after all, government is still very paper based. There’s a common feeling that a PDF publication is a more tangible and credible ‘product’ compared to an HTML publication.

These are understandable reasons, but they’re an outcome of an ingrained print culture and outdated content production processes. Government is transitioning towards a digital-first culture, but old habits and ways of working take time to change.

What we’re doing to help

We'll continue to improve GOV.UK content formats so it's easy to create great-looking, usable and accessible HTML documents.

We also intend to build functionality for users to automatically generate accessible PDFs from HTML documents. This would mean that publishers only need to create and maintain one document, but users will still be able to download a PDF if they need to. (This work is downstream of some higher priorities, but is on the long-term roadmap.)

We cover the main problems with PDFs in the training that all GOV.UK publishers have to do. Discussion about these issues continues on the government content community’s Basecamp and at community events.

We want to hear from you. If you’re a GOV.UK publisher and have any suggestions for improvements that would help you to publish in HTML rather than PDF, please let us know.
