We are pleased to announce the release of a WordPress to Wagtail migration kit: Wagtail WordPress Import. It's installable from PyPI.
Though you're likely to need to customise it to your site, the kit is simple to understand and to extend.
How the kit works
To get started, you need to download an XML export file from your WordPress admin. The kit generates the Wagtail site from this XML export. To be more specific about how the kit works:
- The kit first parses the XML file. It does this incrementally, as there can be a lot of content and comments to get through. This gives us the content as WordPress stores it in its database.
- It turns out WordPress's raw content is a little funky. WordPress runs a set of filters on the content to clean up these quirks before displaying it on the web. The filters include standardising some HTML attributes, removing others, and inserting e.g. paragraph tags. We mimic these filters in the kit to de-funk the content before moving on to the next step.
- Once the content is normalised, the kit parses it with BeautifulSoup and the html5lib parser. It then splits the parsed HTML into blocks, generating a RawHtmlBlock for each one.
- Where possible these RawHtmlBlocks are then converted into more appropriate block types: image, heading, paragraph, etc. Where this isn't possible they are either removed or left as raw HTML.
- The kit then tackles WordPress's idiosyncratic shortcodes, both at block level (e.g. carousel) and at inline text level (e.g. make my text flash). During the filtering step above they were temporarily converted to custom HTML tags; now the block-level tags are converted to custom StreamField blocks and the inline ones to RichText editor markup.
- Finally, it downloads all the images, converts headings, paragraphs, and block quotes, and detects and merges consecutive paragraph blocks.
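To make the first and last steps above more concrete, here is a minimal sketch using only the Python standard library. It is not the kit's own code: the function names `iter_items` and `merge_paragraphs`, the sample XML, and the simplified `{"type": ..., "value": ...}` block shape are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Namespace that WordPress export (WXR) files use for the post body element.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def iter_items(source):
    """Yield (title, raw_content) for each <item>, one item at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            title = elem.findtext("title", default="")
            content = elem.findtext(CONTENT_NS + "encoded", default="")
            yield title, content
            elem.clear()  # drop the item's children and text once it has been used


def merge_paragraphs(blocks):
    """Merge runs of consecutive paragraph blocks into a single block."""
    merged = []
    for block in blocks:
        if merged and block["type"] == "paragraph" and merged[-1]["type"] == "paragraph":
            merged[-1] = {
                "type": "paragraph",
                "value": merged[-1]["value"] + block["value"],
            }
        else:
            merged.append(dict(block))
    return merged


# A tiny, made-up export file with two posts.
SAMPLE = b"""<?xml version="1.0"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item><title>Hello</title><content:encoded><![CDATA[<p>Hi</p>]]></content:encoded></item>
    <item><title>Second</title><content:encoded><![CDATA[<p>Bye</p>]]></content:encoded></item>
  </channel>
</rss>"""

for title, content in iter_items(io.BytesIO(SAMPLE)):
    print(title, content)
```

Because `iter_items` is a generator, only one post needs to be held in memory at a time, which is the point of parsing a large export incrementally.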
This is a whistle-stop simplification of the process. Along the way, the kit also imports and matches author records, parses dates, SEO metadata, and category tags, and logs the results for auditing. For more, see the Import process page of the documentation.
Since WordPress sites lack a lot of standardisation, it would be hard to make a migration kit that covers every scenario out of the box. Instead, the kit has been developed and documented so that it's easy to extend.
If you're importing a site, expect to configure the mapping of WordPress content to Wagtail blocks, and to write handlers for any custom shortcodes your site uses. You'll find documentation to help you on the project homepage.
Wagtail's Slack has a dedicated channel for WordPress migrations (#wordpress-migrations), where we're looking for feedback and where the developers of the kit and the wider community can offer support.
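To give a feel for what handling a custom shortcode involves, the sketch below rewrites a paired shortcode as a placeholder HTML tag, the same general idea as the temporary custom tags described earlier. It does not use the kit's handler API, which is described in the project documentation; the function `shortcode_to_tag` and the tag name `x-caption` are hypothetical.

```python
import re


def shortcode_to_tag(html, name, tag):
    """Rewrite [name attrs]body[/name] as <tag attrs>body</tag>.

    Hypothetical helper for illustration; it handles only paired,
    non-nested shortcodes and copies the attributes through unchanged.
    """
    pattern = re.compile(
        r"\[" + re.escape(name) + r"(?P<attrs>(?:\s[^\]]*)?)\]"
        r"(?P<body>.*?)"
        r"\[/" + re.escape(name) + r"\]",
        re.DOTALL,
    )
    return pattern.sub(
        lambda m: f"<{tag}{m.group('attrs')}>{m.group('body')}</{tag}>",
        html,
    )


raw = 'Before [caption id="1"]A photo[/caption] after'
print(shortcode_to_tag(raw, "caption", "x-caption"))
```

Once a shortcode is expressed as a tag, it can be found by the HTML parser like any other element and mapped to a block or to inline markup.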
The kit was developed with sponsorship from The Motley Fool, who plan to migrate six WordPress sites to Wagtail this year. Find out more about sponsoring Wagtail features and enhancements.