Phrameworks and XHTML
by Castwide on 11-21-2007
Tags: phrameworks, xhtml
The upcoming version of the Phrameworks kernel will include an XHTML compliance mode. When this option is enabled, XHTML documents will be sent with the application/xhtml+xml MIME type to all user agents except Internet Explorer. (The implemented solution is a variation of the content-negotiation technique proposed by the W3C.) This feature presents a number of challenges, but the benefits make it worth the trouble.
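As a rough illustration of the rule described above, the sketch below picks a MIME type from the request's User-Agent header. It is written in Python for brevity and is not taken from the Phrameworks kernel; the function name `choose_content_type`, the `xhtml_mode` flag, and the `"MSIE"` substring check are all assumptions made for the example.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_content_type(user_agent: str, xhtml_mode: bool = True) -> str:
    """Return the MIME type to send with an XHTML document."""
    # With compliance mode off, everything goes out as plain HTML.
    if not xhtml_mode:
        return HTML_TYPE
    # Assumption: Internet Explorer is detected by the "MSIE" token
    # in its User-Agent string; the real kernel may detect it differently.
    if "MSIE" in user_agent:
        return HTML_TYPE
    return XHTML_TYPE


ie7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
firefox = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071115 Firefox/2.0.0.10"
print(choose_content_type(ie7))      # text/html
print(choose_content_type(firefox))  # application/xhtml+xml
```

Content negotiation is normally driven by the Accept request header, so a fuller version might also consult that header before falling back to the user-agent check.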
Valid XHTML requires more care than HTML. Browsers that understand the application/xhtml+xml MIME type parse the document with a strict XML parser. Unlike traditional HTML parsing, a strict parser does not try to recover from errors: a document that is not well-formed is not rendered at all. Firefox, for example, will simply display an error in place of the page. A single illegal character can break the whole page.
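The same strictness can be reproduced outside the browser with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module to check a fragment before it is served; the helper name `is_well_formed` is made up for this example, and the check covers well-formedness only, not validation against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


good = "<p>Fish &amp; chips</p>"
bad = "<p>Fish & chips</p>"  # a bare ampersand is illegal in XML

print(is_well_formed(good), is_well_formed(bad))  # True False
```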
Problems with XHTML documents are not always obvious, either. When the document is sent with the text/html MIME type, browsers treat it as HTML tag soup and ignore errors, even if the document uses an XHTML doctype. Many web pages written in XHTML would break if they were sent as application/xhtml+xml.
Much to my satisfaction, however, most of the errors I encountered were easy to fix. Ampersands in URLs were definitely the most common problem. A slightly less frequent one was embedded JavaScript.
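The ampersand problem comes down to writing `&` where XML requires `&amp;`, including inside attribute values such as `href`. A minimal sketch of the fix, in Python and with a hypothetical `link` helper that is not part of Phrameworks:

```python
from html import escape


def link(href: str, label: str) -> str:
    """Build an anchor element with the URL and label entity-encoded."""
    # quote=True also encodes double quotes, which matters inside an attribute.
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(label))


print(link("index.php?page=news&id=7", "News & updates"))
# <a href="index.php?page=news&amp;id=7">News &amp; updates</a>
```

The browser decodes `&amp;` back to `&` before it requests the URL, so the link still points to the same address.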
Several changes to the Phrameworks parser (an extended version of the PHP Pagemill) were necessary for valid output. One change was in the Pagemill itself. The newest version allows prefixes that tell the template parser whether the data needs to be encoded or decoded for HTML entities. The biggest changes, however, were in the Phrameworks kernel code.
- When a page has the XHTML doctype and Phrameworks is in XHTML compliance mode, everything in script and style elements is automatically wrapped in a CDATA section.
- Every piece of data that gets passed into the Pagemill gets encoded for HTML entities.
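To show why the CDATA wrapping in the first item helps, here is a small Python sketch. It is an illustration under assumptions, not the kernel's code: `wrap_cdata` is a made-up helper, and the handling of a literal `]]>` inside the script is one common approach rather than a documented Phrameworks behavior.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(code: str) -> str:
    """Wrap script or style text in a CDATA section."""
    # A CDATA section cannot contain "]]>", so split that sequence
    # across two adjacent sections.
    safe = code.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


script = "if (a < b && b < c) { run(); }"
element = "<script>" + wrap_cdata(script) + "</script>"

# Unwrapped, the "<" and "&&" would be parse errors; wrapped, the XML
# parser hands the script text back unchanged.
assert ET.fromstring(element).text == script
```

When the same page is also served as text/html, authors often hide the CDATA markers inside JavaScript or CSS comments so that HTML parsers ignore them.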
The latter change will require the most adjustment, because the template parser will no longer inject raw HTML into templates by default. Allowing markup in content now requires additional code, but automatic encoding makes applications easier to secure and makes it easier to ensure that the resulting document is valid XHTML.
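The encode-by-default idea can be sketched with a toy template function. Everything here is hypothetical: the `{name}` and `{raw:name}` placeholder syntax is invented for the example and is not the Pagemill's actual prefix syntax, and the example is in Python rather than PHP.

```python
import re
from html import escape

# {name} inserts encoded data; {raw:name} opts in to unencoded markup.
PLACEHOLDER = re.compile(r"\{(raw:)?(\w+)\}")


def render(template: str, data: dict) -> str:
    """Fill placeholders, encoding every value unless marked raw."""
    def substitute(match):
        raw_prefix, key = match.group(1), match.group(2)
        value = str(data[key])
        return value if raw_prefix else escape(value)

    return PLACEHOLDER.sub(substitute, template)


page = render(
    "<h1>{title}</h1><div>{raw:body}</div>",
    {"title": "Tips & <tricks>", "body": "<p>Trusted markup</p>"},
)
print(page)
# <h1>Tips &amp; &lt;tricks&gt;</h1><div><p>Trusted markup</p></div>
```

Because encoding is the default, forgetting the prefix produces visible escaped text rather than a broken or unsafe page, which is the safer way to fail.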
In an upcoming article, I'll discuss input scrubbing and the benefits of the new design in more detail.