The docType Class
This class is for people who want to use the PHP DOMDocument class to create valid XHTML documents that are sent with the proper MIME type, while still providing backward compatibility for users whose web browsers do not properly support the application/xhtml+xml MIME type.
There are many advantages to using the DOMDocument class to generate content and there are also many advantages to sending the resulting content with the application/xhtml+xml mime type.
I will not go into these benefits in depth, since they are discussed on the PHP mailing list from time to time, but to summarize:
- XSS injection is a little easier to prevent, since pages constructed with DOMDocument must be well-formed.
- Since the page is an object and is not sent until it is fully constructed, it can still be manipulated. This makes it easy to do things like dynamically add to the navigation menu, or add JavaScript and CSS references to the document head section after you have started creating the body content. Web applications that produce dynamic content become a little easier to work with.
- It is easier for some external applications to properly parse your web page, especially if your content includes multi-byte UTF-8 characters. Some document import tools poorly handle UTF-8 from HTML but handle it quite well when coming from XML.
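Here is a minimal sketch of the first two points. It uses only PHP's DOM extension, not the docType class.

- Untrusted text added with createTextNode() is escaped when the document is serialized.
- A stylesheet link is added to the head after the body content already exists.

The sample input and the style.css path are illustrative assumptions. The optional value argument of createElement() does not escape ampersands, so createTextNode() is the safer way to add untrusted text.

```php
<?php
// Build a tiny page by hand; the docType class is not used here.
$dom  = new DOMDocument('1.0', 'UTF-8');
$html = $dom->appendChild($dom->createElement('html'));
$head = $html->appendChild($dom->createElement('head'));
$body = $html->appendChild($dom->createElement('body'));

// Untrusted input becomes a text node, so its markup characters are
// escaped on output instead of being interpreted as tags.
$userInput = '<script>alert("owned")</script>';
$p = $body->appendChild($dom->createElement('p'));
$p->appendChild($dom->createTextNode($userInput));

// The head can still be changed after the body content exists.
$link = $dom->createElement('link');
$link->setAttribute('rel', 'stylesheet');
$link->setAttribute('href', 'style.css'); // hypothetical stylesheet path
$head->appendChild($link);

$out = $dom->saveXML($html);
echo $out;
```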
The class looks at the $_SERVER['HTTP_ACCEPT'] variable, which PHP populates from the Accept header sent by the requesting web client. When the client reports support for application/xhtml+xml, the class sets up the DOM as an XHTML document. Otherwise, it sets up the DOM as an HTML document.
When you are ready to serve the document, the class can send the appropriate header and the appropriate version of the content.
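The class's own detection logic is not reproduced on this page. The sketch below is a rough, hypothetical version of the idea; acceptsXhtml is an invented name. It reports XHTML support only when application/xhtml+xml is listed explicitly in the Accept header with a non-zero q value.

```php
<?php
// Hypothetical helper, NOT the docType class's actual code.
// Returns true only if application/xhtml+xml is listed explicitly with q > 0.
function acceptsXhtml(string $accept): bool
{
    foreach (explode(',', $accept) as $part) {
        $params = array_map('trim', explode(';', $part));
        $type = strtolower(array_shift($params));
        if ($type !== 'application/xhtml+xml') {
            continue;
        }
        $q = 1.0; // default quality when no q parameter is given
        foreach ($params as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    return false;
}

$accept = $_SERVER['HTTP_ACCEPT'] ?? '';
$mime = acceptsXhtml($accept) ? 'application/xhtml+xml' : 'text/html';
```

A wildcard such as */* is deliberately not treated as XHTML support in this sketch. Browsers that send only */* are exactly the ones that may not handle application/xhtml+xml.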
docType Source
Download as text: xml_doctype.inc.php
docType Usage
Initialize the Class
require_once('xml_doctype.inc.php');
$dom = new DOMDocument('1.0','UTF-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$myPage = new docType($dom);
$xmlHtml = $myPage->document();
The variable $xmlHtml in the above example represents the root html node. Add your content as child elements of that node.
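The docType class itself is not included in this text. The sketch below therefore stands in for $myPage->document() with a hand-made root html element, which is an assumption. Everything else is plain DOMDocument: head and body elements are created and appended as children of the root node.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

// Stand-in for: $xmlHtml = $myPage->document();  (assumed plain root element)
$xmlHtml = $dom->appendChild($dom->createElement('html'));

$head  = $xmlHtml->appendChild($dom->createElement('head'));
$title = $head->appendChild($dom->createElement('title'));
$title->appendChild($dom->createTextNode('My Page'));

$body = $xmlHtml->appendChild($dom->createElement('body'));
$h1   = $body->appendChild($dom->createElement('h1'));
$h1->appendChild($dom->createTextNode('Hello World!'));

echo $dom->saveXML();
```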
Serving the Content
Once you have constructed your content, you need to serve it with the following command:
$myPage->sendpage();
That’s it! Browsers that report the ability to handle application/xhtml+xml content will get an XHTML version of your content. Browsers that do not will get an HTML version of your content.
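The sendpage() source is not shown here. The sketch below is hypothetical; renderDocument, sendDocument and the $asXhtml flag are invented names. It shows what serving typically involves:

- send a Content-Type header that matches the chosen serialization;
- output saveXML() for XHTML clients or saveHTML() for everyone else.

```php
<?php
// Hypothetical sketch, NOT the docType class's sendpage() implementation.
// Returns [Content-Type header value, serialized document].
function renderDocument(DOMDocument $dom, bool $asXhtml): array
{
    if ($asXhtml) {
        return ['application/xhtml+xml; charset=UTF-8', $dom->saveXML()];
    }
    return ['text/html; charset=UTF-8', $dom->saveHTML()];
}

// Sends the header first, then the body; nothing may be output before this.
function sendDocument(DOMDocument $dom, bool $asXhtml): void
{
    [$contentType, $body] = renderDocument($dom, $asXhtml);
    header('Content-Type: ' . $contentType);
    echo $body;
}
```

Keeping the rendering separate from header() makes the serialization easy to test without a web server.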
Public Variables
Public variables should be set after the class is initialized. If you do not want the defaults, the first three must be set before the document() function is called. Alternatively, you can extend the class and define them there.
- public $htmlDT
- String. The DOCTYPE string to use for HTML documents. Defaults to HTML 4.01 strict.
- public $xhtmlDT
- String. The DOCTYPE string to use for XHTML documents. Defaults to XHTML 1.1.
- public $xmlLang
- String. The xml:lang attribute to use for XHTML documents. Defaults to en.
- public $noChild
- Array. In XHTML, some elements are never allowed to have child elements, for example the meta, br, and hr elements. In XHTML they are self-closing, but HTML does not have self-closing elements. Since these elements cannot have children, they have no closing tag in HTML; a closing tag would technically imply a child, even one that is 0 bytes long. DOMDocument knows about most of these and gets them right on HTML export, but it may not know about the newest ones, such as the HTML 5 source element. This array lets you list additional elements that are never allowed to have children. If DOMDocument does not know about them, any incorrect closing tag can then be removed from HTML exports of the DOM.
- public $keywordArray
- Array. Keywords to add to the keywords meta tag. Defaults to empty (in which case no keywords meta tag is generated).
- public $descriptionMeta
- String. Used to generate a description meta tag if the string is not empty. Defaults to empty.
- public $generator
- Boolean. Whether or not you want a generator meta tag added to the document head. Defaults to true.
- public $genstring
- String. Specifies the string to put in the content attribute of the generator meta tag, if you want a generator tag. If empty, the content attribute will specify the versions of PHP, DOMDocument, and the libxml2 library your PHP is compiled against.
- public $chromeFrame
- Boolean. If set to true and the requesting browser identifies itself as having the Chrome Frame plugin, the appropriate meta tag will be inserted into the head section when the document is sent. Defaults to false, but in the html5DT class that extends this class, it defaults to true.
- public $rtalabel
- Boolean. If set to true, the class will add a Restricted To Adults meta tag and send the RTA header. This option is for those who want their content flagged as adult content by parental filters. For more information, see RTALabel.org. Defaults to false.
- public $usexml
- Boolean. This is set by the constructor based upon the client's Accept header, but there may be circumstances where you wish to override it to force content to be sent as HTML or XHTML.
- public $acceptsXHTML
- Boolean. This is set by the constructor based upon the client's Accept header, but you may wish to override it in certain circumstances. For example, I override it for the W3C validator, which can handle XHTML but does not say so in its Accept header.
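The class's handling of $noChild is not shown on this page. The following is only a hypothetical, string-level sketch of the idea; stripVoidClosingTags is an invented name. It removes closing tags for the listed elements from an HTML serialization. Whether libxml2 already knows a given element, such as source, depends on its version.

```php
<?php
// Hypothetical sketch, NOT the docType class's code: removes closing tags
// for elements that may never have children from an HTML string.
function stripVoidClosingTags(string $html, array $noChild): string
{
    foreach ($noChild as $name) {
        // Case-insensitive, since HTML element names are not case sensitive.
        $pattern = '#</' . preg_quote($name, '#') . '\s*>#i';
        $html = preg_replace($pattern, '', $html);
    }
    return $html;
}

$noChild = ['source', 'track']; // elements an older libxml2 may not know
$html = '<video><source src="clip.webm" type="video/webm"></source></video>';
echo stripVoidClosingTags($html, $noChild);
// <video><source src="clip.webm" type="video/webm"></video>
```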
Public Functions
- constructor function docType($dom,$accept='')
- Takes a DOMDocument object as the first argument and, optionally, an Accept string as the second argument. The Accept string overrides what the browser sends, which is useful for debugging by forcing HTML or XHTML. This function is called when you initialize the class.
- public function document()
- Loads the Document Type and root element into the DOM object. Returns an object for the root html node that you can append children to.
- public function addKeyword($keyword)
- Requires one argument, a string. Adds the specified keyword to the $keywordArray public variable. Unfortunately, because of keyword stuffing, most search engines now ignore keywords. Some search engines still make use of them, however, especially site-specific search engines where keyword stuffing is not an issue.
- public function addKeyArray($keywords)
- Requires one argument, an array. Adds the elements of the specified array to the $keywordArray public variable. The same caveat about search engine support for keywords applies as for addKeyword().
- public function sendpage()
- Sends the appropriate header and web page to the requesting client.
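The class's meta tag code is not reproduced here. The sketch below is hypothetical; addMetaTags is an invented name. It shows how values like $keywordArray and $descriptionMeta could become meta tags, skipping each tag when its value is empty, as the defaults above describe. Trimming and de-duplicating the keywords is an extra assumption.

```php
<?php
// Hypothetical sketch, NOT the docType class's code.
function addMetaTags(DOMElement $head, array $keywords, string $description): void
{
    $dom = $head->ownerDocument;

    // Trim, drop empty strings, and remove duplicate keywords.
    $keywords = array_unique(array_filter(array_map('trim', $keywords), 'strlen'));
    if (count($keywords) > 0) {
        $meta = $dom->createElement('meta');
        $meta->setAttribute('name', 'keywords');
        $meta->setAttribute('content', implode(', ', $keywords));
        $head->appendChild($meta);
    }

    // Only emit a description tag when there is a description.
    if ($description !== '') {
        $meta = $dom->createElement('meta');
        $meta->setAttribute('name', 'description');
        $meta->setAttribute('content', $description);
        $head->appendChild($meta);
    }
}
```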
Things to Look Out For
White Space
Make sure you do not have any blank lines, spaces, or carriage returns before your opening <?php tag. Anything outside of PHP mode is sent to the client as output. That can cause the headers to be sent before the class sets the Content-Type header, which will result in a broken page.
You should also avoid any white space outside of your PHP code. Do not use a closing ?> until the very end of your script, or simply omit the closing tag.
A common source of unwanted white space is in PHP scripts that are included by your main script.
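To track down stray output in included files, a small lint check can help. The sketch below is hypothetical (strayOutputProblems is an invented name) and only covers the two common cases:

- anything, including a UTF-8 byte order mark, before the opening <?php tag;
- white space after a final ?> beyond the single newline that PHP discards.

```php
<?php
// Hypothetical lint helper: reports text that PHP would send as output.
function strayOutputProblems(string $source): array
{
    $problems = [];
    // Anything before "<?php" (a BOM, a blank line) is sent as output.
    if (strncmp($source, '<?php', 5) !== 0) {
        $problems[] = 'content before the opening <?php tag';
    }
    $trimmed = rtrim($source);
    if (substr($trimmed, -2) === '?>') {
        // PHP swallows one newline directly after "?>"; more is output.
        $tail = substr($source, strlen($trimmed));
        $tail = preg_replace('/^\r?\n/', '', $tail);
        if ($tail !== '') {
            $problems[] = 'white space after the closing ?> tag';
        }
    }
    return $problems;
}
```

One way to use it is to loop over get_included_files() near the end of a request and check each file's contents with file_get_contents().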
Lower Case Elements
HTML is not case sensitive for element and attribute names. XML is case sensitive, and XHTML uses lower case for element and attribute names. When you create elements and attributes, make sure they are lower case.
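This short DOMDocument example shows that, once a document is parsed as XML, element names that differ only in case are different elements. Only the lower-case one is valid XHTML.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML('<html><body><p>lower</p><P>upper</P></body></html>');

// XML name matching is case sensitive: 'p' and 'P' are distinct elements.
$lower = $dom->getElementsByTagName('p')->length;
$upper = $dom->getElementsByTagName('P')->length;
echo "p: $lower, P: $upper\n"; // p: 1, P: 1
```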
HTML Entities
HTML defines many entities that are popular in web design, for example &nbsp; and &copy;.
These are not valid in XML unless they are defined in the Document Type Definition. Do not use them; they will often cause XML errors in clients that receive XHTML content. Use the numeric character reference instead. For a non-breaking space, you would use &#160;. For the copyright symbol, you could either use &#169; or use a UTF-8 text editor and type © directly.
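A quick way to see the problem with DOMDocument: without a DTD that defines &nbsp;, loadXML() rejects the named entity. The numeric references load fine. The sample strings are illustrative.

```php
<?php
// Collect libxml2 parse errors instead of printing warnings.
libxml_use_internal_errors(true);

$named   = new DOMDocument();
$namedOk = $named->loadXML('<p>Fish&nbsp;&amp;&nbsp;Chips</p>');

$numeric   = new DOMDocument();
$numericOk = $numeric->loadXML('<p>Fish&#160;&amp;&#160;Chips &#169;</p>');

var_dump($namedOk);   // bool(false): &nbsp; is undefined without a DTD
var_dump($numericOk); // bool(true)
```

Only &amp;, &lt;, &gt;, &quot; and &apos; are predefined in XML. That is why &amp; works in both strings.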
JavaScript
JavaScript in the document body is bad form. You really should keep your JavaScript in external script files and reference them in your document head. However, even though it is bad form, it technically is legal to define your scripts within the document body.
If you insist on using inline JavaScript, be aware that XHTML does not play nicely with JavaScript contained in a comment. An XML parser treats <!-- ... --> as a real comment, so a script hidden inside one is ignored. In XHTML, inline JavaScript needs to be contained in a CDATA block instead. Note that using a CDATA block will fail in some older browsers. You really should just keep all your JavaScript external and reference the external files from within the document head node. It really is the best way.
Some will point out that your page starts to load faster if the JavaScript is at the bottom of your page rather than in the head because all the JS does not need to finish downloading before the page starts to render.
However, the same amount of data is required to download before the page is fully functional, so I really recommend just referencing it in the document head. It is cleaner and if it causes a load problem, you have too much JS.
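If inline JavaScript is unavoidable, DOMDocument can produce the CDATA block for you. The sketch below also uses a common workaround that is not from the original text: the CDATA markers are preceded by JavaScript // comments. An XML parser sees a real CDATA section, while an HTML parser treats the markers as commented-out lines. The script body is a placeholder.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$script = $dom->appendChild($dom->createElement('script'));
$script->setAttribute('type', 'text/javascript');

$js = 'if (a < b && b > 0) { go(); }'; // placeholder script

// A "//" text node before the CDATA section and a trailing "//" inside it
// turn the CDATA markers into JavaScript comments for HTML parsers.
$script->appendChild($dom->createTextNode('//'));
$script->appendChild($dom->createCDATASection("\n" . $js . "\n//"));

$out = $dom->saveXML($script);
echo $out;
// <script type="text/javascript">//<![CDATA[
// if (a < b && b > 0) { go(); }
// //]]></script>
```

Inside the CDATA section, < and && are written out unescaped, which is the point of using it.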
Tips and Tricks
Creating documents from scratch with DOMDocument can be a lot more tedious than just typing the raw XHTML:
<img src="funny.jpg" width="300" height="300" alt="[Kid with Pie]" />
That is one line of XML but requires multiple lines to generate with DOMDocument. One to create the img element, one for each attribute, and finally one to add it as a child to its intended parent object.
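Here are those steps in plain DOMDocument code for the img element above. The body element is only a stand-in parent.

```php
<?php
$dom  = new DOMDocument('1.0', 'UTF-8');
$body = $dom->appendChild($dom->createElement('body')); // stand-in parent

$img = $dom->createElement('img');        // 1. create the element
$img->setAttribute('src', 'funny.jpg');   // 2. one call per attribute
$img->setAttribute('width', '300');
$img->setAttribute('height', '300');
$img->setAttribute('alt', '[Kid with Pie]');
$body->appendChild($img);                 // 3. attach it to its parent

echo $dom->saveXML($img);
// <img src="funny.jpg" width="300" height="300" alt="[Kid with Pie]"/>
```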
Static Content
For static content, you can still create your content the old way; you just need to write it as vanilla XML and then import it into your document. For example, suppose you have a text file called 'content.xml' containing the following:
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head><title>I am a web page</title></head>
  <body>
    <h1 class="funky">Hello World!</h1>
    <p class="center">I am a paragraph<br /> with a self closing break <span class="groovy">and a span</span>. </p>
  </body>
</html>
Since it is clean XML, you can import that XML into your DOM using the following technique:
require_once('xml_doctype.inc.php');
$dom = new DOMDocument('1.0','UTF-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$myPage = new docType($dom);
$xmlHtml = $myPage->document();

// Load the static content into a temporary DOM and stop if it is not well-formed.
$xmlfile = 'content.xml';
$buffer = file_get_contents($xmlfile);
$tmpDOM = new DOMDocument('1.0','UTF-8');
$tmpDOM->preserveWhiteSpace = false;
if ($buffer === false || $buffer === '' || ! $tmpDOM->loadXML($buffer)) {
    die('content.xml is missing or is not well-formed XML');
}

// Import deep copies of the head and body nodes into the page DOM.
$impHead = $tmpDOM->getElementsByTagName('head')->item(0);
$xmlHead = $dom->importNode($impHead,true);
$impBody = $tmpDOM->getElementsByTagName('body')->item(0);
$xmlBody = $dom->importNode($impBody,true);

// Attach them to the root html node and serve the page.
$xmlHtml->appendChild($xmlHead);
$xmlHtml->appendChild($xmlBody);
$myPage->sendpage();
Now you can edit the file 'content.xml' at your leisure. It will be served as XHTML to clients that support it and as HTML to clients that do not.
Just make sure you keep your content well-formed XML, or the DOMDocument loadXML() function will not load it properly. There is a loadHTML() function that is more lenient, but it tends to turn multi-byte UTF-8 characters into named entities, and that can cause problems when serving the content as XHTML.
In the DOMBlogger CMS, we actually pass static content through libtidy in XML mode to make sure the content is valid XML before importing it into the DOM.
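If libtidy is not available, you can at least report why a static file fails to load. The sketch below is hypothetical; loadWellFormed is an invented name. It uses libxml's internal error collection to log each parse error with its line number and returns null for content that is not well-formed.

```php
<?php
// Hypothetical helper: returns a DOMDocument for well-formed XML, or null
// after logging the libxml2 parse errors so the content can be fixed.
function loadWellFormed(string $xml): ?DOMDocument
{
    if ($xml === '') {
        return null; // loadXML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument('1.0', 'UTF-8');
    $ok = $doc->loadXML($xml);
    foreach (libxml_get_errors() as $error) {
        error_log(sprintf('line %d: %s', $error->line, trim($error->message)));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $ok ? $doc : null;
}
```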
Dynamic Content
For dynamic content, life will be a lot easier if you write yourself a library of functions that do the grunge work of creating XML nodes.
I actually have such a library I personally use, but it is currently undergoing revision and I am not *quite* ready to share it yet.
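The author's own library is not published here. As a purely hypothetical illustration of the kind of helper meant, addElement (an invented name) creates an element, sets its attributes, optionally adds escaped text, appends the element to a parent and returns it. Building the img element from the Tips and Tricks section then takes one call.

```php
<?php
// Hypothetical helper, NOT the author's unpublished library.
function addElement(DOMNode $parent, string $name, array $attrs = [], ?string $text = null): DOMElement
{
    // A DOMDocument has no ownerDocument, so use it directly in that case.
    $doc = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;
    $element = $doc->createElement($name);
    foreach ($attrs as $attr => $value) {
        $element->setAttribute($attr, (string) $value);
    }
    if ($text !== null) {
        // createTextNode() escapes markup characters in the text.
        $element->appendChild($doc->createTextNode($text));
    }
    $parent->appendChild($element);
    return $element;
}

$dom  = new DOMDocument('1.0', 'UTF-8');
$body = addElement($dom, 'body');
$p    = addElement($body, 'p', ['class' => 'center'], 'I am a paragraph ');
addElement($p, 'img', ['src' => 'funny.jpg', 'width' => 300, 'height' => 300, 'alt' => '[Kid with Pie]']);

echo $dom->saveXML($body);
```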
ChangeLog
: Cleaned up code to DOMBlogger coding standards. Added support for sending of RTA header. Changed license to MIT (Expat).
: Initial public release.