XHTML Basics for UNC Web Authors

What Is XHTML?

  • XHTML stands for Extensible HyperText Markup Language
  • XHTML is intended to replace HTML
  • XHTML is almost identical to HTML 4.01
  • XHTML is a stricter and cleaner version of HTML
  • XHTML is HTML defined as an XML application
  • XHTML is a W3C Recommendation

So What's Wrong with HTML?

HTML (HyperText Markup Language) is the set of codes (the "markup language") that a writer puts into a document to make it displayable on the World Wide Web. It has been the standard of the World Wide Web since its inception in 1990. It has gone through several revisions, and is now at version 4.01. Although it has been enormously successful, the language is no longer suitable as a basis for the deployment of commercial and industrial web-based applications on the Internet and intranets.

When XHTML 1.0 was published, the W3C did not plan another revision of HTML except as an application of XML, i.e. XHTML. HTML was originally designed for a very different environment than today's demanding Internet: the exchange of data and documents between scientists associated with CERN, the birthplace of the web. Since then the language has been hacked and stretched into an unwieldy monster, and the prevalence of sloppy markup practices makes it hard or impossible for some user agents (e.g. browsers, spiders, etc.) to make sense of the web.

The following HTML code will usually display in a browser, even though it does not follow the HTML rules:

  • <html>
  • <head>
  • <title>This is bad HTML</title>
  • <body>
  • <h1>Bad HTML
  • </body>

In the above example, the <head>, <h1> and <html> tags are not closed. Some browsers will still display the page as expected, but many will display it in a very strange way, not at all what the designer intended.

XML (Extensible Markup Language) is a structured set of rules for how one might define any kind of data to be shared on the Web. It's called "extensible" because anyone can invent a particular set of markup for a particular purpose. As long as everyone uses it (the writer and an application program at the receiver's end), it can be adapted and used for many purposes, including, as it happens, describing a Web page.

However, the immediate issue is to facilitate the transition from HTML for the mass of developers already familiar with HTML. That being the case, it seemed desirable to reframe HTML in terms of XML. The result is XHTML, a particular application of XML for "expressing" Web pages.

XHTML is, in fact, the follow-on version of HTML 4. You could think of it as the next version of HTML, except that it is called XHTML 1.0. In XHTML, all HTML 4 markup tags and attributes (the language of HTML) continue to be supported.

With HTML, authors had a fixed set of elements to use, with no variation. XHTML, however, can be extended by anyone who uses it. New tags and attributes can be defined and added to those that already exist, making possible new ways to embed content and programming in a Web page. With XHTML 1.0, authors can mix and match known HTML 4 elements with elements from other XML languages, including those developed by W3C for multimedia.

The desire to extend the functionality of the web will lead to combining HTML with other tag sets: multimedia (Synchronized Multimedia Integration Language - SMIL), mathematical expressions (MathML), two-dimensional vector graphics (Scalable Vector Graphics - SVG), and metadata (Resource Description Framework - RDF).
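As a rough illustration of mixing tag sets, the sketch below uses Python's standard xml.etree.ElementTree module to read a small XHTML page that embeds a MathML formula, and lists the XML namespaces it finds. The sample page and the names DOC and namespaces_in are invented for this example; Python is used only as a convenient stand-in for any XML-aware program.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: XHTML with an embedded MathML formula.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Mixed markup</title></head>
<body>
<p>The area of a circle:</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mi>A</mi><mo>=</mo><mi>&#960;</mi><msup><mi>r</mi><mn>2</mn></msup>
</math>
</body>
</html>"""


def namespaces_in(markup):
    """Return the sorted namespace URIs used by the elements in markup."""
    root = ET.fromstring(markup)
    found = set()
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return sorted(found)


for uri in namespaces_in(DOC):
    print(uri)
```

Each tag set lives in its own namespace, so a program can tell the XHTML elements from the MathML elements even though they share one document.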

Why XHTML?

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.
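To see what "well-formed" means in practice, the sketch below feeds the "bad HTML" example shown earlier to a strict XML parser (Python's standard xml.etree.ElementTree) and then does the same with a repaired version. The helper name is_well_formed and the repaired sample are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The "bad HTML" example: <head>, <h1> and <html> are never closed.
BAD = """<html>
<head>
<title>This is bad HTML</title>
<body>
<h1>Bad HTML
</body>
"""

# A repaired version in which every element is closed and properly nested.
GOOD = """<html>
<head>
<title>This is good markup</title>
</head>
<body>
<h1>Good markup</h1>
</body>
</html>
"""


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BAD))   # False
print(is_well_formed(GOOD))  # True
```

A browser may guess at what the bad version means, but an XML parser refuses it outright, which is the behavior small devices are expected to have.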

XML was designed to describe data, and HTML was designed to display data. Today's market consists of different browser technologies: some browsers run on desktop computers, and some run on mobile phones and handhelds. The latter do not have the resources or power to interpret a "bad" markup language.

Therefore, by combining the strengths of HTML and XML, we get a markup language that is useful now and in the future: XHTML.

XHTML pages can be read by all XML-enabled devices. While waiting for the rest of the world to upgrade to browsers that support XML, XHTML gives you the opportunity to write "well-formed" documents now that work in all browsers and remain backward compatible with older ones.

Why would you want to use XHTML?

The usual reasons for upgrading to a new language version are to be able to take advantage of new bells and whistles, and also because problems with the earlier version have been fixed. However, XHTML is a fairly faithful copy of HTML 4, as far as tag functionalities go, so do not expect any fancy new tags. The reasons offered by W3C are extensibility and portability.

Extensibility

XML documents are required to be well-formed (elements nest properly). Under HTML (an SGML application), the addition of a new group of elements requires alteration of the entire DTD (Document Type Definition, the formal grammar referenced by the DOCTYPE declaration at the beginning of an HTML document). In an XML-based DTD, all that is required is that the new set of elements be internally consistent and well-formed to be added to an existing DTD. This greatly eases the development and integration of new collections of elements.

Portability

There will be increasing use of non-desktop devices to access Internet documents. By some estimates, as much as 75% of Internet access could be carried out on these alternate platforms by the year 2008. In most cases these devices will not have the computing power of a desktop computer, and will not be designed to accommodate ill-formed HTML as current browsers tend to do. In fact, if these non-desktop browsers do not receive well-formed markup (HTML or XHTML), they may simply be unable to display the document.

While HTML isn't completely lacking in extensibility and portability, we're all too familiar with how painfully slow its evolution has been (relative to the pace of Internet development), and how hard it can be to make your pages work on a wide range of browsers and platforms. XHTML will help to remedy those problems.

  • XHTML 1.0 Combines the Familiarity of HTML with the Power of XML
  • XHTML 1.0 Provides a Foundation for Device-Independent Web Access

The Most Important Differences Between HTML and XHTML

  • XHTML elements must be properly nested
  • XHTML documents must be well-formed
  • Tag names must be in lowercase
  • All XHTML elements must be closed

Elements Must Be Properly Nested

In HTML, some elements can be improperly nested within each other, like this:

  • <strong><em>This text is bold and italic</strong></em>

In XHTML, all elements must be properly nested within each other, like this:

  • <strong><em>This text is bold and italic</em></strong>

Note: A common mistake in nested lists is to forget that the inside list must be within a li element. This is wrong:

  • <ul>
  • <li>Coffee</li>
  • <li>Tea</li>
  • <ul>
  • <li>Black tea</li>
  • <li>Green tea</li>
  • </ul>
  • <li>Milk</li>
  • </ul>

This is correct:

  • <ul>
  • <li>Coffee</li>
  • <li>Tea
  • <ul>
  • <li>Black tea</li>
  • <li>Green tea</li>
  • </ul>
  • </li>
  • <li>Milk</li>
  • </ul>

Notice the </li> tag after the </ul> tag in the "correct" code example.
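The nested-list mistake is worth a closer look, because the incorrect list is still well-formed XML: every tag is closed and nested. What it breaks is the XHTML rule that a ul may contain only li elements. The sketch below, using Python's standard xml.etree.ElementTree, flags any direct child of a list that is not a li. The function name misplaced_list_children is hypothetical, and this is a simplified check, not a substitute for a real validator.

```python
import xml.etree.ElementTree as ET

INCORRECT = """<ul>
<li>Coffee</li>
<li>Tea</li>
<ul>
<li>Black tea</li>
<li>Green tea</li>
</ul>
<li>Milk</li>
</ul>"""

CORRECT = """<ul>
<li>Coffee</li>
<li>Tea
<ul>
<li>Black tea</li>
<li>Green tea</li>
</ul>
</li>
<li>Milk</li>
</ul>"""


def misplaced_list_children(markup):
    """Return the tag names of list children that are not li elements."""
    root = ET.fromstring(markup)
    problems = []
    for element in root.iter():
        if element.tag in ("ul", "ol"):
            # Iterating over an element yields its direct children only.
            problems.extend(c.tag for c in element if c.tag != "li")
    return problems


print(misplaced_list_children(INCORRECT))  # ['ul']
print(misplaced_list_children(CORRECT))    # []
```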

Documents Must Be Well-formed

All XHTML elements must be nested within the <html> root element. All other elements can have child elements (sub-elements). Child elements must be in pairs and correctly nested within their parent element. The basic document structure is:

  • <html>
  • <head> ... </head>
  • <body> ... </body>
  • </html>

Tag Names Must Be In Lower Case

This is because XHTML is an XML application, and XML is case-sensitive. Tags like <br> and <BR> are interpreted as different tags. This is wrong:

  • <BODY>
  • <P>This is a paragraph</P>
  • </BODY>

This is correct:

  • <body>
  • <p>This is a paragraph</p>
  • </body>
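Case sensitivity is easy to demonstrate with an XML parser. In the sketch below (Python's standard xml.etree.ElementTree; the names upper_root and parses are invented for the example), the uppercase markup parses, but its elements are named BODY and P, which are not XHTML's body and p. Mixing cases within one element is rejected outright.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


upper_root = ET.fromstring("<BODY><P>This is a paragraph</P></BODY>")

print(upper_root.tag)                    # BODY
print(upper_root.find("p") is None)      # True: there is no element named "p"
print(upper_root.find("P") is not None)  # True: only "P" exists

# A start tag and end tag that differ in case do not match each other.
print(parses("<p>This is a paragraph</P>"))  # False
```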

All XHTML Elements Must Be Closed

Non-empty elements must have an end tag. This is wrong:

  • <p>This is a paragraph
  • <p>This is another paragraph

This is correct:

  • <p>This is a paragraph</p>
  • <p>This is another paragraph</p>

Empty Elements Must Also Be Closed

Empty elements must either have an end tag or the start tag must end with />. This is wrong:

  • This is a break<br>
  • Here comes a horizontal rule:<hr>
  • Here's an image <img src="happy.gif" alt="Happy face">

This is correct:

  • This is a break<br />
  • Here comes a horizontal rule:<hr />
  • Here's an image <img src="happy.gif" alt="Happy face" />
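The two accepted ways of closing an empty element mean the same thing to an XML parser. The sketch below, again using Python's standard xml.etree.ElementTree, wraps each variant in a p element (an assumption made so that each sample has a single root) and compares the results. The names parses, UNCLOSED, SELF_CLOSED and END_TAG are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


UNCLOSED = "<p>This is a break<br></p>"
SELF_CLOSED = "<p>This is a break<br /></p>"
END_TAG = "<p>This is a break<br></br></p>"

print(parses(UNCLOSED))     # False: <br> is never closed
print(parses(SELF_CLOSED))  # True
print(parses(END_TAG))      # True

# Both closed forms produce one empty br child with no text content.
for sample in (SELF_CLOSED, END_TAG):
    br = ET.fromstring(sample)[0]
    print(br.tag, br.text, len(br))  # br None 0

# Attributes on an empty element are read as usual.
img = ET.fromstring('<p><img src="happy.gif" alt="Happy face" /></p>')[0]
print(img.get("alt"))  # Happy face
```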

IMPORTANT Compatibility Note

To make your XHTML compatible with today's browsers, you should add an extra space before the "/" symbol like this: <br />, and this: <hr />.

Mandatory XHTML Elements

All XHTML documents must have a DOCTYPE declaration. The html, head and body elements must be present, and the title must be present inside the head element. This is a minimum XHTML document template:

  • <!DOCTYPE Doctype goes here>
  • <html xmlns="http://www.w3.org/1999/xhtml">
  • <head>
  • <title>Title goes here</title>
  • </head>
  • <body>
  • Body text goes here
  • </body>
  • </html>

Note: The DOCTYPE declaration is not a part of the XHTML document itself. It is not an XHTML element, and it should not have a closing tag.

Note: The xmlns attribute inside the <html> tag is required in XHTML. However, the validator on w3.org does not complain when this attribute is missing in an XHTML document. This is because xmlns="http://www.w3.org/1999/xhtml" is a fixed value and will be added to the <html> tag even if you do not include it.
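A rough way to check a page for the mandatory elements is sketched below with Python's standard xml.etree.ElementTree. The function name missing_mandatory_elements is hypothetical, and the XHTML 1.0 Strict DOCTYPE in the sample is just one possible choice. Note two limits of this sketch: the parser reads past the DOCTYPE without checking it, and, unlike the w3.org validator described above, it does not read the DTD, so it does not supply the xmlns value when it is missing.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page based on the minimum template.
TEMPLATE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title goes here</title>
</head>
<body>
<p>Body text goes here</p>
</body>
</html>"""


def missing_mandatory_elements(markup):
    """Return the names of mandatory XHTML elements absent from markup."""
    root = ET.fromstring(markup)
    prefixes = {"x": XHTML_NS}
    missing = []
    # The root must be html in the XHTML namespace.
    if root.tag != "{%s}html" % XHTML_NS:
        missing.append("html")
    checks = (("x:head", "head"), ("x:head/x:title", "title"), ("x:body", "body"))
    for path, name in checks:
        if root.find(path, prefixes) is None:
            missing.append(name)
    return missing


print(missing_mandatory_elements(TEMPLATE))  # []

no_title = TEMPLATE.replace("<title>Title goes here</title>\n", "")
print(missing_mandatory_elements(no_title))  # ['title']
```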

Semantic Markup (AKA Descriptive Markup)

Semantic markup is markup that is descriptive enough to allow us and the machines we program to recognize it and make decisions about it. In other words, markup means something when we can identify it and do useful things with it. In this way, semantic markup becomes more than merely descriptive. It becomes a brilliant mechanism that allows both humans and machines to “understand” the same information.

An example: you have a bunch of links you will use as navigation on the left side of your page. You might do it like this because it "looks right":

  • <p>link one<br>
  • link two<br>
  • link three</p>

Or you might make them all paragraphs:

  • <p>link one</p>
  • <p>link two</p>
  • <p>link three</p>

Semantic markup would require you to use the unordered list, because that's what it is - it isn't a series of paragraphs, it is a list. So the correct way to do this is as follows:

  • <ul>
    • <li>link one</li>
    • <li>link two</li>
    • <li>link three</li>
  • </ul>
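To make the point about machines concrete, the sketch below asks a program for "the items in the navigation list" under two of the markups above. It uses Python's standard xml.etree.ElementTree; the names AS_LIST, AS_BREAKS and list_items are invented for this example, and the br tags are written in their closed XHTML form so that the parser accepts them.

```python
import xml.etree.ElementTree as ET

AS_LIST = """<ul>
<li>link one</li>
<li>link two</li>
<li>link three</li>
</ul>"""

AS_BREAKS = """<p>link one<br />
link two<br />
link three</p>"""


def list_items(markup):
    """Return the text of every li element found in markup."""
    root = ET.fromstring(markup)
    return [li.text for li in root.iter("li")]


print(list_items(AS_LIST))    # ['link one', 'link two', 'link three']
print(list_items(AS_BREAKS))  # []
```

With the list markup, the program can find and count the items directly. With the paragraph-and-breaks markup there is nothing that says "this is a list", so the program would have to guess from line breaks.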

Semantic markup and XHTML work together to separate content and display. The content should be tagged in ways that give information about the content, and the stylesheet then controls the display of the page. When this approach is used, an entire site can be controlled by a single stylesheet: change the stylesheet and you change the design of the entire site. Semantic markup also helps screen readers interpret the page, and therefore helps accessibility.

Resources