Now that we know how to create a well-formed XML document, let's explore how to design one properly so that it meets our needs. As we start determining the structure of our XML document, we'll want to take a closer look at the type of data we're working with, and think about how that'll affect the markup we create.
Narrative vs. Record-Like Documents
There are two broad categories of XML documents: narrative XML documents and data-centric (record-like) XML documents. Narrative XML documents are more free-flowing and less structured, and they often contain a considerable amount of text within the XML markup. A section of text from a play, marked up with XML for archival purposes, is one example of a narrative XML document. While there might be a set of tags used to mark up a play, the exact ones used for each section of text may vary.
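As a rough illustration of narrative markup, the sketch below tags a short scene. The tag names (scene, stage, speech, speaker, line) and the text itself are made up for this example and are not taken from any particular play or standard. Notice that text and tags are mixed freely, and that not every speech uses the same tags.

```xml
<!-- Hypothetical tag names and placeholder text, for illustration only -->
<scene>
  <stage>A street at dusk.</stage>
  <speech>
    <speaker>FIRST CITIZEN</speaker>
    <!-- Text and a nested tag share the same line of dialogue -->
    <line>Who goes there? <stage>He raises his lantern.</stage> Speak!</line>
  </speech>
  <speech>
    <speaker>SECOND CITIZEN</speaker>
    <line>A friend.</line>
  </speech>
</scene>
```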
Data-centric XML documents, in contrast, are much more rigid. These documents are more likely to contain data that could also be housed in a database, such as a list of employment records or an equipment inventory. There is likely to be greater consistency among documents in a document set, and the same tags are likely to be used for every record.
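A data-centric document might look something like the following sketch of an equipment inventory. The tag names and values are placeholders invented for this example. Each record uses exactly the same child tags in the same order, much like the columns in a database table.

```xml
<!-- Hypothetical tag names and placeholder values, for illustration only -->
<inventory>
  <item>
    <name>Example item A</name>
    <quantity>1</quantity>
    <location>Example location</location>
  </item>
  <item>
    <name>Example item B</name>
    <quantity>2</quantity>
    <location>Example location</location>
  </item>
</inventory>
```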
The job listing document we are marking up today is more of a data-centric document. Its overall structure is likely to be similar from week to week. There may be more or fewer job listings, but each listing is likely to contain similar, if not identical, elements.
Once we've familiarized ourselves with the data we'll be working with, the next step is to determine what tags will be needed to effectively mark up the document.
Determining Tag Sets
A tag set is the complete list of tags that can be used to add XML structure to a similar set of documents. Building a tag set can sometimes be more art than science, and sometimes getting the right amount of granularity when establishing the elements in a tag set is not as easy as it looks. One of the most straightforward ways to create a tag set is to examine several similar documents to be marked up, and then create a single tag set that will suit the needs of each of those documents.
Since we've only got one document so far, the job listings for the current week, we'll build our tag set based on this document. If you later receive other job listing documents containing information that the current tag set doesn't cover, you can always extend the tag set to mark up the additional information.
As you look at the information contained in job_postings.xml, you can see that each listing contains the same pieces of information: job title, salary, classification, department, description, qualifications, and so on. We'll want to create a tag for each piece of information, and then use those tags to mark up the information in each record.
Our tag set will include a parent tag that contains each job listing, plus a child tag for each piece of information about that job (title, salary, and so on). Since each piece of information belongs to only one job, we'll nest those child tags inside the parent tag for that listing to group them together.
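To make the idea concrete, here is one possible shape for such a tag set. The tag names below (job_postings, job, title, and so on) are assumptions chosen for this sketch, and the values are placeholders rather than data from the actual file. You may well settle on different names as you work through the document.

```xml
<!-- Hypothetical tag names and placeholder values; one <job> per listing -->
<job_postings>
  <job>
    <title>Example job title</title>
    <salary>Example salary</salary>
    <classification>Example classification</classification>
    <department>Example department</department>
    <description>Example description of the position.</description>
    <qualifications>Example list of qualifications.</qualifications>
  </job>
  <!-- Additional <job> elements would follow the same pattern -->
</job_postings>
```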
Let's start marking up the document — we'll mark up the first job to start with. First we'll add the parent tag for the first job, then add the child tags to the rest of the data for that job. As you're marking up the document, you might want to indent child elements, similar to the following example:
<parent>
    <child>Child element 1</child>
    <child>Child element 2</child>
</parent>
This can help make it more obvious that certain elements are nested inside of each other, and will help you make sure that elements are nested properly and don't accidentally overlap. Strictly speaking, an XML parser doesn't discard whitespace, but the spaces, tabs, and line breaks that sit between elements purely for layout are generally treated as insignificant, so adding them won't normally affect how your document is processed. To indent each element, you can either add a few spaces before a nested element or press the Tab key. While indenting nested elements isn't required, it's generally considered good practice, because it makes the structure of a document easier to see.
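As a small illustration, the two job elements in the sketch below have the same element structure; only the layout differs. The tag names and values are placeholders invented for this example. Keep in mind that whitespace inside an element's text (for example, extra spaces around a job title) is a different matter: it is part of the content and is normally passed along to whatever application reads the document.

```xml
<!-- Hypothetical tag names and placeholder values, for illustration only -->
<listings>
  <!-- Written on a single line, with no indentation -->
  <job><title>Example title</title><salary>Example salary</salary></job>

  <!-- Same elements and nesting, indented two spaces per level -->
  <job>
    <title>Example title</title>
    <salary>Example salary</salary>
  </job>
</listings>
```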
You might also notice that some of the sections in the job listing have headings — we'll be removing those as we go, as the tags we'll be marking up the document with will add all the structure we need.
Let's go ahead and start marking up job_postings.xml.