Bringing Everything Together

Firstly, here's a new XML document for you to parse. It has the same format as the prior document, except that it has more elements to make the output more exciting:

<?xml version="1.0"?>
<newsitems>
    <news type="programming">
    PHP 6.0 has been released!
    </news>

    <news type="programming">
    Larry Wall switches to PHP!
    </news>

    <news type="sci-tech">
    Cat lands on Mars!
    </news>

    <news type="programming">
    XML takes over world!
    </news>
</newsitems>

Save the new document over the old one, as we will be using it from now on. Notice that it now begins with the standard XML declaration, for the sake of compatibility.
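If you would rather save the file from PHP than by hand, something like the following should do it. It is a minimal sketch: the target path (a file called newsitems.xml in the system temp directory) is an assumption standing in for wherever you keep your copy. It uses nowdoc syntax (<<<'XML') so that PHP writes the markup exactly as given. The XML declaration must be the very first thing in the file, because even a blank line before it makes the document malformed.

```php
<?php
// Hypothetical target path: substitute the location your parsing script reads.
$file = sys_get_temp_dir() . '/newsitems.xml';

// A nowdoc (<<<'XML') performs no variable interpolation, so the markup is
// written byte for byte, with the XML declaration on the very first line.
$xml = <<<'XML'
<?xml version="1.0"?>
<newsitems>
    <news type="programming">
    PHP 6.0 has been released!
    </news>

    <news type="programming">
    Larry Wall switches to PHP!
    </news>

    <news type="sci-tech">
    Cat lands on Mars!
    </news>

    <news type="programming">
    XML takes over world!
    </news>
</newsitems>
XML;

// file_put_contents() returns the number of bytes written, or false on failure.
if (file_put_contents($file, $xml) === false) {
    print "Could not write $file - check that the directory is writable.";
    exit(1);
}
print "Saved $file\n";
```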

Now, onto the PHP code itself. I am going to run over the complete code for an event-based XML-parsing script, and at the same time I will be introducing a couple of new functions to add extra functionality to your parsing scripts. You'll recognise a lot of the code from what you have just read, but there are a few new bits in there to keep you on your toes...

<?php
    $parser = xml_parser_create();

    function startElement($parser, $el_name, $attributes) {
        $line = xml_get_current_line_number($parser);
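        // <newsitems> has no TYPE attribute, so fall back to an empty string
        $attribute_type = isset($attributes['TYPE']) ? $attributes['TYPE'] : '';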
    
        switch ($attribute_type) {
            case "programming":
                print "Programming headline found on line $line<br />";
                break;
            case "sci-tech":
                print "Sci/tech headline found on line $line<br />";
                break;
        }
    }

    function endElement($parser, $el_name) {
        print "Closed element $el_name.<br />";
    }

    xml_set_element_handler($parser, "startElement", "endElement");

    function charData($parser, $chardata) {
        $line = xml_get_current_line_number($parser);
        $chardata = trim($chardata);
        if ($chardata == "") return;

        print "Character data found on line $line. The data was $chardata<br />";
    }

    xml_set_character_data_handler($parser, "charData");

    $file = '/path/to/somexmlfile.xml';

    // is_readable() fails if the file is missing *or* we lack permission to read it
    if (!is_readable($file)) {
        print "Error loading XML file - please check the file exists and that you have access to it.";
        exit;
    } else {
        print "XML file loaded successfully!<br /><br />";
    }

    $data = file_get_contents($file);

    if (!xml_parse($parser, $data, true)) {
        print "<H1>Unrecoverable XML error encountered! </H1>";
        printf("<P> The error report was %s at line %d</P>",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    } else {
        print "<br /><br />Parsing complete.";
    }

    xml_parser_free($parser);
?>

All being well, you should be able to recognise the majority of that code, despite the new pieces being in there. In startElement(), a new variable, $attribute_type, is set to the TYPE attribute of the element being passed in. This is then used in a switch statement to select the correct output for each type of news item. Notice that the attribute name is "TYPE" and not "type" because, as mentioned already, case-folding (automatic uppercasing) is enabled by default.
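If you would rather keep element and attribute names exactly as they appear in the document, you can switch case folding off with xml_parser_set_option() and the XML_OPTION_CASE_FOLDING option before parsing. The sketch below then looks up the lowercase "type" key. The small inline document and the use of closures as handlers are choices made for this example, not part of the chapter's script.

```php
<?php
// Sketch: the same attribute lookup with case folding switched off.
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$types = [];  // collected TYPE values, in document order

xml_set_element_handler(
    $parser,
    function ($parser, $el_name, $attributes) use (&$types) {
        // With folding off, names keep their original case: "news", "type".
        if ($el_name === 'news') {
            $types[] = $attributes['type'];
        }
    },
    function ($parser, $el_name) {
        // Nothing to do when an element closes.
    }
);

xml_parse($parser, '<newsitems><news type="programming">A</news><news type="sci-tech">B</news></newsitems>', true);

print_r($types);
```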

Also in startElement(), a new function appears: xml_get_current_line_number(). It takes the parser as its only parameter and returns, as an integer, the line the parser is currently working on. This is one of the advantages of event-based parsing: your callback functions run as the parser reaches each piece of XML, so from inside them you can ask the parser for the current line number, the last error, and more.
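To see xml_get_current_line_number() in isolation, here is a sketch that records the line each <news> start tag is reported on instead of printing it straight away. The inline document and the $lines array are illustrative assumptions. Case folding is left on, so the element name arrives as "NEWS".

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<newsitems>
    <news type="programming">PHP 6.0 has been released!</news>
    <news type="sci-tech">Cat lands on Mars!</news>
</newsitems>
XML;

$lines = [];  // one entry per <news> element, in document order

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $el_name, $attributes) use (&$lines) {
        // Case folding is on by default, so compare against "NEWS".
        if ($el_name === 'NEWS') {
            $lines[] = xml_get_current_line_number($parser);
        }
    },
    function ($parser, $el_name) {
        // Nothing to do when an element closes.
    }
);
xml_parse($parser, $xml, true);

foreach ($lines as $i => $line) {
    print "Headline " . ($i + 1) . " was reported on line $line\n";
}
```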

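In the charData() function, we now run the character data passed in through the trim() function. This is generally a good idea because character data very often has spaces or line breaks at the beginning and/or end, and trim() strips them off. The whitespace between elements is reported as character data too. Once trimmed it becomes an empty string, which is why charData() returns without printing anything in that case.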
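Printing each chunk is fine for this chapter. If you want each headline as a single string, though, it is safer to collect character data in a buffer and trim it when the element closes. The parser may deliver the text of one element across several calls to the character data handler. Here is a minimal sketch under that assumption. The $buffer and $headlines names are illustrative.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<newsitems>
    <news type="programming">
    PHP 6.0 has been released!
    </news>
    <news type="sci-tech">
    Cat lands on Mars!
    </news>
</newsitems>
XML;

$headlines = [];  // trimmed text of each <news> element
$buffer = '';     // character data seen since the last start tag

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $el_name, $attributes) use (&$buffer) {
        $buffer = '';  // start collecting afresh for this element
    },
    function ($parser, $el_name) use (&$buffer, &$headlines) {
        if ($el_name === 'NEWS') {
            $headlines[] = trim($buffer);  // trim once, on the complete text
        }
        $buffer = '';
    }
);
xml_set_character_data_handler($parser, function ($parser, $chardata) use (&$buffer) {
    // Append rather than overwrite: one text node may arrive in several chunks.
    $buffer .= $chardata;
});
xml_parse($parser, $xml, true);

print_r($headlines);
```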

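Finally, there is new code that runs when XML parsing fails, and it introduces two more functions. The first is xml_get_error_code(), which takes the parser and returns an integer code for the error it hit. The second is xml_error_string(), which turns that code into a human-readable message. Combined with xml_get_current_line_number(), this tells you both what went wrong and roughly where.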
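You can try the error functions on their own by feeding the parser some deliberately broken XML. In the sketch below, a <news> element is never closed. The exact error code and message text depend on the XML library PHP was built against, so the example only relies on them being present rather than on particular values.

```php
<?php
// Deliberately malformed: <news> is opened but never closed.
$bad = "<newsitems>\n<news type=\"programming\">Oops\n</newsitems>";

$parser = xml_parser_create();
$ok = xml_parse($parser, $bad, true);          // 1 on success, 0 on failure

$code = xml_get_error_code($parser);           // integer code; 0 would mean no error
$message = xml_error_string($code);            // human-readable text for that code
$line = xml_get_current_line_number($parser);  // where the parser gave up

if (!$ok) {
    printf("XML error %d: %s at line %d\n", $code, $message, $line);
}
```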

Copyright ©2015 Paul Hudson. Follow me: @twostraws.