If there's no object there yet, we create one and set its SEEN field, which we'll use later for sorting the output. When we look up the enclosing element, the undefined value is returned if there is none; this can only happen for the root element. Otherwise, the name we get is that of this element's parent.
Instantiating an Elinfo object creates three anonymous hashes, one each for parents, children, and attributes. When we find a parent, we increment its slot in our parent table and increment our slot in the parent's child table. Also, if this element is contained in some other element, then that element can't be empty.
Finally, we deal with the remaining parameters, which are this element's attributes, passed along as name and value pairs; we consume them a pair at a time until the parameter list is empty. The character handler has two tasks: set the empty flag to false if we see any content at all for that element, and increment the byte count if the chunk of data we're given contains any non-whitespace characters. This way of counting content bytes is somewhat bogus, because the parser may deliver an element's text in several chunks, so whitespace-only chunks are dropped even when the element does have real content.
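The handler logic described above can be sketched in Python with `xml.parsers.expat`, which exposes the same expat parser through start, end, and character handlers. This is a Python analogue, not the original Perl program. The `ElInfo` class, its field names, and the `collect` function are assumptions chosen to mirror the prose, and the byte count deliberately follows the simple per-chunk rule just described.

```python
import xml.parsers.expat
from collections import Counter


class ElInfo:
    """Statistics for one element name (field names are illustrative)."""

    def __init__(self, seen):
        self.seen = seen              # first-appearance order, for sorting later
        self.count = 0                # how many instances we have met
        self.parents = Counter()      # parent name -> times it enclosed us
        self.children = Counter()     # child name -> times we enclosed it
        self.attributes = Counter()   # attribute name -> times it appeared
        self.empty = True             # becomes False once content is seen
        self.content_bytes = 0        # simple per-chunk byte count


def collect(xml_text):
    """Parse xml_text and return {element name: ElInfo}."""
    elements = {}
    stack = []                        # names of currently open elements

    def start(name, attrs):
        info = elements.get(name)
        if info is None:              # first sighting: create it and stamp its order
            info = elements[name] = ElInfo(seen=len(elements))
        info.count += 1
        if stack:                     # only the root has no enclosing element
            parent_name = stack[-1]
            info.parents[parent_name] += 1
            parent = elements[parent_name]
            parent.children[name] += 1
            parent.empty = False      # a parent that contains an element isn't empty
        for attr_name in attrs:       # expat passes attributes as a dict by default
            info.attributes[attr_name] += 1
        stack.append(name)

    def end(name):
        stack.pop()

    def chars(data):
        if not stack:
            return
        info = elements[stack[-1]]
        info.empty = False            # any character data counts as content
        if data.strip():              # this chunk holds some non-whitespace
            info.content_bytes += len(data.encode("utf-8"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return elements
```

For example, `collect('<root><item id="1">hi</item><item/></root>')["item"].count` is 2, and the root's `children` counter records both `item` instances.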
A better way would have been to keep a count of all bytes and a flag that indicates whether or not any non-whitespace has been seen.
An end handler could then summarize the count for that instance of the element. Now that we've seen how the element information is generated during the parse, let's go back to the rest of the main program to see how this information is processed. The only hard part is establishing the order in which we print the elements. For that we use a number computed for each element: the minimum depth, counted from the root element, at which that element occurred.
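Both refinements can be sketched with the same expat bindings, again as a Python analogue under stated assumptions. Each open element gets a per-instance record holding a byte total and a "seen non-whitespace" flag. The end handler adds the total only if the flag is set. Each element name also remembers the minimum depth (root = 0) at which it appeared. Sorting by minimum depth and breaking ties by first-seen order is one plausible reading of the ordering described here, not necessarily the original program's exact rule. The function name `element_order` is hypothetical.

```python
import xml.parsers.expat


def element_order(xml_text):
    """Return (names sorted by (min depth, first seen), content bytes per name)."""
    min_depth = {}      # element name -> smallest depth seen (root is 0)
    seen = {}           # element name -> order of first appearance
    content = {}        # element name -> total content bytes
    stack = []          # one [name, byte_count, saw_non_whitespace] per open element

    def start(name, attrs):
        depth = len(stack)
        if name not in seen:
            seen[name] = len(seen)
            content[name] = 0
        min_depth[name] = min(depth, min_depth.get(name, depth))
        stack.append([name, 0, False])

    def chars(data):
        if not stack:
            return
        top = stack[-1]
        top[1] += len(data.encode("utf-8"))   # count every byte, whitespace included
        if data.strip():
            top[2] = True                      # remember that real text appeared

    def end(name):
        _, nbytes, saw_text = stack.pop()
        if saw_text:                           # summarize this instance only
            content[name] += nbytes

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)

    order = sorted(seen, key=lambda n: (min_depth[n], seen[n]))
    return order, content
```

Because `e` below is first met at depth 3 but later at depth 1, it sorts ahead of `c`, which never appears shallower than depth 2: `element_order("<a><b><c><e>x y</e></c></b><e>  </e></a>")[0]` gives `["a", "b", "e", "c"]`.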
A little rummaging around CPAN, and you'll uncover ready-made widgets for everything from text processing to socket programming. If you work with XML in Perl, keep reading, because this document lists ten CPAN modules that you're sure to find helpful in your daily development activities. This module provides an interface to the expat XML parser. It is one of the foundational Perl modules for XML parsing, and many of the other packages in this list depend on it.
It can be used both to parse existing trees and to create new trees dynamically from scratch. This module provides an XPath implementation, making it possible to address XML nodes or node collections through XPath expressions. Use this module when you need to create custom node collections or capture nodes matching complex criteria.
Recognised values for the option are:
Setting this option to 1 will cause all element and attribute names to be expanded to include their namespace URI, translating names of the form prefix:name into the form {uri}name.
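Namespace expansion rewrites `prefix:name` as `{uri}name`. Python's `xml.etree.ElementTree` uses the same `{uri}name` notation for parsed names, so it makes a convenient illustration. This is a Python analogue, not XML::Simple. The helper `to_prefixed` and its prefix mapping are hypothetical. They show why turning expanded names back into markup needs a known prefix for each URI.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
doc = f'<xsl:stylesheet xmlns:xsl="{XSL}"><xsl:template match="/"/></xsl:stylesheet>'

root = ET.fromstring(doc)
expanded = [el.tag for el in root.iter()]   # names arrive as '{uri}local'


def to_prefixed(name, prefixes):
    """Turn '{uri}local' back into 'prefix:local' (hypothetical helper).

    A '{uri}local' string is not a legal XML name, so writing it out
    unchanged would not produce well-formed XML; a prefix must be found.
    """
    if not name.startswith("{"):
        return name
    uri, local = name[1:].split("}", 1)
    return f"{prefixes[uri]}:{local}"


prefixed = [to_prefixed(n, {XSL: "xsl"}) for n in expanded]
```

Here `expanded[0]` is `'{http://www.w3.org/1999/XSL/Transform}stylesheet'` and `prefixed` is `['xsl:stylesheet', 'xsl:template']`.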
For output, the same option controls whether XMLout translates {uri}name back to prefix:name; the default is no translation. If your data contains expanded names, you should set this option to 1; otherwise XMLout will emit XML that is not well-formed.
Three levels are possible:
This option also accepts an IO handle object, which is especially useful in Perl 5. Note that XML::Simple does not require the object you pass to the OutputFile option to inherit from IO::Handle; it simply assumes the object supports a print method.
Note: this option is now officially deprecated. If you find it useful, email the author with an example of what you use it for. This option allows you to pass parameters to the constructor of the underlying XML::Parser object, which of course assumes you're not using SAX. This option allows you to specify an alternative name for the root element that XMLout generates.
Specifying either undef or the empty string for the RootName option will produce XML with no root element. In most cases the result is not well-formed XML and could not be read back in by XMLin; nevertheless, the option has been found to be useful in certain circumstances. If you pass XMLin a filename but the filename includes no directory component, you can use the SearchPath option to specify which directories should be searched to locate the file. If a filename is provided to XMLin but SearchPath is not defined, the file is assumed to be in the current directory.
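The SearchPath lookup rule can be sketched in a few lines of Python. This is a hypothetical helper, not XML::Simple's code. A name with a directory component is used as given. A bare name is tried in each search directory in turn. With no search path, the current directory is assumed. When the file cannot be found, the sketch returns `None`, which is an assumption made for simplicity.

```python
import os


def resolve_xml_file(filename, search_path=None):
    """Locate filename roughly as the SearchPath option is described.

    Hypothetical helper: returns the path found, or None if nothing matches.
    """
    # A directory component, or no search path at all: take the name as given
    # (a bare name is then resolved relative to the current directory).
    if os.path.dirname(filename) or not search_path:
        return filename if os.path.exists(filename) else None
    # Otherwise try each search directory in order; the first hit wins.
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None
```

Because the first matching directory wins, the order of the search path decides which copy is used when the same file name exists in several places.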
If the first parameter to XMLin is undefined, the default SearchPath will contain only the directory in which the script itself is located. Otherwise, the default SearchPath will be empty. Another option controls what XMLin should do with empty elements (no attributes and no content). The default behaviour is to represent them as empty hashes. Setting this option to a true value (e.g. 1) will cause empty elements to be skipped altogether.
Setting the option to 'undef' or the empty string will cause empty elements to be represented as the undefined value or the empty string respectively. The latter two alternatives are a little easier to test for in your code than a hash with no keys. The option also controls what XMLout does with undefined values.
Setting the option to undef causes undefined values to be output as empty elements rather than empty attributes; it also suppresses the generation of warnings about undefined values.
Setting the option to a true value (e.g. 1) causes undefined values to be skipped altogether on output. The preferred (hash) form of the ValueAttr option requires you to specify both the element and the attribute names.
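The input-side choices for empty elements described above (an empty hash by default, skip on a true value, or an undefined value or empty string) can be illustrated with a small Python analogue. It converts the leaf children of the root into a dict. This is a simplified sketch, not XML::Simple's algorithm. The function name `leaf_children`, the parameter `suppress_empty`, and the use of `True`, `None`, and `""` to stand for 1, undef, and the empty string are assumptions. It also treats whitespace-only text as "no content".

```python
import xml.etree.ElementTree as ET

_UNSET = object()   # marker meaning "option not given"


def leaf_children(xml_text, suppress_empty=_UNSET):
    """Map each leaf child of the root to a value (simplified analogue).

    suppress_empty (hypothetical parameter):
      not given -> empty elements become {}
      True      -> empty elements are skipped altogether
      None      -> empty elements become None (Perl's undef)
      ""        -> empty elements become ""
    """
    result = {}
    for child in ET.fromstring(xml_text):
        is_empty = (not child.attrib and len(child) == 0
                    and not (child.text or "").strip())
        if not is_empty:
            result[child.tag] = child.text      # keep text; attributes ignored here
        elif suppress_empty is None or suppress_empty == "":
            result[child.tag] = suppress_empty  # undefined value or empty string
        elif suppress_empty is not _UNSET and suppress_empty:
            continue                            # true value: leave it out
        else:
            result[child.tag] = {}              # default: an empty hash
    return result
```

With `doc = "<opt><name>app</name><debug/></opt>"`, the default call yields `{"name": "app", "debug": {}}`, while passing `True` drops `debug` entirely.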
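This option allows variables in the XML to be expanded when the file is read. A variable is any text of the form ${name}, and its value is looked up by name in the hash you supply. If no matching key is found, the variable will not be replaced. In addition to the variables defined using Variables, a related option allows variables to be defined in the XML itself; you give it the name of an attribute.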
The value of that attribute will be used as the variable name, and the text content of the element will be used as the value. A variable defined in this way will override a variable defined using the Variables option.
The default XML declaration is:
If you want some other string (for example, to declare an encoding value), set the value of this option to the complete string you require.
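The variable expansion described above can be sketched in Python. The assumptions are as follows. Variables are written `${name}`, and the name pattern `[\w.]+` is assumed. A plain dict stands in for the Variables hash. Unknown names are left untouched. Elements are processed in document order, so a variable defined through the variable-defining attribute is available to the elements that follow it and overrides any value passed in. The function `expand_vars` and its `var_attr` parameter are hypothetical, and only the root's direct children are handled.

```python
import re
import xml.etree.ElementTree as ET

VAR_RE = re.compile(r"\$\{([\w.]+)\}")   # assumed pattern for ${name}


def expand_vars(xml_text, variables=None, var_attr=None):
    """Return {tag: expanded text} for the root's children (simplified sketch)."""
    env = dict(variables or {})

    def substitute(text):
        # Replace ${name} when the name is known; otherwise keep it verbatim.
        return VAR_RE.sub(lambda m: env.get(m.group(1), m.group(0)), text)

    result = {}
    for child in ET.fromstring(xml_text):       # document order
        value = substitute(child.text or "")
        if var_attr and var_attr in child.attrib:
            # The attribute value names the variable; the element text is its value.
            env[child.attrib[var_attr]] = value  # overrides an entry from `variables`
        result[child.tag] = value
    return result
```

For example, with a `<dir name="prefix">/usr/local/apache</dir>` element and `var_attr="name"`, a later `<bin>${prefix}/bin</bin>` expands to `/usr/local/apache/bin` even if `prefix` was also passed in `variables`.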
The procedural interface is both simple and convenient; however, there are a couple of reasons why you might prefer to use the object-oriented (OO) interface: The default values for the options described above are unlikely to suit everyone. It works like this: You can also specify options when you make the method calls, and these values will be merged with the values specified when the object was created.
Values specified in a method call take precedence. The method names are aliased, so the only difference is the aesthetics. You can make your own class that inherits from XML::Simple and overrides certain behaviours.
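The merge rule for the OO interface works like a dictionary merge. Options given when the object is created act as defaults, and options given to a method call take precedence. The Python class below is a hypothetical stand-in, not XML::Simple's API. The option names in the usage line are only placeholder keys.

```python
class OptionHolder:
    """Hypothetical stand-in for an object created with default options."""

    def __init__(self, **defaults):
        self.defaults = dict(defaults)          # options given at construction

    def effective_options(self, **call_options):
        merged = dict(self.defaults)            # start from construction-time options
        merged.update(call_options)             # method-call options win on conflicts
        return merged
```

Calling `OptionHolder(ForceArray=1, KeepRoot=0).effective_options(KeepRoot=1)` gives `{"ForceArray": 1, "KeepRoot": 1}`, and the object's stored defaults are left unchanged for the next call.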
The following methods may provide useful 'hooks' upon which to hang your modified behaviour. You may find other undocumented methods by examining the source, but those may be subject to change in future releases.
This method will be called when one of the parsing methods or the XMLout method is called. The initial argument will be a string, either 'in' or 'out', and the remaining arguments will be name-value pairs. Another hook is called from XMLin or any of the parsing methods. It takes either a file name as the first argument, or undef followed by a 'string' as the second argument, and returns a simple tree data structure.
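Overriding one of these hooks in a subclass follows a familiar pattern. The base class calls an overridable method, and the subclass transforms the result before it reaches the caller. The Python sketch below uses a hypothetical base class `SimpleReader` with a hypothetical hook `build_tree`. Neither name comes from XML::Simple. The subclass capitalizes every value, in the spirit of the customer-name example.

```python
import xml.etree.ElementTree as ET


class SimpleReader:
    """Hypothetical base class: parse() delegates to an overridable hook."""

    def parse(self, xml_text):
        return self.build_tree(ET.fromstring(xml_text))

    def build_tree(self, root):
        # Default behaviour: map each child tag to its text.
        return {child.tag: child.text for child in root}


class UpperCaseReader(SimpleReader):
    """Subclass that post-processes the structure before returning it."""

    def build_tree(self, root):
        tree = super().build_tree(root)          # reuse the default behaviour
        return {tag: (text or "").upper() for tag, text in tree.items()}
```

The subclass changes one behaviour and inherits everything else, which is the point of exposing small hooks rather than one large method.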
You could override this method to apply your own transformations before the data structure is returned to the caller. When the 'simple tree' data structure is being built, another method will be called to create any required anonymous hashrefs. Further hooks are called from XMLout to handle attribute values, and from XMLout when 'unfolding' a hash of hashes into an array of hashes.
Yes, and namespaces are the reason. Keep it in mind. That makes them weird, and something you might not pay enough attention to.
Remember that documents often use DTDs and have declarations for such things as entities and attributes. If you forget, you could end up breaking something. Maybe the content is in another file, or maybe it contains characters that are difficult to type.
The concept is simple, but the execution can be a royal pain. Entities can contain other entities to an arbitrary depth. This fact can lead to some surprising results.
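Nested entity expansion is easy to observe with any conforming parser. The Python sketch below uses `xml.etree.ElementTree`, which is backed by expat, and an internal DTD subset in which one entity refers to another. The entity names and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY org "Example Corp">
  <!ENTITY signature "Regards, the &org; team">
]>
<note>&signature;</note>"""

root = ET.fromstring(DOC)
# The parser expands &signature; and then the &org; reference inside it.
text = root.text
```

Here `text` ends up as `"Regards, the Example Corp team"`. Nothing in the element's markup hints that two levels of substitution took place, which is exactly how surprising results creep in.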
By default, an XML processor will preserve all whitespace, even the newlines you put after tags to make them more readable or the spaces you use to indent text. Some parsers will give you options to ignore whitespace in certain circumstances, but there are no hard and fast rules.
In the end, Perl and XML are well suited to each other.
The order of elements changed, which is significant in XML. We can say for sure that the documents are close enough to satisfy all the requirements of the software for which they were intended and of the end user.
Ray, Jason McIntosh. Chapter 1. Perl and XML.
XMLin: This subroutine reads an XML document from a file or string and builds a data structure to contain the data and element structure.
XMLout: Given a reference to a hash containing an encoded document, this subroutine generates XML markup and returns it as a string of text.
Example: SpamChucker datafile.
A script to capitalize customer names. We'll also turn on the 'forcearray' option, so that all elements contain arrayrefs. We will get into some detailed examples of larger programs later in this book.