I think this is new, so I’ll describe a new technique used in GXml to parse a large set of nodes in an XML document.
Parsing Large XML documents
If you have a large XML document whose root has many child nodes, the standard technique is to read all of them, including each child’s own children, to build the XML tree. This process can take a while.
New on-the-fly parser
GXml now has a new custom parser called StreamReader. It reads the root element and its children, but without any attributes and without any of the children’s children; instead, the attributes and the children’s children are stored unparsed in a string on the fly, for the root and for each child. This lets the document be loaded almost as fast as it is read from the IO stream, making the loading of large XML documents up to 400% faster than the previous technique already present in GXml.
With this on-the-fly post-parsing technique, you can’t access the children’s children or the root’s attributes immediately after the first read. You have to parse them from a temporary location in the GXml.Element class using the new GXml.Element.parse_buffer() method, which uses the standard method already present in GXml to parse the root’s properties and the children’s children. When GXml.Element.parse_buffer() is called on the root, all children’s children are parsed recursively, but you can also choose to parse just one of the root’s children, which is really convenient when you need only one child node of the root in a large XML document.
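To make the idea concrete, here is a minimal sketch in Python. It is not GXml’s code and not its API. The names LazyChild and read_shallow are hypothetical, and the parse_buffer() method here only borrows the name from GXml. The scanner assumes simple, well-formed XML with no comments, no CDATA sections and no ">" inside attribute values, and it ignores the root’s attributes to stay short. The first pass only finds the boundaries of each root child and keeps its raw text; the real parsing happens later, and only for the children you ask for.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening, closing and self-closing tags (the XML declaration is not matched).
# Assumption: no comments, no CDATA and no '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.\-]*)([^<>]*?)(/?)>")


class LazyChild:
    """A root child whose content is kept as unparsed text until requested."""

    def __init__(self, name, raw):
        self.name = name
        self.raw = raw      # raw XML text of this child, not parsed yet
        self.tree = None    # filled in by parse_buffer()

    def parse_buffer(self):
        # Deferred step: parse the stored text with a regular XML parser.
        if self.tree is None:
            self.tree = ET.fromstring(self.raw)
        return self.tree


def read_shallow(text):
    """Return (root_name, children) without parsing the children's content."""
    depth = 0
    root_name = None
    children = []
    start = 0
    for m in TAG.finditer(text):
        closing, name, _attrs, self_closing = m.groups()
        if closing:
            depth -= 1
            if depth == 1:  # a direct child of the root just ended
                children.append(LazyChild(name, text[start:m.end()]))
        elif self_closing:
            if depth == 0:
                root_name = name
            elif depth == 1:
                children.append(LazyChild(name, m.group(0)))
        else:
            if depth == 0:
                root_name = name
            elif depth == 1:
                start = m.start()  # remember where this child begins
            depth += 1
    return root_name, children


doc = "<?xml version='1.0'?><root id='r'><a><b>1</b></a><c/><d x='2'><e/><e/></d></root>"
root_name, children = read_shallow(doc)
names = [c.name for c in children]

# Parse only the one child we are interested in.
d_tree = children[2].parse_buffer()
```

After read_shallow() only the names of the root’s children are known; calling parse_buffer() on a single child parses just that child’s stored text and leaves the others untouched.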
GXml.Element.parse_buffer_async(), when called on the root element, uses GLib.ThreadPool to parse each child in a separate thread, using as many threads as are available on your system (less one). The expected behavior is a parsing boost over the standard technique used in GXml: Xml.TextReader, from the veteran libxml2 library, running on just one thread. Currently, however, parsing takes about the same time as the standard technique when GXml.Element.parse_buffer_async() is called on the document’s root. This may be a limitation in libxml2, because many Xml.TextReader instances are running at the same time parsing the elements’ children, or a limitation in GLib.ThreadPool. Maybe the solution is a step away.
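The dispatch pattern behind the threaded version can be sketched like this, again in Python and not with GXml’s real API. The function name parse_buffers_parallel is hypothetical, and the raw child strings stand in for the buffers stored during the first read. The pool size follows the "available threads less one" rule described above. It only illustrates how the work is handed out; it says nothing about the speed you would get from libxml2 or GLib.ThreadPool.

```python
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parse_buffers_parallel(raw_children):
    """Parse each stored child buffer in a worker thread, keeping input order."""
    # As many workers as CPUs reported by the system, less one (at least one).
    workers = max(1, (os.cpu_count() or 2) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input buffers.
        return list(pool.map(ET.fromstring, raw_children))


# Stand-ins for the unparsed text kept for each root child.
raw_children = ["<a><b>1</b></a>", "<c/>", "<d x='2'><e/><e/></d>"]
trees = parse_buffers_parallel(raw_children)
tags = [t.tag for t in trees]
```

Each buffer is independent of the others, which is what makes this step easy to hand to a pool; whether it actually runs faster depends on the parser and the runtime underneath.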