GXml and XSD

On the road to releasing GXml 0.14, I started porting some of my projects to the new GXml.Gom* objects, to take advantage of their speed and reduced memory footprint.

In the process, my library needed a large set of strings to choose from for an element’s attribute, for example a PostalCode number. They are defined as XSD enumerations in a SimpleType.

At the beginning, I started to define an array to add to GXml.GomArrayString [1], but found the values were too many to maintain by hand, and doing so was error prone.

Then I decided to take a look at the W3C Schema specification and started to define new GXml.Xsd* interfaces and new GXml.GomXsd* classes to implement them. The result is GXml.GomXsdArrayString [1], a new class that takes an XSD file, searches for a SimpleType definition, parses all its enumerations, and adds them to an array of strings you can choose from or validate your property value against.
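To make the idea concrete, here is a minimal sketch of collecting enumeration values and validating against them. It is not how GXml does it: a regular expression stands in for a real XSD parser, it ignores which SimpleType the enumerations belong to, and the helper names enumeration_values() and is_valid(), as well as the sample postal codes, are made up for illustration.

```vala
// Sketch only: a regular expression stands in for a real XSD parser.
// enumeration_values() and is_valid() are hypothetical helper names.

string[] enumeration_values (string xsd) throws RegexError {
    // Matches <xs:enumeration value="..."> with any (or no) namespace prefix.
    var re = new Regex ("<(?:\\w+:)?enumeration\\s+value=\"([^\"]*)\"");
    string[] values = {};
    MatchInfo info;
    re.match (xsd, 0, out info);
    while (info.matches ()) {
        values += info.fetch (1);
        info.next ();
    }
    return values;
}

bool is_valid (string[] allowed, string candidate) {
    foreach (unowned string v in allowed) {
        if (v == candidate) return true;
    }
    return false;
}

int main () {
    // Made-up schema fragment, only for this example.
    string xsd = """<xs:simpleType name="PostalCode">
  <xs:restriction base="xs:string">
    <xs:enumeration value="01000"/>
    <xs:enumeration value="01010"/>
  </xs:restriction>
</xs:simpleType>""";
    try {
        string[] allowed = enumeration_values (xsd);
        print ("found %d values\n", allowed.length);
        print ("01010 valid: %s\n", is_valid (allowed, "01010").to_string ());
        print ("99999 valid: %s\n", is_valid (allowed, "99999").to_string ());
    } catch (RegexError e) {
        printerr ("regex error: %s\n", e.message);
        return 1;
    }
    return 0;
}
```

The point is only the shape of the result: one array of allowed strings, generated from the schema instead of maintained by hand.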

These are the first steps on the way to XSD support in GXml. While the GXml.Xsd* interfaces are unstable, and will remain so after the 0.14 release, they open new opportunities to anyone consuming XSD definitions.

Maybe in the future someone can use this API to generate Vala (or C) GObject classes from an XSD.

Maybe others will want to help by adding more object definitions from the XSD specification, in order to get patterns and other restrictions from schema definitions and improve data handling and validation.

[1] GXml.GomCollections definitions

[2] GXml.Schema definitions and GXml.GomXsd implementations

 

GomDocument: Providing best of two worlds

In the upcoming GXml 0.14 there will be a new DOM4 implementation called GomDocument. Along with GomElement and GomObject, it provides support to serialize GObject objects to XML files and deserialize them back.

I’ve made initial measurements of GomDocument’s performance against the other implementations in GXml, GDocument and TDocument. The first is a libxml2-based implementation with GObject bindings and DOM4 support, while the latter is a pure GObject implementation with no DOM4 support (yet?).

In the performance test, we use a 3.5 MB file with a lot of nodes: we read it, create an in-memory XML tree, and then fill out the GObject class properties. We measure the time required to deserialize and then the time to write it back to an XML file.

We can see GDocument taking a lot of time to deserialize, because it uses libxml2 to create a tree and then GXml.SerializableObjectModel to fill out your GObject class. Serialization is very competitive, because all implementations use almost the same engine: direct access to libxml2 through xmlWriter.

GomDocument is very competitive compared to TDocument, and can perform much better than GDocument.

Please note that GomDocument’s time to write to disk is not available, because serialization and deserialization are done in one go. Maybe we can reduce this time (it should be the same as for the others, because the file is loaded in memory before it is read), which would make GomDocument perform even better!

On memory usage, TDocument requires much more memory and GDocument is the one to beat. GomDocument now requires almost the same memory as GDocument.

Conclusion

GomDocument is the way to go for the serialization framework in GXml. These results will help LibreSCL provide a very competitive serialization/deserialization framework, and make it possible to consider creating a web-based application, using GObject Introspection and Python, that accesses very large files without requiring a lot of memory and, maybe, with a good response time on reading.

I’m starting to stabilize GXml in order to release 0.14 as soon as I have found most of the bugs in parsing XML files. GXml’s GomDocument has a set of error checks, based on DOM4, not present in other GXml implementations. This makes it more sensitive to problems in some files that may not affect, or be detected by, libxml2, for example.

GXml 0.13.1 Released

Now you can convert your GObject classes into XML nodes. That is, you can read and write XML trees directly to and from your classes’ properties: from basic types to complex ones like object properties, representing XML element attributes, to other child elements, and you can also use collections of child nodes.

This has been easier to implement than GXml.SerializableObjectModel, which requires you to read an XML tree and then translate it to your object properties. That should be slower than the new GOM implementation included in this release.

The next step should be to test it in a real project like GSVG and run some performance tests.

These are old results of GDocument vs TwDocument, used by SerializableObjectModel, to serialize and deserialize large files (3.1 MB). In future articles, I hope to attach GomDocument performance and resource comparisons with all existing implementations in GXml.

[Figure: gvstw-memory]

[Figure: gvstw-time]

GXml: Objects and Collections to XML and back

Today I’ve finished pushing the last implementation for GXml GOM, which allows writing the following to XML:

  • Object Properties, as child nodes of current class
  • Object Properties, as element’s attributes
  • Object Collection Properties, referencing XML child nodes

Object Properties

In GXml GOM, any GomElement is an XML Element node. If it has GomElement objects as properties, they will be added as child nodes.

Object Properties as Attributes

If your GObject class implements the GomProperty interface and is a property in your object, it will be translated to an Element attribute with a name and a text value.

For simple types, this means you can control whether an attribute is written or not, depending on whether it is null. Standard properties, that is, those that are not GObject classes implementing GomProperty, will always be written with their default value. For example, a boolean will always be written as false by default.

Using GomProperty, you can define default actions when a property is omitted in the XML file.

Complex Object Properties

Sometimes the string representation of an attribute includes more information than just a value, like units. In GSVG, you have an attribute like length="3.8 cm". You can implement an object with properties for each value component, like:

public class Length : Object {
  public double value { get; set; }
  public UnitsEnum units { get; set; }
}

With GXml GOM, it is now possible to implement the W3C SVG 1.1 specification interfaces, where most objects will be complex properties translated to Element attributes. Once you implement a way to parse a string representation into your object’s properties and back, you can have GomProperty objects in your GomElement de/serialized to attributes.
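As a rough illustration of that parse-and-back step, here is a sketch that uses only GLib and does not implement GomProperty. The Units enum, its suffix helpers, and the parse() and to_string() methods are assumptions made for this example, not GXml or GSVG API, and the parser only handles plain decimal numbers followed by an optional unit.

```vala
// Hypothetical sketch: Units, parse() and to_string() are illustrative
// names, not part of GXml or GSVG.
public enum Units {
    NONE, CM, MM, PX;

    public string to_suffix () {
        switch (this) {
            case Units.CM: return "cm";
            case Units.MM: return "mm";
            case Units.PX: return "px";
            default: return "";
        }
    }

    public static Units from_suffix (string s) {
        switch (s) {
            case "cm": return Units.CM;
            case "mm": return Units.MM;
            case "px": return Units.PX;
            default: return Units.NONE;
        }
    }
}

public class Length : Object {
    public double value { get; set; }
    public Units units { get; set; }

    // Split "3.8 cm" into a number and a unit suffix.
    public void parse (string str) {
        string s = str.strip ();
        int i = 0;
        while (i < s.length && !s[i].isalpha ()) i++;
        this.value = double.parse (s.substring (0, i).strip ());
        this.units = Units.from_suffix (s.substring (i).strip ());
    }

    // Build the attribute text back from the two properties.
    public string to_string () {
        return ("%g %s".printf (this.value, this.units.to_suffix ())).strip ();
    }
}

int main () {
    var l = new Length ();
    l.parse ("3.8 cm");
    l.units = Units.MM;
    print ("%s\n", l.to_string ());
    return 0;
}
```

An object like this keeps the number and the unit as separate, typed properties, while the XML attribute still sees a single string.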

Collections

GomElement objects are containers, by definition, in DOM4. They can have child nodes with different local names and namespaces.

Once you have a set of different nodes, you may want to classify them by their node names or by, for example, their id attribute.

GXml GOM has added a set of basic collection classes implementing the GomCollection interface. ArrayList and HashMap are classes you can use to access child nodes. All references are to child node indexes; there are no copied or ref-counted objects.
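The index-based idea can be sketched with plain GLib types. This is not the GomCollection API: the Child class below is a made-up stand-in for a child element, and the array plays the role of the element’s child node list.

```vala
// Hypothetical stand-in for a child element: just a name and an id attribute.
class Child {
    public string name;
    public string id;
    public Child (string name, string id) {
        this.name = name;
        this.id = id;
    }
}

int main () {
    // Plays the role of the parent element's child node list.
    Child[] children = {
        new Child ("rect", "r1"),
        new Child ("circle", "c1"),
        new Child ("rect", "r2")
    };

    // "Array list" style: positions of all children named "rect".
    int[] rects = {};
    // "Hash map" style: id attribute -> position in the child list.
    var by_id = new HashTable<string, int> (str_hash, str_equal);

    for (int i = 0; i < children.length; i++) {
        if (children[i].name == "rect") rects += i;
        by_id.insert (children[i].id, i);
    }

    print ("rect children: %d\n", rects.length);
    if (by_id.contains ("c1")) {
        int idx = by_id.lookup ("c1");
        print ("id c1 is child #%d (%s)\n", idx, children[idx].name);
    }
    return 0;
}
```

Both collections store only integers, so nothing is copied and no extra references to the nodes are held.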

Examples

If you want to see how to implement different kinds of classes and properties, you can check out the unit tests in the GXml repository.

GXml 0.14 and Serialization

Now with everything in place for DOM4, GXmlGom is getting support for deriving classes from GXml.GomElement, making it easy to serialize your GObject classes to XML.

Now you just need to prefix your GObject property nicks with "::" and they will be used as XML Element attributes. Gom now supports string, integer, unsigned integer, double and enumeration property types.

If you have this class:

public class Taxes : GomElement {
  [Description (nick="::monthRate")]
  public double month_rate { get; set; }
  [Description (nick="::TaxFree")]
  public bool tax_free { get; set; }
  [Description (nick="::Month")]
  public Month month { get; set; }
  construct {
    _local_name = "Taxes";
  }
  public string to_string () {
    return (_document as GomDocument).to_string ();
  }
  public enum Month {
    JANUARY,
    FEBRUARY
  }
}

You can use:

var t = new Taxes ();
t.month_rate = 16.5;
t.month = Taxes.Month.FEBRUARY;
t.tax_free = true;
stdout.printf ("%s\n", t.to_string ());

and you’ll get:

<Taxes monthRate="16.5" Month="february" TaxFree="true"/>
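For the curious, the nick convention can be observed with plain GObject introspection. The sketch below is not GXml’s implementation and does not use GomElement: it only lists which properties of a class carry a nick starting with "::", using a simplified Taxes class and a hypothetical attribute_names() helper.

```vala
// Simplified class for illustration; it derives from Object, not GomElement.
public class Taxes : Object {
    [Description (nick = "::monthRate")]
    public double month_rate { get; set; }
    [Description (nick = "::TaxFree")]
    public bool tax_free { get; set; }
    // No "::" nick: would not be treated as an attribute.
    public string internal_note { get; set; }
}

// Hypothetical helper: collect attribute names from "::"-prefixed nicks.
string[] attribute_names (Object obj) {
    string[] names = {};
    foreach (unowned ParamSpec spec in obj.get_class ().list_properties ()) {
        unowned string nick = spec.get_nick ();
        if (nick.has_prefix ("::")) {
            names += nick.substring (2);
        }
    }
    return names;
}

int main () {
    foreach (string name in attribute_names (new Taxes ())) {
        print ("%s\n", name);
    }
    return 0;
}
```

The order in which GObject lists properties is not something to rely on, so the sketch makes no promise about the order of the printed names.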

Next steps will be to implement reading XML documents back to your GObject’s properties.

You will find more examples and progress in the GXml repository.

Rust and Vala

This post is based on my experience of not just using, but creating and maintaining, Vala libraries.

Rust is on the horizon, and there are voices calling to use it instead of Vala. Vala, we can say, is truly a niche-oriented language, because it just creates GObject-based applications and libraries. Rust is more general purpose, with safe concurrency and safe programming: true again.

I have downloaded Rust to start using it, because it seems very convenient as a replacement for C development. In particular, it may be suitable for replacing the old libxml2 library, and I’m hoping someone has started to write an XML parser and writer that I can reuse in my own projects.

After taking some time to check the Rust documentation and write a post about GObject and Rust, I would like to share my thoughts about Vala and Rust.

Vala has a very specific target: GObject. We can use Vala to create GObject classes and define API interfaces using GInterface, but with a sweet and very productive syntax. Doing object-oriented programming using Vala and GObject is easy and natural.
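As a small taste of that syntax, here is a sketch of an interface, a class implementing it, a property with a default value, and a property-change notification. The Shape and Circle names are invented for this example; everything else is standard Vala on top of GObject.

```vala
// A GInterface, declared in Vala. Shape and Circle are made-up names.
public interface Shape : Object {
    public abstract string label { get; set; }
    public abstract double area ();
}

// A GObject class implementing the interface.
public class Circle : Object, Shape {
    public string label { get; set; default = "circle"; }
    public double radius { get; set; default = 1.0; }

    public double area () {
        return Math.PI * radius * radius;
    }
}

int main () {
    var c = new Circle ();
    // GObject's "notify" signal, filtered by property name.
    c.notify["radius"].connect ((sender, pspec) => {
        print ("property %s changed\n", pspec.name);
    });
    c.radius = 2.0;

    // Use the object through its interface.
    Shape s = c;
    print ("%s has area %g\n", s.label, s.area ());
    return 0;
}
```

The same thing in C means writing the type registration, the interface vtable, and the property getters and setters by hand.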

I’ve managed to get a set of W3C API interfaces, based on the specifications for DOM and SVG, because Vala provides an interface definition syntax fairly close to the W3C one. Implementing them has been a matter of, you know, work. W3C interfaces are really complex in their relationships and dependencies, which would make them very difficult to implement in C with GObject/GInterface. The same is true for Rust, because it doesn’t have a GObject/GInterface mechanism, at least not yet.

In my opinion, Rust should develop its own, more powerful and secure, implementation of GObject/GInterface, based on its own internal types, like traits.

I don’t know if Rust is planning to provide object-oriented mechanisms like GObject/GInterface, with properties, signals, introspection and inheritance, but they are very convenient for other languages and for implementing object-oriented APIs like the ones from W3C.

While I write this article, I’m studying Rust to find equivalences and possibilities, thinking about how to port my work to Rust if possible. For now, at least, I can’t, but I have spent only a little time on it; I need to follow its development and road map.

Vala developments can be used by Rust applications and libraries, because the result is C and GObject, and because Vala produces GIR files that make it easy to create bindings for Rust, along with other languages.

If GObject is your choice for writing your next library or application, I can recommend Vala: you’ll get a very productive syntax and an easily bindable API.

No, I don’t want to limit myself to Vala. It is just very convenient for producing GObject-based libraries, it can use any C-based library too, and it is really productive. My work on GXml can’t just be dropped, and its ability to serialize GObject classes by introspecting their properties is very useful and productive. I’ll share more on this later.

Rust and GObject

First of all, I’m not a Rust programmer. This is just my point of view, from first impressions of Rust based on its documentation, and how I see it being used with GObject.

According to its documentation, Rust provides low-level and high-level APIs for common operations. It makes a set of assumptions that enable its great features, like automatic memory management and secure, concurrent data access. On the high-level side, Rust provides a rich set of common collections, iterators, tuples and others.

For GObject interoperability, I found a couple of projects that allow you to use GObject-based libraries in Rust; they depend on another project, or directly on the XML files generated by GObject Introspection, to introspect these C libraries.

I don’t see a GObject equivalent in Rust, not yet at least. On the GNOME developers site, I found an introduction to GObject:

  • A generic type system to register arbitrary single-inherited flat and deep derived types as well as interfaces for structured types. It takes care of creation, initialization and memory management of the assorted object and class structures, maintains parent/child relationships and deals with dynamic implementations of such types. That is, their type specific implementations are relocatable/unloadable during runtime.
  • A collection of fundamental type implementations, such as integers, doubles, enums and structured types, to name a few.
  • A sample fundamental type implementation to base object hierarchies upon – the GObject fundamental type.
  • A signal system that allows very flexible user customization of virtual/overridable object methods and can serve as a powerful notification mechanism.
  • An extensible parameter/value system, supporting all the provided fundamental types that can be used to generically handle object properties or otherwise parameterized types.

One of the most powerful features of GObject is C, but at the same time it is its weakest one. It is powerful because GObject, through GObject Introspection, makes it easy to create bindings to any language, including Rust. But it is hard to write C code for GObject classes and interfaces. GObject provides an object-oriented programming paradigm for C.

I don’t think anyone is thinking of rewriting GObject-based libraries in Rust just because they can. So let’s put this option aside for a moment.

While Rust has great features, I would like to find a way to write a Rust library and share it through a GObject Introspection GIR, making it available to other languages from day 0. Just remember that GObject Introspection is best suited to GObject-based libraries.

I don’t find in Rust a direct GInterface equivalent: no classes, no object properties or signals, no error reporting equivalent to GError. All of them come in through bindings to the GLib, GObject and GIO libraries, written in C, using GIR. I don’t think anyone will rewrite those libraries in Rust.

GObject bindings for Rust are not stable yet and may mature enough in a few years, depending on demand, resources and their use in newly written code.

My personal conclusion
  • GNOME will rely on GObject for the next 5 to 10 years.
  • GObject will be improved and maintained over the same period.
  • GObject Introspection will be a vehicle for easy access to C libraries from other languages, including Rust.
  • Rust will grow, and I hope it will add object-oriented mechanisms equivalent to GObject/GInterface, in order to provide an equivalent, more secure and concurrency-safe API for creating new libraries.
  • The C language will be here for the next 20 years, but may gradually be relegated to raw operations.

 

GObject and SVG

GSVG is a project to provide a GObject API for SVG, using Vala. It has almost all the interfaces from the W3C SVG 1.1 specification, plus some complementary ones.

GSVG is an LGPL library. It will use GXml as its XML engine. The SVG 1.1 DOM interfaces rely on W3C DOM, so using GXml is a natural choice.

SVG is XML, and its DOM interfaces require the use of object properties and the ability to add child DOM elements; so we need a new set of classes.

GXml has a serialization framework that can be used to map GObject properties to XML Element properties, and collections of child nodes to GObjects. I’ve created some other projects, like LibreSCL, using it.

The serialization framework requires creating an XML tree first, and then filling out the GObject properties. This could add some delay on large files.

Considering LibreSCL has to deal with files of about 10 MB to 60 MB, with thousands of XML nodes, this process, XML tree -> GObject properties, could take 10 to 20 seconds.

Some time ago, I imagined having a GObject class as an XML node. That is, an XML Element node represents a GObject; the XML Element’s properties are mapped directly to the GObject’s, and the XML Element’s child nodes are a collection inside the GObject’s properties.

Now, with SVG and GXml supporting DOM4, I have the opportunity to create a GObject class you can derive from to convert your classes into XML nodes, making serialization/deserialization faster and reducing the memory footprint.

Let’s see what is coming and how it evolves. As always, any help is welcome.

PS. As a side note, I’ve been able to copy/paste W3C’s interface definitions into Vala ones in a short time with few modifications, thanks to Vala’s syntax.

Vala and Reproducibility

Reproducibility, in Debian, is:

With free software, anyone can inspect the source code for malicious flaws. But Debian provide binary packages to its users. The idea of “deterministic” or “reproducible” builds is to empower anyone to verify that no flaws have been introduced during the build process by reproducing byte-for-byte identical binary packages from a given source.

Then, in order to provide reproducible binaries for Vala projects, we need to:

  1. Make sure you distribute generated C source code
  2. If you are a library, make sure to distribute VAPI and GIR files

This will let the build process avoid calling valac to generate C source code, VAPI and GIR files from your Vala sources.

Because the C source is distributed with a release’s tarball, any Vala project could be binary reproducible from sources.

In order to produce development packages, you should distribute VAPI and GIR files along with the .h ones. They should be included in your tarball, so that valac does not need to produce them.

GXml distributes all its C sources, but not GIR and VAPI files. This should be fixed in the next release.

GNOME Clocks distributes just Vala sources; this is why bug #772717 against Clocks has been filed.

libgee also distributes Vala sources, but no Debian bug exists against it. Maybe its Vala source annotations help, but it may be a good idea to distribute C, VAPI and GIR files in future versions.

My patches to GNOME Builder produce Makefiles that generate C sources from Vala ones. They need to be updated in order to distribute VAPI and GIR files with your Vala project.

Should we drop Vala?

Richard Hughes, commenting on my recent post, has pointed out his interest in rewriting GXml in C in order to avoid a Vala dependency. Quoting his words:

[…] Being honest; I’m not going to be depending on a vala library any time soon as I have to maintain all my stuff for the next decade and Vala isn’t an ideal long-term support option at all. […]

Is Vala development a waste of time? Is Vala suitable for long-term support libraries?

In GNOME, core technologies have been written in C, with high-level bindings using Python and, as an intermediate solution, Vala, both with a strong relation to GObject Introspection.

Vala has a sweet syntax that makes writing C APIs and C objects based on GLib really productive, while avoiding any overhead.

While GLib provides an object-oriented approach through GObject, it is still C. You have to write C statements all the time, even for the most common tasks. In exchange, you can write really resource- and speed-optimized code in C.

GObject Introspection makes bindings to different languages available on the release day of C libraries, and the same goes for Vala ones. Vala directly creates the required GIR files at compilation time.

Vala generates C code. It isn’t, of course, and may never be, perfect or reproducible. This is true and a hard issue for libraries providing interfaces because, I think, its algorithm uses hash tables to store some representation of the parsed code. That makes it impossible to generate the same code each time: the position of functions in the C output could change, producing ABI-incompatible sources. This never changes the C code’s behavior or security.
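If hash table iteration order really is the cause (which is only my guess, I have not checked the compiler’s sources), the usual remedy is to sort the keys before emitting anything. A minimal sketch with GLib’s HashTable follows; the symbol names and declarations stored in it are invented for illustration and have nothing to do with valac’s internals.

```vala
int main () {
    // Invented data: symbol name -> declaration text to emit.
    var symbols = new HashTable<string, string> (str_hash, str_equal);
    symbols.insert ("node_get_name", "gchar* (*get_name) (Node* self);");
    symbols.insert ("node_append", "void (*append) (Node* self, Node* child);");
    symbols.insert ("node_clear", "void (*clear) (Node* self);");

    // Iterating the table directly gives no ordering guarantee.
    // Sorting the keys first makes the emitted order stable between runs.
    var names = symbols.get_keys ();
    names.sort (strcmp);

    foreach (unowned string name in names) {
        print ("%s\n", symbols.lookup (name));
    }
    return 0;
}
```

With a stable order, the members of an interface structure land in the same position on every run, which is what matters for ABI.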

A set of fixes was introduced recently in Vala to avoid these interface issues; they are used by libgee, for example. This is the path for GXml to follow if it stays written in Vala.

Reproducible C code generation may require changing a lot of things internally in the Vala compiler.

GLib and GObject use macros to reduce hand-written C code. Vala coders use the Vala compiler because C development can be slow and error prone, and because in C you have to take care of memory management by hand too.

If Vala is not going to be maintained because a set of core developers prefer to use C, and GNOME Builder is not going to take the time to improve its Vala support because C is the way to go for long-term support projects, then maybe we should stop using Vala for libraries. Should we?

Improve Vala C code generation

While there are macros in C to reduce the burden of hand-written code, there is still room for macros written in Vala to do the same. This is a really futuristic, wish-list statement.

Vala’s syntax made it possible to write GXml with the features it has today. The serialization framework was possible once a GObject-based API (written in Vala) was available, and DOM4 was possible because we were able to copy and paste the specification’s declarations, making just a few changes to fit Vala syntax.

Vala provides a highly productive C code generator with a lot of syntactic sugar for the most common activities, like string manipulation, with automatic memory management, using secure GLib methods, and so on.

Is it possible to improve the Vala compiler to create more readable C code, suitable for switching from Vala to C?

Is it possible to embed Vala code in C or C in Vala code?

We need to explore how C code generated from Vala can be imported *as is* into GLib and improved over time, removing temporary variables and replacing some more code with C-ish style according to the target project’s code standard. For GXml this task may be a little hard, because it depends on libgee, another Vala library; but it is not impossible.

C limits expanded with Vala

C has a lot of applications, and Vala is pushing C’s limits with generics. libgee uses generics along with a set of interfaces and implementations, allowing us to use different kinds of collections for our objects. This powerful combination of GInterface and Vala generics is a feature that is hard to reproduce in a C API as easily and conveniently as in Vala syntax.

GObject and GInterface have a set of limits that Vala expands too. It is now possible to define a set of interfaces and implementations very quickly, figuring out their relationships really fast, thanks to Vala’s syntax.

These advantages, among others out of scope here (and out of my mind), make me ask whether we can embed Vala code in C or C in Vala code. Or even whether, once a feature is mature enough in Vala code, someone could (with optimized C code generation) import it into other core libraries like GLib. Again, this is one item on my wish list.

Vala as a Long Term supported language

Vala’s syntax provides a lot of advantages over hand-written GObject classes, in the spirit of productivity.

GXml provides C and Vala APIs. Vala is easier to use and allows you to implement object-oriented specifications (maybe based on the idea of Java classes), while still providing a C API. Vala code has a more object-oriented syntax than C/GObject.

Then how can we continue to contribute Vala libraries that can be used by anyone and are suitable for use by any other project, even ones written in C?

How can we make a Vala library like GXml a GNOME core component with long-term support?

There are plenty of bugs and room for improvement in the Vala compiler’s C code generation; we, its users, should care about them and maybe pay for improvements. I would really like someone to add Jürg Billeter, Rico Tzschichholz or Matthias Berndt to the GNOME Foundation’s “Adopt a Hacker” list, in order to push at least one of these ideas to improve GNOME infrastructure.