GXml and on-the-fly-post-parsing technique

I think this is new, so I’ll describe a new technique used in GXml to parse a large set of nodes in an XML document.

Parsing Large XML documents

If you have a large XML document, with a root with a number of child nodes, the standard technique is read all of them, including the child’s children ones, to create the XML tree. This process can take a while.

New on-the-fly parser

GXml now has a new custom parser called StreamReader used to read the root element and its children, but without any attribute and without any child’s children; the attributes and the children’s children are stored in a string on-the-fly in order to read the document almost at the same time it is read from the IO stream, for the root and for each children, improving the loading time of large XML documents up to 400% times faster than the previous technique already present in GXml.

On-the-fly-post-parsing

By using this On-the-fly-post-parsing technique, You can’t access the child’s children or the root’s attributes immediately after first read, you have to parse it from a temporally location in the GXml.Element class, using the new GXml.Element.parse_buffer() method, this one use the standard method, already present in GXml, to parse the root’s properties and the children’s children. When GXml.Element.parse_buffer() is called over the root, all children’s children are parsed recursively, but you can choose to parse just one of the root’s child, making a really convenient technique when you need just one root’s child node in a large XML document.

Multi-threading parsing

Currently GXml.Element.parse_buffer_async(), when called on root’s element, uses GLib.ThreadPool to parse each child in a different thread each and uses as many as threads are usable (less one) in your system. The expected behavior is getting a parse boost over the standard technique using in GXml: Xml.TextReader from the veteran libxml2 library running over just one thread. Currently a standard time parsing is provided when GXml.Element.parse_buffer_async() is called on document’s root, this maybe is a limitation on libxml2, because we have lot of Xml.TextReader running at the same time parsing element’s children; or a limitation on GLib.ThreadPool. Maybe the solution is a step away.

How to map objects to databases

GDA stands for GNOME Data Access and is a library to wrap database connections and its data using GObject classes, you can execute queries and much more.

VDA stands for Vala Data Access, is a library providing a heavily object oriented API to access databases’s data.

The API developed for VDA is going to be:

  • Heavily Object Oriented
  • Asynchronous almost by default
  • Provides GObject to database mapping

Object Oriented

GDA uses lot of structures, they are hard to be introspectable, so hard to be used outside C and Vala.

Providers now are Connection objects, so you have to instantiate them and then call open and catch opened signal to know if you are up with the connection.

SQL statements now are Query objects, created from string and now from structured objects with simple API, they can use execute on itself to get a TableModel, AffectedRows or any other Result object. An object for almost all kind of SQL commands will be added, with a simple easy to use API to provides the data it needs to be executed over the Connection.

There are models for tables, rows and columns, some of them implementing GLib.ListModel, so can iterate over their members, like rows in a table or columns in a row.

Asynchronous

Database operations can take time to be executed on servers, locally or remote, so you now have all Query execution as async methods, so for opening connections.

API

As you can notice, some API are still in development for VDA, so you can use the one already working or access GDA’s Connection objects if you are using Providers from it.

Eventually all API will be implemented by native Connection objects, without bind GDA. Searching to provide an easy and fast way to access introspection data from databases.

Easy API for New Database Servers

Currently GDA implementation for Providers is hard to implement for new database servers. VDA provides a new easy to implement interfaces, for new database servers, so is possible to extend VDA by creating new libraries, without depend on plugins.

Map objects to databases

Recently, VDA gains Vda.DataObject interface, it provides mapping your object to database’s table’s row, where the row’s columns are mapped to object’s properties and back.

Vda.DataObject supports:

  • Gets data from the database, through a SELECT query execution
  • Sets data in a table’s row using an UPDATE command
  • Creates new rows in the tables using  an INSERT command
  • Removes a row in the table using a DELETE command

Best of all you just need:

  • Implement Vda.DataObject with just 3 properties
  • Mark your object’s properties you want to map to the database’s table’s row’s column, using its nick with a text like: @Property Name::id, this is: your field’s in the data base can have any supported name, including spaces, we use @ to mark a property as to be mapped and ::id to mark it as an ID property used to query from the database.

All queries to read/write data to the database will be calculated automatically for you. Your class should set a Vda.Connection and the table’s name, through Vda.DataObject.database_connection and Vda.DataObject.database_table properties, this last one just at construction time.

This an example on how your code could be seen. Pay attention at initialization() method, was added here to show you how the table is created in the database and how the data is mapped using compatible types, in this case string to variant. In the near feature, could be possible to add automatic table creation if it doesn’t exits yet.

public class Client : Object, Vda.DataObject {

    // Database mapping
    [Description (nick="@id::id")]
    public string id { get; set; }
    [Description (nick="@name")]
    public string name { get; set; }
    [Description (nick="@description")]
    public string description { get; set; }
    [Description (nick="@phone")]
    public string phone { get; set; }

    construct {
      database_table_name = "clients";
      id = GLib.Uuid.string_random ();
    }

    public async string initialization () throws GLib.Error {
      var qct = database_connection.parse_string ("CREATE TABLE IF NOT EXISTS clients (id varchar(50), name varchar(50), description varchar(50), phone varchar(50))");
      yield qct.execute (null);
      var qi = database_connection.parse_string ("INSERT INTO clients (id, name, description, phone) VALUES ('"+id+"','"+name+"','"+description+"','"+phone+"')");
      yield qi.execute (null);
      return id;
    }

    // DataObject
    public string database_table_name { get; construct set; }
    public Vda.Connection database_connection { get; set; }
    public Cancellable cancellable { get; set; }
  }


GDA 5.2.9 Released

GDA is the GNOME Data Access library, able you to use GObject like API to access database provides like PostgreSQL, SQLite and MySQL, among others like JDBC, running queries and get data in data models for your application use.

This new release is an effort to fix bugs in 5.2 stable branch, including:

  • Fix Sun JRE 1.8 detection
  • Fix JDK 11.0 detection
  • Drop unneeded JAVAH variable check
  • Fix build for System Installed SQLite libs
  • Non-Dates and Timestamps values returns ‘NULL’ string when converted
  • Fix –with-ui, now UI is buildable when enable

Expect more fixes as this version is available in LTS distribution versions and current master targeting 6.0 release is getting more and more fixed on memory leaks, than ever.

GDA Starts Beta Stage

GDA 6.0 Beta has been Released. Valgrind tests and memory leak fixes are in progress.

To reach Beta stage, some providers and tools, like GTK widgets, has been cataloged as experimental and disable by default.

Now we start a Beta stage, where translation continue and release notes is a work in progress.

Please test it. Thanks to all contributions, translators and reporters. Report any issue you find, thanks in advance.

GDA and disable old stuff

After hundred of commits, GDA is getting fixes for most of its bugs around SQLite, PostgreSQL and MySQL providers. Lot of modernization work has been landing in master, making GCC more and more happy avoiding warnings, some of them are present, yet.

To have a warning free release, GDA needs to modify some internals, hope they are not too invasive in order to avoid bugs.

Internals in GDA is a long history, started in January 2002, evolving and implementing stuff not available in GLib or GObject at that time. But the things have changed and now we have modern techniques to implement interfaces and objects, so a modernization was started since I took the maintainership.

GDA has lot of tools, like entire applications and command line (a la psql), allowing you  to access your data. The one I use the most is its GObject Introspection bindings through Python, to extract, manipulate and push to or from different databases providers; this was the main task on modernization: change the API to be more binding friendly, specially on GObject Introspection; I think it is in a better shape now.

Unfortunately, most tools are buggy, like GTK widgets and so its derived GUI tools, in a level that is difficult to fix with the limited resources we have; this is true for some providers, because they are outdated. This is the reason we have put them in the “experimental zone“, to reduce the burden.

The advantage is that putting things aside, will help to push 6.0 out to the streets in a more short time.

For 6.0, we need to think about if we will keep current API and a very difficult way to implement new in-tree providers; or we take a big step forward and create interfaces you allow out-of-tree plugins. But that will be a different post.

GDA and GObject Introspection: Remember 1

This is just to help remember:

If you have used G_DECLARE_* macros in a C class header in order to modernize your old C code to resent GLib class definition methods; you are using GObject Introspection to parse it and generate GIR files; and you get this error:

"Fatal: Gda: Namespace conflict for 'your_type_get_type'"

Then remove your “*_get_type()” method definition from your header file.

Above have been the result to modernize GDA to recent practices and could be helpful for any one else.

Vala state: October 2018

Maintainability

While I think maintainability could be improved, adding to history commits from contributions, apart from the ones coming from current Maintainer. Actually, there are some lot of commits not in history coming from authors outside current ones. Hope with new GitLab GNOME’s instance, this will reflect the correct situation.

Behind scenes, Vala has to improve its code base to adapt to new requirements like to develop a descent Vala Language Server and more IEDs supporting Vala. At least for me, even GEdit is productive enough to produce software in Vala, because the language itself; write a Class, an Interface and implement interfaces, is 10 times faster in Vala than in C.

Vala has received lot of improvements in last development cycles, like a new POSIX profile, ABI stability, C Warnings improvements and many other, to be reported in a different article.

Look at Vala’s repository history, you will see more “feature” commits than “bindings” ones, contrary to the situation reported by Emmanuel, while should be a good idea to produce a graphic on this, but resent improvements could tell by them self the situation has been improved in recent release cycles.

Lets look at repository’s chart. It reports 2000 commits in the last 3 months, 1.1 average per day, from 101 contributions as for October 19, 2018. Me at 10 commits from the last year, so I’m far to be a core contributor, but push ABI stability to be a reality. My main contributions are to communicate Vala advances and status.

Vala as a language

Vala has its roots as a GObject oriented language. Currently for C too, because it can produce C code without dependent on GLib/GObject.

Program a C code base using an Object Oriented Programming is easy on Vala, even if the C output are GObject classes or not. You can produce Compact Classes, as for  Vala’s Manual:

Compact classes, so called because they use less memory per instance, are the least featured of all class types. They are not registered with the GType system and do not support reference counting, virtual methods, or private fields. They do support unmanaged properties. Such classes are very fast to instantiate but not massively useful except when dealing with existing libraries. They are declared using the Compact attribute on the class

In C a compact class is a simple struct, its fields can have accessors, so your users just call a get/set method to access it and run additional code when is done. Following code:

[vala]
[Compact]
class Caller {
  public string _name;
  public string name {
    get {
      return _name;
    }
    set {
      _name = value;
    }
  }
  public void call (string number) {}
}
[/vala]

Is translated to:

#include <glib.h>
#include <glib-object.h>
#include <stdlib.h>
#include <string.h>

typedef struct _Caller Caller;
#define _g_free0(var) (var = (g_free (var), NULL))

struct _Caller {
	gchar* _name;
};



void caller_free (Caller * self);
static void caller_instance_init (Caller * self);
void caller_call (Caller* self,
                  const gchar* number);
Caller* caller_new (void);
const gchar* caller_get_name (Caller* self);
void caller_set_name (Caller* self,
                      const gchar* value);


void
caller_call (Caller* self,
             const gchar* number)
{
	g_return_if_fail (self != NULL);
	g_return_if_fail (number != NULL);
}


Caller*
caller_new (void)
{
	Caller* self;
	self = g_slice_new0 (Caller);
	caller_instance_init (self);
	return self;
}


const gchar*
caller_get_name (Caller* self)
{
	const gchar* result;
	const gchar* _tmp0_;
	g_return_val_if_fail (self != NULL, NULL);
	_tmp0_ = self->_name;
	result = _tmp0_;
	return result;
}


void
caller_set_name (Caller* self,
                 const gchar* value)
{
	gchar* _tmp0_;
	g_return_if_fail (self != NULL);
	_tmp0_ = g_strdup (value);
	_g_free0 (self->_name);
	self->_name = _tmp0_;
}


static void
caller_instance_init (Caller * self)
{
}


void
caller_free (Caller * self)
{
	_g_free0 (self->_name);
	g_slice_free (Caller, self);
}

Could you find some improvements in C code generation?

As you can see, Vala generates code to create your struct, access your fields while run costume code, to free your struct and initialize it. Your C users will have a clean and easy API to use, as you expect, your struct.

If you use Vala to access your C struct, also written in Vala, it will create and free correctly automatically.

So,

Less code, more features.

Vala Pushing up Complex projects

GObject/GInterface are a powerfull tool, but is really hard to use in C if you have to write and implement a complex hierarchy of classes and interfaces. W3C DOM4, is a clear example of it; sure is possible implement in C, but is hundred times easy to implement and maintain in Vala.

GXml and GSVG are an example on how complex a hierarchy can be, and how they have been implemented in a short time.

GXml’s GomElement hierarchy just shows all interfaces it should implement to be DOM4’s Element.

Have you tried to write lot of fragmented featured interfaces in C and implement all of them in a single GObject class?

W3C SVG 1.1 specification have lot of interfaces to write and implement to have a fully compliant object’s class.

Implement GSVG over Vala, makes SVG 1.1 possible in a short time; but don’t stop here, because maintain this complex hierarchy, like move an object from final to derivable one, or move methods from a final to parent class, including its interface implementation, is hundred times easy to do in Vala than in plain C/GObject.

Vala allows: Complex Interface hierarchy implementation easy

While GSVG implementation is not completed, Vala is the right tool to use on this kind of projects.

Vala libraries are Introspectable and usable in other languages

Yes, C GObject classes are introspectable, but lot of annotations are required to produce GIR/TYPELIB, usable in Python or JavaScript.

Vala code produce Introspection Annotations as you write. Produce GIR XML file format is a matter to use --gir switch. Errors and warnings in Vala are focused to make your classes introspectable consistent; for example, you can’t return NULL in Non NULL declared methods.

Vala Scripting?

Vala Language Server

I’m working with a library called GNOME Vala Language Server (GVls), as a proof of concept for a server that will serve autocompletion, syntax highlighting and that kind of stuff, but found something interesting by accident.

I’ve added an interface called Client, may is not it final name, but it allows to locale a symbol in a already parsed file, along with some goodness from other interfaces and implementations, I’ll talk about in another article.

The Accident

While trying to create a in-line parser to detect if the actual word is an object or any other kind of symbol to, in the near future, propose a list of properties and methods or a list of method’s parameters, by accident Client now can parse in-line Vala code, so you push text to this Client object and a Server (the object that parse and handle Symbol collection), parse  it and just add new Symbols to its collection, without any effort or parse errors for incomplete code!!!!

So now if you push the folowing text:

class Push {\n

Server now find a Push symbol representing a new Vala class. Inserting a medhod:

public void callme () {

You can find in the Server a new symbol for Push.callme method. Just no error, even if the code is not completed yet.

Vala Scripting?

Well not for now. But with this work the doors are opening. This include Genie, because it has more script suitable syntax than Vala. Any way, leave to the time and some others’ help of course.

GDA 6.0 progress

GDA project has released 5.2.5 and tagged 5.2.6, with some improvements, but the real work is on master.

Master is targeting 6.0, a new ABI/API release, providing better GObject Introspection support and code modernization.

A new Meson build system is on the way to replace Autotools. Meson helped to implement, fix and test all changes in less time. Like on multi-threading, where is more easy to produce multiple parallel tests, helping to expose issues to fix. Master have big improvement on that matter.

API is almost the same with minor changes, so port to new one should be easy. Major API breaks will happen on 7.0 development cycle.

GDA have really old code, modernization of public API has been the priority; but, some parts are very sensitive to the change, so they may never will change or will wait to 7.0 cycle.

GDA’s GTK+ widgets, has been ported to avoid Deprecated API by recent 3.x releases. No heavy changes here, apart of the code modernization. GDA’s Control Center and Browser, has been fixed, so they are back from the dead. There are other, more modern applications out there like Sequeler, so in the short term may this tools old will be removed, unless there are interest by users.

My experiment on mix Vala code with C source in GDA should wait, but I found it functional, just some improvements in Vala code generation adding C source documentation are required.

Beyond 6.0, there are interest in improve implementation of new database Providers. At first, I’ve tried to make more deep changes in API to make room for a new providers platform, but the way current one and connection objects are tied each other makes that hard, so I decided to wait until 7.0 cycle starts.

In parallel, I’m developing a new Vala library called libvda, to experiment with a new platform for providers. It is exposing a set of interfaces, making more easy to have new providers; load them thought libpeas is planned.

Currently libvda has objects to connect to PostgreSQL and SQLite, using a common GDA provider object; add more existing GDA’s providers is a matter of a really few lines of code, thanks to Vala.

Because GDA has a very slow mechanism to generate Meta information about data base’s objects, I plan to expose meta queries directly, so they can be used faster. libvda, expose interfaces to access meta information using a GObject API, but there is a new DDL module in GDA for meta information, but that is another history for a different article.

Design and implement interfaces an objects in Vala is about 8 to 10 times faster, considering API changes in the development process. This is why I can expend time on libvda, to check out how could be on GDA and why I’m considering to mix Vala source code with C one.

Vala+GDA: An experiment

I’m working on GNOME Data Access, now on its GTK+ library, specially on its Browser, a GTK+ application more as a demo than a real database access tool.

GDA’s Browser, spots notable features in GDA’s non-GUI library. For example, support to create a single connection binding for two or more connections to different databases and from different providers (SQLite, PostgreSQL, MySQL), so you can create a single query and get a table result combining data from different sources. More details in another post.

The Experiment

Browser is getting back to life, but in the process I found crashes here and there. Fix them requires to walk around the code, found lots of opportunities to port it to new available technics to define GObject/GInterface API. Do that work may it’s too much work for a single man.

So, What about to use Vala to produce C code faster in order to rewrite existing or new objects?

This experiment, will be for 7.0 development cycle. For next 6.0 release, I would like to fix bugs as far as possible.

This experiment may help to answer another question: Is possible to port Vala code to C in order to be accepted by upstream projects like GXml’s DOM4 API to GLib?