Gtef 2.0 – GTK+ Text Editor Framework

Gtef is now hosted on gnome.org, and the 2.0 version has been released alongside GNOME 3.24. So it’s a good time for a new blog post on this new library.

The main goal of Gtef is to ease the development of text editors and IDEs based on GTK+ and GtkSourceView, by providing a higher-level API.

Some background information is written on the wiki:

In this blog post I’ll explain in more detail some aspects of Gtef: why a new library was needed, why it is called a framework, and one feature that I worked on during this cycle (a new file loader). More is already in the pipeline and may be covered by future blog posts, so stay tuned (and see the roadmap) ;)

Iterative API design + stability guarantees

In Gtef, I want to be able to break the API at any time. API design is hard and needs an iterative process; sometimes we see possible improvements only several years later. But application developers want a stable API. So the solution is simple: bump the major version each time an API break is desirable, every 6 months if needed! Gtef 1.0 and Gtef 2.0 are parallel-installable, so an application depending on Gtef 1.0 still compiles fine.

Gtef is a small library, so it’s not a problem if there are e.g. 5 different gtef *.so loaded in memory at the same time. For a library like GTK+, releasing a new major version every 6 months would be more problematic for memory consumption and application startup time.

A concrete benefit of being able to break the API at any time: a contributor (David Rabel) wanted to implement code folding. In GtkSourceView there are several old branches for code folding, but nothing was merged because it was incomplete. In Gtef it is not a problem to merge the first iteration of a class. So even if the code folding API is not finished, there has been at least some progress: two classes have been merged in Gtef. The code will be maintained instead of bit-rotting in a branch. Unfortunately David Rabel doesn’t have the time anymore to continue contributing, but in the future if someone wants to implement code folding, the first steps are already done!

Framework

Gtef is an acronym for “GTK+ Text Editor Framework”, but the framework part is not yet finished. The idea is to provide the main application architecture for text editors and IDEs: a GtkApplication on top, containing GtkApplicationWindows, each containing a GtkNotebook, containing tabs (GtkGrids), with each tab containing a GtkSourceView widget. If you look at the current Gtef API, only one subclass is missing: the one for GtkNotebook. So the core of the framework is almost done, and I hope to finish it for GNOME 3.26. I’ll probably make the GtkNotebook part optional (if a text editor prefers only one GtkSourceView per window) or replaceable by something else (e.g. a GtkStack plus GtkStackSwitcher). Let’s see what I’ll come up with.

Of course once the core of the framework is finished, to be more useful it’ll need an implementation for common features: file loading and saving, search and replace, etc. With the framework in place, it’ll be possible to offer a much higher-level API for those features than what is currently available in GtkSourceView.

Also, it’s interesting to note that there is a (somewhat) clear boundary between GtkSourceView and Gtef: the top level object in GtkSourceView is the GtkSourceView widget, while the GtkSourceView widget is at the bottom of the containment hierarchy in Gtef. I said “somewhat” because there is also GtkSourceBuffer and GtefBuffer, and both libraries have other classes for peripheral, self-contained features.

New file loader based on uchardet

The file loading and saving API in GtkSourceView is quite low-level: it contains only the backend part. In case of error, the application needs to display the error (preferably in a GtkInfoBar) and, for some errors, provide actions like choosing another character encoding manually. One goal of Gtef will be to provide a simpler API that takes care of all kinds of errors, shows GtkInfoBars, etc.

But how the backend works has an impact on the GUI. The file loading and saving classes in GtkSourceView come from gedit, and I’m not entirely happy with the gedit UI for file loading and saving. There are several problems; one of them is that GtkFileChooserNative cannot be used with the current gedit UI, so it’s problematic to sandbox the application with Flatpak.

With gedit, when we open a file from a GtkFileChooserDialog, there is a combobox for the encoding: by default the encoding is auto-detected from a configurable list of encodings, and it is possible to manually choose an encoding from that same list. I want to get rid of that combobox, to always auto-detect the encoding (it’s simpler for the user), and to be able to use GtkFileChooserNative (because custom widgets like the combobox cannot be added to a GtkFileChooserNative).

The problem with the file loader implementation in GtkSourceView is that the encoding auto-detection is not that good, hence the need for the combobox in the GtkFileChooserDialog in gedit. But to detect the encoding, there is now a simple-to-use library called uchardet, maintained by Jehan Pagès and based on the Mozilla universal charset detection code. Since the encoding auto-detection is much better with uchardet, it will be possible to remove the combobox and use GtkFileChooserNative!

Jehan started to modify GtkSourceFileLoader (or, more precisely, the internal class GtkSourceBufferOutputStream) to use uchardet, but as a comment in GtkSourceBufferOutputStream explains, that code is a big headache… And the encoding detection is based only on the first 8KB of the file, which results in bugs if for example the first 8KB are only ASCII characters and a strange character appears later. Changing that implementation to take into account the whole content of the file was not easily possible, so instead, I decided to write a new implementation from scratch, in Gtef, called GtefFileLoader. It was done in Gtef and not in GtkSourceView, to not break the GtkSourceView API, and to have the time in Gtef to write the implementation and API incrementally (trying to keep the API as close as possible to the GtkSourceView API).
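To make the 8KB problem concrete, here is a small illustrative sketch in Python (not code from GtkSourceView or Gtef, which are written in C). The sample content and the helper name are made up for the example: the first 8KB are pure ASCII, and a single ISO-8859-1 byte appears right after them.

```python
PREFIX_SIZE = 8 * 1024  # the 8KB window described above


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 8KB of plain ASCII, then a single ISO-8859-1 "é" (byte 0xE9) at the very end.
content = b"a" * PREFIX_SIZE + "é".encode("iso-8859-1")

# Looking only at the prefix, the file seems to be harmless ASCII...
prefix_is_ascii = content[:PREFIX_SIZE].isascii()

# ...but the whole content is neither ASCII nor valid UTF-8.
whole_is_ascii = content.isascii()
whole_is_valid_utf8 = is_valid_utf8(content)
```

A detector that only sees the prefix has no way to notice the trailing byte, which is why looking at the whole content matters.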

The new GtefFileLoader takes a simpler approach, doing things sequentially instead of doing everything at the same time (the reason for the headache): 1) loading the content in memory, 2) determining the encoding, 3) converting the content to UTF-8 and inserting the result into the GtkTextBuffer.
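The three sequential steps can be sketched roughly as follows. This is an illustration in Python, not the GtefFileLoader implementation: all function names are hypothetical, a plain list stands in for the GtkTextBuffer, step 2 is only a placeholder for where uchardet would be called, and the 50MB default maximum is assumed to mean 50 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "50MB" is read here as 50 * 1024 * 1024 bytes.
DEFAULT_MAX_SIZE = 50 * 1024 * 1024


def load_content(path, max_size=DEFAULT_MAX_SIZE) -> bytes:
    """Step 1: read the whole file into memory, refusing oversized files."""
    file_path = Path(path)
    if file_path.stat().st_size > max_size:
        raise ValueError(f"{file_path} is larger than {max_size} bytes")
    return file_path.read_bytes()


def determine_encoding(content: bytes) -> str:
    """Step 2: placeholder detector; a real loader would ask uchardet here."""
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Hypothetical default: Python's ISO-8859-15 codec accepts every byte.
        return "iso-8859-15"


def convert_and_insert(content: bytes, encoding: str, text_buffer: list) -> None:
    """Step 3: decode the bytes and insert the text into the buffer.

    Decoding to a Python str plays the role of converting to UTF-8, and the
    list is a stand-in for the GtkTextBuffer.
    """
    text_buffer.append(content.decode(encoding))


def load_file(path, text_buffer: list, max_size=DEFAULT_MAX_SIZE) -> str:
    """Run the three steps one after the other and return the encoding used."""
    content = load_content(path, max_size)
    encoding = determine_encoding(content)
    convert_and_insert(content, encoding, text_buffer)
    return encoding
```

Because each step only starts once the previous one has finished, step 2 always sees the complete content, and each step can be tested on its own.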

Note that step 2, determining the encoding, would have been entirely possible without uchardet: count the number of invalid characters for each candidate encoding and take the first encoding for which there are no errors (or take the one with the fewest errors, escaping the invalid characters). And when uchardet is used, that method can serve as a nice fallback. Since all the content is in memory, it should be fast enough even if it is done on the whole content (GtkTextView doesn’t support very big files anyway; 50MB is the default maximum in GtefFileLoader).
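A minimal sketch of that counting approach, again in Python and only as an illustration: the candidate list and the function names are assumptions made for the example, not what Gtef uses.

```python
# Hypothetical candidate list, tried in this order.
CANDIDATES = ("ascii", "utf-8", "windows-1252", "iso-8859-15")


def count_invalid(content: bytes, encoding: str) -> int:
    """Count decoding errors by counting U+FFFD replacement characters.

    Caveat: a file that legitimately contains U+FFFD is over-counted.
    """
    return content.decode(encoding, errors="replace").count("\ufffd")


def pick_encoding(content: bytes, candidates=CANDIDATES):
    """Return (encoding, error_count).

    The first candidate with no errors wins; otherwise the one with the
    fewest errors is returned (the earliest one in case of a tie).
    """
    best = None
    for encoding in candidates:
        errors = count_invalid(content, encoding)
        if errors == 0:
            return encoding, 0
        if best is None or errors < best[1]:
            best = (encoding, errors)
    if best is None:
        raise ValueError("no candidate encodings given")
    return best


def decode_escaping_invalid(content: bytes, encoding: str) -> str:
    """Decode, escaping invalid bytes as \\xNN instead of failing."""
    return content.decode(encoding, errors="backslashreplace")
```

The order of the candidates matters: a single-byte encoding that accepts every byte never reports an error, so it should come last in the list.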

GtefFileLoader is usable and works well, but it is still missing quite a few features compared to GtkSourceFileLoader: escaping invalid characters, loading from a GInputStream (e.g. stdin) and gzip decompression support. And I would like to add more features: refuse to load very long lines (they are not well supported by GtkTextView) and possibly ask to split them, and detect binary files.

The higher-level API is not yet created; GtefFileLoader is still “just” the backend part.
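As an illustration of what the long-line feature could look like, here is a hypothetical sketch in Python. Nothing like this exists in GtefFileLoader yet; the function names and the idea of a caller-supplied `max_len` threshold are assumptions made for the example.

```python
def find_long_lines(text: str, max_len: int) -> list:
    """Return the 1-based numbers of the lines longer than max_len characters."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if len(line) > max_len
    ]


def split_long_lines(text: str, max_len: int) -> str:
    """Break every line longer than max_len into chunks of at most max_len."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    result = []
    for line in text.split("\n"):
        if len(line) <= max_len:
            result.append(line)
        else:
            result.extend(
                line[start:start + max_len]
                for start in range(0, len(line), max_len)
            )
    return "\n".join(result)
```

A loader could call `find_long_lines()` first, ask the user what to do if the list is not empty, and only then call `split_long_lines()`.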

This entry was posted in Gtef, GtkSourceView, Library development.

5 Responses to Gtef 2.0 – GTK+ Text Editor Framework

  1. anon says:

    Sometimes I have to edit files that have a non UTF-8 encoding, specifically of the ISO-8859 family. There are all kinds of ISO-8859 encodings for different languages, and there’s no way to reliably detect which language the file uses. If you remove the option to manually pick a file’s encoding in gedit I’ll lose the ability to edit these files, and that will be quite annoying!

    I’ve read uchardet’s source code for this, and it looks like it uses all kinds of heuristics to try and figure out the language of the file (so it can pick the right ISO-8859 encoding), but those are just heuristics, so they’re not 100% accurate. It’s basically just a guess. It might guess correctly most of the time, but not always, and in those cases gedit will simply not allow me to properly edit the file :(

    • swilmet says:

      If the encoding auto-detection fails, there are two possibilities:

      1. It failed to find any encoding with zero invalid characters, in which case there will be an info bar to choose another encoding manually.

      2. uchardet has chosen a bad encoding and some characters are not well displayed (but there are no invalid characters). There will be no info bar, so what will be needed is to add in the menu a way to reload the file with another encoding. To make it easier to choose the right encoding, there can even be a list of encodings on the left, with the content displayed on the right corresponding to the selected encoding.

      It has been on the gedit roadmap for a long time, “Changing encoding of opened files”:
      https://wiki.gnome.org/Apps/Gedit/RoadMap

      But note that gedit is not yet ported to Gtef, unfortunately. It’s complicated, because gedit provides an API for plugins, and if we port gedit to Gtef, the API will most probably break, which would require porting the official gedit plugins (those shipped in the gedit and gedit-plugins git repositories), and of course some third-party plugins would be broken as well. To make matters worse, libpeas (the library used for the plugin system) doesn’t support API versioning checks, so there are no checks to see if a certain plugin matches the current gedit API.

  2. liam says:

    “GtkTextView doesn’t support very big files anyway, 50MB is the default maximum in GtefFileLoader”

    Any chance of getting gtktextview fixed so that it can actually load arbitrarily large files (up to memory limits, of course)?
    As it now stands, the textview in gedit just hangs when trying to load files larger than about 500KB (it really varies, for some reason). Opening in vim, no problem. Even openoffice works more often.

    • swilmet says:

      If a 500KB file makes gedit hang, it’s probably because the file contains a very long line.

      I know that Christian Hergert (the developer of gnome-builder) wants to fix that situation, but it’s not that easy: GtkTextView is approximately 40k lines of code and is quite complex.

      • liam says:

        Thanks for the response and sorry for the late reply.

        The behavior you describe, and I experience, is not a great thing. I’m afraid I can’t say for certain that all instances of gedit becoming unresponsive occurred when trying to load a file with a “very long line”. I do know that this problem occurs with gedit more than with any other editor in my experience.
        If I may, it sounds like this library is failing to perform its primary job and the user just has to take greater care when deciding to use gedit than with other editors.
        I’m sure this won’t be terribly useful since the app was never ported to gtk3, but genie was the fastest graphical editor I’ve ever used on Linux. I recall it being a good deal more responsive and robust than gedit was, despite offering quite a large number of features (for instance, it had code folding and robust syntax highlighting/completion, iirc).
        Please understand, it’s not my intent to denigrate the project. Given what you are working on, I thought that relaying my experience might be helpful.
