I just merged a new regex implementation for GtkSourceView’s language specifications. Previously it used GRegex (based on PCRE) and now it uses PCRE2 directly, similar to what VTE did.
Not only does this get us on a more modern PCRE implementation, but it also allows us to use new features such as a JIT.
JITs are interesting in that you can trade a little bit of memory and time to generate executable code upfront for huge gains in execution time. Given that each regex in a language specification is compiled only once but executed many, many times, it’s a worthwhile feature for GtkSourceView.
Trying to highlight the minified HTML of google.com/ doesn’t even work with GRegex (it hits timeouts). But with PCRE2 and the JIT, it can get by.
In many cases, I found that compiling a regex with the JIT cost about 4x as much as compiling it with PCRE2 without the JIT. Execution time, in turn, drops by about 4x with the JIT (and is sometimes many times faster than that). When you run these regexes millions of times across an edited file, it can really cut down on the amount of energy consumed as well as time taken away from doing things like rendering GTK’s scene graph.
I should note that it’s about a 4x improvement on a per-regex basis, so when you run potentially thousands of those in one main loop cycle, the difference in how much you can get done can be much more drastic.
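As a rough back-of-the-envelope sketch of that trade-off, the snippet below estimates how many executions it takes for the JIT to pay for itself, and what the totals look like for the "C (with JIT)" averages from the tables at the end of this post. The function names are made up for illustration, and the non-JIT figures are not measurements: they are derived by assuming the "about 4x" ratios hold exactly, which they won’t in practice.

```python
def jit_break_even(compile_ms, exec_ms, compile_factor=4.0, exec_speedup=4.0):
    """Executions of one regex after which the JIT has paid for itself.

    compile_ms and exec_ms are the per-call costs WITHOUT the JIT.
    compile_factor and exec_speedup default to the rough 4x ratios
    quoted in the post; both are assumptions, not measured constants.
    """
    extra_compile = compile_ms * (compile_factor - 1.0)
    saved_per_exec = exec_ms * (1.0 - 1.0 / exec_speedup)
    return extra_compile / saved_per_exec


def total_ms(n_compiles, n_execs, compile_ms, exec_ms):
    """Total time spent compiling and executing regexes, in msec."""
    return n_compiles * compile_ms + n_execs * exec_ms


# Averages for "C (with JIT)" taken from the tables in this post.
jit_compile_ms, jit_exec_ms = 0.031, 0.001
n_compiles, n_execs = 91, 17698

# Hypothetical non-JIT costs, derived only from the assumed 4x ratios.
nojit_compile_ms = jit_compile_ms / 4.0
nojit_exec_ms = jit_exec_ms * 4.0

print(f"break-even executions per regex: "
      f"{jit_break_even(nojit_compile_ms, nojit_exec_ms):.2f}")
print(f"total with JIT:    "
      f"{total_ms(n_compiles, n_execs, jit_compile_ms, jit_exec_ms):.2f} msec")
print(f"total without JIT: "
      f"{total_ms(n_compiles, n_execs, nojit_compile_ms, nojit_exec_ms):.2f} msec (estimated)")
```

Under those assumptions a regex only needs to run a handful of times before the extra compile cost is recovered, and loading a file runs each regex far more often than that on average.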
If you have any issues with language specifications please let us know! It’s a very large change, so I wouldn’t be surprised if there is some fallout.

| Language | Min (msec) | Max (msec) | Average (msec) | # of Calls | Notes |
|---|---|---|---|---|---|
| C (with JIT) | .004 | .383 | .031 | 91 | |
| CSS (with JIT) | .004 | 3.147 | .101 | 198 | |

Regex Execution Loading File

| Language | Min (msec) | Max (msec) | Average (msec) | # of Calls |
|---|---|---|---|---|
| C (with JIT) | .000 | .061 | .001 | 17698 |
| CSS (with JIT) | .000 | .061 | .001 | 74812 |