Taint Tracking for Chromium

I forgot to blog about one of my projects. I had actually already talked about it more than one year ago and we had a paper at USENIX Security.

Essentially, we built a protection against DOM-based Cross-site Scripting (DOMXSS) into Chromium. We did that by detecting whenever potentially attacker provided strings become JavaScript code. To that end, we made the HTML rendering engine (WebKit/Blink) and the JavaScript engine taint aware. That is, we identified sources of values that an attacker could control (think window.name) and marked all strings coming from those sources as tainted. Then, during parsing of JavaScript, we check whether the string to be compiled is actually tainted. If that is indeed the case, then we abort the compilation.

That description is a bit simplified. For example, not compiling code because it contains some fragments of the URL would break a substantial number of Web sites. It’s an unfortunate fact that many Web sites either eval code containing parts of the URL or do a document.write with a string containing parts of the URL. The URL, in our attacker model, can be controlled by the attacker. So we must be more clever about aborting compilation. The idea was to only allow literals in JavaScript (like true, false, numbers, or strings) to be compiled, but not “code”. So if a tainted (sub)string compiles to a string: fine. If, however, we compile a tainted string to a function call or an operation, then we abort. Let me give an example of an allowed compilation and a disallowed one.


<HTML>

<TITLE>Welcome!</TITLE>
Hi

<SCRIPT>
var pos=document.URL.indexOf("name=")+5;
document.write(document.URL.substring(pos,document.URL.length));
</SCRIPT>

<BR>
Welcome to our system

…

</HTML>

Which is from the original report on DOM-based XSS. You see that nothing bad will happen when you open http://www.vulnerable.site/welcome.html?name=Joe. However, opening http://www.vulnerable.site/welcome.html?name=alert(document.cookie) will lead to attacker provided code being executed in the victim’s context. Even worse, when opening with a hash (#) instead of a question mark (?) then the server will not even see the payload, because Web browsers do not transmit it as part of their request.

“Why does that happen?”, you may ask. We see that the document.write call got fed a string derived from the URL. The URL is assumed to be provided by the attacker. The string is then used to create new DOM elements. In the good case, it’s only a simple text node, representing text to be rendered. That’s a perfectly legit use case and we must, unfortunately, allow that sort of usage. I say unfortunate, because using these APIs is inherently insecure. The alternative is to use createElement and friends to properly inject DOM nodes. But that requires comparatively much more effort than using the document.write. Coming back to the security problem: In the bad case, a script element is created with attacker provided contents. That is very bad, because now the attacker controls your browser. So we must prevent the attacker provided code from execution.

You see, tracking the taint information is a non-trivial effort and must be done beyond newly created DOM nodes and multiple passes of JavaScript (think eval(eval(eval(tainted_string)))). We must also track the taint information not on the full string, but on each character in order to not break existing Web applications. For example, if you first concatenate with a tainted string and then remove all tainted characters, the string should not be marked as tainted. This non-trivial effort manifests itself in the over 15000 Lines of Code we patched Chromium with to provide protection against DOM-based XSS. These patches, as indicated, create, track, propagate, and evaluate taint information. Also, the compilation of JavaScript has been modified to adhere to the policy that tainted strings must only compile to literals. Other policies are certainly possible and might actually increase protection (or increase compatibility without sacrificing security). So not only WebKit (Blink) needed to be patched, but also V8, the JavaScript engine. These patches add to the logic and must be execute in order to protect the user. Thus, they take time on the CPU and add to the memory consumption. Especially the way the taint information is stored could blow up the memory required to store a string by 100%. We found, however, that the overhead incurred was not as big as other solutions proposed by academia. Actually, we measure that we are still faster than, say, Firefox or Opera. We measured the execution speed of various browsers under various benchmarks. We concluded that our patched version added 23% runtime overhead compared to the unpatched version.

xss-runtime

As for compatibility, we crawled the Alexa Top 10000 and observed how often our protection mechanism has stopped execution. Every blocked script would count towards the incompatibility, because we assume that our browser was not under attack when crawling. That methodology is certainly not perfect, because only shallowly crawling front pages does not actually indicate how broken the actual Web app is. To compensate, we used the WebKit rendering tests, hoping that they cover most of the important functionality. Our results indicate that scripts from 26 of the 10000 domains were blocked. Out of those, 18 were actually vulnerable against DOM-based XSS, so blocking their execution happened because a code fragment like the following is actually indistinguishable from a real attack. Unfortunately, those scripts are quite common πŸ™ It’s being used mostly by ad distribution networks and is really dangerous. So using an AdBlocker is certainly an increase in security.


var location_parts = window.location.hash.substring(1).split(’|’);
var rand = location_parts[0];
var scriptsrc = decodeURIComponent(location_parts[1]);
document.write("<scr"+"ipt src=’" + scriptsrc + "’></scr"+"ipt>");

Modifying the WebKit for the Web parts and V8 for the JavaScript parts to be taint aware was certainly a challenge. I have neither seriously programmed C++ before nor looked much into compilers. So modifying Chromium, the big beast, was not an easy task for me. Besides those handicaps, there were technical challenges, too, which I didn’t think of when I started to work on a solution. For example, hash tables (or hash sets) with tainted strings as keys behave differently from untainted strings. At least they should. Except when they should not! They should not behave differently when it’s about querying for DOM elements. If you create a DOM element from a tainted string, you should be able to find it back with an untainted string. But when it comes to looking up a string in a cache, we certainly want to have the taint information preserved. I hence needed to inspect each and every hash table for their usage of tainted or untainted strings. I haven’t found them all as WebKit’s (extensive) Layout tests still showed some minor rendering differences. But it seems to work well enough.

As for the protection capabilities of our approach, we measured 100% protection against DOM-based XSS. That sounds impressive, right? Our measurements were two-fold. We used the already mentioned Layout Tests to include some more DOM-XSS test cases as well as real-life vulnerabilities. To find those, we used the reports the patched Chromium generated when crawling the Web as mentioned above to scan for compatibility problems, to automatically craft exploits. We then verified that the exploits do indeed work. With 757 of the top 10000 domains the number of exploitable domains was quite high. But that might not add more protection as the already existing built in mechanism, the XSS Auditor, might protect against those attacks already. So we ran the stock browser against the exploits and checked how many of those were successful. The XSS Auditor protected about 28% of the exploitable domains. Our taint tracking based solution, as already mentioned, protected against 100%. That number is not very surprising, because we used the very same codebase to find vulnerabilities. But we couldn’t do any better, because there is no source of DOM-based XSS vulnerabilities…

You could, however, trick the mechanism by using indirect flows. An example of such an indirect data flow is the following piece of code:


// Explicit flow: Taint propagates
var value1 = tainted_value === "shibboleth" ? tainted_value : "";
// Implicit flow: Taint does not propagate
var value2 = tainted_value === "shibboleth" ? "shibboleth" : "";

If you had such code, then we cannot protect against exploitation. At least not easily.

For future work in the Web context, the approach presented here can be made compatible with server-side taint tracking to persist taint information beyond the lifetime of a Web page. A server-side Web application could transmit taint information for the strings it sends so that the client could mark those strings as tainted. Following that idea it should be possible to defeat other types of XSS. Other areas of work are the representation of information about the data flows in order to help developers to secure their applications. We already receive a report in the form of structured information about the blocked code generation. If that information was enriched and presented in an appealing way, application developers could use that to understand why their application is vulnerable and when it is secure. In a similar vein, witness inputs need to be generated for a malicious data flow in order to assert that code is vulnerable. If these witness inputs were generated live while browsing a Web site, a developer could more easily assess the severity and address the issues arising from DOM-based XSS.

FOSDEM 2016

It the beginning of the year and, surprise, FOSDEM happened πŸ™‚ This year I even managed to get to see some talks and to meet people! Still not as many as I would have liked, but I’m getting there…

Lenny talked about systemd and what is going to be added in the near future. Among many things, he made DNSSEC stand out. I not sure yet whether I like it or not. One the one hand, you might get more confidence in your DNS results. Although, as he said, the benefits are small as authentication of your bank happens on a different layer.

Giovanni talked about the importance of FOSS in the surveillance era. He began by mentioning that France declared the state of emergency after the Paris attacks. That, however, is not in line with democratic thinking, he said. It’s a tool from a few dozens of years ago, he said. With that emergency state, the government tries to weaken encryption and to ban any technology that may be used by so called terrorists. That may very well include black Seat cars like the ones used by the Paris attackers. But you cannot ban simple tools like that, he said. He said that we should make our tools much more accessible by using standard FLOSS licenses. He alluded to OpenSSL’s weird license being the culprit that caused Heartbleed not to have been found earlier. He also urged the audience to develop simpler and better tools. He complained about GnuPG being too cumbersome to use. I think the talk was a mixed bag of topics and got lost over the many topics at hand. Anyway, he concluded with an interesting interpretation of Franklin’s quote: If you sacrifice software freedom for security you deserve neither. I fully agree.

In a terrible Frenglish, Ludovic presented on Python’s async and await keywords. He said you must not confuse asynchronous and parallel execution. With asynchronous execution, all tasks are started but only one task finishes at a time. With parallel execution, however, tasks can also finish at the same time. I don’t know yet whether that description convinces me. Anyway, you should use async, he said, when dealing with sending or receiving data over a (mobile) network. Compared to (p)threads, you work cooperatively on the scheduling as opposed to preemptive scheduling (compare time.sleep vs. asyncio.sleep).

Aleksander was talking on the Tizen security model. I knew that they were using SMACK, but they also use a classic DAC system by simply separating users. Cynara is the new kid on the block. It is a userspace privilege checker. A service, like GPS, if accessed via some form of RPC, sends the credentials it received from the client to Cynara which then makes a decision as to whether access is allowed or not. So it seems to be an “inside out” broker. Instead of having something like a reference monitor which dispatches requests to a server only if you are allowed to, the server needs to check itself. He went on talking about how applications integrate with Cynara, like where to store files and how to label them. The credentials which are passed around are a SMACK label to identify the application. The user id which runs the application and privilege which represents the requested privilege. I suppose that the Cynara system only makes sense once you can safely identify an application which, I think, you can only do properly when you are using something like SMACK to assign label during installation.

Daniel was then talking about his USBGuard project. It’s basically a firewall for USB devices. I found that particularly interesting, because I have a history with USB security and I do know that random USB devices pose a problem. We are also working on integrating USB blocking capabilities with GNOME, so I was keen on meeting Daniel. He presented his program, what it does, and how to use it. I think it’s a good initiative and we should certainly continue exploring the realm of blocking USB devices. It’s unfortunate, though, that he has made some weird technological choices like using C++ or a weird IPC system. If it was using D-Bus then we could make use of it easily :-/ The talk was actually followed by Krzyzstof who I reported on last time, who built USB devices in software. As I always wanted to do that, I approached him and complained about my problems doing so πŸ˜‰

Chris from wolfSSL explained how they do testing for their TLS implementation. wolfSSL is 10 years old and secures over 1 billion endpoints, he said. Most interestingly, they have interoperability testing with other TLS implementations. He said they want to be the most well tested TLS library available which I think is a very good goal! He was a very good speaker and I really enjoyed learning about their different testing strategies.

I didn’t really follow what Pam was talking about implicit trademark and patent licenses. But it seems to be an open question whether patents and trademarks are treated similarly when it comes to granting someone the right to use “the software”. But I didn’t really understand why it would be a question, because I haven’t heard about a case in which it was argued that the right on the name of the software had also been transferred. But then again, I am not a lawyer and I don’t want to become one…

Jeremiah referred on safety-critical FOSS. Safety critical, he said, was functional safety which means that your device must limp back home at a lower gear if anything goes wrong. He mentioned several standards like IEC 61508, ISO 26262, and others. Some of these standards define “Safety Integrity Levels” which define how likely risks are. Some GNU/Linux systems have gone through that certification process, he said. But I didn’t really understand what copylefted software has to do with it. The automotive industry seems to be an entirely different animal…

If you’ve missed this year’s FOSDEM, you may want to have a look at the recordings. It’s not VoCCC type quality like with the CCCongress, but still good. Also, you can look forward to next year’s FOSDEM! Brussels is nice, although they could improve the weather πŸ˜‰ See you next year!

Talking on Searchable Encryption at 32C3 in Hamburg, Germany

This year again, I attended the Chaos Communication Congress. It’s a fabulous event. It has become much more popular than a couple of years ago. In fact, it’s so popular, that the tickets (probably ~12000, certainly over 9000) have been sold out a week or so after the sales opened. It’s gotten huge.

This year has been different than the years before. Not only were you able to use your educational leave for visiting the CCCongress, but I was also giving a talk. Together with my colleague Christian Forler, we presented on Searchable Encryption. We had the first slot on the last day. I think that’s pretty much the worst slot in the schedule you could get πŸ˜‰ Not only because the people are in Zombie mode, but also because you have received all those viruses and bacteria yourself. I was lucky enough, but my colleague was indeed sick on day 4. It’s a common thing to be sick after the CCCongress :-/ Anyway, we have hopefully entertained the crowd with what I consider easy slides, compared to the usualβ„’ Crypto talk. We used a lot imagery and tried to allude to funny stuff. I hope people enjoyed it. If you have seen it, don’t forget to leave feedback! It was hard to decide on the appropriate technical level for the almost 1800 people. The feedback we’ve received so far is mixed, so I guess we’ve hit a good spot. The CCCongress was amazingly organised for speakers. They really did care for us and made sure everything was right. So everything was perfect expect for pdfpc which crashed whenever it was meant to display a certain slide… I used Evince then and it worked…

The days at the CCCongress were intense as you might be able to tell from the Fahrplan. It generally started at about 12:00 and ended at about 01:00. And that’s only the talks. You can’t avoid bumping into VIP (very interesting people) and thus spend time in the hallway. And then you have these amazing parties. This year, they had motor-homes and lasers in the dance hall (last year it was a water cannon…). Very crazy atmosphere. It’s highly recommended to spend a night there.

Anyway, day 1 started for me with the Keynote by Fatuma Musa Afrah. The speaker stretched her time a little, I felt. At the beginning I couldn’t really grasp what her topic was or what she wanted to tell us. She repeatedly told us that we had to “kill the time together” which killed my sympathy to some extent. The conference’s motto was Gated Communities. She encouraged us to find ways to open these gates by educating people and helping them. She said that we have to respect each other irrespective of the skin colour or social status. It was only later that she revealed being refugee who came to Germany. Although she told us that it’s “Newcomers”, not “refugees”. In fact, she grew up in Kenya where she herself was a newcomer. She fled to Kenya, so she fled twice. She told us stories about her arriving and living in Germany. I presume she is breaking open the gates which separate the communities she’s living in, but that’s speculation. In a sense she was connecting her refugee community with our hacker community. So the keynote was interesting for that perspective.

Joanna Rutkowska then talked about trustworthy laptops. The basic idea is to have no state on the laptop itself, i.e. no place where malware could be injected. The state should instead be kept on a personal storage medium, like an SD card or a pen drive. She said that laptops are inherently not trustworthy. Trust, she said can be broken up into Trusted, Secure, and Trustworthy. Secure is resistant to attacks. Trusted is something we, as Security community, do not want to have, like a Trusted third party. Trustworthy, she said, is something different, like the Intel Management Engine which might be resistant to attacks, yet it is not acting in the interest of the user. Application level security is meaningless, she said, when we cannot trust the Operating System, because it is the trusted part. If it is compromised then every effort is not useful. Her project, Qubes OS, attempts to reduce the Trusted Computing Base. What is Operating system to the application, is the hardware to the Operating system. The hardware, she said, has been assumed to be trusted. A single malicious peripheral, like a malicious wifi module, can compromise the whole personal computer, the whole digital life, she said. She was referring to problems with Intel x86 platforms. Present Intel processors integrate everything on the main chip. The motherboard has been made more or less only a holder for the CPU and the memory. The construction of those big chips is completely opaque. We have no control over what is inside. And we cannot look inside, even if we wanted to. The firmware is being loaded during boot from a discrete element on the mainboard. We cannot, however, verify what firmware really is on the chip. One question is how to enforce read-only-ness of the system or how to upload your own firmware. For many years, she and others believed that TPM, TXT, or UEFI secure boot could solve that problem. But all of them have shown to fail horribly, she said. Unfortunately, she didn’t mention how so. So as of today, there is no such thing as a secure boot. Inside the processor is a management engine which special, because it is the perfect entry for backdooring and zombification of personal computing. By zombification, she means that the involvement of the Apps (vs. OS vs. Hardware) is decreasing heavily and make the hardware have much more of a say. She said that Intel wants to make the Hardware fully control your computing by having much more logic in the management engine. The ME is, in a sense, a gated community, because you cannot, whatsoever, inspect it, tinker with it, or otherwise get in touch. She said that the war is lost on X86. Even if we didn’t have the management engine. Again, she didn’t say why. Her proposal is to move all those moving firmware parts out to a trusted storage. It was an interesting perspective on what I think is a “simple” Free Software problem. Because we allow proprietary software, we now have the problem to see what is loaded into the hardware. With Free Software we’d still have backdoors in hardware, but assuming that most functionality is encoded in firmware, we could see and modify the existing firmware and build, run, and share our “better” firmware.

Ilja van Sprundel talked about Windows driver security or rather their attack surface. I’m not necessarily interested in Windows per se, but getting some lower level knowledge sounded intriguing. He gave a more high level overview of what to do and what to not do when doing driver development for Windows. The details, he said, matter. For example whether the IOManager probes a buffer in *that* instance. The Windows kernel is made of several managers, he said. The Windows Driver Model (WDM) is the standard model for how drivers are written. The IO Manager proxies requests from user to (WDM drivers. It may or may not validate arguments. Another central piece in the architecture are IO Request Packets (IRPs). They are being delivered from the IO Manager to the driver and contain all the necessary information for the operation in question. He went through the architecture really fast and it was hard for a kernel newbie like me to follow all the concepts he mentioned. Interestingly though, the IO Manager seems to also care about transferring the correct amount of memory from userspace to kernel space (e.g. makes sure data does not overflow) if you want it to using METHOD_BUFFERED. But, as he said, most of the drivers use METHOD_NEITHER which does not check anything at all and is the endless source of driver bugs. It seems as if KMDF is an alternative framework which makes it harder to have bugs. However, you seem to need to understand the old framework in order to use that new one properly. He then went on to talk about the actual attack surface of drivers. The bugs are either privilege escalation, denial of service, or information leak. He said that you could avoid the problem of integer overflow by using the intsafe library. But you have to use them properly! Most importantly, you need to check their return type and use the actual values you want to have been made safe. During creation of a device, a driver can call either IoCreateDeviceSecure with an SDDL string or use an INF file to ACL the device. That is, however, done either rarely or wrongly, he said. You need to think about who needs to have access to your device. When you work with the IOManager, you need to check whether Irp->MdlAddress is NULL which can happen, he said, if it’s a zero sized buffer. Similarly, when using the safer METHOD_BUFFERED mentioned earlier, Irp->AssociatedIrp.SystemBuffer can also be NULL. So avoid having that false sense of security when using that safe API. Another area of bugs is the cancellation of IRPs. The userland can cancel requests which apparently is not handled gracefully by drivers and leads to deadlocks, memory leaks, race conditions, double frees, and other classes of bugs. When dealing with data from userland, you are supposed to “probe” the memory which is basically checking whether the pointers are valid and in the expected range. If you don’t do that, it’ll lead to you writing to arbitrary kernel memory. If you do validate the data from userspace, make sure you don’t fetch it again from user space assuming that it hasn’t changed. There might be race between your check and your usage (TOCTOU). So better capture, validate, and use the data. The same applies when using MDLs. That, however, is more tricky, because you have a double mapping and you are using a kernel pointer. So it is very subtle. When you do memory allocation you can either use ExAllocatePool or ExAllocatePoolWithQuota. The latter throws an exception instead of returning NULL. Your exception or NULL pointer handling needs to be double checked, he said. It was a very technical talk on Windows which was way out of my comfort zone. I only understood a tiny fraction of what he was presenting. But I liked it for the new insight on Windows drivers and that the same old classes of bug have not died yet.

High up on my list of awaited talks was the talk on train systems by the SCADA strangelove people. Railways, he said, is the biggest system built by mankind. It’s main components are signals and switches. Old switches are operated manually by pure force. Modern switches are interlocked with signals such that the signals display forbidden entry when switches are set in certain positions. On tracks, he said, signals are transmitted over the actual track by supplying them with AC or DC. The locomotive picks up the signals and supplies various systems with them. The Eurostar, they said, has about seven security systems on board, among them a “RPS”, a Reactor Protection System which alludes to nuclear trains… They said that lately the “Bahn Automatisierungssystem (SIBAS)” has been updated to use much more modern and less proprietary soft- and hardware such as VxWorks and x86 with ELF binaries as well as XML over HTTP or SS7. In the threat model they identified, they see several attack vectors. Among them are making someone plug a malicious USB device in controlling machines in some operation center. He showed pictures from supposedly real operation centers. The physical security, he said, is terrible. With close to no access control. In one photograph, he showed a screenshot from a documentary aired on TV which showed credentials sticking on the screen… Even if the security is quite good, like good physical security and formally proven programs, it’s still humans who write the software, he said, so there will be bugs to be exploited. For example, he showed screenshots of when he typed “railway” into Shodan and the result included a good number of railway stations. Another vector is GSM-R. If you jam the train’s GSM-R connection, the train will simply stop. Also, you might be able to inject SIM toolkit malware. Via that vector, you might make the modem identify as, e.g. a keyboard and then penetrate further into the systems. Overall an entertaining talk, but the claims were a bit over the top. So no real train hacking just yet.

The talk on memory corruption by Mathias Payer started off by saying that software is unsafe and insecure. Low level languages trade type safety and memory safety for performance. A large set of legacy applications, he said, are prone to memory vulnerabilities. There are, he continued too many bugs to find and fix manually. So we need a runtime system to ensure security. An invalid dereference or an out of bounds pointer is the core of memory unsafety problems. But according to the C language, he claimed, it’s only a violation if the invalid pointer is read, written to, or freed. So during runtime, there are tons and tons of dangling pointers which is perfectly fine. With such a vulnerability a control-flow attack could be executed. Several defenses exist: Data Execution Prevention prevents code from actually being executed. Address Space Layout Randomisation scrambles the memory locations of executable code which makes it harder to successfully exploit a vulnerable. Stack canaries are special values which are supposed to detect overflowing writes. Safe exception handlers ensure that exception code paths follow predefined patterns. The DEP can only work together with ASLR, he said. If you broke ASLR, you could re-use existing code; as it turns out, people do break ASLR every now and then. Two new mechanisms are Stack Integrity and Code Flow Integrity. Stack Integrity enforces to return to the actual caller by having a shadow stack. He didn’t mention how that actually works, though. I suppose you obtain a more secret stack address somewhere and switch the stack pointer before returning to check whether the return address is still correct. Control Flow Integrity builds a control flow graph during compilation and for every control flow change it checks at run time whether the target address is allowed. Apparently, many CFI implementations exist (eleven were shown). He said they’ve measured those and IFCC and Lockdown performed rather badly. To show how all of the protection mechanisms fail, he presented printf-oriented programming. He said that printf was Turing complete and presented a domain specific language. They have built a brainfuck interpreter with snprintf calls. Another rather technical talk by a good speaker. I remember that I was already impressed last year when he presented on these new defense mechanisms.

DJB and Tanja Lange started their “late night show” by bashing TLS as a “gigantic clusterfuck”. They were presenting on quantum computing and cryptography. They started by mentioning that the D-Wave quantum computer exists, but it’s not useful, he said. It doesn’t do the basic things, and can only do limited computations. It can especially not perform Shor’s algorithm. So there’s no “Shor monster coming”. They recommended the Timeline of Quantum Computing as a good reference of the massive research effort going into quantum computing. If there was a general quantum computer pretty much every public key scheme deployed on the Internet today will be broken. But also symmetric schemes are under attack due to Grover’s algorithm which speeds up brute force algorithms significantly. The solution could be physical crypto like using strong (physical) locks. But, he said, the assumptions of those systems are already broken. While Quantum Key Distribution is secure under certain assumptions, those assumptions are off, he said. Secure schemes that survive the quantum era were the topic of their talk. The first workshop on that workshop happened in 2006 and efforts are still being made, e.g. with EU projects on the topic. The time it takes for a crypto scheme to gain significant traction has been long, so far. They gave ECC as an example. It has been introduced in the 1980s, but it’s only now that it’s taking over the deployed crypto on the Internet. So the time it takes is long. They gave recommendations on what to do to have connections that are secure “for at least the next hundred years”. These include at least 256 bit keys for symmetric encryption. McEliece with binary Goppa codes n=6960 k=5413 t=119. An efficient implementation of such a code based scheme is McBits, she said. Hash based signatures with, e.g. XMSS or SPHINCS-256. All you need for those is a proper hash function. The stuff they recommend for the next 100 years, like the McEliece system, are things from the distant past, she said. He said that Post Quantum Cryptography will be the standard in a couple of years from now so he urged the cryptographers in the audience to “get used to this stuff”.

Next on my list was Markus’ talk on Landesverrat which is the incident of Netzpolitik.org being investigated for revealing secret documents. He referred on the history of the case, how it came around that they were suspected of revealing secret documents. He said that one of their believes is to publish their sources, even the secret ones. They want their work to be critically reviewed and they believe that it is only possible if the readers can inspect the sources. The documents which lead to the criminal investigations were about finances of the introduction of the XKeyscore software. Then, the president of the “state security” filed a case against because of revealing secret documents. They wanted to publish the investigation files, but they couldn’t see them, because they were considered to be more secret than the documents they have already published… From now on, he said, they are prepared for the police raiding their offices, which I suppose is good standard preparation. They were lucky, he said, because their case fell into the regular summer low of news which could make the case become quite popular in the media. A few weeks earlier or later and they were much less popular due to the refugees or Greece. During the press coverage, they had a second battleground where they threw out a Russian television team who entered their offices without having called or otherwise introduced themselves… For the future, he wants to see changes in what is considered to be a state secret. He doesn’t want the government to decide what such a secret is. He also wants to have much more protection for whistle blowers. Freedom of press should also hold for people who do not blog for their “occupation”, but also hobbyists.

Vincent Haupert was then talking on App-based TAN online banking methods. It’s a classic two factor method: Not only username and password, but also a TAN. These TAN methods have since evolved in various ways. He went on to explain the general online banking process: You log in with your credentials, you create a new wire transfer and are then asked to provide a TAN. While ChipTAN would solve many problems, he said, the banking industry seems to want their customers to be able to transfer money everywhereβ„’. So you get to have two “Apps” on your mobile computer. The banking app and a TAN app. However, malware in “official” app stores are a reality, he said. The Google Playstore cannot protect against malware, as a colleague of him demonstrated during his bachelor thesis. This could also been by the “Brain Test” app which roots your device and then loads malware. Anyway, they hijacked the connection from the banking app to modify the recipient of the issued wire transfer and the TAN being pushed on the device. They looked at the apps and found that they “protected” their app with “Promon Shield“. That seems to be a strong obfuscation framework. Their attack involved tricking the root and hooks detection. For the root detection they check on the file system for certain binaries. He could simply change the filenames and was good to go. For the hooks (Xposed) it was pretty much the same with the exception of a few filenames which needed more work. With these modifications they could also “hack” the newer version 1.0.7. Essentially the biggest part of the problem is that the two factors are on one device. If the attacker hijacks that one device then ,

The talk by Christian Schaffner on Quantum Cryptography was introducing the audience to quantum mechanics. He said that a qubit can be imagined as the direction of a polarised photon. If you make the direction of the photons either horizontal or vertical, you can imagine that as representing 0 or 1. He was showing an actual physical experiment with a laser pointer and polarisation filters showing how the red dot of the laser pointer is either filtered or very visible. He also showed how actually measuring the polarisation changes the state of the photons! So yet another filter made the point in the back brighter. That was a bit weird, but that’s quantum mechanics. He showed a quantum random number generator based on that technology. One important concept is the no-cloning theorem which state that you can make a perfect copy of a quantum bit. He also compared current and “post quantum” crypto systems against efficient classical attackers and efficient quantum attackers. AES, SHA, RSA (or discrete logs) will be broken by quantum attacks. Hash-based signatures, McEliece, and lattice-based cryptography he considered to be resistant against quantum based attacks. He also mentioned that Quantum Key Distribution systems will also be against an exhaustive attacker who applies brute force. QKD is based on the no-cloning theorem so an eavesdropper cannot see the same bits as the communicating parties do. Finally, he asked how you could prove that you have been at a certain location to avoid the pizza delivery problem (i.e. to be certain about the place of delivery).

Fefe was talking on privileges. He said that software will be vulnerable. Various techniques should be applied such as simply fixing all the bugs (haha…) or make exploitation harder by applying ASLR or ROP protection. Another idea is to put the program in a straight jacket and withdraw privileges. That sounds a lot like containerisation. Firstly, you can drop your privileges from superuser down to the least privileges you need, then do privilege separation. Another technique is the admin confining the app in a jail instead of the app confining itself. Also, you can implement access control via a broker service by splitting up your process into, say, a left half which opens and reads files and a right half which processes data. When doing privilege separation, the idea is to split up the process into several separately running programs. Jailing is like firewall rules for syscalls which, he said, is impossible for complex programs. He gave Firefox as an example of it being impossible to write a rule set for. The app containing itself is like a werewolf chaining itself to the wall before midnight, he said. You restrict yourself from opening files, creating socket, or from attaching yourself as a debugger to other processes. The broker service is probably like a reference monitor. He went on showing how old-school privilege dropping works. You could do it as easily as seteuid(getuid()), but that’s not enough, because there is the saved UID, so you need to setresuid and not forget to check the return code. Because the call can fail if, for example, the target UID had already been running too many processes for its quota. He said that you should fail the build if your target platform does not provide setresuid. However, dropping privileges is more than setting your UID. It’s also about freeing resources you don’t necessarily need. Common approaches to jailing your process are to have a fake filesystem with only the necessary files, so your process cannot ever access anything that it shouldn’t. On Linux, that would probably be chroot. However, you can escape using fchdir . Also, mounting your /proc into the chroot, information about the host is exposed. So you need to do more work than calling chroot. The BSDs, he said, have Securelevel which is a kernel mode that only increases which withdraws certain privileges. They also have jails which is a chroot on steroids, he said. It leaks some information due the PIDs, though, he said.

The next talk was on Shellphish, an automatic exploitation framework. This is really fascinating stuff. It’s been used for various Capture the Flag contests which are basically about hacking other teams’ software services. In fact, the presenters were coming from the UCSB which is hosting the famous iCtF. They went from solving security challenges to developing a system which solves security challenges. From a vulnerability binary, they (automatically) develop an exploit and a patched binary which is immune to the exploit, but preserves the functionality of the program. They automatically find vulnerabilities, patches, and test both the exploits and the patches. For the automated vulnerability component, they presented Angr. It has a symbolic execution engine looking for memory accesses outside allocated regions and unconstrained instruction pointer which is a jump controlled by user input (JMP eax). They have written a paper for NDSS about “Augmenting Fuzzing Through Selective Symbolic Execution“. Angr is a Python library and they showed how to use it for identifying the overhyped Back to 28 vulnerability. Actually, there is too much state for a regular symbolic executor to find this problem. Angr does “veritesting“. He showed that his Angr script found the vulnerability by him having excluded many paths of execution that don’t really generate new state with a few lines of code. He didn’t show though what the lines of code were and how he determined how the states are not adding any new information.

The next talk was given by the people behind Intelexit was about convincing NSA agents to stop their work and serve democracy instead. They rented a van with big mottoes printed on them, like “Listen to your heart, not to private phone calls”. They also stuck the constitution on the “constitution protection office” which then got torn apart. Another action they did was to fly over the dagger complex and to release flyers about leaving the secret services. They want to have a foundation helping secret service agents to leave their job or to blow the whistle. They also want an anonymous call service where agents can call to talk about their job. I recommend browsing their photos.

Another artsy talk was on a cheap Facebook army. Actually it was on Instagram followers. The presenter is an artist himself and he’d buy Instagram followers for fellow artists “to make them all equal”. He dislikes the fact that society seems to measure the value or quality of art in followers or likes on social media.

Around the CCCongress were also other artsy installations like this one called “machine learning”:

It’s been a fabulous event. I really admire the people organising this event each and every year. Thank you so much and see you next year, hopefully.

On GNOME Governance

On 2015-10-04 it was announced that the governing body of the GNOME Foundation, the Board, has a vacant seat. That body was elected about 15 weeks earlier. The elections are very democratic, they use an STV system to make as many votes as possible count. So far, no replacement has been officially announced. The question of what strategy to use in order to find the replacement has been left unanswered. Let me summarise the facts and comment on the strategy I wish the GNOME project to follow.

The STV system used can be a bit hard to comprehend, at first, so let me show you the effects of an STV system based on the last GNOME elections. With STV systems, the electorate can vote for more than one candidate. An algorithm then determines how to split up the votes, if necessary. Let’s have a look at the last election’s first votes:

elections-initial-votes

We see the initial votes, that is, the number of ballots in which a candidate was chosen first. If a candidate gets eliminated, either because the number of votes is sufficient to get elected or because the candidate has the least votes and cannot be elected anymore, the vote of the ballot is being transferred onto the next candidate.

In the chart we see that the electorate chose to place 19 or more votes onto two candidates who got directly elected. Overall, six candidates have received 13 or more votes. Other candidates have at least 30% less votes than that. If we had a “simple” voting mechanism, the result would be the seven candidates with the most votes. I would have been one of them.

But that’s not how our voting system works, because, as we can see below, the picture of accumulated votes looks differently just before eliminating the last candidate (i.e. me):

elections-final-votes

If you compare the top seven now, you observe that one candidate received votes from other candidates who got eliminated and managed to get in.

We can also see from the result that the final seat was given to 17.12 votes and that the first runner-up had 16.44 votes. So this is quite close. The second runner-up received 10.39 votes, or 63% of the votes of the first runner-up (so the first runner-up received 158% of the votes of the second runner-up).

We can also visually identify this effect by observing a group of eight which accumulated the lion’s share of the votes. It is followed by a group of five.

Now one out of the seven elected candidates needed to drop out, creating a vacancy. The Foundation has a set of rules, the bylaws, which regulate vacancies. They are pretty much geared towards maintaining an operational state even with a few directors left and do not mandate any particular behaviour, especially not to follow the latest election results.

But.

Of course this is not about what is legally possible, because that’s the baseline, the bare minimum we expect to see. The GNOME Foundation’s Board is one of the few democratically elected bodies. It is a well respected entity in industry as well as other Free Software communities. I want it to stay that way. Most things in Free Software are not very democratic; and that’s perfectly fine. But we chose to have a very democratic system around the governing body and I think that it would leave a bad taste if the GNOME Foundation chooses to not follow these rather young election results. I believe that its reputation can be damaged if the impression of forming a cabal, of not listening to its own membership, prevails. I interpret the results as a strong statement of its membership for those eight candidates. GNOME already has to struggle with accusations of it not listening to its users. I’d rather want to get rid of it, not fueling it by extending it to its electorate. We’re in the process of acquiring sponsors for our events and I don’t think it’s received well if the body ignores its own processes. I’d also rather value the membership more rather than producing arguments for those members who chose to not vote at all and try to increase the number of members who actually vote.

LinuxCon Europe 2015 in Dublin

sponsor

The second day was opened by Leigh Honeywell and she was talking about how to secure an Open Future. An interesting case study, she said, was Heartbleed. Researchers found that vulnerability and went through the appropriate vulnerability disclosure channels, but the information leaked although there was an embargo in place. In fact, the bug proofed to be exploited for a couple of months already. Microsoft, her former employer, had about ten years of a head start in developing a secure development life-cycle. The trick is, she said, to have plans in place in case of security vulnerabilities. You throw half of your plan away, anyway, but it’s good to have that practice of knowing who to talk to and all. She gave a few recommendations of which she thinks will enable us to write secure code. Coders should review, learn, and speak up if they feel uncomfortable with a piece of code. Managers could take up on what she called “smells” when people tend to be fearful about their code. Of course, MicroSoft’s SDL also contains many good practices. Her minimal set of practices is to have a self-assessment in place to determine if something needs security review, have an up-front threat modelling that is kept up to date as things evolve, have a security checklist like Mozilla’s or OWASP’s, and have security analysis built into CI process.

Honeywell

The container panel was led by Jeo Zonker Brockmeier who started the discussion by stating that we’ve passed the cloud hype and containers are all the rage now. The first question he shot at the panellists was whether containers were ready at all to be used for production. The panellists were, of course, all in agreement that they are, although the road ahead is still a bit bumpy. One issue, they identified, was image distribution. There are, apparently, two types of containers. Application containers and System containers. Containers used to be a lightweight VM with a full Linux system. Application Containers, on the other hand, only run your database instance. They see application containers as replacing Apps in the future. Other services like databases are thus not necessarily the task of Application containers. One of the panellists was embracing dockerhub as a similar means to RPM or .deb packages for distributing software, but, he said, we need to solve the problem of signing and trusting. He was comparing the trust issue with packages he had installed on his laptop. When he installed a package, he didn’t check what was inside the packages his OS downloaded. Well, I guess he missed that people put trust in the distribution instead of random people on the Internet who put up an image for everybody to download. Anyway, he wanted Docker to be a form of trusted entity like Google or Apple are for their app stores which are distributing applications. I don’t know how they could have missed the dependency resolution and the problem of updating lower level libraries, maybe that problem has been solved already…

Container Panel

Intel’s Mark was talking on how Open Source was fuelling the Internet of Things. He said that trust was an essential aspect of devices that have access to personal or sensitive data like access to your house. He sees the potential in IoT around vaccines which is a connection I didn’t think of. But it makes somewhat sense. He explained that vaccines are quite sensitive to temperature. In developing countries, up to 30% of the vaccines spoil, he said, and what’s worse is that you can’t tell whether the vaccine is good. The IoT could provide sensors on vaccines which can monitor the conditions. In general, he sees the integration of diverse functionality and capabilities of IoT devices will need new development efforts. He didn’t mention what those would be, though. Another big issue, he said, was the updatetability, he said. Even with smaller devices, updates must not be neglected. Also, the ability of these devices to communicate is a crucial component, too, he said. It must not be that two different light bulbs cannot talk to their controller. That sounds like this rant.

IoT opps

Next, Bradley talked about GPL compliance. He mentioned the ThinkPinguin products as a pristine example for a good GPL compliant “complete corresponding source”. He pointed the audience to the Compliance.guide. He said that it’s best to avoid the offer for source. It’s better to include the source with the product, he said, because the offer itself creates ongoing obligations. For example, your call centre needs to handle those requests for the next three years which you are probably not set up to do. Also, products have a typically short lifespan. CCS requires good instructions how to build. It’s not only automated build tools (think configure, make, make install). You should rather think of a script as a movie or play script. The test to use on your potential CCS is to give your source release to another developer of some other department and try whether that person can build the code with your instructions. Anyway, make install does usually not work on embedded anyway, because you need to flash the code. So make sure to include instructions as to how to get the software on the device. It’s usually not required to ship the tool-chain as long as you give instructions as to what compiler to use (and how it was configured). If you do include a compiler, you might end up having more obligations because GCC, for example, is itself GPL licensed. An interesting question came up regarding specialised hardware needed to build or flash the software. You do not need to include anything “tool-chain-like” as long as you have instructions as to the requirements what the user needs to obtain.

Bradley

Samsung’s Krzysztof was talking about USB in Linux. He said, it is the most common external interface in the world. It’s like the Internet in the sense that it provides services in a client-server architecture. USB also provides services. After he explained what the USB actually is how the host interacts with devices, he went on to explain the plug and play aspect of USB. While he provided some rather low-level details of the protocol, it was a rather high level in the sense that it was still the very basic USB protocol. He didn’t talk too much on how exactly the driver is being selected, for example. He went on to explain the BadUSB attack. He said that the vulnerability basically results from the lack of user interaction when plugging in a device and loading its driver. One of his suggestions were to not connect “unknown devices”, which is hard because you actually don’t know what “services” the device is implementing. He also suggested to limit the number of input sources to X11. Most importantly, though, he said that we’d better be using device authorisation to explicitly allow devices before activating them. That’s good news, because we are working on it! There are, he said, patches available for allowing certain interfaces, instead of the whole device, but they haven’t been merged yet.

USB

Jeff was talking about applying Open Source Principles to hardware. He began by pointing out how many processors you don’t get to see, for example in your hard disk, your touchpad controller, or the display controller. These processors potentially exfiltrate information but you don’t really know what they do. Actually, these processors are about owning the owner, the consumer, to then sell them stuff based on that exfiltrated big data, rather than to serve the owner, he said. He’s got a project running to build devices that you not only own, but control. He mentioned IoT as a new battleground where OpenHardware could make an interesting contestant. FPGAs are lego for hardware which can be used easily to build your functionality in hardware, he said. He mentioned that the SuperH patents have now expired. I think he wants to build the “J-Core CPU” in software such that you can use those for your computations. He also mentioned that open hardware can now be what Linux has been to the industry, a default toolkit for your computations. Let’s see where his efforts will lead us. It would certainly be a nice thing to have our hardware based on publicly reviewed designs.

Open Hardware

The next keynote was reserved for David Mohally from Huawei. He said he has a lab in which they investigate what customers will be doing in five to ten years. He thinks that the area of network slicing will be key, because different businesses needs require different network service levels. Think your temperature sensor which has small amounts of data in a bursty fashion while your HD video drone has rather high volume and probably requires low latency. As far as I understood, they are having network slices with smart meters in a very large deployment. He never mentioned what a network slice actually is, though. The management of the slices shall be opened up to the application layer on top for third parties to implement their managing. The landscape, he said, is changing dramatically from what he called legacy closed source applications to open source. Let’s hope he’s right.

Huawei

It was announced that the next LinuxCon will happen in Berlin, Germany. So again in Germany. Let’s hope it’ll be an event as nice as this one.

Intel Booth

HP Booth

LinuxCon Europe – Day 1

attendee registration

The conference was opened by the LinuxFoundation’s Executive Jim Zemlin. He thanked the FSF for their 30 years of work. I was a little surprised to hear that, given the differences between OpenSource and Free Software. He continued by mentioning the 5 Billion Dollar report which calculates how much “value” the projects hosted at Linux Foundation have generated over the last five years. He said that a typical product contains 80%, 90%, or even more Free and Open Source Software. He also extended the list of projects by the Real Time Collaborative project which, as far as I understood, effectively means to hire Thomas Gleisxner to work on the Real Time Linux patches.

world without Linux

The next, very interesting, presentation was given by Sean Gourley, the founder of Quid, a business intelligence analytics company. He talked about the limits of human cognition and how algorithms help to exploit these limits. The limit is the speed of your thinking. He mentioned that studies measured the blood flow across the brain when making decisions which found differences depending on how proficient you are at a given task. They also found that you cannot be quicker than a certain limit, say, 650ms. He continued that the global financial market is dominated by algorithms and that a fibre cable from New York to London costs 300 million dollars to save 5 milliseconds. He then said that these algorithms make decisions at a speed we are unable to catch up with. In fact, the flash crash of 2:45 is inexplicable until today. Nobody knows what happened that caused a loss of trillions of dollars. Another example he gave was the crash of Knight Capital which caused a loss of 440 million dollars in 45 minutes only because they updated their trading algorithms. So algorithms are indeed controlling our lives which he underlined by saying that 61% of the traffic on the Internet is not generated by humans. He suggested that Bots would not only control the financial markets, but also news reading and even the writing of news. As an example he showed a Google patent for auto generating social status updates and how Mexican and Chinese propaganda bots would have higher volume tweets than humans. So the responsibilities are shifting and we’d be either working with an algorithm or for one. Quite interesting thought indeed.

man vs. machine

Next up was IBM on Transforming for the Digital Economy with Open Technology which was essentially a gigantic sales pitch for their new Power architecture. The most interesting bit of that presentation was that “IBM is committed to open”. This, she said, is visible through IBM’s portfolio and through its initiatives like the IBM Academic Initiative. OpenPower Foundation is another one of those. It takes the open development model of software and takes it further to everything related to the Power architecture (e.g. chip design), she said. They are so serious about being open, that they even trademarked “Open by Design“…

IBM sales pitch

Then, the drone code people presented on their drone project. They said that they’ve come a long way since 2008 and that the next years are going to fundamentally change the drone scene as many companies are involved now. Their project, DroneCode, is a stack from open hardware to flight control and the next bigger thing will be CAN support, which is already used in cards, planes, and other vehicles. The talk then moved to ROS, the robot operating system. It is the lingua franca for robotic in academia.

Drones

Matthew Garret talked on securing containers. He mentioned seccomp and what type of features you can deprive processes of. Nowadays, you can also reason about the arguments for the system call in question, so it might be more useful to people. Although, he said, writing a good seccomp policy is hard. So another mechanism to deprive processes of privileges is to set capabilities. It allows you to limit the privileges in a more coarse grained way and the behaviour is not very well defined. The combination of capabilities and seccomp might have surprising results. For example, you might be allowing the mknod() call, but you then don’t have the capability to actually execute it or vice versa. SELinux was next on his list as a mechanism to secure your containers. He said that writing SELinux policy is not the most fun thing in the world. Another option was to run your container in a virtual machine, but you then lose some benefits such as introspection of fine grained control over the processes. But you get the advantages of more isolation. Eventually, he asked the question of when to use what technology. The performance overhead of seccomp, SELinux, and capabilities are basically negligible, he said. Fully virtualising is usually more secure, he said, but the problem is that you have more complex infrastructure which tend to attract bugs. He also mentioned GRSecurity as a means of protecting your Linux kernel. Let’s hope it’ll be merged some day.

Containers

Canonical’s Daniel Watkins then talked on cloud-init. He said it runs in three stages. Init, config, and final in which init sets up networking, config does the actual configuration of your services, final is for the things that eventually need to be done. The clound-init architecture is apparently quite flexible and versatile. You can load your own configuration and user-data modules so that you can set up your cloud images as you like. cloud-init allows you get rid of custom images such that you can have confidence in your base image working as intended. In fact, it’s working not only with BSDs but also with Windows images. He said, it is somewhat similar to tools like Ansible, so if you are already happily using one of those, you’re good.

cloud-init

An entertaining talk was given by Florian Haas on LXC and containers. He talked about tricks managing your application containers and showed a problem when using a naive chroot which is that you get to see the host processes and networking information through the proc filesystem. With LXC, that problem is dealt with, he said. But then you have a problem when you update the host, i.e. you have to take down the container while the upgrade is running. With two nodes, he said, you can build a replication setup which takes care of failing over the node while it is upgrading. He argued that this is interesting for security reasons, because you can upgrade your software to not be vulnerable against “the latest SSL hack” without losing uptime. Or much of it, at least… But you’d need twice the infrastructure to run production. The future, he said, might be systemd with it’s nspawn tool. If you use systemd all the way, then you can use fleet to manage the instances. I didn’t take much away, personally, but I guess managing containers is all the rage right now.

LXC

Next up was Michael Hausenblas on Filesystems, SQL and NoSQL with Apache Mesos. I had briefly heard of Mesos, but I really didn’t know what it was. Not that I’m an expert now, but I guess I know that it’s a scheduler you can use for your infrastructure. Especially your Apache stack. Mesos addresses the problem of allocating resources to jobs. Imagine you have several different jobs to execute, e.g. a Web server, a caching layer, and some number crunching computation framework. Now suppose you want to increase the number crunching after hours when the Web traffic wears off. Then you can tell Mesos what type of resources you have and when you need that. Mesos would then go off and manage your machines. The alternative, he said, was to manually SSH into the machines and reprovision them. He explained some existing and upcoming features of Mesos. So again, a talk about managing containers, machines, or infrastructure in general.

Mesos

The following Kernel panel didn’t provide much information to me. The moderation felt a bit stiff and the discussions weren’t really enganged. The topics mainly circled around maintainership, growth, and community.

Kernel Panel

SuSE’s Ralf was then talking on DevOps. He described his DevOps needs based on a cycle of planning, coding, building, testing, releasing, deploying, operating, monitoring, and then back to planning. When bringing together multiple projects, he said, they need to bring two independent integration loops together. When doing DevOps with a customer, he mentioned some companies who themselves provide services to their customers. In order to be successful when doing DevOps, you need, he said, Smart tools, Process automation, Open APIs, freedom of choice, and quality control are necessary. So I guess he was pitching for people to use “standards”, whatever that exactly means.

SuSE DevOps

I awaited the next talk on Patents and patent non aggression. Keith Bergelt, from OIN talked about ten years of the Open Invention Network. He said that ten years ago Microsoft sued Linux companies to hinder Linux distribution. Their network was founded to embrace patent non-aggression in the community. A snarky question would have been why it would not be simply enough to use GPLv3, but no questions were admitted. He said that the OIN has about 1750 licensees now with over a million patents being shared. That’s actually quite impressive and I hope that small companies are being protected from patent threats of big players…

OIN

That concluded the first day. It was a lot of talks and talking in the hallway. Video recordings are said to be made available in a couple of weeks. So keep watching the conference page.

Sponsors

IBM Booth

mrmcd 2015

I attended this year’s mrmcd, a cozy conference in Darmstadt, Germany. As in the previous years, it’s a 350 people event with a relaxed atmosphere. I really enjoy going to these mid-size events with a decent selection of talks and attentive guests.

The conference was opened by Paolo Ferri’s Keynote. He is from the ESA and gave a very entertaining talk about the Rosetta mission. He mentioned the challenges involved in launching a missile for a mission to be executed ten years later. It was very interesting to see what they have achieved over a few hundred kilometers distance. Now I want to become a space pilot, too πŸ˜‰

The next talk was on those tracking devices for your fitness. Turns out, that these tracking devices may actually track you and that they hence pose a risk for your privacy. Apparently fraud is another issue for insurance companies in the US, because some allow you to get better rates when you upload your fitness status. That makes those fitness trackers an interesting target for both people wanting to manipulate their walking statistics to get a better premium for health care and attackers who want to harm someone by changing their statistics.

Concretely, he presented, these devices run with Bluetooth 4 (Smart) which allows anyone to see the device. In addition, service discovery is also turned on which allows anyone to query the device. Usually, he said, no pin is needed anymore to connect to the device. He actually tested several devices with regard to several aspects, such as authentication, what data is stored, what is sent to the Internet and what security mechanisms the apps (for a phone) have been deployed. Among the tested devices were the XiaomMi Miband, the Fitbit, or the Huawei TalkBand B1. The MiBand was setting a good example by disabling discovery once someone has connected to the device. It also saves the MAC address of the phone and ignores others. In order to investigate the data sent between a phone and a band, they disassembled the Android applications.

Muzy was telling a fairytale about a big data lake gone bad.
He said that data lakes are a storage for not necessarily structured data which allow extraction of certain features in an on-demand fashion and that the processed data will then eventually end up in a data warehouse in a much more structured fashion. According to him, data scientists then have unlimited access to that data. That poses a problem and in order to secure the data, he proposed to introduce another layer of authorization to determine whether data scientists are allowed to access certain records. That is a bit different from what exists today: Encrypt data at rest and encrypt in motion. He claimed that current approaches do not solve actual problems, because of, e.g. key management questions. However, user rights management and user authorization are currently emerging, he said.

Later, he referred on Apache Spark. With big data, he said, you need to adapt to a new programming paradigm away from a single worker to multiple nodes, split up work, handling errors and slow tasks. Map reduce, he said, is one programming model. A popular framework for writing in a such a paradigm is Apache’s Hadoop, but there are more. He presented Apache Spark. But it only begins to make sense if you want to analyse more data than you can fit in your RAM, he said. Spark distributes data for you and executes operations on it in a parallel manner, so you don’t need to care about all of that. However, not all applications are a nice fit for Spark, he mentioned. He gave high performance weather computations as such as example. In general, Spark fits well if IPC not required.

The conference then continued with two very interesting talks on Bahn APIs. derf presented on public transport APIs like EFA, HAFAS, and IRIS. These APIs can do things like routing from A to B or answer questions such as which trains are running from a given station. However, these APIs are hardly documented. The IRIS-system is the internal Bahn-API which is probably not supposed to be publicly available, but there is a Web page which exposes (bits) of the API. Others have used that to build similar, even more fancy things. Anyway, he used these APIs to query for trains running late. The results were insightful and entertaining, but have not been released to the general public. However, the speakers presented a way to query all trains in Germany. Long story short: They use the Zugradar which also contains the geo coordinates. They acquired 160 millions datasets over the last year which is represented in 80GB of JSON. They have made their database available as ElasticSearch and Kibana interface. The code it at Github. That is really really good stuff. I’m already in the process of building an ElasticSearch and Spark cluster to munch on that data.

Yours truly also had a talk. I was speaking on GNOME Keysign. Because the CCC people know how to run a great conference, we already have recordings (torrent). You get the slides here. Those of you who know me don’t find the content surprising. To all others: GNOME Keysign is a tool for signing OpenPGP Keys. New features include the capability to sign keys offline, that is, you present a file with a key and you have it signed following best practices.

Another talk I had, this time with a colleague of mine, was on Searchable Encryption. Again, the Video already exists. The slides are probably less funny than they were during the presentation, but hopefully still informative enough to make some sense out of them. Together we mentioned various existing cryptographic schemes which allow you to have a third party execute search operations on your encrypted data on your behalf. The most interesting schemes we showed were Song, Wagner, Perrig and Cash et al..

Thanks again to the organisers for this nice event! I’m looking forward to coming back next year.

Open Source Hong Kong 2015

Recently, I’ve been to Hong Kong for Open Source Hong Kong 2015, which is the heritage of the GNOME.Asia Summit 2012 we’ve had in Hong Kong. The organisers apparently liked their experience when organising GNOME.Asia Summit in 2012 and continued to organise Free Software events. When talking to organisers, they said that more than 1000 people registered for the gratis event. While those 1000 were not present, half of them are more realistic.

Olivier from Amazon Web Services Klein was opening the conference with his keynote on Big Data and Open Source. He began with a quote from RMS: about the “Free” in Free Software referring to freedom, not price. He followed with the question of how does Big Data fit into the spirit of Free Software. He answered shortly afterwards by saying that technologies like Hadoop allow you to mess around with large data sets on commodity hardware rather than requiring you to build a heavy data center first. The talk then, although he said it would not, went into a subtle sales pitch for AWS. So we learned about AWS’ Global Infrastructure, like how well located the AWS servers are, how the AWS architecture helps you to perform your tasks, how everything in AWS is an API, etc. I wasn’t all too impressed, but then he demoed how he uses various Amazon services to analyse Twitter for certain keywords. Of course, analysing Twitter is not that impressive, but being able to do that within a few second with relatively few lines of code impressed me. I was also impressed by his demoing skills. Of course, one part of his demo failed, but he was reacting very professionally, e.g. he quickly opened a WiFi hotspot on his phone to use that as an alternative uplink. Also, he quickly grasped what was going on on his remote Amazon machine by quickly glancing over netstat and ps output.

The next talk I attended was on trans-compiling given by Andi Li. He was talking about Haxe and how it compiles to various other languages. Think Closure, Scala, and Groovy which all compile to Java bytecode. But on steroids. Haxe apparently compiles to code in another language. So Haxe is a in a sense like Emcripten or Vala, but a much more generic source-to-source compiler. He referred about the advantages and disadvantages of Haxe, but he lost me when he was saying that more abstraction is better. The examples he gave were quite impressive. I still don’t think trans-compiling is particularly useful outside the realm of academic experiments, but I’m still intrigued by the fact that you can make use of Haxe’s own language features to conveniently write programs in languages that don’t provide those features. That seems to be the origin of the tool: Flash. So unless you have a proper language with a proper stdlib, you don’t need Haxe…

From the six parallel tracks, I chose to attend the one on BDD in Mediawiki by Baochuan Lu. He started out by providing his motivation for his work. He loves Free/Libre and Open Source software, because it provides a life-long learning environment as well as a very supportive community. He is also a teacher and makes his students contribute to Free Software projects in order to get real-life experience with software development. As a professor, he said, one of his fears when starting these projects was being considered as the expertβ„’ although he doesn’t know much about Free Software development. This, he said, is shared by many professors which is why they would not consider entering the public realm of contributing to Free Software projects. But he reached out to the (Mediawiki) community and got amazing responses and an awful lot of help.
He continued by introducing to Mediawiki, which, he said, is a platform which powers many Wikimedia Foundation projects such as the Wikipedia, Wikibooks, Wikiversity, and others. One of the strategies for testing the Mediawiki is to use Selenium and Cucumber for automated tests. He introduced the basic concepts of Behaviour Driven Development (BDD), such as being short and concise in your test cases or being iterative in the test design phase. Afterwards, he showed us how his tests look like and how they run.

The after-lunch talk titled Data Transformation in Camel Style was given by Red Hat’s Roger Hui and was concerned with Apache Camel, an “Enterprise Integration” software. I had never heard of that and I am not much smarter know. From what I understood, Camel allows you to program message workflows. So depending on the content of a message, you can make it go certain ways, i.e. to a file or to an ActiveMQ queue. The second important part is data transformation. For example, if you want to change the data format from XML to JSON, you can use their tooling with a nice clicky pointy GUI to drag your messages around and route them through various translators.

From the next talk by Thomas Kuiper I learned a lot about Gandi, the domain registrar. But they do much more than that. And you can do that with a command line interface! So they are very tech savvy and enjoy having such customers, too. They really seem to be a cool company with an appropriate attitude.

The next day began with Jon’s Kernel Report. If you’re reading LWN then you haven’t missed anything. He said that the kernel grows and grows. The upcoming 4.2 kernel, probably going to be released on August 23rd. might very well be the busiest we’ve seen with the most changesets so far. The trend seems to be unstoppable. The length of the development cycle is getting shorter and shorter, currently being at around 63 days. The only thing that can delay a kernel release is Linus’ vacation… The rate of volunteer contribution is dropping from 20% as seen for 2.6.26 to about 12% in 3.10. That trend is also continuing. Another analysis he did was to look at the patches and their timezone. He found that that a third of the code comes from the Americas, that Europe contributes another third, and so does Australasia. As for Linux itself, he explained new system calls and other features of the kernel that have been added over the last year. While many things go well and probably will continue to do so, he worries about the real time Linux project. Real time, he said, was the system reacting to an external event within a bounded time. No company is supporting the real time Linux currently, he said. According to him, being a real time general purpose kernel makes Linux very attractive and if we should leverage that potential. Security is another area of concern. 2014 was the year of high profile security incidents, like various Bash and OpenSSL bugs. He expects that 2015 will be no less interesting. Also because the Kernel carries lots of old and unmaintained code. Three million lines of code haven’t been touch in at least ten years. Shellshock, he said, was in code more than 20 years old code. Also, we have a long list of motivated attackers while not having people working on making the Kernel more secure although “our users are relying on us to keep them safe in a world full of threats”

The next presentation was given by Microsoft on .NET going Open Source. She presented the .NET stack which Microsoft has open sourced at the end of last year as well as on Visual Studio. Their vision, she said, is that Visual Studio is a general purpose IDE for every app and every developer. So they have good Python and Android support, she said. A “free cross platform code editor” named Visual Studio Code exists now which is a bit more than an editor. So it does understand some languages and can help you while debugging. I tried to get more information on that Patent Grant, but she couldn’t help me much.

There was also a talk on Luwrain by Michael Pozhidaev which is GPLv3 software for blind people. It is not a screen reader but more of a framework for writing software for blind people. They provide an API that guarantees that your program will be accessible without the application programmer needing to have knowledge of accessibility technology. They haven’t had a stable release just yet, but it is expected for the end of 2015. The demo unveiled some a text oriented desktop which reads out text on the screen. Several applications already exist, including a file editor and a Twitter client. The user is able to scroll through the text by word or character which reminded of ChorusText I’ve seen at GNOME.Asia Summit earlier this year.

I had the keynote slot which allowed me to throw out my ideas for the future of the Free Software movement. I presented on GNOME and how I see that security and privacy can make a distinguishing feature of Free Software. We had an interesting discussion afterwards as to how to enable users to make security decisions without prompts. I conclude that people do care about creating usable secure software which I found very refreshing.

Both the conference and Hong Kong were great. The local team did their job pretty well and I am proud that the GNOME.Asia Summit in Hong Kong inspired them to continue doing Free Software events. I hope I can be back soon πŸ™‚

GNOME.Asia Summit 2015 in Depok, Indonesia

I have just returned from the GNOME.Asia Summit 2015 in Depok, Indonesia.

Out of the talks, the most interesting talk I have seen, I think, was the one from Iwan S. Tahari, the manager of a local shoe producer who also sponsored GNOME shoes!

Open Source Software in Shoes Industry” was the title and he talked about how his company, FANS Shoes, est 2001, would use “Open Source”. They are also a BlankOn Linux partner which seems to be a rather big thing in Indonesia. In fact, the keynote presentation earlier was on that distribution and mentioned how they try to make it easier for people of their culture to contribute to Free Software.
Anyway, the speaker went on to claim that in Indonesia, they have 82 million Internet users out of which 69 million use Facebook. But few use “Open Source”, he asserted. The machines sold ship with either Windows or DOS, he said. He said that FANS preferred FOSS because it increased their productivity, not only because of viruses (he mentioned BRONTOK.A as a pretty annoying example), but also because of the re-installation time. To re-install Windows costs about 90 minutes, he said. The average time to install Blank On (on an SSD), was 15 minutes. According to him, the install time is especially annoying for them, because they don’t have IT people on staff. He liked Blank On Linux because it comes with “all the apps” and that there is not much to install afterwards. Another advantage he mentioned is the costs. He estimated the costs of their IT landscape going Windows to be 136,57 million Rupees (12000 USD). With Blank On, it comes down to 0, he said. That money, he can now spend on a Van and a transporter scooter instead. Another feature of his GNU/Linux based system, he said, was the ability to cut the power at will without stuff breaking. Indonesia, he said, is known for frequent power cuts. He explicitly mentioned printer support to be a major pain point for them.

When they bootstrapped their Free Software usage, they first tried to do Dual Boot for their 5 employees. But it was not worth their efforts, because everybody selected Windows on boot, anyway. They then migrated the accounting manager to a GNU/Linux based operating system. And that laptop still runs the LinuxMint version 13 they installed… He mentioned that you have to migrate top down, never from bottom to top, so senior management needs to go first. Later Q&A revealed that this is because of cultural issues. The leaders need to set an example and the workers will not change unless their superiors do. Only their RnD department was hard to migrate, he said, because they need to be compatible to Corel Draw. With the help of an Indonesian Inkscape book, though, they managed to run Inkscape. The areas where they lack support is CAD (think AutoCAD), Statistics (think SPSS), Kanban information system (like iceScrum), and integration with “Computer Aided Machinery”. He also identified the lack of documentation to be a problem not only for them, but for the general uptake of Free Software in Indonesia. In order to amend the situation, they provide gifts for people writing documentation or books!

All in all, it was quite interesting to see an actual (non-computer) business running exclusively on Free Software. I had a chat with Iwan afterwards and maybe we can get GNOME shaped flip-flops in the future πŸ™‚

The next talk was given by Ahmad Haris with GNOME on an Android TV Dongle. He brought GNOME to those 30 USD TV sticks that can turn your TV into a “smart” device. He showed various commands and parameters which enable you to run Linux on these devices. For the reasons as to why put GNOME on those devices, he said, that it has a comparatively small memory footprint. I didn’t really understand the motivation, but I blame mostly myself, because I don’t even have a TV… Anyway, bringing GNOME to more platforms is good, of course, and I was happy to see that people are actively working on bringing GNOME to various hardware.

Similarly, Running GNOME on a Nexus 7 by Bin Li was presenting how he tried to make his Android tabled run GNOME. There is previous work done by VadimRutkovsky:

He gave instructions as to how to create a custom kernel for the Nexus 7 device. He also encountered some problems, such as compilations errors, and showed how he fixed them. After building the kernel, he installed Arch-Linux with the help of some scripts. This, however, turned out to not be successful, so he couldn’t run his custom Arch Linux with GNOME.
He wanted to have a tool like “ubuntu-device-flash” such that hacking on this device is much easier. Also, downloading and flashing a working image is too hard for casually hacking on it, he said.

A presentation I was not impressed by was “In-memory computing on GNU/Linux”. More and more companies, he said, would be using in-memory computing on a general operating system. Examples of products which use in-memory computing were GridGain, SAP HANA, IBM DB2, and Oracle 12c. These products, he said, allow you to make better and faster decision making and to avoid risks. He also pointed out that you won’t have breaking down hard-drives and less energy consumption. While in-memory is blazingly fast, all your data is lost when you have a power failure. The users of big data, according to him, are businesses, academics, government, or software developers. The last one surprised me, but he didn’t go into detail as to why it is useful for an ordinary developer. The benchmarks he showed were impressive. Up to hundred-fold improvements for various tests were recorded in the in-memory setting compared to the traditional on-disk setting. The methodology wasn’t comprehensive, so I am yet not convinced that the convoluted charts show anything useful. But the speaker is an academic, so I guess he’s got at least compelling arguments for his test setup. In order to build a Linux suitable for in-memory computation, they installed a regular GNU/Linux on a drive and modify the boot scripts such that the disk will be copied into a tmpfs. I am wondering though, wouldn’t it be enough to set up a very aggressive disk cache…?

I was impressed by David’s work on ChorusText. I couldn’t follow the talk, because my Indonesian wasn’t good enough. But I talked to him privately and he showed me his device which, as far as I understand, is an assistive screen reader. It has various sliders with tactile feedback to help you navigating through text with the screen reader. Apparently, he has low vision himself so he’s way better suited to tell whether this device is useful. For now, I think it’s great and I hope that it helps more people and that we can integrate it nicely into GNOME.

My own keynote went fairly well. I spent my time with explaining what I think GNOME is, why it’s good, and what it should become in the future. If you know GNOME, me, and my interests, then it doesn’t come as a surprise that I talked about the history of GNOME, how it tries to bring Free computing to everyone, and how I think security and privacy will going to matter in the future. I tried to set the tone for the conference, hoping that discussions about GNOME’s future would spark in the coffee breaks. I had some people discussing with afterwards, so I think it was successful enough.

When I went home, I saw that the Jakarta airport runs GNOME 3, but probably haven’t done that for too long, because the airport’s UX is terrible. In fact, it is one of the worst ones I’ve seen so far. I arrived at the domestic terminal, but I didn’t know which one it was, i.e. its number. There were no signs or indications that tell you in which terminal you are in. Let alone where you need to go to in order to catch your international flight. Their self-information computer system couldn’t deliver. The information desk was able to help, though. The transfer to the international terminal requires you to take a bus (fair enough), but whatever the drivers yell when they stop is not comprehensible. When you were lucky enough to get out at the right terminal, you needed to have a printed version of your ticket. I think the last time I’ve seen this was about ten years ago in Mumbai. The airport itself is big and bulky with no clear indications as to where to go. Worst of all, it doesn’t have any air conditioning. I was not sure whether I had to pay the 150000 Rupees departure tax, but again, the guy at the information desk was able to help. Although I was disappointed to learn that they won’t take a credit card, but cash only. So I drew the money out of the next ATM that wasn’t broken (I only needed three attempts). But it was good to find the non-broken ATM, because the shops wouldn’t take my credit card, either, so I already knew where to get cash from. The WiFi’s performance matches the other airport’s infrastructure well: It’s quite dirty. Because it turned out that the information the guy gave me was wrong, I invested my spare hundred somewhat thousands rupees in dough-nuts in order to help me waiting for my 2.5 hours delayed flight. But I couldn’t really enjoy the food, because the moment I sat on any bench, cockroaches began to invade the place. I think the airport hosts the dirtiest benches of all Indonesia. The good thing is, that they have toilets. With no drinkable water, but at least you can wash your hands. Fortunately, my flight was only two hours late, so I could escape relatively quickly. I’m looking forward to going back, but maybe not via CGK πŸ˜‰

All in all, many kudos to the organisers. I think this year’s edition was quite successful.

Sponsored by GNOME!

FOSDEM 2015

It’s winter again and it was clear that FOSDEM was coming. However, preparation fell through the cracks, at least for me, mainly because my personal life is fast-paced at the moment. We had a table again, and our EventsBox, which is filled with goodness to demo GNOME, made its way from Gothenburg, where I actually carried it to a couple of months ago.

Unfortunately though, we didn’t have t-shirts to sell. We do have boxes of t-shirts left, but they didn’t make it to FOSDEM :-\ So this FOSDEM didn’t generate nearly as much revenue as the last years. It’s a pity that this year’s preparation was suboptimal. I hope we can improve next year. Were able to get rid of other people’s things, though πŸ˜‰ Like last year, the SuSE people brought beer, but it was different this time. Better, even πŸ˜‰

The fact that there wasn’t as much action at our booth as last years, I could actually attend talks. I was able to see Sri and Pam talking on the Groupon incident that shook us up a couple of months ago. It was really nice to see her, because I wanted to shake hands and say thanks. She did an amazing job. Interestingly enough, she praised us, the GNOME Foundation’s Board of Directors, for working very professionally. Much better than any client she has worked with. I am surprised, because I didn’t really have the feeling we were acting as promptly as we could. You know, we’re volunteers, after all. Also, we didn’t really prepare as much as we could have which led to some things being done rather spontaneously. Anyway, I take that as a compliment and I guess that our work can’t be all too bad. The talk itself showed our side of things and, if you ask me, was painting things in a too bright light. Sure, we were successful, but I attribute much of that success to network effects and a bit of luck. I don’t think we could replicate that success easily.

GNOME’s presence at FOSDEM was not too bad though, despite the lack of shirts. We had a packed beer event and more talks by GNOMEy people. The list includes Karen‘s keynote, Benzo‘s talk on SDAPDS, and Sri‘s talk on GNOME’s impact on the Free Software ecosystem. You can find more here.

A talk that I did see was on improving the keysigning situation. I really mean to write about this some more. For now, let me just say that I am pleased to see people working on solutions. Solutions to a problem I’m not sure many people see and that I want to devote some time for explaining it, i.e. in s separate post. The gist is, that contemporary “keysigning parties” come with non-negligible costs for both, the organiser and the participant. KeySigningPartyTools were presented which intend to improve they way things are currently done. That’s already quite good as it’ll reduce the number of errors people typically make when attending such a party.

However, I think that we need to rethink keysigning. Mostly, because the state of the art is a massive SecOps fail. There is about a gazillion traps to be avoided and many things don’t actually make so much sense. For example, I am unable to comprehend why we are muttering a base16 encoded version of your 160 bit fingerprint to ourselves. Or why we must queue outside in the cold without being able to jump the queue if a single person is a bit slow, because then everybody will be terribly confused and the whole thing taking even longer. Or why we need to do everything on paper (well, I know the arguments: Your computer can be hacked, be social, yadda yadda). I did actually give a talk on rethinking the keysigning problem (slides). It’s about a project that I have only briefly mentioned here and which I should really write about in the near future. GNOME Keysign intends to be less of a SecOps fail by letting the scan a barcode and click “next”. The rest will be operations known to the user such as sending an email. No more manually comparing fingerprints. No more leaking data to the Internet about who you want to contact. No more MITM attacks against your OpenPGP installation. No more short key ids that you accidentally use or because you mistyped a letter of the fingerprint. No more editing raw Perl in order to configure your keysigning tool. The talk went surprisingly well. I actually expected the people in the security devroom to be mad when someone like me is taking their perl and their command line away. I received good questions and interesting feedback. I’ll follow up here with another post once real-life lets me get to it.

Brussels itself is a very nice city. We were lucky, I guess, because we had some sunshine when we were walking around the city. I love the plethora of restaurants. And I like that Brussels is very open and cultural. Unfortunately, the makerspace was deserted when we arrived, but it is was somewhat expected as it was daytime… I hope to return again and check it out during the night πŸ˜‰

Creative Commons Attribution-ShareAlike 3.0 Unported
This work by Muelli is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported.