Talking at ARES 2019 in Canterbury, UK

It’s conference season and I attended the International Conference on Availability, Reliability, and Security (ARES) in Canterbury, UK. (note that in the future, the link might change to something more sustainable)

A representative of the University of Kent opened the event. It is the UK’s European University, he said, with 20000 students, many of them being from other countries. He attributed that to the proximity to mainland Europe. Indeed it’s only an hour away (if you don’t have to go back to London to catch a direct Eurostar rather than one that stops in, say, Ashford). The conference was fairly international, too, with 230 participants from 33 countries. As an academic conference, they care about the “acceptance rate” which, in this case, was 20.75%. Of course, he could have mentioned any number, because it’s impossible to verify.

The opening keynote was given by Alistair MacWilson from Bletchley Park. Yeah, the same Bletchley Park which Alan Turing worked at. He talked about the importance of academia in closing the cybersecurity talent gap. He said that the shortfall of people with cybersecurity skills is 3.3M, with 380k in Europe alone and APAC being desperately short of 2.1M professionals. All that is good news for us youngsters in the business, but not so good, he said, if you rely on the security of your IT infrastructure… It’s not getting any better, he said, considering that the number of connected devices and the complexity of our infrastructure are rising. You might think, he said, that highly technical skills are required to perform cybersecurity tasks. But he mentioned that 88% of the security problems that the Global 5000 companies have stem from human factors. Inadequate and unfocussed training paired with insufficient resources contribute to that problem, he said. So if you don’t get continuous training then you will fall behind with your skill-set.

There were many remarkable talks and the papers can be found online, albeit behind a paywall. But I expect SciHub to have copies and authors to be willing to share their work if you ask. Anyway, one talk I remember was about delivering Value Added Services to electric vehicle charging. They said that it is currently not very attractive for commercial operators to provide charging stations, because the margin is low. Hence, additional monetisation in the form of Value Added Services (VAS) could be added. They were thinking of updating the software of the vehicle while it is charging. I am not convinced that updating the car’s firmware makes a good VAS but I’m not an economist and what do I know about the world of electric vehicles. Anyway, their proposal to add VAS to the communication protocol might be justified, but their scenario of delivering software updates over that channel seems like a lost opportunity to me. Software updates are currently the most successful approach to protecting users, so it seems warranted to have an update protocol rather than a VAS protocol for electric vehicles.

My own talk was about using the context and provenance of USB-borne events (illegal public copy) to mitigate attacks via that channel. The general idea, known to readers of my blog, is to take the state of the session into account when dealing with events stemming from USB devices. More precisely, when your session is locked, don’t automatically load drivers for a new USB device. Your session is locked, after all. You’re not using your machine and cannot insert a new device. Hence, the likelihood of someone else maliciously inserting a device is higher than when your session is unlocked. Of course, that’s only a heuristic and some will argue that they frequently plug devices into their machine when it’s locked. Fair enough. I argue that we need to be sensitive and change as little as possible about the user’s way of working with the machine to get high acceptance rates. Hence, we need to be careful when devices like keyboards are inserted. Another scenario is the new network card that has been attached via USB. It should be more suspicious to accept a nameserver that came from the new network card’s DHCP server when the system has a perfectly working network configuration (and the DHCP response did not contain a default gateway). It turns out that those attacks are being mounted right now in real life and we have yet to find defences that we can deploy on a large scale.
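
To make the two heuristics a bit more tangible, here is a minimal sketch in Python. It is not the code from the talk or from GNOME. The names `UsbEvent`, `Decision`, `decide`, and `accept_nameserver` are made up for illustration, and a real implementation would hook into the kernel's device authorisation and the network manager rather than pure functions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # bind drivers right away
    DEFER = "defer"  # hold the device back until the session is unlocked


@dataclass(frozen=True)
class UsbEvent:
    # Hypothetical description of a newly attached device.
    device_class: str     # e.g. "keyboard", "storage", "network"
    session_locked: bool  # state of the user's session when the device appeared


def decide(event: UsbEvent) -> Decision:
    """Heuristic: a device that appears while the session is locked is
    more likely to have been inserted by someone else, so defer it."""
    if event.session_locked:
        return Decision.DEFER
    return Decision.ALLOW


def accept_nameserver(from_new_usb_nic: bool,
                      has_working_network: bool,
                      offer_has_default_gateway: bool) -> bool:
    """Heuristic: be suspicious of a nameserver offered via a freshly
    attached USB network card when the network already works and the
    DHCP offer carries no default gateway."""
    suspicious = (from_new_usb_nic
                  and has_working_network
                  and not offer_has_default_gateway)
    return not suspicious
```

With this sketch, `decide(UsbEvent("keyboard", session_locked=True))` defers the keyboard, while the same keyboard is allowed once the session is unlocked. The point of keeping the rules this small is that the user's usual workflow stays untouched in the common case.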

It’s been a nice event, even though the sandwiches for lunch got boring after a few days ;-) I am happy to have met researchers from other areas and I hope to stay in touch.

Speaking at FOSDEM 2019 in Brussels, Belgium

This year I spoke at FOSDEM again. It became sort of a tradition to visit Brussels in winter and although I was tempted to break with the tradition, I came again.

I had two talks at this year’s FOSDEM, both in the Security track. One on my work with Ludovico on protecting against rogue USB devices and another one on tracking users with core Internet protocols. We got a bigger room this year, but it was still packed. Despite the projector issues, which seem to be appearing more often recently, the talks went well. The audience was very engaged and we had a lively discussion in the hallway. In fact, the discussion was extremely fruitful because we were told about work in similar areas which we ought to check out.

For our USB talk I thought I’d set the mindset first and explain how GNOME thinks it should interact with the user. That is, the less interaction is required, the better it is. Especially for a security system where the user may not know what to do. In fact, we try to just make it work™ without the user having to do anything. That is vastly different from what other projects are doing. In particular, Kaspersky wants you to enter a PIN when attaching a new keyboard and the USBGuard dialogue is not necessarily suitable for our users.


In the talk on Internet protocols I mainly showed that optimisations regarding the latency need to be balanced against the privacy needs of the users. In order to reduce latency you usually share state with the other end, which tends to be indicated through some form of token or cookie. And because you have this shared state, the server can tell you apart from other clients. What you can try to do is to not send the token or cookie in the first place. Of course, then you lose the optimisation. It turns out, however, that TLS 1.3 can be as fast, i.e. 1 round trip, and that the latency is not better or worse if you resume a previous session. Note how I talk about latency only and ignore other aspects such as CPU cycles spent for the connection establishment. Another strategy is to not send the token unencrypted. With TLS 1.2 the Session Ticket is sent without any form of encryption which enables a network-based attacker to see your token and correlate your requests. The same is true for other optimisations such as TCP Fast Open. I have also presented our approach to balancing privacy and latency, namely a patched WolfSSL and Linux. With these patched versions we send the TCP Fast Open cookie via TLS such that the attacker cannot see it when we request it.
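
To illustrate why a token sent in the clear is a problem, here is a small Python sketch of what a passive observer could do. It is an assumption-laden toy: the observations are plain `(source_ip, token)` pairs that I made up, and `link_by_token` is a hypothetical helper, not part of any measurement tool.

```python
from collections import defaultdict


def link_by_token(observations):
    """Group observed connections by the cleartext token they carried.

    `observations` is an iterable of (source_ip, token) pairs, where
    token is None if the client sent no resumption token. Returns only
    those tokens that were seen on more than one connection, i.e. the
    ones that allow an observer to link requests to each other.
    """
    groups = defaultdict(list)
    for source_ip, token in observations:
        if token is not None:
            groups[token].append(source_ip)
    return {token: ips for token, ips in groups.items() if len(ips) > 1}


# Made-up example: the same client shows up from two networks.
observations = [
    ("198.51.100.7", "ticket-A"),  # at home
    ("203.0.113.9", "ticket-A"),   # later, on a different network
    ("192.0.2.1", None),           # a client that sends no token
]
linked = link_by_token(observations)
```

In this example the two connections carrying `ticket-A` end up in one group even though the IP address changed. The client that sends no token cannot be linked this way, which is exactly the trade-off against the latency gain.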

The conference was super busy and I was super busy with talking to people. It’s amazing how fast time flies when you are engaged in interesting discussions. I bumped from one person into another and then it was already time for dinner. The one talk I’ve seen was done by my colleague on preventing cryptographic misuse of libraries. More precisely, an attempt to provide sane APIs which make shooting yourself in the foot very hard.

Talking at PETCon2018 in Hamburg, Germany and OpenPGP Email Summit in Brussels, Belgium

Just like last year, I managed to be invited to the Privacy Enhancing Technologies Conference to talk about GNOME. First, Simone Fischer-Huebner from Karlstad University talked about her projects which are on the edge of security, cryptography, and usability, which I find a fascinating area to be in. She presented outcomes of her Prismacloud project which also involves fancy YouTube videos…

I got to talk about how I believe GNOME is in a good position to make a safe and secure operating system. I presented some case studies and reported on the challenges that I see. For example, Simone mentioned in her talk that certain users don’t trust software if it is too simple. Security stuff must be hard, right?! So how do you measure the success of your security solution? Obviously you can test with users, but certain things are just very hard to get users for. For example, testing GNOME Keysign requires a user not only with a set up MUA but also with a configured GnuPG. This is not easy to come by. The discussions were fruitful and I got sent a few references that might be useful in determining a way forward.

OpenPGP Email Summit

I also attended the OpenPGP Email Summit in Brussels a few weeks ago. It’s been a tiny event graciously hosted by a local company. Others have written reports, too, which are highly interesting to read.

It’s been an intense weekend with lots of chatting, thinking, and discussing. The sessions were organised in a bar-camp style manner. That is, someone proposed what to discuss and the interested parties then came together. My interest was in visual security indication, as triggered by this story. Unfortunately, I was lured away by another interesting session about keyserver and GDPR compliance which ran in parallel.

For the plenary session, Holger Krekel reported on the current state of Delta.Chat. If you haven’t tried it yet, give it a go. It’s trying to provide an instant messaging interface with an email transport. I’ve used this for a while now and my experience is mixed. I still get the occasional email I cannot decrypt and interop with my other MUA listening on the very same mailbox is hit and miss. Sometimes, the other MUA snatches the email before Delta.Chat sees it, I think. Otherwise, I like the idea very much. Oh, and of course, it implements Autocrypt, so your clients automatically encrypt the messages.

Continuing the previous talk, Azul went on to talk about countermitm, an attempt to overcome Autocrypt 1.0’s weaknesses. This is important work, because without the vision of how to go from Autocrypt Level 1 to Level 2, you may very well question its usefulness. As of now, emails are encrypted along their way (well, assuming MTA-STS) and if you care about not storing plain text messages in your mailbox, you could encrypt them already now. Defending against active attackers is hard so having sort of a plan is great. Anyway, countermitm defines “verified groups” which involves a protocol to be run via email. I think I’ve mentioned earlier that I still think that it’s a bit sad that we don’t have the necessary interfaces to run protocols over email. Outlook, I think, can do simple stuff like voting for one of many options or retracting an email. I would want my key exchange to be automated further, i.e. when GNOME Keysign sends the encrypted signature, I would want the recipient to decrypt it and send it back.

Phil Zimmermann, the father of PGP, mentioned a few issues he sees with the spec, although he also said that it’s been a while since he was deeply into this matter. He wanted the spec to be more modern and more aggressively pushing for today’s cryptography rather than for the crypto of the past. And in fact, he wants the crypto of tomorrow. Now. He said that we know that big agencies are storing messages today for later analysis. And we currently have no good way of having what people call “perfect forward secrecy” so a future key compromise makes the messages of today readable. He wants post quantum crypto to defeat the prying eyes. I wonder whether anybody has implemented pq-schemes for GnuPG, or any other OpenPGP implementation, yet.

My takeaways are: The keyserver network needs a replacement. Currently, it is used for initial key discovery, key updates, and revocations. I think we can solve some of these problems better if we separate them. For example, revocations are pretty much a fire and forget thing whereas other key updates are not necessarily interesting in twenty years from now. Many approaches for making initial key discovery work have been proposed. WKD, Autocrypt, DANE, Keybase, etc. Eventually one of these approaches wins the race. If not, we can still resort back to a (plain) list of Email addresses and their key ids. That’s as good or bad as the current situation. For updates, the situation is maybe not as bad. But we might still want to investigate how to prevent equivocation.

Another big thing was deprecating cruft in the spec to move a bit faster in terms of cryptography and to allow implementers to get a compliant program running (more) quickly. Smaller topics were the use of PQ-safe algorithms and the exploitation of backwards incompatible changes to the spec, i.e. v5 keys with full fingerprints. Interestingly enough, a trimmed down spec had already been developed here.

Speaking at FIfFKon 18 in Berlin, Germany

I was invited to be a panellist at this year’s FIfFKon in Berlin, Germany. While I said hi to the people at All Systems Go!, my main objective in Berlin was to attend the annual conference of the FIfF, the association for people in computing caring about peace and social responsibility.

The most interesting talk for me was held by Rainer Mühlhoff on the incapacitation of the user. The claim, very broadly speaking, is that providing a usable interface prevents your users from learning how to operate the machine properly. Or in other words: Making an interface for dumb people will attract dumb people and not make them smarter. Of course, he was more elaborate than that.

He presented Android P which nudges the user into a certain behaviour. In Android, you get to see for how long you have used an app and it encourages you to stop. Likewise, Google nudges you into providing your phone number for account recovery. The design of that dialogue makes it hard to hit the button to proceed without providing the number. Those nudges do not prevent a choice from being made, they just make it more likely that the user makes one particular choice. The techniques are borrowed from public policy making and commercial settings. So the users are being made an instrument themselves rather than a sovereign entity.

Half way through his talk he made a bit of a switch to “sealed interfaces” and presented the user interface of a vacuum cleaner. In the beginning, the nozzle had a “bristly” or “flat” setting, depending on whether you wanted to use it on a carpet or a flat surface. Nowadays, the pictogram does not show the nozzle any more, but rather the surface you want to operate on. Similarly, microwave ovens do not show the two levers for wattage and time any more, but rather full recipes like pizza, curry, or fish.
The user is prevented from understanding the device in its mechanical details and from using it as an instrument based on what it does. Instead the interaction is centred on the end purpose rather than on using the device as a tool to achieve this end. The commercialisation of products numbs people down in their thinking. We are going from “Don’t make me think” to “Can you do the thinking for me” as, he said, we can see with the newer Android interfaces which try to know already what you intend to do.

Eventually, you adapt the technology to the human rather than adapting the human to the technology. And while this is correct, he says, and it has gotten us very far, it is wrong from a social theory point of view. Mainly because it suggests that it’s a one-way process whereas it really is an interdependency. Because the interaction with technology forms habits and shapes how the user experiences the machine. Imagine, he said, getting a 2018 smartphone in 1995. Back in the day, you probably could not have made sense of it. The industrial user experience design is a product of numbing users down.

A highly interesting talk that got me thinking a little whether we ought to teach the user the inner workings of software systems.

The panel I was invited for had the topic “More privacy for smart phones – will the GDPR get us a new break through?” and we were discussing with a corporate representative and other people working in data protection. I was there in my capacity as a Free Software representative and as someone who was working on privacy enhancing technologies. I used my opportunities to praise Free Software and claim that many problems we were discussing would not exist if we consistently used Free Software. The audience was quite engaged and asked a lot of questions. Including the ever popular point of *having* to use WhatsApp, Signal, or any of those proprietary products, because of the network effect, and they demanded more regulation. I cautioned against that call for various reasons and mentioned that the freedom to choose the software to run has not yet fully been exploited. Afterwards, some projects presented themselves. It was an interesting mix of academic and actual project work. The list is on the conference page.

Talking at GPN 2018 in Karlsruhe, Germany

Similar to last year I managed to attend the Gulasch Programmier-Nacht (GPN) in Karlsruhe, Germany. Not only did I attend, I also managed to squeeze in a talk about PrivacyScore. We got the prime time slot on the opening day along with all the other relevant talks, including the Eurovision Song Contest, so we were not overly surprised that the audience had a hard time deciding where to go and eventually decided to attend talks which were not recorded. Our talk was recorded and is available here.

Given the tough selection of the audience by the other talks, we had the people who were really interested. And that showed during the official Q&A as well as in the hallway track. We exchanged contacts with other interested parties and got a few excellent comments on the project.

Another excellent part of this year’s GPN was the exhibition in the museum. As GPN takes place in a joint building belonging to the local media university as well as the superb art and media museum, the proximity to the artsy things allows for an interesting combination. This year, the open codes exhibition was not only hosted in the ZKM, but GPN also took place in that exhibition. A fantastic setup. Especially with the GPN’s motto being “digital naïves”. One of the exhibition’s pieces is an assembly robot’s hand doing nothing else but writing a manifesto. Much like a disciplinary action for a school child. Except that the robot doesn’t care so much. Yet, its usefulness extends only to writing these manifestos. And the robot doesn’t learn anything from it. I like this piece, because it makes me think about the actions we take hoping that they have a desired effect on something or someone but we actually don’t know whether this is indeed the case.

I also like the Critical Engineering Manifesto being exhibited. I like to think about how the people who actually implement certain technologies can be held responsible for the effects of them on individuals or society. Especially with more and more “IoT” deployments where the “S” represents their security. It’s easy to blame Facebook for “leaking” user profiles although it’s in their Terms of Service, but it’s harder to shift the blame for the smart milk sensor in your fridge invading my privacy by reporting how much I consume. We will have interesting times ahead of us.

An exhibit pointing out the beauty of algorithms and computation is a board that renders a Julia Set. That wouldn’t be so impressive in itself, but you can watch the machine actually compute the values. The exhibit has a user controllable speed regulator and an insight into the CPU as well as the higher level code. I think it’s just an ingenious idea to enable the user to go full speed and see the captivating movements of the beautiful Julia set while also allowing them to go super slow to investigate how this beauty is composed of relatively simple operations. Also, the slow execution itself is relatively boring. We get to see that we have to go very fast in order to be entertained. So fast that we cannot really comprehend what is going on.
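
To give an idea of how simple those operations are, here is a rough Python sketch of the usual escape-time computation for a Julia set. I don't know what the exhibit actually runs, so take the function names, the viewport, and the parameter `c` as my own assumptions. The escape radius of 2 is the common choice and holds as long as `c` itself has an absolute value of at most 2.

```python
def escape_time(z, c, max_iter=50):
    """Iterate z -> z*z + c and return the step at which |z| exceeds 2,
    or max_iter if the point does not escape within that many steps."""
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(c, width=60, height=24, max_iter=50):
    """Return the Julia set for parameter c as rows of ASCII art.
    Points that never escaped are drawn as '#', all others as blanks."""
    rows = []
    for y in range(height):
        im = 1.2 - 2.4 * y / (height - 1)       # imaginary axis, top to bottom
        row = ""
        for x in range(width):
            re = -1.8 + 3.6 * x / (width - 1)   # real axis, left to right
            steps = escape_time(complex(re, im), c, max_iter)
            row += "#" if steps == max_iter else " "
        rows.append(row)
    return rows
```

Printing `"\n".join(render(complex(-0.8, 0.156)))` gives a crude picture on a terminal. The whole thing boils down to one multiplication, one addition, and one comparison per step, repeated a very large number of times.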

I wholeheartedly recommend visiting this exhibition. And the GPN, of course, too. It’s a nice chaotic event with a particular flair. It’s getting more and more crowded though, so better go while the feeling lasts and doesn’t get drowned by all the tourists.

Talking on PrivacyScore at DFN Security Conference 2018 in Hamburg, Germany

I seem to have skipped last year, but otherwise I have been to the DFN Workshop regularly. While I had a publication at this venue before, it’s only this year that I got to give a talk at the conference.

I cannot comment on the other talks so much, because I could not attend too many :( But our talk (slides) was well visited and I think people appreciated the presentation being a bit lighter than the previous one about the upcoming GDPR.

I talked about PrivacyScore.org and how we’ve measured German universities. The paper is here. Our results were mixed. As for TLS deployment, with a lot of imagination we can see a line dividing Germany. The West seems to have fewer problems with their TLS deployment than the East. The more red an area is, the worse its TLS support is. That ranges from not offering TLS at all to having an invalid certificate or using broken parameters.

As for tracking their users, we had the hypothesis that privately run institutions have a higher interest in tracking their users than publicly run institutions. The following graphic reflects the geographic distribution of trackers on German universities’ Web sites.
That hypothesis can be confirmed by looking at the PrivacyScore list that distinguishes between those institutions.

We found data that was very likely not meant to be there, such as database dumps or Git repositories of the Web site’s code (including passwords for their staging environments, etc.). We tried to report these issues to the Web site operators, but it was difficult to get hold of the responsible people. For the 21 leaks we found I have 93 emails in my mailbox. Ideally, the 21 I sent off would have been enough. But even sending those emails is hard, because people don’t respect RFC 2142 and don’t have a security@ address. Eventually, we made the Internet a tiny bit more secure by having those Web site operators remove the leaks from their Web site, but there are still some pages which have (supposedly) unwanted information such as their visitors’ IP addresses online. The graph below shows that most of the operators who reacted did so in the first few days. So management of security incidents seems to be an area for improvement.

I hope to be able to return next year, if only for the catering ;-) Then, I better attend some more talks and chat with the other guests.

Speaking at FOSDEM 2018 in Brussels, Belgium

As in the last ten (or so) years I attended FOSDEM, the biggest European Free Software event. This year, though, I went a day earlier to attend one of the fringe events, the CHAOSSCon.

I hadn’t taken notice of the Linux Foundation announcing CHAOSS, an attempt to bundle various efforts regarding measuring and creating metrics of Open Source projects. The CHAOSS community is thus a bunch of formerly separate projects now having one umbrella.

OpenStack’s Ildiko Vancsa opened the conference by saying that metrics are what drive our understanding of communities and that we’re all interested in numbers. That helps us to understand how projects work and make a more educated guess how healthy a project currently is, and, more importantly, what needs to be done in order to make it more sustainable. She also said that two communities within the CHAOSS project exist: the Metrics team and the Software team. The Metrics team cares about what information should be extracted and how that can be presented in an informative manner. The Software team implements the extraction parts and makes the analytics. She pointed the audience to the Wiki which hosts more information.

Georg Link from the metrics team then continued saying that health cannot universally be determined as every project is different and needs a different perspective. The metrics team does not work at answering the health question for each and every project, but rather enables such conclusions to be drawn by providing the necessary infrastructure. They want to provide facts, not opinions.

Jesus from Bitergia and Harish from Red Hat were talking on behalf of the technical team. Their idea is to build a platform to understand how software is developed. The core projects are prospector, cregit, ghdata, and grimoire, they said.

I think that we in the GNOME community can use data to make more informed decisions. For example, right now we’re fading out our Bugzilla instance and we don’t really have any way to measure how successful we are. In fact, we don’t even know what it would mean to be successful. But by looking at data we might get a better feeling of what we are interested in and what metric we need to refine to express better what we want to know. Then we can evaluate measures by looking at the development of the metrics over time. Spontaneously, I can think of these relatively simple questions: How much review do our patches get? How many stale wiki links do we have? How soon are security issues being dealt with? Do people contribute to the wiki, documentation, or translations before creating code? Where do people contribute when coding stalls?

Bitergia’s Daniel reported on Diversity and Inclusion in CHAOSS and he said he is building a bridge between the metrics and the software team. He tried to produce data of how many women were contributing what. Especially, whether they would do any technical work. Questions they want to answer include whether minorities take more time to contribute or what impact programs like the GNOME Outreach Program for Women have. They do need to code up the relevant metrics but intend to be ready for the next OpenStack Gender diversity report.

Bitergia’s CEO talked about the state of the GrimoireLab suite.
It’s a software development analysis toolkit built largely on Python, ElasticSearch, and Kibana. One year ago it was still complicated to run the stack, he said. Now it’s easy and organisations like the Document Foundation run a public instance. Also because they want to be as transparent as possible, he said.

Yousef from Mozilla’s Open Innovation team then showed how they make use of Grimoire to investigate the state of their community. They ingest data from GitHub, Bugzilla, newsgroups, meetups, Discourse, IRC, Stack Overflow, their wiki, Rust crates, and a few other things reaching back as far as 20 years. Quite impressive. One of the graphs he found interesting was one showing commits by time zone. He commented that it was not as diverse as he had hoped as there were still many US time zones and much fewer Asian ones.

Raymond from the Linux Foundation talked about Metrics in Open Source Communities, what are they measuring and what do they do with the data. Measuring things is not too complicated, he said. But then you actually need to do stuff with it. Certain things are simply hard to measure, he said. As an example he gave the level of user or community support people give. Another interesting aspect he mentioned is that it may be a very good thing when numbers go down, also because projects may follow a hype cycle, too. And if your numbers drop, it’ll eventually get to a more mature phase, he said. He closed with a quote he liked and noted that he’s not necessarily making fun of senior management: Not everything that can be counted counts and not everything that counts can be counted.

Boris then talked about Crossminer, which is a European funded research project. They aim for improving the management of software projects by providing in-context recommendations and analytics. It’s a continuation of the Ossmeter project. He said that such projects usually die after the funding runs out. He said that the Crossminer project wants to be sustainable and survive the post-funding state by building an actual community around the software the project is developing. He presented a rather high level overview of what they are doing and what their software tries to achieve. Essentially, it’s an Eclipse plugin which gives you recommendations. The time was too short for going into the details of how they actually do it, I suppose.

Eleni talked about merging identities. When tapping various data sources, you have to deal with people having different identity domains. You may want to merge the identities belonging to the same person, she said. She gave a few examples of what can go wrong when trying to merge identities. One of them is that some identities do not represent humans but rather bots. Commonly used labels are a problem, she said. She referred to email address prefixes which may very well be the same for different people, think j.wright@apple.com, j.wright@gmail.com, j.wright@amazon.com. They have at least 13 different problems, she said, and the impact of wrongly merging identities can be to either underestimate or overestimate the number of community members. Manual inspection is required, at least so far, she said.
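
The prefix problem is easy to reproduce. The following Python sketch is my own toy illustration, not how GrimoireLab or any of the presented tools merge identities. Both function names are hypothetical. It contrasts merging on the part before the `@` with merging on the full address.

```python
from collections import defaultdict


def merge_by_local_part(addresses):
    """Naive merge: treat everything before the '@' as the identity.
    Different people with the same prefix collapse into one identity."""
    groups = defaultdict(set)
    for addr in addresses:
        local, _, _domain = addr.lower().partition("@")
        groups[local].add(addr.lower())
    return dict(groups)


def merge_by_full_address(addresses):
    """Conservative merge: only identical addresses (ignoring case)
    are considered the same identity."""
    groups = defaultdict(set)
    for addr in addresses:
        groups[addr.lower()].add(addr.lower())
    return dict(groups)


addresses = ["j.wright@apple.com", "j.wright@gmail.com", "j.wright@amazon.com"]
naive = merge_by_local_part(addresses)
conservative = merge_by_full_address(addresses)
```

The naive variant reports a single community member for the three addresses, the conservative one reports three. Neither is necessarily right: if all three addresses belong to the same person, the conservative count overestimates. That is presumably why manual inspection is still needed.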

The next two days were then dedicated to FOSDEM which had a Privacy Devroom. There I had a talk on PrivacyScore.org (slides). I had 25 minutes which I was overusing a little bit. I’m not used to these rather short slots. You just warm up talking and then the time is already up. Anyway, we had very interesting discussions afterwards with a few suggestions regarding new tests. For example, someone mentioned that detecting a CDN might be worthwhile given that CloudFlare allegedly terminates 10% of today’s Web traffic.

When sitting with friends we noticed that FOSDEM felt a bit like Christmas for us: Nobody really cares a lot about Christmas itself, but rather about the people coming together to spend time with each other. The younger people are excited about the presents (or the talks, in this case), but it’s just a matter of time for that to change.

It’s been an intense yet refreshing weekend and I’m looking very much forward to coming back next time. For some reason it feels really good to see so many people caring about Free Software.

Talking at GI Tracking Workshop in Darmstadt, Germany

Uh, I almost forgot about blogging about having talked at the GI Tracking Workshop in Darmstadt, Germany. The GI is, literally translated, the “informatics society” and sort of a union of academics in the field of computer science (oh boy, I’ll probably get beaten up for that description). And within that body several working groups exist. And one of these groups working on privacy organised this workshop about tracking on the Web.

I consider “workshop” a bit of a misnomer for this event, because it was mainly talks with a panel at the end. I was an invited panellist for representing the Free Software movement contrasting a guy from affili.net, someone from eTracker.com, a lady from eyeo (the AdBlock Plus people), and professors representing academia. During the panel discussion I tried to focus on Free Software being the only tool to enable the user to exercise control over what data is being sent in order to control tracking. Nobody really disagreed, which made the discussion a bit boring for me. Maybe I should have tried to find another more controversial argument to make people say more interesting things. Then again, it’s probably more the job of the moderator to make the participants discuss heatedly. Anyway, we had a nice hour or so of talking about the future of tracking, not only on the Web, but in our lives.

One of the speakers was Lars Konzelmann who works at Saxony’s data protection office. He talked about the legislative nature of data protection issues. The GDPR is, although being almost two years old, a thing now. Several types of EU-wide regulations exist, he said. One is “Regulation” and the other is “Directive”. The GDPR has been designed as a Regulation, because the EU wanted to keep a minimum level of quality across the EU and prevent countries from implementing their own legislation with rather lax rules, he said. The GDPR favours “privacy by design” but that has issues, he said, as the usability aspects are severe. Because so far, companies can get the user’s “informed consent” in order to do pretty much anything they want. Although its usefulness is limited, he said, because people generally don’t understand what they are consenting to. But with the GDPR, companies should implement privacy by design which will probably obsolete the option for users to simply click “agree”, he said. So things will somehow get harder to agree to. That, in turn, may cause people to be unhappy and feel that they are being patronised and being told what they should do, rather than expressing their free will with a simple click of a button.

Next up was a guy with their solution against tracking on the Web. They sell a little box through which you surf the Web, similar to what Pi Hole provides. It’s a Raspberry Pi with a modified (and likely GPL infringing) version of Raspbian which you plug into your network and use as a gateway. I assume that the device then filters your network traffic to exclude known bad trackers. Anyway, he said that ads are only the tip of the iceberg. Below that is your more private, intimate sphere, which is being pried into by advertising companies doing real time bidding for your screen real estate. Why would that be a problem, you ask? He said that companies apply dynamic pricing depending on your profile and that you might well be interested in knowing that you are being treated worse than other people. Other examples include a worse credit or health rating depending on where you browse or because your bank knows that you’re a gambler. In fact, micro targeting allows for building up a political profile of you or for making identity theft much easier. He then went on to explain how Web tracking actually works. He mentioned third party cookies, “social” plugins (think: Like button), advertisements, and content providers like Google Maps, Twitter, YouTube, that kind of thing, as means to track you. He also said that it’s possible to do non-invasive customer recognition which does not involve writing anything to the user’s disk, e.g. no cookies. In fact, such fingerprinting of the user’s browser is the new thing, he said. He probably knows, because he is also in the business of providing a tracker. That’s probably how he knows that “data management providers” (DMP) merge data sets of different trackers to get a more complete picture of the entity behind a tracking code. DMPs enrich their profiles by trading them with other DMPs. In order to match IDs, the tracker sends some code that makes the user’s browser merge the tracking IDs, e.g. make it send all IDs to all the trackers.
He wasn’t really advertising his product, but during Q&A he was asked what we can do against that tracking behaviour and then he was forced to praise his product…

Eye/o’s legal counsel Judith Nink then talked about the legal aspects of blocking advertisements. She explained why people use adblockers in the first place. I commented on that before, claiming that using an adblocker improves your security. She did indeed mention privacy and security being reasons for people to run adblockers and explicitly mentioned malvertising. She said that the Jerusalem Post had ads which were actually malware. That in turn caused a stir in Germany, because it was described as an attack on the German parliament… But other reasons for running an adblocker were data consumption and the speed of loading Web pages, she said. And, of course, the simple annoyance of certain advertisements. She presented some studies which showed that the typical Web site has 50+ or so trackers and that the costs of downloading advertising were significant compared to downloading the actual content. She then showed a statement by Edward Snowden saying that using an ad-blocker is not only a right but a duty:

Everybody should be running adblock software, if only from a safety perspective

Browser based ad blockers need external filter lists, she said. The discussion then turned towards the legality of blocking ads. I wasn’t aware that it’s a thing that law people discuss. How can it possibly not be legal to control what my client does when being fed a bunch of HTML and JavaScript..? Turns out that it’s more about the entity offering these lists and a program to evaluate them *shrug*. Anyway, ad-blockers use either blocking or hiding of elements, she said, where “blocking” is to stop the browser from issuing the request in the first place while “hiding” is to issue the request, but to then hide the DOM element. Yeah, law people make exactly this distinction. She then turned to the question of how legal either of these behaviours is. To the non-German folks that question may seem silly. And I tend to agree. But apparently, you cannot simply distribute software which modifies a browser to either block requests or hide DOM elements without getting sued by publishers. Those, she said, argue that gratis content can only be delivered along with ads and that this is part of the deal with the customer: they transfer ads along with the actual content. If you think that this is an insane argument, especially in light of the customer not having had the ability to review that deal before loading that page, you’re in good company. She argued that the simple act of loading a page cannot be a statement of consent, let alone be a deal of some sort. In order to make it a deal, the publishers would have to show their terms of service first, before showing anything, she said. Anyway, eye/o’s business is to provide those filter lists and a browser plugin to make use of those lists. If you pay them, however, they think twice before blocking your content and make exceptions. That feels a bit mafiaesque and so they were sued for “aggressive geschäftliche Handlung”, i.e. “aggressive commercial behaviour”.
I found the history of cases interesting, but I’ll spare the reader the details here. You can follow that, and other cases, by looking at OLG Koeln 6U149/15.
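To make the “blocking” versus “hiding” distinction from the talk a bit more tangible, here is a minimal sketch of how I imagine the two behaviours. The filter list format, the host name, the selector, and both function names are made up for illustration; real filter lists use a much richer syntax and real ad-blockers hook into browser extension APIs instead.

```javascript
// Hypothetical, heavily simplified filter list (not the real list syntax).
const filterList = {
  blockedHosts: ["ads.example.com"], // requests to these hosts are never issued
  hiddenSelectors: [".ad-banner"],   // matching elements are loaded, then hidden
};

// "Blocking": decide before the request is issued, so nothing is downloaded.
function shouldBlockRequest(requestUrl, list) {
  const host = new URL(requestUrl).hostname;
  return list.blockedHosts.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}

// "Hiding": the request already happened; emit CSS that hides the element.
function hidingStylesheet(list) {
  return list.hiddenSelectors
    .map((selector) => `${selector} { display: none !important; }`)
    .join("\n");
}

console.log(shouldBlockRequest("https://ads.example.com/banner.js", filterList));
console.log(hidingStylesheet(filterList));
```

In the first case the publisher’s ad server never sees a request, in the second case it does, which is presumably why the lawyers care about the difference.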

Next up was Dominik Herrmann to present on PrivacyScore.org, a Web portal for scanning Web sites for security and privacy issues. It is similar to other scanners, he said, but the focus of PrivacyScore is publicity. By making results public, he hopes that a race to the top will occur. Web site operators might feel more inclined to implement certain privacy or security mechanisms if they know that they are the only Web site which doesn’t protect the privacy of their users. Similarly, users might opt to use a Web site providing a more privacy friendly service. With the public portal you can create lists in order to create public benchmarks. I took the liberty of creating a list of Free Desktop environments. At the time of creation, GNOME fell behind many others, because the mail server did not implement TLS 1.2. I hope that is being taken as a motivational factor to make things more secure.

Talking at PET-CON 2017.2 in Hamburg, Germany

A few weeks ago, I was fortunate enough to talk at the 7th Privacy Enhancing Techniques Conference (PET-CON 2017.2) in Hamburg, Germany. It’s a teeny tiny academic event with a dozen or so experts in the field of privacy.

The talks were quite technical, involving things like machine learning over logs or secure multi-party computation. I talked about how I think that the best technical solution does not necessarily enable people to be more private, simply because they might not be able to make use of the tool properly. That concern is generally shared in the academic community. Yet, the methodology to create and assess the effectiveness of a design is not well developed. I guess we need to invest more brain power into creating models, metrics, and tools for enabling people to do safer computing.

So I’m happy to have gone and to have had the opportunity of discussing the issues I’m seeing. Likewise, I find it very interesting to see where people are currently headed.

Taint Tracking for Chromium

I forgot to blog about one of my projects. I had actually already talked about it more than one year ago and we had a paper at USENIX Security.

Essentially, we built a protection against DOM-based Cross-site Scripting (DOMXSS) into Chromium. We did that by detecting whenever potentially attacker provided strings become JavaScript code. To that end, we made the HTML rendering engine (WebKit/Blink) and the JavaScript engine taint aware. That is, we identified sources of values that an attacker could control (think window.name) and marked all strings coming from those sources as tainted. Then, during parsing of JavaScript, we check whether the string to be compiled is actually tainted. If that is indeed the case, then we abort the compilation.

That description is a bit simplified. For example, not compiling code because it contains some fragments of the URL would break a substantial number of Web sites. It’s an unfortunate fact that many Web sites either eval code containing parts of the URL or do a document.write with a string containing parts of the URL. The URL, in our attacker model, can be controlled by the attacker. So we must be more clever about aborting compilation. The idea was to only allow literals in JavaScript (like true, false, numbers, or strings) to be compiled, but not “code”. So if a tainted (sub)string compiles to a string: fine. If, however, we compile a tainted string to a function call or an operation, then we abort. Let me give an example of an allowed compilation and a disallowed one.


<HTML>

<TITLE>Welcome!</TITLE>
Hi

<SCRIPT>
var pos=document.URL.indexOf("name=")+5;
document.write(document.URL.substring(pos,document.URL.length));
</SCRIPT>

<BR>
Welcome to our system

</HTML>

This example is from the original report on DOM-based XSS. You see that nothing bad will happen when you open http://www.vulnerable.site/welcome.html?name=Joe. However, opening http://www.vulnerable.site/welcome.html?name=<script>alert(document.cookie)</script> will lead to attacker provided code being executed in the victim’s context. Even worse, when opening with a hash (#) instead of a question mark (?) then the server will not even see the payload, because Web browsers do not transmit the fragment as part of their request.

“Why does that happen?”, you may ask. We see that the document.write call got fed a string derived from the URL. The URL is assumed to be provided by the attacker. The string is then used to create new DOM elements. In the good case, it’s only a simple text node, representing text to be rendered. That’s a perfectly legit use case and we must, unfortunately, allow that sort of usage. I say unfortunately, because using these APIs is inherently insecure. The alternative is to use createElement and friends to properly inject DOM nodes. But that requires considerably more effort than using document.write. Coming back to the security problem: In the bad case, a script element is created with attacker provided contents. That is very bad, because now the attacker controls your browser. So we must prevent the attacker provided code from execution.
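To illustrate the “createElement and friends” alternative mentioned above, here is a minimal sketch of how the welcome page could render the name without going through the HTML parser. The function name and its parameters are my own invention for this example; `doc` is assumed to be a DOM Document such as `window.document`.

```javascript
// Hypothetical helper; `doc` is assumed to be a DOM Document.
function writeGreeting(doc, pageUrl) {
  // Same source as in the vulnerable page: the attacker-controllable URL.
  const name = new URL(pageUrl).searchParams.get("name") || "";
  // createTextNode never invokes the HTML parser, so any markup in `name`
  // ends up as visible text instead of becoming new elements.
  const textNode = doc.createTextNode(name);
  doc.body.appendChild(textNode);
  return textNode;
}
```

In a browser you would call it as `writeGreeting(document, document.URL)`. Even with a script tag in the name parameter, the page would merely display the payload as text.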

You see, tracking the taint information is a non-trivial effort: it must be carried across newly created DOM nodes and multiple passes of JavaScript (think eval(eval(eval(tainted_string)))). We must also track the taint information not on the full string, but on each character in order to not break existing Web applications. For example, if you first concatenate with a tainted string and then remove all tainted characters, the string should not be marked as tainted. This non-trivial effort manifests itself in the over 15000 Lines of Code we patched Chromium with to provide protection against DOM-based XSS. These patches, as indicated, create, track, propagate, and evaluate taint information. Also, the compilation of JavaScript has been modified to adhere to the policy that tainted strings must only compile to literals. Other policies are certainly possible and might actually increase protection (or increase compatibility without sacrificing security). So not only WebKit (Blink) needed to be patched, but also V8, the JavaScript engine.

These patches add to the logic and must be executed in order to protect the user. Thus, they take time on the CPU and add to the memory consumption. Especially the way the taint information is stored could blow up the memory required to store a string by 100%. We found, however, that the overhead incurred was not as big as that of other solutions proposed by academia. We measured the execution speed of various browsers under various benchmarks and found that we are still faster than, say, Firefox or Opera. We concluded that our patched version added 23% runtime overhead compared to the unpatched version.

[Figure: xss-runtime]
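To give an idea of what per-character taint tracking and the “tainted strings must only compile to literals” policy mean, here is a toy model in plain JavaScript. It has nothing to do with the actual patches, which live in C++ inside Blink and V8. All names are hypothetical, and the policy check only understands single- and double-quoted string literals: no numbers, booleans, comments, or template literals.

```javascript
// Toy model of per-character taint tracking. All names are hypothetical
// and unrelated to the actual Chromium/V8 patches.
class TaintedString {
  constructor(text, taint) {
    this.text = text;   // the characters
    this.taint = taint; // one boolean per UTF-16 code unit of `text`
  }
  // A string from a trusted origin, e.g. a literal in the page's own script.
  static clean(text) {
    return new TaintedString(text, new Array(text.length).fill(false));
  }
  // A string from an attacker-controllable source, e.g. the URL.
  static fromSource(text) {
    return new TaintedString(text, new Array(text.length).fill(true));
  }
  concat(other) {
    return new TaintedString(this.text + other.text, this.taint.concat(other.taint));
  }
  slice(start, end) {
    return new TaintedString(this.text.slice(start, end), this.taint.slice(start, end));
  }
  isTainted() {
    return this.taint.includes(true);
  }
}

// Policy sketch: tainted characters may only form the inside of a string literal.
function mayCompile(code) {
  let quote = null; // delimiter of the literal we are currently inside, or null
  for (let i = 0; i < code.text.length; i++) {
    const ch = code.text[i];
    const tainted = code.taint[i];
    if (quote === null) {
      if (tainted) return false;                 // tainted character would become code
      if (ch === '"' || ch === "'") quote = ch;  // a string literal starts
    } else if (tainted && (ch === quote || ch === "\\")) {
      return false;                              // attacker tries to leave the literal
    } else if (ch === "\\") {
      i++;                                       // skip the escaped character
    } else if (ch === quote) {
      quote = null;                              // the string literal ends
    }
  }
  return true;
}

const prefix = TaintedString.clean('var name = "');
const suffix = TaintedString.clean('";');
const harmless = prefix.concat(TaintedString.fromSource("Joe")).concat(suffix);
const attack = prefix.concat(TaintedString.fromSource('"; alert(1); "')).concat(suffix);
console.log(mayCompile(harmless)); // true
console.log(mayCompile(attack));   // false
// Cutting the tainted characters away again yields an untainted string.
console.log(harmless.slice(0, prefix.text.length).isTainted()); // false
```

Keeping one flag per character is what allows the last line to report an untainted string, at the cost of the extra memory mentioned above.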

As for compatibility, we crawled the Alexa Top 10000 and observed how often our protection mechanism stopped execution. Every blocked script counts towards the incompatibility, because we assume that our browser was not under attack when crawling. That methodology is certainly not perfect, because only shallowly crawling front pages does not actually indicate how broken the actual Web app is. To compensate, we used the WebKit rendering tests, hoping that they cover most of the important functionality. Our results indicate that scripts from 26 of the 10000 domains were blocked. Out of those, 18 were actually vulnerable to DOM-based XSS; their execution was blocked because a code fragment like the following is actually indistinguishable from a real attack. Unfortunately, such scripts are quite common :( They are used mostly by ad distribution networks and are really dangerous. So using an AdBlocker is certainly an increase in security.


var location_parts = window.location.hash.substring(1).split('|');
var rand = location_parts[0];
var scriptsrc = decodeURIComponent(location_parts[1]);
document.write("<scr"+"ipt src='" + scriptsrc + "'></scr"+"ipt>");

Modifying WebKit for the Web parts and V8 for the JavaScript parts to be taint aware was certainly a challenge. I had neither seriously programmed in C++ before nor looked much into compilers. So modifying Chromium, the big beast, was not an easy task for me. Besides those handicaps, there were technical challenges, too, which I didn’t think of when I started to work on a solution. For example, hash tables (or hash sets) with tainted strings as keys behave differently from those with untainted strings. At least they should. Except when they should not! They should not behave differently when it’s about querying for DOM elements. If you create a DOM element from a tainted string, you should be able to find it again with an untainted string. But when it comes to looking up a string in a cache, we certainly want to have the taint information preserved. I hence needed to inspect each and every hash table for its usage of tainted or untainted strings. I haven’t found them all, as WebKit’s (extensive) Layout tests still showed some minor rendering differences. But it seems to work well enough.

As for the protection capabilities of our approach, we measured 100% protection against DOM-based XSS. That sounds impressive, right? Our measurements were two-fold. We extended the already mentioned Layout Tests with some more DOM-XSS test cases, and we used real-life vulnerabilities. To find those, we took the reports the patched Chromium generated when crawling the Web (the crawl mentioned above for finding compatibility problems) and automatically crafted exploits from them. We then verified that the exploits do indeed work. With 757 of the top 10000 domains, the number of exploitable domains was quite high. But our approach might not add more protection than the already existing built-in mechanism, the XSS Auditor, which might protect against those attacks already. So we ran the stock browser against the exploits and checked how many of those were successful. The XSS Auditor protected about 28% of the exploitable domains. Our taint tracking based solution, as already mentioned, protected against 100%. That number is not very surprising, because we used the very same codebase to find the vulnerabilities. But we couldn’t do any better, because there is no other source of known DOM-based XSS vulnerabilities…

You could, however, trick the mechanism by using indirect flows. An example of such an indirect data flow is the following piece of code:


// Explicit flow: Taint propagates
var value1 = tainted_value === "shibboleth" ? tainted_value : "";
// Implicit flow: Taint does not propagate
var value2 = tainted_value === "shibboleth" ? "shibboleth" : "";

If a Web site contains such code, then we cannot protect against exploitation. At least not easily.

For future work in the Web context, the approach presented here can be made compatible with server-side taint tracking to persist taint information beyond the lifetime of a Web page. A server-side Web application could transmit taint information for the strings it sends so that the client could mark those strings as tainted. Following that idea it should be possible to defeat other types of XSS. Other areas of work are the representation of information about the data flows in order to help developers to secure their applications. We already receive a report in the form of structured information about the blocked code generation. If that information was enriched and presented in an appealing way, application developers could use that to understand why their application is vulnerable and when it is secure. In a similar vein, witness inputs need to be generated for a malicious data flow in order to assert that code is vulnerable. If these witness inputs were generated live while browsing a Web site, a developer could more easily assess the severity and address the issues arising from DOM-based XSS.