I attended this year’s mrmcd, a cozy conference in Darmstadt, Germany. As in previous years, it was a 350-person event with a relaxed atmosphere. I really enjoy going to these mid-size events with a decent selection of talks and attentive guests.
The conference was opened by Paolo Ferri’s keynote. He is from the ESA and gave a very entertaining talk about the Rosetta mission. He mentioned the challenges involved in launching a rocket for a mission to be executed ten years later. It was very interesting to see what they have achieved over a distance of a few hundred million kilometers. Now I want to become a space pilot, too 😉
The next talk was on fitness trackers. It turns out that these devices may actually track you and hence pose a risk to your privacy. Apparently fraud is another issue for insurance companies in the US, because some offer better rates when you upload your fitness data. That makes fitness trackers an interesting target both for people who want to manipulate their walking statistics to get a better health care premium and for attackers who want to harm someone by changing their statistics.
Concretely, the speaker explained, these devices use Bluetooth 4 (Smart), which allows anyone to see the device. In addition, service discovery is turned on, which allows anyone to query the device. Usually, he said, no PIN is needed anymore to connect to the device. He tested several devices with regard to several aspects, such as authentication, what data is stored, what is sent to the Internet, and what security mechanisms the accompanying phone apps deploy. Among the tested devices were the Xiaomi MiBand, the Fitbit, and the Huawei TalkBand B1. The MiBand set a good example by disabling discovery once someone has connected to the device. It also saves the MAC address of the phone and ignores others. In order to investigate the data sent between a phone and a band, they disassembled the Android applications.
Muzy was telling a fairytale about a big data lake gone bad.
He said that data lakes are storage for not necessarily structured data, from which certain features can be extracted on demand, and that the processed data eventually ends up in a data warehouse in a much more structured form. According to him, data scientists then have unlimited access to that data. That poses a problem, and in order to secure the data, he proposed introducing another layer of authorization to determine whether data scientists are allowed to access certain records. That is a bit different from what exists today: encrypting data at rest and encrypting it in motion. He claimed that current approaches do not solve the actual problems because of, e.g., key management questions. However, user rights management and user authorization are currently emerging, he said.
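To make the idea of an extra authorization layer a bit more tangible, here is a minimal sketch of a record-level check that sits between the data scientist and the lake. Everything in it is my own assumption for illustration: the `Record` type, the sensitivity labels, and the `authorized_records` helper are hypothetical and not something shown in the talk.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most sensitive.
CLEARANCE = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Record:
    record_id: int
    sensitivity: str  # assumed label attached when the record enters the lake
    payload: dict


def authorized_records(records, user_clearance):
    """Return only the records the user's clearance level permits."""
    level = CLEARANCE[user_clearance]
    return [r for r in records if CLEARANCE[r.sensitivity] <= level]


lake = [
    Record(1, "public", {"station": "Darmstadt Hbf"}),
    Record(2, "internal", {"steps": 9500}),
    Record(3, "restricted", {"heart_rate": 72}),
]

# A data scientist with "internal" clearance never sees record 3.
visible = authorized_records(lake, "internal")
print([r.record_id for r in visible])
```

The point of the sketch is only that access is decided per record and per user, rather than handing over the whole lake once the data is decrypted.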
Later, he spoke about Apache Spark. With big data, he said, you need to adapt to a new programming paradigm, moving away from a single worker to multiple nodes, which means splitting up work and handling errors and slow tasks. MapReduce, he said, is one such programming model. A popular framework for writing in such a paradigm is Apache’s Hadoop, but there are more. He presented Apache Spark. It only begins to make sense if you want to analyse more data than fits in your RAM, he said. Spark distributes data for you and executes operations on it in parallel, so you don’t need to care about all of that. However, not all applications are a nice fit for Spark, he mentioned. He gave high-performance weather computations as an example. In general, Spark fits well if inter-process communication (IPC) is not required.
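For readers who have not seen the map/reduce model before, here is a small single-process sketch of it in plain Python. It is not Spark or Hadoop code and the function names are my own; it only shows the three steps (map, shuffle, reduce) that such frameworks distribute across nodes for you.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


chunks = ["the train is late", "the train is on time"]
counts = word_count(chunks)
print(counts)
```

Because every chunk can be mapped independently and every key can be reduced independently, the work needs no communication between workers apart from the shuffle, which is the property that makes a job a good fit for this model.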
The conference then continued with two very interesting talks on Bahn APIs. derf presented on public transport APIs like EFA, HAFAS, and IRIS. These APIs can do things like routing from A to B or answer questions such as which trains are running from a given station. However, these APIs are hardly documented. The IRIS system is the internal Bahn API, which is probably not supposed to be publicly available, but there is a Web page which exposes bits of the API. Others have used that to build similar, even fancier things. Anyway, he used these APIs to query for trains running late. The results were insightful and entertaining, but have not been released to the general public. However, the speakers presented a way to query all trains in Germany. Long story short: they use the Zugradar, which also contains the geo coordinates. They acquired 160 million datasets over the last year, which amount to 80 GB of JSON. They have made their database available through an ElasticSearch and Kibana interface. The code is on GitHub. That is really, really good stuff. I’m already in the process of building an ElasticSearch and Spark cluster to munch on that data.
Yours truly also gave a talk. I was speaking on GNOME Keysign. Because the CCC people know how to run a great conference, we already have recordings (torrent). You get the slides here. Those of you who know me won’t find the content surprising. To all others: GNOME Keysign is a tool for signing OpenPGP keys. New features include the capability to sign keys offline, that is, you present a file with a key and you have it signed following best practices.
Another talk I gave, this time with a colleague of mine, was on Searchable Encryption. Again, the video already exists. The slides are probably less funny than they were during the presentation, but hopefully still informative enough to make some sense of. Together we presented various existing cryptographic schemes which allow a third party to execute search operations on your encrypted data on your behalf. The most interesting schemes we showed were those by Song, Wagner, and Perrig and by Cash et al.
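To give a rough feeling for what "searching on encrypted data" means, here is a deliberately simplified toy in Python. It is not the Song, Wagner, and Perrig scheme, nor the one by Cash et al.: it merely builds an index of keyed keyword tokens (HMAC-SHA256) so that a server can match a search token without learning the keyword. The function names are my own, document encryption is left out, and the toy leaks search and access patterns, so please do not use it for anything real.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def trapdoor(key: bytes, keyword: str) -> bytes:
    """Client side: derive the search token for a keyword with the secret key."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: map each keyword token to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[trapdoor(key, word)].add(doc_id)
    return dict(index)


def server_search(index: dict, token: bytes) -> set:
    """Server side: look up a token; the server never sees the keyword itself."""
    return index.get(token, set())


key = secrets.token_bytes(32)  # stays with the client
documents = {
    "doc1": "keysign party in darmstadt",
    "doc2": "train delays in germany",
}
index = build_index(key, documents)  # this is what gets uploaded

hits = server_search(index, trapdoor(key, "darmstadt"))
print(hits)
```

The server only ever handles opaque tokens and document IDs. Someone without the key cannot produce a valid token for a keyword, but the server still learns which documents match a repeated query, which is exactly the kind of leakage the real schemes try to limit.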
Thanks again to the organisers for this nice event! I’m looking forward to coming back next year.