I don’t know about you, but I have a bad habit of coding my problems away. I find it more natural to experiment until I `stumble` on the correct solution than to take the time to study the problem in depth before opening an editor.

If you haven’t read the Day-0 post, please check it out.

From time to time, I force myself to take a step back from the code to get a broader view of the problem ahead. Don’t get me wrong: being able to code is a powerful tool that unlocks vast amounts of data that would otherwise take months or years to compile.

So I have set myself a rule: Reach for code only when you need to answer a question or confirm a hypothesis.

Laying down the issue in muggle terms

(I know that “No-Maj” is the PC term)

So what is MQTT exactly?

I don’t want to bore you with the Wikipedia definition (Message Queuing Telemetry Transport).

So here is the TL;DR version:
MQTT is a protocol that basically has 2 types of entities:

  • Client: Publishes data to, and/or subscribes to, a number of “topics”. A topic is simply a channel of communication that enables different clients to exchange “coherent” data without much overhead.
  • Broker: A centralized server that receives messages and forwards them to the connected clients. It maintains a list of topics and of the clients interested in each topic.
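To make the broker’s role concrete, here is a toy, in-memory sketch of what it does with topics: it remembers which client subscribed to which topic filter and forwards each published message to every matching client. The wildcard rules (`+` matches exactly one level, `#` matches the parent level and everything below it, and wildcard-first filters skip `$`-prefixed topics) come from the MQTT specification. `TinyBroker`, `subscribe`, `publish` and `inboxes` are hypothetical names. A real broker such as Mosquitto also handles sessions, QoS and retained messages.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` under MQTT wildcard rules."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Filters starting with a wildcard must not match topics beginning with '$' (e.g. $SYS).
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and any levels below it.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False  # '+' matches exactly one level; anything else must be equal
    return len(filter_levels) == len(topic_levels)


class TinyBroker:
    """Hypothetical toy broker: remembers subscriptions and fans out publishes."""

    def __init__(self) -> None:
        self.subscriptions = defaultdict(set)  # topic filter -> subscribed client ids
        self.inboxes = defaultdict(list)       # client id -> [(topic, payload), ...]

    def subscribe(self, client_id: str, topic_filter: str) -> None:
        self.subscriptions[topic_filter].add(client_id)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver to every client with a matching filter; return the recipient count."""
        recipients = set()
        for topic_filter, clients in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                recipients |= clients
        # Overlapping filters still yield a single copy per client in this sketch.
        for client_id in sorted(recipients):
            self.inboxes[client_id].append((topic, payload))
        return len(recipients)
```

For example, if a "thermostat" client subscribes to `home/+/temperature` and a "logger" client subscribes to `home/#`, publishing to `home/kitchen/temperature` reaches both. Publishing to `home/door` only reaches the logger.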

The simplicity of the architecture and of the messages, and the fact that MQTT runs directly on top of TCP/IP (no HTTP overhead), make it ideal for a number of use cases:

  • Messaging: Having the notion of “topic” built in makes MQTT great for messaging applications. Facebook Messenger used to rely on MQTT.
  • Distributed applications: Where every piece of the application is running on a different machine and efficient communication is required.
  • IoT applications: MQTT has been pushed by IBM, Eclipse and other vendors with a foothold in the IoT ecosystem as “THE protocol for IoT”. Its lightweight nature and simple implementation have spread its use both in industry and among hobbyists.
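The “no HTTP overhead” point is easy to check by building, by hand, the first packet a client sends: CONNECT. Below is a minimal sketch that assumes MQTT 3.1.1 (protocol level 4), a clean session, no credentials and no will message. The client id "probe" and the function names are arbitrary choices for illustration.

```python
import struct


def encode_remaining_length(length: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80
        out.append(byte)
        if length == 0:
            return bytes(out)


def encode_string(value: str) -> bytes:
    """MQTT UTF-8 string: 2-byte big-endian length prefix followed by the bytes."""
    data = value.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id: str, keep_alive: int = 60) -> bytes:
    """Minimal MQTT 3.1.1 CONNECT packet: clean session, no will, no credentials."""
    variable_header = (
        encode_string("MQTT")             # protocol name
        + bytes([0x04])                   # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                   # connect flags: clean session only
        + struct.pack("!H", keep_alive)   # keep-alive in seconds
    )
    body = variable_header + encode_string(client_id)
    # Fixed header: packet type 1 (CONNECT) in the high nibble, then the body length.
    return bytes([0x10]) + encode_remaining_length(len(body)) + body
```

`build_connect("probe")` is 19 bytes in total, and it is written straight to a TCP socket (1883 is the conventional port for unencrypted MQTT). There are no request line and no headers, as there would be with HTTP.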

This study focuses on IoT applications, as they often collect and expose users’ personal data on an unprecedented scale. Many IoT applications also have actuators, whose misuse can have severe implications for the user’s well-being.

Exploiting exposed IoT devices that rely on a C&C (command and control) scenario could have severe consequences for users and for the industry.

Area of focus

It’s fun to talk about general definitions and abstract concepts, but what are we focusing on exactly?

As said before, MQTT clients need a server (broker) to communicate. Setting up such a server at production level, with guaranteed scalability and availability, is a challenging task.
Luckily, a number of people have set up open and free (as in free eggnog) brokers for testing purposes.

My hypothesis is that people are using these servers for more than testing. The goal is to check whether this holds, and to see to what extent it can impact users and vendors.

Someone was nice enough to compile a list of public brokers: some are completely open, while others are free but require an account.

I ran some tests to check the popularity of the public ones.

The Eclipse and Mosquitto brokers seem to be the most active ones, so that’s where I will focus my efforts. I will, however, re-run these tests on a regular basis to see whether their popularity changes.
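One plausible way to measure a public broker’s activity is to subscribe to the multi-level wildcard `#`, which matches every non-`$` topic, and count incoming PUBLISH packets per topic over a fixed window. The sketch below does this with the standard library only. It assumes MQTT 3.1.1 over plain TCP on port 1883. The function names, the client id "popularity-probe", the 5-second socket timeout and the 30-second default window are all illustrative assumptions, not necessarily how the tests above were run.

```python
import socket
import struct
import time
from collections import Counter


def encode_string(value: str) -> bytes:
    """MQTT string: 2-byte big-endian length prefix, then UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def packet(first_byte: int, body: bytes) -> bytes:
    """Prepend the fixed header: packet type/flags byte + variable-length size."""
    length, size = len(body), bytearray()
    while True:
        byte, length = length % 128, length // 128
        size.append(byte | (0x80 if length else 0))
        if not length:
            return bytes([first_byte]) + bytes(size) + body


def build_connect(client_id: str) -> bytes:
    # Protocol "MQTT", level 4 (3.1.1), clean session, 60 s keep-alive, no credentials.
    return packet(0x10, encode_string("MQTT") + b"\x04\x02" + struct.pack("!H", 60)
                  + encode_string(client_id))


def build_subscribe(topic_filter: str, packet_id: int = 1) -> bytes:
    # SUBSCRIBE uses reserved flags 0b0010; requested QoS 0 for the single filter.
    return packet(0x82, struct.pack("!H", packet_id) + encode_string(topic_filter) + b"\x00")


def read_packet(read) -> tuple:
    """Read one packet via `read(n)`, which must return exactly n bytes."""
    first = read(1)[0]
    multiplier, length = 1, 0
    while True:
        byte = read(1)[0]
        length += (byte & 0x7F) * multiplier
        if not byte & 0x80:
            return first, read(length)
        multiplier *= 128


def publish_topic(body: bytes) -> str:
    """The topic name is the first field of a PUBLISH variable header."""
    (size,) = struct.unpack("!H", body[:2])
    return body[2:2 + size].decode("utf-8", errors="replace")


def probe(host: str, port: int = 1883, seconds: float = 30.0) -> Counter:
    """Subscribe to '#' and count PUBLISH packets per topic for up to `seconds`."""
    counts = Counter()
    with socket.create_connection((host, port), timeout=5) as sock:
        def read_exact(n: int) -> bytes:
            buf = b""
            while len(buf) < n:
                chunk = sock.recv(n - len(buf))
                if not chunk:
                    raise ConnectionError("broker closed the connection")
                buf += chunk
            return buf

        sock.sendall(build_connect("popularity-probe"))
        first, body = read_packet(read_exact)
        if first != 0x20 or body[1] != 0:  # expect CONNACK with return code 0
            raise ConnectionError(f"connection refused: {body!r}")
        sock.sendall(build_subscribe("#"))
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            try:
                first, body = read_packet(read_exact)
            except socket.timeout:
                break  # no traffic for 5 s: treat the window as finished
            if first >> 4 == 3:  # PUBLISH (low nibble holds DUP/QoS/RETAIN flags)
                counts[publish_topic(body)] += 1
    return counts
```

Two summary numbers come out of the result: `sum(counts.values())` gives the messages seen in the window, and `len(counts)` gives the number of distinct active topics. These make it easy to compare brokers run after run. Keep two caveats in mind:

- A fresh `#` subscription first receives every retained message, which inflates the start of the count.
- The sketch never sends PINGREQ, so windows much longer than the 60-second keep-alive may get the connection dropped.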

Why is this a thing?

Setting up an MQTT broker is a fairly straightforward process, and most implementations support TLS, authentication and authorization.
However, a large number of people don’t want to bother with a (commercial) secure server, or with setting up their own infrastructure.
This is the case we will be focusing on.
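In the unauthenticated case, a broker’s reply to a CONNECT without credentials already shows whether it is open. Below is a minimal sketch that assumes MQTT 3.1.1 on the plain-TCP port 1883 (TLS listeners are out of scope here). The return codes are the ones defined by the 3.1.1 specification. The function names, the client id "probe" and the wording of the verdicts are hypothetical.

```python
import socket

# Pre-built MQTT 3.1.1 CONNECT: protocol "MQTT", level 4, clean session,
# 60 s keep-alive, client id "probe", and no username/password flags.
ANONYMOUS_CONNECT = b"\x10\x11\x00\x04MQTT\x04\x02\x00\x3c\x00\x05probe"

# CONNACK return codes defined by MQTT 3.1.1.
CONNACK_CODES = {
    0: "connection accepted",
    1: "unacceptable protocol version",
    2: "identifier rejected",
    3: "server unavailable",
    4: "bad user name or password",
    5: "not authorized",
}


def classify_connack(reply: bytes) -> str:
    """Interpret the CONNACK a broker sends back to a credential-less CONNECT."""
    if len(reply) != 4 or reply[0] != 0x20 or reply[1] != 0x02:
        raise ValueError(f"not a CONNACK packet: {reply!r}")
    code = reply[3]
    if code == 0:
        return "open: anonymous clients accepted"
    if code in (4, 5):
        return "protected: credentials required"
    return f"refused: {CONNACK_CODES.get(code, 'unknown return code')}"


def check_anonymous(host: str, port: int = 1883, timeout: float = 5.0) -> str:
    """Connect without credentials over plain TCP and classify the broker's answer."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(ANONYMOUS_CONNECT)
        reply = b""
        while len(reply) < 4:
            chunk = sock.recv(4 - len(reply))
            if not chunk:
                break  # broker closed the connection before a full CONNACK
            reply += chunk
        if reply[3:4] == b"\x00":
            sock.sendall(b"\xe0\x00")  # DISCONNECT politely once we were let in
    return classify_connack(reply)
```

Parsing the CONNACK is kept separate from the network call so the classification can be checked without a live broker.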

Technical talk

On the technical side of things, I’m still comparing alternatives for data storage and processing. Dealing with this amount of data is new to me, so it is still somewhat of an open question.

I have started using a simple Elasticsearch client (esta), but it doesn’t seem to be full-featured.

I’m switching my efforts to the official client.
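Whatever client ends up being used, storing captured messages in volume is typically done through Elasticsearch’s bulk API rather than with one request per message. The sketch below builds the newline-delimited JSON body that this API expects. It assumes each captured message is a `(broker, topic, payload)` tuple. The index name "mqtt-messages" and the document fields (`broker`, `topic`, `topic_root`, `payload`, `captured_at`) are hypothetical choices, not an existing schema.

```python
import json
from datetime import datetime, timezone


def to_bulk_body(index: str, messages) -> str:
    """Turn (broker, topic, payload) tuples into an Elasticsearch _bulk NDJSON body."""
    lines = []
    for broker, topic, payload in messages:
        # Action line: index a new document into `index` with an auto-generated id.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself; binary payloads are decoded leniently.
        lines.append(json.dumps({
            "broker": broker,
            "topic": topic,
            "topic_root": topic.split("/")[0],
            "payload": payload.decode("utf-8", errors="replace"),
            "captured_at": datetime.now(timezone.utc).isoformat(),
        }))
    # The bulk API expects newline-delimited JSON that ends with a final newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`. The official clients also ship bulk helpers that build this body for you. Storing a `topic_root` field makes it cheap to aggregate activity per application or vendor prefix.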

Also, since the blog is starting to gain momentum after a period of inactivity, I’m working on porting everything to Jekyll, including the current theme and some of the plugins. This should bring a major performance boost.

As always, if you have any questions or remarks, leave a comment below!

Until next time,