Monitoring (all of) the internet
Mediatoolkit is a 24/7 media monitoring and analytics tool that gathers information from 100+ million online sources. It notifies users in real time whenever their company, competitors, key people from their organisations or any other keyword they decide to track is mentioned online.
Getting access to online sources…
To make the tool more scalable and able to reach more online sources, Mediatoolkit tasked us with building a new system architecture based on microservices. Throughout the project, our developers worked as full-time extensions of Mediatoolkit’s in-house development team, with the goal of making the design, build and transition to the new architecture as streamlined and seamless as possible.
The new structure was based around microservices. Yes, it needed to be built from scratch, but in terms of maintenance, scalability and overall future-proofing, it was clearly the optimal approach.
Bornfight Project Manager
Gateway to the entire internet
Maximizing coverage is Mediatoolkit’s key objective: they want to monitor as many online sources as is technically possible. To achieve that, we created the Gateway, a system of microservices that gives Mediatoolkit access to millions of websites while respecting each site’s own guidelines and limitations, such as its robots.txt rules. We also designed the Gateway for extreme horizontal scaling: capacity grows by running more instances of a module rather than by upgrading individual machines, so the system can keep pace with Mediatoolkit’s continuous growth.
Embracing the microservices architecture
Microservices are the go-to option when the goal is a system whose parts can be developed, deployed, maintained and scaled independently, which is why we built the entire Mediatoolkit Gateway around them. One of the biggest benefits of this approach is that a new component or feature can be implemented by any team, in any language or technology, as long as it follows the interfaces agreed between services. That flexibility is extremely valuable from a business perspective, especially for a system as large as Mediatoolkit.
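Since services only meet at their interfaces, the real contract between them is the message format, not a shared codebase. The sketch below shows what such a contract might look like. The `FetchRequest` type and its fields are hypothetical, not Mediatoolkit’s actual schema. The point is that any service able to produce and parse this JSON, whether written in Python, Go or Java, can take part.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FetchRequest:
    """Hypothetical message exchanged between Gateway modules."""
    request_id: str
    url: str
    priority: int = 0


def encode(request: FetchRequest) -> bytes:
    # Plain UTF-8 JSON with sorted keys: a service written in another
    # language can produce and read exactly the same bytes.
    return json.dumps(asdict(request), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> FetchRequest:
    # Rebuild the typed message; missing or unexpected fields raise TypeError.
    return FetchRequest(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    wire = encode(FetchRequest(request_id="req-1", url="https://example.com/news"))
    print(wire.decode("utf-8"))
    print(decode(wire))
```

Keeping the schema this small and explicit is what allows a component to be rewritten in a different language without touching its neighbours.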
When you're developing such an enormous system, your focus needs to be on its architecture — what tech to choose, how the elements will communicate, how it will scale... Creating a system that monitors one website is easy, but monitoring millions of sources in real time is a whole other ballgame.
Bornfight Tech Lead
6 key modules revolving around Apache Kafka
We built the entire Gateway as six modules that communicate with each other through Apache Kafka®, a distributed event-streaming platform that serves as the system’s data-transfer backbone.
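Kafka splits each topic into partitions. Messages with the same key always go to the same partition, and within a consumer group each partition is read by exactly one consumer, so adding consumers spreads the work. The sketch below simulates that behaviour in memory, with no broker involved. It assumes, hypothetically, a six-partition topic keyed by hostname, which would keep every request for a given site on one worker. It uses MD5 where Kafka’s default partitioner uses murmur2, and a plain round-robin assignment.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

NUM_PARTITIONS = 6  # hypothetical partition count for a fetch-request topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the message key. Kafka's default partitioner uses murmur2,
    # not MD5, but the property that matters is the same:
    # equal keys always land in the same partition.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    # Round-robin assignment within one consumer group: every partition gets
    # exactly one owner, so adding consumers spreads the partitions thinner.
    owners: Dict[str, List[int]] = defaultdict(list)
    for partition in range(num_partitions):
        owners[consumers[partition % len(consumers)]].append(partition)
    return dict(owners)


if __name__ == "__main__":
    for url in ["https://example.com/a", "https://example.com/b", "https://example.org/x"]:
        host = urlsplit(url).hostname or ""
        print(f"{url} -> partition {partition_for(host)}")
    print(assign_partitions(NUM_PARTITIONS, ["wall-e-1", "wall-e-2"]))
    print(assign_partitions(NUM_PARTITIONS, ["wall-e-1", "wall-e-2", "wall-e-3"]))
```

With two consumers each one owns three partitions. With three consumers each owns two. This is the mechanism that lets a Kafka-centred design scale horizontally.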
Module 01 / codename Kryten:
This module stands at the entrance to the Gateway. Its task is to accept incoming requests, e.g. a request to fetch content from a specific URL.
Module 02 / codename Holly:
This core component evaluates each request against the relevant robots.txt rules and ensures that those rules are followed.
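Python’s standard `urllib.robotparser` shows what such a check involves. It is used here only as a stand-in, because Holly’s real implementation is not described. The robots.txt content and the `ExampleBot` user agent below are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; in the Gateway this would be supplied by Mr. Robot.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    # parse() takes the file as a list of lines and marks the rules as loaded,
    # so can_fetch() works without any network access.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def check_request(rules: RobotFileParser, user_agent: str, url: str):
    # Returns (is this URL allowed?, crawl delay in seconds or None).
    return rules.can_fetch(user_agent, url), rules.crawl_delay(user_agent)


if __name__ == "__main__":
    rules = build_rules(SAMPLE_ROBOTS_TXT)
    for url in ["https://example.com/news/today", "https://example.com/private/report"]:
        print(url, check_request(rules, "ExampleBot", url))
```

The first URL is allowed and the second is blocked by the `/private/` rule. Both report the 5-second crawl delay.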
Module 03 / codename WALL-E:
This worker module’s task is to actually gather content from a specific server and return it to the system.
Module 04 / codename EVE:
Because the system runs numerous WALL-E instances, the EVE module decides which of them should perform a given task.
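The source does not say how EVE picks a worker. One common policy is "least loaded": send each task to the worker with the fewest tasks in progress. The sketch below implements that policy. The class and the worker IDs are hypothetical.

```python
from typing import Dict


class LeastLoadedDispatcher:
    """Hypothetical EVE-style dispatcher: send each task to the least busy WALL-E."""

    def __init__(self) -> None:
        self.in_flight: Dict[str, int] = {}  # worker id -> tasks currently assigned

    def register(self, worker: str) -> None:
        self.in_flight.setdefault(worker, 0)

    def dispatch(self) -> str:
        if not self.in_flight:
            raise RuntimeError("no WALL-E workers registered")
        # Fewest in-flight tasks wins; ties go to the alphabetically first id
        # so the choice is deterministic.
        worker = min(self.in_flight, key=lambda w: (self.in_flight[w], w))
        self.in_flight[worker] += 1
        return worker

    def complete(self, worker: str) -> None:
        # Called when a worker reports a finished (or failed) task.
        self.in_flight[worker] -= 1


if __name__ == "__main__":
    eve = LeastLoadedDispatcher()
    for name in ["wall-e-1", "wall-e-2", "wall-e-3"]:
        eve.register(name)
    print([eve.dispatch() for _ in range(4)])
    # → ['wall-e-1', 'wall-e-2', 'wall-e-3', 'wall-e-1']
```

Unlike plain round-robin, a least-loaded policy adapts when some fetches take much longer than others, for example with slow servers.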
Module 05 / codename R2D2:
This module is in charge of autonomous recovery and its main task is to detect failed requests by monitoring WALL-E modules’ outputs.
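Below is a minimal sketch of the detect-and-recover step R2D2 performs. It assumes, hypothetically, that WALL-E emits one result record per attempt. It also assumes that a timeout or a 429/5xx status counts as a retryable failure and that a request is abandoned after three attempts. The record layout and the policy are illustrative, not Mediatoolkit’s.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}  # hypothetical retry policy
MAX_ATTEMPTS = 3


@dataclass(frozen=True)
class FetchResult:
    """Hypothetical output record emitted by a WALL-E worker."""
    request_id: str
    url: str
    attempt: int            # 1 for the first try
    status: Optional[int]   # HTTP status, or None if no response (e.g. timeout)


def triage(results: List[FetchResult]) -> Tuple[List[Tuple[str, str, int]], List[str]]:
    """Return (requests to retry as (id, url, next attempt), ids given up on)."""
    retries: List[Tuple[str, str, int]] = []
    given_up: List[str] = []
    for result in results:
        failed = result.status is None or result.status in RETRYABLE_STATUSES
        if not failed:
            continue  # success, or a permanent answer such as 404
        if result.attempt < MAX_ATTEMPTS:
            retries.append((result.request_id, result.url, result.attempt + 1))
        else:
            given_up.append(result.request_id)
    return retries, given_up


if __name__ == "__main__":
    batch = [
        FetchResult("a", "https://example.com/1", 1, 200),
        FetchResult("b", "https://example.com/2", 1, 503),
        FetchResult("c", "https://example.com/3", 3, None),
    ]
    print(triage(batch))
```

The retry tuples would be republished as new fetch requests. The given-up IDs could be logged or sent to a separate topic for inspection.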
Module 06 / codename Mr. Robot:
This module’s task is to acquire the robots.txt files and ensure that the Holly module has the information it needs to perform its task.
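Acquiring robots.txt efficiently mostly comes down to locating the file and not fetching it for every request. The file always sits at `/robots.txt` on a given scheme, host and port. RFC 9309 advises crawlers not to rely on a cached copy for more than 24 hours. The sketch below is a per-site cache built on those two facts. The injected `fetch` function is a hypothetical stand-in for the real downloader, so the example runs offline.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

DAY_SECONDS = 24 * 60 * 60


def robots_url(page_url: str) -> str:
    # robots.txt lives at the top-level path of a scheme + host + port.
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


class RobotsCache:
    """Fetch each site's robots.txt once and reuse it until the TTL expires."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = DAY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical downloader: robots URL -> file body
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # robots URL -> (fetched at, body)

    def get(self, page_url: str) -> str:
        url = robots_url(page_url)
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> str:
        calls.append(url)
        return "User-agent: *\nDisallow: /private/\n"

    fake_now = [0.0]
    cache = RobotsCache(fake_fetch, clock=lambda: fake_now[0])
    cache.get("https://example.com/a")
    cache.get("https://example.com/b")  # same site: served from the cache
    fake_now[0] += DAY_SECONDS          # a day later: fetched again
    cache.get("https://example.com/c")
    print(calls)
```

The cached body is exactly what a rules evaluator like Holly would parse.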
I’d say this project has two standout aspects. The technical one — new architecture is brilliantly set up. And the organizational one — their approach was top-notch, from communication and status reports to defining next steps...
Mediatoolkit Head of Engineering
Future-proofing through extreme scalability
Mediatoolkit is a global media monitoring service with clients in more than 100 countries, and it is constantly growing both in terms of new clients and new sources from which it gathers content. With the newly implemented Gateway based on microservices, it can continue to seamlessly scale as the company expands.
Visit mediatoolkit.com to see how this monitoring tool tracks online mentions from millions of sources in real time!