Monitoring (all of) the internet
Mediatoolkit is a 24/7 media monitoring and analytics tool that gathers information from 100+ million online sources. It notifies users in real time whenever their company, competitors, key people from their organisations or any other keyword they decide to track is mentioned online.
Build a product, pivot, perfect…
The primary goal of Mediatoolkit was to create media monitoring software for the global market. It started out as a tool for predicting viral news, built for reporters and journalists, but market needs and a better product-market fit with brand owners and managers led us to transform it into a media monitoring tool.
Right from the get-go, we decided to base the entire architecture around microservices. Yes, we had to build it from scratch, but in terms of maintenance, scalability and overall future-proofing, it was clearly the optimal approach.
Bornfight Project Manager
Gateway to the entire internet
Maximizing coverage is Mediatoolkit’s key objective: the aim is to monitor as many online sources as is technically possible. To achieve that, we created the Gateway, an advanced system of microservices that autonomously gathers and processes content from millions of websites while respecting each site’s crawling rules and limits, such as those set in its robots.txt file. We also designed the Gateway for extreme horizontal scaling, so it can keep pace with Mediatoolkit’s continuous growth.
Embracing the microservices architecture
Microservices are the go-to choice when a system needs to be developed, deployed, maintained and scaled seamlessly and independently, which is why we built the entire Mediatoolkit Gateway around them. One of the biggest benefits of this approach is that any developer can implement a new component or feature without being tied to a particular language or technology. That flexibility is extremely valuable from a business perspective, especially for a system as large as Mediatoolkit.
When you're developing such an enormous system, your focus needs to be on its architecture: what tech to choose, how the elements will communicate, how it will scale... Creating a system that monitors one website is easy, but monitoring millions of sources in real time is a whole other ballgame.
Bornfight Tech Lead
6 key modules revolving around Apache Kafka
We built the entire Gateway as 6 modules that communicate with each other through Apache Kafka®, a distributed event-streaming platform that serves as the system’s data-transfer core.
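To make the pattern concrete, here is a minimal sketch of modules passing messages to each other through topics. It is not Mediatoolkit’s code: an in-memory `InMemoryBroker` stands in for Kafka so the example runs without a cluster, and the topic names (`requests`, `accepted`, `fetch-tasks`) and both stage handlers are hypothetical. A real deployment would use a Kafka producer and consumer instead, which also bring durability, partitioning and consumer groups.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Callable, Optional

Message = dict


class InMemoryBroker:
    """Stand-in for Kafka: every topic is a simple FIFO queue (no persistence)."""

    def __init__(self) -> None:
        self.topics: dict[str, deque] = defaultdict(deque)

    def publish(self, topic: str, message: Message) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str) -> Optional[Message]:
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def run_stage(broker: InMemoryBroker, source: str, sink: str,
              handler: Callable[[Message], Optional[Message]]) -> int:
    """Drain `source`, apply `handler`, forward non-None results to `sink`."""
    processed = 0
    while (message := broker.poll(source)) is not None:
        result = handler(message)
        if result is not None:
            broker.publish(sink, result)
        processed += 1
    return processed


# Usage: two hypothetical stages chained through topics.
broker = InMemoryBroker()
broker.publish("requests", {"url": "https://example.com/news"})
broker.publish("requests", {"url": "not a url"})

# Intake stage: forward only http(s) URLs.
intake_count = run_stage(
    broker, "requests", "accepted",
    lambda m: m if m["url"].startswith(("http://", "https://")) else None)
# Policy stage: placeholder that marks every accepted request as allowed.
run_stage(broker, "accepted", "fetch-tasks", lambda m: {**m, "allowed": True})

print(intake_count)                        # 2
print(list(broker.topics["fetch-tasks"]))  # [{'url': 'https://example.com/news', 'allowed': True}]
```

Because each stage only knows the topics it reads from and writes to, a stage can be replaced, rewritten in another language or scaled out without touching the others, which is the property the microservices approach relies on.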
Module 01 / codename Kryten:
This module is the entry point to the Gateway. It accepts incoming requests, e.g. to fetch content from a specific URL. Kryten registers each request, checks whether it can be performed and then passes it on to the next module.
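The register, check, forward sequence can be sketched as below, under clearly stated assumptions: `Intake` and `FetchRequest` are hypothetical names, a sequential counter stands in for whatever request IDs the real system uses, and “can it be performed” is reduced to a single check that the URL is a well-formed http(s) address. The `forwarded` list plays the role of publishing to the next module.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class FetchRequest:
    request_id: int
    url: str
    status: str = "registered"


class Intake:
    """Hypothetical entry point: register, check, then forward requests."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.registry: dict[int, FetchRequest] = {}
        self.forwarded: list[FetchRequest] = []  # stand-in for the next module's topic

    def submit(self, url: str) -> FetchRequest:
        request = FetchRequest(next(self._ids), url)
        self.registry[request.request_id] = request  # 1. register every request
        parts = urlsplit(url)
        # 2. Assumed feasibility check: an http(s) scheme and a host name.
        if parts.scheme in ("http", "https") and parts.hostname:
            request.status = "forwarded"
            self.forwarded.append(request)           # 3. pass it on
        else:
            request.status = "rejected"
        return request


intake = Intake()
ok = intake.submit("https://example.com/article?id=7")
bad = intake.submit("ftp://example.com/file")
print(ok.status, bad.status)                                          # forwarded rejected
print(len(intake.registry), [r.request_id for r in intake.forwarded]) # 2 [1]
```

Rejected requests stay in the registry with their status, so a failed submission can still be traced later.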
Module 02 / codename Holly:
This is the core module for evaluating robots.txt rules and ensuring they are followed. For example, it determines whether a given server allows Mediatoolkit to crawl it and how often it may do so.
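Python’s standard `urllib.robotparser` module can evaluate robots.txt rules, so a minimal sketch of this kind of check might look as follows. The crawler name `ExampleMonitorBot`, the one-second fallback delay and the `RobotsPolicy` class are assumptions for illustration. The sketch answers the two questions above (may this URL be crawled, and when may the next request to that host start), but it says nothing about how Holly itself is implemented.

```python
from __future__ import annotations

from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleMonitorBot"  # hypothetical crawler name
DEFAULT_DELAY = 1.0               # assumed fallback (seconds) when no Crawl-delay is set


class RobotsPolicy:
    """Hypothetical policy check: is a URL allowed, and when may it be fetched?"""

    def __init__(self) -> None:
        self.parsers: dict[str, RobotFileParser] = {}
        self.last_fetch: dict[str, float] = {}

    def load(self, host: str, robots_txt: str) -> None:
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.parsers[host] = parser

    def decide(self, url: str, now: float) -> tuple[bool, float]:
        """Return (allowed, earliest time the fetch may start)."""
        host = urlsplit(url).hostname or ""
        parser = self.parsers.get(host)
        if parser is None or not parser.can_fetch(USER_AGENT, url):
            return False, now  # no rules loaded yet, or explicitly disallowed
        delay = parser.crawl_delay(USER_AGENT)
        delay = float(delay) if delay is not None else DEFAULT_DELAY
        earliest = max(now, self.last_fetch.get(host, float("-inf")) + delay)
        self.last_fetch[host] = earliest  # reserve this slot for the host
        return True, earliest


policy = RobotsPolicy()
policy.load("example.com", "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n")
blocked = policy.decide("https://example.com/private/report", now=100.0)
first = policy.decide("https://example.com/news/1", now=100.0)
second = policy.decide("https://example.com/news/2", now=101.0)
print(blocked, first, second)  # (False, 100.0) (True, 100.0) (True, 105.0)
```

Treating a host with no loaded rules as “not allowed yet” is a deliberately conservative choice: nothing is fetched until its robots.txt has been obtained.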
Module 03 / codename WALL-E:
This worker module fetches content from a specific server and returns it to the system. Each WALL-E instance operates from its own unique IP address, and together these instances enable the system to gather content from millions of sources in real time.
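As a rough illustration of a worker tied to one outgoing IP, the sketch below uses the `source_address` parameter of Python’s `http.client` connections to bind every request to the worker’s address. The `Worker` and `FetchResult` names are hypothetical, and the code assumes the machine actually owns the configured IP. The demo part starts a throwaway local HTTP server so it runs without internet access. This is one way to express the “one worker, one IP” idea, not how WALL-E is built.

```python
from __future__ import annotations

import http.client
import threading
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


@dataclass
class FetchResult:
    url: str
    status: int
    body: bytes
    worker_ip: str


class Worker:
    """Hypothetical fetch worker whose connections all leave from one local IP."""

    def __init__(self, source_ip: str, timeout: float = 10.0) -> None:
        self.source_ip = source_ip
        self.timeout = timeout

    def fetch(self, url: str) -> FetchResult:
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        # source_address binds the socket to this worker's IP; port 0 = any free port.
        conn = conn_cls(parts.hostname, parts.port, timeout=self.timeout,
                        source_address=(self.source_ip, 0))
        try:
            path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
            conn.request("GET", path)
            response = conn.getresponse()
            return FetchResult(url, response.status, response.read(), self.source_ip)
        finally:
            conn.close()


# Demo: a throwaway local server so the example runs offline.
class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"hello from " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = Worker("127.0.0.1").fetch(
        f"http://127.0.0.1:{server.server_port}/news?id=1")
finally:
    server.shutdown()
    server.server_close()
print(result.status, result.body)  # 200 b'hello from /news?id=1'
```

In a real pool, each worker process would be started with a different `source_ip`, so the target servers see requests spread across many addresses.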
Module 04 / codename EVE:
This module manages the WALL-E workers. Since the system runs many WALL-E instances (and therefore many IPs), EVE decides which one should perform a given task, based on a set of parameters and on each instance’s current workload.
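A least-loaded dispatcher is one simple way to implement this kind of assignment. In the sketch below, the “set of parameters” is reduced to two assumed ones: a per-worker capacity and a set of hosts a worker should avoid. `WorkerState`, `Dispatcher` and the private IP addresses are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkerState:
    ip: str
    capacity: int                 # assumed parameter: max concurrent tasks
    active: int = 0
    blocked_hosts: set[str] = field(default_factory=set)  # hosts to avoid

    @property
    def load(self) -> float:
        return self.active / self.capacity


class Dispatcher:
    """Hypothetical least-loaded dispatcher over a pool of fetch workers."""

    def __init__(self, workers: list[WorkerState]) -> None:
        self.workers = workers

    def assign(self, host: str) -> Optional[WorkerState]:
        eligible = [w for w in self.workers
                    if w.active < w.capacity and host not in w.blocked_hosts]
        if not eligible:
            return None  # every worker is full or unsuitable for this host
        chosen = min(eligible, key=lambda w: w.load)  # ties go to the first in the list
        chosen.active += 1
        return chosen

    def release(self, worker: WorkerState) -> None:
        worker.active = max(0, worker.active - 1)


pool = [WorkerState("10.0.0.1", capacity=2),
        WorkerState("10.0.0.2", capacity=4),
        WorkerState("10.0.0.3", capacity=4, blocked_hosts={"example.com"})]
eve = Dispatcher(pool)
picks = [eve.assign("example.com").ip for _ in range(4)]
print(picks)  # ['10.0.0.1', '10.0.0.2', '10.0.0.2', '10.0.0.1']
```

Comparing load as a ratio of capacity, rather than a raw task count, lets workers of different sizes share the pool fairly.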
Module 05 / codename R2D2:
This module handles autonomous recovery. It detects failed requests by monitoring the output of the WALL-E workers, then carries out a set of recovery actions and sets new rules for Holly, so that future requests avoid the same failure and content from the affected server can still be accessed.
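One common recovery pattern is exponential backoff: after each temporary failure from a host, wait longer before trying it again, and reset once requests succeed. The sketch below applies that idea. Several details are illustrative assumptions: which status codes count as temporary, the 2-second base and 300-second cap, and the `Recovery`/`HostRule` names. The resulting `extra_delay` stands for the kind of per-host rule that could be passed on to Holly.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumed set of "temporary" failures worth retrying with backoff.
RETRYABLE = {429, 500, 502, 503, 504}


@dataclass
class HostRule:
    extra_delay: float = 0.0  # seconds to add on top of the normal crawl delay
    failures: int = 0         # consecutive retryable failures


class Recovery:
    """Hypothetical recovery step: watch fetch results, emit per-host rules."""

    def __init__(self, base: float = 2.0, cap: float = 300.0) -> None:
        self.base = base
        self.cap = cap
        self.rules: dict[str, HostRule] = {}
        self.retry_queue: list[str] = []

    def observe(self, url: str, status: int) -> HostRule:
        host = urlsplit(url).hostname or ""
        rule = self.rules.setdefault(host, HostRule())
        if status in RETRYABLE:
            rule.failures += 1
            # Exponential backoff: base * 2^(failures - 1), capped.
            rule.extra_delay = min(self.cap, self.base * 2 ** (rule.failures - 1))
            self.retry_queue.append(url)  # schedule the request again
        elif 200 <= status < 300:
            rule.failures = 0
            rule.extra_delay = 0.0
        return rule  # other statuses (e.g. 404) leave the rule unchanged


r2d2 = Recovery()
delays = [r2d2.observe("https://example.com/a", s).extra_delay for s in (503, 503, 429)]
print(delays)                                      # [2.0, 4.0, 8.0]
print(r2d2.observe("https://example.com/a", 200))  # HostRule(extra_delay=0.0, failures=0)
```

The cap keeps a long outage from pushing a host’s next attempt arbitrarily far into the future.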
Module 06 / codename Mr. Robot:
This module fetches the robots.txt file from a specific server and makes sure Holly has all the information it needs to do its job.
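The fetching side can be sketched as follows. The robots.txt location is fixed by the Robots Exclusion Protocol (RFC 9309): it is always `/robots.txt` at the root of the scheme and host. The status-code handling below follows that RFC: a 4xx response means crawling is unrestricted, while a 5xx response or a network error means everything is disallowed. The `RobotsFetcher` class, the injected `fetch` function and the 24-hour cache are assumptions. Injecting the fetcher keeps the example runnable without network access.

```python
from __future__ import annotations

from typing import Callable, Tuple
from urllib.parse import urlsplit

# Hypothetical fetcher signature: url -> (status_code, body_text).
Fetcher = Callable[[str], Tuple[int, str]]
ALLOW_ALL = ""                          # empty robots.txt = no restrictions
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"
CACHE_TTL = 24 * 3600                   # RFC 9309: avoid caching longer than 24 hours


def robots_url(page_url: str) -> str:
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


class RobotsFetcher:
    """Hypothetical robots.txt fetcher with a simple per-origin cache."""

    def __init__(self, fetch: Fetcher) -> None:
        self.fetch = fetch
        self.cache: dict[str, tuple[float, str]] = {}

    def get(self, page_url: str, now: float) -> str:
        url = robots_url(page_url)
        cached = self.cache.get(url)
        if cached is not None and now - cached[0] < CACHE_TTL:
            return cached[1]
        try:
            status, body = self.fetch(url)  # assumed to follow redirects
        except OSError:
            status, body = 599, ""          # network error: treat like a 5xx
        if 200 <= status < 300:
            text = body
        elif 400 <= status < 500:
            text = ALLOW_ALL                # "unavailable": no restrictions
        else:
            text = DISALLOW_ALL             # "unreachable": assume complete disallow
        if 200 <= status < 500:
            self.cache[url] = (now, text)   # don't cache server errors; retry later
        return text


responses = {
    "https://example.com/robots.txt": (200, "User-agent: *\nDisallow: /private/\n"),
    "https://gone.example/robots.txt": (404, ""),
    "https://down.example/robots.txt": (503, ""),
}
mr_robot = RobotsFetcher(lambda url: responses[url])  # fake fetcher for the demo
print(robots_url("https://example.com/news/today?x=1"))          # https://example.com/robots.txt
print(repr(mr_robot.get("https://gone.example/page", now=0.0)))  # ''
print(mr_robot.get("https://down.example/page", now=0.0) == DISALLOW_ALL)  # True
```

The text this returns is exactly what a robots.txt parser expects as input, so the fetching and rule-evaluation steps can stay separate modules.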
User testing every step of the way
A key component of the tool is user behaviour tracking. It gives us the data we need to decide how to improve the parts of the tool that users don’t respond well to. This component alone led to a number of iterations and upgrades, which prompted users to describe Mediatoolkit as one of the most “easy to use” and “eye-pleasing” media monitoring tools on the market.
I’d say this project has two standout aspects. The technical one — new architecture is brilliantly envisioned and set up. And the organizational one — their approach was top-notch, from communication and status reports to defining next steps...
Mediatoolkit Head of Product
Infrastructure for world wide web domination
The growing number of customers, combined with global market demands, means that the amount of data Mediatoolkit gathers, processes and presents to its users every day is not only extremely large but also ever-growing. Add the fact that thousands of people use Mediatoolkit simultaneously, and the case for optimizing the system as far as possible becomes clear.
While some of its competitors run on 1,000 to 5,000 servers, Mediatoolkit outperforms them with just 30 servers, thanks to extensive and continuous optimization and an architecture that allows its scaling to be customized in great detail.
Mediatoolkit is now a global media monitoring service with clients in more than 100 countries. Thanks to the new microservices-based architecture, it can scale globally while also providing detailed monitoring of specific markets with minimal effort from the product team. Extensive automation has reduced user-related administrative work to a minimum, so the service can scale without additional staff.
Visit mediatoolkit.com to see how this monitoring tool tracks online mentions from millions of sources in real time!