I work on the backend part of Schibsted Publishing Platform – Creation Suite, which is a content management system for a number of different newspapers. It contains tools and functions that help journalists do their job.
The system consists of several dozen microservices, communicating asynchronously with the use of AWS. Yes, we have these diagrams full of rectangles and a web of arrows connecting them. On the occasion of some refactoring, I decided to make an inventory of the basic concepts (not so new, but still worth keeping in mind) in the context of messaging and our architecture. And because I believe that semantics and intentions are equally important, I will focus here on the model patterns, making only loose references to the system in which I work.
What’s a Messaging System? Messaging, in general, is asynchronous communication with reliable delivery, using data called Messages sent through Message Channels (aka Queues). A Messaging System manages and coordinates the sending and receiving of messages. It allows for remote communication and integration between applications on different platforms and in different languages, and acts as a mediator between them. That way applications are loosely coupled: they are connected to the system but not to each other. The system can then commit resources to provide high availability, load balancing, etc. Such middleware comes with obvious benefits but also with challenges.
Message is a wrapper for data to be transferred between applications. It consists of two basic parts:
- Header – information to describe the data (that can be used by a messaging system itself)
- Body – the real data transmitted as-is
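The header/body split can be sketched in a few lines of Python. This is only an illustration: the `Message` class, the `new_message` helper and the header names `message_id` and `type` are assumptions made for this example, not part of any particular messaging product.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Message:
    """A message: headers describe the data, the body is carried as-is."""
    body: Any
    headers: Dict[str, str] = field(default_factory=dict)


def new_message(body: Any, message_type: str) -> Message:
    # "message_id" and "type" are hypothetical header names for this sketch.
    headers = {"message_id": str(uuid.uuid4()), "type": message_type}
    return Message(body=body, headers=headers)


def serialize(message: Message) -> str:
    # Headers and body stay separate, so middleware can read the headers
    # without having to interpret the body.
    return json.dumps({"headers": message.headers, "body": message.body})
```

For example, `serialize(new_message({"title": "Hello"}, "ArticleDocument"))` yields a JSON string with a `headers` object next to an untouched `body`.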
Document Message is used, as the name suggests, to transfer documents (data structures) between applications. One can imagine a communication channel dedicated to a particular type of document, which is reminiscent of the Datatype Channel, though a document can be sent through Point-to-Point and Publish-Subscribe channels as well. What matters is WHAT we send, and therefore the format of the document.
– The thing that we have to take into account in our publishing system is the version of article format.
Command Message is a message used to asynchronously invoke specific functionality of another application. It is a regular message and can be of any format, but it must tell the receiver what job to do (the good old Command pattern takes a bow here).
The most important thing is to keep in mind the target action, the one you want to call, and its actual execution. That should be reflected semantically in the name of the command, which should contain a verb or a verb phrase. These intentions are very important: WHAT is to be done by WHO.
The receiver will most often be identified by a Point-to-Point Channel, a combination typical of messaging for situations where a message should be consumed and executed only once.
– In a CMS, it’s easy to imagine commands as actions triggered by UI users. However, when it comes to messaging, we rarely use them, even though there is room for them. Examples are commands sent for reindexing purposes or for some enrichments, etc.
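A Command Message named with a verb phrase can be dispatched to the code that does the job roughly as follows. This is a minimal sketch: the `CommandReceiver` class, the `command` header and the `ReindexArticle` command are hypothetical names chosen for illustration, not taken from our system.

```python
from typing import Any, Callable, Dict

CommandHandler = Callable[[Dict[str, Any]], str]


class CommandReceiver:
    """Maps command names (verb phrases) to the functions that execute them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, CommandHandler] = {}

    def register(self, command_name: str, handler: CommandHandler) -> None:
        self._handlers[command_name] = handler

    def receive(self, message: Dict[str, Any]) -> str:
        # The header says WHAT is to be done; this receiver is the WHO.
        name = message["headers"]["command"]
        handler = self._handlers.get(name)
        if handler is None:
            raise ValueError(f"Unknown command: {name}")
        return handler(message["body"])


def reindex_article(body: Dict[str, Any]) -> str:
    # Stand-in for real work, e.g. updating a search index.
    return f"reindexed {body['article_id']}"


receiver = CommandReceiver()
receiver.register("ReindexArticle", reindex_article)
result = receiver.receive(
    {"headers": {"command": "ReindexArticle"}, "body": {"article_id": "a-1"}}
)
```

Rejecting unknown command names explicitly keeps a misrouted message from being silently ignored.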
Event Message is used to communicate the occurrence of changes in the system, with the knowledge that other parts or external subscribers may be interested in these changes. Such a message should carry information about WHAT happened and WHERE (the subject of the change). Thus, a correctly named Event will be in the past tense. Timing, that is WHEN a change occurred, is also more important here than the content itself, unlike in a Document Message.
We have two models available here:
- push – on the state change occurrence, the new state with the message’s content is sent; it’s easy to guess that it’s a kind of Event and Document Message composition
- pull – we only send the minimum amount of information, either sufficient or on the basis of which message receiver can request the details
Event Message is usually broadcast via a Publish-Subscribe Channel, and subscribers might be interested only in certain types of Events, even if different ones travel through the same channel.
– In our system, the choice fell on the push model. Usually, we send out Events informing about the change of the state of an article, or its component, together with a Document representing its current version.
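The difference between the push and pull models can be illustrated with a small sketch. The function names, the header names and the `ArticlePublished` event type are assumptions made for this example only.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def build_event(
    event_type: str,
    subject_id: str,
    document: Optional[Dict[str, Any]] = None,
    occurred_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build an Event Message: WHAT happened, WHERE (subject) and WHEN."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    event: Dict[str, Any] = {
        "headers": {"type": event_type, "occurred_at": occurred_at.isoformat()},
        "body": {"subject_id": subject_id},
    }
    if document is not None:
        # Push model: the new state travels with the event.
        event["body"]["document"] = document
    return event


def resolve_document(
    event: Dict[str, Any], fetch: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the document for an event, fetching it in the pull model."""
    body = event["body"]
    if "document" in body:
        return body["document"]
    # Pull model: only the subject id was sent, so ask the source for details.
    return fetch(body["subject_id"])
```

With push, `resolve_document` never calls `fetch`; with pull, the receiver pays for an extra request but always reads the latest state.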
Message Channel is a virtual pipe that connects a sender to a receiver in the Messaging System. However, it is up to you to determine how your applications will communicate and then to create the proper channels. Eventually, they end up as logical addresses in the Messaging System. The challenge is to identify which channels you need and what for. Messaging is not a system that an application can throw information into at random. Still, an application that sends a message may not know which others will receive it. It is the channel, created for a specific purpose and meant to carry a certain type of message, that assures the receiver it will find the data it is interested in. So naming matters here again: a channel should be named after the messages it carries.
Datatype Channel is dedicated to a specific type of data: its structure and format. Thanks to this, the receiver of a message knows what data it has to deal with and can process it safely.
– Our publishing platform, in the most simple example, can bring to mind the channel that the articles, or certain types of articles, are transferred through.
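A Datatype Channel can be approximated in memory as a queue that accepts only one message type. This is a sketch under stated assumptions: the `DatatypeChannel` class and the `type` header are hypothetical, and a real Messaging System would usually enforce this by convention or schema validation rather than in the channel itself.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


class DatatypeChannel:
    """A channel that carries messages of exactly one declared type."""

    def __init__(self, name: str, accepted_type: str) -> None:
        self.name = name
        self.accepted_type = accepted_type
        self._queue: Deque[Dict[str, Any]] = deque()

    def send(self, message: Dict[str, Any]) -> None:
        actual = message["headers"].get("type")
        if actual != self.accepted_type:
            # Rejecting at the sender's side keeps receivers simple.
            raise TypeError(
                f"{self.name} accepts {self.accepted_type}, got {actual}"
            )
        self._queue.append(message)

    def receive(self) -> Optional[Dict[str, Any]]:
        # Returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None
```

The receiver can then process whatever `receive` returns without checking its type first.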
Point-to-Point Channel ensures that only one of the consumers will process a given message. So even if there are more consumers, for example for concurrency reasons, the channel ensures that the message goes to only one of them.
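The competing-consumers behaviour can be demonstrated with Python's standard `queue.Queue`, which hands each item to exactly one caller of `get`. The function name `run_competing_consumers` is made up for this sketch, and an in-process queue only stands in for a real channel.

```python
import queue
import threading
from typing import List, Tuple


def run_competing_consumers(
    messages: List[str], consumer_count: int
) -> List[Tuple[int, str]]:
    """Let several consumers drain one channel; return (consumer, message) pairs."""
    channel: "queue.Queue[str]" = queue.Queue()
    for message in messages:
        channel.put(message)

    processed: List[Tuple[int, str]] = []
    lock = threading.Lock()

    def consume(consumer_index: int) -> None:
        while True:
            try:
                # Each message is handed to exactly one consumer.
                message = channel.get_nowait()
            except queue.Empty:
                return
            with lock:
                processed.append((consumer_index, message))

    threads = [
        threading.Thread(target=consume, args=(i,)) for i in range(consumer_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

Which consumer gets which message is not deterministic, but every message appears in the result exactly once.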
– And here comes Amazon SQS, which we use and which gives us that possibility. When a receiver reads a message from a queue, the message remains in the queue but, for the duration of the visibility timeout, is not returned to any other receiver. After processing, the receiver deletes the message, which prevents it from being read and processed again. Amazon stores copies of messages on multiple servers for redundancy and high availability, which is why a message may occasionally be delivered again. It is therefore important to design consumers to be idempotent. Standard SQS queues use at-least-once delivery; for messages that must be processed exactly once and in order, FIFO queues can be used.
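One common way to make a consumer idempotent under at-least-once delivery is to remember the identifiers of messages already handled. The sketch below is plain Python and does not call the SQS API; the `IdempotentConsumer` class and the `message_id` header are assumptions for illustration, and a real service would keep the processed identifiers in durable storage rather than in memory.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Runs the handler at most once per message id, even on redelivery."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handler = handler
        # In-memory only for this sketch; state is lost on restart.
        self._processed_ids: Set[str] = set()

    def on_message(self, message: Dict[str, Any]) -> bool:
        message_id = message["headers"]["message_id"]
        if message_id in self._processed_ids:
            # Duplicate delivery: acknowledge it without doing the work again.
            return False
        self._handler(message["body"])
        # Mark as processed only after the handler succeeded.
        self._processed_ids.add(message_id)
        return True
```

Because the id is recorded only after the handler returns, a handler that raises leaves the message eligible for a retry.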
Publish-Subscribe Channel is dedicated to cases where a message is to be sent to many receivers, all potentially interested in the same message, and each of them has to receive it only once. The channel has one input channel and many output channels, each dedicated to one receiver. Publish-Subscribe ensures that a copy of the message reaches every output channel; the message is then considered delivered and is deleted from the channel.
Publish-Subscribe builds on the Observer pattern, which serves to decouple observers from their subject and facilitates notification. Because the sender does not know its receivers, subscription to such a channel should be subject to restrictions, so that we can be sure that messages only go to authorised receivers.
– In the case of our publishing platform, many services implementing various system functions, as well as many newsrooms, may be interested in the same articles, in their specific components, or in different types of changes. Here we use Amazon SNS, which lets us fan out messages to a number of subscribers (in our case, most often Amazon SQS queues). It simplifies the logic by offloading message filtering and routing from the applications.
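Fan-out with per-subscriber filtering can be sketched in memory as follows. This does not use the SNS API: the `PublishSubscribeChannel` class, the `type` header and the event names are hypothetical and only illustrate one input channel feeding many output channels.

```python
import copy
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Set, Tuple

Message = Dict[str, Any]


class PublishSubscribeChannel:
    """One input, many outputs; each subscriber gets its own copy."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Deque[Message], Optional[Set[str]]]] = []

    def subscribe(self, accepted_types: Optional[Set[str]] = None) -> Deque[Message]:
        # Each subscriber reads from its own output channel.
        output: Deque[Message] = deque()
        self._subscriptions.append((output, accepted_types))
        return output

    def publish(self, message: Message) -> None:
        for output, accepted_types in self._subscriptions:
            # Filtering happens in the channel, not in the subscriber.
            if accepted_types is None or message["headers"]["type"] in accepted_types:
                output.append(copy.deepcopy(message))
```

A subscriber created with `subscribe({"ArticlePublished"})` sees only that event type, while `subscribe()` receives everything published to the channel.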
Messaging is both asynchronous and one-way communication. However, if your applications need two-way communication, such as getting the result of a requested command or query, they can still use messaging. The messages then travel in separate channels, a Request Channel and a Reply Channel, and the following components take part in the exchange:
- Requestor – which sends a request message
- Replier – which receives the request message and sends a reply message
In this case:
- the request should contain a Return Address, so that the replier knows where to send the response,
- the reply should contain a Correlation Identifier indicating the request it answers
– We do not use two-way communication in our system. However, it happens that we use commands that travel through various links of the system, and we want them to arrive and be specifically executed by a specific receiver.
The requestor does not necessarily have to wait for a response. Maybe it wants the reply message to be consumed and handled by some other, particular receiver. Therefore, the Return Address simply identifies the reply channel and is put in the message headers (because it is not part of the data being transferred).
It is important to note that Messaging might not guarantee the order in which messages are delivered, and processing may take different amounts of time. Therefore, the reply message should additionally contain a Correlation Identifier identifying the request it responds to. If each message has a unique identifier, the reply message can use the request message identifier as its Correlation Id. Like the Return Address, the Correlation Id, and also the message identifier itself, should be placed in the headers.
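The interplay of Return Address and Correlation Identifier can be sketched with in-memory channels. Everything here is illustrative: the channel names, the header names and the three helper functions are assumptions for this example, not an existing API.

```python
import uuid
from collections import defaultdict, deque
from typing import Any, Callable, DefaultDict, Deque, Dict, Optional

Message = Dict[str, Any]
# Channels are looked up by their logical address (name).
channels: DefaultDict[str, Deque[Message]] = defaultdict(deque)


def send_request(body: Any, reply_channel: str) -> str:
    """Requestor: send a request and return its message id."""
    message_id = str(uuid.uuid4())
    channels["requests"].append(
        {
            "headers": {"message_id": message_id, "return_address": reply_channel},
            "body": body,
        }
    )
    return message_id


def handle_next_request(compute: Callable[[Any], Any]) -> None:
    """Replier: answer the oldest request on its Return Address."""
    request = channels["requests"].popleft()
    reply = {
        "headers": {
            "message_id": str(uuid.uuid4()),
            # The request's id becomes the reply's Correlation Identifier.
            "correlation_id": request["headers"]["message_id"],
        },
        "body": compute(request["body"]),
    }
    channels[request["headers"]["return_address"]].append(reply)


def take_reply(reply_channel: str, request_id: str) -> Optional[Message]:
    """Find and remove the reply that answers the given request, if any."""
    replies = channels[reply_channel]
    for reply in list(replies):
        if reply["headers"]["correlation_id"] == request_id:
            replies.remove(reply)
            return reply
    return None
```

Because replies are matched by Correlation Identifier rather than by position, the requestor can pick them up in any order, which matters when delivery order is not guaranteed.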
For the sake of clear intentions visible in our architecture, it is worth considering what we send, how and where. So just to sum it up briefly:
- Document Message is about WHAT we send; sent e.g. to a Datatype Channel
- Command Message is about WHAT is to be done by WHO; sent to Point-to-Point Channel
- Event Message is about WHAT has happened WHERE and WHEN; sent to Publish-Subscribe Channel
In the next article I will briefly describe message routing and endpoints.