Under the hood of CHINOOK. Front page demystified

This fourth article in the series is all about the front page challenge. What are the scenarios behind the front pages? How does this mechanism even work? Read on to find out!

If you haven’t read any of the previous ones, I highly recommend reading at least the one titled “Chinook: cracking the front page challenge”. This text is a continuation of that story.

Common language

There is no need to elaborate on the importance of mutual understanding when it comes to communication. Whenever you ask a friend to bring you an orange, you don’t need to describe in detail what it looks like. You take it for granted that your friend will come back with a round, orange-colored fruit.

In a picture-perfect world, everyone participating in our project would be on the same page in terms of awareness and understanding. In reality, we needed to establish a set of names for each element. We had to separate design elements from content elements and strictly technical components. Finally, we had to communicate all of it clearly and explain it to all involved parties: designers, editors, business people and software engineers. Without further ado, I would like to introduce you to our naming convention.

Content elements

First, I will recall the entity I already mentioned in the last blog post – “article teaser”. It is a piece of content visible on the front page linking to an actual article. One article teaser can be presented in many different ways.

You may wonder: what is an article teaser made of? Since a teaser is a piece of content, it can contain an image, a headline (title) and a lead text (one or two sentences of introduction to the article). The exact set differs between publications. In Aftenposten, we agreed to use two attributes: image and headline. Important note: it is possible to have a teaser with a headline only (without an image), but it is impossible to have a teaser with an image only (missing the headline text).
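To make the headline rule concrete, here is a minimal sketch of a teaser as a data structure. The class name `ArticleTeaser` and the field names `headline` and `image_url` are my own assumptions for illustration; they are not taken from the CHINOOK code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArticleTeaser:
    """Hypothetical teaser model: the headline is required, the image is optional."""

    headline: str
    image_url: Optional[str] = None

    def __post_init__(self) -> None:
        # A teaser with an image but no headline text is not allowed.
        if not self.headline.strip():
            raise ValueError("an article teaser needs a headline")


# A headline-only teaser is valid.
text_only = ArticleTeaser("Sweet!")
```

Rejecting the invalid case at construction time means the rest of the pipeline never has to ask whether a teaser has a headline.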

Layout elements

We start with the basic element called “ingredient”. In our concept, ingredients are like atoms. There are two types of ingredients: “image” and “text space”. Every ingredient has two attributes, width and height; an image can have one additional attribute, ratio.

With the ingredients you can create the next element of our layout: the “brick”. A brick has to contain at least one ingredient of type text space because, as stated before, a teaser has to have a headline. A typical example of a brick is a combination of both ingredient types, image and text space.

Important note: brick != article teaser. The same article teaser can be represented by several bricks. Below you can see some examples of bricks:

When you have bricks in place, you can group them together. A group of bricks is called a “block”. A block has to contain at least one brick. Blocks could look like this:

The last piece of this puzzle is called “floor”. It is a somewhat more abstract element, and it originates from the way the newsroom perceives the front page. In general, a floor is a larger section of the front page that wraps a set of blocks. A floor should contain at least one block.

Why do we need it? For several reasons. Firstly, when we define a floor we can explicitly say which blocks we want to see in a given part of the website. We can also ban specific blocks from showing there. It is important for us to have full control over what goes where.

Secondly, our front page is naturally cut into several parts by the full-width ads. By using the concept of the floor we can take ads out of the equation and focus only on the editorial content. That also reflects the fact that the newsroom, or to be specific the front page editors, are not in charge of the ad placements. Those are a given for us, provided by other departments in our organization.

Last but not least, by utilizing floors we can highlight some parts of the front page. We can present the content in a specific way, e.g. a section presenting teasers of video content can look different from the rest.

Now we can summarize everything together and get a full overview of the layout:
LAYOUT consists of a number of FLOORS separated by Ads. Each FLOOR contains a number of article teasers that are grouped into BLOCKS. BLOCK is a group of 1 to n BRICKS arranged together. BRICK may be built from one or two ingredients: IMAGE and TEXT SPACE.
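The hierarchy above maps naturally onto a handful of nested types. The sketch below is only an illustration of the naming convention, with the "at least one" rules enforced at construction time. All class and field names are assumptions of mine, not CHINOOK's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class IngredientKind(Enum):
    IMAGE = "image"
    TEXT_SPACE = "text space"


@dataclass(frozen=True)
class Ingredient:
    kind: IngredientKind
    width: int
    height: int
    ratio: Optional[float] = None  # additional attribute, images only


@dataclass(frozen=True)
class Brick:
    name: str
    ingredients: Tuple[Ingredient, ...]

    def __post_init__(self) -> None:
        # A teaser always has a headline, so a brick needs a text space.
        if not any(i.kind is IngredientKind.TEXT_SPACE for i in self.ingredients):
            raise ValueError(f"brick {self.name!r} needs a text space")


@dataclass(frozen=True)
class Block:
    name: str
    bricks: Tuple[Brick, ...]

    def __post_init__(self) -> None:
        if not self.bricks:
            raise ValueError(f"block {self.name!r} needs at least one brick")


@dataclass(frozen=True)
class Floor:
    name: str
    blocks: Tuple[Block, ...]

    def __post_init__(self) -> None:
        if not self.blocks:
            raise ValueError(f"floor {self.name!r} needs at least one block")


# Example: one floor, one block, one brick built from both ingredient kinds.
headline = Ingredient(IngredientKind.TEXT_SPACE, width=300, height=80)
photo = Ingredient(IngredientKind.IMAGE, width=300, height=200, ratio=1.5)
brick = Brick("image-top", (photo, headline))
floor = Floor("top stories", (Block("single", (brick,)),))
```

Ads are deliberately absent from this model, mirroring the idea that floors only describe editorial content.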

Technical elements

“CHINOOK”: the name of the library we have created. The library implements the algorithm that automatically creates the layout of the whole front page.

“Manifest”: the configuration file required by the algorithm.

“Fit Function”: a function that validates whether one element can be paired with another. For example, it checks whether a given brick can be part of the selected block and returns true or false accordingly.

CHINOOK – what does it do?

For those of you who have read my previous articles from the series, it should be more or less clear. For the rest of the readers this two-sentence recap should give some introduction:

CHINOOK is a clever piece of software, a library, a layout engine that can take any set of articles and stack them together so that the final outcome looks like today’s Aftenposten front page. Unlike drFront (the what-you-see-is-what-you-get tool used today to create front pages), our solution doesn’t require any manual work or attention from the journalists.

Configuration

Before we can render our layout, we first need to prepare a manifest. This is the place where we define the building parts of our front page: ingredients, bricks and blocks. A typical configuration of an ingredient looks like this:

As you can see, there are two types of ingredients: image and text. Each ingredient type has its own Fit Function.

Fit functions connected to the ingredients are very important. Ask yourself a question: what does it mean that the headline of an article teaser fits a given ingredient? Probably the first idea that pops into your mind is about text length. It can go like this: if the title is longer than 100 characters, it cannot fit the given space. Case closed.

But, actually, it is not that simple. Look at two words: “wow” and “ill”. Each is one word of three letters, yet in a proportional font “wow” takes up far more horizontal space than “ill”, so you instantly see that counting characters is a bad idea. It can get even worse: if your text space has enough room for two or three lines of the title, suddenly the word count can play a significant role (it’s easier to break lines when you are dealing with a long title made of short words). You need to be prepared for the situation when the title is long (in terms of character count) but contains only 3 words. In this case, it will be harder to fit it into an ingredient that is narrow but tall:
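A toy experiment shows both effects. The sketch below is not how CHINOOK measures text. The glyph widths are made-up relative values and the greedy word wrap is a stand-in I chose for illustration, but it is enough to show why two titles with the same character count can behave differently.

```python
from typing import List, Optional

# Made-up relative glyph widths; real widths depend on the font.
NARROW = set("iljtfr.,'!| ")
WIDE = set("mwMW")


def estimated_width(text: str) -> float:
    """Rough width of `text` in 'average character' units."""
    total = 0.0
    for ch in text:
        if ch in WIDE:
            total += 1.5
        elif ch in NARROW:
            total += 0.5
        else:
            total += 1.0
    return total


def wrap_greedy(title: str, max_width: float) -> Optional[List[str]]:
    """Greedy word wrap; returns None if a single word is wider than a line."""
    lines: List[str] = []
    current = ""
    for word in title.split():
        if estimated_width(word) > max_width:
            return None
        candidate = word if not current else current + " " + word
        if estimated_width(candidate) <= max_width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def title_fits(title: str, max_width: float, max_lines: int) -> bool:
    lines = wrap_greedy(title, max_width)
    return lines is not None and len(lines) <= max_lines


# Same character count (19), very different outcome in a narrow text space.
short_words = "aaa bbb ccc ddd eee"
long_word = "aaaaaaaaaaaaaaa bbb"
```

With these assumed widths, `estimated_width("wow")` is larger than `estimated_width("ill")`, and `short_words` wraps into a space that is 10 units wide and 3 lines tall while `long_word` does not, because its first word cannot be broken.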

Minimizing the whitespace in each container was yet another challenge we faced.
Let’s have a look at the problem from a different angle. The front page editors can manipulate the font size of the title. They actually always use the largest possible font size to fill up all available space. We had to do the same and we decided to use the fit function for that too.

How we did it is a good topic for another article. If you want to read about that, let us know in the comments section below. Long story short: we are using headless browsers to determine whether a title fits a text space.
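Without going into the browser part, the idea of "always use the largest font size that still fits" can be sketched as a search over the allowed sizes. Everything here is an assumption for illustration: the function names, the list of sizes, and especially `toy_fits`, which is a crude arithmetic stand-in for a real measurement in a headless browser. The binary search also assumes that if a size fits, every smaller size fits as well.

```python
import math
from typing import Callable, Optional, Sequence


def largest_fitting_size(
    sizes: Sequence[int], fits: Callable[[int], bool]
) -> Optional[int]:
    """Binary search over ascending `sizes` for the largest one that fits.

    Assumes monotonicity: if a size fits, all smaller sizes fit too.
    Returns None when not even the smallest size fits.
    """
    lo, hi = 0, len(sizes) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(sizes[mid]):
            best = sizes[mid]
            lo = mid + 1  # try a larger size
        else:
            hi = mid - 1  # too big, try a smaller size
    return best


def toy_fits(title: str, box_width: int, box_height: int, size: int) -> bool:
    """Stand-in measurement: fixed-width glyphs, ignores word boundaries."""
    char_width = 0.5 * size
    line_height = 1.2 * size
    chars_per_line = int(box_width // char_width)
    if chars_per_line == 0:
        return False
    lines = math.ceil(len(title) / chars_per_line)
    return lines * line_height <= box_height


title = "x" * 40
best = largest_fitting_size(
    [16, 20, 24, 32, 40, 48], lambda size: toy_fits(title, 300, 100, size)
)
```

The appeal of this shape is that the expensive measurement sits behind a single predicate, so it is called only a few times per title.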

When we have all ingredients in place, we need to define bricks. And every brick has to contain at least one ingredient:

As you can see, each brick has its own unique name.

The next step in the configuration process is all about blocks.
A side note: I will not share the code with you because there is a lot going on there, and this article is not a good place for a code review. I will, however, do my best to explain what is needed to make a block work.

An example of a block:

To define a block you need to provide a list of the bricks from which it is constructed. As with ingredients, each block has a “fit function”. The difference is that a block requires several types of fit functions. While looking for article teasers that can be used in a particular block, we need to check several things:

Priority Fit

Do you remember the part from the last blog post about spreading important articles equally across the front page? One of the fit functions makes sure that actually happens. Take a look at the example of the block above. It’s easy to see that “article teaser 1” is the most important because it takes up most of the space in the block. The next two article teasers are less important.

Our fit function checks if the article teaser is important enough to be placed in the most prominent position in the block. In practice, we take 3 article teasers, look for the most important one in the group and put the winner in that spot. If all article teasers are equally important, the block cannot be used: the design suggests that exactly one story is the most important, and that would not be the case if all teasers were on the same level.
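A priority check of this kind can be sketched in a few lines. The function name and the numeric priority scale (a higher number means a more important story) are my assumptions. Treating any tie for first place as "no fit" is also an assumption, generalized from the all-equal case described above.

```python
from typing import Optional, Sequence


def priority_fit(priorities: Sequence[int]) -> Optional[int]:
    """Return the index of the single most important teaser, or None.

    Assumption: a higher number means a more important story, and any tie
    for first place means the block cannot be used.
    """
    if not priorities:
        return None
    top = max(priorities)
    winners = [i for i, p in enumerate(priorities) if p == top]
    return winners[0] if len(winners) == 1 else None


winner = priority_fit([3, 5, 1])    # the second teaser takes the big spot
no_winner = priority_fit([2, 2, 2])  # all equal: block cannot be used
```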

Content Fit

As the name suggests, this function evaluates whether the content fits the block. In other words, it checks if the article teaser can be represented by the bricks building the block. The example given above shows that as article teaser number 3 we can only use a story without an image. You could argue that we could use any teaser and simply not show the image. We don’t do that because our role is to present all the content we get. Our algorithm can’t arbitrarily hide content elements. Imagine a situation where the illustration shows a sweet little kitten and the headline is just one word: “Sweet!”. This headline makes no sense without the image.
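Reduced to its core, the content check compares what a teaser carries with what a brick can show. The sketch below uses hypothetical `Teaser` and `BrickSpec` types of my own. It also assumes the reverse rule, that a teaser without an image cannot fill a brick that has an image slot, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Teaser:
    headline: str
    image_url: Optional[str] = None


@dataclass(frozen=True)
class BrickSpec:
    name: str
    has_image: bool  # True if the brick contains an image ingredient


def brick_accepts(brick: BrickSpec, teaser: Teaser) -> bool:
    """A brick must show everything the teaser has, no more and no less."""
    return brick.has_image == (teaser.image_url is not None)


def content_fit(bricks: Sequence[BrickSpec], teasers: Sequence[Teaser]) -> bool:
    """True if teaser i can be represented by brick i, for every position."""
    return len(bricks) == len(teasers) and all(
        brick_accepts(b, t) for b, t in zip(bricks, teasers)
    )


kitten = Teaser("Sweet!", "https://example.com/kitten.jpg")
text_only_brick = BrickSpec("headline-only", has_image=False)
```

Here `brick_accepts(text_only_brick, kitten)` is false, which is exactly the kitten situation: the image would have to be hidden, so the pairing is rejected.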

Awesomeness factor

A very important functionality is hidden behind this intriguing name. There are blocks that we consider more or less attractive. We even have a block that no one likes, but it serves us as a fallback in case we can’t use any other block with a given set of articles. Awesomeness factor is a value assigned to each block and it tells the algorithm how much we like it. You can call it a ranking of blocks.

One of the requirements for the algorithm says that we should deliver a diverse layout and by all means avoid repeating one block all over the page. We use the awesomeness factor to do this job. Whenever a block is used on the front page, we lower its awesomeness factor from the initial value. Thanks to that, we decrease the probability of using this block in the next iteration.
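The effect of lowering the factor after each use is easy to see in a small simulation. The block names, the starting values, the multiplicative penalty of 0.5 and the "pick the highest factor" rule are all assumptions made for this sketch; the real selection also depends on the other fit functions.

```python
from typing import Dict, List


def pick_block(factors: Dict[str, float], candidates: List[str],
               penalty: float = 0.5) -> str:
    """Pick the candidate with the highest factor, then lower that factor.

    `penalty` is a hypothetical multiplier; the actual adjustment may differ.
    """
    chosen = max(candidates, key=lambda name: factors[name])
    factors[chosen] *= penalty
    return chosen


factors = {"hero": 1.0, "trio": 0.8, "fallback": 0.1}
picks = [pick_block(factors, ["hero", "trio", "fallback"]) for _ in range(3)]
```

After "hero" is used once its factor drops below that of "trio", so the two blocks start to alternate instead of "hero" repeating all the way down the page, while the unloved fallback stays at the bottom of the ranking.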

How does it work?

Assuming the configuration is ready, the algorithm takes an array of article teasers as the input. The size of the array may vary, but in Aftenposten’s case we use around 100 teasers to render the desktop front page. This is how the algorithm works, step by step:

  1. Go through the list of teasers and check each one against the list of ingredients, using the fit functions. Result: every teaser carries information about the ingredients it fits.
  2. Find the bucket size. The bucket size is the maximum number of teasers that can fit in one block.
  3. Take the set of blocks that we can use on the floor we are currently processing.
  4. Take as many article teasers from the top of the array as the bucket size.
  5. Run this smaller array against the fit functions of the selected blocks.
  6. Score all possible combinations.
  7. Pick the winning block for the given set of articles.
  8. Repeat steps 3 to 7 for the next set of articles.

A few remarks about the process:
Step 6
I haven’t mentioned it before, but fit functions can return values in a range between 0 and 1. They return 0 when the tested element doesn’t fit. If the element fits, they can return a different value depending on how well it fits.

Step 7
Since fit functions can return different values, we are able to score how well a set of articles matches a block. It means we can compare whole blocks with each other. Our algorithm cares about context.
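Put together, steps 2 to 8 can be sketched as a single loop over the teaser array. This is a simplified illustration under several assumptions of mine, not the CHINOOK implementation: step 1 is skipped, each block has one combined fit function returning a value between 0 and 1, the score is the fit value multiplied by the block's awesomeness factor, and a used block is penalized by a fixed multiplier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Teaser = Dict[str, int]  # hypothetical shape: {"id": ..., "priority": ...}
FitFunction = Callable[[Sequence[Teaser]], float]


@dataclass(frozen=True)
class Block:
    name: str
    slots: int          # how many teasers the block holds
    fit: FitFunction    # 0.0 = does not fit, up to 1.0 = fits perfectly
    awesomeness: float = 1.0


def lay_out_floor(teasers: Sequence[Teaser], blocks: Sequence[Block],
                  penalty: float = 0.5) -> List[Tuple[str, List[Teaser]]]:
    bucket_size = max(block.slots for block in blocks)             # step 2
    factor = {block.name: block.awesomeness for block in blocks}
    placed: List[Tuple[str, List[Teaser]]] = []
    position = 0
    while position < len(teasers):                                 # step 8
        bucket = list(teasers[position:position + bucket_size])    # step 4
        best, best_score = None, 0.0
        for block in blocks:                                       # step 5
            if block.slots > len(bucket):
                continue
            score = block.fit(bucket[:block.slots]) * factor[block.name]  # step 6
            if score > best_score:
                best, best_score = block, score
        if best is None:
            break  # nothing fits; a real setup would keep a fallback block
        placed.append((best.name, bucket[:best.slots]))            # step 7
        factor[best.name] *= penalty  # discourage repeating the same block
        position += best.slots
    return placed


def hero_fit(ts: Sequence[Teaser]) -> float:
    """Fits only if the first teaser is strictly the most important one."""
    return 1.0 if ts[0]["priority"] > max(t["priority"] for t in ts[1:]) else 0.0


blocks = [
    Block("hero", 3, hero_fit),
    Block("pair", 2, lambda ts: 0.6),
    Block("single", 1, lambda ts: 0.2),
]
teasers = [{"id": i, "priority": p} for i, p in enumerate([5, 3, 2, 4, 4, 1, 2])]
result = lay_out_floor(teasers, blocks)
```

In this run the first three teasers land in the "hero" block, and the next group falls back to "pair" because its two top stories are tied. Note that a winning block may consume fewer teasers than the bucket size, so the next bucket starts right after the last teaser that was actually placed.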

“A-Team”

It is worth mentioning that the realization of this project would not have been possible without a great team. The least I can do is to list all members that are and were involved in this effort.
In Oslo, Lars Raaum as product owner and Robin Klein Schiphorst as UX designer did remarkable work. In Kraków, the software engineers Michał Misiarek, Krzysztof Słonka, Maksymilian Bolek, Wojciech Kabała and I (Robert Tekielak) took concepts and ideas and embodied them in code. At different stages of the project we cooperated, and still cooperate, with Espen Tandberg, Susanne Klungtveit and Hans Martin Cramer from Oslo. Recently the team in Kraków has changed, and Katarzyna Włodarska joined Michał and Wojciech in further development.

Thank you very much for staying with me until the end of this series. If you have any questions, don’t hesitate to contact me. If you see an interesting use case for this library, or you would like to try it yourself, let us know.
