Predicting arrival times of container vessels

Freight transport is one of today’s most important activities due to its influence on all economic sectors. Read all about it in this thesis.


Within our Digital Factory, eMagiz offers a challenging and inspiring environment for scientific research into Digital Transformation & Innovation. We translate strategic themes into flexible solutions for a wide range of organizations. eMagiz collaborates with several universities and scientific institutions, such as the University of Twente, TKI Dinalog and NOW. We run innovative projects that bring science and business together in solutions that make a difference. We offer thesis research in the areas of digital transformation, CI/CD, blockchain, BI and big data, machine learning, IoT and innovation.

Freight transport is one of today’s most important activities due to its influence on all economic sectors. A Dutch Logistic Service Provider (LSP) currently takes a reactive approach to arrival time information, relying solely on the carrier’s sailing schedule. Historically, however, this sailing schedule has proven unreliable: 20% of the orders that the LSP executed over the last 2.5 years did not arrive on time. Note that this on-time performance is based on a threshold of at least six days of deviation from the scheduled arrival time before an order is classified as ‘not on time’. When no deviation from the scheduled arrival time is allowed, the on-time performance becomes even worse: 74% of the orders did not arrive on time, deviating by at least one day.

Customers are aware of the LSP’s poor performance with respect to arrival time information, and a customer survey revealed the need for proactive provision of more accurate arrival time information. Not knowing exactly when an order will arrive negatively affects the businesses of both the LSP and the customer through decreased efficiency and increased costs. When the Estimated Time of Arrival (ETA) deviates, the LSP has to spend time on additional customer contact to inform the customer of the deviation, contact that would otherwise have been unnecessary. In the worst case, the LSP fears losing customers. The customer is affected mainly financially by a deviating ETA. Rescheduling costs are incurred when the order arrives on a different day than the customer had planned for, and when the customer cannot pick up the goods on an ad hoc basis, the customer risks being charged demurrage fees. For that reason, customers indicate that they do not care that much about an order arriving too early or too late, but they do want to know exactly when the order will arrive.


The LSP has been collecting order data since October 2016, and we use this historical order data to develop a prediction model that predicts the deviation in the arrival time before the actual shipment. If the LSP then communicates this predicted arrival time to the customer instead of the arrival time that is based solely on the carrier’s sailing schedule, it meets the customers’ need for proactive communication of a more accurate ETA.

First, we clean the data and assess its quality by checking for ambiguity and missing values. The target variable that we aim to predict is called the Delta: the difference between the actual and the scheduled arrival time.
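To make the target variable concrete, here is a minimal sketch of how the Delta could be computed from order records. The field names, the sample orders and the sign convention (positive means later than scheduled) are assumptions for illustration; they do not come from the LSP’s data.

```python
from datetime import date


def delta_days(scheduled_arrival, actual_arrival):
    """Delta in whole days; positive means the order arrived later than scheduled (assumed convention)."""
    return (actual_arrival - scheduled_arrival).days


# Made-up orders; order C has a missing actual arrival date.
orders = [
    {"order_id": "A", "scheduled": date(2018, 3, 1), "actual": date(2018, 3, 5)},
    {"order_id": "B", "scheduled": date(2018, 3, 10), "actual": date(2018, 3, 8)},
    {"order_id": "C", "scheduled": date(2018, 3, 12), "actual": None},
]

# Simple cleaning step: keep only orders where both dates are present.
complete = [o for o in orders if o["scheduled"] is not None and o["actual"] is not None]

deltas = {o["order_id"]: delta_days(o["scheduled"], o["actual"]) for o in complete}
print(deltas)  # {'A': 4, 'B': -2}
```

Dropping incomplete orders is only one way to handle missing values; the thesis may have treated them differently.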

We use the wrapper approach with a bi-directional search method and find the following optimal subset of features: Departure Week (of the year), Departure Day (of the week), Arrival Week (of the year), Arrival Day (of the week), Carrier and Port of Delivery. With these six predictor variables, we build a prediction model for the Delta, our target variable.
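As a rough illustration of a wrapper approach with a search that moves in both directions, the sketch below starts from an empty subset and, in each round, tries every single addition and every single removal, keeping the change that lowers a score the most. This is one common reading of a bi-directional (stepwise) search and not necessarily the exact procedure used in the thesis. The `score` callable stands in for training and validating a model on a feature subset; the toy score and the feature name `container_colour` are made up for the example.

```python
def bidirectional_search(features, score):
    """Stepwise wrapper search; `score(subset)` returns an error where lower is better."""
    selected = frozenset()
    best_score = score(selected)
    while True:
        # Forward moves: add one feature that is not yet selected.
        additions = [selected | {f} for f in features if f not in selected]
        # Backward moves: drop one feature that is currently selected.
        removals = [selected - {f} for f in selected]
        candidates = additions + removals
        if not candidates:
            return selected, best_score
        challenger = min(candidates, key=score)
        challenger_score = score(challenger)
        if challenger_score >= best_score:
            # No single addition or removal improves the score: stop.
            return selected, best_score
        selected, best_score = challenger, challenger_score


# Toy score (hypothetical): penalise missing useful features and, mildly, useless ones.
USEFUL = {"carrier", "port_of_delivery"}


def toy_score(subset):
    return len(USEFUL - subset) + 0.1 * len(subset - USEFUL)


subset, error = bidirectional_search(
    ["carrier", "port_of_delivery", "container_colour"], toy_score
)
print(sorted(subset), error)
```

In a real wrapper, `score` would be something like a cross-validated prediction error of the model trained on the candidate subset, which makes each evaluation much more expensive than in this toy.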

Variable importance measured by the percentage increase in MSE


Based on extensive literature research and some experimental tests, we chose random forest as the machine learning algorithm to train and test our model. Random forests have some advantages over other machine learning techniques: they can handle correlated predictor variables, which is the case for some of the variables in our model, and they are relatively robust to overfitting. After training the model, we can indeed conclude that the model has a good fit.
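The variable importance figure reports the percentage increase in MSE. The idea behind that measure can be sketched with a permutation test: shuffle the values of one feature, predict again, and see how much the mean squared error grows. The sketch below works with any prediction function and is a simplification; random forest implementations typically compute this on out-of-bag samples per tree. The toy model, the feature names `used` and `unused`, and the data are made up.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def pct_increase_in_mse(model, rows, targets, feature, seed=0):
    """Percentage increase in MSE after shuffling one feature's values."""
    baseline = mse(model, rows, targets)
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return 100.0 * (mse(model, permuted, targets) - baseline) / baseline


# Toy data (hypothetical): the target depends on "used" only, plus a small residual.
rows = [{"used": i % 5, "unused": i % 3} for i in range(20)]
targets = [2 * r["used"] + (0.5 if i % 2 == 0 else -0.5) for i, r in enumerate(rows)]


def toy_model(row):
    # Stand-in for a trained model; it ignores the "unused" feature entirely.
    return 2 * row["used"]


print(pct_increase_in_mse(toy_model, rows, targets, "used"))
print(pct_increase_in_mse(toy_model, rows, targets, "unused"))
```

A feature the model relies on shows a clear increase in error when shuffled, while a feature the model ignores shows none.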


Now that we have a prediction model that is capable of predicting the deviation from the communicated arrival time, we translate it into improved business processes for both the LSP and the customer. We choose to build a cost savings model from the perspective of the customers, as they are financially most affected by the events that result directly from a deviation in the arrival time. The cost savings model includes three cost parameters:

  • demurrage fees
  • rescheduling costs
  • costs for running out of stock

The savings are the costs in the current situation minus the costs in the new situation. The cost savings model indicates that all customers together can be expected to save an average of €771,025 per year when the LSP communicates the predicted ETA to the customer instead of the arrival time based solely on the carrier’s sailing schedule.
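The structure of the savings calculation can be sketched as follows: sum the three cost parameters for each situation and subtract. The class name, the function name and all amounts are invented for illustration and are unrelated to the thesis results.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Yearly customer costs in euros for one situation (hypothetical structure)."""
    demurrage_fees: float
    rescheduling_costs: float
    stock_out_costs: float

    def total(self):
        return self.demurrage_fees + self.rescheduling_costs + self.stock_out_costs


def savings(current, new):
    """Savings = costs in the current situation minus costs in the new situation."""
    return current.total() - new.total()


# Made-up amounts: ETA from the sailing schedule versus ETA from the prediction model.
current_situation = Costs(demurrage_fees=1000.0, rescheduling_costs=500.0, stock_out_costs=250.0)
new_situation = Costs(demurrage_fees=400.0, rescheduling_costs=200.0, stock_out_costs=100.0)

print(savings(current_situation, new_situation))  # 1050.0
```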

However, the LSP has more to gain than just a satisfied customer who can save costs thanks to a more accurate ETA. We therefore also address the improved business processes from the LSP’s perspective. We quantify the increased efficiency by counting how often the LSP is required to contact the customer in the current situation and in the new situation (in which the ETA is based on our prediction model). Customer contact is required when the deviation is four days or more, and is meant to inform the customer of the delay. When we compare the current situation with the new situation, customer contact is no longer required for 84% of the orders, because the ETA deviates less than that. This would positively affect the LSP’s reputation, as the customers’ need for more proactive and accurate arrival time information is met. It would also address the LSP’s concern about potentially losing customers.
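The comparison of customer contact moments can be sketched by counting, for each situation, the orders whose ETA deviates by four days or more. The sketch assumes that the deviation is taken in absolute terms and that, in the new situation, it is measured against the predicted arrival time; the sample deviations are made up.

```python
CONTACT_THRESHOLD_DAYS = 4  # contact is required from a deviation of 4 days or more


def contacts_required(deviations_days, threshold=CONTACT_THRESHOLD_DAYS):
    """Number of orders whose absolute ETA deviation reaches the threshold."""
    return sum(1 for d in deviations_days if abs(d) >= threshold)


# Made-up deviations in days for the same six orders.
schedule_deviation = [5, 0, -4, 7, 2, 1]    # actual arrival vs. sailing schedule
prediction_deviation = [1, 0, -2, 4, 1, 0]  # actual arrival vs. predicted ETA

current_contacts = contacts_required(schedule_deviation)
new_contacts = contacts_required(prediction_deviation)
print(current_contacts, new_contacts)  # 3 1
```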

by Nina Bussmann, Graduation intern @ eMagiz