Where is the “retry” in BPMN 2.0?
In a famous article, Gregor Hohpe describes four strategies for dealing with failures in a business transaction:
- Write-off
- Retry
- Compensation
- Two Phase Commit
How does this map to BPMN 2.0? Here are some experiments I made.
Compensation
In BPMN 2.0 we can model compensation explicitly:
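Outside a diagram, the core idea of process-level compensation can be sketched in a few lines of code. This is a minimal sketch, not how any BPMN engine implements it: the hypothetical helper run_with_compensation records a compensation handler for each completed activity and, when a later activity fails, invokes the recorded handlers in reverse order. The activities themselves know nothing about each other.

```python
from typing import Callable, List, Tuple

# An activity is a pair (do, undo): "do" performs the work, "undo" is its
# compensation handler. Both are hypothetical stand-ins for service calls.
Activity = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(activities: List[Activity]) -> bool:
    """Run activities in order; if one fails, compensate the completed ones.

    Returns True if every activity succeeded, False if compensation ran.
    """
    completed: List[Callable[[], None]] = []
    for do, undo in activities:
        try:
            do()
        except Exception:
            # Undo the completed work in reverse order of completion.
            for compensate in reversed(completed):
                compensate()
            return False
        completed.append(undo)
    return True
```

Because the orchestrating code (the process) owns the list of compensation handlers, no service has to call another service's compensation directly.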
Two Phase Commit
Write-off
Write-off means that you detect an error, but you don’t have to (or don’t want to) handle it now.
So if there is no milk, I simply give up.
It could also mean that I drink the coffee anyway.
Adding milk is optional, but it is still desirable. So even if I cannot do something about it now, I might want to do something about it later:
In my opinion this is still a write-off, since I am still drinking this particular cup of coffee without milk.
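The three write-off variants above can be sketched as plain control flow. This is a hypothetical illustration: NoMilkError, add_milk and make_coffee are invented names, and the strategy strings simply label the three variants described in the text.

```python
from typing import List, Optional


class NoMilkError(Exception):
    """Hypothetical error raised when there is no milk."""


def add_milk(milk_available: bool) -> None:
    if not milk_available:
        raise NoMilkError("no milk")


def make_coffee(milk_available: bool, strategy: str, todo: List[str]) -> Optional[str]:
    """Return the coffee we end up drinking, or None if we give up.

    strategy is one of:
      "give_up"    - write off the whole coffee
      "drink"      - write off the milk and drink the coffee black
      "drink_note" - drink it black, but record a follow-up task
    """
    if strategy not in ("give_up", "drink", "drink_note"):
        raise ValueError(f"unknown strategy: {strategy}")
    try:
        add_milk(milk_available)
        return "coffee with milk"
    except NoMilkError:
        if strategy == "give_up":
            return None
        if strategy == "drink_note":
            # The note does not rescue this cup; it only helps next time.
            todo.append("buy milk")
        return "black coffee"
```

In all three variants the error is detected but not repaired for the current cup, which is what makes each of them a write-off.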
Retry
First, we can of course model the retry of an activity explicitly:
(OK, granted, the retry may not make much sense in this particular real-world example 🙂)
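In code, an explicitly modeled retry corresponds to a loop: execute the task, let a gateway ask “succeeded?”, and route back to the task if not. The sketch below assumes the loop is bounded by a maximum number of attempts; that bound and the function name run_with_explicit_retry are assumptions for illustration, since a modeled loop could also be unbounded.

```python
from typing import Callable


def run_with_explicit_retry(activity: Callable[[], bool], max_attempts: int) -> str:
    """Mimic a modeled retry loop: task -> gateway "succeeded?" -> back to task.

    `activity` returns True on success. After `max_attempts` failures the
    loop leaves through the other gateway branch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if activity():
            return f"succeeded on attempt {attempt}"
    return "gave up"
```

Because the loop is part of the model, anyone reading the diagram sees that the activity may run several times.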
Second, the retry can be handled by the process engine. For example, the Activiti process engine supports the concept of asynchronous continuations: it allows you to define a safe-point before an activity. When reaching the safe-point, the process engine commits its current transaction and releases the thread. A background thread (the job executor) then tries to execute the activity repeatedly until it succeeds or its configured number of retries is used up. None of this is visible in the process diagram; it is configured using a vendor extension (activiti:async="true") in the process model.
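The mechanics of a safe-point can be illustrated with a toy model. This is a simplified sketch, not Activiti's implementation: ToyEngine, Job and their methods are hypothetical, the “database” is just a list, the attempt budget of 3 is an arbitrary choice, and the background thread is simulated by calling run_job_executor explicitly.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Job:
    """A unit of work created at a safe-point (hypothetical model)."""
    name: str
    action: Callable[[], None]
    retries: int = 3  # attempts left; an arbitrary budget for this sketch


@dataclass
class ToyEngine:
    committed: List[str] = field(default_factory=list)  # stands in for the database
    jobs: Deque[Job] = field(default_factory=deque)
    failed: List[str] = field(default_factory=list)

    def reach_safe_point(self, state: str, job: Job) -> None:
        # Commit everything done so far and hand the next activity over
        # to a job; the calling thread is then free to return.
        self.committed.append(state)
        self.jobs.append(job)

    def run_job_executor(self) -> None:
        # Background side: execute jobs, re-queueing failed ones until they
        # succeed or run out of attempts.
        while self.jobs:
            job = self.jobs.popleft()
            try:
                job.action()
                self.committed.append(f"{job.name} done")
            except Exception:
                job.retries -= 1
                if job.retries > 0:
                    self.jobs.append(job)
                else:
                    self.failed.append(job.name)
```

The key design point is that the state before the safe-point is committed independently of the retried activity, so a failing activity never rolls back the work that preceded it.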
Third, the retry can be handled by asynchronous messaging middleware. We put a message in a queue, and the middleware keeps trying to deliver it until delivery eventually succeeds.
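Redelivery by middleware can be sketched as a toy broker loop. This is an illustration under assumptions, not the behavior of a specific product: deliver_all is a hypothetical name, a consumer signals failure by raising, and the redelivery limit with a dead-letter list is an added assumption (many brokers let you configure such limits) that keeps the loop from running forever.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def deliver_all(messages: List[str], consumer: Callable[[str], None],
                max_deliveries: int = 5) -> Tuple[List[str], List[str]]:
    """Toy broker: redeliver each message until the consumer accepts it.

    Returns (delivered, dead_lettered). A consumer signals failure by raising.
    """
    queue: Deque[Tuple[str, int]] = deque((m, 0) for m in messages)
    delivered: List[str] = []
    dead_letter: List[str] = []
    while queue:
        message, attempts = queue.popleft()
        try:
            consumer(message)
            delivered.append(message)
        except Exception:
            attempts += 1
            if attempts < max_deliveries:
                queue.append((message, attempts))  # put it back for redelivery
            else:
                dead_letter.append(message)
    return delivered, dead_letter
```

The sender only enqueues the message; every retry happens inside the broker loop.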
And finally, the retry can be handled at the service level. In this case it is transparent to the service consumer: we simply call the service and wait for a callback response. In the meantime, the service might perform retries internally, perhaps using a message queue as well. The consumer is not aware of any of this.
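A service that hides its retries behind a callback might look roughly like this. MilkDeliveryService, its methods and the backend function are hypothetical; for brevity, the asynchronous part is simulated by an explicit process_pending call rather than a real thread or queue, and only ConnectionError is treated as retryable.

```python
from typing import Callable, List, Tuple


class MilkDeliveryService:
    """Hypothetical service that hides its retries from the consumer."""

    def __init__(self, backend: Callable[[str], str], max_attempts: int = 3) -> None:
        self._backend = backend
        self._max_attempts = max_attempts
        self._pending: List[Tuple[str, Callable[[str], None]]] = []

    def request(self, order: str, callback: Callable[[str], None]) -> None:
        # The consumer only submits the request and registers a callback;
        # the call returns immediately.
        self._pending.append((order, callback))

    def process_pending(self) -> None:
        # Runs later, inside the service: retry the backend, then call back.
        pending, self._pending = self._pending, []
        for order, callback in pending:
            result = "failed"
            for _ in range(self._max_attempts):
                try:
                    result = self._backend(order)
                    break
                except ConnectionError:
                    continue  # retry; the consumer never sees this
            callback(result)
```

From the consumer's point of view there is exactly one request and one callback, however many attempts the service needed.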
I find it very interesting that the retry pattern can be implemented at several different levels; this sets it apart from the other patterns:
- This is different from the way compensation is handled: since the failure of one activity triggers the compensation of another activity, compensation is best modeled at the process level. If one service triggered the compensation of another directly, we would introduce explicit dependencies between services.
- If retry is handled at the service or middleware level, the service invocation must be asynchronous (from the point of view of the process engine). This means you need either polling or callbacks to retrieve the results and continue the process instance.
- If retry is handled by the process engine, you need some concept of “safe-point” before the retried activity (cf. asynchronous continuations in Activiti). This usually affects threading and transaction boundaries.
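The polling alternative mentioned in the second point can be sketched as follows. poll_for_result and check_status are hypothetical names; the sketch assumes the service exposes some way to ask for the status of a request, returning None while it is still in progress, and it gives up after a fixed number of polls.

```python
import time
from typing import Callable, Optional


def poll_for_result(check_status: Callable[[], Optional[str]],
                    interval_s: float = 0.0, max_polls: int = 10) -> Optional[str]:
    """Poll until the asynchronous service reports a result.

    `check_status` returns None while the request is still in progress.
    Returns the result, or None if it is still missing after `max_polls`.
    """
    for _ in range(max_polls):
        result = check_status()
        if result is not None:
            return result  # at this point the process instance would continue
        time.sleep(interval_s)
    return None
```

Compared with a callback, polling keeps control on the process-engine side but costs extra requests and adds latency up to one polling interval.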
Furthermore, we can make the following two observations:
- Every call to an unavailable service is retryable. What I mean: not every service is retryable by its nature (see the cited article by Gregor Hohpe for details). However, if a service call fails because the service is not available (or because the messaging system is unavailable), the call is always retryable, since the request never reached the service.
- Applying retry sometimes depends on the context of the service invocation. You may want to handle a failure of the same service with retry in one process but with compensation in another.
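Taken together, the two observations suggest a simple decision rule, sketched below. The exception types, the process names and the POLICIES table are all hypothetical: an unavailable service is always retried, while any other failure is handled according to a policy chosen per calling process.

```python
from typing import Dict


class ServiceUnavailable(Exception):
    """The call never reached the service (hypothetical error type)."""


class BusinessError(Exception):
    """The service was reached but rejected the request (hypothetical)."""


# Hypothetical per-process policies for the *same* service.
POLICIES: Dict[str, str] = {
    "coffee-at-home": "retry",
    "coffee-shop-order": "compensate",
}


def decide(process: str, error: Exception) -> str:
    """Pick a failure-handling strategy for a failed service call."""
    if isinstance(error, ServiceUnavailable):
        # Observation 1: an unavailable service is always retryable.
        return "retry"
    # Observation 2: otherwise the choice depends on the calling process.
    return POLICIES[process]
```

Because the policy is keyed by process rather than by service, this decision naturally lives at the process level.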
From this, I conclude that it is generally useful to have some concept of retry at the process-engine level.
However, since implementing retry at the process-engine level requires some concept of “safe-point”, which influences the execution semantics of the process, I wonder whether the “safe-point” should be visible in the process diagram.
———-
** Interestingly, the article is only available in the German Wikipedia.
Nice post, Daniel. Any process example with coffee is always easy to understand!
Thanks for this! This way I also discovered the fox user guide, which is quite useful, even when you don’t use fox, but use pure Activiti. Great documentation and lots of useful examples there 🙂
Thanks for the praise, guys 🙂
So what is the process you use in real life? I would *totally* like to be that guy from example 5 (the one that makes a note about buying milk later). Unfortunately I am always the guy in the example before that: I make coffee, I see there is no milk, I think “D’oh, no Milk”, drink the coffee anyway, and next time, same thing… You’d think that you’d improve over time, but no, you don’t.
I think I’ll have to print that other process out on paper and hang it over the coffee maker… maybe that will help.
Did you reach any conclusion about whether the “safe-point” should be visible in the process diagram? Could this be related to the transaction subprocess somehow?