Wednesday, September 7, 2016

Salesforce Batch Processing

Problem statement

Every time Salesforce data changes (create/update/delete), we have to do a fair amount of processing and then update an external system via a web service or HTTP callout. When the record is updated, we might be in a normal trigger, a Visualforce controller, a scheduled job, a batch, a queueable job, or a future method. Sometimes we cannot update the external system (e.g. the trigger has an update pending, or we are in a scheduled job), and sometimes we should not, because it would delay the current operation and hurt the user experience. It is preferable to do this work asynchronously. We also have to support very large sets of changes: e.g. 5000 accounts can be updated at once, and we need to process them and update the external system accordingly.

Limitations of Current Salesforce Solutions



Batch Processing
Let's say we use a combination of a trigger and batch processing: when records are updated, the trigger starts a batch job.

  • We can only have up to 100 batch jobs waiting at a time.
  • If the change occurs in a future method, we cannot start a batch directly.

Most other approaches raise similar concerns. Hence, we used the approach below to do the processing.

Approach
  • We created an EventQueue table
  • When a record changes, we push it to the EventQueue table
  • We wrote a batch job to process the data from the EventQueue table
  • At the end of the batch (in the finish method), we restart the batch if there is still data in the EventQueue table
Approach Add On
  • We also wanted the batch to start automatically when we insert records into the EventQueue table, so that we don't have to start or stop the batch manually.


  • The code can be initiated either from a trigger or by someone calling our API directly with a list of record IDs
  • If there is more than one record
    • Insert all records into the EventQueue table (1 row per record)
    • Start Batch Processing
  • Else if the current context is a queueable job ( System.isQueueable() == true )
    • Insert the record into the EventQueue table
    • Start Batch Processing
  • Else if the current context is a scheduled job ( System.isScheduled() == true )
    • Insert the record into the EventQueue table
    • Start Batch Processing
  • Else if we are in a batch
    • If it is our batch (the EventQueue processing batch)
      • Run the main code to do the processing and call the external system
    • Else
      • Insert the record into the EventQueue table
      • Start Batch Processing
  • Else if we are in a future method
    • Insert the record into the EventQueue table
    • Start Batch Processing
  • Else
    • Call the main code via a future method to do the processing

This ensures that the main code runs either in a separate future call (one at a time) or in the batch (one at a time). The caller is never blocked by this processing.
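The decision tree above can be sketched as a small routing function. This is only a plain-Java model, not the original Apex: the `Ctx` and `Route` enums and the `EventRouter` class are hypothetical names, and in real Apex the context would presumably be detected with checks such as `System.isQueueable()`, `System.isScheduled()`, `System.isBatch()` and `System.isFuture()`.

```java
import java.util.List;

// Hypothetical model of the execution contexts the list above distinguishes.
enum Ctx { TRIGGER_OR_UI, QUEUEABLE, SCHEDULED, OUR_BATCH, OTHER_BATCH, FUTURE }

// Hypothetical names for the three possible outcomes of the decision tree.
enum Route { ENQUEUE_AND_START_BATCH, RUN_MAIN_CODE_NOW, RUN_MAIN_CODE_IN_FUTURE }

final class EventRouter {
    // Mirrors the bullets in order: the bulk check comes first, then the context checks.
    static Route route(List<String> recordIds, Ctx ctx) {
        if (recordIds.size() > 1) {
            return Route.ENQUEUE_AND_START_BATCH;      // 1 EventQueue row per record
        }
        switch (ctx) {
            case QUEUEABLE:
            case SCHEDULED:
            case OTHER_BATCH:
            case FUTURE:
                return Route.ENQUEUE_AND_START_BATCH;  // defer to the EventQueue batch
            case OUR_BATCH:
                return Route.RUN_MAIN_CODE_NOW;        // already asynchronous: do the work
            default:
                return Route.RUN_MAIN_CODE_IN_FUTURE;  // trigger/UI: hand off to a future call
        }
    }
}
```

Keeping the routing as a pure function of the record count and the context makes each branch of the tree easy to check in isolation.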


Batch Processing

start()
Queries the EventQueue table for all records

execute()
Calls the entry point via the API (shown as "Direct call" in the figure above)

finish()
If the EventQueue table still has records (more records were inserted while we were processing), then start another job.
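The start/execute/finish cycle can be modelled in a few lines. The sketch below is plain Java rather than Apex, with an in-memory deque standing in for the EventQueue table; the class and method names are hypothetical, and it assumes that processed rows are removed from the queue, which the post implies but does not state. In Apex, the restart would be a new batch started from finish() rather than a loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

final class EventQueueBatchModel {
    final Deque<String> eventQueue = new ArrayDeque<>(); // stands in for the EventQueue table
    final List<String> processed = new ArrayList<>();
    int jobsStarted = 0;

    // start(): snapshot every row currently in the queue.
    List<String> start() {
        jobsStarted++;
        return new ArrayList<>(eventQueue);
    }

    // execute(): run the main code for one chunk, then remove the rows (assumption).
    void execute(List<String> scope, Consumer<String> mainCode) {
        for (String id : scope) {
            mainCode.accept(id);
            processed.add(id);
            eventQueue.remove(id);
        }
    }

    // finish(): true if rows arrived during processing and another job is needed.
    boolean finish() {
        return !eventQueue.isEmpty();
    }

    // Drives jobs back to back until the queue is empty.
    void runUntilDrained(int chunkSize, Consumer<String> mainCode) {
        do {
            List<String> snapshot = start();
            for (int i = 0; i < snapshot.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, snapshot.size());
                execute(snapshot.subList(i, end), mainCode);
            }
        } while (finish());
    }
}
```

Rows inserted while a job is running are not part of that job's snapshot, which is why the check in finish() is needed to pick them up.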



Approach Add On - Implementation
  • We also wanted the batch to start automatically when we insert records into the EventQueue table, so that we don't have to start or stop the batch manually.
This turned out to be quite complex, because you cannot start batch processing from just anywhere.
E.g. if we are in a future context, we can add records to the EventQueue table but cannot start batch processing (a Salesforce limitation).

Similarly, if a custom batch calls our API with, let's say, 20 records, we put them in the EventQueue table but cannot start the batch.

Hence, we used the algorithm below to solve the problem:


Start Batch Processing Call

If we are in Future or Batch
  • We cannot start the batch directly, so we take an indirect route and start it via a queueable job
  • Check the EventQueue table count
  • Check AsyncApexJob to see whether a queueable job already exists
  • Check AsyncApexJob to see whether a batch already exists
  • If count > 0 and there is no batch and no queueable job
    • Enqueue the queueable job (see Start Queue Processing Call)
Else
  • Check AsyncApexJob to see whether a batch already exists
  • Check the EventQueue table count
  • If count > 0 and the batch doesn't exist, start batch processing

Start Queue Processing Call
  • Check AsyncApexJob to see whether a batch already exists
  • Check the EventQueue table count
  • If count > 0 and the batch doesn't exist, start batch processing

This ensures that as soon as we put data in the EventQueue table, a batch is started to process those records, and once all records are processed, the batch finishes.
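The two start calls reduce to a pair of guard functions. The following is a minimal plain-Java sketch, not the original Apex: `StartAction` and `BatchStarter` are hypothetical names, and the boolean and count parameters stand in for the AsyncApexJob lookups and the EventQueue row count described above.

```java
// Hypothetical outcomes of a start call.
enum StartAction { NONE, START_BATCH, ENQUEUE_QUEUEABLE }

final class BatchStarter {
    // "Start Batch Processing Call": decides how (and whether) to kick off the batch.
    static StartAction startBatchProcessing(boolean inFutureOrBatch,
                                            int eventQueueCount,
                                            boolean batchExists,
                                            boolean queueableExists) {
        if (eventQueueCount <= 0 || batchExists) {
            return StartAction.NONE;               // nothing to do, or already running
        }
        if (inFutureOrBatch) {
            // Cannot start a batch from here, so go through a queueable job,
            // unless one is already waiting to do exactly that.
            return queueableExists ? StartAction.NONE : StartAction.ENQUEUE_QUEUEABLE;
        }
        return StartAction.START_BATCH;
    }

    // "Start Queue Processing Call": runs inside the queueable job,
    // where starting a batch directly is allowed.
    static StartAction startQueueProcessing(int eventQueueCount, boolean batchExists) {
        return (eventQueueCount > 0 && !batchExists)
                ? StartAction.START_BATCH
                : StartAction.NONE;
    }
}
```

Both functions return NONE whenever a batch already exists, so at most one EventQueue batch should be started for a given backlog.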