Wednesday, November 27, 2013

Connection timeout and recovery - use case

Connection timeouts and instance retries are closely related, and both are easy to configure from the Oracle SOA 11g consoles (EM and the WebLogic console). Here is one use case we wanted to go over to see how these settings work out in a realistic production scenario.

Scenario for our environment:
Multiple processes are deployed that interact mostly with external web services and databases (local and XA). Some service or DB calls can take 10 to 20 minutes (either the entire call, or the stretch from one dehydration point to the next), while others should take no longer than 10 seconds (SLA). If a call takes longer than its configured time, we certainly don't want to wait; we want to error out and do something else instead (e.g. send an error notification or return an error message).


Configuration Screens: The following shows where to configure connection retries and the various timeouts in SOA 11g.
 


Connection Retries

  • EM -> SOA-INFRA -> SOA Administration -> Common





  • EM -> SOA-INFRA -> SOA Administration -> BPEL


    • Retries related to global transaction rollback



    • Scheduled Retries 







Connection Timeouts

  • BPEL engine (EJB Timeout)
    • console -> deployments -> soa-infra -> control -> [ SELECT BEAN ] -> Configuration -> X seconds
      • BPELEngineBean
      • BPELDeliveryBean
      • BPELActivityManagerBean
      • BPELServerManagerBean
      • BPELProcessManagerBean
      • BPELInstanceManagerBean
      • BPELFinderBean

  • SyncMaxWaitTime





  • Query Timeout: the query timeout can be specified either in the composite that the BPEL process calls through (JCA file) or at the data source level. The value set in the composite takes precedence over the data source setting.

    • Composite Level QueryTimeout (JCA file or Adapter Wizard)


    • Data Source Level - QueryTimeout (Statement Timeout)


  • XA Timeout: if the data source is XA, we can specify an XA transaction timeout, which takes precedence over the JTA timeout.


  • HTTP Connection Timeout (WS partner link or HTTP binding calls): we can specify this in the composite.xml file, as properties on the reference binding (values in milliseconds):

  <reference name="PLSQL_WS" ui:wsdlLocation="http://mycomputer:8001/soa-infra/services/default/TimeoutChildComposite/PLSQL_JCA.wsdl">
    <interface.wsdl interface="http://xmlns.oracle.com/TokenTesting/TimeoutChildComposite/PLSQL_JCA#wsdl.interface(PLSQL_JCA)"/>
    <binding.ws port="http://xmlns.oracle.com/TokenTesting/TimeoutChildComposite/PLSQL_JCA#wsdl.endpoint(plsql_jca_client_ep/PLSQL_JCA_pt)" location="http://mycomputer:8001/soa-infra/services/default/TimeoutChildComposite/plsql_jca_client_ep?WSDL" soapVersion="1.1">
        <property name="oracle.webservices.httpConnTimeout" type="xs:integer" many="false" override="may">10000</property>
        <property name="oracle.webservices.httpReadTimeout" type="xs:integer" many="false" override="may">10000</property>
        <property name="weblogic.wsee.wsat.transaction.flowOption" type="xs:string" many="false">WSDLDriven</property>
    </binding.ws>
  </reference>
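For the composite-level QueryTimeout above, the setting ends up as a property in the DB adapter's .jca file. A minimal sketch, assuming a stored-procedure call; the file, connection factory, schema, package and procedure names are hypothetical placeholders, and the value is in seconds:

```xml
<!-- PLSQL_JCA_db.jca (hypothetical name): DB adapter reference calling a stored procedure -->
<adapter-config name="PLSQL_JCA" adapter="Database Adapter" wsdlLocation="PLSQL_JCA.wsdl"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <!-- Hypothetical connection factory JNDI name -->
  <connection-factory location="eis/DB/MyDataSource" UIConnectionName="MyConnection" adapterRef=""/>
  <endpoint-interaction portType="PLSQL_JCA_ptt" operation="PLSQL_JCA">
    <interaction-spec className="oracle.tip.adapter.db.DBStoredProcedureInteractionSpec">
      <property name="SchemaName" value="MY_SCHEMA"/>
      <property name="PackageName" value="MY_PKG"/>
      <property name="ProcedureName" value="MY_PROC"/>
      <!-- Composite-level query timeout in seconds; per the text, this wins over the data source setting -->
      <property name="QueryTimeout" value="10"/>
    </interaction-spec>
  </endpoint-interaction>
</adapter-config>
```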




Target Environment Settings
Below are the settings we used to support the scenario described above.


1. Disable All Retries
Disable all automated retries: it makes no sense for BPEL to retry automatically upon a JTA timeout, or after an instance fails for any other reason. The failure could be a data issue or an environment issue, and blind retries usually don't solve either. Retries can instead be orchestrated via a fault policy, or done manually for a specific instance.

  • EM -> SOA-INFRA -> SOA Administration -> Common -> more SOA Infra advanced configuration properties
    • GlobalTxMaxRetry : 0
    • GlobalTxRetryInterval : 0

  • EM -> SOA-INFRA -> SOA Administration -> BPEL -> more BPEL Configuration Properties
    • ExpirationMaxRetry : 0
    • MaxRecoveryAttempt : 0
    • RecoveryConfig -> RecurringScheduleConfig
      • maxMessageRaiseSize : 0
      • startWindowTime : 00:00
      • stopWindowTime : 00:00
    • RecoveryConfig -> StartupScheduleConfig
      • maxMessageRaiseSize : 0
      • startupRecoveryDuration : 0
      • subsequentTriggerDelay : 0
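Since we rely on fault policies rather than engine-level retries, here is a minimal sketch of an 11g fault-policies.xml that retries a remote fault a bounded number of times and then parks the instance for manual recovery. The policy id, retry count and interval are illustrative assumptions, and the policy still has to be bound to the composite (e.g. through fault-bindings.xml), which is not shown:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- fault-policies.xml: explicit, bounded retry instead of blind engine-level retries -->
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="BoundedRetryPolicy">
    <Conditions>
      <!-- Remote faults, e.g. the partner endpoint is unreachable -->
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="bpelx:remoteFault">
        <condition>
          <action ref="ora-retry"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="ora-retry">
        <retry>
          <!-- At most 2 retries, 30 seconds apart (illustrative values) -->
          <retryCount>2</retryCount>
          <retryInterval>30</retryInterval>
          <!-- After the last failed retry, hand the instance over for manual recovery -->
          <retryFailureAction ref="ora-human-intervention"/>
        </retry>
      </Action>
      <Action id="ora-human-intervention">
        <humanIntervention/>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>
```

When a bound fault policy matches a fault, it takes effect before any catch block in the BPEL process, so the two should be designed together.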

2. Increase Global Timeouts
Increase all global timeouts to support the slowest process in the system. Unfortunately, there is no exception to this: you cannot assign a global timeout to a specific set of processes (I believe 12c may add a feature to assign it at the partition level). We must increase all global timeouts so that the slowest legitimate process can complete, and then work backward so that processes that need to finish faster don't get punished. We set all global timeouts to 1200 seconds (20 minutes).
  • BPEL engine (EJB Timeout)
    • console -> deployments -> soa-infra -> control -> [ SELECT BEAN ] -> Configuration -> 1200 seconds
      • BPELEngineBean
      • BPELDeliveryBean
      • BPELActivityManagerBean
      • BPELServerManagerBean
      • BPELProcessManagerBean
      • BPELInstanceManagerBean
      • BPELFinderBean
  • SyncMaxWaitTime
    • EM -> SOA-INFRA -> SOA Administration -> BPEL -> more BPEL Configuration Properties -> SyncMaxWaitTime : 1200 seconds
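The per-bean Transaction Timeout in the WebLogic console corresponds to the trans-timeout-seconds element of the WebLogic EJB descriptor; the console typically saves the change into a deployment plan for soa-infra rather than editing the packaged descriptor. As an illustration of what the 1200-second setting amounts to for one bean (assuming the ejb-name matches the name shown in the console; the same applies to each bean listed above):

```xml
<!-- weblogic-ejb-jar.xml fragment: 20-minute transaction timeout for one BPEL engine bean -->
<weblogic-enterprise-bean>
  <ejb-name>BPELEngineBean</ejb-name>
  <transaction-descriptor>
    <!-- Seconds; matches the 1200-second global timeout used above -->
    <trans-timeout-seconds>1200</trans-timeout-seconds>
  </transaction-descriptor>
</weblogic-enterprise-bean>
```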

3. Reset XA and Query Timeouts
XA and query timeouts don't work, at least not the way one would expect. The query and its DB connection stay stuck for as long as the query runs (as explained in the blog), unless it runs longer than the BPEL/JTA timeout.

  • QueryTimeout
    • Composite JCA file (or Adapter Wizard)
    • Data Source Level

  • Transaction Timeout (XA data source)
    • Enable (Set XA Transaction Timeout)
    • XA Transaction Timeout : 0
    • XA Retry Duration : 0
    • XA Retry Interval : 0
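On the WebLogic side, the data source settings above live in the data source descriptor (the *-jdbc.xml file under the domain's config/jdbc directory). A sketch of the relevant elements, assuming an XA data source with a hypothetical name. The XA values mirror the list above; the statement timeout shown is WebLogic's default of -1 (disabled), used here only as an assumption because our actual data source screenshot is not reproduced:

```xml
<!-- Data source descriptor fragment; only the elements discussed above are shown -->
<jdbc-data-source xmlns="http://xmlns.oracle.com/weblogic/jdbc-data-source">
  <name>MyXADataSource</name>
  <jdbc-connection-pool-params>
    <!-- Statement Timeout in seconds; -1 means no statement timeout (assumed value) -->
    <statement-timeout>-1</statement-timeout>
  </jdbc-connection-pool-params>
  <jdbc-xa-params>
    <!-- "Set XA Transaction Timeout" enabled -->
    <xa-set-transaction-timeout>true</xa-set-transaction-timeout>
    <!-- 0 = pass the global JTA transaction timeout to the XA resource -->
    <xa-transaction-timeout>0</xa-transaction-timeout>
    <xa-retry-duration-seconds>0</xa-retry-duration-seconds>
    <xa-retry-interval-seconds>0</xa-retry-interval-seconds>
  </jdbc-xa-params>
</jdbc-data-source>
```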




4. Process Configuration with SLA

Now that the global timeouts are set to 20 minutes, there are processes we don't want to wait 20 minutes for: they should finish in a matter of seconds (e.g. 10 seconds), and if they don't, we want to do special processing (e.g. send an email or return an error).

Process is calling SOAP WS or HTTP Binding

We can specify httpConnTimeout and httpReadTimeout (in milliseconds) on the reference binding in composite.xml.
Note: if we don't specify these values and the call takes longer than the BPEL engine/JTA timeout, the entire transaction is rolled back and the process won't be able to do any error handling.

  <reference name="PLSQL_WS" ui:wsdlLocation="http://mycomputer:8001/soa-infra/services/default/TimeoutChildComposite/PLSQL_JCA.wsdl">
    <interface.wsdl interface="http://xmlns.oracle.com/TokenTesting/TimeoutChildComposite/PLSQL_JCA#wsdl.interface(PLSQL_JCA)"/>
    <binding.ws port="http://xmlns.oracle.com/TokenTesting/TimeoutChildComposite/PLSQL_JCA#wsdl.endpoint(plsql_jca_client_ep/PLSQL_JCA_pt)" location="http://mycomputer:8001/soa-infra/services/default/TimeoutChildComposite/plsql_jca_client_ep?WSDL" soapVersion="1.1">
        <property name="oracle.webservices.httpConnTimeout" type="xs:integer" many="false" override="may">5000</property>
        <property name="oracle.webservices.httpReadTimeout" type="xs:integer" many="false" override="may">5000</property>
        <property name="weblogic.wsee.wsat.transaction.flowOption" type="xs:string" many="false">WSDLDriven</property>
    </binding.ws>
  </reference>



Process is calling DB Adapter
If you are calling JDBC, it is quite hard to enforce a connection timeout. Query timeout and XA timeout (for XA data sources) don't work the way one would expect, as mentioned in the blog. There are some workarounds we can use:

We can put the DB adapter in a separate composite (or a Java web service) and call it from the original composite with an HTTP timeout on the reference, in the same way as the WS call above.
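Whether the DB adapter sits behind a wrapper composite or a Java web service, the HTTP timeout is what gives the calling BPEL process a chance to react. In our understanding, a connect or read timeout on the WS binding typically surfaces in BPEL as bpelx:remoteFault, so the SLA handling can live in a catch on the scope around the invoke. A minimal BPEL 2.0 sketch; the scope, partner link, operation and variable names are hypothetical, and the process root is assumed to declare the bpelx namespace:

```xml
<!-- BPEL 2.0 fragment. Assumes the process root declares
     xmlns:bpelx="http://schemas.oracle.com/bpel/extension" -->
<scope name="CallWithSLA">
  <faultHandlers>
    <!-- HTTP connect/read timeout on the WS binding, expected to arrive as a remote fault -->
    <catch faultName="bpelx:remoteFault">
      <sequence name="OnSlaMissed">
        <!-- Placeholder for the special processing: error notification, error reply, etc. -->
        <empty name="HandleTimeout"/>
      </sequence>
    </catch>
  </faultHandlers>
  <sequence name="MainPath">
    <!-- Hypothetical partner link wired to the wrapper composite's reference -->
    <invoke name="InvokeDBWrapper" partnerLink="PLSQL_WS" operation="PLSQL_JCA"
            inputVariable="InvokeDBWrapper_Input" outputVariable="InvokeDBWrapper_Output"/>
  </sequence>
</scope>
```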




