Friday, March 25, 2011

AQ - queue is stopped

We suddenly started getting this error message in the log file, and we are not sure how dequeue got disabled for the queue.

[Linked-exception]
java.sql.SQLException: ORA-25226: dequeue failed, queue JMSUSER.AIA_CUSTOMERJMSQUEUE is not enabled for dequeue
ORA-06512: at "SYS.DBMS_AQIN", line 571
ORA-06512: at line 1

        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:138)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:316)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:282)
        at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:639)
        at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:184)
        at oracle.jdbc.driver.T4CCallableStatement.execute_for_rows(T4CCallableStatement.java:873)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1161)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3001)
        at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3043)
        at oracle.jms.AQjmsConsumer.dequeue(AQjmsConsumer.java:1601)
        at oracle.jms.AQjmsConsumer.receiveFromAQ(AQjmsConsumer.java:916)
        at oracle.jms.AQjmsConsumer.receiveFromAQ(AQjmsConsumer.java:835)
        at oracle.jms.AQjmsConsumer.receive(AQjmsConsumer.java:776)
        at oracle.tip.adapter.jms.JMS.JMSMessageConsumer.consumeBlockingWithTimeout(JMSMessageConsumer.java:405)
        at oracle.tip.adapter.jms.inbound.JmsConsumer.run(JmsConsumer.java:330)
        at oracle.j2ee.connector.work.WorkWrapper.runTargetWork(WorkWrapper.java:242)
        at oracle.j2ee.connector.work.WorkWrapper.doWork(WorkWrapper.java:215)
        at oracle.j2ee.connector.work.WorkWrapper.run(WorkWrapper.java:190)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:830)
        at java.lang.Thread.run(Thread.java:595)


It was quite easy to fix. The following query lists the queues that have enqueue or dequeue disabled:

select * from all_queues where trim(enqueue_enabled) = 'NO' or trim(dequeue_enabled) = 'NO'


Once we have identified which queue has the issue, we can run the following command to enable both enqueue and dequeue for that queue:

execute DBMS_AQADM.START_QUEUE(queue_name=> 'JMSUSER.AIA_CUSTOMERJMSQUEUE');
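
When several queues are stopped at once, the same fix can be generated per queue. The following is only a minimal sketch: the helper name start_queue_statements and the tuple layout are my own assumptions (not part of any Oracle API), and the second queue name is made up for illustration. It takes rows shaped like the relevant ALL_QUEUES columns and prints one START_QUEUE call for each queue that has enqueue or dequeue disabled.

```python
# Illustrative sketch only. The helper name and the row layout are assumptions;
# the rows stand in for (owner, name, enqueue_enabled, dequeue_enabled)
# values fetched from ALL_QUEUES by whatever client you use.

def start_queue_statements(rows):
    """Return one START_QUEUE command per queue with enqueue or dequeue disabled."""
    statements = []
    for owner, name, enqueue_enabled, dequeue_enabled in rows:
        # The query above trims these flags, so strip them here as well.
        flags = (enqueue_enabled.strip().upper(), dequeue_enabled.strip().upper())
        if "NO" in flags:
            statements.append(
                "execute DBMS_AQADM.START_QUEUE(queue_name => '%s.%s');" % (owner, name)
            )
    return statements


rows = [
    ("JMSUSER", "AIA_CUSTOMERJMSQUEUE", "  YES  ", "  NO  "),
    ("JMSUSER", "SOME_OTHER_QUEUE", "  YES  ", "  YES  "),  # hypothetical queue
]

for statement in start_queue_statements(rows):
    print(statement)
```

The generated commands can then be reviewed and run by hand, which is safer than starting queues blindly.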

Wednesday, March 23, 2011

ESB 10g version agnostic routing rules - [default version]

In BPEL this is easy: if we don't specify the version during partner link invocation, the default one is used. But in ESB, especially when ESB calls BPEL via native binding, the version of the BPEL process is hard-coded inside the routing rule. This was quite annoying because this pattern is heavily used in AIA, and deploying a newer version of a BPEL process requires a change to the routing rule in ESB.

In 10.1.3.5, Oracle released a new feature that solves this problem. After the 10.1.3.5 upgrade, deploying any BPEL process creates a separate version in BPELSystem called "default", which is visible in the ESB Console as well as in JDeveloper.


If we select default in the routing rule, ESB calls the default version of the BPEL process during native invocation. Therefore, if we change the BPEL default version or redeploy the BPEL process with a higher version, no change to the ESB routing rules is required.

BPELConsole promiscuous mode

Nothing really new, but I have had my eye on promiscuous mode for a while and never had the time to sit down and put it to the test. It is a server-level setting, accessible via collaxa-config.xml or the BPELAdmin console.

So in 10.1.3.3, we had a simple property called:
productionServer = true/false (false by default)
true: the same version of a BPEL process cannot be redeployed. You get the following error message:
is being re-deployed to a Production Server with same revision number.Please modify the revision for the process.

In 10.1.3.5, productionServer is deprecated and replaced by the serverMode property. As per the documentation:

serverMode = production/development/promiscuous
Identifies the server mode. Currently supported server modes are:
    * production - re-deployment of process with same revision is not allowed.
    * development - re-deployment of process with same revision marks the existing instance as stale.
    * promiscuous - re-deployment of process with same revision will not stale instances, work items will be migrated. 
The default value is "development". 
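
To keep the three modes straight, here is a toy model of the documented behaviour when a process is redeployed with a revision that already exists. It is only a sketch: the function redeploy_outcome and its return strings are my own invention, not an Oracle API, and it reflects what the documentation says rather than what a real server actually does.

```python
# Toy model of the documented serverMode semantics. The function name and the
# outcome strings are hypothetical and exist only for illustration.

DOCUMENTED_OUTCOMES = {
    "production": "rejected: same revision cannot be redeployed",
    "development": "deployed: existing instances marked stale",
    "promiscuous": "deployed: instances not staled, work items migrated",
}


def redeploy_outcome(server_mode="development", revision_exists=True):
    """Describe what the documentation says happens on deployment."""
    if not revision_exists:
        # A brand new revision is unaffected by serverMode.
        return "deployed: new revision"
    if server_mode not in DOCUMENTED_OUTCOMES:
        raise ValueError("unknown serverMode: %r" % (server_mode,))
    return DOCUMENTED_OUTCOMES[server_mode]


for mode in ("production", "development", "promiscuous"):
    print(mode, "->", redeploy_outcome(mode))
```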


I thought promiscuous mode was something everybody would want, so I decided to experiment with it. First I changed serverMode to promiscuous and restarted the server.

Test case 1 (Synchronous) :

I deployed synchronous process A with a couple of activities and fired some instances of it. Then I changed all the activity names in process A, added and deleted some more activities, and redeployed process A with the same version. Finally, I fired some more instances of process A.

Results: It was pretty good and as expected. None of the instances went stale. Old instances were shown based on old code and new instances were shown based on new BPEL code.

Now that the synchronous process had held its ground, it was time to experiment with an asynchronous flow.

Test case 2 (Asynchronous) :

I deployed asynchronous process B with a few activities. Process B calls another asynchronous process C and then waits in a receive activity. This makes sure B dehydrates, so that I can get some in-flight instances. I ran some transactions so that process B had a few instances in flight.

Next, I drastically changed process B, adding and deleting activities (but keeping the call to C and the receive from C). I deployed the new code of process B with the same version. I then created some new transactions and tried to complete some old ones.

Results: Quite fantastic. Nothing went stale. In-flight instances of process B that started on the older code were migrated to the newer code after the dehydration point and completed successfully on the newer code. This seems to be a very powerful but delicate feature, and the person doing the deployment should be very cautious about using promiscuous mode with in-flight workflows.


Test case 3 (Asynchronous human workflow) :

I created a simple process D with a Human Task and started some in-flight workflows for it. I made some minor changes to process D and redeployed the new code with the same version.

Result: All BPEL instances were still in flight, but all human tasks went stale, or at least didn't show up in WLA. This definitely did not meet expectations.



Conclusion
It is definitely a very strong feature, and it avoids some hassles with stale instances. But in general it would be great to have this feature at the process level instead of the server level. That way, we could use promiscuous mode only for synchronous processes and keep versioning the asynchronous ones. I could not find such a feature in 11g; I guess I need to look around to see whether composites have something similar.


Wednesday, March 2, 2011

ESB Connection Leak in ESBPool

If I had to come up with a list of the top 5 things I hate most, Oracle ESB (only 10g) would definitely make it onto that list. I have been working with it for almost 5 years, on pretty much every single project, from projects involving complex business processing in ESB to ones pushing millions of transactions per hour through ESB alone. Each time, ESB never fails to disappoint me further. Every new release since 10.1.3.1 fixes hundreds of bugs and introduces hundreds of new ones. Just one instance: "Unable to build the instance relationship". Whoever solves it should get a Nobel Prize, as it has been an unsolved mystery for 5 years and seems to be more complex than E=mc². I have personally heard the statement "ESB is an unreliable product" from internal Oracle SOA gurus, and I completely disagree with it; I think that calling ESB a "product" at all fundamentally breaks the laws of software ethics.

Anyway, we recently did the 10.1.3.5.2 upgrade and saw that the ESB connection pool (ESBPool) was growing by 100 connections per hour. After multiple sleep-deprived nights, we found that it was yet another ESB bug. There were constant error messages like "Unclosed connection detected : 'oracle.oc4j.sql.spi.ConnectionFinalizer@" in log.xml. After creating several different test cases, I found that it happened with a very specific pattern:

AQ -> ESB Consumer -> ESB Async Routing Rule -> Async BPEL

The Async Routing Rule was the culprit, and such flows were found in OOTB AIA code. Once we changed the rule from Async to Sync, the connection count never went above 2, and life was back to normal again.
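
To confirm that a change like this really stops the leak, it helps to count the "Unclosed connection detected" messages before and after. Below is a minimal sketch that does this for any iterable of log lines. The function name count_unclosed and the sample lines are made up for illustration; the real log.xml entries carry more detail than shown here.

```python
# Minimal sketch: count leak warnings in log lines. The sample lines are
# invented; only the marker text comes from the real log message.

MARKER = "Unclosed connection detected"


def count_unclosed(lines):
    """Count the lines that contain the unclosed-connection warning."""
    return sum(1 for line in lines if MARKER in line)


sample = [
    "Unclosed connection detected : 'oracle.oc4j.sql.spi.ConnectionFinalizer@",
    "some unrelated message",
    "Unclosed connection detected : 'oracle.oc4j.sql.spi.ConnectionFinalizer@",
]

print(count_unclosed(sample))
```

For a real check, pass an open file handle for log.xml instead of the sample list and compare the counts before and after the routing rule change.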