Optimizing SQL

night-job-img1
While you are sleeping, there are lots of programs are running based on it’s scheduling. They are often called “Batch Jobs“. And one of batch jobs is to get rid of old transactions.

In Biz apps, we often need to do performance tuning. According to my experience, the SQL is usually the biggest reason to cause a performance issue.

I have a memory that I have a performance issue for this SQL.
This is to get the target records for Data Purge (Oracle):

SELECT
*
FROM TABLE_TO_BE_PURGED A
WHERE
EXISTS ( 
    SELECT '*' 
    FROM  
          PURGE_TARGET_TABLE TGT
    WHERE 1=1
    AND  TGT.INV_NUM  = A.REF_TXT_01
) OR EXISTS ( 
    SELECT '*' 
    FROM 
          INTEREFACE_TABLE ITFC, 
    PURGE_TARGET_TABLE     TGT 
    WHERE 
         ITFC.REF_NUM     = B.REF_NUM
    AND  ITFC.REF_SUB_NUM = B.REF_SUB_NUM
    AND  ITFC.INV_NUM     = TGT.INV_NUM
) 

And came up with this after struggling hours…

SELECT
*
FROM TABLE_TO_BE_PURGED A1
WHERE EXISTS (
    SELECT '*'
    FROM (
        SELECT 
            B1.PURGED_PK
        FROM 
            TABLE_TO_BE_PURGED  B1
        ,   PURGE_TARGET_TABLE  TGT
        WHERE 1=1
        AND TGT.INV_NUM = B1.REF_TXT_01
        UNION
        SELECT 
            B2.PURGED_PK
        FROM 
            TABLE_TO_BE_PURGED  B2
        ,   INTEREFACE_TABLE    ITFC
        ,   PURGE_TARGET_TABLE  TGT
        WHERE 
            ITFC.REF_NUM     = B2.REF_NUM
        AND ITFC.REF_SUB_NUM = B2.REF_SUB_NUM
        AND ITFC.INV_NUM     = TGT.INV_NUM
    ) T1
    WHERE T1.PURGED_PK = A1.PURGED_PK
)

The performance improved dramatically, but the interesting thing was that the execution plan (cost) was pretty much the same. The one above get slower when having more records. And the one below was fast enough although having many records.

Hash Set vs Array List

title-img-nlogn-1
Recalling a Big-O notation in (CS) Computer Science major course I have taken in College:

O(n)O(1)O(log n)O(n log n)O(n^2)O(n^3)
111111
211248
41281664
8132464512
1614642654096
10241101024010485761073741824

In Biz apps, you need to use List object all the time. And often has performance issue. It’s better to use some other data structure, such as Map object because HashSet is much faster compared to ArrayList:
getcontainsaddremove
ArrayListO(1)O(n)O(1)O(n)
HashSetN/AO(1)O(1)O(1)

Note: HashSet does not provide “get” method, but as long as “contains” method is available, we know what element we need

Here is how to work on:
First, converting ArrayList to HashSet

private Set<String> convertToHashSet(List<Map<String, Object>> listToDelete) {
	Set<String> set = new HashSet<String>();
	for (Map<String, Object> obj : listToDelete) {
		set.add(new String("test"));
	}
	return set;
}

Or just simply set a list as constructor’s parameter:

private Set<String> convertToHashSetDirectory(List<String> list) {
	return new HashSet<String>(list);
}

“contains” with for loop version:

private void hashSetDemoFor(Set<String> setOfMdseOnScreen, Set<String> setOfMdseAcctTo) {
	Map<String, Long> resultMap = new HashMap<String, Long>();
	for (String mdseAcctTo : setOfMdseAcctTo) {
		if (setOfMdseOnScreen.contains(mdseAcctTo)) {
			// Do something
			continue;
		}
	}
}

“contains” with while loop version:

private void hashSetDemoWhile(Set<String> setOfMdseOnScreen, Set<String> setOfMdseAcctTo) {
	Iterator<String> itr = setOfMdseOnScreen.iterator();
	while (itr.hasNext()) {
		String mdseOnScreen = itr.next();
		if (setOfMdseAcctTo.contains(mdseOnScreen)) {
			// Do something
			continue;
		}
	}
}