Monday, October 17, 2016

Unreliable affected-rows with conditional upserts in MySQL

This post is about my finding that the affected-rows value of MySQL conditional UPSERTs is unreliable.

UPSERTs are tempting, with perks like a single round trip to the DB, atomicity, and simpler SQL client code.
Then, if your update depends on the existing row data, there's even Conditional UPSERT.
A conditional UPSERT can have one of three outcomes:
  1. Row is inserted as a new row
  2. An existing row/column is updated (condition evaluated True).
  3. Existing row/column wasn't updated (condition evaluated False).
The SQL client can tell the UPSERT outcome from the affected-rows value.

For non-conditional UPSERTs the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.
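In client code this mapping is easy to get wrong; a minimal sketch (the class and method names are mine, not from the post):

```java
/** Interprets MySQL's per-row affected-rows value for a
 *  NON-conditional INSERT ... ON DUPLICATE KEY UPDATE. */
public class UpsertOutcome {

    public static String outcomeOf(int affectedRows) {
        switch (affectedRows) {
            case 1:  return "inserted";   // new row created
            case 2:  return "updated";    // existing row changed
            case 0:  return "unchanged";  // row set to its current values
            default: return "unexpected(" + affectedRows + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcomeOf(2)); // prints "updated"
    }
}
```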

For conditional UPSERTs I saw inconsistent results: the affected-rows value was 2 OR 3 when an existing row was updated (condition true), 2 when an existing row wasn't updated (condition false), and 1 when a new row was inserted.
Such inconsistent behavior could easily cause bugs. I didn't find a clear pattern for when it happens, nor proper documentation of what the expected affected-rows value should be.
Since the ON DUPLICATE KEY UPDATE uses an IF() statement, I suspect MySQL loses track of the results of the IF() statement. Also, each column's condition can evaluate differently from other columns' conditions.


Without knowing what to expect of the affected-rows value (and with UPSERTs being easily breakable by nature, e.g. when switching the column update order), I decided to ditch them completely, and ended up implementing the conditional logic client-side in Java, using more than one SQL command wrapped in a transaction.

My conditional UPSERT (expect inconsistent affected-rows values):

INSERT INTO account_last_touch
            (service_id, account_name, user_name, touch_time)
            VALUES('123', '456', 'u1', '2016-07-19 12:11:15')
ON DUPLICATE KEY UPDATE
            user_name = IF('2016-07-23 12:11:15' > touch_time, 'u1', user_name),
            touch_time = IF('2016-07-24 12:11:15' > touch_time, '2016-07-24 12:11:15', touch_time)
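The client-side replacement I ended up with boils down to read-compare-write. Here is an in-memory sketch of that logic (a HashMap stands in for the table, all names are mine, and the transactional wrapping a real JDBC version needs is omitted):

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of the conditional-upsert logic moved to the client. */
public class LastTouchStore {

    public static final class Touch {
        public final String userName;
        public final long touchTime; // epoch millis, for simplicity
        public Touch(String userName, long touchTime) {
            this.userName = userName;
            this.touchTime = touchTime;
        }
    }

    private final Map<String, Touch> table = new HashMap<>();

    /** Returns "inserted", "updated" or "skipped" -- the three outcomes
     *  the affected-rows value was supposed to distinguish. */
    public String upsert(String accountKey, String userName, long touchTime) {
        Touch existing = table.get(accountKey);   // SELECT ... FOR UPDATE
        if (existing == null) {
            table.put(accountKey, new Touch(userName, touchTime)); // INSERT
            return "inserted";
        }
        if (touchTime > existing.touchTime) {     // the IF() condition
            table.put(accountKey, new Touch(userName, touchTime)); // UPDATE
            return "updated";
        }
        return "skipped";                         // condition false
    }
}
```

With two statements in a transaction the outcome is always unambiguous, which is exactly what the affected-rows value failed to give me.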

Tuesday, January 12, 2016

Is String.hashCode() unique enough?

Given a set of unique Strings, is their set of String.hashCode() values unique enough?
Well... it depends on what you define as enough.
In my case below it was enough. Read how I assessed it.

It's clear that different inputs might map to the same hashcode value (there are only 2^32 possible values), but what are the chances of it happening?
I have one million users, each user owns 50 private items. An item is identified by a UUID.
I had these two conflicting goals:
(I) Represent each item as an integer instead of a UUID
(II) Avoid collisions. Any pair of items owned by the same user should resolve to a different hashcode.

What is enough: I could live with up to 10 users, out of a million, experiencing a collision. Most of these 10 users will never notice the collision. I assume the system will have other bugs with higher probability than that.
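For reference, the statistical estimate is a birthday-bound computation; this arithmetic is mine, not from the post:

```java
/** Birthday-bound estimate: 50 hashcodes per user drawn from 2^32 values. */
public class CollisionBound {

    public static double expectedCollidingUsers(long users, int itemsPerUser) {
        // Number of item pairs per user: C(50, 2) = 1225
        double pairs = itemsPerUser * (itemsPerUser - 1) / 2.0;
        // Probability that at least one pair collides, approximated by
        // the expected number of colliding pairs (tiny, so a good bound)
        double pPerUser = pairs / Math.pow(2, 32);
        return users * pPerUser; // roughly 0.29 users out of a million
    }

    public static void main(String[] args) {
        System.out.println(expectedCollidingUsers(1_000_000L, 50));
    }
}
```

The estimate lands well under my budget of 10 affected users.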

Assessing uniqueness

One way to assess it is to compute the statistical probability of such an event. But I preferred a "proof" that any programmer could appreciate, even those without good statistics skills. Therefore I coded a simulation that simply tries it in practice:

Download from Gist
package collisions.test;

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UUIDToHashcodeUniquenessTestMain {

 private final static int num_of_users = 1000 * 1000;
 private final static int num_of_stacks = 50;

 public static void main(String[] args) {
  int collisions = 0;
  for (int i = 0; i < num_of_users; i++) {
   collisions += calcCollisionsForUser();
  }
  System.out.println("Had " + collisions + " collisions for " + num_of_users + " users");
 }

 private static int calcCollisionsForUser() {
  int collisions = 0;
  Set<Integer> uuidSet = new HashSet<Integer>(num_of_stacks * 2);
  for (int i = 0; i < num_of_stacks; i++) {
   String uuid = UUID.randomUUID().toString();
   Integer uuidHashcode = uuid.hashCode();
   if (uuidSet.contains(uuidHashcode)) {
    collisions++;
   }
   uuidSet.add(uuidHashcode);
  }
  return collisions;
 }
}

The program comes back saying that collisions aren't really something to worry about:
Iteration 0: Had 0 collisions for 1000000 users
Iteration 1: Had 0 collisions for 1000000 users
Iteration 2: Had 0 collisions for 1000000 users
Iteration 3: Had 0 collisions for 1000000 users
Iteration 4: Had 0 collisions for 1000000 users

Thursday, October 31, 2013

A unit test to enforce max heap when running Android UT on the PC

2013's mobile devices come with 1-2GB RAM, yet Android still enforces a very small heap size of 24MB-64MB only (though it keeps increasing with time).
It's pretty easy to write an Android app that drains the heap. For example: caching images w/o an LRU cache, or reading whole files into memory instead of working with streams.

-- Your code will always use up as much memory as the system has (My spin on Parkinson's law).

I'm developing an Android app with a big UT suite that I run from Eclipse on the PC. I noticed that my default heap size there is 256MB, huge compared to mobile, meaning my tests could pass on the PC but still cause an OOME on an actual device.

So, I created MobileLikeSmallHeapDuringTestsEnforce, a new unit test that enforces a small heap size during JUnit test execution. Just make sure you throw it into any test project you have and you're safe.
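The gist itself isn't inlined in the post, but the core check such a test performs can be sketched as follows (class, method, and constant names are mine; the 64MB cap is an assumed 2013-era device limit):

```java
/** Fails fast when the JVM heap is far larger than a mobile device's. */
public class MobileHeapGuard {

    // Assumed per-app heap cap of a 2013-era Android device
    static final long MOBILE_HEAP_LIMIT_BYTES = 64L * 1024 * 1024;

    /** Pure comparison, separated out so it is easy to unit-test. */
    public static boolean isMobileLike(long maxHeapBytes, long limitBytes) {
        return maxHeapBytes <= limitBytes;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory(); // e.g. 256MB on a PC JVM
        if (!isMobileLike(max, MOBILE_HEAP_LIMIT_BYTES)) {
            System.out.println("Heap is " + (max >> 20)
                    + "MB -- run the suite with -Xmx64m to mimic a device");
        }
    }
}
```

In a real test the failing branch would call a JUnit fail() instead of printing.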

Created as a GitHub Gist; you're welcome to make it better:

Tuesday, January 15, 2013

Quickest way for a one-off XML sort - OR - Learning to keep the heavy tools in the shed

What's the quickest way/tool to sort a 1000 entries xml file?
Your requirements: the XML is parked on your desktop. You only need to sort it once so you can manually examine it. Sort by the tag "relevance:score".

How would you go about it? Would you:
A) Craft a pipe stream of shell utils?
B) Use the heavy tools - a Java main() that uses JDom?
C) Refresh the XSLT skills you never had?
D) Try your luck with a Python script?
E) Search for an online XML editor tool?
F) Or, my pick at the bottom.

Example xml document to sort:
    <!-- 997 more entries -->

My pick: I considered all of the above, but each sounded like a headache for a simple sort operation. So I've... thrown the file at MS Excel. Turns out it can digest it rather well, and then I sorted by the score column. Yes! Surprising. But crappy MS Excel did the job (the original schema had more nesting than the document in the example above).

Life-saver lesson: spend time picking the right tool for the job rather than on the job itself.


Wednesday, December 5, 2012

Specifying array size/capacity - my latest buggy code and a best practice

We're often required to specify a size/capacity when allocating array-like structures.

While you MUST specify a size for Java's primitive arrays, with resizable arrays like ArrayList/Vector, specifying the #capacity is too often a #premature-optimization that you could avoid.

I happened to introduce a bug to our code base by initializing an ArrayList with a negative capacity value:

List<RetrievedDocument> docs =  new ArrayList<RetrievedDocument>(scoreDocs.length - resultsOffset);

Performing such profanity results in an #IAE being thrown (triggered by a NegativeArraySizeException thrown by the underlying primitive array structure):
public ArrayList(int capacity) {
 firstIndex = lastIndex = 0;
 try {
  array = newElementArray(capacity);
 } catch (NegativeArraySizeException e) {
  throw new IllegalArgumentException();
 }
}

Lesson for the next 50 years (or until I switch off from Java):

Whenever setting the size of an array (or anything sizable) to a value that isn't known ahead of time, spend a second to consider whether the value could be negative.
Then, if the desired resulting behavior is to create a zero-sized array instead, use this one-liner:
new ArrayList(Math.max(0, possiblyNegativeCapacity));
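A minimal demonstration of both the pitfall and the guard (the class and helper names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class NegativeCapacityDemo {

    /** Allocates a list, clamping a possibly-negative capacity to zero. */
    public static <T> List<T> safeList(int possiblyNegativeCapacity) {
        return new ArrayList<T>(Math.max(0, possiblyNegativeCapacity));
    }

    public static void main(String[] args) {
        try {
            new ArrayList<String>(-3);           // the bug: throws IAE
        } catch (IllegalArgumentException e) {
            System.out.println("negative capacity throws IAE");
        }
        System.out.println(safeList(-3).size()); // the guard: prints 0
    }
}
```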
Q: Do you know if other languages make the use case above easier/less error-prone?

Q: Any other related best practices and pitfalls you would like to share?