Wednesday, August 25, 2010

Tokenizing Strings

In this post I will cover some String tokenizing issues. I am sure that many of you have dealt with tokenizing Strings many times, and still, I would like to point out some issues I recently discovered.

First of all, I was a bit surprised to discover that using the StringTokenizer class is discouraged, and it turns out that this recommendation goes back at least to the days of JDK 1.5: the documentation for StringTokenizer says that it is a legacy class that is kept for compatibility reasons. It recommends using the "split" method of "String" (or the java.util.regex package) instead.

However, if we take a look at the source code of String.split (the JDK sources can be found online), we will notice that the method compiles the regular expression passed as an argument every time it is called.

It might therefore be better to use the "split" method of the Pattern class and, in the case of "fixed" regular expressions which are used over and over again, to keep the compiled Pattern in a static variable. The following example explains this paragraph:
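
A minimal sketch of the idea, not the original listing: the class name CsvTokenizer and the method names are assumptions made for illustration. It contrasts String.split, which takes the regular expression as a String on every call, with a Pattern that is compiled once and kept in a static field.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class CsvTokenizer {
    // Compiled once when the class is loaded and reused by every call.
    private static final Pattern COMMA = Pattern.compile(",");

    // Passes the regular expression as a String on each call.
    static String[] splitWithString(String line) {
        return line.split(",");
    }

    // Reuses the precompiled pattern.
    static String[] splitWithPattern(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithString("a,b,c")));  // prints [a, b, c]
        System.out.println(Arrays.toString(splitWithPattern("a,b,c"))); // prints [a, b, c]
    }
}
```

Both methods return the same tokens; only the cost of preparing the regular expression differs.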


My experiments showed that when tokenizing 1000 Strings with the simple regular expression ",", using Pattern.split instead of String.split saved only about 9-10 milliseconds (tests were run on an Intel i5 with 4 GB RAM, Windows 7 64-bit). Still, I expect that more complex expressions would save more time.

At this point, some may raise a question about thread safety: the static variable potentially holds shared state, and the code of the method might be used by many threads.
If we take a look at the source of Pattern, we can see that all the variables used inside the split method are local variables or parameters passed to the method, and not fields that get modified, so the split method can be used from many threads concurrently. The Pattern documentation also states that instances of the class are immutable and safe for use by multiple concurrent threads (unlike Matcher instances), and other sources on the internet confirm this.

To conclude this post, I would like to suggest a way to use patterns in your code.
In the above example I presented a case where a comma pattern is used. The comma pattern is likely to be popular and to be needed by many classes.
I suggest that a PatternConstants class provide pattern constants that can be used across the code.
An example of such a class might be:
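
A possible shape for such a class, as a sketch only. The COMMA constant comes from the example discussed above; the WHITESPACE constant is an assumption added to show that the class can hold more than one pattern.

```java
import java.util.regex.Pattern;

// Holder for precompiled patterns that are shared across the code base.
final class PatternConstants {
    // The comma pattern discussed in the post.
    static final Pattern COMMA = Pattern.compile(",");

    // Assumed extra constant, for illustration: one or more whitespace characters.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Not meant to be instantiated.
    private PatternConstants() {
    }
}

class PatternConstantsDemo {
    public static void main(String[] args) {
        String[] tokens = PatternConstants.COMMA.split("x,y,z");
        System.out.println(tokens.length); // prints 3
    }
}
```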

Saturday, July 17, 2010

Creating objects of generic types

In this post I will cover an issue that is solved in JDK 7.0, although most of us who work with generics still use JDK 5.0 and JDK 6.0.
If we want to create an instance of HashMap, we need to specify the generic types both at the variable definition and after the "new" keyword.
For example:
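
A small sketch of the JDK 5.0/6.0 style; the class name, method name and map contents are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleMapExample {
    static Map<String, Integer> createCounts() {
        // The type arguments appear twice: in the declaration and after "new".
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("one", 1);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(createCounts().get("one")); // prints 1
    }
}
```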


This isn't too bad, but what happens when we need to create a HashMap like this:
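
The original declaration is not preserved here, so the nested type below is an assumption chosen to show how quickly the repetition grows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedMapExample {
    static Map<String, Map<Integer, List<String>>> createIndex() {
        // The whole nested type has to be written out twice.
        Map<String, Map<Integer, List<String>>> index =
                new HashMap<String, Map<Integer, List<String>>>();
        Map<Integer, List<String>> inner = new HashMap<Integer, List<String>>();
        inner.put(1, new ArrayList<String>());
        index.put("key", inner);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(createIndex().containsKey("key")); // prints true
    }
}
```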


Suddenly, this gets too annoying.
JDK 7.0 suggests the following solution:
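
With the JDK 7.0 "diamond" syntax the compiler infers the type arguments on the right-hand side. The sketch below reuses an assumed nested map type and requires JDK 7.0 or later to compile.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiamondExample {
    static Map<String, Map<Integer, List<String>>> createIndex() {
        // The empty <> tells the compiler to infer the type arguments
        // from the declared type of the variable.
        Map<String, Map<Integer, List<String>>> index = new HashMap<>();
        return index;
    }

    public static void main(String[] args) {
        System.out.println(createIndex().isEmpty()); // prints true
    }
}
```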


But as previously mentioned, most of us don't use this version of JDK.

Google came up with a nice solution in their google collections framework, based on static generic methods.
If we look at their Maps utilities class, we can notice the newHashMap method, which gives us the following usage:
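
To keep the sketch free of external dependencies, the block below uses a stand-in class called MapsSketch (an assumption, not part of the library) whose method has the same shape as the newHashMap method of Maps. With the library on the classpath, the call would be written as Maps.newHashMap() instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the library's Maps utility class, for illustration only.
class MapsSketch {
    // A static generic method: K and V are inferred from the assignment target.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }
}

class MapsSketchDemo {
    static Map<String, Map<Integer, List<String>>> createIndex() {
        // No type arguments are repeated on the right-hand side.
        Map<String, Map<Integer, List<String>>> index = MapsSketch.newHashMap();
        return index;
    }

    public static void main(String[] args) {
        System.out.println(createIndex().isEmpty()); // prints true
    }
}
```

This works on JDK 5.0 and 6.0 because type inference for generic methods was already available there, even though the diamond syntax was not.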




Generics are an important feature when we want code reuse in our software. This means that in many situations we (or others) may use generic classes we coded in many locations in our software, and this led me to think: why shouldn't we, as a rule, provide a static "newMyGeneric" method for the generic classes we code?

For example:



And a possible usage would be:
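
A sketch of both the class and its usage, under stated assumptions: the original class is not preserved, so MultiMap, its fields and its put/get methods are hypothetical, and only the variable names multiMap1 and multiMap2 follow the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic class that maps each key to a list of values.
class MultiMap<K, V> {
    private final Map<K, List<V>> entries = new HashMap<K, List<V>>();

    // The static factory method suggested in the post.
    static <K, V> MultiMap<K, V> newMultiMap() {
        return new MultiMap<K, V>();
    }

    void put(K key, V value) {
        List<V> values = entries.get(key);
        if (values == null) {
            values = new ArrayList<V>();
            entries.put(key, values);
        }
        values.add(value);
    }

    List<V> get(K key) {
        List<V> values = entries.get(key);
        return values == null ? Collections.<V>emptyList() : values;
    }
}

class MultiMapDemo {
    public static void main(String[] args) {
        // Type arguments written twice.
        MultiMap<String, List<Integer>> multiMap1 = new MultiMap<String, List<Integer>>();
        // Type arguments inferred from the declaration.
        MultiMap<String, List<Integer>> multiMap2 = MultiMap.newMultiMap();

        multiMap1.put("a", new ArrayList<Integer>());
        multiMap2.put("a", new ArrayList<Integer>());
        System.out.println(multiMap1.get("a").size() == multiMap2.get("a").size()); // prints true
    }
}
```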


It can clearly be seen that multiMap2 is initialized in a more elegant way.

To conclude this post, I want to point out once again that JDK 7.0 solves the presented problem. However, the suggested solution can help in current projects, and of course, if a development team upgrades to JDK 7.0, the suggested solution will still work, although it may be somewhat obsolete.

Saturday, July 10, 2010

Generics PECS principle, step by step

I will open this post with a disclaimer -
I know Joshua Bloch discusses PECS in his book "Effective Java" (or at least, this is what I heard from people who read the book), and yet - after hearing a lecture about PECS and reading other posts about it, I felt I would like to explain more about the issue.

First of all - what is PECS?
PECS stands for "Producer Extends, Consumer Super", and it addresses two keywords in the context of generics usage: "super" and "extends".

For the sake of the post, let's assume the following is our class hierarchy:
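
The hierarchy can be sketched as below. Only the three class names mentioned in the text (Animal, SeaAnimal and Fish) are used; the original listing may have contained more classes, and the empty bodies are an assumption.

```java
// Root of the hierarchy.
class Animal {
}

// A sea animal is an animal.
class SeaAnimal extends Animal {
}

// A fish is a sea animal, and therefore also an animal.
class Fish extends SeaAnimal {
}

class HierarchyDemo {
    public static void main(String[] args) {
        Object fish = new Fish();
        System.out.println(fish instanceof SeaAnimal); // prints true
        System.out.println(fish instanceof Animal);    // prints true
    }
}
```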




If we take a look at the following list definition and usage:
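
A sketch of such a definition and usage, assuming the Animal, SeaAnimal and Fish hierarchy (repeated here so the snippet compiles on its own). The variable name animals follows the text; the rest is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class Animal {
}

class SeaAnimal extends Animal {
}

class Fish extends SeaAnimal {
}

class ExtendsDemo {
    static int demo() {
        // A list of some unknown type that extends Animal (or is Animal itself).
        List<? extends Animal> animals;

        // Each of these assignments matches that definition.
        animals = new ArrayList<Animal>();
        animals = new ArrayList<SeaAnimal>();
        animals = new ArrayList<Fish>();

        // Adding is rejected by the compiler, because the exact element type is unknown:
        // animals.add(new Fish());
        return animals.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 0
    }
}
```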


We can see that we defined animals to be a list of a type that we do not know at the point of definition, but we can say that this type extends Animal.
Therefore, the following lines only make sense: lists of Animal, SeaAnimal and Fish all correspond to that definition.

If we go out of the scope of generics, and look at the following code, we can see some similarity:
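
The non-generic analogue might look like this sketch, again assuming the Animal, SeaAnimal and Fish hierarchy. The variable name "a" follows the text.

```java
class Animal {
}

class SeaAnimal extends Animal {
}

class Fish extends SeaAnimal {
}

class PolymorphismDemo {
    static Animal demo() {
        // The variable is declared as Animal, so it accepts any subclass instance.
        Animal a = new Animal();
        a = new SeaAnimal();
        a = new Fish();
        return a;
    }

    public static void main(String[] args) {
        System.out.println(demo() instanceof Fish); // prints true
    }
}
```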



The static (declared) type of "a" is Animal, therefore we can initialize it with an instance of any concrete subclass of Animal.

In order to understand "super" in the context of generics, let's first stay with the above example, and look at the following method:



If we would like to assign the result of the method to a variable (without using a downcast), what can the type of the variable be?
The type can be any of the ancestors (super classes) of SeaAnimal, and of course, SeaAnimal itself.

For example:
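
A sketch of the method and of the assignments that are allowed without a downcast. The method name produceSeaAnimal and the variable "a" follow the text; the enclosing class and the method body are assumptions.

```java
class Animal {
}

class SeaAnimal extends Animal {
}

class Fish extends SeaAnimal {
}

class SingleProducer {
    // The declared return type is SeaAnimal; the body is an assumption.
    static SeaAnimal produceSeaAnimal() {
        return new Fish();
    }

    static Animal demo() {
        SeaAnimal s = produceSeaAnimal(); // SeaAnimal itself
        Animal a = produceSeaAnimal();    // a super class of SeaAnimal
        Object o = produceSeaAnimal();    // the root of every hierarchy

        // A subclass type would need a downcast:
        // Fish f = produceSeaAnimal(); // does not compile
        return a;
    }

    public static void main(String[] args) {
        System.out.println(demo() instanceof SeaAnimal); // prints true
    }
}
```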



The keyword "super" is used in a similar way. In the above example, we wanted to consume the result of the method "produceSeaAnimal" into the variable "a".

If we take a look at the following method:


And the following usage:
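
The original listings are missing, so the signature below is an assumption: the method receives the list it fills as a parameter typed with "super". Only the method name produceSomeSeaAnimals comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

class Animal {
}

class SeaAnimal extends Animal {
}

class Fish extends SeaAnimal {
}

class ListProducer {
    // The caller supplies a list whose element type is SeaAnimal or one of its super classes.
    static void produceSomeSeaAnimals(List<? super SeaAnimal> target) {
        target.add(new SeaAnimal());
        target.add(new Fish());
    }

    public static void main(String[] args) {
        List<SeaAnimal> seaAnimals = new ArrayList<SeaAnimal>();
        List<Animal> animals = new ArrayList<Animal>();
        List<Object> objects = new ArrayList<Object>();

        produceSomeSeaAnimals(seaAnimals);
        produceSomeSeaAnimals(animals);
        produceSomeSeaAnimals(objects);

        // A list of a subclass is rejected by the compiler:
        // produceSomeSeaAnimals(new ArrayList<Fish>());
        System.out.println(animals.size()); // prints 2
    }
}
```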


We can see that we are consuming a list of sea animals that is produced by the method "produceSomeSeaAnimals" in a slightly different manner than in the case of "produceSeaAnimal". This is due to a limitation in the generics mechanism that prohibits us from defining the method "produceSomeSeaAnimals" the following way:


And yet, the principle is similar: when we consume the result of the method, the producer should have a contract that tells the consumer the lowest level of the class hierarchy it can consume.

The last example will demonstrate usage of "super" and "extends" together - implementing the PECS mechanism in one method call:



And possible usages of this method:
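
A sketch that combines both wildcards in one method, together with some calls to it. The class name PecsDemo, the method name transfer and the parameter names are assumptions, since the original listing is not preserved.

```java
import java.util.ArrayList;
import java.util.List;

class Animal {
}

class SeaAnimal extends Animal {
}

class Fish extends SeaAnimal {
}

class PecsDemo {
    // "producer" only hands out SeaAnimals, so it uses extends;
    // "consumer" only receives SeaAnimals, so it uses super.
    static void transfer(List<? extends SeaAnimal> producer, List<? super SeaAnimal> consumer) {
        for (SeaAnimal seaAnimal : producer) {
            consumer.add(seaAnimal);
        }
    }

    public static void main(String[] args) {
        List<Fish> fishes = new ArrayList<Fish>();
        fishes.add(new Fish());
        List<SeaAnimal> seaAnimals = new ArrayList<SeaAnimal>();
        seaAnimals.add(new SeaAnimal());

        List<Animal> animals = new ArrayList<Animal>();
        List<Object> objects = new ArrayList<Object>();

        transfer(fishes, animals);     // Fish extends SeaAnimal, Animal is a super class
        transfer(seaAnimals, animals); // SeaAnimal itself is allowed on both sides
        transfer(fishes, objects);     // Object is a super class of SeaAnimal

        System.out.println(animals.size()); // prints 2
    }
}
```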


This shows us that we can pass to the producer parameter lists whose generic types extend SeaAnimal (as explained in the beginning of the post), and that the consumer parameter accepts lists whose generic types are either SeaAnimal or a super class of it (as explained above).

Tuesday, July 6, 2010

Cache explained to non computer junkies

Sometimes I try to explain to my wife Michal, who does not work in the field of computers and did not study for any computer-related degree, all kinds of issues I deal with at work, or subjects I am exposed to.

While living in Tel Aviv, we used the closets that the landlord left us and they simply could not contain all the clothes Michal and I have.
Michal kept many of her clothes in her home town, Holon: since she and her sisters left her mom's apartment, there is lots of room there to store clothes. So I took the opportunity and explained caching to her. Our conversation started like this:

Yair: "Where do you keep the clothes you want to wear?"
Michal: "In the closet"
Yair: "Why don't you keep all your clothes in Holon?"
Michal: "It would take me a long time to grab something to wear, so I am keeping clothes close to us, in Tel Aviv".
Yair: "But in Holon you have a greater storage area, so what kind of clothes do you keep here?"
Michal: "The clothes I wear the most"
Yair: "And what if you decide to buy a new set of clothes and start wearing them quite often?"
Michal: "I will evacuate the clothes that I use less to Holon"

As you can see, the apartment of Michal's mom in Holon is the secondary storage, the closet in Tel Aviv is the cache, and Michal even used a "Least recently used" policy to evict clothes to Holon whenever she wanted to insert new clothes into our closet.

This shows once again that principles used to solve computer-related problems can be applied to everyday life. Or maybe this is the case because, after all, computers were made by... humans!

Tuesday, June 22, 2010

Ending execution of process after a specified timeout

A friend told me about the following problem:
He is using the Java Process class to execute an external program. However, this class does not have a method that terminates the execution once a timeout has been reached.
A possible solution is to create a thread that receives the process and a timeout value as parameters.
The thread will perform the following:
1. Check if the timeout value was reached.
2. If the timeout value is reached, the process will be destroyed.

For example:
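
A sketch of the first version, which spins in a busy wait loop. The class name BusyWaitTimeoutThread, the constructor parameters and the use of milliseconds are assumptions; the start time is passed in, as mentioned later in the post.

```java
// First attempt: checks the clock in a tight loop until the timeout is reached.
class BusyWaitTimeoutThread extends Thread {
    private final Process process;
    private final long startMillis;
    private final long timeoutMillis;

    BusyWaitTimeoutThread(Process process, long startMillis, long timeoutMillis) {
        this.process = process;
        this.startMillis = startMillis;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run() {
        // 1. Check if the timeout value was reached (busy wait, keeps the CPU occupied).
        while (System.currentTimeMillis() - startMillis < timeoutMillis) {
            // intentionally empty
        }
        // 2. The timeout value was reached, so the process is destroyed.
        process.destroy();
    }
}
```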




However, this thread contains a busy wait loop, which is not recommended.

A better solution is to use the Thread.sleep method instead of the busy wait loop:
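
A sketch of the sleeping version. The class name SleepingTimeoutThread is an assumption, and so is the decision to treat an interrupt as "cancel the timeout" and leave the process alone.

```java
// Second attempt: sleeps for the whole timeout instead of spinning.
class SleepingTimeoutThread extends Thread {
    private final Process process;
    private final long timeoutMillis;

    SleepingTimeoutThread(Process process, long timeoutMillis) {
        this.process = process;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run() {
        try {
            // The thread is parked here and does not consume CPU while waiting.
            Thread.sleep(timeoutMillis);
            process.destroy();
        } catch (InterruptedException e) {
            // Assumption: an interrupt means the timeout was cancelled.
            Thread.currentThread().interrupt();
        }
    }
}
```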




And this also means that we do not need to pass the start execution time as parameter to the thread.

However, if we take a closer look, we will see that we execute the process outside the scope of the given thread, for example:
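
A sketch of such a main class, under several assumptions: the command to run is taken from the program arguments, the timeout is fixed at five seconds, and the names MainSketch and startWatchdog are made up. A compact copy of the sleeping thread is included so that the snippet compiles on its own.

```java
import java.io.IOException;

class SleepingTimeoutThread extends Thread {
    private final Process process;
    private final long timeoutMillis;

    SleepingTimeoutThread(Process process, long timeoutMillis) {
        this.process = process;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(timeoutMillis);
            process.destroy();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // timeout cancelled
        }
    }
}

class MainSketch {
    static Thread startWatchdog(Process process, long timeoutMillis) {
        Thread watchdog = new SleepingTimeoutThread(process, timeoutMillis);
        watchdog.setDaemon(true); // do not keep the JVM alive just for the timeout
        watchdog.start();
        return watchdog;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: MainSketch <command> [arguments...]");
            return;
        }
        // The process is started here, on the main thread...
        Process process = new ProcessBuilder(args).start();
        // ...and only afterwards is the timeout thread created and scheduled.
        Thread watchdog = startWatchdog(process, 5000L);

        int exitCode = process.waitFor();
        watchdog.interrupt(); // the process ended, so the timeout is no longer needed
        System.out.println("exit code: " + exitCode);
    }
}
```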




The process is executed from the main thread of the application, and only after that is the timeout thread started. It is up to the JVM (and the operating system) to determine when the timeout thread will actually run, so it is possible that some time will pass between the process execution and the call to Thread.sleep in the timeout thread.

In order to overcome this problem and get more accurate behavior, we should do the following:
1. Measure the time right after the process was executed
2. Pass it to the timeout thread
3. Change the value passed to Thread.sleep to take into consideration the time that has passed since the execution of the process

The code of the Main class will look like this:



The code of ProcessTimeoutThread will look like this:
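
A sketch of both pieces together, not the original listings. ProcessTimeoutThread is the name used in the text; the remainingMillis helper, the MainSketch class that stands in for Main, the command taken from the program arguments and the five second timeout are all assumptions.

```java
import java.io.IOException;

class ProcessTimeoutThread extends Thread {
    private final Process process;
    private final long startMillis;
    private final long timeoutMillis;

    ProcessTimeoutThread(Process process, long startMillis, long timeoutMillis) {
        this.process = process;
        this.startMillis = startMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // Time left to sleep: the timeout minus the time already elapsed, never negative.
    static long remainingMillis(long startMillis, long timeoutMillis, long nowMillis) {
        return Math.max(0L, timeoutMillis - (nowMillis - startMillis));
    }

    @Override
    public void run() {
        try {
            // Step 3: sleep only for what is left of the timeout.
            Thread.sleep(remainingMillis(startMillis, timeoutMillis, System.currentTimeMillis()));
            process.destroy();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // assumption: interrupt cancels the timeout
        }
    }
}

class MainSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: MainSketch <command> [arguments...]");
            return;
        }
        Process process = new ProcessBuilder(args).start();
        // Step 1: measure the time right after the process was executed.
        long startMillis = System.currentTimeMillis();
        // Step 2: pass it to the timeout thread.
        Thread timeoutThread = new ProcessTimeoutThread(process, startMillis, 5000L);
        timeoutThread.setDaemon(true);
        timeoutThread.start();

        process.waitFor();
        timeoutThread.interrupt();
    }
}
```

The subtraction lives in a small static method so that it can be checked without starting a real process.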