Monday, 15 April 2019

JRockit :: Java Memory Model

Java objects reside in an area called the heap. The heap is created when the JVM starts up and may increase or decrease in size while the application runs. When the heap becomes full, garbage is collected: objects that are no longer used are cleared, making space for new objects.
Note that the JVM uses more memory than just the heap. For example, Java methods, thread stacks, native handles and JVM internal data structures are allocated in memory separate from the heap.
The heap is sometimes divided into two areas (or generations) called the nursery (or young space) and the old space. The nursery is a part of the heap reserved for allocation of new objects. When the nursery becomes full, garbage is collected by running a special young collection, in which all objects that have lived long enough in the nursery are promoted (moved) to the old space, freeing up the nursery for further allocation. When the old space becomes full, garbage is collected there, a process called an old collection.
The reasoning behind a nursery is that most objects are temporary and short lived. A young collection is designed to be swift at finding newly allocated objects that are still alive and moving them away from the nursery. Typically, a young collection frees a given amount of memory much faster than an old collection or a garbage collection of a single-generational heap (a heap without a nursery).
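As a rough illustration, the current heap dimensions can be observed from inside a running application using the standard java.lang.Runtime API; this is only a small sketch, and the printed numbers will vary by JVM and configuration:

    import java.util.ArrayList;
    import java.util.List;

    public class HeapSizeDemo {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("max heap   : " + rt.maxMemory());   // upper limit the heap may grow to
            System.out.println("total heap : " + rt.totalMemory()); // heap currently reserved by the JVM
            System.out.println("free heap  : " + rt.freeMemory());  // unused part of the reserved heap

            // Allocate some short-lived objects; most of these die young and are
            // reclaimed cheaply by young collections in the nursery.
            List<byte[]> chunks = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                chunks.add(new byte[1024]);
            }
            System.out.println("free heap after allocation: " + rt.freeMemory());
        }
    }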

Sunday, 14 April 2019

Memory Leaks

One of the core benefits of Java is the JVM's out-of-the-box memory management. Essentially, we can create objects and the Java Garbage Collector will take care of allocating and freeing memory for us. Nevertheless, memory leaks can still occur in Java applications.

What is a Memory Leak in Java?

A memory leak is a scenario in which objects are no longer used by the application, but the Garbage Collector is unable to remove them from working memory because they are still being referenced.
A memory leak is bad because it blocks memory resources and degrades system performance over time. And if not dealt with, the application will eventually exhaust its resources, finally terminating with a fatal java.lang.OutOfMemoryError.
Symptoms of a memory leak include:
  • Severe performance degradation when the application is continuously running for a long time
  • OutOfMemoryError heap error in the application
  • Spontaneous and strange application crashes
  • The application is occasionally running out of connection objects

Types of Memory Leaks in Java

Static Field Holding On to the Object Reference

In Java, static fields have a lifetime that usually matches the entire lifetime of the running application (unless the owning ClassLoader becomes eligible for garbage collection).

e.g. public static final ArrayList<Double> list = new ArrayList<Double>(1000000);
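A minimal sketch of the problem (class and method names are hypothetical, chosen only for illustration): because the list is reachable from a static field, everything added to it stays reachable for the lifetime of the class, so the Garbage Collector can never reclaim it.

    public class StaticFieldLeak {
        // Lives as long as the class is loaded
        public static final java.util.List<Double> LIST = new java.util.ArrayList<>(1000000);

        public static void recordReading(double value) {
            LIST.add(value); // grows forever; nothing is ever removed or released
        }
    }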

Prevention

  • Minimize the use of static variables
  • When using singletons, rely upon an implementation that lazily loads the object instead of eagerly loading

Through Unclosed Resources and Connections

Prevention

  • Always use a finally block to close resources
  • The code (even in the finally block) that closes the resources should not itself throw any exceptions
  • When using Java 7+, we can make use of the try-with-resources block, as in the sketch below
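A minimal try-with-resources sketch (the file name is only a placeholder): every resource declared in the parentheses is closed automatically, in reverse order, even when an exception is thrown.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class TryWithResourcesDemo {
        static String readFirstLine(String path) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                return reader.readLine();
            } // reader.close() is called here automatically, even on exception
        }

        public static void main(String[] args) throws IOException {
            System.out.println(readFirstLine("example.txt")); // placeholder file name
        }
    }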

Improper equals() and hashCode() Implementations

When defining new classes, a very common oversight is failing to provide proper overrides of the equals() and hashCode() methods.
HashSet and HashMap use these methods in many operations, and if they're not overridden correctly they can become a source of memory leak problems, as the sketch below shows.
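A minimal sketch of how this leaks (the class name is hypothetical): without equals() and hashCode(), every logically identical key is treated as a brand-new entry, so the map keeps growing even though we think we are reusing the same key.

    import java.util.HashMap;
    import java.util.Map;

    public class EqualsHashCodeLeak {
        // No equals()/hashCode() overrides: two Keys with the same name are "different"
        static class Key {
            final String name;
            Key(String name) { this.name = name; }
        }

        public static void main(String[] args) {
            Map<Key, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new Key("same"), i); // intended to overwrite, actually accumulates
            }
            System.out.println(map.size()); // 10000, not 1
        }
    }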

Calling String.intern() on Long Strings

The Java String pool went through a major change in Java 7, when it was moved from PermGen to heap space. For applications running on Java 6 and below, we should be more attentive when working with large Strings.
If we read a huge String object and call intern() on it, it goes into the String pool, which is located in PermGen (permanent memory), and it will stay there as long as our application runs. This blocks the memory and creates a major memory leak in our application. A small sketch of intern() follows below.
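A small, hedged sketch of what intern() does: it returns the canonical pooled copy, and that pooled copy stays reachable from the pool for the life of the JVM (on Java 6 and below, in PermGen). The text here is just a placeholder for a huge String.

    public class InternDemo {
        public static void main(String[] args) {
            String a = new String("a very large chunk of text"); // separate heap object
            String b = a.intern(); // returns the pooled copy of the same character sequence
            String c = "a very large chunk of text";
            System.out.println(a == c); // false: 'a' is its own heap object
            System.out.println(b == c); // true: both refer to the pooled instance
        }
    }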

Prevention

  • The simplest way to resolve this issue is to upgrade to a later Java version, as the String pool has been moved to heap space from Java 7 onwards
  • If stuck on Java 6 or below and working with large Strings, increase the size of the PermGen space to avoid potential OutOfMemoryErrors

How to Find Leaking Sources in Your Application

Verbose Garbage Collection

One of the quickest ways to identify a memory leak is to enable verbose garbage collection.
By adding the -verbose:gc parameter to the JVM configuration of our application, we enable a detailed trace of GC activity. The summary reports written to the JVM's default log output should help you understand how your memory is being managed.

Do Profiling

The second technique is the one we've been using throughout this article – and that's profiling. A good place to start is VisualVM, which helps you move past command-line JDK tools and into lightweight profiling.

Review Your Code

Simply put – review your code thoroughly, practice regular code reviews and make good use of static analysis tools to help you understand your code and your system.
Finally, this is more of a general good practice than a specific technique to deal with memory leaks.

Java 8 Streams.... Are Parallel Streams Always Effective?

Consider the following 4 points before going for parallel streams:
  • Splitting / decomposition costs :: Sometimes splitting is more expensive than just doing the work!
  • Task dispatch / management costs :: A core can do a lot of work in the time it takes to hand work off to another thread
  • Result combination costs :: Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.
  • Locality :: You should consider cache misses, if a CPU waits for data because of cache misses then you wouldn't gain anything by parallelization. That's why array-based sources parallelize the best as the next indices (near the current index) are cached and there are fewer chances that CPU would experience a cache miss.

In simpler words, go for parallelStream() if and only if:

  • I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)
  • I have a performance problem in the first place
  • I don't already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)
Also note the NQ model, a relatively simple formula for estimating the chance of a parallel speedup; a small sketch follows it.

NQ Model:
N x Q > 10000
where, N = number of data items and Q = amount of work per item
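As a small sketch of the trade-off on an array-like source, which splits and localizes well, here is a sequential vs parallel comparison; the sizes and crude timing are only illustrative, not a rigorous benchmark:

    import java.util.stream.LongStream;

    public class ParallelStreamDemo {
        public static void main(String[] args) {
            long n = 2_000_000L; // moderately large N, fairly cheap Q per element

            long t0 = System.nanoTime();
            long sequential = LongStream.rangeClosed(1, n).map(x -> x * x).sum();
            long t1 = System.nanoTime();
            long parallel = LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
            long t2 = System.nanoTime();

            System.out.println("sequential sum = " + sequential + " in " + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("parallel   sum = " + parallel + " in " + (t2 - t1) / 1_000_000 + " ms");
        }
    }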

String, StringBuilder and StringBuffer


Equality

String overrides the equals() method, but StringBuilder and StringBuffer don't.

So when a String reference is compared via equals() with a StringBuilder or StringBuffer reference, the check will always fail, even if all references refer to the same character sequence.

If you really want to check for equality, call toString() on the StringBuilder and StringBuffer references and compare the resulting Strings.

Demonstrated at the GitHub link below.

https://github.com/omerhashmininjago/java-programs/tree/master/src/main/java/com/demonstrate/concepts/string/example2
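For quick reference, a minimal standalone sketch of the same point:

    public class StringEqualityDemo {
        public static void main(String[] args) {
            String s = "abc";
            StringBuilder sb = new StringBuilder("abc");
            StringBuffer sf = new StringBuffer("abc");

            System.out.println(s.equals(sb));                         // false: sb is not a String
            System.out.println(sb.equals(new StringBuilder("abc"))); // false: Object.equals(), a reference check
            System.out.println(s.equals(sb.toString()));              // true: compare the character sequences
            System.out.println(s.equals(sf.toString()));              // true
        }
    }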

Concatenation

When a String reference is passed to a method and we perform a concatenation inside it, the reference within the method ends up pointing to another object, while the original String reference continues to point to the old object. This is because String is immutable.

This does not apply to StringBuilder and StringBuffer, because they are mutable: the append modifies the same underlying object, so the change is visible through the original reference as well.

Demonstrated at the GitHub link below.
https://github.com/omerhashmininjago/java-programs/tree/master/src/main/java/com/demonstrate/concepts/string/example3
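Again, a minimal standalone sketch of the same behaviour:

    public class ConcatenationDemo {
        static void change(String s, StringBuilder sb) {
            s = s + " world";     // creates a new String; the caller's reference is untouched
            sb.append(" world");  // mutates the very object the caller is referring to
        }

        public static void main(String[] args) {
            String s = "hello";
            StringBuilder sb = new StringBuilder("hello");
            change(s, sb);
            System.out.println(s);  // hello
            System.out.println(sb); // hello world
        }
    }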

Thread Safe

Which one to use:


  • String to be used when we are sure that the string in question is not going to change
  • StringBuilder to be used when the string is going to go through lots of changes and we are using a single thread; StringBuilder is not thread safe
  • StringBuffer to be used when multiple threads access the same string reference and we need to be sure that the happens-before guarantees are not violated (see the sketch below)
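A minimal sketch of the difference (thread count and append count are arbitrary): with StringBuffer the appends are synchronized, so the final length is deterministic; swapping in StringBuilder may lose appends or even throw, because it is not thread safe.

    public class BufferVsBuilderDemo {
        public static void main(String[] args) throws InterruptedException {
            StringBuffer buffer = new StringBuffer(); // synchronized; safe for concurrent appends
            Runnable task = () -> {
                for (int i = 0; i < 1_000; i++) {
                    buffer.append('x');
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start(); t2.start();
            t1.join(); t2.join();
            System.out.println(buffer.length()); // always 2000; not guaranteed with StringBuilder
        }
    }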



Friday, 12 April 2019

Asymptotic Notations

The analysis of algorithms is the determination of the computational complexity of algorithms, that is, the amount of time, storage and/or other resources necessary to execute them.

Each algorithm, when written down, can be characterized by a formula representing the amount of time needed to execute its lines of code.

This formula can contain details which are important with respect to computing time, and details which are insignificant. We can ignore such insignificant details. These trivial details are usually constants, or a variable part which is insignificant compared to another variable in the same expression. For example, take T(n) = 4n^2 + 5n + 3. The 3, being a constant, can be removed, leaving T(n) = 4n^2 + 5n. Now compare 5n to 4n^2: for large values of n, 5n is of no significance compared to 4n^2, so we ignore that component as well, leaving T(n) = 4n^2. Here again, 4 is a constant coefficient and is insignificant when considering the time complexity. This leaves us with T(n) = n^2, which is in fact the component that drives the complexity of the algorithm.

Big-O notation is used to describe the performance or complexity of an algorithm, i.e. how fast a given algorithm is when put through its paces.

Three notations are used to represent the growth of an algorithm as its input increases:

Big Theta (tight bounds)

When we say tight bounds, we mean that the time complexity represented by the Big-Θ notation is like the average value, or the range within which the actual execution time of the algorithm will lie.
For example, if for some algorithm the time complexity is represented by the expression 3n^2 + 5n, and we use the Big-Θ notation to represent this, then the time complexity would be Θ(n^2), ignoring the constant coefficient and removing the insignificant part, which is 5n.
Here, a complexity of Θ(n^2) means that the actual time for any input n will remain between x1 * n^2 and x2 * n^2, where x1 and x2 are two constants, thereby tightly binding the expression representing the growth of the algorithm.

Big O (Upper bound)

This notation is known as the upper bound of the algorithm, or a Worst Case of an algorithm and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.
It tells us that a certain function will never exceed a specified time for any value of input n.
The question is why we need this representation when we already have the big-Θ notation, which represents the tightly bound running time for any algorithm. Let's take a small example to understand this.
Consider the Linear Search algorithm, in which we traverse an array's elements one by one to search for a given number (a sketch follows below).
In the worst case, starting from the front of the array, we find the element we are searching for at the very end, which leads to a time complexity of n, where n is the total number of elements.
But it can also happen that the element we are searching for is the first element of the array, in which case the time complexity is 1.
Saying that the Big-Θ (tight bound) complexity of linear search is Θ(n) means the time required will always be related to n, which is the right way to represent the average time complexity. When we use Big-O notation instead, we say the complexity is O(n), meaning the time will never exceed n; this defines the upper bound, i.e. the time can be less than or equal to n, which is the correct representation for the worst case.
This is the reason you will most often see Big-O notation being used to represent the time complexity of an algorithm: it simply makes more sense.
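A minimal linear search sketch to make the bounds concrete: the best case (key at index 0) illustrates the Ω(1) lower bound, and the worst case (key at the end or absent) illustrates the O(n) upper bound.

    public class LinearSearch {
        static int indexOf(int[] arr, int key) {
            for (int i = 0; i < arr.length; i++) {
                if (arr[i] == key) {
                    return i; // best case: found immediately, one comparison
                }
            }
            return -1; // worst case: n comparisons
        }

        public static void main(String[] args) {
            int[] data = {7, 3, 9, 1, 5};
            System.out.println(indexOf(data, 7)); // 0 (best case)
            System.out.println(indexOf(data, 5)); // 4 (worst case)
        }
    }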

Big Omega (Lower bound)


Big Omega notation is used to define the lower bound of any algorithm or we can say the best case of any algorithm.
This always indicates the minimum time required for any algorithm for all input values, therefore the best case of any algorithm.
In simple words, when we represent a time complexity for any algorithm in the form of big-Ω, we mean that the algorithm will take at least this much time to complete its execution. It can definitely take more time than this too.

Monday, 8 April 2019

Integer Cache

Where does Java store Integers and other wrapper classes? It's always the heap. But Integer.valueOf(100) will always return the same instance, while Integer.valueOf(200) will always return a new instance. Why the difference?

There is an IntegerCache in java.lang.Integer which stores instances for values -128 through 127. This means that Integer.valueOf(17) will always return the very same instance, while Integer.valueOf(200) will not. While this clearly has the advantage of reusing commonly used Integer values, which results in better performance and relieves the GC of some work, it also has implications for autoboxing and identity comparisons.

The upper boundary of this IntegerCache can be changed using -XX:AutoBoxCacheMax=<new value>.

The cache is pre-filled eagerly rather than populated on demand (by default, the 256 Integer values from -128 to 127), so if the boundary is raised, memory consumption increases accordingly.
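A small sketch of the effect on identity comparisons (the results assume the default cache range of -128 to 127):

    public class IntegerCacheDemo {
        public static void main(String[] args) {
            Integer a = Integer.valueOf(100);
            Integer b = Integer.valueOf(100);
            Integer c = Integer.valueOf(200);
            Integer d = Integer.valueOf(200);

            System.out.println(a == b);      // true: both come from the IntegerCache
            System.out.println(c == d);      // false: outside the cache, two distinct objects
            System.out.println(c.equals(d)); // true: always compare boxed values with equals()
        }
    }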

Volatile keyword, Synchronized keyword and Lock interface

Volatile :: Marking a member variable as volatile makes sure that whenever the member variable is accessed, its value is always read from the...