Java HProf Files
- Increase Max Body Size and Max Email Size both to 300097152 bytes
- Stop Retriever Process
- Start Retriever process and instance
- Send a 16MB email from email@example.com to firstname.lastname@example.org
- Install Eclipse Memory Analyzer Tool (MAT)
Large java_pid####.hprof files being produced in logs directory.
What is HProf?
HProf is a tool built into JDK for profiling the CPU and heap usage within a JVM. A Java process crash may produce an hprof file containing a heap dump of the process at the time of the failure. This is typically seen in scenarios with "java.lang.OutOfMemoryError"
Where will they be?
Hprof files will appear in the C:\CIM\eService\logs directory. They will be named "java_pid####.hprof" and typically be large in size (>50MB).
How do I know what process has produced the hprof?
The free Eclipse Memory Analyzer Tool (MAT) can be downloaded and used to open hprof files.
- Load the hprof into MAT by going to File > Open Heap Dump. MAT will extract several index files from the hprof
- MAT's wizard will prompt for what type of report to generate. The simplest is a "Leak Suspects Report."
- The report will display Problem Suspects for the leak
- Clicking System Overview > System Properties will give a significant amount of information including:
- Server name
- Application install directory
- Process name
- Java version and install directory
- Windows user running the process
- Process-specific information. In this case, we see some IMAP connection settings.
From this screen we know that the crashing process was the Retriever (rx).
Why did the process crash?
- Clicking System Overview > Thread Overview will list all of the threads in the process along with each thread's heap size.
- In our hprof we can see that the "RxInstance id : 999" thread retained most of the heap.
- Click the thread name "RxInstance id : 999" > Java Basics > Open In Dominator Tree > Finish
- Expanding this tree out, we can see that message-ID 4DEBCFF7.9030802 had a large amount of text in its body, totaling approximately 17MB.
Did the process recover?
Not immediately, but within 5 minutes. In this case, users will not notice if the retriever is down for a brief time, as they will be handling activities already in the database. In the retriever logs we can clearly see the process exit with a "Java heap space" failure and restart 5 minutes later.
2011-06-05 14:51:06.264 GMT-0400 <@> ERROR <@> [46:RxInstance id : 999] <@> ProcessId:2304 <@> PID:-1 <@> UID:-1 <@> HttpSessionId: <@> com.egain.platform.framework.dsm.common <@> run() <@> Instance Failed. Exiting.. == >999 <@> java.lang.OutOfMemoryError: Java heap space at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393) at java.lang.StringBuilder.append(StringBuilder.java:120) at com.egain.mail.module.retriever.Parse.parseTextPart(Parse.java:1196) at com.egain.mail.module.retriever.Parse.dumpPart(Parse.java:760) at com.egain.mail.module.retriever.Parse.parseEmail(Parse.java:347) at com.egain.mail.module.retriever.service.RxProcess.parseEmail(RxProcess.java:4345) at com.egain.mail.module.retriever.service.RxProcess.retrieveMailsForHost(RxProcess.java:1956) at com.egain.mail.module.retriever.service.RxProcess.retrieveMails(RxProcess.java:1386) at com.egain.mail.module.retriever.service.RxInstance.startInstance(RxInstance.java:114) at com.egain.platform.framework.dsm.common.ThreadPool$PoolThread.run(ThreadPool.java:112) 2011-06-05 14:56:21.332 GMT-0400 <@> INFO <@> [1:main] <@> ProcessId:5344 <@> PID:0 <@> UID:12 <@> HttpSessionId: <@> com.egain.platform.framework.dsm.BatchExecutor <@> main() <@> SysProperty : egain.processname = rx-process <@> 2011-06-05 14:56:21.332 GMT-0400 <@> INFO <@> [1:main] <@> ProcessId:5344 <@> PID:0 <@> UID:12 <@> HttpSessionId: <@> com.egain.platform.framework.dsm.BatchExecutor <@> main() <@> SysProperty : egain.servicename = controller <@>
Note that during those 5 minutes of downtime, the System console reported the process as "Running."
Comparing the process ID of the rx-process in the listprocessids.jsp page with the Task Manager on the Services Server concluded that the retriever was in fact NOT running, as there was no javaw.exe with PID 2304.
Then, once the process restarted, the JSP was updated with the new PID 5344.
This scenario touched lightly on the .hprof files created in the logs directory. Users can now:
- Determine which process caused the .hprof file
- Determine Problem Suspects
- Dig deeper into the Java threads to search for root cause