Techfee

How to Convert DOCX/DOC/RTF to HTML Using Java


Actually, there is one solution, "one solution to rule them all": use OpenOffice as a background service to do every conversion. You could implement the remote calls yourself, but someone has already written a Java library, JODConverter, that provides a high-level API for this.

http://code.google.com/p/jodconverter/

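    import java.io.File;
    import org.artofsolving.jodconverter.OfficeDocumentConverter;
    import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
    import org.artofsolving.jodconverter.office.OfficeManager;

    OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        // Two ports mean two worker processes doing the conversions.
        .setPortNumbers(8100, 8101)
        // Restart each OpenOffice worker process after 30 conversions to
        // contain its memory leak (an unresolved OpenOffice issue).
        .setMaxTasksPerProcess(30)
        // Tasks that wait in the queue for more than 1200000 ms (20 minutes)
        // are discarded (you get an "officeManager not found" exception).
        .setTaskQueueTimeout(1200000)
        // A task that takes longer than 20000 ms (20 seconds) throws an exception.
        .setTaskExecutionTimeout(20000)
        .buildOfficeManager();

    officeManager.start();
    try {
        OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
        // filepath1 is the source document, filepath2 is the converted output.
        converter.convert(new File(filepath1), new File(filepath2));
    } finally {
        // Always shut the OpenOffice processes down, even if the conversion fails.
        officeManager.stop();
    }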
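As far as I know, the two-argument convert call in JODConverter 3.x picks the output format from the extension of the target file, so for HTML output filepath2 should end in .html. Below is a minimal sketch of a helper that derives such a target path from the input file. The class name HtmlTargets and the method htmlTargetFor are made up for illustration and are not part of JODConverter.

```java
import java.io.File;

// Hypothetical helper (not part of JODConverter): derives the ".html"
// output file that sits next to a given input document.
final class HtmlTargets {
    private HtmlTargets() { }

    /** Returns a sibling of the input file with its extension replaced by ".html". */
    static File htmlTargetFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        // A leading dot (e.g. ".hidden") or no dot at all means there is no extension to strip.
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return new File(input.getParentFile(), base + ".html");
    }
}
```

With this in place, the conversion could be written as converter.convert(input, HtmlTargets.htmlTargetFor(input)), assuming the extension-based format detection described above.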

On a Mac, installing OpenOffice is easy: just install the .dmg file. On a Linux server, though, it is better to use LibreOffice, which is actively maintained by The Document Foundation.

Why LibreOffice over (Apache) OpenOffice? (1) It has a better open source license. (2) It has more community support. (3) It is developed and released more rapidly.

Before installing LibreOffice, remove any existing OpenOffice installation from your system, then add the LibreOffice PPA and install it with the following commands:

sudo apt-get purge openoffice*.*

sudo add-apt-repository ppa:libreoffice/ppa
sudo apt-get update
sudo apt-get install libreoffice

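It is also better to keep the officeManager as a singleton: start it once, share it across all conversions, and stop it only when the application shuts down, rather than starting and stopping OpenOffice for every file.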
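One way to get a singleton is Java's initialization-on-demand holder idiom, sketched below. So that the sketch compiles without JODConverter on the classpath, it uses a stand-in interface ManagedOffice and a stub CountingOffice, both hypothetical. In real code the create() method would build and start the JODConverter OfficeManager instead.

```java
// Hypothetical stand-in for JODConverter's OfficeManager (start/stop only).
interface ManagedOffice {
    void start();
    void stop();
}

// Hypothetical stub that counts how many times it was started.
final class CountingOffice implements ManagedOffice {
    static int starts;
    @Override public void start() { starts++; }
    @Override public void stop() { }
}

// Creates and starts one shared instance for the whole JVM, on first use.
final class OfficeManagerHolder {
    private OfficeManagerHolder() { }

    // The JVM runs this nested class's initializer at most once and
    // thread-safely, the first time get() touches Lazy.INSTANCE.
    private static final class Lazy {
        static final ManagedOffice INSTANCE = create();
    }

    private static ManagedOffice create() {
        // Assumption: in real use, replace the stub with the OfficeManager
        // built by DefaultOfficeManagerConfiguration.
        ManagedOffice office = new CountingOffice();
        office.start();
        // Stop the office processes when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(office::stop));
        return office;
    }

    static ManagedOffice get() {
        return Lazy.INSTANCE;
    }
}
```

Every caller then uses OfficeManagerHolder.get() instead of building its own manager, so the office processes are started only once no matter how many conversions run.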
