Techfee

About How to Make PhantomJS Render Google Embedded Web Fonts Correctly


At the moment (before version 1.8.1), PhantomJS doesn't support WOFF files. One fix is to use the commit by https://github.com/Vitallium, which fixes the WOFF support issue. You have to compile it from source, though, and that takes a long time if your computer is not powerful enough. On my quad-core i7 laptop it takes 10-15 minutes. On my dual-core medium EC2 instance, it compiles forever…

The Web Open Font Format (WOFF) is a font format for use in web pages.

To do:

1) Check out the PhantomJS 1.x source as described in the official build instructions: http://phantomjs.org/build.html

# For Ubuntu Linux (tested on a barebone install of Ubuntu 10.04 Lucid Lynx and Ubuntu 11.04 Natty Narwhal):

sudo apt-get install build-essential chrpath git-core libssl-dev libfontconfig1-dev
git clone git://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.8
# ./build.sh    (do not run this yet)

# On Mac OS, install Xcode and the necessary SDK for development (gcc, various tools, libraries, etc.):

git clone git://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.8
# ./build.sh    (do not run this yet)

Note: don't run ./build.sh yet. What you have checked out is still the stock PhantomJS source, without the WOFF fix, so after the checkout, don't build.

2) Add the remote repo which contains the needed branch:

About How to Embed Image Into One Single Html


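// Assumed dependencies, matching the calls used below:
// jsoup, Apache Commons Codec (Base64) and Apache Commons IO (FileUtils).
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import org.apache.commons.codec.binary.Base64;
import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

private static String embed(File htmlFile) {

      String final_return = "error";

      // try-with-resources closes the reader even if reading fails
      try (BufferedReader br = new BufferedReader(new FileReader(htmlFile))) {

          StringBuilder sb = new StringBuilder();
          String line;

          while ((line = br.readLine()) != null) {
              // keep the line breaks that readLine() strips
              sb.append(line).append('\n');
          }

          org.jsoup.nodes.Document doc = Jsoup.parse(sb.toString());

          Elements images = doc.getElementsByTag("img");

          for (Element image : images) {

              String imgName = image.attr("src").trim();

              // we suppose the image files are in the same folder as the html file.
              File tempImageFile = new File(htmlFile.getAbsoluteFile().getParentFile(), imgName);

              // compare strings by content, and make sure we have a regular file
              if (!imgName.isEmpty() && tempImageFile.isFile()) {

                  String imageString = new String(
                          Base64.encodeBase64(FileUtils.readFileToByteArray(tempImageFile)));

                  image.attr("src", "data:image;base64," + imageString);

              } else {

                  // if the image file does not exist, put an empty data URI in it
                  // (or leave the src as it was)
                  image.attr("src", "data:image;base64,");
              }
          }

          final_return = doc.toString();

      } catch (Exception e) {
          // do stuff... (final_return stays "error")
      }

      return final_return;
  }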
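The method above writes every image as `data:image;base64,...`. As a minimal sketch of the data URI format itself, the snippet below builds the same kind of string with a specific image MIME type, using only the JDK's `java.util.Base64` (Java 8+). The class `DataUriSketch` and both of its helpers are hypothetical names introduced for illustration; they are not part of the original code, and the extension table is deliberately tiny.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

class DataUriSketch {

    // Hypothetical helper: guess an image MIME type from the file extension.
    static String imageMimeType(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".png")) return "image/png";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        // Unknown extension: fall back to a generic binary type.
        return "application/octet-stream";
    }

    // Build "data:<mime>;base64,<payload>" from the raw file bytes.
    static String toDataUri(String fileName, byte[] content) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "data:" + imageMimeType(fileName) + ";base64," + encoded;
    }

    public static void main(String[] args) {
        byte[] fakeImage = "hi".getBytes(StandardCharsets.UTF_8);
        // prints data:image/png;base64,aGk=
        System.out.println(toDataUri("logo.png", fakeImage));
    }
}
```

In the `embed` method, the result of such a helper could replace the hard-coded `"data:image;base64,"` prefix, at the cost of maintaining the extension list.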

About How to Get Mime Types Using Java


1. Using javax.activation.MimetypesFileTypeMap

 File f = new File("test.docx");
 String mimeType = new MimetypesFileTypeMap().getContentType(f);
Result (every file falls back to application/octet-stream):
Mime Type of .DS_Store is application/octet-stream
Mime Type of 444.docx is application/octet-stream
Mime Type of f.jar is application/octet-stream
Mime Type of Non-Disclosure Agreement.docx is application/octet-stream
Mime Type of W9 Form.pddf is application/octet-stream

2. Write it yourself
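One way to write it yourself is a plain lookup table from file extension to MIME type. The sketch below is only an illustration under that assumption: the class `MimeTypeSketch`, its `guess` method and the handful of table entries are all hypothetical and would need extending for real use. Unknown extensions and dot-files such as `.DS_Store` fall back to `application/octet-stream`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MimeTypeSketch {

    static final String DEFAULT_TYPE = "application/octet-stream";

    // Hypothetical, deliberately small extension table.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        TYPES.put("doc", "application/msword");
        TYPES.put("rtf", "application/rtf");
        TYPES.put("pdf", "application/pdf");
        TYPES.put("jar", "application/java-archive");
        TYPES.put("html", "text/html");
    }

    // Look up the MIME type by the text after the last dot in the file name.
    static String guess(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension, a leading dot only (".DS_Store"), or a trailing dot.
        if (dot <= 0 || dot == fileName.length() - 1) {
            return DEFAULT_TYPE;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = TYPES.get(ext);
        return type != null ? type : DEFAULT_TYPE;
    }

    public static void main(String[] args) {
        for (String name : new String[] {".DS_Store", "444.docx", "f.jar", "W9 Form.pddf"}) {
            System.out.println("Mime Type of " + name + " is " + guess(name));
        }
    }
}
```

A lookup by extension is cheap and predictable, but it trusts the file name; it does not inspect the file's content.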

Journal


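Today I did some research on converting Docx to HTML and found that Java currently has no really good solution, mainly because the conversion quality is poor. None of the open-source Java solutions can properly convert the images in the header, which is what the company wants. I fiddled with it all day without any result. After half a year of development work, I feel that I don't care that much about the concrete implementation details, and am more interested in data transfer and system architecture. Where I go from here is still very uncertain.

Goals for March:

  1. Sort out the visa issue and try to mail it on time.
  2. Pick a scripting language or a higher-level language to learn and get familiar with.
  3. Improve pdftojson and its concurrency performance.
  4. Improve rtf, docx, doc to html and its concurrency performance.
  5. Ideally, get the MQ implemented.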

About How to Convert Docx/doc/rtf Into Html Using JAVA


Actually, there is one solution, “one solution to rule them all”: use OpenOffice as a background service to convert all of these formats. You can implement the remote calls yourself, but someone has already written a Java library, JODConverter, that provides a high-level API for this.

http://code.google.com/p/jodconverter/

     // imports below use the JODConverter 3.0 package names
     import java.io.File;

     import org.artofsolving.jodconverter.OfficeDocumentConverter;
     import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
     import org.artofsolving.jodconverter.office.OfficeManager;

     OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
          // 2 ports mean 2 worker processes doing the conversions.
          .setPortNumbers(8100, 8101)
          // restart the OpenOffice worker process after every 30 conversions
          // to work around its memory leak (an unsolved OpenOffice issue).
          .setMaxTasksPerProcess(30)
          // tasks that wait in the queue for more than 1200000 ms (20 minutes)
          // are discarded (you get an "officeManager not found" exception).
          .setTaskQueueTimeout(1200000)
          // if a single task takes more than 20000 ms, an exception is thrown.
          .setTaskExecutionTimeout(20000)
          .buildOfficeManager();

      officeManager.start();
      try {
          OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
          converter.convert(new File(filepath1), new File(filepath2));
      } finally {
          // always shut the OpenOffice processes down, even if the conversion fails
          officeManager.stop();
      }
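As far as I can tell, `converter.convert(File, File)` chooses the output format from the extension of the target file, so `filepath2` should end in `.html` for an HTML conversion; treat that as an assumption and check it against the JODConverter version you use. Under that assumption, a small helper can derive the target file from the input file. The class `HtmlTargetSketch` and the method `htmlTargetFor` are hypothetical names, not part of JODConverter.

```java
import java.io.File;

class HtmlTargetSketch {

    // Hypothetical helper: same folder and base name as the input,
    // with the extension replaced by ".html".
    static File htmlTargetFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot).
        String base = dot > 0 ? name.substring(0, dot) : name;
        return new File(input.getParentFile(), base + ".html");
    }

    public static void main(String[] args) {
        File target = htmlTargetFor(new File("docs", "report.docx"));
        // prints report.html
        System.out.println(target.getName());
    }
}
```

With this, the call above could be written as `converter.convert(input, HtmlTargetSketch.htmlTargetFor(input))`.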

Testing Chinese in Octopress


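Let's see how Chinese text renders in Octopress.

Test article:

The latest observations from NASA's Solar Dynamics Observatory show a new eruption on the Sun's surface. Solar activity such as solar flares and coronal mass ejections can change the Sun's outer structure; these complex phenomena are linked to the motion of the atmosphere at the Sun's surface and are shaped by magnetic field lines. On July 19, 2012, a fairly strong solar flare erupted on the Sun's surface, emitting large amounts of light and radiation, followed by a coronal mass ejection phase. Under the influence of the magnetic field, the whole eruption became exceptionally strange, a phenomenon known as “coronal rain”.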