Download and Install pdf2htmlEX

Download and Install pdf2htmlEX

  • 最近更新2019年12月16日

首先,恭喜刘爽同学,晋级今年的软件大赛,花费的时间与经历,在这一刻感觉是值得的。为了帮助刘爽同学修改一下课题的BUG,所以需要对pdf2htmlEx源码进行编译,在网上找了好多,都不理想,还好在国外的网站上找到这篇文章,希望对自己有用。可是……可是……我今天是测试不了拉,明天再测试吧。

原文引自:http://opendesignarch.blogspot.jp/2013/01/download-and-install-pdf2html.html

Recently, I wanted to convert a large PDF document to html so that I could extract tables from it that spanned several pages. I found that the tool to help me achieve the task was called pdf2htmlex available at http://coolwanglu.github.com/pdf2htmlEX/.

PDF


The website points to github which provides the command line commands to download it. To download the source and build it, open a terminal window and enter the following command.

> git clone –depth 1 git://github.com/coolwanglu/pdf2htmlEX.git


Well, turns out I did not have Git installed. So I proceeded to download and install it.


Git was downloaded and installed correctly.

Once Git was installed, I re-issued the previous command, and this time it was downloaded and installed correctly.

Next we cd into the newly created directory and issue commands to build the software.

> cd pdf2htmlEX

> cmake . && make && sudo make install


Turns out there were several dependencies missing. One of the major ones was poppler for which Ubuntu had an older version and a version higher than 0.20.0 was needed. I proceeded to download the version from http://poppler.freedesktop.org/ and compiled and built it on the local machine.

The latest version of the poppler software can be accessed from http://poppler.freedesktop.org/releases.html as shown below.


Once downloaded, we need to extract and build the software.

> sudo apt-get install libopenjpeg-dev

> tar -xvf poppler-0.21.4.tar.gz

> cd poppler-0.21.4/

> ./configure –enable-xpdf-headers

> make

> sudo make install

libpoppler was installed successfully!

Note: The option to enable xpdf headers in the configure statement is very important. Without it, poppler will compile, but pdf2htmlEX may not compile.

Next, we need to install the latest version of libfontforge. This has certain dependencies, that can be installed by issuing the following command on the terminal.

> sudo apt-get update; sudo apt-get install libpng12-dev zlibc zlib1g-dev libtiff-dev libungif4-dev libjpeg-dev libxml2-dev libuninameslist-dev xorg-dev subversion cvs gettext git libpango1.0-dev libcairo2-dev python-dev;

Next we need to downloadthe sourcecode and install it. For this, we make a src folder and cd into it using the following commands.

> mkdir src

> cd src

We then proceed to download the sourcecode from the website (https://github.com/fontforge/fontforge/downloads) as shown below.


Once downloaded, we can issue the following command to unzip the file.

> bunzip2 fontforge_full-20120731-b.tar.bz2


The resulting tar can be unzipped by issuing the following command

> tar -xvf fontforge_full-20120731-b.tar

Next we need to download a few other modules called freetype and spiro

> git clone git://git.sv.gnu.org/freetype/freetype2.git;

> svn co http://libspiro.svn.sourceforge.net/svnroot/libspiro/;

Now, we need to build all these together.

> cd ./libspiro;

./configure;

make;

Followed by

> sudo make install


Next we need to install fontforge

> cd ./fontforge-20120731-b/; ./autogen.sh; ./configure; make; sudo make install; sudo ldconfig;


Then we cd into the directory and configure and make the program by issuing the following commands.

> ./configure

> make

> sudo make install

Finally, the software is built and installed.


Now, we can run the pdf2htmlex command as follows:

> pdf2htmlex

In fact, executing the following explains all the options available with the command.

> pdf2htmlex –help


There you go folks… a simple tool to convert pdf documents to html.

分享到 :
相关推荐

发表回复

登录... 后才能评论