1

Topic: Program pdf

The task is this: there is a PDF file with text, and the text contains a table of numbers. I need to pull those numbers out somehow. The only library I found is in Java, which does not suit me. I need to do it either through COM or by hand. Can the ActiveX control from Adobe Reader itself read the text from a PDF, or can it only render a picture?

2

Re: Program pdf

Hello, Evgeniy Skvortsov, you wrote:
ES> The task is this: there is a PDF file with text, and the text contains a table of numbers. I need to pull those numbers out somehow.
ES> The only library I found is in Java, which does not suit me. I need to do it either through COM or by hand.
ES> Can the ActiveX control from Adobe Reader itself read the text from a PDF, or can it only render a picture?
If it is still relevant, there is this article: https://habrahabr.ru/post/130601/

3

Re: Program pdf

Hello, Evgeniy Skvortsov, you wrote:
ES> The task is this: there is a PDF file with text, and the text contains a table of numbers. I need to pull those numbers out somehow.
ES> The only library I found is in Java, which does not suit me. I need to do it either through COM or by hand.
ES> Can the ActiveX control from Adobe Reader itself read the text from a PDF, or can it only render a picture?
A PDF contains a set of vector primitives (fonts, lines, Bezier curves and so on), raster primitives, and text. PDF knows nothing about any of these being part of a table; it has no such object as a table. So you have to extract the text manually from the tags and streams, and then analyze the resulting mush to work out what it was meant to be. Reading all the text of a PDF is simple, as long as the file contains nothing exotic. muPDF, for example, does exactly that. Simpler solutions can be found, such as taking a basic parser, but there will be a lot of tedious work. There is also pdfium, which, whatever else can be said about it, has a text parser for PDF documents inside.
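To illustrate the "analyze the mush" step, here is a minimal Python sketch. It assumes some extractor has already produced text fragments as (x, y, text) tuples in PDF page coordinates; that tuple shape, the function name group_into_rows and the y_tolerance value are all hypothetical, not the output format of muPDF, pdfium or any other library. It only shows the idea of rebuilding table rows from positions, since the file itself carries no table structure.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows, then order each row left to right.

    Assumption: fragments whose y coordinates differ by at most y_tolerance
    from the first fragment of a row belong to that row.
    """
    rows = []  # each entry: (row_y, [(x, text), ...])
    # PDF y grows upward, so sort by descending y to read the page top-down.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Sort the cells of each row by x and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical fragments, deliberately out of order and slightly misaligned.
fragments = [
    (50, 700, "Qty"), (150, 700, "Price"),
    (150, 680.5, "9.99"), (50, 681, "3"),
    (50, 660, "12"), (150, 660, "0.50"),
]
table = group_into_rows(fragments)
```

Real documents need more than this (merged cells, wrapped lines, rotated text), which is where the tedious work mentioned above comes from.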

4

Re: Program pdf

Hello, Evgeniy Skvortsov, you wrote:
ES> The task is this: there is a PDF file with text, and the text contains a table of numbers. I need to pull those numbers out somehow.
There are tasks that are simply not worth taking on, and this PDF one is from that series. It is like saying "come on, let's write FineReader in 30 ..". It will not work out.

5

Re: Program pdf

Hello, alpha21264, you wrote:
A> If it is still relevant, there is this article:
Very relevant, I only wrote the post today.
A> https://habrahabr.ru/post/130601/
Thanks! In general, pdftotext.exe from the xpdf package with the -layout switch produces an excellent result, and one that is dead simple to process. I especially like that it is a standalone exe, so it will be possible to write a script in WSH rather than build something more complicated.
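A rough sketch of that approach, written in Python rather than WSH purely for illustration. It assumes pdftotext is on the PATH, that passing "-" as the output file sends the text to stdout, and that table columns in the -layout output are separated by at least two spaces; the function names and the min_numbers threshold are hypothetical and would need tuning for a real document.

```python
import re
import subprocess

# Integer or decimal, with "." or "," as the decimal separator.
NUMBER = re.compile(r"^-?\d+(?:[.,]\d+)?$")


def pdf_to_layout_text(pdf_path, exe="pdftotext"):
    """Run pdftotext with -layout and return its text output."""
    result = subprocess.run(
        [exe, "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_numeric_rows(layout_text, min_numbers=2):
    """Return the numeric cells of every line that looks like a table row.

    Assumption: cells are separated by runs of two or more spaces, and a
    line counts as a table row if it has at least min_numbers numeric cells.
    """
    rows = []
    for line in layout_text.splitlines():
        cells = re.split(r"\s{2,}", line.strip())
        numbers = [cell for cell in cells if NUMBER.match(cell)]
        if len(numbers) >= min_numbers:
            rows.append(numbers)
    return rows


# Hand-written sample standing in for pdftotext -layout output.
sample = (
    "Report for 2015\n"
    "\n"
    "   10      20.5     300\n"
    "    7       8,25      -1\n"
    "Total items: 3\n"
)
rows = extract_numeric_rows(sample)
```

With a real file the call would be extract_numeric_rows(pdf_to_layout_text("input.pdf")), where input.pdf is a placeholder name.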

7

Re: Program pdf

Hello, Evgeniy Skvortsov, you wrote:
ES> The task is this: there is a PDF file with text, and the text contains a table of numbers. I need to pull those numbers out somehow.
http://www.foolabs.com/xpdf/download.html