Parse Document

curl --request POST \
  --url https://api.pdf.co/v1/pdf/documentparser \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "url": "<string>",
  "templateid": 123,
  "template": "<string>",
  "inline": true,
  "outputformat": "JSON",
  "generatecsvheaders": true,
  "name": "<string>",
  "pages": "0,2,5-10, !0, !5-!2",
  "async": true,
  "password": "<string>",
  "expiration": 60,
  "httpusername": "<string>",
  "httppassword": "<string>"
}'
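
For readers who prefer a script to the command line, the curl request above can be approximated in Python using only the standard library. This is a minimal sketch, not an official client: `build_request` and `API_URL` are illustrative names, the payload uses a subset of the body parameters from the sample, and the call that actually sends the request is left commented out.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/documentparser"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl sample."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
    )


request = build_request(
    "<api-key>",  # placeholder, as in the curl sample
    {
        "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
        "templateid": 123,
        "inline": True,
        "outputformat": "JSON",
    },
)

# To send the request and decode the JSON response:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.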

POST /v1/pdf/documentparser

Authorizations

x-api-key

string

header

required

Body

application/json

url

string<uri>

default:https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf

required

URL of the source file.

templateid

number

ID of the template to use. View and manage your templates at HTML to PDF Templates.

template

string

default:{ "templateVersion": 3, "templatePriority": 0, "sourceId": "Multipage Table Test", "detectionRules": { "keywords": [ "Sample document with multi-page table" ] }, "fields": { "total": { "type": "regex", "expression": "TOTAL {{DECIMAL}}", "dataType": "decimal" } }, "tables": [ { "name": "table1", "start": { "expression": "Item\\s+Description\\s+Price\\s+Qty\\s+Extended Price" }, "end": { "expression": "TOTAL\\s+\\d+\\.\\d\\d" }, "row": { "expression": "^\\s*(?<itemNo>\\d+)\\s+(?<description>.+?)\\s+(?<price>\\d+\\.\\d\\d)\\s+(?<qty>\\d+)\\s+(?<extPrice>\\d+\\.\\d\\d)" }, "columns": [ { "name": "itemNo", "type": "integer" }, { "name": "description", "type": "string" }, { "name": "price", "type": "decimal" }, { "name": "qty", "type": "integer" }, { "name": "extPrice", "type": "decimal" } ], "multipage": true } ] }

The raw document parser template to use directly. See Template.
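
Because the template parameter is typed as a string while the template itself is a JSON document, one plausible way to pass it is to build the template as a dictionary and serialize it before placing it in the request body. The sketch below assumes that approach; the template is a trimmed copy of the default value (the tables section is omitted for brevity), and the variable names are illustrative.

```python
import json

# Trimmed copy of the default template shown above (tables omitted).
template = {
    "templateVersion": 3,
    "templatePriority": 0,
    "sourceId": "Multipage Table Test",
    "detectionRules": {
        "keywords": ["Sample document with multi-page table"],
    },
    "fields": {
        "total": {
            "type": "regex",
            "expression": "TOTAL {{DECIMAL}}",
            "dataType": "decimal",
        },
    },
}

# Assumption: the template travels as a JSON-encoded string inside the
# JSON request body, matching the documented "string" type.
payload = {
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
    "template": json.dumps(template),
    "outputformat": "JSON",
}

request_body = json.dumps(payload)
```

Serializing with `json.dumps` twice (once for the template, once for the whole body) takes care of escaping the backslashes and quotes that regular expressions in a template usually contain.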

inline

boolean

default:false

Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated.

outputformat

enum<string>

default:JSON

Format of the output file.

Available options: JSON, YAML, XML, CSV

generatecsvheaders

boolean

name

string

File name for generated output.

pages

string

Page indices/ranges (0-based). Items are comma-separated. Each item is one of: N (e.g., 0), N-M (e.g., 3-7), N- (open-ended, e.g., 10-), or !N (reverse index; !0 is last page, !1 is second-to-last). Whitespace is allowed. If not specified, the default configuration processes all pages.

Example:

"0,2,5-10, !0, !5-!2"

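To make the pages syntax concrete, the following sketch expands a specification into 0-based page indices for a document of known length. It is a local illustration of the rules described above, not the service's own parser: `expand_pages` is a hypothetical helper, and dropping indices that fall outside the document is an assumption, since the description does not say how such values are handled.

```python
import re

# One item: N, N-M, N-, !N, or a range whose ends use either form.
_ITEM = re.compile(r"^(!?\d+)(?:\s*(-)\s*(!?\d+)?)?$")


def _resolve(token: str, page_count: int) -> int:
    """Turn 'N' or '!N' into a 0-based index ('!0' is the last page)."""
    if token.startswith("!"):
        return page_count - 1 - int(token[1:])
    return int(token)


def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string into indices, in the order the items are listed."""
    if not spec.strip():
        # No specification means all pages.
        return list(range(page_count))
    pages: list[int] = []
    for raw in spec.split(","):
        item = raw.strip()
        if not item:
            continue
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"unrecognized page item: {item!r}")
        start_token, dash, end_token = match.groups()
        start = _resolve(start_token, page_count)
        if dash is None:
            end = start                      # single page
        elif end_token is None:
            end = page_count - 1             # open-ended range, e.g. "10-"
        else:
            end = _resolve(end_token, page_count)
        # Assumption: indices outside the document are silently dropped.
        pages.extend(p for p in range(start, end + 1) if 0 <= p < page_count)
    return pages


print(expand_pages("0,2,5-10, !0, !5-!2", 20))
# prints [0, 2, 5, 6, 7, 8, 9, 10, 19, 14, 15, 16, 17]
```

For a 20-page document, `!0` resolves to index 19 and `!5-!2` to indices 14 through 17.
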
async

boolean

default:false

Set async to true to run long processes in the background. The API then returns a jobId that you can use with the Background Job Check endpoint. See also Webhooks & Callbacks.
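
When async is true, the caller has to check the job repeatedly until it finishes. The sketch below shows only the polling loop. The request to the Background Job Check endpoint and the test for completion are passed in as functions, because that endpoint's URL and response fields are not described here. All names (`wait_for_job`, `check_job`, `is_finished`) are hypothetical.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    check_job: Callable[[str], dict],
    is_finished: Callable[[dict], bool],
    interval_seconds: float = 3.0,
    max_attempts: int = 20,
) -> dict:
    """Poll a background job until is_finished() accepts its status.

    check_job is expected to call the Background Job Check endpoint with
    the jobId and return the decoded response; is_finished decides from
    that response whether the job is complete.
    """
    for attempt in range(max_attempts):
        if attempt > 0:
            # Wait between checks, but not before the first one.
            time.sleep(interval_seconds)
        status = check_job(job_id)
        if is_finished(status):
            return status
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")
```

Keeping the HTTP call outside the loop means the same helper works with any client library and can be exercised with a stub in tests.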

password

string

Password for the PDF file.

expiration

number

default:60

Sets the expiration time for the output link, in minutes. After this period, generated output file(s) are automatically deleted from PDF.co Temporary Files Storage. The maximum allowed duration depends on your subscription plan. For permanent storage of input files (e.g., reusable images, PDF templates, documents), use PDF.co Built‑In Files Storage.

httpusername

string

HTTP auth user name if required to access source URL.

httppassword

string

HTTP auth password if required to access source URL.
