PDF to Text

curl --request POST \
  --url https://api.pdf.co/v1/pdf/convert/to/text \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "url": "<string>",
  "lang": "eng+deu",
  "rect": "10 20 300 400",
  "unwrap": false,
  "linegrouping": "1",
  "inline": true,
  "pages": "0,2,5-10, !0, !5-!2",
  "name": "<string>",
  "async": true,
  "password": "<string>",
  "expiration": 60,
  "profiles": "<string>",
  "httpusername": "<string>",
  "httppassword": "<string>"
}'

POST /v1/pdf/convert/to/text

Authorizations

x-api-key

string

header

required

Body

application/json

url

string<uri>

default:https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf

required

URL of the source file.

lang

string

default:eng

Sets the OCR (text-from-image) language used when extracting text from scanned PDF, PNG, and JPG input. See Language Support. You can also use two languages at once, in any combination, for example eng+deu.

Example:

"eng+deu"

rect

string<{x} {y} {width} {height}>

Defines the coordinates of the region to extract. Use PDF Edit Add Helper to get or measure PDF coordinates. The format is {x} {y} {width} {height}.

Example:

"10 20 300 400"
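A small helper can keep the rect string in the documented {x} {y} {width} {height} order. This is a minimal sketch: the function name format_rect is hypothetical, and both the positive-size check and the handling of fractional values are assumptions, since the source only shows whole numbers.

```python
def format_rect(x: float, y: float, width: float, height: float) -> str:
    """Return a rect string in the documented "{x} {y} {width} {height}" form."""
    # Assumption: a zero or negative size is never useful, so reject it early.
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    def fmt(value: float) -> str:
        # Print whole numbers without a decimal point, like the documented example.
        return f"{value:.2f}".rstrip("0").rstrip(".")

    return " ".join(fmt(v) for v in (x, y, width, height))


print(format_rect(10, 20, 300, 400))
```

The result can be passed directly as the value of rect in the request body.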

unwrap

boolean

default:false

Unwraps lines into a single line within table cells in the provided PDF document. Applies only when linegrouping is set to 1.

linegrouping

enum<string>

Controls how lines of text are grouped within table cells when extracting data from a PDF. The available modes are 1, 2, and 3. For more information, see Line Grouping.

Available options: 1, 2, 3

inline

boolean

default:false

Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated.

pages

string

Page indices/ranges (0-based). Items are comma-separated. Each item is one of: N (e.g., 0), N-M (e.g., 3-7), N- (open-ended, e.g., 10-), or !N (reverse index; !0 is last page, !1 is second-to-last). Whitespace is allowed. If not specified, the default configuration processes all pages.

Example:

"0,2,5-10, !0, !5-!2"
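To preview which pages a pages string selects before sending a request, the syntax above can be expanded locally. This is a sketch of one reading of the rules, not the service's own parser: resolve_pages is a hypothetical name, a range between two reverse indices (such as !5-!2) is assumed to run from the earlier page to the later one, duplicates are kept in the order listed, and out-of-range pages are assumed to be ignored.

```python
def resolve_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string into 0-based page indices, in the order listed."""

    def index(token: str) -> int:
        token = token.strip()
        if token.startswith("!"):
            # Reverse index: !0 is the last page, !1 the second-to-last.
            return page_count - 1 - int(token[1:])
        return int(token)

    result: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start_text, end_text = item.split("-", 1)
            start = index(start_text)
            # An empty right-hand side ("10-") means "through the last page".
            end = index(end_text) if end_text.strip() else page_count - 1
            result.extend(range(start, end + 1))
        else:
            result.append(index(item))
    # Assumption: indices outside the document are silently dropped.
    return [page for page in result if 0 <= page < page_count]


# For a 12-page document, "!0" is page 11 and "!5-!2" covers pages 6 to 9.
print(resolve_pages("0,2,5-10, !0, !5-!2", 12))
```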

name

string

File name for generated output.

async

boolean

default:false

Set async to true to run long processes in the background. The API then returns a jobId, which you can use with the Background Job Check endpoint. See also Webhooks & Callbacks.
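A background job is usually followed by a polling loop. The sketch below shows only the loop: wait_for_job and check_job are hypothetical names, and check_job stands in for a call to the Background Job Check endpoint, whose URL and response fields are not described on this page. It is assumed to return None while the job is still running and the finished result otherwise.

```python
import time
from typing import Any, Callable, Optional


def wait_for_job(
    job_id: str,
    check_job: Callable[[str], Optional[Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Poll check_job(job_id) until it returns a result or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        # Hypothetical contract: None means "still running".
        result = check_job(job_id)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        time.sleep(interval)
```

Passing the checker in as a function keeps the loop independent of how the job status is actually fetched.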

password

string

Password for the PDF file.

expiration

number

default:60

Sets the expiration time for the output link, in minutes. After this period, generated output file(s) are automatically deleted from PDF.co Temporary Files Storage. The maximum allowed duration depends on your subscription plan. For permanent storage of input files (e.g., reusable images, PDF templates, documents), use PDF.co Built‑In Files Storage.

profiles

string

Profiles are used to configure extra options for specific API endpoints and may be unique to an API. For more information, see Profiles and the documentation of each endpoint for profiles specific to it.

httpusername

string

HTTP auth user name if required to access source URL.

httppassword

string

HTTP auth password if required to access source URL.
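The curl request at the top of this page can also be sketched with the Python standard library. The endpoint URL, headers, and parameter names come from this page. The function names build_request and convert_to_text are hypothetical, and treating the response as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/text"


def build_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Assemble the POST request; optional body parameters go in **options."""
    body = {"url": source_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def convert_to_text(api_key: str, source_url: str, **options) -> dict:
    """Send the request and return the parsed response (assumed to be JSON)."""
    request = build_request(api_key, source_url, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like convert_to_text("<api-key>", "<string>", inline=True, pages="0-2"), with the placeholders replaced by a real key and source file URL.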
