POST /v1/pdf/find/table
This function finds tables in documents using an AI-powered table detection engine.
This endpoint locates tables in an input PDF document and returns JSON with:
-
The array of tables objects.
-
X, Y, Width, and Height coordinates for every table found.
-
Rect param for every table that you can re-use with pdf/convert/to/json, pdf/convert/to/csv, pdf/convert/to/csv, and other endpoints to extract a selected table only.
-
PageIndex page index for a page with a table. The very first page is 0 .
-
Columns array with the set of X coordinates for every column inside the table that was found.
To extract the table into CSV, JSON, or XML please use pdf/convert/to/csv, pdf/convert/to/json2, and pdf/convert/to/xml endpoints with rect parameter value from rect output param for this table accordingly.
To extract the table into CSV, JSON, or XML please use pdf/convert/to/csv, pdf/convert/to/json2, and pdf/convert/to/xml endpoints with rect parameter value from rect output param for this table accordingly.
Attributes
Attributes are case-sensitive and should be inside JSON for POST request. for example: { "url": "https://example.com/file1.pdf" }
| Attribute | Type | Required | Default | Description |
url | string | Yes | - | URL to the source file url attribute |
callback | string | No | - | The callback URL (or Webhook) used to receive the POST data. see Webhooks & Callbacks. This is only applicable when async is set to true. |
httpusername | string | No | - | HTTP auth user name if required to access source URL. |
httppassword | string | No | - | HTTP auth password if required to access source URL. |
pages | string | No | all pages | Specify page indices as comma-separated values or ranges to process (e.g. “0, 1, 2-” or “1, 2, 3-7”). The first-page index is 0. Use ”!” before a number for inverted page numbers (e.g. “!0” for the last page). If not specified, the default configuration processes all pages. The input must be in string format. |
inline | boolean | No | false | Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated. |
password | string | No | - | Password for the PDF file. |
async | boolean | No | false | Set async to true for long processes to run in the background, API will then return a jobId which you can use with the Background Job Check endpoint. Also see Webhooks & Callbacks |
profiles | object | No | - | See Profiles for more information. |
DataEncryptionAlgorithm | string | No | - | Controls the encryption algorithm used for data encryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256. |
DataEncryptionKey | string | No | - | Controls the encryption key used for data encryption. See User-Controlled Encryption for more information. |
DataEncryptionIV | string | No | - | Controls the encryption IV used for data encryption. See User-Controlled Encryption for more information. |
DataDecryptionAlgorithm | string | No | - | Controls the decryption algorithm used for data decryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256. |
DataDecryptionKey | string | No | - | Controls the decryption key used for data decryption. See User-Controlled Encryption for more information. |
DataDecryptionIV | string | No | - | Controls the decryption IV used for data decryption. See User-Controlled Encryption for more information. |
Find only bordered tables
You can limit search to bordered tables only by enabling the legacy table search mode with the following profiles config:
{
"profiles": "{ 'Mode': 'Legacy',
'ColumnDetectionMode': 'BorderedTables',
'DetectionMinNumberOfRows': 1,
'DetectionMinNumberOfColumns': 1,
'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0,
'DetectionMinNumberOfLineBreaksBetweenTables': 0,
'EnhanceTableBorders': false
}"
}
Query parameters
No query parameters accepted.
Responses
| Parameter | Type | Description |
body | object | Response body. |
pageCount | integer | Number of pages in the PDF document. |
error | boolean | Indicates whether an error occurred (false means success) |
status | string | Status code of the request (200, 404, 500, etc.). For more information, see Response Codes. |
name | string | Name of the output file |
credits | integer | Number of credits consumed by the request |
remainingCredits | integer | Number of credits remaining in the account |
duration | integer | Time taken for the operation in milliseconds |
Example Payload
{
"url": "pdfco-test-files.s3.us-west-2.amazonaws.compdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": ""
}
Example Response
To see the main response codes, please refer to the
Response Codes page.
{
"body": {
"tables": [
{
"PageIndex": 0,
"X": 36,
"Y": 34.4400024,
"Width": 523.44,
"Height": 160.82,
"Columns": [
357.675
],
"rect": "36, 34.4400024, 523.44, 160.82"
},
{
"PageIndex": 0,
"X": 36,
"Y": 316.249969,
"Width": 523.44,
"Height": 120.620026,
"Columns": [
157.117,
340.68,
475.84
],
"rect": "36, 316.249969, 523.44, 120.620026"
}
]
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample.json",
"remainingCredits": 98892697,
"credits": 21
}
Inconsistent URL Encoding in cURL Output: When using cURL to make API requests, the output JSON may show URL characters encoded as Unicode escape sequences. For example, the ampersand character (&) may appear as \u0026 in the cURL output. This is normal JSON encoding behavior and does not affect the validity of the URL. The URL will function correctly when used, as JSON parsers automatically decode these escape sequences. If you’re parsing the response programmatically, your JSON parser will handle this conversion automatically.
Code Samples
CURL
JavaScript/Node.js
Python
C#
Java
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: *******************' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "pdfco-test-files.s3.us-west-2.amazonaws.compdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": ""
}'
var https = require("https");
var path = require("path");
var fs = require("fs");
// `request` module is required for file upload.
// Use "npm install request" command to install.
var request = require("request");
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";
// Direct URL of source PDF file.
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Prepare URL for PDF Table Search API call.
// See documentation: https://apidocs.pdf.co
var query = `https://api.pdf.co/v1/pdf/find/table`;
let reqOptions = {
uri: query,
headers: { "x-api-key": API_KEY },
formData: {
password: Password,
pages: Pages,
url: SourceFileUrl
}
};
// Send request
request.post(reqOptions, function (error, resp, body) {
if (error) {
return console.error("Error: ", error);
}
var jsonBody = JSON.parse(body);
// Loop through all found tables, and get json data
if (jsonBody.body.tables && jsonBody.body.tables.length > 0) {
for (var i = 0; i < jsonBody.body.tables.length; i++) {
getJSONFromCoordinates(SourceFileUrl, jsonBody.body.tables[i].PageIndex, jsonBody.body.tables[i].rect, `table_${i + 1}.json`);
}
}
});
/**
* Get JSON from specific co-ordinates
*/
function getJSONFromCoordinates(fileUrl, pageIndex, rect, outputFileName) {
// Prepare request to `PDF To JSON` API endpoint
var jsonQueryPath = `https://api.pdf.co/v1/pdf/convert/to/json`;
// Json Request
let jsonReqOptions = {
uri: jsonQueryPath,
headers: { "x-api-key": API_KEY },
formData: {
pages: pageIndex,
url: fileUrl,
rect: rect
}
};
// Send request
request.post(jsonReqOptions, function (error, resp, body) {
if (error) {
return console.error("Error: ", error);
}
var outputJsonUrl = JSON.parse(body).url;
// Download JSON file
var file = fs.createWriteStream(outputFileName);
https.get(outputJsonUrl, (response2) => {
response2.pipe(file)
.on("close", () => {
console.log(`Generated JSON file saved as "${outputFileName}" file.`);
});
});
});
}
import requests
import os
# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "***************************************"
# Direct URL of source PDF file.
SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""
# Prepare URL for PDF Table Search API call.
query = "https://api.pdf.co/v1/pdf/find/table"
reqOptions = {
'password': Password,
'pages': Pages,
'url': SourceFileUrl
}
headers = {
'x-api-key': API_KEY
}
def getJSONFromCoordinates(fileUrl, pageIndex, rect, outputFileName):
# Prepare request to `PDF To JSON` API endpoint
jsonQueryPath = "https://api.pdf.co/v1/pdf/convert/to/json"
# Json Request
jsonReqOptions = {
'pages': pageIndex,
'url': fileUrl,
'rect': rect
}
# Send request
response = requests.post(jsonQueryPath, headers=headers, data=jsonReqOptions)
if response.status_code == 200:
outputJsonUrl = response.json()['url']
# Download JSON file
res = requests.get(outputJsonUrl)
with open(outputFileName, 'wb') as outfile:
outfile.write(res.content)
print(f'Generated JSON file saved as "{outputFileName}" file.')
else:
print(f"Request error: {response.status_code} {response.reason}")
# Send request
response = requests.post(query, headers=headers, data=reqOptions)
if response.status_code == 200:
jsonBody = response.json()
# Loop through all found tables, and get json data
if 'tables' in jsonBody['body'] and len(jsonBody['body']['tables']) > 0:
for i, table in enumerate(jsonBody['body']['tables']):
getJSONFromCoordinates(SourceFileUrl, table['PageIndex'], table['rect'], f"table_{i + 1}.json")
else:
print(f"Request error: {response.status_code} {response.reason}")
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Net;
namespace PDFcoApiExample
{
class Program
{
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const String API_KEY = "*****************************************";
// Direct URL of source PDF file.
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const string SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const string Pages = "";
// PDF document password. Leave empty for unprotected documents.
const string Password = "";
static void Main(string[] args)
{
// Create standard .NET web client instance
WebClient webClient = new WebClient();
// Set API Key
webClient.Headers.Add("x-api-key", API_KEY);
// URL for PDF Table Search API call.
// See documentation: https://apidocs.pdf.co
string url = "https://api.pdf.co/v1/pdf/find/table";
// Prepare requests params as JSON
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters.Add("password", Password);
parameters.Add("pages", Pages);
parameters.Add("url", SourceFileUrl);
// Convert dictionary of params to JSON
string jsonPayload = JsonConvert.SerializeObject(parameters);
try
{
// Execute POST request with JSON payload
string response = webClient.UploadString(url, jsonPayload);
// Parse JSON response
JObject json = JObject.Parse(response);
if (json["status"].ToString() != "error")
{
Console.WriteLine(response);
}
else
{
Console.WriteLine(json["message"].ToString());
}
}
catch (WebException e)
{
Console.WriteLine(e.ToString());
}
webClient.Dispose();
Console.WriteLine();
Console.WriteLine("Press any key...");
Console.ReadKey();
}
}
}
package com.company;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import okhttp3.*;
import java.io.*;
import java.net.*;
public class Main
{
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
final static String API_KEY = "***********************************";
// Direct URL of source PDF file.
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
final static String SourceFileURL = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
final static String Pages = "";
// PDF document password. Leave empty for unprotected documents.
final static String Password = "";
public static void main(String[] args) throws IOException
{
// Create HTTP client instance
OkHttpClient webClient = new OkHttpClient();
// Prepare URL for PDF Table Search API call.
// See documentation: https://apidocs.pdf.co
String query = "https://api.pdf.co/v1/pdf/find/table";
// Make correctly escaped (encoded) URL
URL url = null;
try
{
url = new URI(null, query, null).toURL();
}
catch (URISyntaxException e)
{
e.printStackTrace();
}
// Create JSON payload
String jsonPayload = String.format("{\"password\": \"%s\", \"pages\": \"%s\", \"url\": \"%s\"}",
Password,
Pages,
SourceFileURL);
// Prepare request body
RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);
// Prepare request
Request request = new Request.Builder()
.url(url)
.addHeader("x-api-key", API_KEY) // (!) Set API Key
.addHeader("Content-Type", "application/json")
.post(body)
.build();
// Execute request
Response response = webClient.newCall(request).execute();
if (response.code() == 200)
{
// Parse JSON response
JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();
boolean error = json.get("error").getAsBoolean();
if (!error)
{
System.out.println(response.body().string());
}
else
{
// Display service reported error
System.out.println(json.get("message").getAsString());
}
}
else
{
// Display request error
System.out.println(response.code() + " " + response.message());
}
}
}