# PDF Find Text

> Find text in a PDF and get its coordinates. Supports regular expressions.

## `POST /v1/pdf/find`

## Attributes

<Note>Attributes are case-sensitive and must be placed inside the JSON body of the POST request, for example: `{ "url": "https://example.com/file1.pdf" }`.</Note>

<Warning>When using regular expressions in JSON payloads, ensure that backslashes are properly escaped. For example, a single backslash `\` should be written as `\\`.</Warning>
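
As a minimal sketch of the escaping rule above, a JSON serializer can do the doubling for you: write the regular expression once in a raw string and let the serializer escape the backslashes. The `build_payload` helper name is hypothetical and only used for illustration.

```python
import json


def build_payload(url: str, pattern: str) -> str:
    """Serialize a find-text request body; json.dumps escapes backslashes."""
    return json.dumps({
        "url": url,
        "searchString": pattern,
        "regexSearch": True,
    })


# Raw string: the regex is written with single backslashes.
pattern = r"Invoice Date \d+/\d+/\d+"
body = build_payload("https://example.com/file1.pdf", pattern)

# The serialized body contains doubled backslashes, as JSON requires.
print(body)
```

Parsing the serialized body back with `json.loads` yields the original single-backslash pattern, which is what the regular expression engine ultimately needs.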

| Attribute                                              | Type    | Required | Default                 | Description                                                                                                                                                                                                                                                                                                            |
| ------------------------------------------------------ | ------- | -------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`                                                  | string  | *Yes*    | -                       | URL to the source file [`url` attribute](/api-reference/url-input-and-request-limits#supported-file-sources)                                                                                                                                                                                                           |
| `callback`                                             | string  | *No*     | -                       | The callback URL (or Webhook) used to receive the POST data. See [Webhooks & Callbacks](/api-reference/webhooks). This is only applicable when `async` is set to `true`.                                                                                                                                               |
| `httpusername`                                         | string  | *No*     | -                       | HTTP auth user name if required to access source URL.                                                                                                                                                                                                                                                                  |
| `httppassword`                                         | string  | *No*     | -                       | HTTP auth password if required to access source URL.                                                                                                                                                                                                                                                                   |
| `pages`                                                | string  | *No*     | all pages               | Specify page indices as comma-separated values or ranges to process (e.g. "0, 1, 2-" or "1, 2, 3-7"). The first-page index is 0. Use "!" before a number for inverted page numbers (e.g. "!0" for the last page). If not specified, the default configuration processes all pages. The input must be in string format. |
| `inline`                                               | boolean | *No*     | `false`                 | Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated.                                                                                                                                                                                             |
| `password`                                             | string  | *No*     | -                       | Password for the PDF file.                                                                                                                                                                                                                                                                                             |
| `async`                                                | boolean | *No*     | `false`                 | Set `async` to `true` to run long processes in the background. The API then returns a `jobId` that you can use with the [Background Job Check endpoint](/api-reference/job-check). See also [Webhooks & Callbacks](/api-reference/webhooks).                                                                          |
| `searchString`                                         | string  | *Yes*    | -                       | Text to search for. Supports regular expressions when the `regexSearch` parameter is set to `true`.                                                                                                                                                                                                                    |
| `wordMatchingMode`                                     | string  | *No*     | None                    | WordMatchingMode defines how search terms match PDF text. Modes: `None` (exact string match only), `SmartMatch` (default; flexible word boundary match, includes letters/digits/punctuation), `ExactMatch` (strict word boundaries, whole-word match only).                                                            |
| `regexSearch`                                          | boolean | *No*     | `false`                 | Set to true to enable regular expression search for the `searchString(s)` parameter.                                                                                                                                                                                                                                   |
| `profiles`                                             | object  | *No*     | -                       | See [Profiles](/api-reference/profiles) for more information.                                                                                                                                                                                                                                                          |
|     `ColumnDetectionMode`                              | string  | *No*     | ContentGroupsAndBorders | Controls column detection/alignment in PDF table extraction. Modes: `ContentGroupsAndBorders` (default; text + lines), `ContentGroups` (text grouping only), `Borders` (lines only), `BorderedTables` (OCR-based for bordered tables), `ContentGroupsAI` (AI for dense/complex layouts).                               |
|     `DetectionMinNumberOfRows`                         | integer | *No*     | 1                       | Minimum number of rows to detect in a table                                                                                                                                                                                                                                                                            |
|     `DetectionMinNumberOfColumns`                      | integer | *No*     | 1                       | Minimum number of columns to detect in a table                                                                                                                                                                                                                                                                         |
|     `DetectionMaxNumberOfInvalidSubsequentRowsAllowed` | integer | *No*     | `0`                     | Maximum number of invalid subsequent rows allowed in a table                                                                                                                                                                                                                                                           |
|     `DetectionMinNumberOfLineBreaksBetweenTables`      | integer | *No*     | `0`                     | Minimum number of line breaks between tables                                                                                                                                                                                                                                                                           |
|     `EnhanceTableBorders`                              | boolean | *No*     | `true`                  | Enhance table borders or not                                                                                                                                                                                                                                                                                           |
|     `OCRDetectPageRotation`                            | boolean | *No*     | `false`                 | Controls whether to detect page rotation in the PDF document when OCR is applied. Set to true to detect page rotation. See [Support page rotation](#support-page-rotation) for more information.                                                                                                                       |
|     `DataEncryptionAlgorithm`                          | string  | *No*     | -                       | Controls the encryption algorithm used for data encryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information. The available algorithms are: `AES128`, `AES192`, `AES256`.                                                                                              |
|     `DataEncryptionKey`                                | string  | *No*     | -                       | Controls the encryption key used for data encryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information.                                                                                                                                                                |
|     `DataEncryptionIV`                                 | string  | *No*     | -                       | Controls the encryption IV used for data encryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information.                                                                                                                                                                 |
|     `DataDecryptionAlgorithm`                          | string  | *No*     | -                       | Controls the decryption algorithm used for data decryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information. The available algorithms are: `AES128`, `AES192`, `AES256`.                                                                                              |
|     `DataDecryptionKey`                                | string  | *No*     | -                       | Controls the decryption key used for data decryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information.                                                                                                                                                                |
|     `DataDecryptionIV`                                 | string  | *No*     | -                       | Controls the decryption IV used for data decryption. See [User-Controlled Encryption](/knowledgebase/user-controlled-encryption) for more information.                                                                                                                                                                 |
| `requestParametersDocument`                            | string  | *No*     | -                       |                                                                                                                                                                                                                                                                                                                        |
| `responseParameters`                                   | object  | *No*     | -                       | -                                                                                                                                                                                                                                                                                                                      |
|     `error`                                            | boolean | *No*     | -                       | Indicates whether an error occurred (`false` means success)                                                                                                                                                                                                                                                            |
|     `status`                                           | string  | *No*     | -                       | Status code of the request (200, 404, 500, etc.). For more information, see [Response Codes](/api-reference/response-codes).                                                                                                                                                                                           |
|     `message`                                          | string  | *No*     | -                       | Message of the request                                                                                                                                                                                                                                                                                                 |
|     `credits`                                          | integer | *No*     | -                       | Number of credits consumed by the request                                                                                                                                                                                                                                                                              |
|     `remainingCredits`                                 | integer | *No*     | -                       | Number of credits remaining in the account                                                                                                                                                                                                                                                                             |
|     `duration`                                         | integer | *No*     | -                       | Time taken for the operation in milliseconds                                                                                                                                                                                                                                                                           |
|     `errorCode`                                        | integer | *No*     | -                       | Error code of the request (400, 401, 402, 403, 404, 500, etc.)                                                                                                                                                                                                                                                         |
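
To make the `pages` syntax described in the table more concrete, the following client-side sketch expands a page string into zero-based indices for a document of known length. It is an illustration of the documented format, not the service's own parser: the `resolve_pages` name is hypothetical, and treating `!n` as "n pages before the last page" (including inside ranges) is an assumption extrapolated from the `!0` example.

```python
def resolve_pages(spec: str, page_count: int) -> list[int]:
    """Expand a `pages` string into sorted zero-based page indices."""
    if not spec.strip():
        # An empty value means "all pages".
        return list(range(page_count))

    def index(token: str) -> int:
        token = token.strip()
        if token.startswith("!"):
            # Assumption: "!0" is the last page, "!1" the one before it.
            return page_count - 1 - int(token[1:])
        return int(token)

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, _, end = part.partition("-")
            first = index(start)
            # An open-ended range such as "2-" runs to the last page.
            last = index(end) if end.strip() else page_count - 1
            selected.update(range(first, last + 1))
        else:
            selected.add(index(part))

    # Drop indices that fall outside the document.
    return sorted(i for i in selected if 0 <= i < page_count)


print(resolve_pages("0, 1, 2-", 5))
print(resolve_pages("!0", 4))
```

A helper like this can be useful for validating a `pages` value before sending a request, since the endpoint expects the value as a string rather than a list.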

### Support page rotation

This endpoint supports **PDF** page rotation as follows:

```json theme={null}
{
 "profiles": "{ 'OCRDetectPageRotation': true }"
}
```

### Find only bordered tables

You can limit search to bordered tables only by enabling the *legacy table* search mode with the following `profiles` config:

```json theme={null}
{
 "profiles": "{ 'Mode': 'Legacy',
 'ColumnDetectionMode': 'BorderedTables',
 'DetectionMinNumberOfRows': 1,
 'DetectionMinNumberOfColumns': 1,
 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0,
 'DetectionMinNumberOfLineBreaksBetweenTables': 0,
 'EnhanceTableBorders': false
 }"
}
```

## `Example` Payload

<Note>To see the request size limits, please refer to the [Request Size Limits](/api-reference/url-input-and-request-limits#pdf-co-request-size).</Note>

```json theme={null}
{
  "async": false,
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
  "searchString": "Invoice Date \\d+/\\d+/\\d+",
  "regexSearch": true,
  "name": "output",
  "pages": "0-",
  "inline": true,
  "wordMatchingMode": "",
  "password": ""
}
```

## `Example` Response

<Note>To see the main response codes, please refer to the [Response Codes](/api-reference/response-codes) page.</Note>

```json theme={null}
{
  "body": [
    {
      "text": "Invoice Date 01/01/2016",
      "left": 436.5400085449219,
      "top": 130.4599995137751,
      "width": 122.85311957550027,
      "height": 11.040000486224898,
      "pageIndex": 0,
      "bounds": {
        "location": {
          "isEmpty": false,
          "x": 436.54,
          "y": 130.46
        },
        "size": "122.853119, 11.0400009",
        "x": 436.54,
        "y": 130.46,
        "width": 122.853119,
        "height": 11.0400009,
        "left": 436.54,
        "top": 130.46,
        "right": 559.3931,
        "bottom": 141.5,
        "isEmpty": false
      },
      "elementCount": 1,
      "elements": [
        {
          "index": 0,
          "left": 436.5400085449219,
          "top": 130.4599995137751,
          "width": 122.85311957550027,
          "height": 11.040000486224898,
          "angle": 0,
          "text": "Invoice Date 01/01/2016",
          "isNewLine": true,
          "fontIsBold": true,
          "fontIsItalic": false,
          "fontName": "Helvetica-Bold",
          "fontSize": 11,
          "fontColor": "0, 0, 0",
          "fontColorAsOleColor": 0,
          "fontColorAsHtmlColor": "#000000",
          "bounds": {
            "location": {
              "isEmpty": false,
              "x": 436.54,
              "y": 130.46
            },
            "size": "122.853119, 11.0400009",
            "x": 436.54,
            "y": 130.46,
            "width": 122.853119,
            "height": 11.0400009,
            "left": 436.54,
            "top": 130.46,
            "right": 559.3931,
            "bottom": 141.5,
            "isEmpty": false
          }
        }
      ]
    }
  ],
  "pageCount": 1,
  "error": false,
  "status": 200,
  "name": "output",
  "remainingCredits": 59970
}
```
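
As a sketch of how a client might consume a response shaped like the example above, the following code turns each entry of `body` into a small record with a bounding box. The `Match` type and `iter_matches` function are hypothetical names, the sample dictionary is a trimmed copy of the example response, and computing `right`/`bottom` from `left + width` and `top + height` is an assumption that happens to agree with the `bounds` values shown.

```python
from typing import Iterator, NamedTuple


class Match(NamedTuple):
    text: str
    page_index: int
    left: float
    top: float
    right: float
    bottom: float


def iter_matches(response: dict) -> Iterator[Match]:
    """Yield one Match per found item, or raise if the API reported an error."""
    if response.get("error"):
        raise RuntimeError(response.get("message", "request failed"))
    for item in response.get("body", []):
        left, top = item["left"], item["top"]
        yield Match(
            text=item["text"],
            page_index=item["pageIndex"],
            left=left,
            top=top,
            right=left + item["width"],
            bottom=top + item["height"],
        )


# Trimmed copy of the example response (only the fields used here).
sample = {
    "body": [
        {
            "text": "Invoice Date 01/01/2016",
            "left": 436.5400085449219,
            "top": 130.4599995137751,
            "width": 122.85311957550027,
            "height": 11.040000486224898,
            "pageIndex": 0,
        }
    ],
    "error": False,
    "status": 200,
}

for match in iter_matches(sample):
    print(match)
```

Checking `error` before reading `body` keeps a failed request from surfacing as a confusing missing-key error further down.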

<Note>
  **Inconsistent URL Encoding in cURL Output:** When you call the API with cURL, the raw JSON output may show URL characters as Unicode escape sequences. For example, the ampersand character (`&`) may appear as `\u0026`. This is normal JSON encoding and does not affect the validity of the URL: any JSON parser decodes these escape sequences automatically, so the URL works correctly once the response is parsed.
</Note>
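
The decoding described in the note can be checked with any JSON parser. This short sketch uses Python's standard `json` module; the URL is a made-up placeholder.

```python
import json

# Raw response text as cURL might print it: "&" appears as \u0026.
raw = r'{"url": "https://example.com/file.pdf?a=1\u0026b=2"}'

# Parsing the JSON restores the original character.
decoded_url = json.loads(raw)["url"]
print(decoded_url)
```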

## Code Samples

<Tabs>
  <Tab title="CURL">
    ```bash theme={null}
    curl --location --request POST 'https://api.pdf.co/v1/pdf/find' \
    --header 'x-api-key: *******************' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "async": false,
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+/\\d+/\\d+",
    "regexSearch": true,
    "name": "output",
    "pages": "0-",
    "inline": true,
    "wordMatchingMode": "",
    "password": ""
    }'
    ```
  </Tab>

  <Tab title="JavaScript/Node.js">
    ```javascript theme={null}
    // `request` module is required for file upload.
    // Use "npm install request" command to install.
    var request = require("request");

    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    const API_KEY = "***********************************";

    // Direct URL of source PDF file.
    const SourceFileUrl = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf";

    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    const Pages = "";

    // PDF document password. Leave empty for unprotected documents.
    const Password = "";

    // Search string.
    const SearchString = '[4-9][0-9]\\.[0-9][0-9]'; // Regular expression to find numbers in format dd.dd between 40.00 and 99.99 (the dot is escaped so it matches a literal ".")

    // Enable regular expressions (Regex)
    const RegexSearch = 'True';

    // Prepare URL for PDF text search API call.
    // See documentation: https://docs.pdf.co
    var query = `https://api.pdf.co/v1/pdf/find`;
    let reqOptions = {
        uri: query,
        headers: { "x-api-key": API_KEY },
        formData: {
            password: Password,
            pages: Pages,
            url: SourceFileUrl,
            searchString: SearchString,
            regexSearch: RegexSearch
        }
    };

    // Send request
    request.post(reqOptions, function (error, response, body) {
        if (error) {
            return console.error("Error: ", error);
        }

        // Parse JSON response
        let data = JSON.parse(body);
        if (data.error) {
            // Show service reported error
            return console.error("Error: ", data.message);
        }
        for (let index = 0; index < data.body.length; index++) {
            const element = data.body[index];
            console.log("Found text " + element["text"] + " at coordinates " + element["left"] + ", " + element["top"]);
        }

    });
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import os
    import requests # pip install requests

    # The authentication key (API Key).
    # Get your own by registering at https://app.pdf.co
    API_KEY = "******************************************"

    # Base URL for PDF.co Web API requests
    BASE_URL = "https://api.pdf.co/v1"

    # Source PDF file
    SourceFile = ".\\sample.pdf"

    # Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    Pages = ""

    # PDF document password. Leave empty for unprotected documents.
    Password = ""

    # Search string.
    SearchString = r"\d{1,}\.\d\d" # Regular expression to find numbers like '100.00' (raw string keeps the backslashes intact)
                                   # Note: do not use `+` char in regex, but use `{1,}` instead.
                                   # `+` char is valid for URL and will not be escaped, and it will become a space char on the server side.

    # Enable regular expressions (Regex)
    RegexSearch = True


    def main(args = None):
        uploadedFileUrl = uploadFile(SourceFile)
        if (uploadedFileUrl != None):
            searchTextInPDF(uploadedFileUrl)


    def searchTextInPDF(uploadedFileUrl):
        """Search Text using PDF.co Web API"""

        # Prepare requests params as JSON
        # See documentation: https://docs.pdf.co
        parameters = {}
        parameters["password"] = Password
        parameters["pages"] = Pages
        parameters["url"] = uploadedFileUrl
        parameters["searchString"] = SearchString
        parameters["regexSearch"] = RegexSearch

        # Prepare URL for 'PDF Text Search' API request
        url = "{}/pdf/find".format(BASE_URL)

        # Execute request and get response as JSON
        response = requests.post(url, data=parameters, headers={ "x-api-key": API_KEY })
        if (response.status_code == 200):
            json = response.json()

            if json["error"] == False:
                # Display found information
                for item in json["body"]:
                    print(f"Found text {item['text']} at coordinates {item['left']}, {item['top']}")
            else:
                # Show service reported error
                print(json["message"])
        else:
            print(f"Request error: {response.status_code} {response.reason}")


    def uploadFile(fileName):
        """Uploads file to the cloud"""

        # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

        # Prepare URL for 'Get Presigned URL' API request
        url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
            BASE_URL, os.path.basename(fileName))

        # Execute request and get response as JSON
        response = requests.get(url, headers={ "x-api-key": API_KEY })
        if (response.status_code == 200):
            json = response.json()

            if json["error"] == False:
                # URL to use for file upload
                uploadUrl = json["presignedUrl"]
                # URL for future reference
                uploadedFileUrl = json["url"]

                # 2. UPLOAD FILE TO CLOUD.
                with open(fileName, 'rb') as file:
                    requests.put(uploadUrl, data=file, headers={ "x-api-key": API_KEY, "content-type": "application/octet-stream" })

                return uploadedFileUrl
            else:
                # Show service reported error
                print(json["message"])
        else:
            print(f"Request error: {response.status_code} {response.reason}")

        return None


    if __name__ == '__main__':
        main()
    ```
  </Tab>

  <Tab title="C#">
    ```csharp theme={null}
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    namespace PDFcoApiExample
    {
        class Program
        {
            // The authentication key (API Key).
            // Get your own by registering at https://app.pdf.co
            const String API_KEY = "*********************************";

            // Source PDF file
            const string SourceFile = @".\sample.pdf";

            // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
            const string Pages = "";

            // PDF document password. Leave empty for unprotected documents.
            const string Password = "";

            // Search string.
            const string SearchString = @"\d{1,}\.\d\d"; // Regular expression to find numbers like '100.00'
                                                         // Note: do not use `+` char in regex, but use `{1,}` instead.
                                                         // `+` char is valid for URL and will not be escaped, and it will become a space char on the server side.

            // Enable regular expressions (Regex)
            const bool RegexSearch = true;


            static void Main(string[] args)
            {
                // Create standard .NET web client instance
                WebClient webClient = new WebClient();

                // Set API Key
                webClient.Headers.Add("x-api-key", API_KEY);

                // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
                // * If you already have a direct file URL, skip to step 3.

                // Prepare URL for `Get Presigned URL` API call
                string query = Uri.EscapeUriString(string.Format(
                    "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
                    Path.GetFileName(SourceFile)));

                try
                {
                    // Execute request
                    string response = webClient.DownloadString(query);

                    // Parse JSON response
                    JObject json = JObject.Parse(response);

                    if (json["error"].ToObject<bool>() == false)
                    {
                        // Get URL to use for the file upload
                        string uploadUrl = json["presignedUrl"].ToString();
                        string uploadedFileUrl = json["url"].ToString();

                        // 2. UPLOAD THE FILE TO CLOUD.
                        webClient.Headers.Add("content-type", "application/octet-stream");
                        webClient.UploadFile(uploadUrl, "PUT", SourceFile); // You can use UploadData() instead if your file is byte[] or Stream

                        // 3. SEARCH FOR TEXT IN THE UPLOADED PDF FILE

                        // URL for `PDF Text Search` API call
                        // See documentation: https://docs.pdf.co
                        string url = "https://api.pdf.co/v1/pdf/find";

                        // Prepare requests params as JSON
                        Dictionary<string, object> parameters = new Dictionary<string, object>();
                        parameters.Add("password", Password);
                        parameters.Add("pages", Pages);
                        parameters.Add("url", uploadedFileUrl);
                        parameters.Add("searchString", SearchString);
                        parameters.Add("regexSearch", RegexSearch);

                        // Convert dictionary of params to JSON
                        string jsonPayload = JsonConvert.SerializeObject(parameters);

                        // Execute POST request with JSON payload
                        response = webClient.UploadString(url, jsonPayload);

                        // Parse JSON response
                        json = JObject.Parse(response);

                        if (json["error"].ToObject<bool>() == false)
                        {
                            foreach (JToken item in json["body"])
                            {
                                Console.WriteLine($"Found text \"{item["text"]}\" at coordinates {item["left"]}, {item["top"]}");
                            }
                        }
                        else
                        {
                            Console.WriteLine(json["message"].ToString());
                        }
                    }
                    else
                    {
                        Console.WriteLine(json["message"].ToString());
                    }
                }
                catch (WebException ex)
                {
                    Console.WriteLine(ex.ToString());
                }

                webClient.Dispose();

                Console.WriteLine();
                Console.WriteLine("Press any key...");
                Console.ReadKey();
            }
        }
    }
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    package com.company;

    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import okhttp3.*;

    import java.io.*;
    import java.net.*;

    public class Main
    {
        // The authentication key (API Key).
        // Get your own by registering at https://app.pdf.co
        final static String API_KEY = "***********************************";

        // Direct URL of source PDF file.
        final static String SourceFileURL = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf";

        // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
        final static String Pages = "";

        // PDF document password. Leave empty for unprotected documents.
        final static String Password = "";

        // Search string.
        final static String SearchString = "\\d{1,}\\.\\d\\d"; // Regular expression to find numbers like '100.00'
        // Note: do not use `+` char in regex, but use `{1,}` instead.
        // `+` char is valid for URL and will not be escaped, and it will become a space char on the server side.

        // Enable regular expressions (Regex)
        final static boolean RegexSearch = true;

        public static void main(String[] args) throws IOException
        {
            // Create HTTP client instance
            OkHttpClient webClient = new OkHttpClient();

            // Prepare URL for PDF text search API call.
            // See documentation: https://docs.pdf.co
            String query = "https://api.pdf.co/v1/pdf/find";

            // Make correctly escaped (encoded) URL
            URL url = null;
            try
            {
                url = new URI(null, query, null).toURL();
            }
            catch (URISyntaxException e)
            {
                e.printStackTrace();
            }

            // Create JSON payload (backslashes in the regex must be doubled to form valid JSON)
            String jsonPayload = String.format("{\"password\": \"%s\", \"pages\": \"%s\", \"url\": \"%s\", \"searchString\": \"%s\", \"regexSearch\": \"%s\"}",
                    Password,
                    Pages,
                    SourceFileURL,
                    SearchString.replace("\\", "\\\\"),
                    RegexSearch);

            // Prepare request body
            RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

            // Prepare request
            Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .addHeader("Content-Type", "application/json")
                .post(body)
                .build();

            // Execute request
            Response response = webClient.newCall(request).execute();

            if (response.code() == 200)
            {
                // Parse JSON response
                JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

                boolean error = json.get("error").getAsBoolean();
                if (!error)
                {
                    // Display found items in console
                    for (JsonElement element : json.get("body").getAsJsonArray())
                    {
                        JsonObject item = (JsonObject) element;
                        System.out.println("Found text " + item.get("text") + " at coordinates " + item.get("left") + ", "+ item.get("top"));
                    }
                }
                else
                {
                    // Display service reported error
                    System.out.println(json.get("message").getAsString());
                }
            }
            else
            {
                // Display request error
                System.out.println(response.code() + " " + response.message());
            }
        }
    }
    ```
  </Tab>
</Tabs>
